WO2017206667A1 - Method and apparatus for distributed deployment of a Hadoop cluster - Google Patents

Method and apparatus for distributed deployment of a Hadoop cluster

Info

Publication number
WO2017206667A1
WO2017206667A1 PCT/CN2017/083207 CN2017083207W WO2017206667A1 WO 2017206667 A1 WO2017206667 A1 WO 2017206667A1 CN 2017083207 W CN2017083207 W CN 2017083207W WO 2017206667 A1 WO2017206667 A1 WO 2017206667A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
task
host
hadoop cluster
deployment
Prior art date
Application number
PCT/CN2017/083207
Other languages
English (en)
French (fr)
Inventor
高林林
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017206667A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • The present application relates to the field of communications, for example, to a method and apparatus for distributed deployment of a Hadoop cluster.
  • Hadoop is a distributed system infrastructure developed by the Apache Foundation. "Hadoop" is not an abbreviation but an invented name, said to come from a toy belonging to a child of the project's creator, and it has no practical meaning. Hadoop is an open source software framework and platform for developing and running large-scale data processing. It performs distributed computation over massive data on clusters of many computers. Users can develop distributed programs without knowing the underlying distributed details, taking full advantage of the cluster for high-speed computing and storage.
  • Distributed deployment of a Hadoop cluster requires administrators to understand the Hadoop ecosystem and the hardware resources of each host in the cluster, which places high demands on the administrators and is error-prone.
  • Manually configuring a Hadoop cluster is cumbersome and inefficient.
  • Flexible management, such as dynamic capacity expansion and reduction, is difficult.
  • The version packages of the Hadoop cluster components come from a single download source, which makes the Hadoop cluster deployment time uncontrollable.
  • 1. Hadoop cluster deployment places high demands on operation and maintenance personnel, who need to be familiar with the Hadoop ecosystem, understand the resource information of each node in the cluster, and design the Hadoop cluster network topology; 2. Hadoop cluster component node allocation is arbitrary; 3. Hadoop cluster deployment time is long.
  • The embodiments of the present disclosure provide a method and an apparatus for deploying a Hadoop cluster in a distributed manner, so as to at least solve the problem in the related art that manually deploying a Hadoop cluster is complicated and takes a long time.
  • A method for distributed deployment of a Hadoop cluster is provided, including: collecting parameter information of at least one host of the Hadoop cluster according to host information of the Hadoop cluster, where each of the hosts is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task; and deploying a task to at least one of the components according to task information of the Hadoop cluster and the parameter information.
  • the method further includes:
  • The parameter information includes at least one of the following: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory usage, host disk IO usage, host network latency, host average IO operation wait time, host disk information, and process information of components in the host.
  • Deploying the task to the at least one component in the Hadoop cluster according to the task information and the parameter information includes: generating a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute the task, and the priority of the task; and selecting the task with the highest priority from the deployment task list and sending it to the corresponding component.
  • the priority is related to an attribute of the task and/or the parameter information for performing the task.
  • the method further includes: monitoring task execution progress and/or log information of the at least one component.
  • The template information includes at least one of the following: the number of Hadoop cluster hosts, the Hadoop cluster component information to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and timeouts of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.
  • the method further includes: parsing the template information and verifying the validity of the template information.
  • An apparatus for distributed deployment of a Hadoop cluster is provided, including: an acquisition module, configured to collect parameter information of at least one host of the Hadoop cluster according to host information of the Hadoop cluster, where each of the hosts includes at least one component that is deployed by an agent and configured to perform a corresponding task; and a deployment module, configured to deploy a task to at least one of the components according to task information of the Hadoop cluster and the parameter information.
  • the device further includes:
  • a receiving module, configured to receive template information for deploying the Hadoop cluster, where the template information is used to indicate the task information and the host information of the Hadoop cluster, and the task information is used to describe the tasks that the Hadoop cluster is required to complete.
  • The deployment module further includes: a generating unit, configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute the task, and the priority of the task; and a selecting unit, configured to select the task with the highest priority from the deployment task list and send the task to the corresponding component.
  • The device further includes: a monitoring module, configured to monitor the task execution progress and/or log information of the at least one component after the deployment module deploys a task to the at least one component according to the template information and the parameter information.
  • Embodiments of the present disclosure also provide a non-transitory computer readable storage medium storing computer executable instructions arranged to perform the above method.
  • An embodiment of the present disclosure further provides an electronic device, including:
  • at least one processor; and a memory, where
  • the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the method described above.
  • The template information for deploying the Hadoop cluster is used to indicate the task information and the host information of the Hadoop cluster, where the task information is used to describe the tasks that the Hadoop cluster is required to complete.
  • Tasks are deployed to at least one of the components according to the task information and the parameter information. Because the task information and the host information are received, and the load status of the hosts and components is obtained by collecting the parameter information, tasks can be properly deployed across the hosts and components of the Hadoop cluster. This solves the problem in the related art that manually deploying a Hadoop cluster is complicated and takes a long time.
  • FIG. 1 is a general structural diagram of a distributed deployment Hadoop cluster according to an embodiment of the present disclosure
  • FIG. 2 is a flow diagram of a method of distributed deployment of a Hadoop cluster in accordance with an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of an apparatus for distributed deployment of a Hadoop cluster in accordance with an embodiment of the present disclosure
  • FIG. 4 is a block diagram 1 of an optional structure of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure
  • FIG. 5 is a block diagram 2 of an optional structure of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure
  • FIG. 6 is a structural diagram of an agent in a distributed deployment Hadoop cluster system according to this embodiment.
  • FIG. 9 is a sequence diagram of a Hadoop cluster deployment method of this embodiment.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic structural diagram of a distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure.
  • The network architecture includes a management system for deploying the Hadoop cluster and the Hadoop cluster itself. The management system includes various functional modules and an execution agent node, the Hadoop cluster includes a plurality of distributed agent nodes for performing tasks, and the management system and the Hadoop cluster are communicatively connected.
  • FIG. 2 is a flowchart of a method for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
  • Step S202: receive template information for deploying a Hadoop cluster, where the template information is used to indicate task information and host information of the Hadoop cluster, and the task information is used to describe the tasks that the Hadoop cluster is required to complete;
  • Step S204: collect parameter information of at least one host of the Hadoop cluster according to the host information, where each host is configured to deploy at least one component, and the component is deployed by the agent and configured to perform a corresponding task; optionally, the deployment task is executed by the agent;
  • Step S206: deploy a task to at least one component according to the task information and the parameter information.
  • Through the above steps, the template information for deploying a Hadoop cluster is received, where the template information indicates the task information and host information of the Hadoop cluster and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of at least one host of the Hadoop cluster is collected according to the host information, where each host is configured to deploy at least one component, and the component is deployed by the agent and configured to perform a corresponding task; and the task is deployed to the at least one component according to the task information and the parameter information. Because the task information and the host information are received, and the load status of the hosts and components is obtained by collecting the parameter information, tasks can be reasonably deployed across the hosts and components of the Hadoop cluster. This solves the problem in the related art that manually deploying a Hadoop cluster is complicated and takes a long time.
  • the execution body of the foregoing step may be a control end of a Hadoop cluster, a client, etc., but is not limited thereto.
  • The parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory usage, host disk IO usage, host network latency, host average IO operation wait time, host disk information, and process information of components within the host.
  • The template information may be, but is not limited to: the number of hosts in the Hadoop cluster, the Hadoop cluster component information to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and timeouts of the Hadoop cluster components, the host network address, the host user name and password, log storage disk information, data storage disk information, and metadata storage disk information.
  • the deploying tasks to the at least one component in the Hadoop cluster according to the task information and the parameter information includes:
  • the priority is related to the attributes of the task and/or the parameter information of the execution task.
  • the method further includes:
  • The method further includes: parsing the template information and verifying the validity of the template information. If the template information is valid, proceed to the next step.
  • A valid deployment template must include at least, but is not limited to, the following: the number of Hadoop cluster nodes, the Hadoop cluster component information to be deployed, the number of HDFS replicas, the number of client connections and timeouts of the Hadoop cluster components, the host network address, the user name and password, and the log storage disk, data storage disk, and metadata storage disk information.
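  • The validity check described above could be sketched as follows. This is a minimal illustration only: the field names (`node_count`, `hdfs_replicas`, and so on) and the dictionary layout are assumptions made for the example, since the source lists the kinds of information a valid template carries but does not define a concrete schema.

```python
# Hypothetical template schema: the field names below are illustrative
# assumptions, not names defined by the source document.
REQUIRED_FIELDS = {
    "node_count": int,          # number of Hadoop cluster nodes
    "components": list,         # Hadoop components to be deployed
    "hdfs_replicas": int,       # number of HDFS replicas
    "client_connections": int,  # client connection limit per component
    "client_timeout_s": int,    # client timeout in seconds
    "hosts": list,              # one entry per host (see HOST_FIELDS)
    "log_disks": list,
    "data_disks": list,
    "metadata_disks": list,
}
HOST_FIELDS = ("address", "username", "password")


def validate_template(template):
    """Return a list of problems; an empty list means the template is valid."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in template:
            errors.append(f"missing field: {name}")
        elif not isinstance(template[name], expected_type):
            errors.append(f"wrong type for field: {name}")

    hosts = template.get("hosts")
    if isinstance(hosts, list):
        for index, host in enumerate(hosts):
            for key in HOST_FIELDS:
                if not isinstance(host, dict) or key not in host:
                    errors.append(f"host {index} is missing: {key}")
    return errors
```

  • A caller would parse the template only when `validate_template` returns an empty list, and stop otherwise.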
  • a device for deploying a Hadoop cluster is also provided, and the device is configured to implement the foregoing embodiments and implementation manners, and details are not described herein.
  • The term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • FIG. 3 is a structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus includes:
  • the receiving module 30 is configured to receive template information for deploying a Hadoop cluster, where the template information is used to indicate task information and host information of the Hadoop cluster, and the task information is used to describe tasks required for the Hadoop cluster to complete;
  • the collecting module 32 is configured to collect parameter information of at least one host of the Hadoop cluster according to the host information, where each host includes at least one component deployed by the agent and configured to perform a corresponding task;
  • the deployment module 34 is configured to deploy tasks to at least one component based on task information and parameter information.
  • The parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory usage, host disk IO usage, host network latency, host average IO operation wait time, host disk information, and process information of components within the host.
  • The template information may be, but is not limited to: the number of hosts in the Hadoop cluster, the Hadoop cluster component information to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and timeouts of the Hadoop cluster components, the host network address, the host user name and password, log storage disk information, data storage disk information, and metadata storage disk information.
  • FIG. 4 is a block diagram of an optional structure of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure.
  • As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the deployment module 34 of the device further includes:
  • the generating unit 40 is configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes task information, parameter information required to perform the task, and priority of the task;
  • the selecting unit 42 is configured to select the task with the highest priority from the deployment task list and send it to the corresponding component.
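  • The generating unit and selecting unit described above can be illustrated with a small priority queue. This is a sketch under assumptions: the class name `DeploymentTaskList`, its methods, and the rule that a larger number means higher priority are all hypothetical, and the source does not prescribe a data structure.

```python
import heapq
from itertools import count


class DeploymentTaskList:
    """Hypothetical deployment task list ordered by task priority."""

    def __init__(self):
        self._heap = []
        # Tie-breaker: tasks with equal priority come out in insertion order.
        self._seq = count()

    def add(self, priority, component, node, host):
        task = {
            "priority": priority,    # assumption: larger means more urgent
            "component": component,  # e.g. "HDFS"
            "node": node,            # e.g. "NameNode"
            "host": host,            # host network address
        }
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._seq), task))

    def pop_highest(self):
        """Remove and return the highest-priority task, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

  • In this sketch the selecting unit would call `pop_highest()` and send the returned task to the agent on `task["host"]`.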
  • In addition to the modules described above, the apparatus further includes a monitoring module 50, configured to monitor the task execution progress and/or log information of the at least one component after the deployment module deploys the task to the at least one component according to the template information and the parameter information.
  • Each of the above modules may be implemented by software or hardware. For the latter, this may be done in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
  • This embodiment provides a method and system for distributed deployment of a Hadoop cluster. It overcomes the shortcomings of placing high demands on the administrators who deploy Hadoop clusters, arbitrary allocation of Hadoop cluster component nodes, and a single download source for installation packages.
  • the present disclosure fully utilizes the hardware resources in the cluster and the load conditions of each host to implement a one-click distributed deployment of Hadoop clusters.
  • As shown in FIG. 1, the distributed deployment Hadoop cluster system of this embodiment includes the following components:
  • the deployment template includes but is not limited to the following: host network address, user name, password, Hadoop component information, node number information, and mount disk information.
  • the template parser parses the template information input by the user and performs legality verification.
  • The monitor is responsible for receiving the execution status and logs of Hadoop component deployment tasks sent by the agent.
  • The collector is responsible for receiving the host information sent by the agent (including but not limited to: operating system information, CPU information, memory information, network information, CPU utilization, memory usage, disk IO usage, and network latency) and persisting it.
  • The task generator generates the Hadoop component deployment task list according to the host information collected by the collector and the deployment template information.
  • the task scheduler sends a high-priority deployment task to the agent according to the host information collected by the collector, the host load status, and the deployment task list.
  • the agent contains components such as collectors, deployers, parameter configurators, monitors, and more.
  • the collector is responsible for periodically collecting the host information and sending it to the collector of the system;
  • the deployer receives and executes the task delivered by the task scheduler;
  • the parameter configurator is responsible for configuring the Hadoop component configuration files;
  • the monitor is responsible for monitoring the deployment task execution and log collection.
  • FIG. 6 is a structural diagram of an agent in the distributed deployment Hadoop cluster system according to this embodiment.
  • the distributed deployment Hadoop cluster method of this embodiment includes the following:
  • the agent deployment task is generated by the task generator and scheduled by the task scheduler. After the agent deployment is completed, the collector periodically collects node resource information and feeds back to the management system.
  • the monitor of the distributed deployment Hadoop cluster system receives the deployment template submitted by the user.
  • the parser parses the Hadoop cluster deployment template and verifies the validity of the template.
  • Based on the deployment template and resource information submitted by the user, the topology generator generates a Hadoop cluster network topology map.
  • the component deployment task is generated by the task generator according to the Hadoop cluster network topology structure.
  • Task scheduler performs deployment tasks
  • the task scheduler extracts the task to be executed and the resource information of each node from the task list, and generates a task sequence to be executed.
  • the task scheduler takes out the high-priority deployment task and sends it to the corresponding agent.
  • After the host agent receives the deployment task, its deployer performs the deployment task. The agent's monitor feeds back the execution progress of the deployment task to the monitor of the deployment system in real time, and the monitor notifies the task scheduler to continue scheduling tasks. The step "Task scheduler performs deployment tasks" is repeated until all tasks to be deployed have been executed.
  • the nodes of the Hadoop cluster component are allocated reasonably according to the cluster resources; in the deployment process, the deployment task is dynamically allocated according to the collected host load, and the Hadoop cluster is deployed in a one-click distributed manner.
  • The disclosure effectively addresses the shortcomings of long deployment time and high pressure on the deployment system when deploying large-scale Hadoop clusters.
  • FIG. 8 is a flowchart of a method for deploying a Hadoop cluster according to this embodiment.
  • FIG. 9 is a sequence diagram of a Hadoop cluster deployment method according to this embodiment.
  • With reference to FIG. 8 and FIG. 9, the example includes:
  • System initialization: when the distributed deployment Hadoop cluster system starts, it needs to be initialized, including initializing the monitor, the collector, and agent A1.
  • Agent deployment: in the first round, agent A1 performs the task of deploying agent A2. After agent A2 is deployed, it is initialized and started; then agents A1 and A2 deploy agents A3 and A4, and so on, until the agents on all hosts in the cluster are deployed (FIG. 7).
  • A valid deployment template must include at least, but is not limited to, the following: the number of Hadoop cluster nodes, the Hadoop cluster component information to be deployed, the number of HDFS replicas, the number of client connections and timeouts of the Hadoop cluster components, the host network address, the user name and password, and the log storage disk, data storage disk, and metadata storage disk information.
  • After receiving the template information, the template parser first verifies the validity of the template. If the template does not meet the agreed requirements, processing is terminated. If the template is valid, the template is parsed.
  • The topology map generator generates the Hadoop cluster networking topology map (such as S1).
  • Hadoop cluster component deployment rules include, but are not limited to: 1. assigning the Master and Slave nodes of Hadoop components according to hardware resources and host load conditions; 2. calculating the number of ZOOKEEPER nodes according to the number of nodes in the cluster and assigning them; 3. calculating the number of Journalnode nodes according to the number of HDFS nodes and assigning them.
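  • The deployment rules above can be sketched as follows. The source does not give the actual formulas, so everything here is an assumption made for illustration: the helper names, the host fields `load` and `mem_gb`, the choice of an odd quorum size capped at 5 for both ZOOKEEPER and Journalnode, and the rule that the least-loaded host becomes the Master.

```python
def odd_quorum_size(node_count, cap=5):
    """Largest odd number that is <= min(node_count, cap).

    Assumption: an odd-sized quorum is used for ZOOKEEPER and Journalnode;
    the source only says the count is derived from the number of nodes.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    size = min(node_count, cap)
    return size if size % 2 == 1 else size - 1


def assign_roles(hosts, cap=5):
    """Assign roles from host resources and load.

    hosts: list of dicts with hypothetical keys 'address', 'load', 'mem_gb'.
    Hosts are ranked by lowest load first, then by most memory.
    """
    ranked = sorted(hosts, key=lambda h: (h["load"], -h["mem_gb"]))
    quorum = odd_quorum_size(len(ranked), cap)
    return {
        "master": ranked[0]["address"],
        "slaves": [h["address"] for h in ranked[1:]],
        "zookeeper": [h["address"] for h in ranked[:quorum]],
        "journalnode": [h["address"] for h in ranked[:quorum]],
    }
```

  • A real rule set would likely weigh more resource dimensions; the point of the sketch is only that the topology is derived from collected host data rather than chosen by hand.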
  • Hadoop component deployment tasks include, but are not limited to, the following information: component name (such as HDFS), node name (such as: NameNode), host network address, task priority, and so on.
  • the deployment task generator generates a deployment task according to the Hadoop cluster networking topology diagram.
  • The task scheduler scans the deployment task list, extracts the unexecuted deployment tasks from the task list, calculates the load of each host in the cluster according to the node resource information (it can examine the average load, memory utilization, disk IO utilization, and network delay indicators), and generates a prioritized deployment task sequence (such as S4).
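  • One way to picture the load calculation and the prioritized sequence is the sketch below. The weights, the assumption that each indicator is already normalized to the range 0 to 1, and the task fields (`done`, `priority`, `host`) are all hypothetical; the source names the indicators but not how they are combined.

```python
# Hypothetical weights for combining the four indicators named in the text.
WEIGHTS = {
    "load_avg": 0.4,
    "mem_util": 0.3,
    "disk_io_util": 0.2,
    "net_delay": 0.1,
}


def host_load(metrics):
    """Weighted load score; assumes every indicator is normalized to [0, 1]."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


def build_task_sequence(tasks, host_metrics):
    """Return unexecuted tasks ordered for scheduling.

    Higher task priority comes first; among equal priorities, tasks bound
    for less-loaded hosts come first.
    """
    pending = [task for task in tasks if not task["done"]]
    return sorted(
        pending,
        key=lambda task: (-task["priority"], host_load(host_metrics[task["host"]])),
    )
```

  • Because the sequence is rebuilt from fresh metrics on each scheduling pass, a host that becomes busy naturally drops toward the back of the queue.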
  • the task scheduler selects the high priority deployment task in turn, and sends the deployment task to the agent of the corresponding host.
  • a Hadoop cluster component deployment task of agent A2 is deployed by agent A1.
  • the monitor of agent A1 monitors the execution of the deployment task and feeds back to the monitor of the deployment system (such as S10).
  • The task scheduler regenerates the task sequence according to the task list and the resource information (such as S5), selects the high-priority tasks T3 and T4, and agents A1 and A2 deliver the deployment tasks to agents A3 and A4, and so on (such as S11 to S14).
  • In the t-th round (t is greater than 0), the entire cluster has 2^(t-1) agents executing Hadoop component deployment tasks.
  • Each agent can open multiple threads and deploy Hadoop component tasks to multiple (such as 2) agents.
  • In that case, in the t-th round (t is greater than 0), the entire Hadoop cluster has 3^(t-1) agents performing the deployment of Hadoop component tasks.
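  • The growth in the number of deploying agents can be checked with a short calculation. The sketch assumes, as in the example above, that the process starts from a single agent (A1) and that every existing agent deploys to a fixed number of new agents per round; the function names and the `fanout` parameter are hypothetical.

```python
def deploying_agents(t, fanout=1):
    """Number of agents deploying in round t (t >= 1).

    With fanout=1 each agent deploys to one new agent per round, giving
    2 ** (t - 1); with fanout=2 the count is 3 ** (t - 1).
    """
    if t < 1:
        raise ValueError("t must be greater than 0")
    return (1 + fanout) ** (t - 1)


def rounds_to_cover(agent_count, fanout=1):
    """Rounds needed until at least agent_count agents exist, starting from one."""
    agents, rounds = 1, 0
    while agents < agent_count:
        agents *= 1 + fanout  # every existing agent adds `fanout` new agents
        rounds += 1
    return rounds
```

  • For example, under these assumptions about 1000 agents are reached in 10 rounds with one target per agent and in 7 rounds with two targets per agent, which is why the cascade relieves pressure on the deployment system.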
  • the agent A1 is set up with the distributed deployment Hadoop cluster management system.
  • the parameter configuration task completes the configuration of each component of the Hadoop cluster.
  • The scheduler needs to collect the information about the components of the entire Hadoop cluster (for example, the host names of the nodes where the master and the slaves are located, the log storage disk, the data storage disk, the metadata storage disk, and so on) and send it to each host agent along with the parameter configuration task, which is executed by the parameter configurator in the agent.
  • After all parameter configuration tasks in the cluster are executed, the deployment of the entire Hadoop cluster's components is complete.
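  • As an illustration of what a parameter configurator might produce, the sketch below renders a minimal `hdfs-site.xml` from values carried by the template. The function name and its arguments are hypothetical; `dfs.replication`, `dfs.namenode.name.dir`, and `dfs.datanode.data.dir` are standard HDFS property names, but the source does not say which properties its configurator writes.

```python
import xml.etree.ElementTree as ET


def render_hdfs_site(replicas, metadata_dirs, data_dirs):
    """Build the text of a minimal hdfs-site.xml (illustrative only)."""
    properties = {
        "dfs.replication": str(replicas),
        "dfs.namenode.name.dir": ",".join(metadata_dirs),
        "dfs.datanode.data.dir": ",".join(data_dirs),
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

  • The agent would write the returned text into the component's configuration directory on its own host.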
  • the collector in the agent component periodically collects hardware resources and running state information of the host, and reports the data to the collector in the deployment system to store the node resources.
  • The hardware resources and running status information include but are not limited to the following: operating system information, host name, CPU information, memory information, disk information, process information, CPU utilization, memory utilization, disk IO utilization, network information, and average IO operation wait time.
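  • A small part of what the agent's collector gathers can be read with the Python standard library alone, as sketched below. The function name and the dictionary keys are hypothetical, and only a subset of the listed indicators is covered; memory, network, and per-process figures would need platform-specific sources.

```python
import os
import platform
import shutil
import socket
import time


def collect_host_info(path=os.sep):
    """Collect a subset of host information using only the standard library."""
    disk = shutil.disk_usage(path)
    info = {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
        "collected_at": time.time(),
    }
    # os.getloadavg exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            info["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return info
```

  • An agent could call this function on a fixed interval and report each result to the collector of the deployment system.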
  • each node resource (including host and Hadoop component information) collected by the monitor collector.
  • Embodiments of the present disclosure also provide a storage medium, such as a non-transitory computer readable storage medium storing computer executable instructions arranged to perform the above method.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • S1 Receive template information for deploying a Hadoop cluster, where the template information is used to indicate task information and host information of the Hadoop cluster, and the task information is used to describe tasks required for the Hadoop cluster to be completed;
  • S2 Collect parameter information of at least one host of the Hadoop cluster according to the host information, where each host is configured to deploy at least one component, where the component is deployed by the agent, configured to perform a corresponding task;
  • The foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, and a magnetic memory.
  • According to the program code stored in the storage medium, the processor receives template information for deploying a Hadoop cluster, where the template information is used to indicate task information and host information of the Hadoop cluster, and the task information is used to describe the tasks that the Hadoop cluster is required to complete;
  • According to the program code stored in the storage medium, the processor collects parameter information of at least one host of the Hadoop cluster according to the host information, where each host is configured to deploy at least one component, and the component is deployed by the agent and configured to perform a corresponding task;
  • According to the program code stored in the storage medium, the processor deploys a task to the at least one component according to the task information and the parameter information.
  • the embodiment of the present disclosure further provides a schematic structural diagram of an electronic device.
  • the electronic device includes:
  • at least one processor 100 (one processor 100 is taken as an example in FIG. 10) and a memory 101; the electronic device may further include a communication interface 102 and a bus 103.
  • the processor 100, the communication interface 102, and the memory 101 can complete communication with each other through the bus 103.
  • Communication interface 102 can be used for information transmission.
  • the processor 100 can call logic instructions in the memory 101 to perform the methods of the above-described embodiments.
  • In addition, when the logic instructions in the memory 101 are implemented in the form of a software functional unit and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium.
  • The memory 101, as a computer-readable storage medium, can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the method in the embodiments of the present disclosure.
  • By running the software programs, instructions, and modules stored in the memory 101, the processor 100 executes functional applications and data processing, that is, implements the method for distributed deployment of a Hadoop cluster in the foregoing method embodiments.
  • the memory 101 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function; the storage data area may store data created according to usage of the terminal device, and the like. Further, the memory 101 may include a high speed random access memory, and may also include a nonvolatile memory.
  • The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes at least one instruction for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method of the embodiments of the present disclosure.
  • The foregoing storage medium may be a non-transitory storage medium, including any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc; it may also be a transitory storage medium.
  • Those skilled in the art should understand that the modules or steps of the present disclosure described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
  • The method and apparatus for distributed deployment of a Hadoop cluster provided by the present application solve the problem in the related art that manual deployment of a Hadoop cluster is complicated to operate and takes a long time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Multi Processors (AREA)
  • Debugging And Monitoring (AREA)

Abstract

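A method and apparatus for distributed deployment of a Hadoop cluster. The method includes: receiving template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete (S202); collecting, according to the host information, parameter information of at least one host of the Hadoop cluster, where each host is configured to deploy at least one component, and each component is deployed by an agent (A1, A2, An) and configured to perform a corresponding task (S204); and deploying tasks to the at least one component according to the task information and the parameter information (S206).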

Description

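Method and apparatus for distributed deployment of a Hadoop cluster
Technical Field
The present application relates to the field of communications, for example to a method and apparatus for distributed deployment of a Hadoop cluster.
Background
Hadoop, in the related art, is a distributed system infrastructure developed by the Apache Foundation. "Hadoop" is not an acronym but a made-up name, reportedly related to the name of a toy belonging to a child of its creator, and it has no particular meaning. Hadoop is a software platform and open-source framework for developing and running applications that process large-scale data. It performs distributed computation over massive data sets on clusters built from large numbers of computers, so users can develop distributed programs without understanding the underlying distributed details and can make full use of the cluster for high-speed computation and storage.
In the related art, distributed deployment of a Hadoop cluster requires administrators to understand the Hadoop ecosystem and the hardware resources of every host in the cluster, which places high demands on them and is error-prone. Configuring a Hadoop cluster manually is tedious and inefficient; in large-scale Hadoop cluster environments in particular, elastic management such as dynamic scale-out and scale-in is difficult.
However, current systems for automated Hadoop deployment have the following problems:
Before a Hadoop cluster is deployed, its network topology is designed according to the hardware and software information of the cluster environment and the components to be deployed. This approach places high demands on cluster administrators, who must be familiar with both the environment's hardware and software and the Hadoop ecosystem. Without administrator intervention, the automated deployment system assigns Master and Slave nodes arbitrarily and cannot allocate nodes sensibly on the basis of cluster hardware and system load information;
The version packages of Hadoop cluster components are downloaded from a single source, which makes the deployment time of the Hadoop cluster unpredictable.
1. Hadoop cluster deployment places high demands on operations staff, who must be familiar with the Hadoop ecosystem, know the resource information of each node in the cluster, and design the Hadoop cluster network topology; 2. Hadoop cluster component nodes are assigned arbitrarily; 3. Hadoop cluster deployment takes a long time.
No effective solution to the above problems in the related art has yet been found.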
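Summary
Embodiments of the present disclosure provide a method and apparatus for distributed deployment of a Hadoop cluster, to at least solve the problem in the related art that manual deployment of a Hadoop cluster is complicated to operate and takes a long time.
According to one embodiment of the present disclosure, a method for distributed deployment of a Hadoop cluster is provided, including: collecting, according to host information of the Hadoop cluster, parameter information of at least one host of the Hadoop cluster, where each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task; and deploying tasks to at least one of the components according to task information of the Hadoop cluster and the parameter information.
Optionally, the method further includes:
receiving template information for deploying the Hadoop cluster, where the template information indicates the task information and the host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete.
Optionally, the parameter information includes at least one of the following: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory usage, host disk I/O usage, host network latency, host average I/O wait time, host disk information, and process information of the components on the host.
Optionally, deploying tasks to at least one component in the Hadoop cluster according to the task information and the parameter information includes: generating a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute each task, and the priority of each task; and selecting the highest-priority task from the deployment task list and dispatching it to the corresponding component.
Optionally, the priority is related to attributes of the task and/or the parameter information for executing the task.
Optionally, after deploying tasks to at least one of the components according to the template information and the parameter information, the method further includes: monitoring the task execution progress and/or log information of the at least one component.
Optionally, the template information includes at least one of the following: the number of hosts in the Hadoop cluster, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.
Optionally, after receiving the template information for deploying the Hadoop cluster, the method further includes: parsing the template information and verifying its validity.
According to another embodiment of the present disclosure, an apparatus for distributed deployment of a Hadoop cluster is provided, including: a collection module configured to collect, according to host information of the Hadoop cluster, parameter information of at least one host of the Hadoop cluster, where each host includes at least one component, and the component is deployed by an agent and configured to perform a corresponding task; and a deployment module configured to deploy tasks to at least one of the components according to task information of the Hadoop cluster and the parameter information.
Optionally, the apparatus further includes:
a receiving module configured to receive template information for deploying the Hadoop cluster, where the template information indicates the task information and the host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete.
Optionally, the deployment module further includes: a generating unit configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute each task, and the priority of each task; and a selecting unit configured to select the highest-priority task from the deployment task list and dispatch it to the corresponding component.
Optionally, the apparatus further includes: a monitoring module configured to monitor the task execution progress and/or log information of the at least one component after the deployment module deploys tasks to at least one of the components according to the template information and the parameter information.
Embodiments of the present disclosure further provide a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the above method.
Embodiments of the present disclosure further provide an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; where
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the above method.
With the present disclosure, template information for deploying a Hadoop cluster is received, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of at least one host of the Hadoop cluster is collected according to the host information, where each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task; and tasks are deployed to at least one of the components according to the task information and the parameter information. Because the task information and the host information are received, and the load of the hosts and components is obtained by collecting the parameter information, tasks can be deployed sensibly to the hosts and components of the Hadoop cluster. This solves the problem in the related art that manual deployment of a Hadoop cluster is complicated to operate and takes a long time.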
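Brief Description of the Drawings
The drawings described here are provided for understanding of the present disclosure and form part of the present application. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not unduly limit it. In the drawings:
FIG. 1 is an overall structural framework diagram of distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure;
FIG. 3 is a structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure;
FIG. 4 is a first optional structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure;
FIG. 5 is a second optional structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure;
FIG. 6 is a structural framework diagram of the agent in the system for distributed deployment of a Hadoop cluster in this embodiment;
FIG. 7 shows the deployment flow of the agents in the initial state in this embodiment;
FIG. 8 is a flowchart of the Hadoop cluster deployment method of this embodiment;
FIG. 9 is a sequence diagram of the Hadoop cluster deployment method of this embodiment; and
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.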
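Detailed Description
The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with one another.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.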
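Embodiment 1
The embodiments of the present application can run on the network architecture shown in FIG. 1, which is an overall structural framework diagram of distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure. As shown in FIG. 1, the network architecture includes a management system for deploying the Hadoop cluster and the Hadoop cluster itself. The management system includes various functional modules and an execution agent node; the Hadoop cluster likewise includes multiple distributed agent nodes that execute tasks. The deployment system and the Hadoop cluster are communicatively connected.
This embodiment provides a method for distributed deployment of a Hadoop cluster that runs on the above management system. FIG. 2 is a flowchart of the method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
Step S202: receive template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
Step S204: collect, according to the host information, parameter information of at least one host of the Hadoop cluster, where each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task. Optionally, the deployment tasks are executed by the agent.
Step S206: deploy tasks to the at least one component according to the task information and the parameter information.
Through the above steps, template information for deploying a Hadoop cluster is received, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of at least one host of the Hadoop cluster is collected according to the host information, where each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task; and tasks are deployed to the at least one component according to the task information and the parameter information. Because the task information and the host information are received, and the load of the hosts and components is obtained by collecting the parameter information, tasks can be deployed sensibly to the hosts and components of the Hadoop cluster. This solves the problem in the related art that manual deployment of a Hadoop cluster is complicated to operate and takes a long time.
Optionally, the above steps may be executed by, but are not limited to, a control end or a client of the Hadoop cluster.
Optionally, the parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory usage, host disk I/O usage, host network latency, host average I/O wait time, host disk information, and process information of the components on the host.
Optionally, the template information may be, but is not limited to: the number of hosts in the Hadoop cluster, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.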
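In an optional implementation of this embodiment, deploying tasks to at least one component in the Hadoop cluster according to the task information and the parameter information includes:
S11: generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute each task, and the priority of each task;
S12: select the highest-priority task from the deployment task list and dispatch it to the corresponding component. Optionally, the priority is related to attributes of the task and/or the parameter information for executing the task.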
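Steps S11 and S12 amount to keeping a prioritized list and repeatedly taking its head. The following is a minimal sketch of that idea using a binary heap. The `DeployTask` class, its fields, and the convention that a lower number means a higher priority are assumptions made for illustration; the source does not define a priority scale or a task data structure.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class DeployTask:
    # Hypothetical convention: a lower number means a higher priority.
    priority: int
    name: str = field(compare=False)
    host: str = field(compare=False)


def build_task_list(tasks):
    """S11 (sketch): turn the tasks into a heap ordered by priority."""
    heap = list(tasks)
    heapq.heapify(heap)
    return heap


def next_task(heap):
    """S12 (sketch): pop the highest-priority task, or None if none remain."""
    return heapq.heappop(heap) if heap else None


tasks = build_task_list([
    DeployTask(2, "DataNode", "10.0.0.3"),
    DeployTask(1, "NameNode", "10.0.0.1"),
    DeployTask(3, "NodeManager", "10.0.0.3"),
])
first = next_task(tasks)  # the NameNode task, since it has priority 1
```

A heap keeps selection cheap when the list is regenerated or extended between dispatches, which is one plausible reason to prefer it over re-sorting the whole list each time.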
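Optionally, after deploying tasks to the at least one component according to the template information and the parameter information, the method further includes:
monitoring the task execution progress and/or log information of the at least one component.
Optionally, after receiving the template information for deploying the Hadoop cluster, the method further includes: parsing the template information and verifying its validity. The subsequent steps are executed only if the template information is valid. A valid deployment template contains at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, user names and passwords, and the log storage disk, data storage disk, and metadata storage disk.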
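The validity check described above can be sketched as a function that returns a list of problems, where an empty list means the template is accepted. The source does not specify a template format, so the dictionary layout, every key name, and the rule that the replica count must not exceed the host count are assumptions for illustration only.

```python
import ipaddress

# Hypothetical key names; the source lists the kinds of content, not a schema.
REQUIRED_KEYS = ("host_count", "components", "hdfs_replicas", "hosts",
                 "log_disk", "data_disk", "metadata_disk")


def validate_template(template):
    """Return a list of problems; an empty list means the template is valid."""
    errors = [f"missing field: {k}" for k in REQUIRED_KEYS if k not in template]
    if errors:
        return errors
    hosts = template["hosts"]
    if template["host_count"] != len(hosts):
        errors.append("host_count does not match the number of host entries")
    for h in hosts:
        try:
            ipaddress.ip_address(h.get("address", ""))
        except ValueError:
            errors.append(f"invalid host address: {h.get('address')!r}")
        if not h.get("username") or not h.get("password"):
            errors.append(f"missing credentials for host {h.get('address')!r}")
    # Assumed rule: each replica needs its own host.
    if not 1 <= template["hdfs_replicas"] <= len(hosts):
        errors.append("hdfs_replicas must be between 1 and the number of hosts")
    return errors


good = {
    "host_count": 2,
    "components": ["HDFS", "YARN"],
    "hdfs_replicas": 2,
    "hosts": [
        {"address": "10.0.0.1", "username": "hadoop", "password": "secret"},
        {"address": "10.0.0.2", "username": "hadoop", "password": "secret"},
    ],
    "log_disk": "/data/log",
    "data_disk": "/data/d1",
    "metadata_disk": "/data/meta",
}
bad = dict(good, hosts=[{"address": "not-an-ip", "username": "hadoop",
                         "password": "secret"}])
```

Collecting all problems instead of stopping at the first one lets a user fix the template in a single pass before resubmitting it.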
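From the description of the above implementations, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present disclosure.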
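Embodiment 2
This embodiment further provides an apparatus for distributed deployment of a Hadoop cluster. The apparatus is configured to implement the above embodiments and implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus includes:
a receiving module 30 configured to receive template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
a collection module 32 configured to collect, according to the host information, parameter information of at least one host of the Hadoop cluster, where each host includes at least one component, and the component is deployed by an agent and configured to perform a corresponding task;
a deployment module 34 configured to deploy tasks to the at least one component according to the task information and the parameter information.
Optionally, the parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory usage, host disk I/O usage, host network latency, host average I/O wait time, host disk information, and process information of the components on the host.
Optionally, the template information may be, but is not limited to: the number of hosts in the Hadoop cluster, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.
FIG. 4 is a first optional structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the deployment module 34 of the apparatus further includes:
a generating unit 40 configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute each task, and the priority of each task;
a selecting unit 42 configured to select the highest-priority task from the deployment task list and dispatch it to the corresponding component.
FIG. 5 is a second optional structural block diagram of an apparatus for distributed deployment of a Hadoop cluster according to an embodiment of the present disclosure. As shown in FIG. 5, in addition to all the modules shown in FIG. 3, the apparatus further includes: a monitoring module 50 configured to monitor the task execution progress and/or log information of the at least one component after the deployment module deploys tasks to the at least one component according to the template information and the parameter information.
It should be noted that each of the above modules can be implemented in software or hardware. In the latter case, this can be done in the following ways, without being limited to them: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.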
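Embodiment 3
This embodiment is an optional embodiment of the present disclosure and is used to explain and describe the present application in detail:
This embodiment provides a method and system for distributed deployment of a Hadoop cluster. It overcomes the drawbacks of high demands on the administrators who deploy Hadoop clusters, arbitrary assignment of Hadoop cluster component nodes, and a single download source for installation packages. The present disclosure makes full use of the hardware resources in the cluster and the load of each host to achieve one-click distributed deployment of a Hadoop cluster.
The system for distributed deployment of a Hadoop cluster of this embodiment has the architecture shown in FIG. 1 and includes the following components:
Template parser: the deployment template includes, but is not limited to, the following: host network addresses, user names, passwords, Hadoop component information, node count information, and mounted disk information. The template parser parses the template information entered by the user and checks its validity.
Monitor: the monitor receives the execution status of Hadoop component deployment tasks sent by the agents and handles logs.
Collector: the collector receives the host information sent by the agents (including, but not limited to, operating system information, CPU information, memory information, network information, CPU utilization, memory usage, disk I/O usage, and network latency) and persists it.
Task generator: the task generator generates the list of Hadoop component deployment tasks according to the host information collected by the collector and the deployment template information.
Task scheduler: the task scheduler selects high-priority deployment tasks according to the host information collected by the collector, the host load, and the deployment task list, and dispatches them to the agents.
Agent: the agent contains a collector, a deployer, a parameter configurator, a monitor, and other components. The collector periodically collects host information and sends it to the system's collector; the deployer receives and executes the tasks dispatched by the task scheduler; the parameter configurator configures the configuration files of each Hadoop component; the monitor monitors the execution of deployment tasks and collects logs. FIG. 6 is a structural framework diagram of the agent in the system for distributed deployment of a Hadoop cluster of this embodiment.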
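FIG. 7 shows the deployment flow of the agents in the initial state in this embodiment. As shown in FIG. 7, the method for distributed deployment of a Hadoop cluster of this embodiment includes the following:
Initialize the deployment system
When the system starts, the monitor, collector, and agent of the system for distributed deployment of a Hadoop cluster are initialized, ready to receive the deployment template submitted by the user.
Deploy the agents
The task generator generates agent deployment tasks, and the task scheduler schedules their execution. After the agents are deployed, the collector periodically collects node resource information and reports it to the management system.
The user submits a Hadoop cluster deployment template
The user fills in the information on the Hadoop cluster to be deployed as the deployment template requires and submits the template.
Parse the Hadoop cluster deployment template
The monitor of the system for distributed deployment of a Hadoop cluster receives the deployment template submitted by the user, and the parser parses the Hadoop cluster deployment template and verifies its validity.
According to the deployment template submitted by the user and the resource information, the topology generator generates the Hadoop cluster network topology.
Generate Hadoop cluster component deployment tasks
According to the structure of the Hadoop cluster network topology, the task generator generates the component deployment tasks.
The task scheduler executes deployment tasks
The task scheduler takes the pending deployment tasks from the task list, together with the resource information of each node, and generates the sequence of tasks to be executed; it then takes the high-priority deployment tasks in turn and dispatches them to the corresponding agents.
Execute deployment tasks
After a host agent receives a deployment task, its deployer executes the task. The agent's monitor reports the execution progress of the deployment task in real time to the monitor of the deployment system, which notifies the task scheduler to continue scheduling tasks. The step "The task scheduler executes deployment tasks" is repeated until all pending deployment tasks have been executed.
This embodiment assigns the nodes of the Hadoop cluster components sensibly according to the characteristics of each component and the cluster resources, and during deployment it assigns deployment tasks dynamically according to the collected host load, thereby achieving one-click distributed deployment of a Hadoop cluster. The present disclosure effectively addresses the complexity of deploying large-scale Hadoop clusters, the long deployment time, and the heavy load on the deployment system.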
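FIG. 8 is a flowchart of the Hadoop cluster deployment method of this embodiment, and FIG. 9 is a sequence diagram of the same method. With reference to FIG. 8 and FIG. 9, this embodiment includes:
System initialization: when the system for distributed deployment of a Hadoop cluster starts, it must be initialized, which includes initializing the monitor, the collector, agent A1, and so on.
Agent deployment: in the first round, agent A1 executes the task of deploying agent A2; once agent A2 is deployed, it is initialized and started. Agents A1 and A2 then execute the tasks of deploying agents A3 and A4, and so on, until the agents on all hosts in the cluster are deployed (see FIG. 7).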
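The cascade described above, in which every agent that is already running deploys one more agent per round, can be simulated in a few lines. This is a sketch under assumptions: the function name `bootstrap_rounds`, the `fanout` parameter, and the rule that pending hosts are served in list order are illustrative and do not come from the source.

```python
def bootstrap_rounds(hosts, fanout=1):
    """Simulate the agent cascade; hosts[0] already runs agent A1.

    Returns one list per round, each holding (deployer, target) pairs.
    """
    ready = [hosts[0]]          # hosts whose agent is already running
    pending = list(hosts[1:])   # hosts still waiting for an agent
    rounds = []
    while pending:
        this_round = []
        for deployer in list(ready):
            for _ in range(fanout):
                if not pending:
                    break
                this_round.append((deployer, pending.pop(0)))
        # Newly deployed agents can act as deployers from the next round on.
        ready.extend(target for _, target in this_round)
        rounds.append(this_round)
    return rounds


schedule = bootstrap_rounds([f"A{i}" for i in range(1, 9)])
# Round 1: A1 deploys A2.
# Round 2: A1 and A2 deploy A3 and A4.
# Round 3: A1..A4 deploy A5..A8.
```

With one target per agent per round the number of running agents doubles each round, so eight hosts are covered in three rounds instead of the seven a single central deployer would need.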
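101. The user submits a deployment template: after the system for distributed deployment of a Hadoop cluster has been initialized, the user can submit a deployment template that meets the requirements. A valid deployment template contains at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, user names and passwords, and the log storage disk, data storage disk, and metadata storage disk.
102. After receiving the deployment template information, the template parser first checks the validity of the template. If the template does not meet the agreed requirements, deployment ends; if the template is valid, it is parsed, and the topology generator generates the Hadoop cluster network topology.
103. According to the node resources, the deployment rules of each Hadoop cluster component, and the deployment template information, the topology generator generates the Hadoop cluster network topology (e.g. S1). The Hadoop cluster component deployment rules include, but are not limited to: 1. assign the Master and Slave nodes of the Hadoop components according to hardware resources and host load; 2. calculate and assign the number of ZooKeeper nodes according to the number of nodes in the cluster; 3. calculate and assign the number of JournalNode nodes according to the number of HDFS nodes. A Hadoop component deployment task includes, but is not limited to, the following information: component name (e.g. HDFS), node name (e.g. NameNode), host network address, and task priority.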
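The source names these deployment rules but gives no formulas for them. The sketch below fills that gap with clearly labeled assumptions: quorum-based services such as ZooKeeper and the JournalNode group are sized to an odd number so that a majority always exists, the cap of 5 is arbitrary, and Master candidates are ranked by memory first and current load second. None of these choices is taken from the patent.

```python
def quorum_size(n_nodes, cap=5):
    """Hypothetical sizing rule: the largest odd number <= min(n_nodes, cap)."""
    if n_nodes < 1:
        raise ValueError("at least one node is required")
    size = min(n_nodes, cap)
    return size if size % 2 == 1 else size - 1


def assign_roles(hosts, n_masters=2):
    """Pick Master hosts by most memory, breaking ties by lower load.

    hosts: list of dicts with the assumed keys 'name', 'mem_gb', and 'load'.
    Returns (master_names, slave_names).
    """
    ranked = sorted(hosts, key=lambda h: (-h["mem_gb"], h["load"]))
    masters = [h["name"] for h in ranked[:n_masters]]
    slaves = [h["name"] for h in ranked[n_masters:]]
    return masters, slaves


cluster = [
    {"name": "h1", "mem_gb": 64, "load": 0.9},
    {"name": "h2", "mem_gb": 128, "load": 0.4},
    {"name": "h3", "mem_gb": 64, "load": 0.2},
    {"name": "h4", "mem_gb": 32, "load": 0.1},
]
masters, slaves = assign_roles(cluster)
zookeeper_nodes = quorum_size(len(cluster))
```

An even-sized quorum group tolerates no more failures than the next smaller odd size, which is why the sizing rule rounds down to an odd number.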
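104. Store the topology generated by the topology generator.
105. The deployment task generator generates deployment tasks according to the Hadoop cluster network topology.
106. Store the deployment task list generated by the deployment task generator.
107. The task scheduler scans the deployment task list, takes out the deployment tasks that have not yet been executed, calculates the load of the hosts in the cluster from the node resource information (metrics such as average load, memory utilization, disk I/O utilization, and network latency can be considered), and generates a sequence of deployment tasks ordered by priority (e.g. S4).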
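One way to read step 107 is that the scheduler combines the listed metrics into a single load score per host and then orders the pending tasks by task priority and target-host load. The sketch below does exactly that. The weights, the metric key names, the latency normalization, and the lower-number-first priority convention are all assumptions; the source only names the metrics that may be considered.

```python
def load_score(m):
    """Combine host metrics into one number; a higher score means a busier host.

    The weights are illustrative assumptions, not values from the source.
    """
    return (0.4 * m["load_avg"] / m["cpu_cores"]
            + 0.3 * m["mem_util"]
            + 0.2 * m["disk_io_util"]
            + 0.1 * min(m["net_latency_ms"] / 100.0, 1.0))


def order_tasks(tasks, host_metrics):
    """Sort pending tasks by priority, then by the load of the target host."""
    return sorted(
        tasks,
        key=lambda t: (t["priority"], load_score(host_metrics[t["host"]])),
    )


metrics = {
    "a": {"load_avg": 0.5, "cpu_cores": 4, "mem_util": 0.2,
          "disk_io_util": 0.1, "net_latency_ms": 1},
    "b": {"load_avg": 4.0, "cpu_cores": 4, "mem_util": 0.9,
          "disk_io_util": 0.8, "net_latency_ms": 50},
}
pending = [
    {"name": "T1", "priority": 1, "host": "b"},
    {"name": "T2", "priority": 1, "host": "a"},
    {"name": "T3", "priority": 0, "host": "b"},
]
ordered = order_tasks(pending, metrics)
```

Here T3 comes first because of its priority, and T2 precedes T1 because host `a` is far less loaded than host `b`.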
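108. The task scheduler selects high-priority deployment tasks in turn and dispatches each one to the agent of the corresponding host. When Hadoop component deployment tasks are executed for the first time, agent A1 deploys one Hadoop cluster component deployment task to agent A2; the monitor of agent A1 monitors the execution of the deployment task and reports it to the monitor of the deployment system (e.g. S10). After the monitor receives the completion status of the deployment task, the task scheduler regenerates the task sequence according to the task list and the resource information (e.g. S5) and selects high-priority tasks T3 and T4, which agents A1 and A2 deploy to agents A3 and A4, and so on (e.g. S11 to S14). Ideally, at the t-th time step (t > 0), 2^(t-1) agents in the whole cluster are executing Hadoop component deployment tasks. Each agent can also start multiple threads and deploy Hadoop component tasks to several (e.g. 2) agents concurrently; in that case, ideally, at the t-th time step (t > 0), 3^(t-1) agents in the whole Hadoop cluster are executing Hadoop component deployment tasks.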
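The two ideal-case counts quoted in step 108 are instances of one expression: if every active agent hands tasks to k further agents per time step, the number of active agents at step t is (k + 1)^(t - 1), which gives 2^(t-1) for k = 1 and 3^(t-1) for k = 2. The helper names below are hypothetical, and the model ignores network limits and uneven task durations.

```python
def active_agents(t, fanout=1):
    """Agents executing deployment tasks at time step t (t >= 1), ideal case."""
    if t < 1:
        raise ValueError("t must be at least 1")
    return (fanout + 1) ** (t - 1)


def rounds_needed(n_hosts, fanout=1):
    """Smallest number of time steps after which n_hosts agents are active."""
    rounds, agents = 0, 1
    while agents < n_hosts:
        agents *= fanout + 1
        rounds += 1
    return rounds
```

For example, `rounds_needed(100, fanout=2)` is 5, because 3^4 = 81 agents are not enough for 100 hosts while 3^5 = 243 are.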
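109. Agent A1, which is co-located with the management system for distributed deployment of the Hadoop cluster.
110. The agents deployed on the host nodes within the Hadoop cluster.
Configuration generation: parameter configuration tasks generate the configuration of each Hadoop cluster component. The scheduler collects the deployment information of every component in the Hadoop cluster (for example, the host names of the nodes where Master and Slave reside, and the log storage disk, data storage disk, and metadata storage disk information) and dispatches it, together with the parameter configuration tasks, to the parameter configurator in the agent on each host. Once all parameter configuration tasks in the cluster have been executed, deployment of all Hadoop cluster components is complete.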
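As an illustration of what a parameter configurator might produce, the sketch below renders a dictionary of settings in the `<configuration>`/`<property>` XML layout that Hadoop site files such as `hdfs-site.xml` use. The property names `dfs.replication`, `dfs.namenode.name.dir`, and `dfs.datanode.data.dir` are standard HDFS settings; the function name and the directory paths are hypothetical, and the source does not describe how its configurator writes files.

```python
import xml.etree.ElementTree as ET


def render_hadoop_conf(props):
    """Render a dict of properties as a Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Values of the kind the deployment template supplies: the replica count
# and the metadata and data storage disks (paths are illustrative).
hdfs_site = render_hadoop_conf({
    "dfs.replication": 3,
    "dfs.namenode.name.dir": "/data/meta/namenode",
    "dfs.datanode.data.dir": "/data/d1,/data/d2",
})
```

Generating the file from collected deployment information, instead of editing it by hand on each host, is what lets the same settings reach every agent consistently.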
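201. The collector in the agent periodically collects the hardware resources and running state information of its own host and reports them to the collector in the deployment system, which stores the node resources. The hardware resources and running state information include, but are not limited to, the following: operating system information, host name, CPU information, memory information, disks, process information, CPU utilization, memory utilization, disk I/O utilization, network information, and average I/O wait time.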
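A periodic collector of this kind can be sketched with the Python standard library alone. The example gathers only a subset of the listed items (host name, operating system, core count, disk usage, and load average); memory utilization, disk I/O utilization, and per-process information would need platform-specific sources and are left out. The function names, the dictionary keys, and the `report` callback are assumptions for illustration.

```python
import os
import platform
import shutil
import socket
import time


def collect_host_info(disk_path=os.sep):
    """Gather a snapshot of host information using only the standard library."""
    usage = shutil.disk_usage(disk_path)
    info = {
        "hostname": socket.gethostname(),
        "os": platform.platform(),
        "cpu_cores": os.cpu_count(),
        "disk_total_bytes": usage.total,
        "disk_used_ratio": usage.used / usage.total if usage.total else 0.0,
        "collected_at": time.time(),
    }
    try:
        # 1-minute load average; only available on Unix-like systems.
        info["load_avg_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        info["load_avg_1m"] = None
    return info


def run_collector(report, interval_s=60.0, rounds=1):
    """Call report(info) once per interval, for the given number of rounds."""
    for i in range(rounds):
        report(collect_host_info())
        if i + 1 < rounds:
            time.sleep(interval_s)
```

In a real agent, `report` would send the snapshot to the deployment system's collector; passing `samples.append` for a list `samples` is enough to try the loop locally.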
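202. Store the resource information of each node (including host and Hadoop component information) gathered by the monitor and the collector.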
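Embodiment 4
Embodiments of the present disclosure further provide a storage medium, for example a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the above method.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: receive template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
S2: collect, according to the host information, parameter information of at least one host of the Hadoop cluster, where each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task;
S3: deploy tasks to the at least one component according to the task information and the parameter information.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, receives template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, collects parameter information of at least one host of the Hadoop cluster according to the host information, where each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task;
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, deploys tasks to the at least one component according to the task information and the parameter information.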
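An embodiment of the present disclosure further provides an electronic device, whose structure is shown schematically in FIG. 10. The electronic device includes:
at least one processor 100 (FIG. 10 shows one processor 100 as an example) and a memory 101; it may further include a communications interface 102 and a bus 103. The processor 100, the communications interface 102, and the memory 101 can communicate with one another through the bus 103. The communications interface 102 can be used for information transmission. The processor 100 can call the logic instructions in the memory 101 to perform the methods of the above embodiments.
In addition, when the logic instructions in the memory 101 are implemented in the form of a software functional unit and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium.
The memory 101, as a computer-readable storage medium, can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the method in the embodiments of the present disclosure. By running the software programs, instructions, and modules stored in the memory 101, the processor 100 executes functional applications and data processing, that is, implements the method for distributed deployment of a Hadoop cluster in the above method embodiments.
The memory 101 may include a program storage area and a data storage area. The program storage area may store an operating system and an application required for at least one function; the data storage area may store data created through use of the terminal device, and the like. In addition, the memory 101 may include high-speed random access memory and may also include non-volatile memory.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes at least one instruction for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The storage medium may be a non-transitory storage medium, including any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc; it may also be a transitory storage medium.
Optionally, for examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; they are not repeated here.
Those skilled in the art should understand that the modules or steps of the present disclosure described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
The above are merely embodiments of the present disclosure and are not intended to limit it. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the scope of the embodiments of the present disclosure shall fall within the protection scope of the present disclosure.
Industrial Applicability
The method and apparatus for distributed deployment of a Hadoop cluster provided by the present application solve the problem in the related art that manual deployment of a Hadoop cluster is complicated to operate and takes a long time.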

Claims (13)

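  1. A method for distributed deployment of a Hadoop cluster, comprising:
    collecting, according to host information of the Hadoop cluster, parameter information of at least one host of the Hadoop cluster, wherein each host is configured to deploy at least one component, and the component is deployed by an agent and configured to perform a corresponding task;
    deploying tasks to at least one of the components according to task information of the Hadoop cluster and the parameter information.
  2. The method according to claim 1, further comprising:
    receiving template information for deploying the Hadoop cluster, wherein the template information indicates the task information and the host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete.
  3. The method according to claim 1 or 2, wherein the parameter information comprises at least one of the following: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory usage, host disk I/O usage, host network latency, host average I/O wait time, host disk information, and process information of the components on the host.
  4. The method according to claim 1 or 2, wherein deploying tasks to at least one component in the Hadoop cluster according to the task information and the parameter information comprises:
    generating a deployment task list according to the task information and the parameter information, wherein the deployment task list comprises the task information, the parameter information required to execute each task, and the priority of each task;
    selecting the highest-priority task from the deployment task list and dispatching it to the corresponding component.
  5. The method according to claim 4, wherein the priority is related to attributes of the task and/or the parameter information for executing the task.
  6. The method according to claim 1, wherein, after deploying tasks to at least one of the components according to the template information and the parameter information, the method further comprises:
    monitoring the task execution progress and/or log information of the at least one component.
  7. The method according to claim 2, wherein the template information comprises at least one of the following: the number of hosts in the Hadoop cluster, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.
  8. The method according to claim 2, wherein, after receiving the template information for deploying the Hadoop cluster, the method further comprises:
    parsing the template information and verifying its validity.
  9. An apparatus for distributed deployment of a Hadoop cluster, comprising:
    a collection module configured to collect, according to host information of the Hadoop cluster, parameter information of at least one host of the Hadoop cluster, wherein each host comprises at least one component, and the component is deployed by an agent and configured to perform a corresponding task;
    a deployment module configured to deploy tasks to at least one of the components according to task information of the Hadoop cluster and the parameter information.
  10. The apparatus according to claim 9, further comprising:
    a receiving module configured to receive template information for deploying the Hadoop cluster, wherein the template information indicates the task information and the host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete.
  11. The apparatus according to claim 9 or 10, wherein the deployment module further comprises:
    a generating unit configured to generate a deployment task list according to the task information and the parameter information, wherein the deployment task list comprises the task information, the parameter information required to execute each task, and the priority of each task;
    a selecting unit configured to select the highest-priority task from the deployment task list and dispatch it to the corresponding component.
  12. The apparatus according to claim 9, further comprising:
    a monitoring module configured to monitor the task execution progress and/or log information of the at least one component after the deployment module deploys tasks to at least one of the components according to the template information and the parameter information.
  13. A non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 8.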
PCT/CN2017/083207 2016-06-03 2017-05-05 分布式部署Hadoop集群的方法及装置 WO2017206667A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610395969.2 2016-06-03
CN201610395969.2A CN107463582B (zh) 2016-06-03 2016-06-03 分布式部署Hadoop集群的方法及装置

Publications (1)

Publication Number Publication Date
WO2017206667A1 true WO2017206667A1 (zh) 2017-12-07

Family

ID=60479660

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/083207 WO2017206667A1 (zh) 2016-06-03 2017-05-05 分布式部署Hadoop集群的方法及装置

Country Status (2)

Country Link
CN (1) CN107463582B (zh)
WO (1) WO2017206667A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110389766A (zh) * 2019-06-21 2019-10-29 深圳市汇川技术股份有限公司 HBase容器集群部署方法、系统、设备及计算机可读存储介质
CN111061503A (zh) * 2018-10-16 2020-04-24 航天信息股份有限公司 集群系统的配置方法和集群系统
CN111581042A (zh) * 2019-02-15 2020-08-25 网宿科技股份有限公司 一种集群部署方法、部署平台及待部署服务器
CN111754191A (zh) * 2020-06-08 2020-10-09 中国建设银行股份有限公司 基于云平台的自动变更方法及相关设备
CN112732410A (zh) * 2021-01-21 2021-04-30 青岛海尔科技有限公司 服务节点的管理方法及装置、存储介质及电子装置
CN113132383A (zh) * 2021-04-19 2021-07-16 烟台中科网络技术研究所 一种网络数据采集方法及系统
CN113127016A (zh) * 2021-04-30 2021-07-16 平安国际智慧城市科技股份有限公司 Hdp大数据平台的自动化部署方法、装置、设备及介质
CN113886036A (zh) * 2021-09-13 2022-01-04 天翼数字生活科技有限公司 用于优化分布式系统集群配置的方法和系统
CN114816444A (zh) * 2021-01-28 2022-07-29 网联清算有限公司 一种监控程序部署方法、装置及电子设备、存储介质
CN115499304A (zh) * 2022-07-29 2022-12-20 天翼云科技有限公司 一种分布式存储的自动化部署方法、装置、设备及产品

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228796A (zh) * 2017-12-29 2018-06-29 百度在线网络技术(北京)有限公司 Mpp数据库的管理方法、装置、系统、服务器及介质
CN109284272A (zh) * 2018-09-07 2019-01-29 郑州云海信息技术有限公司 一种分布式文件系统的部署方法、装置及设备
CN109508196A (zh) * 2018-10-15 2019-03-22 广州云新信息技术有限公司 基于x86服务器的自动部署系统及方法
CN110262807B (zh) * 2019-06-20 2023-12-26 北京百度网讯科技有限公司 集群创建进度日志采集系统、方法和装置
CN110457114B (zh) * 2019-07-24 2020-11-27 杭州数梦工场科技有限公司 应用集群部署方法及装置
CN111866013B (zh) * 2020-07-29 2023-04-18 杭州安恒信息技术股份有限公司 一种云安全产品管理平台部署方法、装置、设备及介质
CN112363818B (zh) * 2020-11-30 2024-06-07 杭州玳数科技有限公司 一种在Yarn调度下实现HadoopMR无关性的方法
CN117742931A (zh) * 2022-09-15 2024-03-22 华为云计算技术有限公司 大数据集群部署方案的确定方法、装置、集群和存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130024496A1 (en) * 2011-07-21 2013-01-24 Yahoo! Inc Method and system for building an elastic cloud web server farm
US20130031542A1 (en) * 2011-07-28 2013-01-31 Yahoo! Inc. Method and system for distributed application stack deployment
CN103064742A (zh) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 一种hadoop集群的自动部署系统及方法
US20130167139A1 (en) * 2011-12-21 2013-06-27 Yahoo! Inc. Method and system for distributed application stack test certification
CN104317610A (zh) * 2014-10-11 2015-01-28 福建新大陆软件工程有限公司 一种hadoop平台自动安装部署的方法及装置
CN104734892A (zh) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 大数据处理系统Hadoop在云平台OpenStack上自动部署系统
CN105302641A (zh) * 2014-06-04 2016-02-03 杭州海康威视数字技术股份有限公司 虚拟化集群中进行节点调度的方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152393B (zh) * 2013-02-05 2015-08-05 北京邮电大学 一种云计算的计费方法和计费系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130024496A1 (en) * 2011-07-21 2013-01-24 Yahoo! Inc Method and system for building an elastic cloud web server farm
US20130031542A1 (en) * 2011-07-28 2013-01-31 Yahoo! Inc. Method and system for distributed application stack deployment
US20130167139A1 (en) * 2011-12-21 2013-06-27 Yahoo! Inc. Method and system for distributed application stack test certification
CN103064742A (zh) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 一种hadoop集群的自动部署系统及方法
CN105302641A (zh) * 2014-06-04 2016-02-03 杭州海康威视数字技术股份有限公司 虚拟化集群中进行节点调度的方法及装置
CN104317610A (zh) * 2014-10-11 2015-01-28 福建新大陆软件工程有限公司 一种hadoop平台自动安装部署的方法及装置
CN104734892A (zh) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 大数据处理系统Hadoop在云平台OpenStack上自动部署系统

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061503A (zh) * 2018-10-16 2020-04-24 航天信息股份有限公司 集群系统的配置方法和集群系统
CN111061503B (zh) * 2018-10-16 2023-08-18 航天信息股份有限公司 集群系统的配置方法和集群系统
CN111581042A (zh) * 2019-02-15 2020-08-25 网宿科技股份有限公司 一种集群部署方法、部署平台及待部署服务器
CN111581042B (zh) * 2019-02-15 2023-09-12 网宿科技股份有限公司 一种集群部署方法、部署平台及待部署服务器
CN110389766B (zh) * 2019-06-21 2022-12-27 深圳市汇川技术股份有限公司 HBase容器集群部署方法、系统、设备及计算机可读存储介质
CN110389766A (zh) * 2019-06-21 2019-10-29 深圳市汇川技术股份有限公司 HBase容器集群部署方法、系统、设备及计算机可读存储介质
CN111754191A (zh) * 2020-06-08 2020-10-09 中国建设银行股份有限公司 基于云平台的自动变更方法及相关设备
CN112732410A (zh) * 2021-01-21 2021-04-30 青岛海尔科技有限公司 服务节点的管理方法及装置、存储介质及电子装置
CN114816444A (zh) * 2021-01-28 2022-07-29 网联清算有限公司 一种监控程序部署方法、装置及电子设备、存储介质
CN113132383B (zh) * 2021-04-19 2022-03-25 烟台中科网络技术研究所 一种网络数据采集方法及系统
CN113132383A (zh) * 2021-04-19 2021-07-16 烟台中科网络技术研究所 一种网络数据采集方法及系统
CN113127016A (zh) * 2021-04-30 2021-07-16 平安国际智慧城市科技股份有限公司 Hdp大数据平台的自动化部署方法、装置、设备及介质
CN113886036A (zh) * 2021-09-13 2022-01-04 天翼数字生活科技有限公司 用于优化分布式系统集群配置的方法和系统
CN113886036B (zh) * 2021-09-13 2024-04-19 天翼数字生活科技有限公司 用于优化分布式系统集群配置的方法和系统
CN115499304A (zh) * 2022-07-29 2022-12-20 天翼云科技有限公司 一种分布式存储的自动化部署方法、装置、设备及产品
CN115499304B (zh) * 2022-07-29 2024-03-08 天翼云科技有限公司 一种分布式存储的自动化部署方法、装置、设备及产品

Also Published As

Publication number Publication date
CN107463582B (zh) 2021-11-12
CN107463582A (zh) 2017-12-12

Similar Documents

Publication Publication Date Title
WO2017206667A1 (zh) 分布式部署Hadoop集群的方法及装置
US11704144B2 (en) Creating virtual machine groups based on request
CN108924217B (zh) 一种分布式云系统自动化部署方法
CN108809722B (zh) 一种部署Kubernetes集群的方法、装置和存储介质
JP6549787B2 (ja) ネットワークサービスをデプロイするための方法及び装置
CN107145380B (zh) 虚拟资源编排方法及装置
CN103064742B (zh) 一种hadoop集群的自动部署系统及方法
US20180302335A1 (en) Orchestrating computing resources between different computing environments
CN104144073B (zh) 主从装置环境的部署方法与主从装置环境的部署系统
US20140245319A1 (en) Method for enabling an application to run on a cloud computing system
CN104580519A (zh) 一种快速部署openstack云计算平台的方法
KR102419704B1 (ko) 보안 보호 방법 및 장치
CN111741134A (zh) 一种网络靶场大规模场景中虚拟机快速构建系统与方法
CN113742031A (zh) 节点状态信息获取方法、装置、电子设备及可读存储介质
CN112099917B (zh) 调控系统容器化应用运行管理方法、系统、设备及介质
Soner et al. Integer programming based heterogeneous cpu–gpu cluster schedulers for slurm resource manager
Lingayat et al. Integration of linux containers in openstack: An introspection
CN115098354A (zh) 一种搭建高性能云仿真设计平台的方法
CN103109515B (zh) 一种业务部署的方法及装置
Li et al. Joint scheduling and source selection for background traffic in erasure-coded storage
CN115473780B (zh) 网络靶场分布式流量生成方法、装置
CN114827177B (zh) 一种分布式文件系统的部署方法、装置及电子设备
CN110782040A (zh) 一种pytorch任务训练方法、装置、设备及介质
CN114327770A (zh) 容器集群管理系统及方法
CN109032786A (zh) Jenkins持续集成集群、APP打包方法和服务器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17805612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17805612

Country of ref document: EP

Kind code of ref document: A1