WO2017113835A1 - Installation tool for a large database system - Google Patents

Installation tool for a large database system

Info

Publication number
WO2017113835A1
WO2017113835A1 (PCT/CN2016/096756)
Authority
WO
WIPO (PCT)
Prior art keywords
configuration
database system
large database
installation
installation tool
Prior art date
Application number
PCT/CN2016/096756
Other languages
English (en)
French (fr)
Inventor
朱天骏
冯骏
Original Assignee
深圳市华讯方舟软件技术有限公司
华讯方舟科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市华讯方舟软件技术有限公司 and 华讯方舟科技有限公司
Publication of WO2017113835A1 publication Critical patent/WO2017113835A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/61Installation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating

Definitions

  • the invention relates to an installation tool for a database system, and in particular to an installation tool for a large database system.
  • an Apache large database system based on python web provides a distributed, scalable, high-capacity, high-speed-access, high-speed-query database platform for the massive data of the network era. Users can develop distributed programs without knowing the underlying details of the distribution, making full use of the power of the cluster for high-speed computation and storage.
  • Hadoop consists of the distributed file system HDFS and the distributed batch computing framework MapReduce.
  • its computational model takes the Map and Reduce functions of functional programming as its prototype.
  • from this prototype a computing model suited to multi-machine parallel processing was developed, so that the overall processing capacity of the cluster scales out horizontally, removing the bottleneck of single-machine data processing.
  • Spark supports a variety of data storage methods including HDFS, which makes Spark a more general data analysis and processing system. On top of Spark, toolkits for various computing fields have been developed, such as the streaming computation engine Spark Streaming, the machine learning package MLlib, SparkSQL for SQL queries, and GraphX for graph computation.
  • Hadoop has matured over the years and its HDFS has become the de facto standard for big data file systems.
  • Hadoop and Spark are the cornerstones of today's big data analytics processing platform.
  • Ambari is a Hadoop distributed cluster configuration management tool and an open source project led by Hortonworks. It has become an Apache Foundation incubator project and a powerful assistant in Hadoop operation and maintenance, attracting the attention of industry and academia.
  • Ambari does not adopt a new idea or architecture, nor does it accomplish a new software revolution. Instead, it makes full use of existing excellent open source software and combines them skilfully to provide cluster service management, monitoring and display capabilities in a distributed environment. This open source software includes:
  • on the web side, ember.js as the front-end MVC framework together with NodeJS-related tools, handlebars.js as the page rendering engine, and the Bootstrap framework for CSS/HTML.
  • the Ambari architecture uses the Server/Client model and consists of two main components: ambari-agent and ambari-server.
  • Ambari relies on other mature tools: its ambari-server depends on python, while ambari-agent also depends on tools such as ruby, puppet and facter, and Ambari depends on the monitoring tools nagios and ganglia for monitoring cluster conditions. Among them:
  • Puppet is a distributed cluster configuration management tool, also in the typical Server/Client mode, which can centrally manage the installation, configuration and deployment of distributed clusters; its main language is ruby.
  • Facter is a node resource collection library written in python, used to collect system information about nodes, such as OS information and host information. Since ambari-agent is mainly written in python, facter can collect node information well.
  • Ambari depends on many open source packages, including deployment tools and message queues, and requires a client to be installed on every node. Installing these tools in turn requires configuring software source addresses, so offline installation is impossible.
  • a series of big data computing and query tools, such as Spark and Hive, have been developed around the distributed data processing tool Hadoop, but these tools provide only the most basic computing and storage functions. Hadoop and Spark rely on scripts to perform operations; the operating system must be configured before the cluster is started, and the steps are cumbersome and hard to maintain.
  • Configuration file management is scattered and needs to be filled in manually.
  • Each component of the big data platform has 2 or 3 configuration files to be configured, and the management complexity of the configuration file increases with the increase of components in the big data platform.
  • the big data platform uses xml as the standard configuration file.
  • manually rewriting the configuration easily introduces file format errors, causing processes to fail to read the file and the startup to fail.
  • traditional shell deployment scripts execute all commands of the installation process in sequence. When a command fails, the script cannot exit immediately and report the error, which hinders timely exposure of the fault, prolongs the installation process and reduces installation efficiency.
  • the technical problem to be solved by the present invention is to provide an installation tool for a large database system. The tool allows all options and parameters to be set before the large database system is installed; most of them can be set with the mouse alone, which greatly reduces keyboard input, greatly reduces the chance of mis-set options and parameters, and improves setup efficiency. Once the options and parameters are set, all components of the large database system are installed automatically and continuously, achieving unattended installation of the big data platform and greatly improving the installation efficiency of the large database system.
  • the present invention provides an installation tool for a large database system, comprising an installation panel of the large database system, on which are distributed label controls for the names of all options that the large database system requires,
  • text box controls for the user to type in large database system parameter values and text box, combo box, list box, check box and radio button controls for the user to select large database system parameter values, and a command button control for running the installer of the large database system;
  • the parameter values of the various options of the large database system are associated with the installer.
  • the installation panel is a web page.
  • the webpage is provided with a tab control containing eight tabs, the eight tab names being common configuration, Hadoop configuration, Hive configuration, Spark configuration, Zookeeper configuration, HA configuration, monitoring suite configuration and configuration preview.
  • the large database system includes a Hadoop component, a Hive component, a Spark component and a Zookeeper component;
  • the options and parameter values of the Hadoop component are set in the window whose tab name is Hadoop configuration,
  • the options and parameter values of the Hive component are set in the window whose tab name is Hive configuration,
  • the options and parameter values of the Spark component are set in the window whose tab name is Spark configuration,
  • the options and parameter values of the Zookeeper component are set in the window whose tab name is Zookeeper configuration.
  • the option names in the common configuration include the root password, user, user password, hosts and slaves;
  • the option names in the Hadoop configuration include hadoop_env, yarn_env, core_site, hdfs_site, mapred_site, and yarn_site;
  • the option names in the Spark configuration include spark_env and spark_conf.
  • the option names in the Hive configuration include hive_env and hive_site;
  • the option name in the Zookeeper configuration includes zoo.cfg;
  • the option names in the HA configuration include a NameNode list, an RM list, a ZK list and a JN list;
  • the option name in the monitoring suite configuration includes the host on which to start the ganglia process;
  • the option names in the configuration preview include a configuration file download button and a start-installation button.
  • a configuration file button control is also provided in the window whose tab name is the Spark configuration;
  • a download configuration file button control is also provided in the window whose tab name is the configuration preview.
  • the parameter values of the various options of the large database system are associated with the installer via an association module.
  • the installation tool includes an installation package, and the installation package is installed on one node by file transfer.
  • this node is the master node, and all components of the large database system are sent to each slave node via the SSH file transfer tool scp.
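The master-to-slave distribution step can be sketched as follows. This is a minimal illustration, not the patent's code: the function names (`build_scp_command`, `distribute`), the user name and the paths are all assumptions introduced for the example.

```python
# Sketch of pushing the installation package from the master node to each
# slave node with scp, as the text describes. All names are illustrative;
# the patent does not disclose the actual implementation.
import subprocess


def build_scp_command(package: str, user: str, slave: str, dest: str) -> list:
    """Build the scp argument list for one slave node."""
    return ["scp", "-r", package, f"{user}@{slave}:{dest}"]


def distribute(package: str, user: str, slaves: list, dest: str,
               run=subprocess.run):
    """Send the package to every slave; `run` is injectable for testing."""
    for slave in slaves:
        cmd = build_scp_command(package, user, slave, dest)
        run(cmd, check=True)  # raise immediately if a transfer fails
```

Injecting `run` keeps the sketch testable without a real cluster; in production the default `subprocess.run` executes the actual `scp` transfers.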
  • the installation tool also includes a detection module.
  • compared with the prior art, the installation tool of the large database system of the present invention has the following advantageous effects.
  • the installation tool allows all options and parameters to be set before the large database system is installed. Most of them can be set with the mouse alone, which greatly reduces keyboard input, greatly reduces the chance of mis-set options and parameters, and improves setup efficiency. Once the options and parameters are set, all components of the large database system are installed automatically and continuously, achieving unattended installation of the big data platform and greatly improving the installation efficiency of large database systems.
  • the clustered machine does not need to install the deployment tool in advance. It only needs to send the installation package to any node of the cluster. After the installation service is started by script, the subsequent installation process can be completed through the browser.
  • Figure 1 shows the overall architecture of the installation method.
  • Figure 2 is a flow chart of the use of the installation interface.
  • Figure 3 is the basic configuration (user name, password, ip address).
  • Figure 4 shows the configuration of the Hadoop component.
  • Figure 5 shows the configuration of the Spark component.
  • Figure 6 is the configuration of the Hive component.
  • Figure 7 shows the configuration of the Zookeeper component.
  • Figure 8 is the HA configuration.
  • Figure 9 is the monitoring kit configuration.
  • Figure 10 is a configuration preview.
  • Embodiment 1: as shown in Figures 1 to 10,
  • an installation tool for a large database system comprises an installation panel of the large database system, on which are distributed label controls for the names of all options that the large database system requires,
  • text box controls for the user to type in large database system parameter values and text box, combo box, list box, check box and radio button controls for the user to select large database system parameter values, and a command button control for running the installer of the large database system;
  • the parameter values of the various options of the large database system are associated with the installer.
  • Embodiment 2: as shown in Figures 3 to 10,
  • the installation panel is a web page.
  • Embodiment 3: as shown in Figures 3 to 10,
  • the webpage is provided with a tab control containing eight tabs, the eight tab names being common configuration, Hadoop configuration, Hive configuration, Spark configuration, Zookeeper configuration, HA configuration, monitoring suite configuration and configuration preview.
  • Embodiment 4: as shown in Figures 4 to 7,
  • the large database system includes a Hadoop component, a Hive component, a Spark component and a Zookeeper component;
  • the Hadoop component is for distributed data storage and batch processing;
  • the Hive component is for SQL queries;
  • the Spark component is for high-speed queries and machine learning algorithms;
  • the Zookeeper component is for providing a distributed consistency algorithm;
  • the options and parameter values of the Hadoop component are set in the window whose tab name is Hadoop configuration,
  • the options and parameter values of the Hive component are set in the window whose tab name is Hive configuration,
  • the options and parameter values of the Spark component are set in the window whose tab name is Spark configuration,
  • the options and parameter values of the Zookeeper component are set in the window whose tab name is Zookeeper configuration.
  • Embodiment 5: as shown in Figures 3 to 10,
  • the option names in the common configuration include the root password, user, user password, hosts and slaves;
  • the root password is used to log in to all machines and execute system calls;
  • the user is the user that runs the big data software after installation is completed;
  • the user password is the password of that user;
  • the hosts give the ip-host mapping within the cluster;
  • the slaves are the host names of all slave nodes;
  • the option names in the Hadoop configuration include hadoop_env, yarn_env, core_site, hdfs_site, mapred_site, and yarn_site;
  • the hadoop_env is the startup configuration file for hadoop;
  • the yarn_env is the startup configuration file for yarn;
  • the core_site is used for the hadoop running configuration;
  • the hdfs_site is used for the hadoop running configuration;
  • the mapred_site is used for the hadoop running configuration;
  • the yarn_site is used for the hadoop running configuration;
  • the option names in the Spark configuration include spark_env and spark_conf.
  • the spark_env is used for the spark startup configuration
  • the spark_conf is used for the spark running configuration
  • the option names in the Hive configuration include hive_env and hive_site;
  • the hive_env is used for the hive startup configuration
  • the hive_site is used for the hive running configuration
  • the option name in the Zookeeper configuration includes zoo.cfg;
  • the zoo.cfg is for the zookeeper configuration
  • the option names in the HA configuration include a NameNode list, an RM list, a ZK list and a JN list;
  • the NameNode list specifies the hosts on which to start the NameNode process;
  • the RM list specifies the hosts on which to start the ResourceManager process;
  • the ZK list specifies the hosts on which to start the Zookeeper process;
  • the JN list specifies the hosts on which to start the JournalNode process;
  • the option name in the monitoring suite configuration includes the host on which to start the ganglia process;
  • the option names in the configuration preview include a configuration file download button and a start-installation button.
  • a configuration file button control is further provided in the window whose tab name is the Spark configuration;
  • the configuration file button control is used to import a pre-configured configuration file and automatically fill in all parameters;
  • the download configuration file button control is used to export the configured parameters for use in later installations.
  • the parameter values of the various options of the large database system are associated with the installer through an association module, whose function is implemented in code that is reproduced in the original only as figures.
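The patent's actual association-module code is not reproduced in this text, so the following is only a minimal sketch of what such a module could look like: it takes the option values collected from the web panel and writes them into per-component configuration files. The function name `associate` and the file layout are assumptions introduced for illustration.

```python
# Illustrative sketch of an "association module": write the option values
# gathered from the installation panel into one configuration file per
# component. Names and file format are assumptions, not the patent's code.
import os


def associate(options: dict, out_dir: str) -> list:
    """Write each component's {name: value} options to <component>.conf.

    Returns the list of file paths written, so the installer can pick
    them up in a later step."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for component, params in options.items():
        path = os.path.join(out_dir, f"{component}.conf")
        with open(path, "w") as f:
            for name, value in params.items():
                f.write(f"{name}={value}\n")
        written.append(path)
    return written
```

A later step of the installer would read these files back when generating the platform's own configuration.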
  • the installation tool includes an installation package, and the installation package is installed on one node by using the SSH file transfer tool scp; this node is the master node, and the large database system software is sent to each slave node via scp.
  • the installation tool further includes a detection module, which checks during installation whether each step's command executed correctly; if a command fails, subsequent installation steps are stopped and the error is displayed directly on the page.
  • the function of the detection module is implemented in code that is reproduced in the original only as a figure.
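Since the detection-module listing appears in the original only as a figure, here is a hedged sketch of the behaviour the text describes: run each installation step as a shell command, check its result, and stop with an error report at the first failure. The function name `run_steps` and the return shape are assumptions.

```python
# Illustrative sketch of the detection module: execute installation steps
# in order and stop at the first failure, so faults surface immediately
# instead of being swallowed as in a plain sequential shell script.
import subprocess


def run_steps(steps):
    """Run shell-command steps in order; stop at the first failure.

    Returns (ok, steps_completed_or_failed_index, error_message). The
    error message is what the tool would display on the web page."""
    for i, step in enumerate(steps):
        result = subprocess.run(step, shell=True,
                                capture_output=True, text=True)
        if result.returncode != 0:
            # stop all subsequent steps and surface the error
            return False, i, result.stderr.strip() or f"step {i} failed"
    return True, len(steps), ""
```

This fail-fast loop is exactly what the background section says traditional sequential shell scripts lack.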
  • the installation process automatically configures passwordless login between the cluster machines and verifies that the passwords are correct. If a password is wrong, the installer informs the operator through the browser that the password was filled in incorrectly and then exits; the operator can reconfigure after entering the correct settings.
  • the installation process takes the currently running machine as the master node of the cluster and writes the configuration options into the configuration files of the big data platform software.
  • the configuration files to be filled in are hadoop-env.sh, spark-env.sh, core-site.xml, hdfs-site.xml and yarn-site.xml.
  • each step is a shell command.
  • the official configuration file format used by the big data platform is xml; a configuration file has the following form:
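The example file, as given in the description elsewhere in this document, is:

```xml
<configuration>
  ...
  <property>
    <name>dfs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
  ...
</configuration>
```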
  • the installation service uses a python-based browser/server (BS) architecture with a user interface and is easy to use.
  • the instant feedback mechanism during installation reports problems to the user as quickly as possible, shortening overall troubleshooting and deployment time.


Abstract

Disclosed is an installation tool for a large database system, comprising an installation panel of the large database system. Distributed on the installation panel are label controls for the names of all options that the large database system requires, text box controls for the user to type in large database system parameter values, text box, combo box, list box, check box and radio button controls for the user to select large database system parameter values, and a command button control for running the installer of the large database system; the parameter values of the individual options of the large database system are associated with the installer. The tool allows all options and parameters to be set before the large database system is installed; most of them can be set with the mouse alone, which greatly reduces keyboard input, greatly reduces the chance of mis-set options and parameters, and improves setup efficiency.

Description

Installation tool for a large database system
Technical field
The present invention relates to an installation tool for a database system, and in particular to an installation tool for a large database system.
Background art
With the development of computer technology, single-machine database systems emerged and were continuously developed and improved. However, with the growth of the Internet, the mobile Internet and the Internet of Things, we have entered an era of massive data, and analysing this massive data has become a very important and urgent need. Traditional single-machine database systems can no longer meet the requirements of managing massive data in the network era; storage capacity, access speed and query speed are all severely limited. An Apache large database system based on python web provides a distributed, scalable, high-capacity, high-speed-access, high-speed-query database platform for the massive data of the network era. Users can develop distributed programs without knowing the underlying details of the distribution, making full use of the power of the cluster for high-speed computation and storage. Hadoop consists of the distributed file system HDFS and the distributed batch computing framework MapReduce. Its computational model takes the Map and Reduce functions of functional programming as its prototype and develops a computing model suited to multi-machine parallel processing, so that the overall processing capacity of the cluster scales out horizontally, removing the bottleneck of single-machine data processing.
Spark supports a variety of data storage methods including HDFS, which makes Spark a more general data analysis and processing system. On top of Spark, toolkits for various computing fields have been developed, such as the streaming computation engine Spark Streaming, the machine learning package MLlib, SparkSQL for SQL queries, and GraphX for graph computation.
Hadoop has matured over years of development, and its HDFS has become the de facto standard for big data file systems. Hadoop and Spark are the cornerstones of today's big data analysis and processing platforms.
Ambari is a Hadoop distributed cluster configuration management tool and an open source project led by Hortonworks. It has become an Apache Foundation incubator project and a powerful assistant in Hadoop operation and maintenance, attracting the attention of industry and academia.
Ambari does not adopt a new idea or architecture, nor does it accomplish a new software revolution. Instead, it makes full use of existing excellent open source software and combines them skilfully to provide cluster service management, monitoring and display capabilities in a distributed environment. This open source software includes:
On the agent side, puppet is used to manage nodes;
On the web side, ember.js is used as the front-end MVC framework together with NodeJS-related tools, handlebars.js as the page rendering engine, and the Bootstrap framework for CSS/HTML;
On the server side, Jetty, Spring, JAX-RS and the like are used;
It also makes use of the distributed monitoring capabilities of Ganglia and Nagios.
The Ambari architecture uses the Server/Client model and consists of two main components: ambari-agent and ambari-server. Ambari relies on other mature tools: ambari-server depends on python, while ambari-agent also depends on ruby, puppet, facter and other tools, and Ambari depends on the monitoring tools nagios and ganglia for monitoring cluster conditions. Among them:
puppet is a distributed cluster configuration management tool, also in the typical Server/Client mode, which can centrally manage the installation, configuration and deployment of distributed clusters; its main language is ruby.
facter is a node resource collection library written in python, used to collect system information about nodes, such as OS information and host information. Since ambari-agent is mainly written in python, facter can collect node information well.
The many open source packages Ambari depends on, including deployment tools and message queues, require a client to be installed on every node. Installing these tools in turn requires configuring software source addresses, so offline installation is impossible.
A series of big data computing and query tools, such as Spark and Hive, have been developed around the distributed data processing tool Apache Hadoop, but these tools provide only the most basic computing and storage functions. Hadoop and Spark rely on scripts to perform operations; the operating system must be configured before the cluster is started, and the steps are cumbersome and hard to maintain.
1) Existing open source deployment software has a complex architecture; each component must be installed independently entirely by typing command line options, and the installation of the deployment tool itself cannot be automated, so users cannot install a big data platform quickly or unattended.
2) Configuration file management is scattered and must be filled in by hand. Every component of the big data platform has two or three configuration files to configure, and the management complexity of the configuration files grows as components are added to the platform.
3) The big data platform uses xml as its standard configuration file format; when users rewrite the configuration by hand it is easy to introduce file format errors, so that processes fail to read the file and startup fails.
4) There is no graphical interface; users must learn linux shell programming to complete the deployment.
5) Traditional shell deployment scripts execute all commands of the installation process in sequence. When a command fails, the script cannot exit immediately and report the error, which hinders timely exposure of the fault, prolongs the installation process and reduces installation efficiency.
Summary of the invention
The technical problem to be solved by the present invention is to provide an installation tool for a large database system. The tool allows all options and parameters to be set before the large database system is installed; most of them can be set with the mouse alone, which greatly reduces keyboard input, greatly reduces the chance of mis-set options and parameters, and improves setup efficiency. Once the options and parameters are set, all components of the large database system are installed automatically and continuously, achieving unattended installation of the big data platform and greatly improving the installation efficiency of the large database system.
To solve the above technical problem, the present invention provides an installation tool for a large database system, comprising an installation panel of the large database system, on which are distributed
label controls for the names of all options that the large database system requires,
text box controls for the user to type in large database system parameter values and text box, combo box, list box, check box and radio button controls for the user to select large database system parameter values,
and a command button control for running the installer of the large database system;
the parameter values of the individual options of the large database system are associated with the installer.
The installation panel is a web page.
The web page is provided with a tab control containing eight tabs, the eight tab names being common configuration, Hadoop configuration, Hive configuration, Spark configuration, Zookeeper configuration, HA configuration, monitoring suite configuration and configuration preview.
The large database system includes a Hadoop component, a Hive component, a Spark component and a Zookeeper component;
the options and parameter values of the Hadoop component are set in the window whose tab name is Hadoop configuration,
the options and parameter values of the Hive component are set in the window whose tab name is Hive configuration,
the options and parameter values of the Spark component are set in the window whose tab name is Spark configuration,
the options and parameter values of the Zookeeper component are set in the window whose tab name is Zookeeper configuration.
The option names in the common configuration include the root password, user, user password, hosts and slaves;
the option names in the Hadoop configuration include hadoop_env, yarn_env, core_site, hdfs_site, mapred_site and yarn_site;
the option names in the Spark configuration include spark_env and spark_conf;
the option names in the Hive configuration include hive_env and hive_site;
the option names in the Zookeeper configuration include zoo.cfg;
the option names in the HA configuration include the NameNode list, RM list, ZK list and JN list;
the option name in the monitoring suite configuration includes the host on which to start the ganglia process;
the option names in the configuration preview include a configuration file download button and a start-installation button.
A configuration file button control is also provided in the window whose tab name is the Spark configuration;
a download configuration file button control is also provided in the window whose tab name is the configuration preview.
The parameter values of the individual options of the large database system are associated with the installer through an association module.
The installation tool includes an installation package, which is installed on one node by file transfer; this node is the master node, and all components of the large database system are sent to each slave node via the SSH file transfer tool scp.
The installation tool also includes a detection module.
Compared with the prior art, the installation tool of the large database system of the present invention has the following beneficial effects.
1. The tool allows all options and parameters to be set before the large database system is installed; most of them can be set with the mouse alone, which greatly reduces keyboard input, greatly reduces the chance of mis-set options and parameters, and improves setup efficiency. Once the options and parameters are set, all components of the large database system are installed automatically and continuously, achieving unattended installation of the big data platform and greatly improving the installation efficiency of large database systems.
2. The cluster machines do not need deployment tools installed in advance; it is only necessary to send the installation package to any node of the cluster and start the installation service by script, after which the rest of the installation can be completed through the browser.
3. File transfers to all nodes are executed concurrently, so transfer speed is limited only by the cluster's internal network bandwidth.
4. If a problem occurs during installation, the program promptly reports it to the browser, making it easy for the operator to resolve system faults.
Brief description of the drawings
Figure 1 is the overall architecture of the installation method.
Figure 2 is a flow chart of the use of the installation interface.
Figure 3 is the basic configuration (user name, password, ip address).
Figure 4 is the configuration of the Hadoop component.
Figure 5 is the configuration of the Spark component.
Figure 6 is the configuration of the Hive component.
Figure 7 is the configuration of the Zookeeper component.
Figure 8 is the HA configuration.
Figure 9 is the monitoring suite configuration.
Figure 10 is the configuration preview.
Detailed description of the embodiments
Embodiment 1:
As shown in Figures 1 to 10, an installation tool for a large database system comprises an installation panel of the large database system, on which are distributed
label controls for the names of all options that the large database system requires,
text box controls for the user to type in large database system parameter values and text box, combo box, list box, check box and radio button controls for the user to select large database system parameter values,
and a command button control for running the installer of the large database system;
the parameter values of the individual options of the large database system are associated with the installer.
Embodiment 2:
As shown in Figures 3 to 10, the installation panel is a web page.
Embodiment 3:
As shown in Figures 3 to 10, the web page is provided with a tab control containing eight tabs, the eight tab names being common configuration, Hadoop configuration, Hive configuration, Spark configuration, Zookeeper configuration, HA configuration, monitoring suite configuration and configuration preview.
Embodiment 4:
As shown in Figures 4 to 7, the large database system includes a Hadoop component, a Hive component, a Spark component and a Zookeeper component;
the Hadoop component is for distributed data storage and batch processing;
the Hive component is for SQL queries;
the Spark component is for high-speed queries and machine learning algorithms;
the Zookeeper component is for providing a distributed consistency algorithm;
the options and parameter values of the Hadoop component are set in the window whose tab name is Hadoop configuration,
the options and parameter values of the Hive component are set in the window whose tab name is Hive configuration,
the options and parameter values of the Spark component are set in the window whose tab name is Spark configuration,
the options and parameter values of the Zookeeper component are set in the window whose tab name is Zookeeper configuration.
Embodiment 5:
As shown in Figures 3 to 10, the option names in the common configuration include the root password, user, user password, hosts and slaves;
the root password is used to log in to all machines and execute system calls;
the user is the user that runs the big data software after installation is completed;
the user password is the password of that user;
the hosts give the ip-host mapping within the cluster;
the slaves are the host names of all slave nodes;
the option names in the Hadoop configuration include hadoop_env, yarn_env, core_site, hdfs_site, mapred_site and yarn_site;
the hadoop_env is the startup configuration file for hadoop;
the yarn_env is the startup configuration file for yarn;
the core_site is used for the hadoop running configuration;
the hdfs_site is used for the hadoop running configuration;
the mapred_site is used for the hadoop running configuration;
the yarn_site is used for the hadoop running configuration;
the option names in the Spark configuration include spark_env and spark_conf;
the spark_env is used for the spark startup configuration;
the spark_conf is used for the spark running configuration;
the option names in the Hive configuration include hive_env and hive_site;
the hive_env is used for the hive startup configuration;
the hive_site is used for the hive running configuration;
the option names in the Zookeeper configuration include zoo.cfg;
the zoo.cfg is used for the zookeeper configuration;
the option names in the HA configuration include the NameNode list, RM list, ZK list and JN list;
the NameNode list specifies the hosts on which to start the NameNode process;
the RM list specifies the hosts on which to start the ResourceManager process;
the ZK list specifies the hosts on which to start the Zookeeper process;
the JN list specifies the hosts on which to start the JournalNode process;
the option name in the monitoring suite configuration includes the host on which to start the ganglia process;
the option names in the configuration preview include a configuration file download button and a start-installation button.
Embodiment 6:
As shown in Figure 5, a configuration file button control is also provided in the window whose tab name is the Spark configuration;
the configuration file button control is used to import a pre-configured configuration file and automatically fill in all parameters;
a download configuration file button control is also provided in the window whose tab name is the configuration preview:
the download configuration file button control is used to export the configured parameters for use in later installations.
Embodiment 7:
The parameter values of the individual options of the large database system are associated with the installer through an association module, whose function is implemented by the following code.
Python code:
(the listing is reproduced in the original as Figure PCTCN2016096756-appb-000001 and Figure PCTCN2016096756-appb-000002)
Embodiment 8:
The installation tool includes an installation package, which is installed on one node via the SSH file transfer tool scp; this node is the master node, and the large database system software is sent to each slave node via scp.
Embodiment 9:
The installation tool also includes a detection module, which detects during installation whether each step's command executed correctly; if a command fails, subsequent installation steps are stopped and the error is displayed directly on the page.
The function of the detection module is implemented by the following code.
Python code:
(the listing is reproduced in the original as Figure PCTCN2016096756-appb-000003)
In use, the specific operating steps are as follows.
1. Start the installation service process on any machine in the cluster; the service process listens for HTTP requests on port 8080.
2. The user logs in to the installation service page from a browser and fills in the following configuration:
  1. Fill in the user name and password; the installation service creates this user and uses it as the operating account of the big data platform.
  2. Fill in the list of machine host names; these hosts together with the current machine form the cluster required by the big data platform.
  3. Fill in the configuration items required by the big data platform.
After all configuration has been filled in, click the install button to begin installation.
3. When the server receives the request, it starts the installation process.
4. The installation process automatically configures passwordless login between the cluster machines and verifies the passwords. If a password is wrong, the installer tells the operator through the browser that the password was filled in incorrectly and then exits; the operator can reinstall after correcting the configuration.
5. The installation process takes the currently running machine as the master node of the cluster and writes the configuration options into the configuration files of the big data platform software; the files to be filled in are hadoop-env.sh, spark-env.sh, core-site.xml, hdfs-site.xml and yarn-site.xml.
6. The installation process executes other shell commands, such as installing the database and copying the big data platform software to the specified directory.
7. After installation on the master node finishes, the remote installation phase for the slave nodes begins. A thread pool is started, all slave node addresses are registered with the pool as parameters, and the pool is then started so that installation on all nodes proceeds concurrently. The installation processes of the nodes are independent of one another, and each reports its own progress and any faults encountered at run time. After all nodes have finished, the installation process outputs an installation report telling the user how many nodes were installed successfully and how many failed.
Pseudocode for the installation process is as follows:
pool=Pool(core_nums)
for slave in slaves:
    pool.push(slave)
pool.start()
pool.close()
The whole installation process is divided into dozens of small steps, each of which is a single shell command.
As soon as a machine encounters a fault during installation, the installation process stops immediately and returns the error state to the browser. The operator can thus follow the installation progress and resolve system faults in time.
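The Pool pseudocode above can be realised with Python's standard thread pool. This is a sketch under the assumption that `install_slave` is a hypothetical helper performing one node's remote installation (the patent does not disclose its code); it also produces the succeeded/failed report described in step 7.

```python
# Runnable counterpart of the Pool pseudocode: install all slave nodes
# concurrently and report which succeeded and which failed.
from concurrent.futures import ThreadPoolExecutor


def install_all(slaves, install_slave, core_nums=4):
    """Run install_slave(host) for every slave concurrently.

    Returns (succeeded, failed) host lists, mirroring the installation
    report the tool shows after all nodes finish."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=core_nums) as pool:
        futures = {pool.submit(install_slave, s): s for s in slaves}
        for future, slave in futures.items():
            try:
                future.result()  # re-raises any fault from that node
                succeeded.append(slave)
            except Exception:
                failed.append(slave)  # node faults are independent
    return succeeded, failed
```

Each node's installation is an independent task, so one node's failure does not stop the others, matching the behaviour described in step 7.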
Configuration item file format conversion:
The official configuration file format used by the big data platform is xml; a configuration file has the following form:
<configuration>
...
<property>
<name>dfs.default.name</name>
<value>hdfs://localhost/</value>
</property>
...
</configuration>
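A configuration file of this xml form can be generated mechanically from flat key-value options, which is one way to reorganise the configuration in a more readable format as the tool describes. The helper name `to_hadoop_xml` is an assumption introduced for this sketch, not the patent's code.

```python
# Sketch: render flat {name: value} options as a Hadoop-style xml
# configuration document, so users edit a readable key-value format and
# the tool emits the xml shown above.
from xml.sax.saxutils import escape


def to_hadoop_xml(options: dict) -> str:
    """Render {name: value} options as a <configuration> document."""
    lines = ["<configuration>"]
    for name, value in options.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(str(name))}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)
```

Escaping the names and values guards against the hand-edited format errors the background section complains about.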
The key points of the present invention are as follows.
1) The configuration files are reorganised in an easily readable file format.
2) The installation service uses a python-based browser/server (BS) architecture with a user interface and is simple to operate.
3) The instant feedback mechanism during installation reports problems to the user as quickly as possible, shortening overall troubleshooting and deployment time.
The advantages of the present invention are as follows.
1) No dependencies need to be installed in advance; the installation package carries all required dependency files itself and is suitable for offline installation in a closed machine room.
2) It is simple and easy to operate.
3) The filling-in and management of configuration files is optimised.
It should be noted that the embodiments described above with reference to the drawings are intended only to illustrate the present invention, not to limit its scope. Those of ordinary skill in the art should understand that modifications or equivalent substitutions made without departing from the spirit and scope of the present invention shall all fall within the scope of the present invention. In addition, unless the context indicates otherwise, words in the singular include the plural and vice versa; and unless otherwise specified, all or part of any embodiment may be used in combination with all or part of any other embodiment.

Claims (9)

  1. An installation tool for a large database system, characterised by comprising an installation panel of the large database system, on which are distributed
    label controls for the names of all options that the large database system requires,
    text box controls for the user to type in large database system parameter values and text box, combo box, list box, check box and radio button controls for the user to select large database system parameter values,
    and a command button control for running the installer of the large database system;
    the parameter values of the individual options of the large database system being associated with the installer.
  2. The installation tool for a large database system according to claim 1, characterised in that the installation panel is a web page.
  3. The installation tool for a large database system according to claim 2, characterised in that the web page is provided with a tab control containing eight tabs, the eight tab names being common configuration, Hadoop configuration, Hive configuration, Spark configuration, Zookeeper configuration, HA configuration, monitoring suite configuration and configuration preview.
  4. The installation tool for a large database system according to claim 3, characterised in that the large database system includes a Hadoop component, a Hive component, a Spark component and a Zookeeper component;
    the options and parameter values of the Hadoop component are set in the window whose tab name is Hadoop configuration,
    the options and parameter values of the Hive component are set in the window whose tab name is Hive configuration,
    the options and parameter values of the Spark component are set in the window whose tab name is Spark configuration,
    the options and parameter values of the Zookeeper component are set in the window whose tab name is Zookeeper configuration.
  5. The installation tool for a large database system according to claim 3, characterised in that
    the option names in the common configuration include the root password, user, user password, hosts and slaves;
    the option names in the Hadoop configuration include hadoop_env, yarn_env, core_site, hdfs_site, mapred_site and yarn_site;
    the option names in the Spark configuration include spark_env and spark_conf;
    the option names in the Hive configuration include hive_env and hive_site;
    the option names in the Zookeeper configuration include zoo.cfg;
    the option names in the HA configuration include the NameNode list, RM list, ZK list and JN list;
    the option name in the monitoring suite configuration includes the host on which to start the ganglia process;
    the option names in the configuration preview include a configuration file download button and a start-installation button.
  6. The installation tool for a large database system according to claim 3, characterised in that
    a configuration file button control is also provided in the window whose tab name is the Spark configuration;
    a download configuration file button control is also provided in the window whose tab name is the configuration preview.
  7. The installation tool for a large database system according to claim 1, characterised in that the parameter values of the individual options of the large database system are associated with the installer through an association module.
  8. The installation tool for a large database system according to claim 1, characterised in that the installation tool includes an installation package, which is installed on one node by file transfer; this node is the master node, and all components of the large database system are sent to each slave node via the SSH file transfer tool scp.
  9. The installation tool for a large database system according to claim 1, characterised in that the installation tool also includes a detection module.
PCT/CN2016/096756 2015-12-28 2016-08-25 Installation tool for a large database system WO2017113835A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510998963.X 2015-12-28
CN201510998963.XA CN105677382A (zh) 2015-12-28 Installation tool for a large database system

Publications (1)

Publication Number Publication Date
WO2017113835A1 true WO2017113835A1 (zh) 2017-07-06

Family

ID=56189609

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/096756 WO2017113835A1 (zh) 2015-12-28 2016-08-25 Installation tool for a large database system

Country Status (2)

Country Link
CN (1) CN105677382A (zh)
WO (1) WO2017113835A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677382A (zh) * 2015-12-28 2016-06-15 深圳市华讯方舟软件技术有限公司 Installation tool for a large database system
CN107797874B (zh) * 2017-10-12 2021-04-27 南京中新赛克科技有限责任公司 Resource management and control method based on embedded jetty and the spark on yarn framework
CN109542011B (zh) * 2018-12-05 2021-10-22 国网江西省电力有限公司信息通信分公司 Standardized acquisition system for multi-source heterogeneous monitoring data
CN110764788B (zh) * 2019-09-10 2023-04-25 武汉联影医疗科技有限公司 Cloud storage deployment method, apparatus, computer device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081606A (zh) * 2009-12-01 2011-06-01 千乡万才科技(中国)有限公司 Method and device for universal combined query
CN104915160A (zh) * 2014-03-14 2015-09-16 佳能株式会社 Information processing apparatus and method of controlling an information processing apparatus
CN105677382A (zh) * 2015-12-28 2016-06-15 深圳市华讯方舟软件技术有限公司 Installation tool for a large database system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102455936A (zh) * 2010-11-25 2012-05-16 中标软件有限公司 Rapid cluster deployment method
CN104541247B (zh) * 2012-08-07 2018-12-11 超威半导体公司 System and method for tuning a cloud computing system
CN103412768A (zh) * 2013-07-19 2013-11-27 蓝盾信息安全技术股份有限公司 Method for automatically deploying a Zookeeper cluster based on a script program
CN104050003B (zh) * 2014-06-27 2017-06-09 浪潮集团有限公司 Method for starting a Nutch acquisition system using shell scripts
CN104615466B (zh) * 2015-02-05 2017-08-25 广州亦云信息技术有限公司 Cloud platform deployment method and system


Also Published As

Publication number Publication date
CN105677382A (zh) 2016-06-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880628

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880628

Country of ref document: EP

Kind code of ref document: A1