CN107463582A - The method and device of distributed deployment Hadoop clusters - Google Patents



Publication number
CN107463582A
Authority
CN
China
Prior art keywords
information
deployment
task
hadoop
hadoop clusters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610395969.2A
Other languages
Chinese (zh)
Other versions
CN107463582B (en)
Inventor
高林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201610395969.2A (CN107463582B)
Priority to PCT/CN2017/083207 (WO2017206667A1)
Publication of CN107463582A
Application granted
Publication of CN107463582B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention provides a method and device for the distributed deployment of Hadoop clusters. The method includes: receiving template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host is used to deploy one or more components, and each component is deployed by an agent and performs a corresponding task; and deploying tasks to the one or more components according to the task information and the parameter information. The invention solves the problem in the related art that manually deploying a Hadoop cluster is operationally complex and takes a long time.

Description

Method and device for distributed deployment of Hadoop clusters
Technical Field
The present invention relates to the field of communications, and in particular to a method and device for the distributed deployment of Hadoop clusters.
Background
In the related art, Hadoop is a distributed system architecture developed by the Apache Software Foundation. "Hadoop" is not an abbreviation but an invented name with no actual meaning; it is said to be related to the name of a toy belonging to the creator's child. Hadoop is a software platform and open-source software framework for developing and running applications that process large-scale data. It performs distributed computation over massive data on clusters made up of large numbers of computers. Users can develop distributed programs without understanding the low-level details of distribution, making full use of the cluster for high-speed computation and storage.
In the related art, the distributed deployment of a Hadoop cluster requires administrators to understand the Hadoop ecosystem and the hardware resources of every host in the cluster. This places high demands on the personnel who deploy and manage Hadoop clusters, and the process is error-prone. Configuring a Hadoop cluster manually involves tedious steps and is inefficient; in large-scale Hadoop cluster environments in particular, elastic management such as dynamic scale-out and scale-in is difficult.
However, existing systems for the automated deployment of Hadoop have the following problems:
Before a Hadoop cluster is deployed, its network topology is designed according to the software and hardware information of the cluster environment and the components to be deployed. This approach places high demands on cluster administrators, who must be familiar with both the environment's software and hardware information and the Hadoop ecosystem. Without the intervention of a cluster administrator, the automated deployment system assigns nodes such as Master and Slave arbitrarily and cannot make a reasonable assignment based on the cluster's hardware and system load information;
Hadoop cluster component version packages are loaded from a single source, which, among other drawbacks, makes the deployment time of the Hadoop cluster uncontrollable.
1. Hadoop cluster deployment places high demands on operations and maintenance personnel, who must be familiar with the Hadoop ecosystem, understand the resource information of every node in the cluster, and design the Hadoop cluster network topology; 2. Hadoop cluster component nodes are assigned arbitrarily; 3. Hadoop cluster deployment takes a long time.
No effective solution to the above problems in the related art has yet been found.
Summary of the Invention
Embodiments of the present invention provide a method and device for the distributed deployment of Hadoop clusters, to at least solve the problem in the related art that manually deploying a Hadoop cluster is operationally complex and takes a long time.
According to one embodiment of the present invention, a method for the distributed deployment of Hadoop clusters is provided, including: receiving template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host is used to deploy one or more components, and each component is deployed by an agent and performs a corresponding task; and deploying tasks to the one or more components according to the task information and the parameter information.
Optionally, the parameter information includes at least one of: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, host average I/O wait time, host disk information, and process information of the components within the host.
Optionally, deploying tasks to one or more components in the Hadoop cluster according to the task information and the parameter information includes: generating a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to perform each task, and the priority of each task; and selecting the highest-priority task from the deployment task list and dispatching it to the corresponding component.
Optionally, the priority is related to the attributes of the task and/or the parameter information for performing the task.
Optionally, after deploying tasks to the one or more components according to the template information and the parameter information, the method further includes: monitoring the task execution progress and/or log information of the one or more components.
Optionally, the template information includes at least one of: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of replicas in the Hadoop Distributed File System (HDFS), the client connection count and timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log disk information, data storage disk information, and metadata storage disk information.
Optionally, after receiving the template information for deploying the Hadoop cluster, the method further includes: parsing the template information and verifying its validity.
According to another embodiment of the present invention, a device for the distributed deployment of Hadoop clusters is provided, including: a receiving module, configured to receive template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; a collection module, configured to collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host includes one or more components, and each component is deployed by an agent and performs a corresponding task; and a deployment module, configured to deploy tasks to the one or more components according to the task information and the parameter information.
Optionally, the deployment module further includes: a generation unit, configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to perform each task, and the priority of each task; and a selection unit, configured to select the highest-priority task from the deployment task list and dispatch it to the corresponding component.
Optionally, the device further includes: a monitoring module, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module has deployed tasks to the one or more components according to the template information and the parameter information.
According to yet another embodiment of the present invention, a storage medium is also provided. The storage medium is configured to store program code for performing the following steps:
receiving template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host includes one or more components, and each component performs a corresponding task;
deploying tasks to the one or more components according to the task information and the parameter information.
With the present invention, template information for deploying a Hadoop cluster is received, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of one or more hosts of the Hadoop cluster is collected according to the host information, where each host is used to deploy one or more components, and each component is deployed by an agent and performs a corresponding task; and tasks are deployed to the one or more components according to the task information and the parameter information. Because the task information and host information are received, and the load of the hosts and components is obtained from the collected parameter information, tasks can be deployed reasonably to each host and component of the Hadoop cluster. This solves the problem in the related art that manually deploying a Hadoop cluster is operationally complex and takes a long time.
Brief Description of the Drawings
The accompanying drawings described here are provided for a further understanding of the present invention and form a part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is an overall structural diagram of the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 2 is a flow chart of the method for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 4 is a first optional structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 5 is a second optional structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 6 is a structural diagram of the agent in the distributed Hadoop cluster deployment system of this embodiment;
Fig. 7 shows the deployment flow of the agents in the initial state of this embodiment;
Fig. 8 is a flow chart of the Hadoop cluster deployment method of this embodiment;
Fig. 9 is a sequence diagram of the Hadoop cluster deployment method of this embodiment.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a specific order or sequence.
Embodiment 1
The embodiment of the present application can run in the network architecture shown in Fig. 1. Fig. 1 is an overall structural diagram of the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 1, the network architecture includes a management system for deploying Hadoop clusters and a Hadoop cluster. The management system includes the functional modules and an execution agent node; the Hadoop cluster includes multiple agent nodes that execute tasks in a distributed manner; and the deployment system and the Hadoop cluster are communicatively connected.
This embodiment provides a method for the distributed deployment of Hadoop clusters that runs on the above management system. Fig. 2 is a flow chart of the method for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202: receive template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
Step S204: collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host is used to deploy one or more components, and each component is deployed by an agent and performs a corresponding task. Optionally, the deployment tasks are performed by the agent.
Step S206: deploy tasks to the one or more components according to the task information and the parameter information.
Through the above steps, template information for deploying a Hadoop cluster is received, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of one or more hosts of the Hadoop cluster is collected according to the host information, where each host is used to deploy one or more components, and each component is deployed by an agent and performs a corresponding task; and tasks are deployed to the one or more components according to the task information and the parameter information. Because the task information and host information are received, and the load of the hosts and components is obtained from the collected parameter information, tasks can be deployed reasonably to each host and component of the Hadoop cluster. This solves the problem in the related art that manually deploying a Hadoop cluster is operationally complex and takes a long time.
Optionally, the above steps may be performed by, but are not limited to, a control terminal or a client of the Hadoop cluster.
Optionally, the parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as core count and clock frequency), host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, host average I/O wait time, host disk information, and process information of the components within the host.
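As an illustration of how the parameter information listed above might be represented and compared, the following is a minimal Python sketch. The `HostParameters` record, its field names, and the `load_score` formula (an unweighted mean of three utilization figures) are assumptions made for this example; the source does not define a data layout or a scoring rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostParameters:
    """One sample of parameter information reported for a host (hypothetical layout)."""
    hostname: str
    cpu_cores: int
    memory_gb: float
    cpu_utilization: float      # fraction in [0, 1]
    memory_utilization: float   # fraction in [0, 1]
    disk_io_utilization: float  # fraction in [0, 1]
    network_delay_ms: float


def load_score(p: HostParameters) -> float:
    """Assumed load score: unweighted mean of the three utilization figures."""
    return (p.cpu_utilization + p.memory_utilization + p.disk_io_utilization) / 3


def least_loaded(hosts: List[HostParameters]) -> HostParameters:
    """Return the host with the lowest assumed load score."""
    return min(hosts, key=load_score)
```

A real system would likely weight the metrics differently and also take static capacity (cores, memory, disks) into account; the point here is only that the collected parameters give the deployment logic something concrete to rank hosts by.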
Optionally, the template information may be, but is not limited to: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of replicas in the Hadoop Distributed File System (HDFS), the client connection count and timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log disk information, data storage disk information, and metadata storage disk information.
In an optional implementation of this embodiment, deploying tasks to one or more components in the Hadoop cluster according to the task information and the parameter information includes:
S11: generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to perform each task, and the priority of each task;
S12: select the highest-priority task from the deployment task list and dispatch it to the corresponding component. Optionally, the priority is related to the attributes of the task and/or the parameter information for performing the task.
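Steps S11 and S12 can be sketched with a priority queue. The example below is a minimal illustration in Python, not the patented implementation: the `DeploymentTaskList` class, its method names, and the convention that a larger number means a higher priority are all assumptions. It uses the standard `heapq` module, which implements a min-heap, so priorities are negated on insertion.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class DeploymentTaskList:
    """Hypothetical deployment task list ordered by priority (larger = more urgent)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Dict]] = []
        self._counter = itertools.count()  # tie-breaker: first added, first served

    def add(self, task_info: str, host_parameters: Dict, priority: int) -> None:
        """S11: record the task, the parameters needed to run it, and its priority."""
        entry = {"task": task_info, "parameters": host_parameters, "priority": priority}
        # heapq is a min-heap, so the priority is negated; the counter keeps
        # equal-priority entries in insertion order and avoids comparing dicts.
        heapq.heappush(self._heap, (-priority, next(self._counter), entry))

    def pop_highest(self) -> Optional[Dict]:
        """S12: remove and return the highest-priority task, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

For example, after adding "install DataNode" with priority 1 and "install NameNode" with priority 5, `pop_highest()` returns the NameNode task first. How priorities are actually derived from task attributes and host parameters is left open by the source.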
Optionally, after deploying tasks to the one or more components according to the template information and the parameter information, the method further includes:
monitoring the task execution progress and/or log information of the one or more components.
Optionally, after receiving the template information for deploying the Hadoop cluster, the method further includes: parsing the template information and verifying its validity. The subsequent steps are performed only if the template information is valid. A valid deployment template includes at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the client connection count and timeout of each Hadoop cluster component, OC NCV ambda, user name and password, log storage disk, data storage disk, and metadata storage disk.
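The parse-then-verify step can be sketched as follows. This is an illustrative Python example under stated assumptions: the source does not specify a template format, so JSON is assumed here, and every field name in `REQUIRED_FIELDS` is hypothetical, chosen only to mirror the kinds of information listed above.

```python
import json
from typing import List

# Hypothetical field names and types; the source lists the kinds of
# information a template carries but not a concrete schema or file format.
REQUIRED_FIELDS = {
    "node_count": int,
    "components": list,
    "hdfs_replicas": int,
    "client_connections": int,
    "client_timeout_s": int,
    "username": str,
    "password": str,
    "log_disk": str,
    "data_disk": str,
    "metadata_disk": str,
}


def validate_template(text: str) -> List[str]:
    """Parse a JSON template and return a list of problems (empty if valid)."""
    try:
        template = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(template, dict):
        return ["template must be a JSON object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in template:
            problems.append(f"missing field: {name}")
            continue
        value = template[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for field: {name}")
    return problems
```

Returning a list of problems rather than a single boolean lets the caller report every defect at once before deployment is abandoned.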
From the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiment can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
Embodiment 2
This embodiment also provides a device for the distributed deployment of Hadoop clusters. The device is used to implement the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 3, the device includes:
a receiving module 30, configured to receive template information for deploying a Hadoop cluster, where the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
a collection module 32, configured to collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host includes one or more components, and each component is deployed by an agent and performs a corresponding task;
a deployment module 34, configured to deploy tasks to the one or more components according to the task information and the parameter information.
Optionally, the parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as core count and clock frequency), host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, host average I/O wait time, host disk information, and process information of the components within the host.
Optionally, the template information may be, but is not limited to: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of replicas in the Hadoop Distributed File System (HDFS), the client connection count and timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log disk information, data storage disk information, and metadata storage disk information.
Fig. 4 is a first optional structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 4, in addition to all the modules shown in Fig. 3, the deployment module 34 of the device further includes:
a generation unit 40, configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to perform each task, and the priority of each task;
a selection unit 42, configured to select the highest-priority task from the deployment task list and dispatch it to the corresponding component.
Fig. 5 is a second optional structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 5, in addition to all the modules shown in Fig. 3, the device further includes a monitoring module 50, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module has deployed tasks to the one or more components according to the template information and the parameter information.
It should be noted that each of the above modules can be implemented in software or in hardware. In the latter case, this can be done in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are distributed, in any combination, across different processors.
Embodiment 3
This embodiment is an optional embodiment of the present invention and is used to explain the application in specific detail:
This embodiment provides a method and system for the distributed deployment of Hadoop clusters. It overcomes drawbacks such as the high demands placed on the personnel who deploy and manage Hadoop clusters, the arbitrary assignment of Hadoop cluster component nodes, and the single source from which installation packages are loaded. The invention makes full use of the hardware resources and the load of each host in the cluster to achieve one-click distributed deployment of Hadoop clusters.
The distributed Hadoop cluster deployment system of this embodiment has the architecture shown in Fig. 1 and includes the following components:
Template parser: the deployment template includes but is not limited to the following: OC NCV ambda, user name, password, Hadoop component information, node count information, and mounted disk information. The template parser parses the template information entered by the user and verifies its validity.
Monitor: the monitor is responsible for receiving the execution status of Hadoop component deployment tasks sent by the agents and for log processing.
Collector: the collector is responsible for receiving and persisting the host information sent by the agents (including but not limited to: operating system information, CPU information, memory information, network information, CPU utilization, memory utilization, disk I/O utilization, network delay, and so on).
Task generator: the task generator generates the Hadoop component deployment task list according to the host information gathered by the collector and the deployment template information.
Task scheduler: the task scheduler selects high-priority deployment tasks and dispatches them to the agents according to the host information gathered by the collector, the host load, and the deployment task list.
Agent: the agent includes components such as a collector, a deployer, a parameter configurator, and a monitor. The collector is responsible for periodically collecting host information and sending it to the collector of the system; the deployer receives and executes the tasks dispatched by the task scheduler; the parameter configurator is responsible for configuring the configuration files of each Hadoop component; the monitor is responsible for monitoring the execution of deployment tasks and for log collection. Fig. 6 is a structural diagram of the agent in the distributed Hadoop cluster deployment system of this embodiment.
Fig. 7 shows the deployment flow of the agents in the initial state of this embodiment. As shown in Fig. 7, the distributed Hadoop cluster deployment method of this embodiment includes the following:
Initialize the deployment system
When the system starts, the monitor, collector, and agent in the distributed Hadoop cluster deployment system are initialized, ready to receive the deployment template submitted by the user.
Deploy the agents
The agent deployment task is generated by the task generator and scheduled for execution by the task scheduler. After an agent has been deployed, its collector periodically collects node resource information and feeds it back to the management system.
User submits the Hadoop cluster deployment template
The user fills in the information of the Hadoop cluster to be deployed as required by the deployment template and submits the deployment template.
Parse the Hadoop cluster deployment template
The monitor of the distributed Hadoop cluster deployment system receives the deployment template submitted by the user; the parser parses the Hadoop cluster deployment template and verifies its validity.
According to the deployment template submitted by the user and the resource information, the topology generator generates the Hadoop cluster network topology diagram.
Generate the Hadoop cluster component deployment tasks
According to the structure of the Hadoop cluster network topology diagram, the task generator generates the component deployment tasks.
Task scheduler executes deployment tasks
The task scheduler takes the pending deployment tasks and the resource information of each node from the task list and generates a sequence of pending tasks; it then takes out the high-priority deployment tasks in turn and dispatches them to the corresponding agents.
Execute the deployment tasks
After a host agent receives a deployment task, its deployer executes the task. The agent's monitor feeds the execution progress back in real time to the monitor of the deployment system, which notifies the task scheduler to continue scheduling tasks. The step "Task scheduler executes deployment tasks" is repeated until all the required deployment tasks have been executed.
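The interplay between dispatching a task and waiting for the agent's progress feedback can be sketched as below. This is a single-process Python simulation under assumptions: `agent_execute` and `run_deployment` are hypothetical names, the agent is a plain function rather than a remote process, and the two progress states ("started", "finished") are invented for the example. A standard `queue.Queue` stands in for the channel between the agent's monitor and the system's monitor.

```python
import queue
from typing import Dict, List, Tuple


def agent_execute(task: Dict, progress: "queue.Queue") -> None:
    """Stand-in for an agent: its deployer runs the task and its monitor reports progress."""
    progress.put((task["name"], "started"))
    # A real deployer would install and configure the component here.
    progress.put((task["name"], "finished"))


def run_deployment(tasks: List[Dict]) -> List[Tuple[str, str]]:
    """Dispatch tasks in priority order, continuing only after each one reports 'finished'."""
    progress: "queue.Queue" = queue.Queue()
    events: List[Tuple[str, str]] = []
    # sorted() is stable, so equal-priority tasks keep their original order.
    for task in sorted(tasks, key=lambda t: t["priority"], reverse=True):
        agent_execute(task, progress)
        while True:  # system-side monitor: drain feedback until the task completes
            name, state = progress.get()
            events.append((name, state))
            if state == "finished":
                break
    return events
```

In an actual deployment the agents would run on separate hosts and many tasks could be in flight at once; the sketch only shows the feedback-driven loop that ends when no deployment tasks remain.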
According to the characteristics of each Hadoop cluster component and the available cluster resources, this embodiment assigns the nodes of the Hadoop cluster components reasonably; during deployment, it dynamically assigns deployment tasks according to the collected host load, achieving one-click distributed deployment of Hadoop clusters. The invention effectively overcomes the drawbacks of deploying large-scale Hadoop clusters: complexity, long deployment time, and heavy pressure on the deployment system.
Fig. 8 is a flow chart of the Hadoop cluster deployment method of this embodiment, and Fig. 9 is a sequence diagram of the same method. With reference to Fig. 8 and Fig. 9, this embodiment includes:
System initialization: When the distributed Hadoop cluster deployment system starts, the system must be initialized, including initializing the monitor, the collector, agent A1, and so on.
Agent deployment: First, agent A1 executes the task of deploying agent A2; once agent A2 is deployed, it is initialized and started. Agents A1 and A2 then execute the tasks of deploying agents A3 and A4, and so on, until the agents on all hosts in the cluster have been deployed (see Fig. 7).
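The doubling pattern of agent deployment (A1 deploys A2, then A1 and A2 deploy A3 and A4) can be illustrated with a small calculation. The sketch below assumes an idealized case in which every existing agent deploys a fixed number of new agents per round and every round takes the same time; the function names `rounds_to_bootstrap` and `active_agents` and the `fan_out` parameter are hypothetical.

```python
def rounds_to_bootstrap(total_agents: int, fan_out: int = 1) -> int:
    """Rounds needed until `total_agents` agents exist, starting from A1 alone.

    In each round every existing agent deploys `fan_out` new agents,
    so the agent count is multiplied by (fan_out + 1) per round.
    """
    if total_agents < 1 or fan_out < 1:
        raise ValueError("total_agents and fan_out must be at least 1")
    agents, rounds = 1, 0
    while agents < total_agents:
        agents += agents * fan_out
        rounds += 1
    return rounds


def active_agents(t: int, fan_out: int = 1) -> int:
    """Agents available to deploy at the t-th moment (t >= 1), ideal case."""
    return (fan_out + 1) ** (t - 1)


print(rounds_to_bootstrap(4))               # A1 -> A1,A2 -> A1..A4
print(rounds_to_bootstrap(1000))            # one new agent per agent per round
print(rounds_to_bootstrap(1000, fan_out=2)) # two new agents per agent per round
```

Under these assumptions the number of rounds grows only logarithmically with the cluster size, which is the reason for having agents deploy other agents rather than deploying everything from a single node.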
101. The user submits a deployment template: After the distributed Hadoop cluster deployment system has been initialized, the user can submit a qualifying deployment template to the system. A valid deployment template includes at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the client connection count and timeout of each Hadoop cluster component, OC NCV ambda, user name and password, log storage disk, data storage disk, metadata storage disk, and similar information.
102. After receiving the deployment template information, the template parser first verifies the validity of the template. If the template does not meet the agreed requirements, deployment is terminated; if the template is valid, it is parsed and the topology graph generator generates the Hadoop cluster networking topology graph.
103. Based on node resources, the placement principles for each Hadoop cluster component, and the deployment template information, the topology graph generator generates the Hadoop cluster networking topology graph (e.g., S1). The placement principles for Hadoop cluster components include, but are not limited to, the following: 1. allocate the Master and Slave nodes of Hadoop components according to hardware resources and host load; 2. calculate and allocate the number of ZOOKEEPER nodes according to the number of nodes in the cluster; 3. calculate and allocate the number of Journalnode nodes according to the number of HDFS nodes. A Hadoop component deployment task includes, but is not limited to, the following information: component name (e.g., HDFS), node name (e.g., NameNode), OC NCV ambda, task priority, and so on.
104. The topology graph generated by the topology graph generator is stored.
105. The deployment task generator generates deployment tasks from the Hadoop cluster networking topology graph.
106. The deployment task list generated by the deployment task generator is stored.
107. The task scheduler scans the deployment task list, takes out the deployment tasks that have not yet been executed, calculates the host load in the cluster from the node resource information (mainly examining the average load, memory usage, disk I/O utilization, and network latency), and generates a deployment task sequence ordered by priority (e.g., S4).
108. The task scheduler selects the high-priority deployment tasks in turn and dispatches them to the agents on the corresponding hosts. When Hadoop component deployment tasks are executed for the first time, agent A1 deploys one Hadoop cluster component deployment task to agent A2; agent A1's monitor tracks the execution of the deployment task and feeds it back to the monitor of the deployment system (e.g., S10). After the monitor receives the execution status of the deployment task, the task scheduler regenerates the task sequence from the task list and resource information (e.g., S5) and selects high-priority tasks T3 and T4, which agents A1 and A2 deploy to agents A3 and A4, and so on (e.g., S11 to S14). Ideally, at the t-th moment (t greater than 0), 2^(t-1) agents in the whole cluster are executing Hadoop component deployment tasks. Each agent can also open multiple threads and send Hadoop component deployment tasks to several (for example, 2) agents; in that ideal case, at the t-th moment (t greater than 0), 3^(t-1) agents in the whole Hadoop cluster are executing Hadoop component deployment tasks.
109. Agent A1, which is co-located with the distributed Hadoop cluster deployment management system.
110. The agents deployed on each host node in the Hadoop cluster.
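The validity check of steps 101 and 102 can be illustrated with a minimal sketch. The patent does not specify a template format, so the dictionary layout, the field names in `REQUIRED_FIELDS`, and the rule that the node count must be at least 1 are assumptions chosen for the example; only the kinds of information listed in step 101 are taken from the text.

```python
# Hypothetical field names for the information listed in step 101.
REQUIRED_FIELDS = {
    "node_count": int,       # number of Hadoop cluster nodes
    "components": list,      # Hadoop components to deploy
    "hdfs_replicas": int,    # number of HDFS replicas
    "username": str,
    "password": str,
    "log_disk": str,
    "data_disk": str,
    "metadata_disk": str,
}


def validate_template(template: dict) -> list:
    """Return a list of problems; an empty list means the template is valid."""
    errors = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in template:
            errors.append(f"missing field: {key}")
        elif not isinstance(template[key], expected_type):
            errors.append(f"wrong type for field: {key}")
    # Assumed sanity rule: a cluster needs at least one node.
    if not errors and template["node_count"] < 1:
        errors.append("node_count must be at least 1")
    return errors


template = {
    "node_count": 5,
    "components": ["HDFS", "YARN", "ZOOKEEPER"],
    "hdfs_replicas": 3,
    "username": "hadoop",
    "password": "secret",
    "log_disk": "/data/log",
    "data_disk": "/data/dfs",
    "metadata_disk": "/data/meta",
}
print(validate_template(template))
```

A deployment system following step 102 would terminate deployment when the returned list is non-empty and proceed to topology generation otherwise.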
Configuration generation: The parameter configuration tasks generate the configuration of each Hadoop cluster component. The scheduler needs to collect the deployment information of every component in the whole Hadoop cluster (for example, the host names of the nodes where the Master and Slave reside, the log storage disk, the data storage disk, and the metadata storage disk) and dispatch it, together with the parameter configuration task, to the parameter configurator in the agent component on each host. Once all parameter configuration tasks in the cluster have been executed, deployment of every component of the whole Hadoop cluster is complete.
201. The collector in the agent component periodically collects the hardware resource and running state information of its own host and reports it to the collector in the deployment system, which stores the node resources. The hardware resource and running state information includes, but is not limited to, the following: operating system information, host name, CPU information, memory information, disk, process information, CPU utilization, memory utilization, disk I/O utilization, network information, average I/O wait time, and so on.
202. Storage of the resource information of each node (including host and Hadoop component information) collected by the monitor's collector.
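The load indicators that the collectors report (201) and that the scheduler examines (107) could be combined into a single score for ranking hosts. The patent does not give a formula, so the weighted sum below, its weights, the normalization of latency against 100 ms, and the names `HostMetrics`, `load_score`, and `rank_hosts` are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    host: str
    load_avg: float        # assumed normalized to the range 0..1
    mem_usage: float       # fraction of memory in use, 0..1
    disk_io_util: float    # fraction of disk I/O capacity in use, 0..1
    net_latency_ms: float  # round-trip latency in milliseconds


def load_score(m: HostMetrics) -> float:
    """Combine the four indicators into one number; lower means less loaded.

    The weights are arbitrary example values, not taken from the patent.
    """
    latency = min(m.net_latency_ms / 100.0, 1.0)  # cap latency contribution at 1
    return (0.4 * m.load_avg + 0.3 * m.mem_usage
            + 0.2 * m.disk_io_util + 0.1 * latency)


def rank_hosts(metrics):
    """Return host names ordered from least loaded to most loaded."""
    return [m.host for m in sorted(metrics, key=load_score)]


metrics = [
    HostMetrics("host-b", 0.9, 0.8, 0.7, 50.0),
    HostMetrics("host-a", 0.2, 0.3, 0.1, 5.0),
    HostMetrics("host-c", 0.5, 0.5, 0.5, 20.0),
]
print(rank_hosts(metrics))
```

A scheduler could use such a ranking to give tasks bound for lightly loaded hosts a higher priority; any real weighting would need to be tuned to the cluster.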
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: receiving template information for deploying a Hadoop cluster, wherein the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
S2: collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, wherein each host is used to deploy one or more components, and the components are deployed by an agent and are used to perform the corresponding tasks;
S3: deploying tasks to the one or more components according to the task information and the parameter information.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, performs: receiving template information for deploying a Hadoop cluster, wherein the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, performs: collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, wherein each host is used to deploy one or more components, and the components are deployed by an agent and are used to perform the corresponding tasks;
Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, performs: deploying tasks to the one or more components according to the task information and the parameter information.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

  1. A method for distributed deployment of a Hadoop cluster, characterized by comprising:
    receiving template information for deploying a Hadoop cluster, wherein the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
    collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, wherein each host is used to deploy one or more components, and the components are deployed by an agent and are used to perform the corresponding tasks;
    deploying tasks to the one or more components according to the task information and the parameter information.
  2. The method according to claim 1, characterized in that the parameter information includes at least one of the following: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network latency, host average I/O wait time, host disk information, and process information of the components on the host.
  3. The method according to claim 1, characterized in that deploying tasks to the one or more components in the Hadoop cluster according to the task information and the parameter information comprises:
    generating a deployment task list according to the task information and the parameter information, wherein the deployment task list includes the task information, the parameter information required to perform the task, and the priority of the task;
    selecting the task with the highest priority from the deployment task list and dispatching it to the corresponding component.
  4. The method according to claim 3, characterized in that the priority is related to an attribute of the task and/or to the parameter information for performing the task.
  5. The method according to claim 1, characterized in that, after deploying tasks to the one or more components according to the template information and the parameter information, the method further comprises:
    monitoring the task execution progress and/or log information of the one or more components.
  6. The method according to claim 1, characterized in that the template information includes at least one of the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of Hadoop distributed file system (HDFS) replicas, the client connection count and timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log storage disk information, data storage disk information, and metadata disk information.
  7. The method according to claim 1, characterized in that, after receiving the template information for deploying a Hadoop cluster, the method further comprises:
    parsing the template information and verifying the validity of the template information.
  8. A device for distributed deployment of a Hadoop cluster, characterized by comprising:
    a receiving module, configured to receive template information for deploying a Hadoop cluster, wherein the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
    a collection module, configured to collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, wherein each host includes one or more components, and the components are deployed by an agent and are used to perform the corresponding tasks;
    a deployment module, configured to deploy tasks to the one or more components according to the task information and the parameter information.
  9. The device according to claim 8, characterized in that the deployment module further comprises:
    a generation unit, configured to generate a deployment task list according to the task information and the parameter information, wherein the deployment task list includes the task information, the parameter information required to perform the task, and the priority of the task;
    a selection unit, configured to select the task with the highest priority from the deployment task list and dispatch it to the corresponding component.
  10. The device according to claim 9, characterized in that the device further comprises:
    a monitoring module, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module deploys tasks to the one or more components according to the template information and the parameter information.
CN201610395969.2A 2016-06-03 2016-06-03 Distributed Hadoop cluster deployment method and device Active CN107463582B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610395969.2A CN107463582B (en) 2016-06-03 2016-06-03 Distributed Hadoop cluster deployment method and device
PCT/CN2017/083207 WO2017206667A1 (en) 2016-06-03 2017-05-05 Method and device for distributively deploying hadoop cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610395969.2A CN107463582B (en) 2016-06-03 2016-06-03 Distributed Hadoop cluster deployment method and device

Publications (2)

Publication Number Publication Date
CN107463582A (en) 2017-12-12
CN107463582B CN107463582B (en) 2021-11-12

Family

ID=60479660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610395969.2A Active CN107463582B (en) 2016-06-03 2016-06-03 Distributed Hadoop cluster deployment method and device

Country Status (2)

Country Link
CN (1) CN107463582B (en)
WO (1) WO2017206667A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061503B (en) * 2018-10-16 2023-08-18 航天信息股份有限公司 Cluster system configuration method and cluster system
CN111581042B (en) * 2019-02-15 2023-09-12 网宿科技股份有限公司 Cluster deployment method, deployment platform and server to be deployed
CN110389766B (en) * 2019-06-21 2022-12-27 深圳市汇川技术股份有限公司 HBase container cluster deployment method, system, equipment and computer readable storage medium
CN111754191A (en) * 2020-06-08 2020-10-09 中国建设银行股份有限公司 Automatic change method based on cloud platform and related equipment
CN112732410B (en) * 2021-01-21 2023-03-28 青岛海尔科技有限公司 Service node management method and device, storage medium and electronic device
CN114816444A (en) * 2021-01-28 2022-07-29 网联清算有限公司 Method and device for deploying monitoring program, electronic equipment and storage medium
CN113132383B (en) * 2021-04-19 2022-03-25 烟台中科网络技术研究所 Network data acquisition method and system
CN115499304B (en) * 2022-07-29 2024-03-08 天翼云科技有限公司 Automatic deployment method, device, equipment and product for distributed storage

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104734892A (en) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130024496A1 (en) * 2011-07-21 2013-01-24 Yahoo! Inc Method and system for building an elastic cloud web server farm
US20130031542A1 (en) * 2011-07-28 2013-01-31 Yahoo! Inc. Method and system for distributed application stack deployment
US20130167139A1 (en) * 2011-12-21 2013-06-27 Yahoo! Inc. Method and system for distributed application stack test certification
CN103064742A (en) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 Automatic deployment system and method of hadoop cluster
CN103152393A (en) * 2013-02-05 2013-06-12 北京邮电大学 Charging method and charging system for cloud computing
CN105302641A (en) * 2014-06-04 2016-02-03 杭州海康威视数字技术股份有限公司 Node scheduling method and apparatus in virtual cluster
CN104317610A (en) * 2014-10-11 2015-01-28 福建新大陆软件工程有限公司 Method and device for automatic installation and deployment of hadoop platform

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228796A (en) * 2017-12-29 2018-06-29 百度在线网络技术(北京)有限公司 Management method, device, system, server and the medium of MPP databases
CN109284272A (en) * 2018-09-07 2019-01-29 郑州云海信息技术有限公司 A kind of dispositions method of distributed file system, device and equipment
CN109508196A (en) * 2018-10-15 2019-03-22 广州云新信息技术有限公司 Automatic deployment system and method based on X86 server
CN110262807A (en) * 2019-06-20 2019-09-20 北京百度网讯科技有限公司 Cluster creates Progress Log acquisition system, method and apparatus
CN110262807B (en) * 2019-06-20 2023-12-26 北京百度网讯科技有限公司 Cluster creation progress log acquisition system, method and device
CN110457114A (en) * 2019-07-24 2019-11-15 杭州数梦工场科技有限公司 Application cluster dispositions method and device
CN111866013A (en) * 2020-07-29 2020-10-30 杭州安恒信息技术股份有限公司 Cloud security product management platform deployment method, device, equipment and medium
CN112363818A (en) * 2020-11-30 2021-02-12 杭州玳数科技有限公司 Method for realizing Hadoop MR task cluster independence under Yarn scheduling
CN113886036A (en) * 2021-09-13 2022-01-04 天翼数字生活科技有限公司 Method and system for optimizing cluster configuration of distributed system
CN113886036B (en) * 2021-09-13 2024-04-19 天翼数字生活科技有限公司 Method and system for optimizing distributed system cluster configuration
WO2024055715A1 (en) * 2022-09-15 2024-03-21 华为云计算技术有限公司 Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium

Also Published As

Publication number Publication date
CN107463582B (en) 2021-11-12
WO2017206667A1 (en) 2017-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant