CN107463582A - The method and device of distributed deployment Hadoop clusters - Google Patents
The method and device of distributed deployment Hadoop clusters
- Publication number
- CN107463582A, CN201610395969.2A
- Authority
- CN
- China
- Prior art keywords
- information
- deployment
- task
- hadoop
- hadoop clusters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2219—Large Object storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The invention provides a method and device for the distributed deployment of Hadoop clusters. The method includes: receiving template information for deploying a Hadoop cluster, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host is used to deploy one or more components, and the components are deployed by an agent and execute the corresponding tasks; and deploying tasks to the one or more components according to the task information and the parameter information. The invention solves the problem in the related art that manually deploying Hadoop clusters is operationally complex and takes a long time.
Description
Technical field
The present invention relates to the field of communications, and in particular to a method and device for the distributed deployment of Hadoop clusters.
Background art
In the related art, Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. "Hadoop" is not an abbreviation but an invented name, reportedly related to the name of a toy belonging to the child of the project's creator, and has no real meaning. Hadoop is a software platform and open-source software framework for developing and running applications that process large-scale data. It performs distributed computation over massive data on clusters made up of large numbers of computers, so users can develop distributed programs without understanding the low-level details of distribution and can make full use of the cluster's capacity for high-speed computation and storage.
In the related art, the distributed deployment of a Hadoop cluster requires administrators to understand the Hadoop ecosystem and the hardware resources of every host in the cluster. This places high demands on the personnel who deploy and manage the cluster and is error-prone. Configuring a Hadoop cluster manually involves tedious steps and is inefficient; in large-scale Hadoop cluster environments in particular, elastic management such as dynamic scale-out and scale-in is difficult.
However, current systems for the automated deployment of Hadoop have the following problems:
Before a Hadoop cluster is deployed, its network topology is designed according to the software and hardware information of the cluster environment and the components to be deployed. This approach places high demands on cluster administrators, who must be familiar with the environment's software and hardware information and with the Hadoop ecosystem. Without the intervention of a cluster administrator, the automated deployment system assigns nodes such as Master and Slave arbitrarily, and cannot allocate them reasonably or make use of the cluster's hardware and system load information;
Hadoop cluster component packages are loaded from a single source, which, among other drawbacks, makes the Hadoop cluster deployment time uncontrollable.
1. Hadoop cluster deployment places high demands on operation and maintenance personnel, who must be familiar with the Hadoop ecosystem, understand the resource information of every node in the cluster, and design the Hadoop cluster network topology; 2. the nodes of Hadoop cluster components are assigned arbitrarily; 3. Hadoop cluster deployment takes a long time.
No effective solution has yet been found for the above problems in the related art.
Summary of the invention
The embodiments of the present invention provide a method and device for the distributed deployment of Hadoop clusters, to at least solve the problem in the related art that manually deploying Hadoop clusters is operationally complex and takes a long time.
According to one embodiment of the present invention, a method for the distributed deployment of Hadoop clusters is provided, including: receiving template information for deploying a Hadoop cluster, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host is used to deploy one or more components, and the components are deployed by an agent and execute the corresponding tasks; and deploying tasks to the one or more components according to the task information and the parameter information.
Optionally, the parameter information includes at least one of the following: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, average I/O wait time of the host, host disk information, and process information of the components on the host.
Optionally, deploying tasks to one or more components in the Hadoop cluster according to the task information and the parameter information includes: generating a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute the task, and the priority of the task; and selecting the highest-priority task from the deployment task list and issuing it to the corresponding component.
Optionally, the priority is related to an attribute of the task and/or to the parameter information for executing the task.
Optionally, after tasks are deployed to the one or more components according to the template information and the parameter information, the method further includes: monitoring the task execution progress and/or log information of the one or more components.
Optionally, the template information includes at least one of the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log disk information, data storage disk information, and metadata storage disk information.
Optionally, after the template information for deploying the Hadoop cluster is received, the method further includes: parsing the template information and verifying its validity.
According to another embodiment of the present invention, a device for the distributed deployment of Hadoop clusters is provided, including: a receiving module, configured to receive template information for deploying a Hadoop cluster, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; an acquisition module, configured to collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host includes one or more components, and the components are deployed by an agent and execute the corresponding tasks; and a deployment module, configured to deploy tasks to the one or more components according to the task information and the parameter information.
Optionally, the deployment module further includes: a generation unit, configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute the task, and the priority of the task; and a selecting unit, configured to select the highest-priority task from the deployment task list and issue it to the corresponding component.
Optionally, the device further includes: a monitoring module, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module has deployed tasks to the one or more components according to the template information and the parameter information.
According to still another embodiment of the present invention, a storage medium is also provided. The storage medium is configured to store program code for performing the following steps:
receiving template information for deploying a Hadoop cluster, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
collecting, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host includes one or more components, and the components execute the corresponding tasks;
deploying tasks to the one or more components according to the task information and the parameter information.
With the present invention, template information for deploying a Hadoop cluster is received, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of one or more hosts of the Hadoop cluster is collected according to the host information, where each host is used to deploy one or more components, and the components are deployed by an agent and execute the corresponding tasks; and tasks are deployed to the one or more components according to the task information and the parameter information. Because the task information and host information are received, and the load of the hosts and components is obtained from the collected parameter information, tasks can be deployed reasonably to each host and component of the Hadoop cluster. This solves the problem in the related art that manually deploying Hadoop clusters is operationally complex and takes a long time.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is an overall structural framework diagram of the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 4 is optional structural block diagram 1 of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 5 is optional structural block diagram 2 of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention;
Fig. 6 is a structural framework diagram of the agent in the distributed Hadoop cluster deployment system of this embodiment;
Fig. 7 shows the deployment flow of the agents in the initial state of this embodiment;
Fig. 8 is a flowchart of the Hadoop cluster deployment method of this embodiment;
Fig. 9 is a sequence diagram of the Hadoop cluster deployment method of this embodiment.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not used to describe a specific order or sequence.
Embodiment 1
The embodiment of this application can run in the network architecture shown in Fig. 1, which is an overall structural framework diagram of the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 1, the network architecture includes a management system for deploying Hadoop clusters and a Hadoop cluster. The management system includes the functional modules and an executing agent node; the Hadoop cluster includes multiple agent nodes that execute tasks in a distributed manner; and the deployment system is communicatively connected to the Hadoop cluster.
This embodiment provides a method for the distributed deployment of Hadoop clusters that runs on the above management system for deploying Hadoop clusters. Fig. 2 is a flowchart of the method for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202: receive template information for deploying a Hadoop cluster, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
Step S204: collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host is used to deploy one or more components, and the components are deployed by an agent and execute the corresponding tasks; optionally, deployment tasks are executed by the agent;
Step S206: deploy tasks to the one or more components according to the task information and the parameter information.
Through the above steps, template information for deploying a Hadoop cluster is received, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete; parameter information of one or more hosts of the Hadoop cluster is collected according to the host information, where each host is used to deploy one or more components, and the components are deployed by an agent and execute the corresponding tasks; and tasks are deployed to the one or more components according to the task information and the parameter information. Because the task information and host information are received, and the load of the hosts and components is obtained from the collected parameter information, tasks can be deployed reasonably to each host and component of the Hadoop cluster. This solves the problem in the related art that manually deploying Hadoop clusters is operationally complex and takes a long time.
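The three steps S202 to S206 can be pictured with a minimal Python sketch. All names here (`Template`, `receive_template`, `collect_parameters`, `deploy_tasks`, the `cpu_usage` key) are hypothetical and do not come from the patent, and the least-loaded-host placement rule is only an assumed stand-in for whatever policy a real deployment system would apply.

```python
from dataclasses import dataclass


# Hypothetical types and names; the patent does not define a concrete API.
@dataclass
class Template:
    tasks: list[str]  # task information: what the cluster must deploy
    hosts: list[str]  # host information: which hosts to use


def receive_template(raw: dict) -> Template:
    """S202: accept the template describing tasks and hosts."""
    return Template(tasks=list(raw["tasks"]), hosts=list(raw["hosts"]))


def collect_parameters(hosts, probe):
    """S204: gather parameter information for every host in the template.

    `probe` stands in for the per-host agent and returns a dict of metrics.
    """
    return {host: probe(host) for host in hosts}


def deploy_tasks(template, params):
    """S206: pair each task with a host, least-loaded host first (assumed policy)."""
    load = {host: p["cpu_usage"] for host, p in params.items()}
    plan = []
    for task in template.tasks:
        host = min(load, key=load.get)
        plan.append((task, host))
        load[host] += 0.1  # assumed cost of hosting one more component
    return plan
```

Calling `deploy_tasks(receive_template(raw), collect_parameters(hosts, probe))` yields a list of `(task, host)` pairs; in the patent's architecture the pairs would then be issued to the agents rather than returned.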
Optionally, the above steps may be performed by a control terminal, a client, or the like of the Hadoop cluster, but are not limited thereto.
Optionally, the parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, average I/O wait time of the host, host disk information, and process information of the components on the host.
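As an illustration only, the parameter information listed above could be carried in a record like the following. The class name `HostParameters`, the field names, and the units (MHz, MB, milliseconds, ratios between 0 and 1) are assumptions made for this sketch; the patent names the categories of information but not their representation.

```python
from dataclasses import dataclass, asdict, field
import json


@dataclass
class HostParameters:
    """Hypothetical container for the parameter information of one host."""
    os_info: str
    network_info: str
    cpu_cores: int
    cpu_mhz: float
    memory_mb: int
    cpu_usage: float          # ratio between 0.0 and 1.0 (assumed unit)
    memory_usage: float       # ratio between 0.0 and 1.0 (assumed unit)
    disk_io_usage: float      # ratio between 0.0 and 1.0 (assumed unit)
    network_delay_ms: float
    avg_io_wait_ms: float
    disks: list[str] = field(default_factory=list)
    component_processes: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so an agent could send it to the collector."""
        return json.dumps(asdict(self), sort_keys=True)
```

A flat, serializable record keeps the agent and the management system decoupled: the agent only needs to fill in the fields it can measure and send the JSON text.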
Optionally, the template information may be, but is not limited to: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log disk information, data storage disk information, and metadata storage disk information.
In an optional implementation of this embodiment, deploying tasks to one or more components in the Hadoop cluster according to the task information and the parameter information includes:
S11: generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute the task, and the priority of the task;
S12: select the highest-priority task from the deployment task list and issue it to the corresponding component. Optionally, the priority is related to an attribute of the task and/or to the parameter information for executing the task.
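Steps S11 and S12 amount to keeping a prioritized task list and repeatedly taking the highest-priority entry. A minimal sketch using Python's standard `heapq` module follows; the class name `DeploymentTaskList` and its methods are hypothetical, and how a priority is computed from task attributes and host parameters is left open because the patent does not specify it.

```python
import heapq
import itertools


class DeploymentTaskList:
    """Hypothetical deployment task list ordered by priority (higher runs first)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves insertion order

    def add(self, task, host_params, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), task, host_params))

    def pop_highest(self):
        """Remove and return (task, host_params, priority) for the top task."""
        neg_priority, _, task, host_params = heapq.heappop(self._heap)
        return task, host_params, -neg_priority

    def __len__(self):
        return len(self._heap)
```

The insertion counter guarantees that two tasks with equal priority are issued in the order they were generated and that the heap never has to compare the task payloads themselves.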
Optionally, after tasks are deployed to the one or more components according to the template information and the parameter information, the method further includes:
monitoring the task execution progress and/or log information of the one or more components.
Optionally, after the template information for deploying the Hadoop cluster is received, the method further includes: parsing the template information and verifying its validity. The subsequent steps are performed only if the template information is valid. A valid deployment template must include at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, OC NCV ambda, user name and password, log storage disk, data storage disk, and metadata storage disk.
From the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
Embodiment 2
This embodiment also provides a device for the distributed deployment of Hadoop clusters. The device is used to implement the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 3, the device includes:
a receiving module 30, configured to receive template information for deploying a Hadoop cluster, where the template information indicates the task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster is required to complete;
an acquisition module 32, configured to collect, according to the host information, parameter information of one or more hosts of the Hadoop cluster, where each host includes one or more components, and the components are deployed by an agent and execute the corresponding tasks;
a deployment module 34, configured to deploy tasks to the one or more components according to the task information and the parameter information.
Optionally, the parameter information may be, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, average I/O wait time of the host, host disk information, and process information of the components on the host.
Optionally, the template information may be, but is not limited to: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, OC NCV ambda, host user name and password, log disk information, data storage disk information, and metadata storage disk information.
Fig. 4 is optional structural block diagram 1 of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 4, in addition to all the modules shown in Fig. 3, the deployment module 34 of the device further includes:
a generation unit 40, configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required to execute the task, and the priority of the task;
a selecting unit 42, configured to select the highest-priority task from the deployment task list and issue it to the corresponding component.
Fig. 5 is optional structural block diagram 2 of the device for the distributed deployment of Hadoop clusters according to an embodiment of the present invention. As shown in Fig. 5, in addition to all the modules shown in Fig. 3, the device further includes: a monitoring module 50, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module has deployed tasks to the one or more components according to the template information and the parameter information.
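One way to picture the monitoring module is as a small in-memory recorder of per-component progress and log lines. The class `DeploymentMonitor`, its methods, and the use of a 0 to 100 percentage are hypothetical choices for this sketch; the patent does not state how progress is represented or transported.

```python
from collections import defaultdict


class DeploymentMonitor:
    """Hypothetical monitor that tracks progress and logs per component."""

    def __init__(self):
        self.progress = {}              # component name -> percent complete
        self.logs = defaultdict(list)   # component name -> list of log lines

    def report(self, component, percent, log_line=None):
        """Record a progress update, and optionally a log line, from an agent."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.progress[component] = percent
        if log_line is not None:
            self.logs[component].append(log_line)

    def all_finished(self):
        """True once every reported component has reached 100 percent."""
        return bool(self.progress) and all(p == 100 for p in self.progress.values())
```

A real system would persist these records and expose them to operators; the sketch only shows the bookkeeping.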
It should be noted that each of the above modules can be implemented by software or hardware. For the latter, this can be achieved in the following ways, but is not limited to them: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiment 3
This embodiment is an optional embodiment of the present invention, used to explain and illustrate the application in specific detail:
This embodiment provides a method and system for the distributed deployment of Hadoop clusters. It overcomes drawbacks such as the high demands placed on the administrators who deploy Hadoop clusters, the arbitrary assignment of Hadoop cluster component nodes, and the single source for loading installation packages. The invention makes full use of the hardware resources and the load of each host in the cluster to achieve one-click distributed deployment of Hadoop clusters.
The distributed Hadoop cluster deployment system of this embodiment includes the following components, in the framework shown in Fig. 1:
Template parser: the deployment template includes but is not limited to the following: OC NCV ambda, user name, password, Hadoop component information, node count information, and mounted disk information. The template parser parses the template information entered by the user and verifies its validity.
Monitor: the monitor receives the Hadoop component deployment task execution status sent by the agents and handles the logs.
Collector: the collector receives the host information sent by the agents (including but not limited to operating system information, CPU information, memory information, network information, CPU utilization, memory utilization, disk I/O utilization, and network delay) and persists it.
Task generator: the task generator generates the Hadoop component deployment task list from the host information gathered by the collector and the deployment template information.
Task scheduler: the task scheduler selects high-priority deployment tasks and issues them to the agents, based on the host information gathered by the collector, the host load, and the deployment task list.
Agent: the agent includes components such as a collector, a deployer, a parameter configurator, and a monitor. The collector periodically collects host information and sends it to the system's collector; the deployer receives and executes the tasks issued by the task scheduler; the parameter configurator configures the configuration files of each Hadoop component; the monitor monitors deployment task execution and collects logs. Fig. 6 is a structural framework diagram of the agent in the distributed Hadoop cluster deployment system of this embodiment.
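The agent-side collector described above can be approximated with the Python standard library alone. This is a rough sketch under stated assumptions: the function names, the returned keys, and the 30-second default interval are hypothetical, the `send` callback stands in for whatever transport carries data to the system's collector, and only a few of the listed metrics are gathered because the standard library does not expose the rest portably.

```python
import os
import platform
import shutil
import time


def collect_host_info(path=os.path.abspath(os.sep)):
    """Gather a small subset of host information using only the standard library."""
    disk = shutil.disk_usage(path)
    try:
        load_1min = os.getloadavg()[0]  # available on Unix-like systems only
    except (AttributeError, OSError):
        load_1min = None
    return {
        "timestamp": time.time(),
        "os": platform.platform(),
        "cpu_cores": os.cpu_count(),
        "load_1min": load_1min,
        "disk_total_bytes": disk.total,
        "disk_used_ratio": disk.used / disk.total if disk.total else 0.0,
    }


def run_collector(send, interval_seconds=30.0, rounds=1):
    """Collect host information `rounds` times, passing each sample to `send`."""
    for i in range(rounds):
        send(collect_host_info())
        if i + 1 < rounds:
            time.sleep(interval_seconds)
```

For example, `run_collector(print, interval_seconds=5, rounds=3)` prints three samples five seconds apart; a real agent would run the loop in the background and send each sample over the network.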
Fig. 7 shows the deployment flow of the agents in the initial state of this embodiment. As shown in Fig. 7, the distributed Hadoop cluster deployment method of this embodiment includes the following:
Initialize the deployment system
When the system starts, the monitor, collector, and agent in the distributed Hadoop cluster deployment system are initialized, ready to receive the deployment template submitted by the user.
Deploy the agents
Agent deployment tasks are generated by the task generator and scheduled for execution by the task scheduler. After an agent has been deployed, its collector periodically collects node resource information and feeds it back to the management system.
The user submits a Hadoop cluster deployment template
The user fills in the information on the Hadoop cluster to be deployed, as required by the deployment template, and submits the deployment template.
Parse the Hadoop cluster deployment template
The monitor of the distributed Hadoop cluster deployment system receives the deployment template submitted by the user; the parser parses the Hadoop cluster deployment template and verifies its validity.
According to the deployment template submitted by the user and the resource information, the topology generator generates the Hadoop cluster network topology.
Generate the Hadoop cluster component deployment tasks
According to the structure of the Hadoop cluster network topology, the task generator generates the component deployment tasks.
The task scheduler executes the deployment tasks
The task scheduler takes the pending deployment tasks from the task list, together with the resource information of each node, and generates a sequence of pending tasks; it then takes out the high-priority deployment tasks in turn and issues them to the corresponding agents.
Execute the deployment tasks
After the agent on a host receives a deployment task, its deployer executes the task. The agent's monitor feeds the execution progress of the deployment task back to the monitor of the deployment system in real time, and the monitor notifies the task scheduler to continue scheduling tasks. The step "The task scheduler executes the deployment tasks" is repeated until all the deployment tasks that need to run have been executed.
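The scheduling loop in the last two steps can be sketched as follows. The function `run_deployment` and the shapes of its arguments are hypothetical, and the sketch is synchronous for brevity: each agent call returns its result directly, whereas the flow above has the agent's monitor report progress asynchronously before the scheduler continues.

```python
def run_deployment(task_list, agents):
    """Issue tasks in priority order and record each agent's feedback.

    task_list: iterable of (priority, host, task_name) tuples (hypothetical shape).
    agents: mapping of host -> callable(task_name) returning True on success.
    """
    # Highest priority first; tasks with equal priority keep their original order.
    pending = sorted(task_list, key=lambda t: t[0], reverse=True)
    completed, failed = [], []
    for _priority, host, task_name in pending:
        succeeded = agents[host](task_name)  # the agent's deployer runs the task
        (completed if succeeded else failed).append((host, task_name))
    return completed, failed
```

Separating `completed` from `failed` gives the caller what it needs to retry or abort; the patent itself does not describe failure handling, so that split is an addition made for the example.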
Based on the characteristics of each Hadoop cluster component and on the cluster resources, this embodiment allocates the nodes of the Hadoop cluster components reasonably. During deployment, it dynamically assigns deployment tasks according to the collected host load, achieving one-click distributed deployment of Hadoop clusters. The invention effectively addresses the drawbacks of deploying large-scale Hadoop clusters, such as complexity, long deployment time, and heavy pressure on the deployment system.
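Allocating component nodes from collected load, rather than arbitrarily, might be sketched as below. The scoring formula (an unweighted mean of three utilization ratios), the function names, and the choice to give master roles to the least-loaded hosts are all assumptions for illustration; the patent says only that hardware resources and host load are taken into account.

```python
def load_score(params):
    """Assumed score: unweighted mean of three utilization ratios in 0.0-1.0."""
    return (params["cpu_usage"] + params["memory_usage"] + params["disk_io_usage"]) / 3


def assign_roles(host_params, master_count=1):
    """Give master roles to the least-loaded hosts and slave roles to the rest."""
    # Sort by score, then by host name so ties are resolved deterministically.
    ranked = sorted(host_params, key=lambda h: (load_score(host_params[h]), h))
    return {"master": ranked[:master_count], "slave": ranked[master_count:]}
```

With three hosts whose scores are 0.8, 0.2, and 0.5, `assign_roles` places the master on the host scoring 0.2 and leaves the other two as slaves.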
Fig. 8 is a flowchart of the Hadoop cluster deployment method of this embodiment, and Fig. 9 is a sequence diagram of the Hadoop cluster deployment method of this embodiment. With reference to Fig. 8 and Fig. 9, this embodiment includes:
System initialization: when the distributed Hadoop cluster deployment system starts, the system needs to be initialized, including initializing the monitor, the collector, agent A1, and so on.
Agent deployment: first, agent A1 executes the task of deploying agent A2; once agent A2 has been deployed, it is initialized and started. Then agents A1 and A2 execute the tasks of deploying agents A3 and A4, and so on, until agents have been deployed on all hosts in the cluster (see Fig. 7).
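The cascading agent deployment can be worked through numerically. The sketch below assumes that "and so on" means every already-deployed agent deploys one new agent per round, so the number of agents doubles each round; the source spells out only the first two rounds, and the function name `cascade_rounds` is hypothetical.

```python
def cascade_rounds(host_count):
    """Return the agents newly deployed in each round, starting from agent A1.

    Assumes each existing agent deploys exactly one new agent per round.
    """
    deployed = 1  # agent A1 is created during system initialization
    rounds = []
    while deployed < host_count:
        new = min(deployed, host_count - deployed)
        rounds.append([f"A{i}" for i in range(deployed + 1, deployed + new + 1)])
        deployed += new
    return rounds
```

Under this assumption, `cascade_rounds(8)` returns `[["A2"], ["A3", "A4"], ["A5", "A6", "A7", "A8"]]`, so the number of rounds grows roughly with the base-2 logarithm of the host count rather than linearly.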
101. The user submits a deployment template: after the distributed Hadoop cluster deployment system has been initialized, the user can submit a qualifying deployment template to the system. A valid deployment template must include at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, OC NCV ambda, user name and password, log storage disk, data storage disk, and metadata storage disk.
102. After receiving the deployment template information, the template parser first verifies the
legitimacy of the template. If the template does not meet the agreed requirements, deployment ends.
If the template is legal, it is parsed, and the topology diagram generator generates the Hadoop
cluster networking topology diagram.
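The legitimacy check in step 102 can be illustrated with a minimal Python sketch. The field names,
types and cross-field rules below are hypothetical: the text lists the kinds of information a
template carries but does not define a concrete schema or the "agreed requirements".

```python
from __future__ import annotations

# Hypothetical schema: names and types are assumptions made for illustration.
REQUIRED_FIELDS = {
    "node_count": int,
    "components": list,
    "hdfs_replicas": int,
    "client_connections": int,
    "client_timeout_s": int,
    "username": str,
    "password": str,
    "log_disk": str,
    "data_disk": str,
    "metadata_disk": str,
}


def validate_template(template: dict) -> list[str]:
    """Return a list of problems; an empty list means the template is legal."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in template:
            problems.append(f"missing field: {name}")
        elif not isinstance(template[name], expected):
            problems.append(f"{name} should be {expected.__name__}")

    # Cross-field checks (assumed rules, not taken from the patent).
    nodes = template.get("node_count")
    replicas = template.get("hdfs_replicas")
    if isinstance(nodes, int) and nodes < 1:
        problems.append("node_count must be at least 1")
    if isinstance(nodes, int) and isinstance(replicas, int) and replicas > nodes:
        problems.append("hdfs_replicas cannot exceed node_count")
    return problems
```

A deployment system built this way would end the deployment when the returned list is non-empty
and pass the template on to the topology diagram generator otherwise.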
103. According to the node resources, the placement principles for each Hadoop cluster component, and
the deployment template information, the topology diagram generator generates the Hadoop cluster
networking topology diagram (see S1). The placement principles for Hadoop cluster components include,
but are not limited to, the following: (1) allocate the Master and Slave nodes of the Hadoop
components according to hardware resources and host load; (2) calculate and allocate the number of
ZooKeeper nodes according to the number of nodes in the cluster; (3) calculate and allocate the
number of JournalNode nodes according to the number of HDFS nodes. A Hadoop component deployment task
includes, but is not limited to, the following information: component name (for example, HDFS), node
name (for example, NameNode), OC NCV ambda, task priority, and so on.
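The three placement principles in step 103 can be sketched as follows. The text does not give the
formulas used to calculate the ZooKeeper and JournalNode counts, so the "odd quorum with a cap" rule
and the single load score per host are assumptions chosen only to make the example concrete.

```python
from __future__ import annotations


def quorum_size(node_count: int, cap: int = 5) -> int:
    """Pick an odd quorum size no larger than the cluster (assumed rule)."""
    size = min(node_count, cap)
    if size % 2 == 0:
        size -= 1
    return max(size, 1)


def plan_topology(hosts: dict[str, float]) -> dict[str, list[str]]:
    """Assign roles to hosts; `hosts` maps host name to load score (lower is better)."""
    # Least-loaded hosts first; the host name breaks ties deterministically.
    ranked = sorted(hosts, key=lambda h: (hosts[h], h))
    n = len(ranked)
    return {
        "NameNode": ranked[:1],                      # Master on the best host
        "DataNode": ranked[1:] or ranked[:1],        # Slaves on the remaining hosts
        "ZooKeeper": ranked[:quorum_size(n)],        # sized from the cluster node count
        "JournalNode": ranked[:quorum_size(n, 3)],   # sized from the HDFS node count
    }
```

The returned mapping plays the role of the networking topology diagram: one entry per node type,
listing the hosts chosen for it.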
104. The topology diagram generated by the topology diagram generator is stored.
105. The deployment task generator generates deployment tasks according to the Hadoop cluster
networking topology diagram.
106. The deployment task list generated by the deployment task generator is stored.
107. The task scheduler scans the deployment task list and takes out the deployment tasks that have
not yet been executed. Based on the node resource information, it calculates the load of each host in
the cluster (mainly average load, memory utilization, disk I/O utilization, and network delay) and
generates a deployment task sequence ordered by priority (see S4).
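Step 107 can be sketched as a sort over the pending tasks. The four load indicators come from the
text; the weights, the normalisation of network delay, and the convention that a larger priority
number is more urgent are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostLoad:
    load_avg: float      # average load, normalised to 0..1
    mem_used: float      # memory utilisation, 0..1
    disk_io: float       # disk I/O utilisation, 0..1
    net_delay_ms: float  # network delay in milliseconds


@dataclass
class Task:
    name: str
    host: str
    priority: int        # larger means more urgent (assumed convention)
    done: bool = False


def load_score(h: HostLoad) -> float:
    """Weighted host load; the weights are illustrative, not from the patent."""
    delay = min(h.net_delay_ms / 100.0, 1.0)
    return 0.4 * h.load_avg + 0.3 * h.mem_used + 0.2 * h.disk_io + 0.1 * delay


def build_sequence(tasks, loads):
    """Order unexecuted tasks by priority, then by the load of the target host."""
    pending = [t for t in tasks if not t.done]
    return sorted(
        pending,
        key=lambda t: (-t.priority, load_score(loads[t.host]), t.name),
    )
```

Calling `build_sequence` again after each progress report corresponds to the scheduler regenerating
the task sequence from the task list and the latest resource information.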
108. The task scheduler selects deployment tasks in descending order of priority and issues each one
to the proxy server of the corresponding host. When a task of deploying Hadoop components is executed
for the first time, proxy server A1 deploys the Hadoop cluster component deployment task of proxy
server A2; the monitor of proxy server A1 monitors the execution of the deployment task and feeds it
back to the monitor of the deployment system (see S10). After the monitor receives the result of the
deployment task, the task scheduler regenerates the task sequence according to the task list and the
resource information (see S5), selects the high-priority tasks T3 and T4, and has proxy servers A1
and A2 deploy these tasks to proxy servers A3 and A4, and so on (see S11 to S14). Ideally, at the
t-th moment (t greater than 0), 2^(t-1) proxy servers in the whole cluster are executing tasks of
deploying Hadoop components. Each proxy server can also start multiple threads and send deployment
tasks to several proxy servers (for example, 2) at once; in that ideal case, at the t-th moment
(t greater than 0), 3^(t-1) proxy servers in the whole Hadoop cluster are executing tasks of
deploying Hadoop components.
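The growth described in step 108 can be checked with a small simulation. It is a sketch of the
idealised model only: every installed proxy server is assumed to deploy to `fanout` new hosts per
moment, every deployment is assumed to take exactly one moment, and the function name is
hypothetical.

```python
from __future__ import annotations


def deployment_waves(total_hosts: int, fanout: int = 1) -> list[int]:
    """Return how many proxy servers are busy deploying at each moment t = 1, 2, ..."""
    if total_hosts < 1 or fanout < 1:
        raise ValueError("total_hosts and fanout must be positive")
    installed = 1            # proxy server A1 exists after system initialization
    busy_per_moment = []
    while installed < total_hosts:
        remaining = total_hosts - installed
        # Each busy proxy server handles up to `fanout` targets (ceiling division).
        busy = min(installed, -(-remaining // fanout))
        busy_per_moment.append(busy)
        installed += min(installed * fanout, remaining)
    return busy_per_moment
```

With one target per proxy server the busy count doubles each moment, and with two targets it
triples, matching the 2^(t-1) and 3^(t-1) figures above until the cluster runs out of hosts.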
109. Proxy server A1, which is co-located with the distributed Hadoop cluster deployment management
system.
110. The proxy servers deployed on each host node in the Hadoop cluster.
Configuration generation: parameter configuration tasks generate the configuration of each Hadoop
cluster component. The scheduler needs to collect the deployment information of every component in
the whole Hadoop cluster (for example, the host names of the nodes where the Master and Slave are
located, the log storage disk, the data storage disk, and the metadata storage disk) and issue it,
together with the parameter configuration task, to the parameter configurator in each host proxy
server component. Once all parameter configuration tasks in the cluster have been executed,
deployment of every component of the Hadoop cluster is complete.
201. The collector in the proxy server component periodically collects the hardware resources and
running state information of its own host and reports them to the collector in the deployment
system, which stores them as node resources. The hardware resources and running state information
include, but are not limited to: operating system information, host name, CPU information, memory
information, disk information, process information, CPU utilization, memory utilization, disk I/O
utilization, network information, and average I/O wait time.
202. The node resource information (covering hosts and Hadoop components) collected by the monitor
and the collector is stored.
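A host-side collector of the kind described in step 201 could be sketched with the Python standard
library alone. This covers only a subset of the listed items (operating system, host name, CPU
count, disk usage, average load); the function name and the dictionary keys are hypothetical, and
reporting to the deployment system is left out.

```python
import os
import platform
import shutil
import socket
import time


def collect_host_info(disk_path: str = "/") -> dict:
    """Collect a snapshot of hardware and running state for the local host."""
    info = {
        "timestamp": time.time(),
        "hostname": socket.gethostname(),
        "os": platform.platform(),
        "cpu_count": os.cpu_count(),
    }
    usage = shutil.disk_usage(disk_path)
    info["disk_total_bytes"] = usage.total
    info["disk_used_ratio"] = usage.used / usage.total if usage.total else 0.0
    # os.getloadavg exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            info["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return info
```

Calling this function on a timer and sending the result to the deployment system would correspond
to the periodic collection and reporting described above.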
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in this
embodiment, the storage medium may be configured to store program code for performing the following
steps:
S1, receiving template information for deploying a Hadoop cluster, where the template information
indicates the task information and host information of the Hadoop cluster, and the task information
describes the tasks that the Hadoop cluster needs to complete;
S2, collecting parameter information of one or more hosts of the Hadoop cluster according to the
host information, where each host is used to deploy one or more components, and the components are
deployed by a proxy server and are used to perform the corresponding tasks;
S3, deploying tasks to the one or more components according to the task information and the
parameter information.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash
drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk,
an optical disc, or any other medium that can store program code.
Optionally, in this embodiment, the processor, according to the program code stored in the storage
medium, receives template information for deploying a Hadoop cluster, where the template information
indicates the task information and host information of the Hadoop cluster, and the task information
describes the tasks that the Hadoop cluster needs to complete.
Optionally, in this embodiment, the processor, according to the program code stored in the storage
medium, collects parameter information of one or more hosts of the Hadoop cluster according to the
host information, where each host is used to deploy one or more components, and the components are
deployed by a proxy server and are used to perform the corresponding tasks.
Optionally, in this embodiment, the processor, according to the program code stored in the storage
medium, deploys tasks to the one or more components according to the task information and the
parameter information.
Optionally, for specific examples of this embodiment, refer to the examples described in the above
embodiments and optional implementations; they are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present
invention described above can be implemented with a general-purpose computing device. They can be
concentrated on a single computing device or distributed over a network formed by multiple computing
devices. Optionally, they can be implemented as program code executable by a computing device, so
that they can be stored in a storage device and executed by the computing device. In some cases the
steps shown or described can be performed in an order different from the one given here, or the
modules or steps can each be made into a separate integrated circuit module, or several of them can
be made into a single integrated circuit module. The present invention is therefore not restricted
to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it.
For those skilled in the art, the present invention may have various modifications and variations.
Any modification, equivalent substitution, or improvement made within the spirit and principles of
the present invention shall fall within its scope of protection.
Claims (10)
- 1. A method for distributed deployment of a Hadoop cluster, characterized by comprising: receiving template information for deploying a Hadoop cluster, wherein the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster needs to complete; collecting parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host is used to deploy one or more components, and the components are deployed by a proxy server and are used to perform corresponding tasks; and deploying tasks to the one or more components according to the task information and the parameter information.
- 2. The method according to claim 1, characterized in that the parameter information includes at least one of: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network delay, host average I/O wait time, host disk information, and process information of the components on the host.
- 3. The method according to claim 1, characterized in that deploying tasks to one or more components in the Hadoop cluster according to the task information and the parameter information comprises: generating a deployment task list according to the task information and the parameter information, wherein the deployment task list includes the task information, the parameter information required to perform the tasks, and the priorities of the tasks; and selecting the highest-priority task from the deployment task list and issuing it to the corresponding component.
- 4. The method according to claim 3, characterized in that the priority is related to an attribute of the task and/or to the parameter information for performing the task.
- 5. The method according to claim 1, characterized in that, after deploying tasks to the one or more components according to the template information and the parameter information, the method further comprises: monitoring the task execution progress and/or log information of the one or more components.
- 6. The method according to claim 1, characterized in that the template information includes at least one of: the number of Hadoop cluster systems, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout for each Hadoop cluster component, OC NCV ambda, host user name and password, log storage disk information, data storage disk information, and metadata disk information.
- 7. The method according to claim 1, characterized in that, after receiving the template information for deploying a Hadoop cluster, the method further comprises: parsing the template information and verifying the legitimacy of the template information.
- 8. A device for distributed deployment of a Hadoop cluster, characterized by comprising: a receiving module, configured to receive template information for deploying a Hadoop cluster, wherein the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks that the Hadoop cluster needs to complete; an acquisition module, configured to collect parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host includes one or more components, and the components are deployed by a proxy server and are used to perform corresponding tasks; and a deployment module, configured to deploy tasks to the one or more components according to the task information and the parameter information.
- 9. The device according to claim 8, characterized in that the deployment module further comprises: a generation unit, configured to generate a deployment task list according to the task information and the parameter information, wherein the deployment task list includes the task information, the parameter information required to perform the tasks, and the priorities of the tasks; and a selection unit, configured to select the highest-priority task from the deployment task list and issue it to the corresponding component.
- 10. The device according to claim 9, characterized in that the device further comprises: a monitoring module, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module deploys tasks to the one or more components according to the template information and the parameter information.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610395969.2A CN107463582B (en) | 2016-06-03 | 2016-06-03 | Distributed Hadoop cluster deployment method and device |
PCT/CN2017/083207 WO2017206667A1 (en) | 2016-06-03 | 2017-05-05 | Method and device for distributively deploying hadoop cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610395969.2A CN107463582B (en) | 2016-06-03 | 2016-06-03 | Distributed Hadoop cluster deployment method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107463582A true CN107463582A (en) | 2017-12-12 |
CN107463582B CN107463582B (en) | 2021-11-12 |
Family
ID=60479660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610395969.2A Active CN107463582B (en) | 2016-06-03 | 2016-06-03 | Distributed Hadoop cluster deployment method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107463582B (en) |
WO (1) | WO2017206667A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228796A (en) * | 2017-12-29 | 2018-06-29 | 百度在线网络技术(北京)有限公司 | Management method, device, system, server and the medium of MPP databases |
CN109284272A (en) * | 2018-09-07 | 2019-01-29 | 郑州云海信息技术有限公司 | A kind of dispositions method of distributed file system, device and equipment |
CN109508196A (en) * | 2018-10-15 | 2019-03-22 | 广州云新信息技术有限公司 | Automatic deployment system and method based on X86 server |
CN110262807A (en) * | 2019-06-20 | 2019-09-20 | 北京百度网讯科技有限公司 | Cluster creates Progress Log acquisition system, method and apparatus |
CN110457114A (en) * | 2019-07-24 | 2019-11-15 | 杭州数梦工场科技有限公司 | Application cluster dispositions method and device |
CN111866013A (en) * | 2020-07-29 | 2020-10-30 | 杭州安恒信息技术股份有限公司 | Cloud security product management platform deployment method, device, equipment and medium |
CN112363818A (en) * | 2020-11-30 | 2021-02-12 | 杭州玳数科技有限公司 | Method for realizing Hadoop MR task cluster independence under Yarn scheduling |
CN113886036A (en) * | 2021-09-13 | 2022-01-04 | 天翼数字生活科技有限公司 | Method and system for optimizing cluster configuration of distributed system |
WO2024055715A1 (en) * | 2022-09-15 | 2024-03-21 | 华为云计算技术有限公司 | Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061503B (en) * | 2018-10-16 | 2023-08-18 | 航天信息股份有限公司 | Cluster system configuration method and cluster system |
CN111581042B (en) * | 2019-02-15 | 2023-09-12 | 网宿科技股份有限公司 | Cluster deployment method, deployment platform and server to be deployed |
CN110389766B (en) * | 2019-06-21 | 2022-12-27 | 深圳市汇川技术股份有限公司 | HBase container cluster deployment method, system, equipment and computer readable storage medium |
CN111754191A (en) * | 2020-06-08 | 2020-10-09 | 中国建设银行股份有限公司 | Automatic change method based on cloud platform and related equipment |
CN112732410B (en) * | 2021-01-21 | 2023-03-28 | 青岛海尔科技有限公司 | Service node management method and device, storage medium and electronic device |
CN114816444A (en) * | 2021-01-28 | 2022-07-29 | 网联清算有限公司 | Method and device for deploying monitoring program, electronic equipment and storage medium |
CN113132383B (en) * | 2021-04-19 | 2022-03-25 | 烟台中科网络技术研究所 | Network data acquisition method and system |
CN115499304B (en) * | 2022-07-29 | 2024-03-08 | 天翼云科技有限公司 | Automatic deployment method, device, equipment and product for distributed storage |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130024496A1 (en) * | 2011-07-21 | 2013-01-24 | Yahoo! Inc | Method and system for building an elastic cloud web server farm |
US20130031542A1 (en) * | 2011-07-28 | 2013-01-31 | Yahoo! Inc. | Method and system for distributed application stack deployment |
CN103064742A (en) * | 2012-12-25 | 2013-04-24 | 中国科学院深圳先进技术研究院 | Automatic deployment system and method of hadoop cluster |
CN103152393A (en) * | 2013-02-05 | 2013-06-12 | 北京邮电大学 | Charging method and charging system for cloud computing |
US20130167139A1 (en) * | 2011-12-21 | 2013-06-27 | Yahoo! Inc. | Method and system for distributed application stack test certification |
CN104317610A (en) * | 2014-10-11 | 2015-01-28 | 福建新大陆软件工程有限公司 | Method and device for automatic installation and deployment of hadoop platform |
CN105302641A (en) * | 2014-06-04 | 2016-02-03 | 杭州海康威视数字技术股份有限公司 | Node scheduling method and apparatus in virtual cluster |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104734892A (en) * | 2015-04-02 | 2015-06-24 | 江苏物联网研究发展中心 | Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack |
-
2016
- 2016-06-03 CN CN201610395969.2A patent/CN107463582B/en active Active
-
2017
- 2017-05-05 WO PCT/CN2017/083207 patent/WO2017206667A1/en active Application Filing
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228796A (en) * | 2017-12-29 | 2018-06-29 | 百度在线网络技术(北京)有限公司 | Management method, device, system, server and the medium of MPP databases |
CN109284272A (en) * | 2018-09-07 | 2019-01-29 | 郑州云海信息技术有限公司 | A kind of dispositions method of distributed file system, device and equipment |
CN109508196A (en) * | 2018-10-15 | 2019-03-22 | 广州云新信息技术有限公司 | Automatic deployment system and method based on X86 server |
CN110262807A (en) * | 2019-06-20 | 2019-09-20 | 北京百度网讯科技有限公司 | Cluster creates Progress Log acquisition system, method and apparatus |
CN110262807B (en) * | 2019-06-20 | 2023-12-26 | 北京百度网讯科技有限公司 | Cluster creation progress log acquisition system, method and device |
CN110457114A (en) * | 2019-07-24 | 2019-11-15 | 杭州数梦工场科技有限公司 | Application cluster dispositions method and device |
CN111866013A (en) * | 2020-07-29 | 2020-10-30 | 杭州安恒信息技术股份有限公司 | Cloud security product management platform deployment method, device, equipment and medium |
CN112363818A (en) * | 2020-11-30 | 2021-02-12 | 杭州玳数科技有限公司 | Method for realizing Hadoop MR task cluster independence under Yarn scheduling |
CN113886036A (en) * | 2021-09-13 | 2022-01-04 | 天翼数字生活科技有限公司 | Method and system for optimizing cluster configuration of distributed system |
CN113886036B (en) * | 2021-09-13 | 2024-04-19 | 天翼数字生活科技有限公司 | Method and system for optimizing distributed system cluster configuration |
WO2024055715A1 (en) * | 2022-09-15 | 2024-03-21 | 华为云计算技术有限公司 | Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107463582B (en) | 2021-11-12 |
WO2017206667A1 (en) | 2017-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107463582A (en) | The method and device of distributed deployment Hadoop clusters | |
CN110138575B (en) | Network slice creating method, system, network device and storage medium | |
CN103516807B (en) | A kind of cloud computing platform server load balancing system and method | |
CN105245301B (en) | A kind of airborne optical-fiber network analogue system based on time triggered | |
CN106484886A (en) | A kind of method of data acquisition and its relevant device | |
CN108848146B (en) | Scheduling optimization method based on time-triggered communication service | |
CN104503832B (en) | A kind of scheduling virtual machine system and method for fair and efficiency balance | |
EP2330525A1 (en) | Parallel computing method and computing platform for security and stability analysis of large power grid | |
CN106375328A (en) | Adaptive optimization operation method of large-scale data distribution system | |
CN110059829A (en) | A kind of asynchronous parameters server efficient parallel framework and method | |
CN110177146A (en) | A kind of non-obstruction Restful communication means, device and equipment based on asynchronous event driven | |
CN108228796A (en) | Management method, device, system, server and the medium of MPP databases | |
CN111262723B (en) | Edge intelligent computing platform based on modularized hardware and software definition | |
Kanwal et al. | A genetic based leader election algorithm for IoT cloud data processing | |
Yang et al. | Smart intent-driven network management | |
CN106254452A (en) | The big data access method of medical treatment under cloud platform | |
US9547747B2 (en) | Distributed internet protocol network analysis model with real time response performance | |
CN107769934A (en) | Rate processing method and processing device | |
CN103634290A (en) | Network simulation system | |
CN109687985B (en) | Automatic configuration method and system for process level network of transformer substation | |
CN115866059A (en) | Block chain link point scheduling method and device | |
CN110190988A (en) | A kind of service deployment method and device | |
Meddeber et al. | Tasks assignment for Grid computing | |
CN109120443A (en) | A kind of management method and device of network attached storage NAS device | |
CN112532427B (en) | Planning and scheduling method of time-triggered communication network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |