CN106933622A

CN106933622A - Model-driven Hadoop deployment method in a cloud environment

Info

Publication number: CN106933622A
Application number: CN201710094086.2A
Authority: CN
Inventors: 武永卫; 陈康; 郑纬民; 陈哲毅
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2017-02-21
Filing date: 2017-02-21
Publication date: 2017-07-07

Abstract

The invention discloses a model-driven Hadoop deployment method in a cloud environment, including: providing a Hadoop demand model and a Hadoop deployment model; realizing model conversion between the Hadoop demand model and the Hadoop deployment model according to preset conversion rules; and monitoring information changes in the Hadoop demand model and the Hadoop deployment model with a synchronization engine, and synchronizing the information when the information in the Hadoop demand model and/or the Hadoop deployment model changes. The invention has the following advantage: the diverse software and hardware resources in a cloud environment can be managed and deployed.

Description

Model-driven Hadoop deployment method in a cloud environment

Technical field

The present invention relates to the field of software engineering, and more particularly to a model-driven Hadoop deployment method in a cloud environment.

Background

In today's society, massive amounts of data are generated every day, and 90% of the world's data was produced in the past two years. Mass data processing technology has been widely applied in every field of social production, which means that the big-data era has truly arrived.

Hadoop, an open-source software framework for distributed big-data processing, can process mass data in a reliable, efficient, and scalable way. In addition, with the rapid development of the Hadoop ecosystem and the successive completion of its many sub-projects, Hadoop is widely used to process and store big data in a variety of scenarios, and it has become one of the most important software tools for big-data processing. As Hadoop is deployed in the cloud more and more widely, administrators need to deploy and configure it in different ways according to specific requirements, which poses two challenges for Hadoop deployment:

(1) Diversity of hardware resources: a Hadoop cluster may be deployed on different types of infrastructure, including physical servers, virtual machines, and Docker containers. This heterogeneity makes managing cluster nodes difficult and complex.

(2) Diversity of software resources: the Hadoop ecosystem includes many different types of computing and storage frameworks, such as HDFS, MapReduce, HBase, Yarn, and Spark. Each type of framework has its own deployment and configuration method, and some frameworks also have dependency or constraint relationships with one another.

Several management tools currently help users deploy Hadoop clusters, such as Cloudera Manager and Apache Ambari. In addition, the open-source container engine Docker manages the life cycle of application components (packaging, distribution, deployment, running, and so on), achieving "package once, run anywhere" at the component level and reducing the difficulty of Hadoop deployment and operation and maintenance (O&M). Although these tools and technologies provide solutions for deploying Hadoop clusters, their research and management focus is mostly on environment configuration and parameter settings. They usually provide a single fixed deployment mode, do not take into account the diverse infrastructure and scalability concerns of cloud platforms, and cannot meet users' specific Hadoop deployment requirements according to service type, node resources, and scenario characteristics.

Summary of the invention

The present invention aims to solve at least one of the technical problems described above.

Accordingly, an object of the present invention is to provide a model-driven Hadoop deployment method in a cloud environment that can manage and deploy the diverse software and hardware resources in the cloud environment.

To achieve this object, an embodiment of the present invention discloses a model-driven Hadoop deployment method in a cloud environment, comprising the following steps. S1: providing a Hadoop demand model and a Hadoop deployment model, wherein the Hadoop demand model is used to generate a corresponding management view according to system requirements, and the Hadoop deployment model is used to describe the node configuration information, running state, and software deployment of the management view. S2: realizing model conversion between the Hadoop demand model and the Hadoop deployment model according to preset conversion rules, wherein the preset conversion rules include a node conversion model and a cluster service conversion model; the node conversion model realizes model conversion between the nodes of the Hadoop demand model and the nodes of the Hadoop deployment model, and the cluster service conversion model realizes model conversion between the cluster services of the Hadoop demand model and the cluster services of the Hadoop deployment model. S3: monitoring information changes in the Hadoop demand model and the Hadoop deployment model with a synchronization engine, and synchronizing the information when the information in the Hadoop demand model and/or the Hadoop deployment model changes.

Further, the Hadoop demand model includes: a cluster node module provided with infrastructure resources, the infrastructure resources including the resources and attributes in a node configuration list, a node list, and a container image list; and a cluster service module provided with a service list, the service list including various services and the attributes of each service.

Further, the Hadoop deployment model includes: a cluster node unit storing a virtual machine configuration list, a virtual machine list, and a virtual machine image list; and a cluster service unit used to provide cluster services.

Further, the node conversion model realizes model conversion through element mapping relations between the nodes of the Hadoop demand model and the nodes of the Hadoop deployment model. The element mapping relations include helper tags and mapper tags: helper tags describe the mapping relations between elements of classes, and mapper tags describe the mapping relations between attributes of classes.

Further, the cluster service conversion model converts cluster services automatically by means of a constraint model and a preset conversion algorithm. The constraint model constrains the association relations among multiple model elements, and the preset conversion algorithm generates a service deployment scheme according to the Hadoop demand model and the constraint model.

Further, the preset conversion algorithm comprises the following steps: obtaining the set of services to be deployed from the service units under the service list in the Hadoop demand model; supplementing and sorting the services in the set according to the dependency relations between service units in the constraint model, to obtain the ordered set of services that actually need to be deployed; reading each service from the ordered set in reverse order and calculating its deployment scheme; and deploying the services one by one according to the node sets of the service deployment units.

Further, the Hadoop deployment model is constructed with the SM@RT tool.

According to the model-driven Hadoop deployment method in a cloud environment of the embodiments of the present invention, runtime architecture models are introduced into the Hadoop deployment process, and the specific Hadoop deployment requirements of users are met through three steps: model definition, model conversion, and model synchronization.

Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of the model-driven Hadoop deployment method in a cloud environment according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the Hadoop demand meta-model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the Hadoop deployment meta-model according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the mapping relations between model elements according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the constraint meta-model according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the constraint model according to an embodiment of the present invention;

Fig. 7 illustrates the operations, parameters, and resulting changes involved in Hadoop cluster service deployment according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of the bidirectional synchronization between the Hadoop deployment model and the runtime system according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of the Hadoop demand model in a specific embodiment of the present invention;

Fig. 10 is a schematic diagram of the Hadoop deployment model in a specific embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, where the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and should not be construed as limiting it.

These and other aspects of the embodiments of the present invention will become clear with reference to the following description and drawings. The description and drawings specifically disclose some particular implementations of the embodiments to illustrate some ways of carrying out their principles, but it should be understood that the scope of the embodiments is not limited thereto. On the contrary, the embodiments include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.

The present invention is described below with reference to the drawings.

Fig. 1 is a schematic diagram of the model-driven Hadoop deployment method in a cloud environment according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

S1: Provide a Hadoop demand model and a Hadoop deployment model. The Hadoop demand model is used to generate a corresponding management view according to system requirements, and the Hadoop deployment model is used to describe the node configuration information, running state, and software deployment of the management view.

In an embodiment of the present invention, the Hadoop demand model includes:

a cluster node module provided with infrastructure resources, which include the resources and attributes in a node configuration list, a node list, and a container image list; and a cluster service module provided with a service list, which includes the various services and the attributes of each service.

Specifically, during the deployment of a Hadoop cluster, the Hadoop demand model provides administrators with a unified management view of node resources and cluster services. As shown in Fig. 2, the demand meta-model consists of two parts: cluster nodes and cluster services.

In the cluster node part, Infrastructure represents the infrastructure resources and contains one NodeTypes element, one Nodes element, and one Images element. The NodeTypes element is the node configuration list, representing the set of available configuration files; each NodeType element represents one node configuration, with attributes such as id, name, ram, disk, and cpus, representing the identifier, name, memory, disk, and number of CPUs respectively. The Nodes element is the node list, representing the set of configured nodes; each Node element represents one node, with attributes such as id, name, nodeType, imgeId, and Status, representing the node's identifier, name, node type, container image, and node state respectively. The Images element is the container image list, representing the set of available container image files; each Image element represents one container image, with attributes such as id, name, size, status, miniDisk, and miniRam, representing the image's identifier, name, size, state, minimum disk, and minimum memory respectively.
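As an illustration only, the cluster node part of the demand meta-model can be held in memory as plain Python dataclasses. Class and attribute names follow the text (including the `imgeId` spelling); the units of `ram`, `disk`, `miniDisk` and `miniRam`, the `resolve` and `image_fits` helpers, and the rule that an image fits a node when its minimum disk and memory do not exceed the node configuration are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:          # one node configuration
    id: str
    name: str
    ram: int             # memory (MB assumed)
    disk: int            # disk (GB assumed)
    cpus: int


@dataclass
class Node:              # one entry of the node list
    id: str
    name: str
    nodeType: str        # refers to NodeType.id
    imgeId: str          # refers to Image.id (spelling as in the text)
    status: str


@dataclass
class Image:             # one container image
    id: str
    name: str
    size: int
    status: str
    miniDisk: int        # minimum disk (GB assumed)
    miniRam: int         # minimum memory (MB assumed)


@dataclass
class Infrastructure:
    nodeTypes: list = field(default_factory=list)
    nodes: list = field(default_factory=list)
    images: list = field(default_factory=list)

    def resolve(self, node_id):
        """Return the (NodeType, Image) pair that a node refers to."""
        node = next(n for n in self.nodes if n.id == node_id)
        ntype = next(t for t in self.nodeTypes if t.id == node.nodeType)
        image = next(i for i in self.images if i.id == node.imgeId)
        return ntype, image

    def image_fits(self, node_id):
        """Hypothetical check: the node configuration meets the image minimums."""
        ntype, image = self.resolve(node_id)
        return image.miniDisk <= ntype.disk and image.miniRam <= ntype.ram


# Illustrative values only.
infra = Infrastructure(
    nodeTypes=[NodeType("t1", "small", ram=2048, disk=20, cpus=2)],
    nodes=[Node("n1", "node-1", nodeType="t1", imgeId="i1", status="running")],
    images=[Image("i1", "hadoop-base", size=1, status="active", miniDisk=10, miniRam=1024)],
)
print(infra.image_fits("n1"))  # True
```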

In the cluster service part, the Services element represents the service list provided by Hadoop and contains elements such as HDFSService, MapReduceService, HBaseService, YarnService, and SparkService, representing the HDFS, MapReduce, HBase, Yarn, and Spark services respectively. Taking HDFSService, MapReduceService, and HBaseService as examples: HDFSService represents the HDFS service of the cluster, which is provided by multiple nodes in the cluster, each node deploying the corresponding HDFS software module, namely HDFSAgent; HBaseService represents the HBase service of the cluster, which is provided by multiple nodes in the cluster, each node deploying the corresponding HBase software module, namely HBaseAgent; and MapReduceService represents the MapReduce service of the cluster, which is provided by multiple nodes in the cluster, each node deploying the corresponding MapReduce software module, namely MapReduceAgent. All of the above services contain the id, name, version, and status attributes, representing the identifier, name, version, and current running state respectively, and their software deployment modules (Agents) contain the health and nodeId attributes, representing the health status of the module and the node on which it is located.
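A minimal sketch of the cluster service part, assuming each service simply keeps a list of its Agents. The hypothetical `node_ids` helper derives the service's deployment node list from the Agents' `nodeId` values; the health values `"good"`/`"bad"` and the sample data are placeholders, not values defined by the text.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:            # software deployment module, e.g. an HDFSAgent
    health: str         # health status ("good"/"bad" assumed)
    nodeId: str         # node the module is deployed on


@dataclass
class Service:          # e.g. HDFSService, MapReduceService, HBaseService
    id: str
    name: str
    version: str
    status: str
    agents: list = field(default_factory=list)

    def node_ids(self):
        """Nodes providing this service, in declaration order, without duplicates."""
        return list(dict.fromkeys(a.nodeId for a in self.agents))

    def unhealthy_nodes(self):
        """Nodes whose Agent does not report a good health status."""
        return [a.nodeId for a in self.agents if a.health != "good"]


# Illustrative instance: an HDFS service provided by three nodes.
hdfs = Service("s1", "HDFSService", "2.x", "running",
               [Agent("good", "1"), Agent("good", "2"), Agent("bad", "3")])
print(hdfs.node_ids())         # ['1', '2', '3']
print(hdfs.unhealthy_nodes())  # ['3']
```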

In an embodiment of the present invention, the Hadoop deployment model includes: a cluster node unit storing a virtual machine configuration list, a virtual machine list, and a virtual machine image list; and a cluster service unit used to provide cluster services.

Specifically, during Hadoop cluster deployment, the deployment model provides administrators with a management view of the system deployment. It describes the cluster node configuration, the running state, and the software deployment units on the nodes, and it is synchronized bidirectionally with the runtime system. As shown in Fig. 3, the deployment meta-model also consists of two parts: cluster nodes and cluster services.

In the cluster node part, taking CloudStack as an example, Project represents a project and contains one VMTypes element, one VMs element, and one Images element. The VMTypes element is the virtual machine configuration list, representing the set of available configuration files; each VMType element represents one virtual machine configuration, with attributes such as id, name, description, guestCpus, memoryMb, and imageSpaceGb, representing the identifier, name, virtual machine description, number of CPUs, memory, and image space size respectively. The VMs element is the virtual machine list, representing the set of configured virtual machines; each VM element represents one virtual machine, with attributes such as id, name, imageId, vmtypeId, cpuutiliz, and memoryutiliz, representing the identifier, name, virtual machine image, virtual machine type, CPU utilization, and memory utilization respectively. The Images element is the virtual machine image list, representing the set of available virtual machine image files; each Image element represents one virtual machine image, with attributes such as id, name, vsize, and description, representing the identifier, name, image size, and image description respectively.

In the cluster service part, the Services element represents the service list provided by Hadoop and contains elements such as HDFSService, MapReduceService, HBaseService, YarnService, and SparkService, representing the HDFS, MapReduce, HBase, Yarn, and Spark services respectively. Each service contains three main kinds of model elements: service units, role units, and deployment units. A service unit represents a specific service provided by Hadoop; a role unit represents one of the different roles that a specific Hadoop service comprises; and a deployment unit represents a running software module owned by a role. Taking HDFSService, MapReduceService, and HBaseService as examples: the HDFS service unit contains three types of role units, namely NameNode, SecondaryNameNode, and DataNode. The NameNode role unit represents the management center of HDFS, which manages the file system namespace, the cluster configuration information, and block replication, and it has one and only one deployment unit, DU_NN. The SecondaryNameNode role unit represents the backup node of the NameNode, which backs up the metadata saved by the NameNode, and it has one and only one deployment unit, DU_SNN. The DataNode role unit represents the worker nodes of HDFS, which store and retrieve data, and it can have multiple deployment units, DU_DN. The MapReduce service unit contains two types of role units, namely JobTracker and TaskTracker. The JobTracker role unit represents the central service node of MapReduce, which schedules the subtasks of each Job to run on TaskTrackers, and it has one and only one deployment unit, DU_JT. TaskTracker represents the sub-service nodes that execute the tasks assigned by the JobTracker, and it can have multiple deployment units, DU_TT. The HBase service unit contains two types of role units, namely HMaster and HRegionServer. HMaster represents the management and scheduling center of HBase, which assigns and manages HRegionServers, and it has one and only one deployment unit, DU_HM. HRegionServer represents the HBase service running on each worker node, which maintains the Regions assigned by the HMaster and handles I/O requests, and it can have multiple deployment units, DU_RS. The deployment units of all the above roles contain the vmId and health attributes, representing the virtual machine on which the software module is located and the running health status of the deployment unit.
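The role cardinalities listed above (one and only one NameNode, SecondaryNameNode, JobTracker, and HMaster; possibly several DataNodes, TaskTrackers, and HRegionServers) can be checked against a deployment model instance. In this sketch the table layout and the `cardinality_violations` helper are hypothetical, and "can have multiple" is read as "at least one", which the text does not state explicitly.

```python
# (minimum, maximum) number of deployment units per role; None means unbounded.
# Reading "can have multiple" as "at least one" is an assumption.
ROLE_CARDINALITY = {
    "HDFSService": {"NameNode": (1, 1), "SecondaryNameNode": (1, 1), "DataNode": (1, None)},
    "MapReduceService": {"JobTracker": (1, 1), "TaskTracker": (1, None)},
    "HBaseService": {"HMaster": (1, 1), "HRegionServer": (1, None)},
}


def cardinality_violations(service, role_units):
    """role_units maps a role name to the vmIds of its deployment units."""
    problems = []
    for role, (low, high) in ROLE_CARDINALITY[service].items():
        count = len(role_units.get(role, []))
        if count < low or (high is not None and count > high):
            problems.append(f"{service}.{role}: {count} deployment unit(s)")
    return problems


print(cardinality_violations(
    "HDFSService",
    {"NameNode": ["vm1", "vm2"], "SecondaryNameNode": ["vm2"], "DataNode": ["vm1", "vm2", "vm3"]},
))
# ['HDFSService.NameNode: 2 deployment unit(s)']
```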

S2: Realize model conversion between the Hadoop demand model and the Hadoop deployment model according to preset conversion rules. The preset conversion rules include a node conversion model and a cluster service conversion model: the node conversion model realizes model conversion between the nodes of the Hadoop demand model and those of the Hadoop deployment model, and the cluster service conversion model realizes model conversion between the cluster services of the Hadoop demand model and those of the Hadoop deployment model.

In one embodiment of the present invention, the node conversion model realizes model conversion through element mapping relations between the nodes of the Hadoop demand model and the nodes of the Hadoop deployment model. The element mapping relations include helper tags and mapper tags: helper tags describe the mapping relations between elements of classes, and mapper tags describe the mapping relations between attributes of classes.

Specifically, the cluster node part of the deployment model differs considerably between application scenarios. For example, in CloudStack the main management units are VM, Flavor, and Image, representing virtual machines, configuration files, and images, whereas in Docker the main management units are Container, Repository, and Image, representing containers, repositories, and images. It is therefore necessary to establish element mapping relations between the node parts of the demand model and the deployment model to realize model conversion.

The embodiments of the present invention design a set of description rules for mapping relations and a conversion method for model operations, so that the demand model is converted into the deployment model automatically according to the given element mapping relations between the models. The element mapping relations between models are described in an XML file, and the keywords of the description rules are defined as follows:

(1) helper describes the mapping relations between elements of classes. A helper tag has two attributes, value and key: value is the element in the demand model to be mapped, and key is the corresponding element in the deployment model.

(2) mapper describes the mapping relations between attributes of classes. A mapper tag also has two attributes, key and value: key is the name of the element attribute in the deployment model to be mapped to, and value is the name of the corresponding element attribute in the demand model. The elements to which these attributes belong are defined by the enclosing helper tag.

Based on these keywords, the mapping relations between elements of the demand model and the deployment model proposed in the present invention can be described according to the mapping rules. As shown in Fig. 4, the NodeType element in the demand model is mapped to the VMType element in the deployment model and is described with one helper tag whose key and value attributes are the names of the corresponding elements in the deployment model and the demand model, namely VMType and NodeType. The id of NodeType corresponds to the id of VMType, name to name, ram to memoryMb, disk to imageSpaceGb, and vcpus to guestCpus.
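The Fig. 4 mapping could be written and applied as below. The nesting of `mapper` tags inside their `helper` tag follows the description above, but the `<mappings>` root element, the dictionary layout, and the `load_mappings`/`convert` helpers are assumptions for illustration, not the file format used by the invention. The parsing uses the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Fig. 4 mapping: value = demand-model name, key = deployment-model name.
# The <mappings> root element is hypothetical.
MAPPING_XML = """
<mappings>
  <helper value="NodeType" key="VMType">
    <mapper value="id" key="id"/>
    <mapper value="name" key="name"/>
    <mapper value="ram" key="memoryMb"/>
    <mapper value="disk" key="imageSpaceGb"/>
    <mapper value="vcpus" key="guestCpus"/>
  </helper>
</mappings>
"""


def load_mappings(xml_text):
    """Return {demand element: (deployment element, {demand attr: deployment attr})}."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for helper in root.iter("helper"):
        attrs = {m.get("value"): m.get("key") for m in helper.findall("mapper")}
        table[helper.get("value")] = (helper.get("key"), attrs)
    return table


def convert(table, element_name, element):
    """Convert one demand-model element (a dict of attributes) to the deployment model.
    Attributes without a mapper entry are dropped."""
    target_name, attrs = table[element_name]
    return target_name, {attrs[k]: v for k, v in element.items() if k in attrs}


table = load_mappings(MAPPING_XML)
print(convert(table, "NodeType", {"id": "t1", "name": "small", "ram": 2048, "disk": 20, "vcpus": 2}))
# ('VMType', {'id': 't1', 'name': 'small', 'memoryMb': 2048, 'imageSpaceGb': 20, 'guestCpus': 2})
```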

The system can be managed through model operations, which come in five basic types: Get, Set, List, Add, and Remove. Any model operation acting on a demand model element is converted into a model operation acting on a deployment model element. As shown in Table 1, the present invention defines conversion rules for model operations, realizing their automatic conversion. For example, if element A in the demand model is mapped to element B in the deployment model, then List, Add, and Remove operations on element A are mapped to the same operations on element B, and Get or Set operations on an attribute of A are likewise mapped to the same operations on the corresponding attribute of B.
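The rule above can be sketched as a function that rewrites a demand-model operation into a deployment-model operation: element operations (List, Add, Remove) switch to the mapped element, and attribute operations (Get, Set) additionally switch to the mapped attribute. The `ModelOp` type and the two lookup tables are hypothetical; converting the payload of an Add is not covered, and this sketch passes it through unchanged.

```python
from dataclasses import dataclass
from typing import Any, Optional

ELEMENT_OPS = {"List", "Add", "Remove"}
ATTRIBUTE_OPS = {"Get", "Set"}


@dataclass(frozen=True)
class ModelOp:
    kind: str                   # one of the five basic operation types
    element: str                # model element the operation acts on
    attr: Optional[str] = None  # attribute name, used only by Get and Set
    value: Any = None           # new value for Set, payload for Add (passed through)


def translate(op, element_map, attr_map):
    """Map a demand-model operation to the equivalent deployment-model operation."""
    if op.kind not in ELEMENT_OPS | ATTRIBUTE_OPS:
        raise ValueError(f"unknown model operation: {op.kind}")
    target = element_map[op.element]
    if op.kind in ELEMENT_OPS:
        return ModelOp(op.kind, target, None, op.value)
    return ModelOp(op.kind, target, attr_map[(op.element, op.attr)], op.value)


element_map = {"NodeType": "VMType"}
attr_map = {("NodeType", "ram"): "memoryMb", ("NodeType", "disk"): "imageSpaceGb"}
print(translate(ModelOp("Set", "NodeType", "ram", 4096), element_map, attr_map))
# ModelOp(kind='Set', element='VMType', attr='memoryMb', value=4096)
```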

Table 1 Mapping rules for model operation conversion

In one embodiment of the present invention, the cluster service conversion model converts cluster services automatically by means of a constraint model and a preset conversion algorithm. The constraint model constrains the association relations among multiple model elements, and the preset conversion algorithm generates a service deployment scheme according to the Hadoop demand model and the constraint model.

Specifically, the Hadoop ecosystem contains many different types of computing and storage frameworks, such as HDFS, MapReduce, HBase, Yarn, and Spark. Each computing or storage framework has its own deployment and configuration method, and different frameworks may have dependency or constraint relationships; for example, deploying the MapReduce service depends on the HDFS service. The model conversion method therefore differs considerably between Hadoop cluster services. The present invention describes the deployment rules of cluster services with a constraint model and realizes the automatic conversion of cluster services with a general algorithm.

The constraint model describes the deployment rules of a Hadoop service. As shown in Fig. 5, the constraint meta-model contains several main model elements, namely service units, role units, and deployment units, and describes the association relations among them. Service represents a service unit, with the name and minNodeNum attributes, representing the name and the minimum number of deployment nodes respectively. Role represents a role unit, with attributes such as name, excluName, resPriority, and deOrder, representing the name, exclusion attribute, resource priority, and deployment order respectively. DU represents a deployment unit, with attributes such as health, representing the health status of the service. Dependency_S represents a dependency between service units, whose name attribute is the name of the service unit depended on; similarly, Dependency_DU represents a dependency between deployment units, whose name attribute is the name of the deployment unit depended on. A dependency between service units does not necessarily imply a dependency between their deployment units; however, a dependency between deployment units always implies a dependency between the corresponding service units.
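The constraint meta-model might be expressed as the dataclasses below, together with a check of the rule stated above that every dependency between deployment units must be backed by a dependency between their service units. Representing `excluName` as a list of role names and dependencies as lists of names is an assumption, and `check_dependency_rule` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class DU:                   # deployment unit
    name: str
    health: str = "unknown"
    depends_on: list = field(default_factory=list)   # Dependency_DU names


@dataclass
class Role:                 # role unit
    name: str
    excluName: list = field(default_factory=list)    # roles that may not share a node (assumed)
    resPriority: int = 0
    deOrder: int = 0
    units: list = field(default_factory=list)


@dataclass
class Service:              # service unit
    name: str
    minNodeNum: int = 1
    roles: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)   # Dependency_S names


def check_dependency_rule(services):
    """Report DU dependencies that lack the implied service-level dependency."""
    owner = {du.name: s.name for s in services for r in s.roles for du in r.units}
    deps_of = {s.name: set(s.depends_on) for s in services}
    errors = []
    for s in services:
        for r in s.roles:
            for du in r.units:
                for dep in du.depends_on:
                    target = owner.get(dep)
                    if target is None:
                        errors.append(f"{du.name}: unknown deployment unit {dep}")
                    elif target != s.name and target not in deps_of[s.name]:
                        errors.append(f"{du.name} -> {dep} needs {s.name} -> {target}")
    return errors


hdfs = Service("HDFS", roles=[Role("NameNode", ["SecondaryNameNode"], units=[DU("DU_NN")]),
                              Role("SecondaryNameNode", ["NameNode"], units=[DU("DU_SNN")]),
                              Role("DataNode", units=[DU("DU_DN")])])
mapred = Service("MapReduce", roles=[Role("JobTracker", ["TaskTracker"], units=[DU("DU_JT", depends_on=["DU_NN"])]),
                                     Role("TaskTracker", ["JobTracker"], units=[DU("DU_TT", depends_on=["DU_DN"])])])
print(check_dependency_rule([hdfs, mapred]))   # two errors: MapReduce lacks Dependency_S on HDFS
mapred.depends_on.append("HDFS")
print(check_dependency_rule([hdfs, mapred]))   # []
```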

As shown in Fig. 6, the HDFS, MapReduce, and HBase services are taken as examples. The HDFS service unit contains three role units, namely NameNode, SecondaryNameNode, and DataNode. The NameNode role unit has one and only one deployment unit, DU_NN, the SecondaryNameNode role unit has one and only one deployment unit, DU_SNN, and NameNode and SecondaryNameNode cannot be deployed on the same node. The DataNode role unit can have multiple deployment units, DU_DN, and DataNode can be deployed on the same node as NameNode or SecondaryNameNode. In addition, the HDFS service unit does not depend on other service units, and the deployment units DU_NN, DU_SNN, and DU_DN do not depend on other deployment units. The MapReduce service unit contains two role units, namely JobTracker and TaskTracker. The JobTracker role unit has one and only one deployment unit, DU_JT; the TaskTracker role unit can have multiple deployment units, DU_TT; and JobTracker and TaskTracker cannot be deployed on the same node. In addition, the MapReduce service unit depends on the HDFS service unit, deployment unit DU_JT depends on DU_NN, and DU_TT depends on DU_DN. The HBase service unit contains two role units, namely HMaster and HRegionServer. The HMaster role unit has one and only one deployment unit, DU_HM; the HRegionServer role unit can have multiple deployment units, DU_RS; and HMaster and HRegionServer cannot be deployed on the same node. In addition, the HBase service unit depends on the HDFS service unit, DU_HM depends on DU_NN, and DU_RS depends on DU_DN.
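These placement constraints can also be checked directly on a candidate placement. The sketch below, with hypothetical names, encodes only the exclusion pairs and the deployment-unit dependencies given in the Fig. 6 description, and flags any node shared by two mutually exclusive deployment units or any deployment unit whose relied-on unit is not placed at all.

```python
# Deployment units that may not share a node (Fig. 6).
EXCLUSIVE = [("DU_NN", "DU_SNN"), ("DU_JT", "DU_TT"), ("DU_HM", "DU_RS")]
# Dependency_DU relations: unit -> units it relies on (Fig. 6).
DU_DEPS = {"DU_JT": ["DU_NN"], "DU_TT": ["DU_DN"], "DU_HM": ["DU_NN"], "DU_RS": ["DU_DN"]}


def placement_violations(placement):
    """placement maps a deployment unit name to the set of node ids hosting it."""
    errors = []
    for a, b in EXCLUSIVE:
        shared = placement.get(a, set()) & placement.get(b, set())
        if shared:
            errors.append(f"{a} and {b} share node(s) {sorted(shared)}")
    for du, relied in DU_DEPS.items():
        if du in placement:
            for dep in relied:
                if not placement.get(dep):
                    errors.append(f"{du} relies on {dep}, which is not placed")
    return errors


ok = {"DU_NN": {1}, "DU_SNN": {2}, "DU_DN": {1, 2, 3}, "DU_JT": {1}, "DU_TT": {2, 3}}
bad = {"DU_NN": {1}, "DU_SNN": {1}, "DU_DN": {1, 2}, "DU_HM": {2}, "DU_RS": {2}}
print(placement_violations(ok))   # []
print(placement_violations(bad))
# ['DU_NN and DU_SNN share node(s) [1]', 'DU_HM and DU_RS share node(s) [2]']
```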

In one embodiment of the present invention, the preset conversion algorithm comprises the following steps: obtaining the set of services to be deployed from the service units under the service list in the Hadoop demand model; supplementing and sorting the services in the set according to the dependency relations between service units in the constraint model, to obtain the ordered set of services that actually need to be deployed; reading each service from the ordered set in reverse order and calculating its deployment scheme; and deploying the services one by one according to the node sets of the service deployment units.
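The first two steps, collecting the requested services and then supplementing and ordering them by their service-level dependencies, amount to a transitive closure followed by a topological sort. This sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the `SERVICE_DEPS` table encodes only the HDFS, MapReduce, and HBase dependencies from Fig. 6, and `order_services` is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Dependency_S relations from Fig. 6: service -> services it depends on.
SERVICE_DEPS = {
    "HDFSService": [],
    "MapReduceService": ["HDFSService"],
    "HBaseService": ["HDFSService"],
}


def order_services(requested, deps=SERVICE_DEPS):
    """Return servicesOrder: requested services plus missing dependencies, dependencies first."""
    # Supplement: add every service that a requested service depends on, transitively.
    needed, stack = set(), list(requested)
    while stack:
        service = stack.pop()
        if service not in needed:
            needed.add(service)
            stack.extend(deps.get(service, []))
    # Sort: each service appears after all services it depends on.
    graph = {s: deps.get(s, []) for s in sorted(needed)}
    return list(TopologicalSorter(graph).static_order())


print(order_services(["MapReduceService", "HBaseService"]))
# HDFSService is supplemented and comes first
```

A cyclic dependency table would make `static_order()` raise `graphlib.CycleError`, which is a reasonable signal that a constraint model is inconsistent.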

Specifically, the embodiments of the present invention propose a general algorithm that automatically generates a service deployment scheme from a given demand model and constraint model. The basic steps of the algorithm are as follows:

1. From the service units under the service list in the demand model, obtain the set of services to be deployed, services{S1,S2,S3,…,Si}.

2. According to the dependency relations between service units in the constraint model, supplement and sort the services in the set services to obtain the ordered set of services that actually need to be deployed, servicesOrder{S1,S2,S3,…,Sj}. If a service in the set depends on another service that does not appear in the set, that service must be added. In the ordered set servicesOrder, service S1 does not depend on any other service, service S2 may depend on service S1, service S3 may depend on services S1 and S2, and so on.

3. According to the ordered set servicesOrder, read each service in reverse order and calculate its deployment scheme. First, obtain the list of deployment nodes of the service from the information in the software deployment modules (Agents) of the service currently being deployed in the demand model. Second, obtain the role types contained in the service and their deployment constraints from the constraint model. Then, calculate the nodes on which the deployment units of each role are deployed from the node information and the role deployment constraints. Finally, according to the dependencies of the roles' deployment units in the constraint model, record the deployment node information of the deployment units they depend on. This yields the node sets of the deployment units of each service in the ordered set servicesOrder, namely servicesOrderDeploy{S1{DU1{},DU2{},...},S2{DU1{},DU2{},...},...,Sn{DU1{},DU2{},...}}. For example, the node sets of the deployment units for a deployment scheme containing the HDFS and MapReduce services are servicesOrderDeploy{HDFSService{DU_NN{1},DU_SNN{2},DU_DN{1,2,3}},MapReduceService{DU_JT{1},DU_TT{2,3}}}.

4. Deploy the services one by one according to the node sets servicesOrderDeploy of the service deployment units.
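Steps 3 and 4 can be sketched as below. The text does not say how nodes are chosen for each role, so the placement rule here is an assumption: roles are processed in list order, a singleton role takes the first node not occupied by a role it excludes, and a multi-instance role takes every such node. With three nodes this reproduces the servicesOrderDeploy example above. The `ROLES` table, `place_service`, `plan_deployment`, and `deploy` are hypothetical names, and resource priority (resPriority) is ignored.

```python
# Role constraints per service (Fig. 6):
# (role, deployment unit, singleton?, roles it excludes, deployment units it relies on)
ROLES = {
    "HDFSService": [
        ("NameNode", "DU_NN", True, {"SecondaryNameNode"}, []),
        ("SecondaryNameNode", "DU_SNN", True, {"NameNode"}, []),
        ("DataNode", "DU_DN", False, set(), []),
    ],
    "MapReduceService": [
        ("JobTracker", "DU_JT", True, {"TaskTracker"}, ["DU_NN"]),
        ("TaskTracker", "DU_TT", False, {"JobTracker"}, ["DU_DN"]),
    ],
}


def place_service(service, nodes):
    """Assign the deployment units of one service to nodes, honouring exclusions."""
    role_nodes, plan = {}, {}
    for role, du, singleton, excluded, _ in ROLES[service]:
        blocked = set().union(*(role_nodes.get(r, set()) for r in excluded))
        free = [n for n in nodes if n not in blocked]
        if not free:
            raise ValueError(f"no node can host {role} of {service}")
        chosen = free[:1] if singleton else free
        role_nodes[role] = set(chosen)
        plan[du] = chosen
    return plan


def plan_deployment(services_order, service_nodes):
    """Step 3: build servicesOrderDeploy and the node info of relied-on units."""
    placements = {}
    for service in reversed(services_order):   # read in reverse order, as in the text
        placements[service] = place_service(service, service_nodes[service])
    du_nodes = {du: ns for plan in placements.values() for du, ns in plan.items()}
    relied_nodes = {du: {dep: du_nodes[dep] for dep in relied}
                    for service in services_order
                    for _, du, _, _, relied in ROLES[service]}
    order_deploy = {service: placements[service] for service in services_order}
    return order_deploy, relied_nodes


def deploy(order_deploy):
    """Step 4: emit (service, deployment unit, node) actions in dependency order."""
    return [(service, du, node)
            for service, plan in order_deploy.items()
            for du, nodes in plan.items()
            for node in nodes]


# Deployment node lists as they would be read from each service's Agents (nodes 1-3).
order_deploy, relied = plan_deployment(
    ["HDFSService", "MapReduceService"],
    {"HDFSService": [1, 2, 3], "MapReduceService": [1, 2, 3]},
)
print(order_deploy)
# {'HDFSService': {'DU_NN': [1], 'DU_SNN': [2], 'DU_DN': [1, 2, 3]}, 'MapReduceService': {'DU_JT': [1], 'DU_TT': [2, 3]}}
print(relied["DU_JT"])   # {'DU_NN': [1]}
```

Recording `relied_nodes` is what would let, for instance, a JobTracker be configured with the address of the node hosting the NameNode.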

S3: Monitor information changes in the Hadoop demand model and the Hadoop deployment model with a synchronization engine, and synchronize the information when the information in the Hadoop demand model and/or the Hadoop deployment model changes.

In one embodiment of the present invention, the Hadoop deployment model is constructed with the SM@RT tool.

Specifically, a runtime model represents the overall architecture of a system as a set of manageable units, describing runtime information such as the system's internal structure, state, and configuration in a standardized way as a structured view oriented to the administrator's perspective.

The invention constructs the Hadoop deployment model with the SM@RT tool. Given the system's meta-model and access model, where the meta-model describes the information of the managed system and the access model describes how the management interfaces are invoked, SM@RT automatically generates the runtime architecture of the target system from these inputs and guarantees bidirectional consistency between that architecture and the running system.

As shown in Fig. 7, the invention provides a set of operations for deploying Hadoop cluster services. The operations are summarized by the type of model element they act on and classified; for each operation, the invention specifies the operation name, the parameters it requires and the changes it makes.

SM@RT supports bidirectional synchronization between the Hadoop deployment model and the running system. As shown in Fig. 8, when the synchronization engine detects that a deployment unit has been added to the role unit of some service in the Hadoop deployment model, a virtual machine is automatically added and deployed in the running system. Conversely, when the administrator deletes that virtual machine from the running system, the synchronization engine automatically detects the change and calls the management interface to terminate the corresponding deployment unit.
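The two directions described for Fig. 8 can be sketched as a reconciliation loop. This is a simplified illustration, not SM@RT's API: it assumes each deployment unit maps one-to-one to a virtual machine with the same identifier, and sets stand in for the "create VM" and "terminate deployment unit" management calls. `RuntimeSync` and the unit names are hypothetical.

```python
from typing import List, Set, Tuple


class RuntimeSync:
    """Reconcile deployment units in the model with VMs in the running system.

    Assumes a one-to-one mapping between a deployment unit and a VM id.
    """

    def __init__(self, model_units: Set[str], runtime_vms: Set[str]):
        self.model = set(model_units)
        self.runtime = set(runtime_vms)
        self.synced = self.model & self.runtime  # pairs known to be in sync
        self.log: List[Tuple[str, str]] = []

    def step(self) -> None:
        # Model -> runtime: a deployment unit was added to the model.
        for unit in sorted(self.model - self.synced):
            if unit not in self.runtime:
                self.runtime.add(unit)  # stand-in for creating and deploying a VM
                self.log.append(("provision", unit))
        # Runtime -> model: the administrator deleted a VM that was in sync.
        for unit in sorted(self.synced - self.runtime):
            self.model.discard(unit)  # stand-in for terminating the deployment unit
            self.log.append(("terminate", unit))
        self.synced = self.model & self.runtime
```

Tracking the last synchronized set is what lets the loop tell "added in the model" apart from "deleted at runtime", since both show up as a unit present on only one side.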

The invention is described further by means of the following example.

To verify the feasibility and effectiveness of the proposed method, an example that automatically deploys and configures Hadoop was implemented. It provides a user-customized Hadoop service solution comprising a MapReduce service and an HBase service.

In the experiment, the Hadoop cluster is deployed on CloudStack, using five virtual machines in CloudStack to deploy the MapReduce and HBase services. The MapReduce service is deployed on the three virtual machines Host-1, Host-2 and Host-3; the HBase service is deployed on the three virtual machines Host-1, Host-4 and Host-5. The virtual machines are allocated different amounts of CPU, memory, storage and other resources; the configuration details and the deployment are shown in Table 2.

Table 2 Node deployment

The user only needs to define the demand model of the Hadoop cluster; the proposed method then converts the demand model into a deployment model automatically and completes the cluster deployment.

As shown in Fig. 9, the demand model consists of a cluster node part and a cluster service part.

In the cluster node part, NodeTypes is the list of node types. It contains three node types: "CPU: 4, Memory: 8, Storage: 400", "CPU: 2, Memory: 8, Storage: 400" and "CPU: 2, Memory: 4, Storage: 800". Nodes is the node list and contains five nodes: the Node with id 1 has the NodeType with id nt1, the Nodes with ids 2 and 3 have the NodeType with id nt2, and the Nodes with ids 4 and 5 have the NodeType with id nt3.
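The cluster node part of the demand model can be written down as plain data. The sketch below is an assumption for illustration: the class and field names are hypothetical, the three types are assumed to be nt1, nt2 and nt3 in the order listed, and the memory and storage units are not stated in the text. `resolve` shows the consistency check a converter would need, namely that every node references an existing type.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeType:
    id: str
    cpu: int      # cores
    memory: int   # unit not given in the text (assumed GB)
    storage: int  # unit not given in the text (assumed GB)


@dataclass(frozen=True)
class Node:
    id: int
    node_type: str  # id of a NodeType


# Assumed order: the types are listed as nt1, nt2, nt3.
NODE_TYPES: Dict[str, NodeType] = {
    "nt1": NodeType("nt1", 4, 8, 400),
    "nt2": NodeType("nt2", 2, 8, 400),
    "nt3": NodeType("nt3", 2, 4, 800),
}
NODES: List[Node] = [
    Node(1, "nt1"), Node(2, "nt2"), Node(3, "nt2"), Node(4, "nt3"), Node(5, "nt3"),
]


def resolve(nodes: List[Node], types: Dict[str, NodeType]) -> Dict[int, NodeType]:
    """Return {node id: NodeType}; raise KeyError on a dangling type reference."""
    resolved = {}
    for node in nodes:
        if node.node_type not in types:
            raise KeyError(f"node {node.id} references unknown type {node.node_type}")
        resolved[node.id] = types[node.node_type]
    return resolved
```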

In the cluster service part, the MapReduce and HBase services are each deployed on their three corresponding nodes.

The conversion from the demand model to the deployment model has two parts: model conversion of the cluster nodes and model conversion of the cluster services.

In the cluster node part, the NodeType and Node elements of the demand model are mapped to the VMType and VM elements of the deployment model, respectively. The deployment model therefore also contains three virtual machine types (VMType) and five virtual machines (VM), as shown in Fig. 10.
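The element mapping above is one-to-one, so a sketch of the node conversion is short. It is only an illustration under assumptions: models are plain dicts, the field names (`nodeType`, `vmType` and so on) are hypothetical, and naming each VM `Host-<id>` merely mirrors the virtual machine names used in the experiment.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def convert_nodes(demand: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Map each NodeType to a VMType and each Node to a VM (one-to-one)."""
    vm_types = [
        {"id": nt["id"], "cpu": nt["cpu"], "memory": nt["memory"], "storage": nt["storage"]}
        for nt in demand["NodeTypes"]
    ]
    vms = [
        # Host-<id> naming is an assumption that mirrors the experiment's VM names.
        {"id": node["id"], "name": f"Host-{node['id']}", "vmType": node["nodeType"]}
        for node in demand["Nodes"]
    ]
    return {"VMTypes": vm_types, "VMs": vms}


DEMAND_NODES: Dict[str, List[Record]] = {
    "NodeTypes": [
        {"id": "nt1", "cpu": 4, "memory": 8, "storage": 400},
        {"id": "nt2", "cpu": 2, "memory": 8, "storage": 400},
        {"id": "nt3", "cpu": 2, "memory": 4, "storage": 800},
    ],
    "Nodes": [
        {"id": i, "nodeType": t}
        for i, t in [(1, "nt1"), (2, "nt2"), (3, "nt2"), (4, "nt3"), (5, "nt3")]
    ],
}
```

Because the mapping preserves element counts, three NodeTypes and five Nodes become three VMTypes and five VMs, matching Fig. 10.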

In the cluster service part, the model conversion uses the constraint model and general algorithm proposed by the invention and proceeds as follows:

1. From the service units under the service list in the demand model, obtain the set of services to be deployed: services{MapReduceService,HBaseService}.

2. The dependencies in the constraint model show that MapReduceService and HBaseService do not depend on each other, but both depend on HDFSService, which does not appear in the service set. HDFSService is therefore added to services and the services are sorted for deployment, giving the ordered set of services that actually need to be deployed: servicesOrder{HDFSService,MapReduceService,HBaseService}.
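Steps 1 and 2 amount to taking the dependency closure of the requested services and sorting it topologically. The sketch below uses a depth-first topological sort, one standard way to do this; the text does not say which sorting method the invention uses, and `DEPENDS_ON` and `order_services` are hypothetical names encoding only the dependencies stated above.

```python
from typing import Dict, List, Set

# Service-level dependencies taken from the constraint model described above.
DEPENDS_ON: Dict[str, List[str]] = {
    "HDFSService": [],
    "MapReduceService": ["HDFSService"],
    "HBaseService": ["HDFSService"],
}


def order_services(requested: List[str], depends_on: Dict[str, List[str]]) -> List[str]:
    """Add missing dependencies and return services so each follows what it needs."""
    ordered: List[str] = []
    visiting: Set[str] = set()

    def visit(svc: str) -> None:
        if svc in ordered:
            return
        if svc in visiting:
            raise ValueError(f"dependency cycle at {svc}")
        visiting.add(svc)
        for dep in depends_on.get(svc, []):
            visit(dep)  # dependencies are placed first, which also adds missing ones
        visiting.discard(svc)
        ordered.append(svc)

    for svc in requested:
        visit(svc)
    return ordered


if __name__ == "__main__":
    print(order_services(["MapReduceService", "HBaseService"], DEPENDS_ON))
```

Reading the result back to front gives the reverse traversal used in step 3, so every service is planned before the services it depends on.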

3. Following servicesOrder, read HBaseService, MapReduceService and HDFSService in reverse order and calculate each service's deployment scheme. First, from the information in the software deployment module (i.e., the Agent) of HBaseService in the demand model, obtain the deployment node list of the HBase service: nodes 1, 4 and 5. Second, the constraint model shows that HBaseService contains two role units, HMaster and HRegionServer: HMaster has exactly one deployment unit, DU_HM; HRegionServer may have multiple deployment units, DU_RS; and HMaster and HRegionServer cannot be deployed on the same node. In addition, deployment unit DU_HM depends on the HDFS deployment unit DU_NN, and deployment unit DU_RS depends on the HDFS deployment unit DU_DN. Then, calculating from the node information and the role deployment constraints, HMaster and NameNode both have the highest resource priority and deployment order, so these two role units are deployed on node 1, which has the most resources, and nodes 4 and 5 host the role units HRegionServer and DataNode. Similarly, when MapReduceService is deployed, the demand model and the constraint model give that the role units JobTracker and NameNode are deployed on node 1 and that nodes 2 and 3 host the role units TaskTracker and DataNode. When HDFSService is deployed, the constraint model shows that HDFSService depends on no other service and that NameNode and SecondaryNameNode cannot be deployed on the same node; combined with the deployment node information of HBaseService and MapReduceService, NameNode is deployed on node 1, SecondaryNameNode on node 2, and DataNode on nodes 1 to 5. Finally, according to the dependencies between deployment units in the constraint model, the deployment nodes of the units each unit depends on are recorded, giving the node set of every deployment unit of each service in servicesOrder, i.e., servicesOrderDeploy{HDFSService{DU_NN{1},DU_SNN{2},DU_DN{1,2,3,4,5}},MapReduceService{DU_JT{1},DU_TT{2,3}},HBaseService{DU_HM{1},DU_RS{4,5}}}.
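The constraints and dependency recording in step 3 can be checked against the resulting plan with a short Python sketch. Only the constraints stated in the example are encoded (a single HMaster unit; HMaster and HRegionServer on different nodes; NameNode and SecondaryNameNode on different nodes; DU_HM depends on DU_NN and DU_RS on DU_DN). The names `PLAN`, `check` and `dependency_nodes` are hypothetical, and the resource-priority placement itself is not reproduced.

```python
from typing import Dict, List, Set, Tuple

Plan = Dict[str, Dict[str, Set[int]]]

# servicesOrderDeploy from the example above.
PLAN: Plan = {
    "HDFSService": {"DU_NN": {1}, "DU_SNN": {2}, "DU_DN": {1, 2, 3, 4, 5}},
    "MapReduceService": {"DU_JT": {1}, "DU_TT": {2, 3}},
    "HBaseService": {"DU_HM": {1}, "DU_RS": {4, 5}},
}

SINGLETONS: List[str] = ["DU_HM"]  # exactly one deployment unit
ANTI_AFFINITY: List[Tuple[str, str]] = [("DU_HM", "DU_RS"), ("DU_NN", "DU_SNN")]
DEPENDENCIES: Dict[str, str] = {"DU_HM": "DU_NN", "DU_RS": "DU_DN"}


def _units(plan: Plan) -> Dict[str, Set[int]]:
    """Flatten service -> unit -> nodes into unit -> nodes."""
    return {du: nodes for service in plan.values() for du, nodes in service.items()}


def check(plan: Plan) -> List[str]:
    """Return a description of every violated constraint (empty if none)."""
    units = _units(plan)
    problems = []
    for du in SINGLETONS:
        if len(units.get(du, set())) != 1:
            problems.append(f"{du} must have exactly one node")
    for a, b in ANTI_AFFINITY:
        shared = units.get(a, set()) & units.get(b, set())
        if shared:
            problems.append(f"{a} and {b} share nodes {sorted(shared)}")
    return problems


def dependency_nodes(plan: Plan) -> Dict[str, List[int]]:
    """Record, for each dependent unit, the nodes of the unit it depends on."""
    units = _units(plan)
    return {du: sorted(units.get(dep, set())) for du, dep in DEPENDENCIES.items()}
```

Running `check(PLAN)` on the example plan reports no violations, and `dependency_nodes(PLAN)` records node 1 for DU_HM and nodes 1 to 5 for DU_RS.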

As shown in Fig. 10, the model conversion of the cluster nodes and the model conversion of the cluster services together yield the concrete Hadoop deployment model.

The model-driven Hadoop deployment method in a cloud environment of the present invention addresses the difficulties that the diverse software and hardware resources and varied service requirements of cloud environments pose for deploying Hadoop clusters, by proposing a model-driven method for deploying Hadoop cluster services. Through the proposed conversion from the Hadoop demand model to the deployment model, it achieves bidirectional synchronization between the deployment model and the running system and offers a fast, scalable way to deploy Hadoop cluster services. The feasibility and effectiveness of the invention have been demonstrated in a real scenario.

The other components and operations of the model-driven Hadoop deployment method in a cloud environment of the embodiments of the present invention are known to those skilled in the art and are not described here, to avoid redundancy.

In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.

Claims

1. A model-driven Hadoop deployment method in a cloud environment, characterized by comprising the following steps:

S1: providing a Hadoop demand model and a Hadoop deployment model, wherein the Hadoop demand model is used to generate a corresponding management view according to system requirements, and the Hadoop deployment model is used to describe the node information, running state and software deployment of the management view;

S2: implementing model conversion between the Hadoop demand model and the Hadoop deployment model according to preset conversion rules, wherein the preset conversion rules comprise a node conversion model and a cluster service conversion model, the node conversion model is used to implement model conversion between the nodes of the Hadoop demand model and the nodes of the Hadoop deployment model, and the cluster service conversion model is used to implement model conversion between the cluster services of the Hadoop demand model and the cluster services of the Hadoop deployment model;

S3: monitoring, with a synchronization engine, changes to the information in the Hadoop demand model and the Hadoop deployment model, and synchronizing information when the information in the Hadoop demand model and/or the Hadoop deployment model changes.

2. The model-driven Hadoop deployment method in a cloud environment according to claim 1, characterized in that the Hadoop demand model comprises:

a cluster node module provided with infrastructure resources, the infrastructure resources comprising a node configuration list, a node list and a container image list, each with the corresponding resources and attributes;

a cluster service module provided with a service list, the service list comprising a plurality of services and the attributes of each service.

3. The model-driven Hadoop deployment method in a cloud environment according to claim 2, characterized in that the Hadoop deployment model comprises:

a cluster node unit storing a virtual machine configuration list, a virtual machine list and a virtual machine image list;

a cluster service unit for providing cluster services.

4. The model-driven Hadoop deployment method in a cloud environment according to claim 3, characterized in that the node conversion model implements model conversion through element mapping relations between the nodes of the Hadoop demand model and the nodes of the Hadoop deployment model, the element mapping relations comprising helper tags and mapper tags, wherein the helper tags describe the mapping relations of elements between classes and the mapper tags describe the mapping relations of attributes between classes.

5. The model-driven Hadoop deployment method in a cloud environment according to claim 3, characterized in that the cluster service conversion model performs automatic conversion of cluster services through a constraint model and a preset conversion algorithm, wherein the constraint model is used to constrain the associations among a plurality of model elements, and the preset conversion algorithm generates a service deployment scheme according to the Hadoop demand model and the constraint model.

6. The model-driven Hadoop deployment method in a cloud environment according to claim 5, characterized in that the preset deployment algorithm comprises the following steps:

obtaining the set of services to be deployed according to the service units under the service list in the Hadoop demand model;

supplementing and sorting the services in the service set according to the dependencies between service units in the constraint model, to obtain the ordered set of services that actually need to be deployed;

reading each service in the ordered service set in reverse order and calculating its deployment scheme;

deploying the services one after another according to the node sets of the service deployment units.

7. The model-driven Hadoop deployment method in a cloud environment according to claim 1, characterized in that the Hadoop deployment model is constructed with the SM@RT tool.