CN106953910A

CN106953910A - A Hadoop compute-storage separation method

Info

Publication number: CN106953910A
Application number: CN201710161929.6A
Authority: CN
Inventors: 王德奎; 戴雪冰; 潘峰; 李珂; 刘安
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2017-07-14

Abstract

The present invention provides a Hadoop compute-storage separation method, comprising: S1: deploying a Kubernetes cluster on the hosts; S2: formatting the data disks of the hosts and mounting them to a fixed directory on the system disk; S3: deploying containerized HDFS NameNode and DataNode nodes via the command line; S4: writing a Yarn deployment file, and deploying a Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster; S5: performing a Yarn cluster test. The invention separates the big data storage component from the computation component and deploys the computation component in a Kubernetes environment. An administrator can deploy different computation components according to the resource requirements of the business scenario, and when a big data task finishes, the computation component can be deleted to release the resources it no longer uses, thereby improving resource utilization and saving cost.

Description

A Hadoop compute-storage separation method

Technical field

The present invention relates to the field of Hadoop clusters, and in particular to a method for separating Hadoop computation from storage.

Background

Hadoop clusters generally co-locate storage with computation, which reduces network load during data transfer and yields better performance. Building a Hadoop cluster therefore usually requires planning resource requirements in advance to meet the business's needs for storage, computation and other resources. Offline computation in Hadoop clusters does not demand a high execution speed, and as network bandwidth grows, the network is gradually ceasing to be the bottleneck of cluster performance, so a scheme that separates computation from storage becomes worth considering.

Summary of the invention

To solve the above problems, the present invention provides a method for separating Hadoop computation from storage.

The technical solution of the present invention is a Hadoop compute-storage separation method, comprising:

S1: deploying a Kubernetes cluster on the hosts;

S2: formatting the data disks of the hosts and mounting them to a fixed directory on the system disk;

S3: deploying containerized HDFS NameNode and DataNode nodes via the command line;

S4: writing a Yarn deployment file, and deploying a Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster;

S5: performing a Yarn cluster test.

Further, the operating system of the hosts is Ubuntu 14.04.

Further, the Kubernetes cluster uses flannel to implement cross-host communication between Pods.

Further, in step S4, the PetSet feature of the Kubernetes cluster assigns a fixed domain name to each Pod running a Yarn component, and the ConfigMap feature of the Kubernetes cluster supplies the slave node information of the Yarn components to the Yarn cluster.
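A minimal sketch of how the fixed domain names arise, assuming the standard PetSet naming scheme `<petset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>` for pods governed by a headless Service. The values mr, bigdata and iopk8s.com are read off the RESOURCEMANAGE_HOSTNAME entry of the deployment file in the embodiment; the function names are hypothetical.

```python
def petset_pod_fqdn(petset: str, ordinal: int, service: str,
                    namespace: str, cluster_domain: str) -> str:
    """Stable DNS name of one PetSet pod governed by a headless Service."""
    return f"{petset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def yarn_hosts(petset: str, replicas: int, service: str,
               namespace: str, cluster_domain: str) -> list[str]:
    """Fixed domain names of all Yarn pods; ordinals run from 0 to replicas - 1."""
    return [petset_pod_fqdn(petset, i, service, namespace, cluster_domain)
            for i in range(replicas)]


if __name__ == "__main__":
    for host in yarn_hosts("mr", 3, "mr", "bigdata", "iopk8s.com"):
        print(host)
```

Because each name depends only on the pod's ordinal, it stays the same when a pod is rescheduled, which is what allows the Yarn configuration to refer to hosts such as mr-0 ahead of time.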

Further, the Kubernetes cluster uses Kubernetes version 1.3.

The Hadoop compute-storage separation method provided by the present invention is based on Kubernetes and Docker. It separates the big data storage component from the computation component and deploys the computation component in a Kubernetes environment, so that an administrator can deploy different computation components, such as Yarn+MapReduce or Yarn+Spark, according to the resource requirements of the business scenario. When a big data task needs to be executed, a Hadoop cluster is quickly created on Kubernetes and the task is run; when the task ends, the computation component can be deleted to release the resources it no longer uses, thereby improving resource utilization and saving cost.

Brief description of the drawings

Fig. 1 is a flowchart of the method according to a specific embodiment of the invention.

Fig. 2 is a Hadoop component deployment diagram according to a specific embodiment of the invention.

Fig. 3 is a Hadoop component interaction diagram according to a specific embodiment of the invention.

Embodiment

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. The following embodiments illustrate the present invention, and the invention is not limited to them.

As shown in Fig. 1, the Hadoop compute-storage separation method provided by the present invention comprises the following steps:

S1: deploying a Kubernetes cluster on the hosts. The operating system of the hosts is Ubuntu 14.04, and the Kubernetes cluster uses flannel to implement cross-host communication between Pods.
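The cross-host Pod network of step S1 can be illustrated with a small sketch. It assumes a flannel overlay of 172.17.0.0/16 with flannel's default /24 per-host subnets and a vxlan backend; none of these values is stated in the patent, and the range is only inferred from the pod IPs in the deployment output of this embodiment. The function names are hypothetical. The first function renders the network configuration that flannel reads from etcd under /coreos.com/network/config; the second checks that each pod IP lies in the overlay and reports the per-host subnet it belongs to.

```python
import ipaddress
import json

# Assumed overlay range, inferred from the pod IPs in this embodiment.
FLANNEL_NETWORK = "172.17.0.0/16"


def flannel_config(network: str = FLANNEL_NETWORK, backend: str = "vxlan") -> str:
    """JSON network config that flannel reads from etcd."""
    return json.dumps({"Network": network, "SubnetLen": 24,
                       "Backend": {"Type": backend}})


def host_subnets(pod_ips: dict[str, str],
                 network: str = FLANNEL_NETWORK) -> dict[str, str]:
    """Map each host to the /24 its pod IP falls in, after checking the IP is in the overlay."""
    net = ipaddress.ip_network(network)
    result = {}
    for host, ip in pod_ips.items():
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{ip} on {host} is outside {net}")
        result[host] = str(ipaddress.ip_network(f"{ip}/24", strict=False))
    return result


if __name__ == "__main__":
    print(flannel_config())
    print(host_subnets({
        "slave3.iop.com": "172.17.18.4",
        "master.iop.com": "172.17.17.18",
        "slave1.iop.com": "172.17.60.11",
    }))
```

With the sample IPs, the three hosts land in three distinct subnets (172.17.18.0/24, 172.17.17.0/24 and 172.17.60.0/24), which is the per-host allocation flannel is expected to produce.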

S2: formatting the data disks of the hosts and mounting them to a fixed directory on the system disk; these directories are used by the HDFS DataNode nodes.
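Step S2 can be sketched as a dry run that only builds the commands. The device names /dev/sdb and /dev/sdc, the mount points /data/disk1 and /data/disk2 and the ext4 filesystem are all hypothetical; the patent does not specify them.

```python
# Hypothetical device names and mount points; the patent does not list them.
DISKS = {"/dev/sdb": "/data/disk1", "/dev/sdc": "/data/disk2"}


def disk_setup_commands(disks: dict[str, str]) -> list[list[str]]:
    """Format (ext4 assumed), create and mount each data disk. Commands are built, not run."""
    cmds = []
    for device, mountpoint in disks.items():
        cmds.append(["mkfs.ext4", "-F", device])  # erases everything on the device
        cmds.append(["mkdir", "-p", mountpoint])
        cmds.append(["mount", device, mountpoint])
    return cmds


def fstab_lines(disks: dict[str, str]) -> list[str]:
    """/etc/fstab entries so the mounts persist across reboots."""
    return [f"{dev} {mnt} ext4 defaults 0 2" for dev, mnt in disks.items()]


if __name__ == "__main__":
    for cmd in disk_setup_commands(DISKS):
        print(" ".join(cmd))
    print("\n".join(fstab_lines(DISKS)))
```

Keeping the commands as argument lists lets an operator review them first and then run each one with `subprocess.run(cmd, check=True)` on the host.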

S3: deploying containerized HDFS NameNode and DataNode nodes via the command line; this HDFS provides storage for multiple Yarn clusters.
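A sketch of the configuration the containerized HDFS of step S3 might use, rendered as standard Hadoop *-site.xml documents. The property names fs.defaultFS and dfs.datanode.data.dir are standard Hadoop settings. The NameNode host master.iop.com comes from the NAMENODE_HOSTNAME entry of the Yarn deployment file, but the RPC port 8020 and the data directories (matching hypothetical S2 mount points) are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumptions: NameNode RPC port 8020 and the data directories below.
NAMENODE = "master.iop.com"
DATA_DIRS = ["/data/disk1", "/data/disk2"]


def hadoop_conf(props: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: the same NameNode address is later given to every Yarn cluster
core_site = hadoop_conf({"fs.defaultFS": f"hdfs://{NAMENODE}:8020"})
# hdfs-site.xml for the DataNode: one block directory per mounted data disk
hdfs_site = hadoop_conf({
    "dfs.datanode.data.dir": ",".join(f"file://{d}" for d in DATA_DIRS),
})

if __name__ == "__main__":
    print(core_site)
    print(hdfs_site)
```

The DataNode container would bind-mount the same host directories (for example with `docker run -v /data/disk1:/data/disk1 ...`), so block data lives on the disks formatted in S2 rather than inside the container.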

S4: writing a Yarn deployment file, and deploying a Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster. The PetSet feature assigns a fixed domain name to each Pod running a Yarn component, and the ConfigMap feature supplies the slave node information of the Yarn components to the Yarn cluster.
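The ConfigMap side of step S4 can be sketched as a generated manifest. Only the key names hdfsinfo and yarnslaves come from the deployment file; the ConfigMap name yarn-config, the hdfsinfo value and the one-hostname-per-line format of yarnslaves are hypothetical. The slave list reuses the PetSet pod names mr-0 to mr-2 and assumes every pod runs a NodeManager.

```python
import json

# Hypothetical: ConfigMap name, hdfsinfo value and the slaves format are
# assumptions; the patent only names the keys hdfsinfo and yarnslaves.
def yarn_configmap(name: str, namespace: str, hdfsinfo: str,
                   slaves: list[str]) -> dict:
    """ConfigMap carrying the keys read by the HDFSINFO and SLAVES env vars."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            "hdfsinfo": hdfsinfo,
            # one slave hostname per line, like Hadoop's slaves file
            "yarnslaves": "\n".join(slaves),
        },
    }


slaves = [f"mr-{i}.mr.bigdata.svc.iopk8s.com" for i in range(3)]
manifest = yarn_configmap("yarn-config", "bigdata",
                          "hdfs://master.iop.com:8020", slaves)

if __name__ == "__main__":
    # kubectl create -f accepts JSON manifests as well as YAML
    print(json.dumps(manifest, indent=2))
```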

S5: performing a Yarn cluster test. Deployment is then complete, and the computation component and the storage component of the Hadoop cluster are now separated.
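The patent does not say which test step S5 runs. One plausible smoke test is sketched below: it lists the registered NodeManagers with `yarn node -list` and runs the MapReduce pi example inside pod mr-0. The examples jar path, the presence of `sh` and `yarn` in the image, and the namespace bigdata are assumptions.

```python
# Hypothetical: the examples jar location inside the image is an assumption;
# the patent does not describe the test it ran.
EXAMPLES_JAR = "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar"


def smoke_test_commands(pod: str = "mr-0", namespace: str = "bigdata",
                        maps: int = 10, samples: int = 100) -> list[list[str]]:
    """Commands (not executed here) for a basic Yarn cluster test."""
    exec_prefix = ["kubectl", "--namespace", namespace, "exec", pod, "--"]
    return [
        # 1. list the NodeManagers registered with the ResourceManager
        exec_prefix + ["yarn", "node", "-list"],
        # 2. run the MapReduce pi estimator; sh expands $HADOOP_HOME and the glob
        exec_prefix + ["sh", "-c", f"yarn jar {EXAMPLES_JAR} pi {maps} {samples}"],
    ]


if __name__ == "__main__":
    for cmd in smoke_test_commands():
        print(" ".join(cmd))
```

Running the job through `sh -c` is a deliberate choice: `kubectl exec` does not invoke a shell, so without it the environment variable and the wildcard in the jar path would be passed through unexpanded.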

For step S4, a reference Yarn deployment file is as follows:

# A headless service to create DNS records
apiVersion: v1
kind: Service
metadata:
  name: mr
  namespace: bigdata
  labels:
    app: mr
spec:
  ports:
  - port: 80
  # each pod gets a record of the form <pod>.mr.bigdata.svc.<cluster-domain>
  clusterIP: None
  selector:
    app: mr
---
# PetSet (apps/v1alpha1) is the Kubernetes 1.3/1.4 API; it was renamed StatefulSet in 1.5
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: mr
  namespace: bigdata
spec:
  serviceName: "mr"
  replicas: 3
  template:
    metadata:
      labels:
        app: mr
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: mr
        image: 10.110.17.138:5000/wangdk/bigdata:v0.6
        imagePullPolicy: Always
        command:
        - /usr/local/bin/start.sh
        securityContext:
          privileged: true
        env:
        # ResourceManager address: pod mr-0 of this PetSet
        - name: RESOURCEMANAGE_HOSTNAME
          value: mr-0.mr.bigdata.svc.iopk8s.com
        - name: NODE_ROLE
          value: yarn
        # NameNode of the separately deployed HDFS (step S3)
        - name: NAMENODE_HOSTNAME
          value: master.iop.com
        - name: HDFSINFO
          valueFrom:
            configMapKeyRef:
              name: yarn-config   # hypothetical: the ConfigMap name is not given in the original
              key: hdfsinfo
        - name: SLAVES
          valueFrom:
            configMapKeyRef:
              name: yarn-config   # hypothetical: same ConfigMap as above
              key: yarnslaves
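The fixed domain names only appear if the two manifests agree with each other. The sketch below encodes those consistency checks: the Service must be headless, the PetSet's serviceName must name it, both must share a namespace, and the Service selector must match the pod template labels. The names mr and bigdata follow from serviceName and from the RESOURCEMANAGE_HOSTNAME value mr-0.mr.bigdata.svc.iopk8s.com; the function name dns_problems is hypothetical.

```python
# Field values mirror the deployment file above, reduced to the DNS-relevant fields.
SERVICE = {
    "metadata": {"name": "mr", "namespace": "bigdata"},
    "spec": {"clusterIP": "None", "selector": {"app": "mr"}},
}
PETSET = {
    "metadata": {"name": "mr", "namespace": "bigdata"},
    "spec": {
        "serviceName": "mr",
        "replicas": 3,
        "template": {"metadata": {"labels": {"app": "mr"}}},
    },
}


def dns_problems(service: dict, petset: dict) -> list[str]:
    """Reasons the PetSet pods would not get per-pod DNS records (empty if none)."""
    problems = []
    if service["spec"].get("clusterIP") != "None":
        problems.append("Service is not headless (clusterIP must be None)")
    if petset["spec"]["serviceName"] != service["metadata"]["name"]:
        problems.append("PetSet serviceName does not match the Service name")
    if service["metadata"]["namespace"] != petset["metadata"]["namespace"]:
        problems.append("Service and PetSet are in different namespaces")
    labels = petset["spec"]["template"]["metadata"]["labels"]
    if any(labels.get(k) != v for k, v in service["spec"]["selector"].items()):
        problems.append("Service selector does not match the pod template labels")
    return problems


if __name__ == "__main__":
    print(dns_problems(SERVICE, PETSET) or "ok")
```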

After deployment is complete, the Yarn cluster information can be viewed:

root@master:~# kubectl get pods -o wide |grep mr-
mr-0 1/1 Running 1d 172.17.18.4 slave3.iop.com
mr-1 1/1 Running 1d 172.17.17.18 master.iop.com
mr-2 1/1 Running 1d 172.17.60.11 slave1.iop.com
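A small parser can turn the listing above into a readiness check. It is a sketch that assumes the column layout of this sample, where IP and NODE are the last two columns; newer kubectl versions append further columns after NODE in `-o wide` output, so the indices would need adjusting there. The function names are hypothetical.

```python
# Sample lines copied from the kubectl output above.
SAMPLE = """\
mr-0 1/1 Running 1d 172.17.18.4 slave3.iop.com
mr-1 1/1 Running 1d 172.17.17.18 master.iop.com
mr-2 1/1 Running 1d 172.17.60.11 slave1.iop.com
"""


def parse_pods(output: str) -> dict[str, dict[str, str]]:
    """Parse `kubectl get pods -o wide` lines into pod name -> fields."""
    pods = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":  # skip blanks and a header row
            continue
        # IP and NODE are read from the end, so the parse does not depend on
        # how many columns (RESTARTS, AGE, ...) sit in between.
        pods[fields[0]] = {"ready": fields[1], "status": fields[2],
                           "ip": fields[-2], "node": fields[-1]}
    return pods


def cluster_ready(pods: dict[str, dict[str, str]], replicas: int) -> bool:
    """True when every replica is Running with all of its containers ready."""
    if len(pods) != replicas:
        return False
    for pod in pods.values():
        ready, total = pod["ready"].split("/")
        if pod["status"] != "Running" or ready != total:
            return False
    return True


if __name__ == "__main__":
    pods = parse_pods(SAMPLE)
    print(cluster_ready(pods, 3), sorted(p["node"] for p in pods.values()))
```

On the sample output, all three replicas are ready and each runs on a different host.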

In this embodiment, multiple Yarn clusters can be deployed in the same physical environment, with resource isolation between the clusters provided directly by Docker. The deployment plan is shown in Fig. 2 and Fig. 3.
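One way to plan several Yarn clusters on the same hosts is sketched below. It assumes (beyond what the patent states) that each cluster gets its own Kubernetes namespace, so the same PetSet name mr yields a distinct ResourceManager domain name per cluster while every cluster shares the single HDFS NameNode from step S3. The namespace bigdata-spark and the function name are hypothetical.

```python
# Hypothetical namespaces; only "bigdata" appears in the patent.
NAMENODE_HOSTNAME = "master.iop.com"  # the shared HDFS deployed in step S3


def plan_clusters(namespaces: list[str],
                  cluster_domain: str = "iopk8s.com") -> dict[str, dict[str, str]]:
    """Per-cluster env values: separate ResourceManagers, one shared NameNode."""
    if len(set(namespaces)) != len(namespaces):
        raise ValueError("each Yarn cluster needs its own namespace")
    return {
        ns: {
            "RESOURCEMANAGE_HOSTNAME": f"mr-0.mr.{ns}.svc.{cluster_domain}",
            "NAMENODE_HOSTNAME": NAMENODE_HOSTNAME,
        }
        for ns in namespaces
    }


if __name__ == "__main__":
    for ns, env in plan_clusters(["bigdata", "bigdata-spark"]).items():
        print(ns, env)
```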

What is disclosed above is only a preferred embodiment of the present invention, but the present invention is not limited thereto. Any non-inventive change that a person skilled in the art can conceive of, and any improvement or modification made without departing from the principles of the present invention, shall fall within the protection scope of the present invention.

Claims

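1. A Hadoop compute-storage separation method, characterized by comprising:

S1: deploying a Kubernetes cluster on the hosts;

S2: formatting the data disks of the hosts and mounting them to a fixed directory on the system disk;

S3: deploying containerized HDFS NameNode and DataNode nodes via the command line;

S4: writing a Yarn deployment file, and deploying a Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster;

S5: performing a Yarn cluster test.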

2. The Hadoop compute-storage separation method according to claim 1, characterized in that the operating system of the hosts is Ubuntu 14.04.

3. The Hadoop compute-storage separation method according to claim 1 or 2, characterized in that the Kubernetes cluster uses flannel to implement cross-host communication between Pods.

4. The Hadoop compute-storage separation method according to claim 3, characterized in that, in step S4, the PetSet feature of the Kubernetes cluster assigns a fixed domain name to each Pod running a Yarn component, and the ConfigMap feature of the Kubernetes cluster supplies the slave node information of the Yarn components to the Yarn cluster.

5. The Hadoop compute-storage separation method according to claim 1, 2 or 4, characterized in that the Kubernetes cluster uses Kubernetes version 1.3.