CN106953910A - Method for separating Hadoop compute and storage - Google Patents

Publication number
CN106953910A
Authority
CN
China
Prior art keywords
yarn
clusters
kubernetes
separation method
hadoop
Legal status
Pending
Application number
CN201710161929.6A
Other languages
Chinese (zh)
Inventor
王德奎
戴雪冰
潘峰
李珂
刘安
Current Assignee
Inspur Cloud Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date
2017-03-17
Filing date
2017-03-17
Publication date
2017-07-14
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710161929.6A
Publication of CN106953910A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment

Abstract

The present invention provides a method for separating Hadoop compute and storage, comprising: S1: deploying a Kubernetes cluster on the hosts; S2: formatting the hosts' data disks and mounting them at a fixed directory on the system disk; S3: deploying the containerized HDFS NameNode and DataNode nodes from the command line; S4: writing a Yarn deployment file and deploying the Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster; S5: testing the Yarn cluster. The invention separates the big-data storage components from the compute components and deploys the compute components in a Kubernetes environment. An administrator can deploy different compute components according to the resource requirements of the business scenario. When a big-data task ends, the compute components can be deleted and the unused resources released, which improves resource utilization and saves cost.

Description

Method for separating Hadoop compute and storage
Technical field
The present invention relates to the field of Hadoop clusters, and in particular to a method for separating Hadoop compute and storage.
Background
Hadoop clusters generally emphasize deploying storage and compute together, which reduces network pressure during data transfer and yields better performance. Building a Hadoop cluster therefore usually requires planning resource requirements in advance, so that the business's demand for storage, compute and other resources is met. Offline computation in a Hadoop cluster, however, places no high requirement on execution speed, and as network bandwidth increases the network is gradually ceasing to be the bottleneck of cluster performance, so a scheme that separates compute from storage can be considered.
Summary of the invention
To solve the above problem, the present invention provides a method for separating Hadoop compute and storage.
The technical solution of the present invention is a method for separating Hadoop compute and storage, comprising:
S1: deploying a Kubernetes cluster on the hosts;
S2: formatting the hosts' data disks and mounting them at a fixed directory on the system disk;
S3: deploying the containerized HDFS NameNode and DataNode nodes from the command line;
S4: writing a Yarn deployment file and deploying the Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster;
S5: testing the Yarn cluster.
Further, the host operating system is Ubuntu 14.04.
Further, the Kubernetes cluster uses a flannel network for cross-host communication between Pods.
Further, in step S4, the PetSet feature of the Kubernetes cluster assigns fixed domain names to the Pods running the Yarn components, and the ConfigMap feature of the Kubernetes cluster supplies the slave-node information of the Yarn components to the Yarn cluster.
Further, the Kubernetes cluster uses Kubernetes 1.3.
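The fixed domain names assigned by PetSet can be illustrated with a minimal sketch. It assumes the usual naming pattern for Pods behind a headless Service, `<set name>-<ordinal>.<service>.<namespace>.svc.<cluster domain>`, and takes the values `mr`, `bigdata` and `iopk8s.com` from the reference deployment file in the embodiment. The function name `petset_hostnames` is hypothetical and not part of the patent.

```python
def petset_hostnames(set_name, service, namespace, cluster_domain, replicas):
    """Return the stable DNS name of each Pod, for ordinals 0..replicas-1."""
    return [
        f"{set_name}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


# Values taken from the reference deployment file; replicas is 3 there.
hosts = petset_hostnames("mr", "mr", "bigdata", "iopk8s.com", 3)
for host in hosts:
    print(host)
```

Because each name depends only on the set name and the ordinal, a Pod that is rescheduled to another host keeps the same name, which is what lets the Yarn configuration refer to the ResourceManager by a fixed hostname.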
The method provided by the present invention is based on Kubernetes and Docker. It separates the big-data storage components from the compute components and deploys the compute components in a Kubernetes environment. An administrator can deploy different compute components, such as Yarn+MapReduce or Yarn+Spark, according to the resource requirements of the business scenario. When a big-data task needs to run, a Hadoop cluster is quickly created on Kubernetes and the task is executed. When the task ends, the compute components can be deleted and the unused resources released, which improves resource utilization and saves cost.
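The create-then-delete life cycle of the compute components might look like the following dry-run sketch, which only builds and prints `kubectl` command lines without executing them. The manifest file name `yarn.yaml` and the helper `lifecycle_commands` are hypothetical. The label `app=mr` matches the reference deployment file in the embodiment. Deleting the Pods by label in addition to deleting the manifest is an assumption about the alpha PetSet API, where Pods may not be removed together with the PetSet; the patent itself does not describe the deletion commands.

```python
def lifecycle_commands(manifest_path, label):
    """Return (create, delete) lists of kubectl argv lists for a temporary compute cluster."""
    create = [["kubectl", "create", "-f", manifest_path]]
    delete = [
        # Remove the Service and PetSet defined in the manifest.
        ["kubectl", "delete", "-f", manifest_path],
        # Assumption: Pods are removed separately by label on the alpha PetSet API.
        ["kubectl", "delete", "pods", "-l", label],
    ]
    return create, delete


create, delete = lifecycle_commands("yarn.yaml", "app=mr")
for argv in create + delete:
    print(" ".join(argv))
```

Only the compute side is created and deleted here. The HDFS NameNode and DataNode nodes stay in place, so the data survives between tasks.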
Brief description of the drawings
Fig. 1 is a flow chart of the method in a specific embodiment of the present invention.
Fig. 2 is a deployment diagram of the Hadoop components in a specific embodiment of the present invention.
Fig. 3 is an interaction diagram of the Hadoop components in a specific embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The following embodiment explains the invention, and the invention is not limited to it.
As shown in Fig. 1, the method for separating Hadoop compute and storage provided by the present invention comprises the following steps:
S1: Deploy a Kubernetes cluster on the hosts. The host operating system is Ubuntu 14.04, and the Kubernetes cluster uses a flannel network for cross-host communication between Pods.
S2: Format the hosts' data disks and mount them at a fixed directory on the system disk, for use by the HDFS DataNode nodes.
S3: Deploy the containerized HDFS NameNode and DataNode nodes from the command line. They provide storage to multiple Yarn clusters.
S4: Write a Yarn deployment file and deploy the Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster. The PetSet feature assigns fixed domain names to the Pods running the Yarn components, and the ConfigMap feature supplies the slave-node information of the Yarn components to the Yarn cluster.
S5: Test the Yarn cluster. Deployment is then complete, and the compute components and storage components of the Hadoop cluster are separated.
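Step S2 above can be sketched as a dry-run plan that prints the commands for formatting and mounting one data disk, without running them. The patent does not name the device, the file system or the mount directory, so `/dev/sdb`, `ext4` and `/data/hdfs` are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def disk_prep_commands(device, mount_point, fs_type="ext4"):
    """Return the commands (as argv lists) that format and mount one data disk."""
    return [
        [f"mkfs.{fs_type}", device],
        ["mkdir", "-p", mount_point],
        ["mount", "-t", fs_type, device, mount_point],
    ]


def fstab_line(device, mount_point, fs_type="ext4"):
    """Return an /etc/fstab entry so the mount persists across reboots."""
    return f"{device} {mount_point} {fs_type} defaults 0 2"


# Hypothetical device and directory; formatting destroys existing data on the device.
plan = disk_prep_commands("/dev/sdb", "/data/hdfs")
for argv in plan:
    print(" ".join(argv))
print(fstab_line("/dev/sdb", "/data/hdfs"))
```

Using the same fixed directory on every host lets the DataNode containers mount an identical path regardless of which host they run on.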
In step S4, a reference Yarn deployment file is as follows:
# A headless service to create DNS records
apiVersion: v1
kind: Service
metadata:
  name: mr
  namespace: bigdata
  labels:
    app: mr
spec:
  ports:
  - port: 80
    name: mr
  # *.mr.bigdata.svc.iopk8s.com
  clusterIP: None
  selector:
    app: mr
---
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: mr
spec:
  serviceName: "mr"
  replicas: 3
  template:
    metadata:
      labels:
        app: mr
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: mr
        image: 10.110.17.138:5000/wangdk/bigdata:v0.6
        imagePullPolicy: Always
        command:
        - /usr/local/bin/start.sh
        securityContext:
          privileged: true
        env:
        - name: RESOURCEMANAGE_HOSTNAME
          value: mr-0.mr.bigdata.svc.iopk8s.com
        - name: NODE_ROLE
          value: yarn
        - name: NAMENODE_HOSTNAME
          value: master.iop.com
        - name: HDFSINFO
          valueFrom:
            configMapKeyRef:
              name: hdfsinfo
              key: hdfsinfo
        - name: SLAVES
          valueFrom:
            configMapKeyRef:
              name: hdfsinfo
              key: yarnslaves
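The deployment file reads two keys, `hdfsinfo` and `yarnslaves`, from a ConfigMap named `hdfsinfo`, but the patent does not show that ConfigMap. The sketch below builds one possible manifest and prints it as JSON, which `kubectl create -f` also accepts. The value formats are assumptions: the slave list is taken to be one hostname per line covering `mr-1` and `mr-2`, and the `hdfsinfo` value is a placeholder because its format is not given in the source. The helper `build_configmap` is hypothetical.

```python
import json


def build_configmap(name, namespace, data):
    """Build a Kubernetes ConfigMap manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(data),
    }


# Assumption: mr-0 runs the ResourceManager, so mr-1 and mr-2 are the slaves.
slaves = [f"mr-{i}.mr.bigdata.svc.iopk8s.com" for i in (1, 2)]

manifest = build_configmap(
    "hdfsinfo",
    "bigdata",
    {
        # Placeholder: the source does not specify this value's format.
        "hdfsinfo": "master.iop.com",
        # Assumed format: one slave hostname per line.
        "yarnslaves": "\n".join(slaves),
    },
)
print(json.dumps(manifest, indent=2))
```

Keeping the slave list in a ConfigMap means the container image stays unchanged when the cluster size changes. Only the ConfigMap and the `replicas` count need to be edited.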
After deployment completes, the Yarn cluster information can be viewed:
root@master:~# kubectl get pods -o wide |grep mr-
mr-0 1/1 Running 1d 172.17.18.4 slave3.iop.com
mr-1 1/1 Running 1d 172.17.17.18 master.iop.com
mr-2 1/1 Running 1d 172.17.60.11 slave1.iop.com
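As a small illustration of how this listing can be checked in step S5, the sketch below parses the three lines shown above and maps each Pod to its IP address and host. It assumes only that the Pod name is the first column and that the IP and node are the last two columns. The function name `parse_pod_placement` is hypothetical.

```python
SAMPLE = """\
mr-0 1/1 Running 1d 172.17.18.4 slave3.iop.com
mr-1 1/1 Running 1d 172.17.17.18 master.iop.com
mr-2 1/1 Running 1d 172.17.60.11 slave1.iop.com
"""


def parse_pod_placement(text):
    """Map pod name -> (pod IP, node), reading IP and node from the last two columns."""
    placement = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        placement[fields[0]] = (fields[-2], fields[-1])
    return placement


placement = parse_pod_placement(SAMPLE)
for pod, (ip, node) in sorted(placement.items()):
    print(pod, ip, node)
```

In this listing the three Pods run on three different hosts, and one of them shares `master.iop.com` with the HDFS NameNode named in the deployment file.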
In the present embodiment, multiple Yarn clusters can be deployed in the same physical environment, with resource isolation between the clusters provided directly by Docker. The deployment plan is shown in Fig. 2 and Fig. 3.
The above discloses only the preferred embodiment of the present invention, and the invention is not limited to it. Any non-inventive change that a person skilled in the art can conceive of, and any improvements and refinements made without departing from the principles of the present invention, shall fall within the scope of protection of the present invention.

Claims (5)

1. A method for separating Hadoop compute and storage, characterized by comprising:
S1: deploying a Kubernetes cluster on the hosts;
S2: formatting the hosts' data disks and mounting them at a fixed directory on the system disk;
S3: deploying the containerized HDFS NameNode and DataNode nodes from the command line;
S4: writing a Yarn deployment file and deploying the Yarn cluster using the PetSet and ConfigMap features of the Kubernetes cluster;
S5: testing the Yarn cluster.
2. The method for separating Hadoop compute and storage according to claim 1, characterized in that the host operating system is Ubuntu 14.04.
3. The method for separating Hadoop compute and storage according to claim 1 or 2, characterized in that the Kubernetes cluster uses a flannel network for cross-host communication between Pods.
4. The method for separating Hadoop compute and storage according to claim 3, characterized in that, in step S4, the PetSet feature of the Kubernetes cluster assigns fixed domain names to the Pods running the Yarn components, and the ConfigMap feature of the Kubernetes cluster supplies the slave-node information of the Yarn components to the Yarn cluster.
5. The method for separating Hadoop compute and storage according to claim 1, 2 or 4, characterized in that the Kubernetes cluster uses Kubernetes 1.3.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710161929.6A CN106953910A (en) 2017-03-17 2017-03-17 Method for separating Hadoop compute and storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710161929.6A CN106953910A (en) 2017-03-17 2017-03-17 Method for separating Hadoop compute and storage

Publications (1)

Publication Number Publication Date
CN106953910A (en) 2017-07-14

Family

ID=59472687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710161929.6A Pending CN106953910A (en) 2017-03-17 2017-03-17 Method for separating Hadoop compute and storage

Country Status (1)

Country Link
CN (1) CN106953910A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462811A (en) * 2014-12-05 2015-03-25 云中万维(北京)科技有限公司 Network game data processing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
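谢丽: "Kubernetes 1.3 released, with support for cross-cluster federated services and stateful services", 《HTTPS://WWW.INFOG/CN/ARTICLE/2016/08/KUBERNETES-1.3-RELEASED》 *
谢丽: "Separating Hadoop's compute and storage can effectively improve performance", 《HTTPS://WWW.INFOG/CN/ARTICLE/2015/12/HADOOP-HDFS-DAS》 *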

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200518

Address after: Building S01, Inspur Science Park, No. 1036, Inspur Road, high tech Zone, Jinan City, Shandong Province, 250000

Applicant after: Inspur Cloud Information Technology Co., Ltd.

Address before: 450000 Henan province Zheng Dong New District of Zhengzhou City Xinyi Road No. 278 16 floor room 1601

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170714