CN113204353B - Big data platform component deployment method and device - Google Patents


Info

Publication number
CN113204353B
Authority
CN
China
Prior art keywords
management service
management
service
big data
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110459382.4A
Other languages
Chinese (zh)
Other versions
CN113204353A (en)
Inventor
马申跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN202110459382.4A
Publication of CN113204353A
Application granted
Publication of CN113204353B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G06F8/63 Image based installation; Cloning; Build to order
    • G06F8/65 Updates
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Computer And Data Communications (AREA)

Abstract

The disclosure provides a big data platform component deployment method and device for improving the high availability and component performance of a big data platform. In the method, big data platform services are divided into management services and non-management services: the management services are deployed on management service nodes where a container orchestration engine cluster and a high-availability component are deployed, while the non-management services are deployed on non-management service nodes, i.e., data nodes. Management nodes and data nodes are thereby decoupled, resource contention is reduced, the performance of big data platform components can be improved, management node failures are minimized, and stable operation of the big data platform is guaranteed.

Description

Big data platform component deployment method and device
Technical Field
The disclosure relates to the technical field of big data and cloud computing, and in particular to a big data platform component deployment method and device.
Background
Apache Ambari is a Web-based tool for creating, managing and monitoring Hadoop clusters, referred to as a big data management platform for short. Ambari supports most Hadoop components, including the distributed file system HDFS, MapReduce (a programming model for parallel computation over large data sets), Hive (a data warehouse framework), Pig (a large-scale data analysis platform), HBase (a distributed column-oriented database), Zookeeper (a distributed application coordination service), Sqoop (a data transfer tool), HCatalog (a table and storage management service), and so on.
Hadoop big data platforms represented by Ambari are widely used. However, as data volumes grow, cluster scale expands and service complexity keeps increasing, factors such as resource contention, network jitter and power failures often cause services on the platform to crash. To address single-node failures, high-availability (HA) deployment is adopted for management services to guarantee their availability, but most HA mechanisms rely on Zookeeper. Generally there is only one Zookeeper cluster per platform; Zookeeper usually plays the role of a coordinator, and many services (such as HBase, Solr and Kafka) access the Zookeeper cluster frequently, placing it under heavy load. Once Zookeeper fails, all services that depend on it fail as well.
With the advent of the container cloud management platform K8s, people began to move Hadoop big data services into containers. However, containers are virtualized through cgroups and add many call layers compared with physical machines, so container performance is noticeably reduced. Moreover, big data services usually require massive storage, a requirement that is difficult to meet with containers alone.
Disclosure of Invention
In view of this, the present disclosure provides a big data platform component deployment method and apparatus for improving the high availability and component performance of a big data platform.
Fig. 1 is a schematic flow chart illustrating the steps of the big data platform component deployment method provided in the present disclosure. The method includes:
Step 101, deploying a container orchestration engine cluster at a plurality of management service nodes, and deploying a high-availability component for the container orchestration engine cluster;
the container orchestration engine is used for managing containerized applications on multiple hosts in a cloud platform, and may be, for example, kubernets K8s for short.
The high-availability component provides high availability and load balancing and may be, for example, HAProxy.
Step 102, deploying a management service installation package of a big data platform component in the container orchestration engine cluster consisting of the plurality of management service nodes; the management service installation package only comprises installation packages of management services that only participate in the distribution and scheduling of tasks in the big data platform component;
the big data platform component comprises one or more of HDFS, Yarn, HBase, Zookeeper, Kafka, Solr and the like.
Step 103, deploying non-management service installation packages of the big data platform component at one or more non-management nodes; the non-management service installation packages only comprise installation packages of non-management services that only participate in the computation of tasks and the storage of data.
The method splits the service installation package (or image) of a big data platform component into a management service installation package and a non-management service installation package. The management service installation package only comprises installation packages of management services that only participate in the distribution and scheduling of tasks in the big data platform component, and the non-management service installation packages only comprise installation packages of non-management services that only participate in the computation of tasks and the storage of data. The management services run in Pods of the K8s cluster; the non-management services are deployed on the one or more non-management service nodes carried by physical machines.
Further, the management service installation package also includes an installation package of the management service of the big data management platform component, and the non-management service installation package also includes an installation package of the non-management service of the big data management platform component. The method further includes:
after the management service of the big data management platform component is started through the container orchestration engine, adding the one or more non-management service nodes through the management service of the big data management platform component, and then executing the step of deploying the non-management service installation packages of the big data platform component at the one or more non-management nodes.
Furthermore, the management services of the big data platform component run in containers. To realize dynamic updating of the management service configuration, configuration information is separated from the service program and injected at run time; to keep the management services of the big data platform component continuously available, the container orchestration engine cluster uses a control mechanism to keep a preset number of container processes hosting those management services running.
An embodiment of the present disclosure uses the ConfigMap feature to realize dynamic updating of the management service configuration, configuring one ConfigMap for each configuration file. The control mechanism used by the container orchestration engine cluster is the ReplicationController mechanism.
Further, when a management service of the big data platform component is a stateful service, the management service is marked, through a label, as the role that provides the service externally.
Further, when the big data platform component relies on a distributed application coordination service component, the method further comprises:
deploying a distributed application coordination service component cluster for the big data platform component in the container orchestration engine cluster consisting of the plurality of management service nodes, and dynamically loading the service domain name, network access address and port of the created distributed application coordination service component cluster into the configuration of the management and non-management services of the big data platform component.
Fig. 2 is a schematic structural diagram of a big data platform component deployment apparatus according to an embodiment of the present disclosure. Each functional module in the apparatus 200 may be implemented by software, hardware, or a combination of both. When multiple hardware devices jointly implement the technical solution of the present disclosure, they cooperate to achieve the purpose of the disclosure, and the actions and processing results of one party determine when the other party acts and what results it can obtain; the executing entities can therefore be regarded as cooperating with one another, without requiring a command-and-control relationship between them. The big data platform component deployment apparatus 200 includes:
a plurality of management service nodes 210, one or more non-management service nodes 220, and a high-availability component 230;
the plurality of management service nodes 210 are deployed with a container orchestration engine cluster, and the high-availability component 230 provides high availability for the management services in the container orchestration engine cluster;
the management services of the big data platform component are deployed in the plurality of management service nodes 210; a management service is a service in the big data platform component that only participates in the distribution and scheduling of tasks;
the one or more non-management service nodes 220 have the non-management services of the big data platform component deployed in them; a non-management service is a service in the big data platform component that only participates in the computation of tasks and the storage of data.
Further, the management service of the big data management platform component is also deployed in the plurality of management service nodes 210, and the non-management service of the big data management platform component is also deployed in the one or more non-management nodes 220;
after the container orchestration engine starts the management service of the big data management platform component, that management service adds the one or more non-management service nodes 220 and deploys the non-management service of the big data management platform component to the one or more non-management nodes 220.
Furthermore, the management services of the big data platform component run in containers, and they adopt a mechanism that separates configuration information from the service program to realize the injection of configuration information;
and the container orchestration engine cluster, through a control mechanism, keeps a preset number of container processes hosting the management services of the big data platform component continuously running.
Further, the distributed application coordination service component cluster on which the big data platform component depends is also deployed in the plurality of management service nodes 210, and the service domain name, network access address and port of the distributed application coordination service component cluster are loaded into the configuration of the management and non-management services of the big data platform component through a dynamic loading mechanism.
Further, the container orchestration engine cluster is K8s, the high-availability component is HAProxy, and the big data platform component comprises one or more of the HDFS, Yarn, HBase, Zookeeper, Kafka and Solr components; the big data management platform component is Ambari; the distributed application coordination service component is Zookeeper;
the management service of Ambari is Ambari-Server, and the non-management service is Ambari-Agent;
the management service of HDFS is NameNode, and the non-management service is DataNode;
the management service of Yarn is ResourceManager, and the non-management service is NodeManager;
the management service of HBase is HMaster, and the non-management service is HRegionServer;
the Kafka and Solr services are non-management services.
In the method, big data platform services are divided into management services and non-management services: the management services are deployed on the management service nodes where the container orchestration engine cluster and the high-availability component are deployed, while the non-management services are deployed on the non-management service nodes, i.e., the data nodes. Management nodes and data nodes are thereby decoupled, resource contention is reduced, the performance of big data platform components can be improved, management node failures are minimized, and stable operation of the big data platform is guaranteed.
Drawings
In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the following drawings cover only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them.
FIG. 1 is a schematic flow chart illustrating the steps of the big data platform component deployment method provided by the present disclosure;
FIG. 2 is a schematic structural diagram of a big data platform component deployment apparatus according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a big data platform component service split and deployment plan provided by an embodiment of the present disclosure;
FIG. 4 is an example cluster deployment, on physical machines, of non-management services that do not depend on a Zookeeper cluster, according to an embodiment of the present disclosure;
FIG. 5 is an example deployment of a big data platform component that depends on a Zookeeper cluster, according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device implementing the big data platform component deployment method according to an embodiment of the present disclosure.
Detailed Description
The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present disclosure. As used in the embodiments of the present disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" as used in this disclosure is meant to encompass any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information without departing from the scope of the embodiments of the present disclosure. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The key component services managed by the Ambari big data management platform generally comprise HDFS, Yarn, HBase, Zookeeper, Kafka, Solr and the like. Due to problems such as hardware resources and service dependencies, the management process of a component often crashes, leaving the entire component service paralyzed.
According to an embodiment of the disclosure, HA is realized for the Master services of Kubernetes (K8s for short) through the HAProxy mechanism, ensuring the stability of K8s itself. The replica control mechanism of K8s (ReplicationController) then controls the replicas of a service: when a Pod containing the service crashes, the Pod is automatically restarted so that the Pods always remain at the designated count. Using this mechanism, the key services of big data platform components are deployed in Pods to guarantee component high availability. Taking the key HDFS component service managed by Ambari as an example: HDFS consists of NameNode and DataNode. The more critical management service, NameNode, is made into an image in advance and run in Pods deployed by K8s, while DataNode is left to the Ambari big data management platform to manage.
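As a minimal sketch of this mechanism (the controller name, label, image path and port are hypothetical placeholders, not values prescribed by the disclosure), a ReplicationController manifest that keeps one NameNode Pod running could look as follows:

apiVersion: v1
kind: ReplicationController
metadata:
  name: namenode-rc                  # hypothetical controller name
spec:
  replicas: 1                        # K8s restarts the Pod as needed to keep the count at 1
  selector:
    app: namenode
  template:
    metadata:
      labels:
        app: namenode                # must match the selector above
    spec:
      containers:
        - name: namenode
          image: registry.example.com/bigdata/namenode:latest   # hypothetical pre-built image
          ports:
            - containerPort: 8020    # NameNode RPC port

If the Pod crashes, K8s notices that the actual replica count has fallen below the configured count and starts a replacement, which is exactly the behavior used here to keep key management services available.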
Typically, all components within a cluster deployed by Ambari share one Zookeeper cluster. In real service environments, however, when a large number of services access the Zookeeper cluster, an insufficient connection count frequently prevents services from working normally. To fully guarantee the normal operation of services that use Zookeeper, one improvement of the disclosure is to place Zookeeper clusters into Pods: because Zookeeper occupies little memory, whenever a component that depends on Zookeeper (HBase, Solr, Kafka) is deployed, K8s is dynamically notified to create a Zookeeper cluster belonging to that component, so that services using Zookeeper are no longer affected by an insufficient number of connections.
For the purpose of the invention, Ambari and the service packages of the big data platform components managed by Ambari need to be customized, splitting the component services into management services and non-management services.
The disclosure refers to a service that only participates in the distribution and scheduling of tasks, and not in their execution, as a management service; a service that only participates in the computation of tasks and the storage of data, and not in task distribution and scheduling, is called a non-management service.
According to the method, the management services of the big data platform are made into image packages in advance and placed in a Docker image registry, while the remaining non-management services are repackaged so that they can be conveniently deployed on physical machines. Docker is the most common container runtime in the Pod, the smallest/simplest basic unit created or deployed by Kubernetes; Pods can also support other container runtimes.
Since Ambari manages a large number of big data platform services, the present disclosure mainly takes HDFS, Yarn, HBase, Zookeeper, Kafka and Solr as examples; other services can be handled by reference to the technical solution provided here and are not described in detail.
The technical solution of the disclosure mainly comprises three improvements: splitting of the Ambari big data management platform and the related big data component service packages, a deployment policy for management services, and a deployment policy for non-management services. The overall deployment process mainly comprises the following steps:
(1) Build a K8s high-availability cluster using multiple physical machines.
(2) Split Ambari and the big data platform component services managed by Ambari, make management service images, and upload them to the Docker image registry.
The management service images only contain the management services of each big data platform component, such as NameNode, HMaster, ResourceManager, etc.
(3) Start the Ambari-Server service within the cluster set up by K8s.
(4) Add a number of non-management service nodes, i.e., data nodes, through Ambari-Server for deploying non-management services.
(5) Deploy management services within the K8s cluster according to the management service deployment policy.
(6) Deploy non-management services at the data nodes according to the non-management service deployment policy.
First part: splitting of the Ambari big data management platform and the big data component service packages
Fig. 3 is a schematic diagram of the service splitting and deployment plan for big data platform components according to an embodiment of the present disclosure, where the horizontal direction represents cluster nodes and the vertical direction represents physical machines. A physical machine for deploying management services is referred to as a management service node, and a physical machine for deploying non-management services is referred to as a non-management service node; non-management service nodes may be expanded dynamically according to business needs.
Kubernetes (K8s) is a container orchestration engine for managing containerized applications on multiple hosts in a cloud platform. In an embodiment of the method, multiple management service nodes are used to build a K8s container cloud management platform combined with HAProxy to realize a high-availability cluster, and management services are scheduled by K8s onto any of the management service nodes. K8s can use its stateful-service mechanisms to guarantee scheduling when the container carrying a management service fails, and when scheduling a management service to one management service node fails, it can schedule the management service to run on another management service node.
Ambari itself consists of Ambari-Server and Ambari-Agent. Ambari-Server is deployed as a management service to the management service nodes, and Ambari-Agent is deployed as a non-management service to the non-management service nodes.
The big data platform component service packages managed by the Ambari big data management platform generally include components such as HDFS, Yarn, HBase, Zookeeper, Kafka and Solr by default. These components need to be split: their management services are deployed via K8s to the high-availability cluster composed of the multiple management service nodes to realize HA, and their non-management services are deployed to the non-management service nodes to speed up data computation and storage.
By analyzing the Hadoop components, the services of the big data platform components managed by Ambari can be divided as follows according to the definitions of management and non-management services:
the HDFS adopts a Master-Slave (Master/Slave) structure model, and an HDFS cluster consists of a NameNode and a plurality of DataNodes. The NameNode is used as a main server and used for managing the naming space of the file system and the access operation of a client to the file; the DataNode in the cluster manages the stored data. And deploying the NameNode as a management service to the management service node, and deploying the DataNode as a non-management service to the non-management service node.
Yarn includes a ResourceManager, which controls the entire Yarn cluster and manages the allocation of applications to the underlying computing resources, and NodeManagers. The ResourceManager allocates the various resources (computation, memory, bandwidth, etc.) to the underlying NodeManagers. A NodeManager manages a single node in the Yarn cluster, providing services for it that range from overseeing the lifetime management of containers to monitoring resources and tracking node health. The ResourceManager is deployed as a management service to the management service nodes, and the NodeManagers are deployed as non-management services to the non-management service nodes.
HBase includes HMaster and HRegionServer. HMaster is mainly responsible for the management of Tables and Regions, while HRegionServer, the most central module of HBase, is mainly responsible for responding to user I/O requests and reading and writing data in the HDFS file system. The disclosure deploys HMaster as a management service to the management service nodes and HRegionServer as a non-management service to the non-management service nodes.
Zookeeper is a distributed application coordination service. The services of components such as Kafka, HBase and HDFS all depend on Zookeeper; if Zookeeper fails, the component services that depend on it fail as well. As a coordinator, it is an important component, so this disclosure deploys Zookeeper as a management service on the management service nodes. Since the high-availability cluster is built on the management service nodes with K8s combined with HAProxy, Zookeeper is deployed in K8s Pods on those nodes; even if the Zookeeper service fails, K8s reschedules and re-executes it until it recovers to a normal state, so services that depend on Zookeeper do not fail because of a Zookeeper failure.
The services of Kafka and Solr run on peer nodes: each service node can act as a master or a slave, and a single failed node does not affect the service, so these component services are deployed on the non-management service nodes as non-management services.
K8s produces containers from the service package images, and the management services are deployed inside those containers. The non-management services are deployed on the non-management nodes; since deploying them directly on physical machines yields better performance, the non-management nodes use physical machines to run the non-management services directly.
With reference to the example of FIG. 3, deploying K8s on multiple management service nodes combined with HAProxy achieves high availability for the management services of big data platform components, guaranteeing the important management services. The non-management services of the big data platform components can be deployed on physical machines according to actual business needs.
Second part: deployment policy for management services
Management services such as Ambari-Server, NameNode, ResourceManager, HMaster and Zookeeper are stateful services, and the IP address of a K8s Pod is dynamic, so they can only be exposed through the K8s Service mechanism. A Service is a label-based definition provided by K8s: the label marks the role that provides the service externally. Management services are deployed by configuring yaml files and started by K8s commands.
Besides creating container images, deploying a management service must also solve the dynamic updating of its configuration. The ConfigMap feature is used to separate the configuration information of a management service from the management service program; through ConfigMap, configuration injection can be realized via environment variables or externally mounted files. The yaml files of each management service are written through the following process.
Step A01: configure a configmap.yaml file (ConfigMap file for short).
Each management service may correspond to multiple configuration files, and each configuration file corresponds to one configmap.yaml file, which facilitates associating it with the management service inside the corresponding Pod.
The configmap.yaml file configures the parameters that the Pod relies on when starting. The ConfigMap file makes it convenient to modify the configuration parameters passed to the service in the Pod and to manage those parameters without rebuilding the image.
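As an illustration, a minimal configmap.yaml might look as follows; the ConfigMap name, file name and property values are hypothetical placeholders rather than configuration prescribed by the disclosure:

apiVersion: v1
kind: ConfigMap
metadata:
  name: namenode-config              # hypothetical name; one ConfigMap per configuration file
data:
  # The configuration file is stored as a key whose value is the file content,
  # kept separate from the service image so it can change without rebuilding the image.
  core-site.xml: |
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-svc:8020</value>
      </property>
    </configuration>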
Step A02: define a service.yaml file (Service file for short).
Each management service corresponds to one service.yaml file. Inside the container the service listens on a targetPort, which an external host cannot reach directly, so the Pod must map the container-internal targetPort to a host nodePort before external hosts can access it. The mapping between a management service's nodePort and targetPort therefore has to be defined so that the management service in a Pod on a management node can communicate with the services on the non-management nodes.
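A sketch of such a service.yaml is given below, again with hypothetical names and port numbers. The selector label is also how a stateful management service is marked as the role that provides the service externally:

apiVersion: v1
kind: Service
metadata:
  name: namenode-svc                 # hypothetical Service name
spec:
  type: NodePort                     # expose the service on a port of every host
  selector:
    app: namenode                    # label selecting the Pods that provide this role
  ports:
    - port: 8020                     # cluster-internal service port
      targetPort: 8020               # port the management service listens on inside the Pod
      nodePort: 30820                # host port reachable by the non-management (physical) nodes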
Step A03: define an rc.yaml file (RC file for short).
This step ensures that, for example, a NameNode Pod always exists in K8s; each Pod must also have corresponding storage so that its data can be stored persistently. Note that the configmap.yaml file from Step A01 is associated here, so that when the ConfigMap changes, the service inside the Pod corresponding to this RC can load the latest values.
The RC file corresponds to the controller that K8s uses to maintain the Pod count. If Zookeeper needs to be deployed on 3 nodes, i.e., as 3 Pods, and the 3 Pods share one set of yaml files, then the replica count configured in the RC file is 3 and K8s ensures that 3 Pods run simultaneously.
Since a container does not preserve data by itself, data is lost after a Pod restarts, so any data the management service needs to keep must be persisted to some path on the host. For example, the NameNode management service must record DataNode metadata in order to manage the DataNodes, and Zookeeper, as an application coordination service, likewise needs its stored metadata persisted to the host in some way.
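Putting Steps A01 to A03 together, a minimal rc.yaml for a 3-node Zookeeper could look like the following sketch; the image, paths and names are hypothetical, and the hostPath volume stands in for whatever persistent storage the hosts actually provide:

apiVersion: v1
kind: ReplicationController
metadata:
  name: zookeeper-rc                 # hypothetical controller name
spec:
  replicas: 3                        # K8s keeps exactly 3 Zookeeper Pods running
  selector:
    app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: registry.example.com/bigdata/zookeeper:latest   # hypothetical image
          volumeMounts:
            - name: conf
              mountPath: /etc/zookeeper/conf   # configuration injected from the Step A01 ConfigMap
            - name: data
              mountPath: /var/lib/zookeeper    # data directory inside the container
      volumes:
        - name: conf
          configMap:
            name: zookeeper-config             # hypothetical ConfigMap defined in Step A01
        - name: data
          hostPath:
            path: /data/zookeeper              # persisted on the host, so it survives Pod restarts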
Step A04: when the configuration is changed through Ambari's user interface, Ambari-Server notifies K8s to update the ConfigMap of Step A01 and, through it, the parameter values inside the corresponding management service.
One way for Ambari-Server to notify K8s of a configuration change is the following: Ambari-Server starts a daemon script that monitors configuration changes; when a change is found, the daemon script process modifies the corresponding ConfigMap file, and K8s then restarts the service in the corresponding Pod so that it loads the new ConfigMap file.
Third part: deployment policy for non-management services
Non-management services such as DataNode, NodeManager, HRegionServer, Kafka and Solr are deployed on physical machines. Among them, DataNode and NodeManager do not rely on Zookeeper, while HRegionServer, Kafka and Solr do. Two cases are therefore distinguished:
FIG. 4 illustrates an example cluster deployment, on physical machines, of non-management services that do not rely on a Zookeeper cluster.
The deployment procedure of a non-management service that does not depend on Zookeeper is described below. Note that because a non-management service depends on its corresponding management service, it must be deployed through Ambari only after the corresponding management service has been deployed on the K8s cluster of the management nodes.
Step B01: deploy K8s on a plurality of management service nodes to form a cluster, and deploy HAProxy;
For example, none of the HDFS, Yarn and Ambari components depends on Zookeeper. Before deploying the non-management services of these components, a K8s cluster + HAProxy environment must first be deployed on the management service nodes; the management services of these components are then deployed in the K8s cluster, and finally their non-management services are deployed through Ambari.
Step B02: write the corresponding yaml files according to the management service deployment policy, and start Pods in the K8s cluster to run the corresponding management services.
Step B03: and deploying the non-management service corresponding to the management service at the non-management node, namely the data node through a user interface UI (user interface) of Ambari.
For example, for HDFS, the core component of the Hadoop big data platform, the NameNode is first deployed on a management service node; after deployment completes, the NameNode service is started in the K8s cluster; finally, DataNode services are deployed and started on the non-management nodes as needed through the Ambari UI.
FIG. 5 illustrates an example deployment of a big data platform component that relies on a Zookeeper cluster. Such a component may or may not include management services: for example, the Kafka component contains no management service as defined by the present disclosure, while the HBase component contains both management services and non-management services as defined here. When a Zookeeper-dependent non-management service is deployed, it needs to communicate with K8s to request the creation of a Zookeeper service cluster for it. After creation, K8s dynamically loads the service domain name of the created Zookeeper cluster and the corresponding IP:nodePort into the configuration of the management and non-management services of these components.
The deployment process of the non-management services of Zookeeper-dependent big data platform components is as follows:
Step C01: deploy K8s on a plurality of management service nodes to form a cluster, and deploy HAProxy;
As described above, before deploying the non-management services of Zookeeper-dependent components, a K8s cluster + HAProxy environment must first be deployed on the management service nodes, and the Ambari management service deployed in the K8s cluster. If these Zookeeper-dependent components have management services, the corresponding management services must also be deployed at the management nodes; finally the non-management services of these components are deployed through Ambari.
Step C02: write the corresponding yaml files according to the management service deployment policy, and start Pods in the K8s cluster to run the corresponding management services.
Step C03: deploy, in the K8s cluster, the Zookeeper cluster on which the Zookeeper-dependent component depends, and dynamically load the service domain name of the created Zookeeper cluster and the corresponding IP:nodePort into the configuration of the management and non-management services of these components.
For example, when the HBase component is deployed, the Zookeeper cluster on which HBase depends may be deployed before the HBase management service, before the HBase non-management services, or after all HBase services have been deployed; the disclosure does not limit when the Zookeeper cluster is deployed. After the Zookeeper cluster is deployed successfully, its service domain name, network access address and port are loaded into the configurations of the HBase management and non-management services through a dynamic loading mechanism.
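As an illustration of what gets loaded, the per-component Zookeeper cluster could be exposed through a Service such as the following sketch; the names and ports are hypothetical. Pods inside the K8s cluster would reach the cluster by its service domain name (for example hbase-zk.default.svc.cluster.local:2181), while non-management services on physical machines would use a node IP plus the nodePort (for example <node-ip>:30181); the dynamic loading step writes the applicable address into settings such as HBase's hbase.zookeeper.quorum:

apiVersion: v1
kind: Service
metadata:
  name: hbase-zk                     # hypothetical Zookeeper cluster dedicated to HBase
spec:
  type: NodePort
  selector:
    app: hbase-zk                    # selects the Zookeeper Pods created for this component
  ports:
    - port: 2181                     # standard Zookeeper client port
      targetPort: 2181
      nodePort: 30181                # reachable from the physical (non-management) nodes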
Step C04: deploy the non-management services at the non-management service nodes, i.e., the data nodes, through the Ambari UI.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The device 600 includes: a processor 610 such as a central processing unit (CPU), a communication bus 620, a communication interface 640, and a storage medium 630. The processor 610 and the storage medium 630 may communicate with each other through the communication bus 620. The storage medium 630 stores a computer program which, when executed by the processor 610, implements the steps of the method provided by the present disclosure, or runs a service program of the management services, non-management services and distributed application coordination components described in the present disclosure.
The storage medium may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. The storage medium may also be at least one storage device located remotely from the processor. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It should be recognized that embodiments of the present disclosure can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose. Further, operations of processes described by the present disclosure may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described in this disclosure (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the disclosure may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, so that it may be read by a programmable computer, which when read by the computer may be used to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described in this disclosure includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The disclosure also includes the computer itself when programmed according to the methods and techniques described in this disclosure.
The above description is only an example of the present disclosure and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (8)

1. A big data platform component deployment method, the method comprising:
deploying a container orchestration engine cluster at a plurality of management service nodes, and deploying a high-availability component for the container orchestration engine cluster;
deploying a management service installation package of a big data platform component in the container orchestration engine cluster consisting of the plurality of management service nodes, wherein the management service installation package only comprises installation packages of management services that only participate in the distribution and scheduling of tasks in the big data platform component;
deploying non-management service installation packages of the big data platform component at one or more non-management nodes, wherein the non-management service installation packages only comprise installation packages of non-management services that only participate in the computation of tasks and the storage of data;
wherein the management services of the big data platform component run in containers, and in order to realize dynamic updating of the management service configuration, the injection of configuration information is realized by a mechanism that separates configuration information from the service program; in order to realize the continuous service capability of the management services of the big data platform component, the container orchestration engine cluster, through a control mechanism, keeps a preset number of container processes hosting the management services of the big data platform component continuously running.
2. The method of claim 1, wherein the management service installation package further comprises an installation package of the management service of a big data management platform component, and the non-management service installation package further comprises an installation package of the non-management service of the big data management platform component, and the method further comprises:
after the management service of the big data management platform component is started through the container orchestration engine, adding the one or more non-management service nodes through the management service of the big data management platform component, and then executing the step of deploying the non-management service installation packages of the big data platform component at the one or more non-management nodes.
3. The method of claim 2, wherein when the big data platform component relies on a distributed application coordination service component, the method further comprises:
deploying a distributed application coordination service component cluster for the big data platform component in the container orchestration engine cluster consisting of the plurality of management service nodes, and dynamically loading the service domain name, network access address and port of the created distributed application coordination service component cluster into the configuration of the management and non-management services of the big data platform component.
4. The method of claim 3, wherein
the container orchestration engine cluster is K8s, the high-availability component is HAProxy, and the big data platform component comprises one or more of the HDFS, Yarn, HBase, Zookeeper, Kafka and Solr components; the big data management platform component is Ambari; the distributed application coordination service component is Zookeeper;
the management service of Ambari is Ambari-Server, and the non-management service is Ambari-Agent;
the management service of HDFS is NameNode, and the non-management service is DataNode;
the management service of Yarn is ResourceManager, and the non-management service is NodeManager;
the management service of HBase is HMaster, and the non-management service is HRegionServer;
the Kafka and Solr services are non-management services.
5. A big data platform component deployment apparatus, the apparatus comprising: a plurality of management service nodes, one or more non-management service nodes, and a high-availability component;
the plurality of management service nodes are deployed with a container orchestration engine cluster, and the high-availability component provides high availability for the management services in the container orchestration engine cluster;
the management services of the big data platform component are deployed in the plurality of management service nodes; a management service is a service in the big data platform component that only participates in the distribution and scheduling of tasks;
the one or more non-management service nodes are deployed with the non-management services of the big data platform component; a non-management service is a service in the big data platform component that only participates in the computation of tasks and the storage of data;
the management services of the big data platform component run in containers, and they adopt a mechanism that separates configuration information from the service program to realize the injection of configuration information;
and the container orchestration engine cluster, through a control mechanism, keeps a preset number of container processes hosting the management services of the big data platform component continuously running.
6. The apparatus of claim 5, wherein
the management service of the big data management platform component is also deployed in the plurality of management service nodes, and the non-management service of the big data management platform component is also deployed in the one or more non-management nodes;
after the container orchestration engine starts the management service of the big data management platform component, that management service adds the one or more non-management service nodes and deploys the non-management service of the big data management platform component to the one or more non-management nodes.
7. The apparatus of claim 6, wherein
the distributed application coordination service component cluster on which the big data platform component depends is also deployed in the plurality of management service nodes, and the service domain name, network access address and port of the distributed application coordination service component cluster are loaded into the configuration of the management and non-management services of the big data platform component through a dynamic loading mechanism.
8. The apparatus of claim 7, wherein
the container orchestration engine cluster is K8s, the high-availability component is HAProxy, and the big data platform component comprises one or more of the HDFS, Yarn, HBase, Zookeeper, Kafka and Solr components; the big data management platform component is Ambari; the distributed application coordination service component is Zookeeper;
the management service of Ambari is Ambari-Server, and the non-management service is Ambari-Agent;
the management service of HDFS is NameNode, and the non-management service is DataNode;
the management service of Yarn is ResourceManager, and the non-management service is NodeManager;
the management service of HBase is HMaster, and the non-management service is HRegionServer;
the Kafka and Solr services are non-management services.
CN202110459382.4A 2021-04-27 2021-04-27 Big data platform component deployment method and device Active CN113204353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110459382.4A CN113204353B (en) Big data platform component deployment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110459382.4A CN113204353B (en) Big data platform component deployment method and device

Publications (2)

Publication Number Publication Date
CN113204353A CN113204353A (en) 2021-08-03
CN113204353B 2022-08-30

Family

ID=77028957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110459382.4A Active CN113204353B (en) Big data platform component deployment method and device

Country Status (1)

Country Link
CN (1) CN113204353B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113329102B (en) * 2021-08-04 2021-10-29 苏州浪潮智能科技有限公司 Ambari Server system and network request response method
CN114860349B (en) * 2022-07-06 2022-11-08 深圳华锐分布式技术股份有限公司 Data loading method, device, equipment and medium
CN115080149B (en) * 2022-07-20 2023-06-27 荣耀终端有限公司 Control method of terminal equipment and terminal equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11153194B2 (en) * 2019-04-26 2021-10-19 Juniper Networks, Inc. Control plane isolation for software defined network routing services

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105743995A (en) * 2016-04-05 2016-07-06 北京轻元科技有限公司 Transplantable high-available container cluster deploying and managing system and method
CN108737468A (en) * 2017-04-19 2018-11-02 中兴通讯股份有限公司 Cloud platform service cluster, construction method and device
CN111580930A (en) * 2020-05-09 2020-08-25 山东汇贸电子口岸有限公司 Native cloud application architecture supporting method and system for domestic platform
CN111880895A (en) * 2020-07-13 2020-11-03 苏州浪潮智能科技有限公司 Data reading and writing method and device based on Kubernetes platform
CN112084009A (en) * 2020-09-17 2020-12-15 湖南长城科技信息有限公司 Method for constructing and monitoring Hadoop cluster and alarming based on containerization technology under PK system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Bidirectional Deployment Technology Based on OpenStack and Kubernetes (基于OpenStack和Kubernetes的双向部署技术研究); Du Lei (杜磊); Computer Knowledge and Technology (电脑知识与技术); 2020-01-05 (No. 01); full text *

Also Published As

Publication number Publication date
CN113204353A (en) 2021-08-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant