CN111327681A - Cloud computing data platform construction method based on Kubernetes - Google Patents


Info

Publication number
CN111327681A
Authority
CN
China
Prior art keywords
data
cluster
kubernetes
cloud computing
spark
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010068966.4A
Other languages
Chinese (zh)
Inventor
王凌霄
张建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN202010068966.4A
Publication of CN111327681A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/182 Distributed file systems
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Kubernetes-based method for constructing a cloud computing data platform. Docker provides resource isolation and control, and Kubernetes manages and orchestrates the containers; Flume + Kafka handle data acquisition and the transmission and management of data streams; the Spark computing framework processes the acquired data, and HDFS (Hadoop Distributed File System) serves as the distributed storage system, allowing storage nodes to be added dynamically; ElasticSearch together with HBase + Phoenix provides query and analysis. The invention provides data acquisition, storage, analysis and monitoring for the whole platform, processes, moves and stores data efficiently, allocates resources on demand in a microservice mode to avoid unnecessary waste, and achieves load balancing, disaster recovery and elastic scaling of the cluster.

Description

Cloud computing data platform construction method based on Kubernetes
Technical Field
The invention belongs to the field of cloud computing big data, and particularly relates to a cloud computing platform and a storage framework.
Background
Nowadays, IT technology is evolving continuously and new technologies keep emerging. Cloud computing is a new computing model that, after more than ten years of rapid development, is widely used in design and research and development across many fields. The big data era built on cloud computing has arrived with it: data volumes in many industries have reached the terabyte (TB) scale and are growing explosively with the development of technologies such as AI, IoT and big data, and traditional data architectures can no longer meet the processing requirements of today's big data. Adopting cloud computing can greatly reduce the complexity of a data platform, simplify operation and maintenance, improve resource utilization and balance the load across servers.
Existing cloud computing data platforms build distributed computing on top of a distributed storage system, which greatly improves computing capacity. They provide data platforms that support services over massive data, can withstand tens of millions of page views (PV) per day, achieve second-level latency in real-time processing, process user data in real time, analyze user behavior, mine the internal relationships in the data and provide data support for a company's decision-makers. However, as data grows by terabytes every day, the pressure on the data platform also grows and the platform cannot cope with the real-time load: during peak periods it cannot scale out horizontally to thousands of nodes, deployment cannot be automated, cluster expansion and deployment require manual management, and complex operational workflows introduce various additional problems. A platform is therefore needed that can handle the real-time load, reduce human error, improve the utilization of cluster resources and improve enterprises' working efficiency.
Disclosure of Invention
To address the problems in the prior art, the invention provides a Kubernetes-based method for constructing a cloud computing data platform. The technical solution adopted by the invention is as follows:
1. Flume is used as the log collection system. It provides highly available, highly reliable, distributed collection, aggregation and transmission of massive logs: it collects log data from various log sources and stores it on HDFS for centralized statistical analysis and processing. Its overall framework has three layers: an Agent layer, a Collector layer and a Store layer. Every machine in the Agent layer runs a process responsible for log collection on that machine; the Collector layer is deployed on a central server, receives the logs sent by the Agent layer and writes them to the corresponding Store layer according to routing rules; the Store layer provides permanent or temporary log storage, or directs log streams to other servers. The Agents use a LoadBalance strategy to spread all logs evenly across all Collectors, which balances the load and allows the Collectors to scale linearly. The Collector layer has three main targets, SinkHdfs, SinkKafka and SinkBypass, which provide offline data to HDFS and real-time log streams to Kafka and Bypass, respectively;
2. Sqoop is used as the batch data migration tool: it imports data from relational databases such as MySQL and Oracle into Hadoop's HDFS, and exports data from the Hadoop file system back to relational databases;
3. Kafka is used as a pipeline for multiple types of data and as a messaging system. Its architecture has three layers: producer, broker and consumer. Producers can be server logs, business data, page views (PV) generated by the front end of web pages, and so on; brokers store the published messages and can be scaled horizontally, which ensures Kafka's high throughput; consumers pull data from the brokers and provide the data source for subsequent consumption by the Spark cluster. Kafka can guarantee real-time delivery and the ordering of the data within each partition. In addition, Kafka uses ZooKeeper to support dynamic cluster expansion and a dynamic load-balancing mechanism;
4. HDFS is used as the data storage system. Hadoop 2.x introduces a dual-NameNode architecture in which HA configures the two NameNodes as Active and Passive: the NameNode in the Active state is called the Active NameNode, and the one in the Passive state is called the Standby NameNode. The Standby NameNode is a hot backup of the Active NameNode and can automatically and gracefully take over as Active when the Active NameNode fails or must be restarted for routine server maintenance. The NameNode acts as the Master: it manages the HDFS namespace, configures the replication policy and the block-mapping information, and handles read/write requests from clients. The DataNodes act as Slaves and are where the data is actually stored. The data collected by Flume, the data migrated by Sqoop and the data collected in real time by Kafka are all stored on HDFS;
5. Spark is used as the core computing engine of the cloud computing data platform, and Spark SQL and Spark Streaming are used for real-time data processing. The core of Spark is the RDD (Resilient Distributed Dataset). Each RDD is divided into multiple partitions, different partitions run on different nodes of the cluster, and partitions are Spark's units of computation. A group of RDDs can form a directed acyclic graph for execution (the RDD Graph), and data processing on RDDs is expressed as a series of transformation and action operations. For each Job, the DAGScheduler builds a DAG of Stages and submits the Stages to the TaskScheduler, which then distributes the Tasks to executors for final execution;
6. ElasticSearch is used as the real-time search and analysis engine, so that multiple types of searches can be performed and combined. ElasticSearch is built on Lucene and provides its full indexing and search functionality. Multiple ES process instances started on multiple machines form an ES cluster; a master node is elected within the cluster and manages the nodes of the whole cluster; all nodes belong to the cluster and store data, and the cluster distributes the data evenly across them;
7. HBase and Phoenix provide real-time query over big data. HBase has a distributed architecture consisting of a Master and RegionServers: the Master coordinates the RegionServers, monitors their status and balances the load among them, while clients connect directly to the RegionServers and obtain data from HBase through its communication mechanism. Phoenix provides OLTP-related functions and a standard JDBC API, so applications originally built on JDBC can access HBase directly through Phoenix;
8. Docker containers provide resource isolation and resource control. Docker builds images for the different service modules, and Namespaces provide lightweight virtualization: each container has its own independent resources, and Network, PID, IPC, UTS, User (UID) and Mount resources are isolated from one another. The CGroups technology limits and accounts for resources such as container memory, CPU and disk I/O, providing a unified framework for resource management across the whole system and a unified interface for different applications. Each container packages all the dependencies and libraries the application needs to run, and the services are invoked through an interface-layer API. In addition, because containers are isolated at the network and process level, processes in the same Namespace can communicate with each other but those in different Namespaces cannot; communication and data forwarding are achieved by attaching network devices to a network bridge;
9. Kubernetes provides resource scheduling, automated deployment, and elastic scale-out and scale-in for containerized applications. It has a master-slave distributed architecture consisting of a Master and Nodes. The Master runs four components, etcd, APIServer, Controller Manager and Scheduler: etcd persists the resource objects in the cluster, and the other three schedule and manage cluster resources. Each Node runs three components, Kubelet, Proxy and Docker Daemon, which manage the Pods on that Node and provide load balancing and service proxying. In Kubernetes, Flannel re-plans the container cluster network so that every container in the cluster obtains a unique intranet IP address, and containers on different nodes communicate with each other through these intranet IPs.
10. Kerberos provides the authentication service in the cluster. Kerberos uses traditional shared-secret-key cryptography to enable communication between a Client and a Server in a network environment that is not necessarily secure. It is a third-party authentication mechanism: users and services rely on a trusted third party (the Kerberos server) to verify each other's identity. The Kerberos server itself is called the key distribution center (KDC).
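Item 1 describes Agents that spread logs evenly over Collectors with a LoadBalance strategy, and Collectors that route events to three sinks. The following is a minimal Python sketch of that flow under stated assumptions, not Flume's API or configuration: the `Agent` and `Collector` classes, the `realtime` flag and the routing rule are hypothetical, and plain round-robin stands in for the LoadBalance strategy (Flume's own load-balancing sink processor can pick sinks round-robin or at random).

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Collector:
    """Hypothetical Collector that routes each event to three sinks."""
    name: str
    sink_hdfs: list = field(default_factory=list)    # offline data for HDFS
    sink_kafka: list = field(default_factory=list)   # real-time stream for Kafka
    sink_bypass: list = field(default_factory=list)  # real-time stream for Bypass

    def receive(self, event: dict) -> None:
        # Assumed routing rule: every event is archived offline in HDFS, and
        # events flagged as real-time are also forwarded to Kafka and Bypass.
        self.sink_hdfs.append(event)
        if event.get("realtime"):
            self.sink_kafka.append(event)
            self.sink_bypass.append(event)


class Agent:
    """Hypothetical per-machine Agent that spreads its events over all Collectors."""

    def __init__(self, collectors: list):
        self._targets = cycle(collectors)  # round-robin stands in for LoadBalance

    def send(self, event: dict) -> None:
        next(self._targets).receive(event)


if __name__ == "__main__":
    collectors = [Collector("c1"), Collector("c2")]
    agent = Agent(collectors)
    for i in range(4):
        agent.send({"id": i, "realtime": i % 2 == 0})
    for c in collectors:
        print(c.name, [e["id"] for e in c.sink_hdfs], [e["id"] for e in c.sink_kafka])
```

Because every Agent cycles through all Collectors, adding a Collector to the list spreads the same traffic over more machines, which is the linear scaling the text refers to.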
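Item 2 uses Sqoop in both directions. As an illustration only, the sketch below assembles the argument lists for a `sqoop import` (relational table to HDFS) and a `sqoop export` (HDFS to relational table) using standard Sqoop options; the JDBC URL, table names, user and paths are hypothetical, and the commands are printed rather than run because Sqoop may not be installed.

```python
import shlex


def sqoop_import(jdbc_url: str, table: str, target_dir: str,
                 user: str, password_file: str, mappers: int = 4) -> list:
    """Argument list for `sqoop import`: relational table -> HDFS directory."""
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--username", user,
            "--password-file", password_file,  # keeps the password off the command line
            "--table", table,
            "--target-dir", target_dir,        # HDFS directory that receives the files
            "--num-mappers", str(mappers)]     # number of parallel map tasks


def sqoop_export(jdbc_url: str, table: str, export_dir: str,
                 user: str, password_file: str) -> list:
    """Argument list for `sqoop export`: HDFS directory -> relational table."""
    return ["sqoop", "export",
            "--connect", jdbc_url,
            "--username", user,
            "--password-file", password_file,
            "--table", table,
            "--export-dir", export_dir]        # HDFS directory whose files are exported


if __name__ == "__main__":
    # Hypothetical host, database, tables and paths, for illustration only.
    url = "jdbc:mysql://db-host:3306/shop"
    print(shlex.join(sqoop_import(url, "orders", "/data/raw/orders", "etl", "/user/etl/.db.pwd")))
    print(shlex.join(sqoop_export(url, "order_stats", "/data/out/order_stats", "etl", "/user/etl/.db.pwd")))
```

On a machine with Sqoop installed, the same lists could be passed to `subprocess.run`.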
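Item 4's automatic switch from the Active to the Standby NameNode can be modelled as a small state machine. This is a simplified sketch, not Hadoop's implementation: in Hadoop the job is done by the ZKFailoverController together with ZooKeeper (and `hdfs haadmin` for manual switches), whereas the `NameNode` and `FailoverController` classes here are hypothetical and health is a boolean flag.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    name: str
    state: str           # "active" or "standby"
    healthy: bool = True


class FailoverController:
    """Hypothetical controller for one Active/Standby NameNode pair."""

    def __init__(self, nn1: NameNode, nn2: NameNode):
        self.pair = (nn1, nn2)

    def active(self) -> NameNode:
        return next(nn for nn in self.pair if nn.state == "active")

    def check(self) -> NameNode:
        """Promote the standby if the active NameNode is unhealthy; return the active one."""
        current = self.active()
        standby = next(nn for nn in self.pair if nn is not current)
        if not current.healthy and standby.healthy:
            current.state = "standby"  # demote (fence) the failed node so only one is active
            standby.state = "active"   # the hot standby takes over client requests
        return self.active()
```

Demoting the failed node before promoting the other keeps exactly one Active NameNode at any time, which is the invariant the HA design depends on.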
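Item 6 relies on two cluster-level behaviours: electing one master node and spreading data evenly over the nodes. The sketch below illustrates both under simplifying assumptions; it is not Elasticsearch's algorithm. The function names are hypothetical, the election simply picks the lowest node id once a majority of master-eligible nodes is visible (the quorum requirement is what prevents two masters after a network split), and primary shards are placed round-robin without replicas.

```python
from typing import Optional


def elect_master(master_eligible: list, reachable: set) -> Optional[str]:
    """Toy election: require a majority of master-eligible nodes, then pick the lowest id."""
    visible = sorted(n for n in master_eligible if n in reachable)
    if len(visible) <= len(master_eligible) // 2:
        return None  # no quorum: refuse to elect rather than risk two masters
    return visible[0]


def allocate_shards(num_shards: int, data_nodes: list) -> dict:
    """Spread primary shards evenly over the data nodes, round-robin."""
    layout = {node: [] for node in data_nodes}
    for shard in range(num_shards):
        layout[data_nodes[shard % len(data_nodes)]].append(shard)
    return layout
```

With three master-eligible nodes, any two that can see each other can elect a master, but a single isolated node cannot.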
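Item 7 says clients talk directly to the RegionServers. To do that, a client must know which RegionServer holds the region covering a given row key; regions cover sorted, contiguous key ranges, so the lookup is a binary search over region start keys. In a real cluster the client reads this mapping from the `hbase:meta` table and caches it; the `RegionMap` class below is a hypothetical in-memory stand-in.

```python
import bisect


class RegionMap:
    """Hypothetical region map: regions sorted by start row key, each served by one RegionServer."""

    def __init__(self, regions: list):
        # regions: (start_key, region_server) pairs; start key "" marks the start of the table
        self.regions = sorted(regions)
        self.start_keys = [start for start, _ in self.regions]

    def locate(self, row_key: str) -> str:
        # The row belongs to the region with the greatest start key <= row_key.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.regions[i][1]
```

For example, with regions starting at `""`, `"g"` and `"p"`, the row `"kiwi"` falls into the region that starts at `"g"`.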
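Item 8 states that CGroups limit container resources such as memory and CPU. On a host using cgroup v2, these limits appear as plain-text interface files: `cpu.max` holds `<quota> <period>` in microseconds (or `max` for no limit) and `memory.max` holds a byte count (or `max`). For example, Docker's `--cpus=0.5` corresponds to a quota of 50000 in a 100000 µs period. The helper functions below are a hypothetical sketch for reading such values; cgroup v1 uses different files (`cpu.cfs_quota_us`, `cpu.cfs_period_us`, `memory.limit_in_bytes`).

```python
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Convert a cgroup v2 `cpu.max` value into a CPU count (None means unlimited)."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)  # e.g. 50000 us of every 100000 us = 0.5 CPU


def parse_memory_max(text: str) -> Optional[int]:
    """Convert a cgroup v2 `memory.max` value into bytes (None means unlimited)."""
    text = text.strip()
    return None if text == "max" else int(text)
```

Inside a container these files are typically found under `/sys/fs/cgroup/`, so `parse_cpu_max(open("/sys/fs/cgroup/cpu.max").read())` would give the container's CPU limit in cores.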
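Item 9 says Flannel gives every container a unique intranet IP. Conceptually, Flannel carves the cluster-wide pod network into one subnet per node and each node assigns pod addresses from its own subnet, so addresses never collide across nodes. The sketch below reproduces only that address arithmetic with Python's `ipaddress` module; `10.244.0.0/16` is the pod network used in the common kube-flannel manifest and /24 is Flannel's default per-node subnet size, while reserving `.1` for the node's bridge is an assumption. Routing traffic between nodes (for example via a VXLAN backend) is not modelled.

```python
import ipaddress
import itertools


def allocate_node_subnets(cluster_cidr: str, nodes: list, subnet_len: int = 24) -> dict:
    """Carve one non-overlapping subnet per node out of the cluster-wide pod network."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=subnet_len)
    layout = dict(zip(nodes, subnets))
    if len(layout) < len(nodes):
        raise ValueError("cluster network too small (or duplicate node names)")
    return layout


def pod_ips(subnet: ipaddress.IPv4Network, count: int) -> list:
    """Hand out `count` pod IPs from a node's subnet.

    Assumption: the first usable address (.1) is kept for the node's bridge."""
    return [str(ip) for ip in itertools.islice(subnet.hosts(), 1, 1 + count)]
```

Because each node draws only from its own subnet, no coordination between nodes is needed when Pods are created.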
Drawings
FIG. 1 is a diagram of the Kubernetes-based cloud computing data platform
FIG. 2 is a diagram of the data platform
FIG. 3 is a Spark framework diagram
FIG. 4 is a diagram of the Flume three-layer structure
FIG. 5 is a diagram of the Kafka three-layer structure
FIG. 6 is a diagram of the highly available HDFS architecture
FIG. 7 is a diagram of the HBase framework
Detailed Description
The specific implementation is shown in FIG. 1. The invention provides a Kubernetes-based method for constructing a cloud computing data platform that uses Kubernetes as the container orchestration tool and Docker containers to isolate, control and schedule the IaaS resource layer. IaaS (Infrastructure as a Service) is a resource pool formed by aggregating basic IT resources such as compute, network and storage through virtualization and dynamic allocation; end users obtain computing resources from the pool as a service to deploy their own systems and applications, without caring how this layer is implemented, and pay only for the resources they use. Etcd is an important component of Kubernetes that stores the state of all network configurations and objects in the cluster, for example Flannel's network information. Kube-DNS is Kubernetes' DNS-based service discovery module; registering Services in DNS provides simple service registration, discovery and load balancing. Kubernetes has a master-slave distributed architecture, and the Master runs four components: etcd, APIServer, Controller Manager and Scheduler. The APIServer is the single entry point for resource operations and provides authentication, authorization, access control, and API registration and discovery; the Controller Manager maintains the cluster state, for example through fault detection and automatic scaling; and the Scheduler assigns Pods to suitable nodes according to a preset scheduling policy.
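The Scheduler described above assigns each Pod to a node according to a scheduling policy. kube-scheduler does this in two phases, filtering out nodes that cannot run the Pod and scoring the rest through plugins such as NodeResourcesFit. The following hypothetical sketch keeps only that filter-then-score shape: nodes are filtered on free CPU and memory, scored by the CPU left after placement, and the winner's resources are reserved. The class and function names are illustrative, not Kubernetes APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_free: float  # cores still unreserved
    mem_free: int    # MiB still unreserved


@dataclass
class Pod:
    name: str
    cpu_request: float
    mem_request: int


def schedule(pod: Pod, nodes: list) -> Optional[str]:
    """Return the chosen node name, or None if the Pod must stay Pending."""
    # Filter: drop nodes that cannot satisfy the Pod's resource requests.
    feasible = [n for n in nodes
                if n.cpu_free >= pod.cpu_request and n.mem_free >= pod.mem_request]
    if not feasible:
        return None
    # Score: prefer the node with the most CPU left after placement;
    # ties go to the alphabetically first node name.
    best = min(feasible, key=lambda n: (-(n.cpu_free - pod.cpu_request), n.name))
    # Bind: reserve the requested resources on the chosen node.
    best.cpu_free -= pod.cpu_request
    best.mem_free -= pod.mem_request
    return best.name
```

Preferring the node with the most spare capacity spreads load across the cluster; a bin-packing policy would instead prefer the node with the least.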
Each Node runs three components: Kubelet, Proxy and Docker Daemon. Kubelet manages the lifecycle of the Pods on the Node, including their creation, modification, monitoring and deletion. Proxy obtains the configuration of Services and Endpoints from etcd, starts a proxy process on the Node according to this configuration, listens on the corresponding Service port, and distributes incoming external requests evenly across the different containers for processing. The Docker Daemon handles image management and the actual operations on Pods and containers. A data platform including HDFS, HBase, Spark, ES, Flume, Kafka and other components is built on top of the cloud computing platform; the applications, services, configurations, images and so on of the cloud computing data platform are packaged and provided through a unified external interface to meet specific business requirements. The platform also needs a monitoring system: Heapster, InfluxDB and Grafana monitor resource usage across the whole cloud computing platform. Metrics and event data in the cluster are written into InfluxDB, queried by Grafana and displayed in a graphical interface, so that operations staff can easily track resource usage across the cluster.
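The Proxy behaviour described above (listen on a Service port and spread requests over the backing containers) can be sketched as a small round-robin forwarder. This is an illustrative model only: the `ServiceProxy` class is hypothetical, current kube-proxy versions watch Services and endpoints through the API server, and the iptables mode chooses backends at random, whereas round-robin is one of the IPVS mode's scheduling options.

```python
from itertools import cycle


class ServiceProxy:
    """Hypothetical proxy for one Service: forwards each request to the next backend Pod."""

    def __init__(self, service_port: int, endpoints: list):
        self.service_port = service_port
        self.update_endpoints(endpoints)

    def update_endpoints(self, endpoints: list) -> None:
        # Called whenever the endpoint list changes, e.g. a Pod is added or removed.
        self.endpoints = list(endpoints)
        self._rr = cycle(self.endpoints)

    def route(self) -> str:
        if not self.endpoints:
            raise RuntimeError("no ready endpoints for this service")
        return next(self._rr)
```

Keeping the endpoint list replaceable at runtime is what lets Pods be scaled up or down without clients noticing, since clients only ever address the Service port.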
The architecture of the cloud computing data platform is shown in FIG. 2. Flume collects log data from the App and Web clients and transmits it through Kafka to the data platform for processing; Sqoop synchronizes the data in the databases; and the final results are stored in HDFS for later offline computation. The data computing platform is divided into offline computing and real-time computing. For real-time computing, Spark Streaming and Storm compute, online, the data that users request in real time and return the results to the users; Spark's RDD mechanism makes this fast, and a real-time data warehouse can be built on it. For offline computing, the historical data stored in HDFS is used to build an offline data warehouse through effective extraction, integration and mining; Hive reorganizes the logic and executes HiveQL by converting it into corresponding MapReduce jobs, providing data analysis for business requirements and data support for decision-making.
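Spark Streaming processes a live stream as a sequence of small batches cut at a fixed batch interval, applying the same computation to each batch. The plain-Python sketch below mimics that micro-batch model for a page-view count so that the batching logic is visible; the function names, the one-second interval and the event format are hypothetical, and no Spark API is used.

```python
from collections import Counter


def micro_batches(events: list, batch_seconds: float) -> list:
    """Group (timestamp, key) events into consecutive fixed-length batches."""
    batches = {}
    for ts, key in events:
        batches.setdefault(int(ts // batch_seconds), []).append(key)
    return [batches[i] for i in sorted(batches)]  # empty intervals are skipped


def process_stream(events: list, batch_seconds: float = 1.0):
    """Count page views per batch and keep a running total across batches."""
    running = Counter()
    per_batch = []
    for batch in micro_batches(events, batch_seconds):
        counts = Counter(batch)  # the same computation is applied to every batch
        running.update(counts)   # state carried from one batch to the next
        per_batch.append(dict(counts))
    return per_batch, running
```

The per-batch results correspond to what a real-time job would emit each interval, while the running total is the kind of state a real-time data warehouse accumulates.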
The Spark architecture is shown in FIG. 3. The Client submits the application, and the Driver runs the application submitted by the Client and splits it by creating a SparkContext, which includes RDDs, the DAGScheduler, the TaskScheduler, SparkEnv and so on. The application is converted into RDDs, the RDDs are divided into a directed acyclic graph of stages according to wide and narrow dependencies, and tasks are generated and sent to the executors for execution. The ClusterManager manages resource scheduling for the whole cluster and monitors the running status of each node through a heartbeat mechanism.
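The division of the RDD graph into stages by wide and narrow dependencies can be illustrated for a simple linear lineage: narrow operations such as `map`, `flatMap` and `filter` are pipelined in one stage, and every wide operation that needs a shuffle starts a new stage. This is a simplified sketch; Spark's DAGScheduler walks a general graph backwards from the action, whereas here the lineage is a plain list of operation names and the `WIDE` set is an illustrative assumption.

```python
# Operations assumed to need a shuffle (wide dependencies); everything else is
# treated as narrow and pipelined inside the current stage.
WIDE = {"reduceByKey", "groupByKey", "sortByKey", "repartition"}


def split_into_stages(lineage: list) -> list:
    """Cut a linear chain of RDD operations into stages at each wide dependency."""
    stages, current = [], []
    for op in lineage:
        if op in WIDE and current:
            stages.append(current)  # the shuffle boundary closes the parent stage
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages
```

Each resulting stage would then yield one task per partition for the TaskScheduler to send to executors.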
The Flume architecture is shown in FIG. 4. Flume performs data input and output through agents. An agent consists of a source, a channel and a sink: the source collects the various data from the WebService, the channel temporarily stores and buffers the collected data, and the sink sends the data to its required destination, such as HDFS or Spark.
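The source, channel and sink roles can be sketched with two threads joined by a bounded queue: the queue plays the channel, buffering events and blocking the source when it is full. This is a hypothetical model rather than Flume code; real Flume channels (for example the memory or file channel) are configured with properties such as `capacity` and `transactionCapacity` and move events in transactions.

```python
import queue
import threading

_STOP = object()  # sentinel telling the sink that the source has finished


def run_agent(events, sink, capacity: int = 100) -> None:
    """Hypothetical agent: a source thread and a sink thread joined by a bounded channel."""
    channel = queue.Queue(maxsize=capacity)  # put() blocks when full: back-pressure on the source

    def source():
        for event in events:
            channel.put(event)
        channel.put(_STOP)

    def drain():
        while True:
            event = channel.get()
            if event is _STOP:
                break
            sink(event)  # deliver to the destination, e.g. an HDFS writer

    threads = [threading.Thread(target=source), threading.Thread(target=drain)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The bounded channel decouples the two sides: a slow destination makes the source wait instead of losing events.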
The Kafka architecture is shown in FIG. 5. Like Flume, Kafka has a three-layer structure: the Producer produces data, the Broker acts as a basket that holds the data, and the Consumer consumes the data. Brokers can be scaled horizontally. The data on the Brokers is divided into multiple partitions; messages are stored sequentially in their corresponding partitions and each message is identified by its offset, so each partition guarantees the order of its data, and Kafka's high reliability and consistency help ensure the validity of the data.
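The partition and offset behaviour described above can be sketched in a few lines: a key-based hash keeps all messages with the same key in one partition, each partition assigns increasing offsets, and a consumer pulls from a partition while tracking its own offset. This is an in-memory illustration, not the Kafka client API; the `Broker` and `Consumer` classes are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keys.

```python
import zlib
from collections import defaultdict


class Broker:
    """Hypothetical broker: each partition is an append-only list; offset = list index."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key order is preserved.
        return zlib.crc32(key.encode()) % self.num_partitions

    def produce(self, key: str, value: str) -> tuple:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def fetch(self, partition: int, offset: int, max_messages: int) -> list:
        return self.partitions[partition][offset:offset + max_messages]


class Consumer:
    """Hypothetical consumer: pulls from the broker and keeps its own offset per partition."""

    def __init__(self, broker: Broker):
        self.broker = broker
        self.offsets = defaultdict(int)

    def poll(self, partition: int, max_messages: int = 10) -> list:
        batch = self.broker.fetch(partition, self.offsets[partition], max_messages)
        self.offsets[partition] += len(batch)  # advance past the consumed messages
        return batch
```

Because the broker keeps messages and each consumer tracks its own position, several consumers (for example a Spark job and an archiver) can read the same partition independently.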
The HDFS architecture is shown in FIG. 6. HDFS introduces HA: the Active NameNode handles client read/write requests, while the Standby NameNode acts as a backup of the Active NameNode and keeps its state as closely synchronized with it as possible, so that a switchover can be completed quickly when the Active NameNode fails. Both NameNodes communicate with a group of JournalNodes. The Active NameNode persists its edit log to the JournalNodes, and the Standby NameNode continuously monitors the JournalNodes; when it detects new edits, it applies them to its own namespace so that it stays consistent with the namespace metadata in the Active NameNode.
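The JournalNode exchange can be sketched as follows. In Hadoop's quorum journal design the Active NameNode treats an edit as durable once a majority of JournalNodes have stored it, and the Standby applies edits in transaction-id order. The classes below are a hypothetical toy model of that behaviour (only `mkdir` and `delete` edits, an in-memory namespace), not HDFS code.

```python
class JournalNode:
    """Hypothetical JournalNode: stores (txid, operation) edits while it is up."""

    def __init__(self, up: bool = True):
        self.up = up
        self.edits = []

    def write(self, txid: int, op: str) -> bool:
        if self.up:
            self.edits.append((txid, op))
        return self.up


class ActiveNameNode:
    def __init__(self, journals: list):
        self.journals = journals
        self.txid = 0

    def log_edit(self, op: str) -> int:
        """Persist one namespace edit; it counts only if a majority of JournalNodes store it."""
        self.txid += 1
        acks = sum(j.write(self.txid, op) for j in self.journals)
        if acks <= len(self.journals) // 2:
            raise RuntimeError("edit not durable: no JournalNode majority")
        return self.txid


class StandbyNameNode:
    def __init__(self):
        self.applied_txid = 0
        self.namespace = set()

    def tail(self, journal: JournalNode) -> None:
        """Apply, in txid order, every edit newer than the last one already applied."""
        for txid, op in sorted(journal.edits):
            if txid <= self.applied_txid:
                continue
            action, path = op.split(" ", 1)
            if action == "mkdir":
                self.namespace.add(path)
            elif action == "delete":
                self.namespace.discard(path)
            self.applied_txid = txid
```

Tracking the last applied transaction id makes tailing idempotent, so the Standby can re-read a journal without applying an edit twice.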
The HBase framework is shown in FIG. 7. HBase is a column-oriented distributed database with efficient real-time read/write performance. It has a master-slave structure: the HMaster is elected through ZooKeeper, and both the HMaster and the HRegionServers report heartbeats to ZooKeeper.
The HMaster balances the load of Regions by assigning them to suitable HRegionServers. An HRegionServer contains components such as HRegion, HLog, HFile, MemStore and StoreFile. HBase splits a table into multiple HRegions, each storing a contiguous range of the table's rows; when an HRegion reaches a threshold, it is split into two new HRegions of roughly equal size. The HLog (write-ahead log) records the written data together with the region it belongs to. An HRegion consists of multiple Stores, each containing a MemStore in memory and StoreFiles on disk; when the data in a MemStore reaches a threshold, the HRegionServer starts a flush that writes it to a StoreFile, and when the number of StoreFiles grows to a threshold, the system merges them.
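The write path described above (MemStore, flush to StoreFile, merge when there are too many files, split at the middle key) can be sketched as below. The thresholds here count cells and files for readability, whereas HBase uses settings such as `hbase.hregion.memstore.flush.size` (in bytes) and `hbase.hstore.compactionThreshold`; the `Store` class and `split_point` function are hypothetical.

```python
class Store:
    """Hypothetical Store: an in-memory MemStore plus immutable, sorted StoreFiles."""

    def __init__(self, flush_threshold: int, compact_threshold: int):
        self.flush_threshold = flush_threshold      # cells in the MemStore before a flush
        self.compact_threshold = compact_threshold  # StoreFiles before they are merged
        self.memstore = {}
        self.storefiles = []  # each StoreFile is a sorted list of (row, value)

    def put(self, row: str, value: str) -> None:
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        self.storefiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.storefiles) >= self.compact_threshold:
            self.compact()

    def compact(self) -> None:
        merged = {}
        for storefile in self.storefiles:  # oldest first, so newer values overwrite older ones
            merged.update(storefile)
        self.storefiles = [sorted(merged.items())]


def split_point(rows: list) -> str:
    """Middle row key, so a region splits into two roughly equal halves."""
    ordered = sorted(rows)
    return ordered[len(ordered) // 2]
```

Merging keeps reads cheap: after a compaction a row lookup checks one sorted file instead of many.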
Kerberos provides the authentication service in the cluster. Kerberos uses traditional shared-secret-key cryptography to enable communication between a Client and a Server in a network environment that is not necessarily secure. It is a third-party authentication mechanism: users and services rely on a trusted third party (the Kerberos server) to verify each other's identity. The Kerberos server itself is called the key distribution center (KDC). The client first requests a Ticket-Granting Ticket (TGT) from the KDC, which creates the TGT and returns it to the client in encrypted form. The client then presents its TGT to the KDC as proof of identity and requests a ticket for a specific service, and the KDC returns that service ticket to the client. Finally, the client sends the service ticket to the server, and the server grants the client access.
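The three-step ticket exchange can be sketched with the Python standard library. This is a toy model under loud assumptions: tickets are signed with HMAC instead of being encrypted, so their contents are readable; session keys, authenticators and client password checks are omitted; and the `KDC` and `Service` classes and all keys are hypothetical. It shows only why a service can trust a ticket: it verifies the ticket with a key it shares solely with the KDC.

```python
import hashlib
import hmac
import json
import time


def _seal(key: bytes, payload: dict) -> dict:
    """Stand-in for encryption: sign the payload so only holders of `key` can verify it."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"body": payload, "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}


def _verify(key: bytes, ticket: dict) -> dict:
    body = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["mac"]):
        raise PermissionError("ticket was not issued by the KDC or was altered")
    if ticket["body"]["expires"] < time.time():
        raise PermissionError("ticket has expired")
    return ticket["body"]


class KDC:
    """Hypothetical KDC sharing one secret with the ticket-granting service and one per service."""

    def __init__(self, service_keys: dict):
        self.tgs_key = b"hypothetical-tgs-key"
        self.service_keys = service_keys

    def request_tgt(self, client: str, lifetime: float = 3600.0) -> dict:
        # Step 1: the client receives a ticket-granting ticket (client authentication omitted).
        return _seal(self.tgs_key, {"client": client, "expires": time.time() + lifetime})

    def request_service_ticket(self, tgt: dict, service: str) -> dict:
        # Step 2: the client presents its TGT and asks for a ticket for one service.
        body = _verify(self.tgs_key, tgt)
        return _seal(self.service_keys[service],
                     {"client": body["client"], "service": service, "expires": body["expires"]})


class Service:
    def __init__(self, name: str, key: bytes):
        self.name = name
        self.key = key

    def accept(self, ticket: dict) -> str:
        # Step 3: the service checks the ticket with the key it shares with the KDC.
        body = _verify(self.key, ticket)
        if body["service"] != self.name:
            raise PermissionError("ticket was issued for another service")
        return body["client"]
```

A ticket whose contents are changed after issue (for example a different client name) no longer matches its signature, so the service rejects it.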

Claims (5)

1. A cloud computing data platform construction method based on Kubernetes is characterized by comprising the following contents:
using Flume as a log collection system, wherein the system provides functions of log collection, aggregation and transmission, collects log data from various log sources, stores the log data on the HDFS for centralized statistical analysis and processing, and provides offline data to the HDFS and real-time log streams to Kafka and Bypass, respectively;
using Sqoop as a tool for mutual data transfer between the relational database and the HDFS;
using Kafka as a data pipeline and a message publish-subscribe system, wherein the producers are log data generated by servers and business data generated by the back end; the broker in the middle serves as a storage array that stores the messages published by the producers; and the consumers pull data from the broker to provide a data source for the Spark cluster;
the HDFS is used as a data storage system, wherein Hadoop 2.x introduces a dual-NameNode architecture in which HA configures the two NameNodes into Active/Passive states, and the Standby NameNode serves as a hot backup of the Active NameNode and can automatically switch to Active when the Active NameNode fails or needs to be restarted for routine server maintenance; the NameNode acts as the Master, manages the HDFS namespace, configures the replication policy and block-mapping information, and processes read/write requests from clients, and the DataNodes act as Slaves and store the data collected by Flume, the data migrated by Sqoop and the data collected in real time by Kafka;
using Spark as the core computing engine of the cloud computing data platform, and using Spark SQL and Spark Streaming for real-time data processing; the core of Spark is the RDD, i.e. the Resilient Distributed Dataset, an immutable distributed collection of objects; each RDD is divided into multiple partitions, and different partitions run on different nodes of the cluster, which forms Spark's distributed computing model; the data of an RDD is processed through transformation and action operations on RDDs; the execution model of the Spark framework is abstracted as a DAG, the DAGScheduler submits the different stages of the DAG, divided according to the wide and narrow dependencies of the RDDs, to the TaskScheduler, and the tasks are then submitted to executors for final execution;
using an ElasticSearch as a real-time search and analysis engine to perform and merge multiple types of searches; the ElasticSearch is realized based on Lucene and has all indexing and searching functions, a plurality of ES process instances are started on a certain number of machines to form an ES cluster, a master node is generated in the cluster through election to manage nodes of the whole cluster, all slave nodes store all data, the nodes are subordinate to the cluster, and the data are uniformly distributed by the cluster;
the method comprises the steps that a real-time query function of big data is achieved by using HBase and Phoenix, wherein HBase adopts a distributed architecture and consists of a Master and RegionServers, the Master is used for coordinating the RegionServers, detecting their status and balancing the load among them, clients connect directly to the RegionServers, and data in HBase is obtained through a communication mechanism; Phoenix provides OLTP-related functions and a standard JDBC API, supports ACID, SQL and secondary indexes, and enables an application program originally built on JDBC to access HBase directly through Phoenix;
using Docker containers to achieve resource isolation and resource control: images of the different service modules are built with Docker, lightweight virtualization is achieved through Namespaces, and because each container has its own independent resources, the resources of different containers are isolated from one another; container resources are managed through CGroups, which provide a unified framework for managing the resources of the whole system and a unified interface for different applications; each container includes all the dependencies and libraries the application needs to run, and the packaged services are invoked through an interface-layer API; in addition, processes in the same Namespace can communicate with each other while those in different Namespaces cannot, and network devices are attached to a network bridge to provide communication and data forwarding;
using Kubernetes to provide resource scheduling, automated deployment and elastic scaling (scale-out and scale-in) for containerized applications; Kubernetes uses a master-slave distributed architecture consisting of a Master and Nodes; four components, etcd, API Server, Controller Manager and Scheduler, run on the Master, where etcd persists the resource objects of the cluster and the other three components schedule and manage cluster resources; three components, Kubelet, Proxy (kube-proxy) and the Docker Daemon, run on each Node, managing the Pods on that Node and providing load balancing and service proxy functions; in Kubernetes, Flannel re-plans the container cluster network so that every container in the cluster obtains a unique intranet IP address, and containers on different Nodes communicate with each other through these intranet IPs.
2. The method for constructing the cloud computing data platform based on Kubernetes as claimed in claim 1, wherein ZooKeeper is used as the distributed coordination service framework.
3. The method for constructing the cloud computing data platform based on Kubernetes as claimed in claim 1, wherein Heapster + InfluxDB + Grafana are used to collect and aggregate monitoring data and to report detailed resource usage at every level, which facilitates resource management and scheduling.
4. The method for constructing the cloud computing data platform based on Kubernetes as claimed in claim 1, wherein YARN is used as the resource scheduling platform.
5. The method for constructing the cloud computing data platform based on Kubernetes as claimed in claim 1, wherein Kerberos is used to provide authentication services for the cluster.
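For readers implementing the Spark element of claim 1, the following is a minimal Python sketch (not Spark's own code) of how a DAG scheduler can cut a linear chain of RDD transformations into stages: narrow dependencies such as map and filter are pipelined within one stage, while an operation that needs a shuffle (a wide dependency) starts a new stage, and each stage then runs one task per partition. The names WIDE_OPS, Op, split_into_stages and plan_tasks are hypothetical, and the set of wide operations is illustrative rather than exhaustive.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative (not exhaustive) set of transformations that require a shuffle,
# i.e. that create a wide dependency on their parent RDD.
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "repartition", "sortByKey"}


@dataclass(frozen=True)
class Op:
    name: str  # transformation name, e.g. "map" or "reduceByKey"


def split_into_stages(lineage: list[Op]) -> list[list[str]]:
    """Cut a linear lineage into stages at every wide (shuffle) dependency.

    Narrow operations are pipelined into the current stage; a wide operation
    must read shuffled data, so it begins a new stage.
    """
    stages: list[list[str]] = [[]]
    for op in lineage:
        if op.name in WIDE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary
        stages[-1].append(op.name)
    return stages


def plan_tasks(stages: list[list[str]], num_partitions: int) -> list[tuple[int, int]]:
    """List one (stage, partition) task per partition, stage by stage.

    In Spark, the TaskScheduler launches each stage's set of tasks on Executors.
    """
    return [(s, p) for s in range(len(stages)) for p in range(num_partitions)]


if __name__ == "__main__":
    word_count = [Op(n) for n in ("textFile", "flatMap", "map", "reduceByKey", "filter")]
    stages = split_into_stages(word_count)
    print(stages)  # [['textFile', 'flatMap', 'map'], ['reduceByKey', 'filter']]
    print(len(plan_tasks(stages, 4)))  # 8
```

The word-count lineage splits into two stages at reduceByKey. Real Spark also handles branching lineages and runs the map-side half of reduceByKey in the first stage; the sketch ignores both.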
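The Elasticsearch element of claim 1 mentions master election and even distribution of data across the cluster. The Python sketch below illustrates both ideas under simplifying assumptions: the "lowest node id wins" election rule is a toy stand-in (current Elasticsearch versions elect the master by quorum-based voting among master-eligible nodes), and document routing follows the documented formula shard = hash(_routing) % number_of_primary_shards but uses CRC32 in place of Elasticsearch's Murmur3 hash, so shard numbers will not match a real cluster. All function names are hypothetical.

```python
from __future__ import annotations

import zlib


def elect_master(nodes: dict[str, bool]) -> str:
    """Toy election: the lexicographically lowest master-eligible node id wins.

    `nodes` maps node id -> whether the node is master-eligible. Real
    Elasticsearch elects by quorum-based voting; this is illustrative only.
    """
    eligible = sorted(node for node, ok in nodes.items() if ok)
    if not eligible:
        raise RuntimeError("no master-eligible node in the cluster")
    return eligible[0]


def route(routing: str, num_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards (CRC32 replaces Murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def place_primaries(num_primary_shards: int, data_nodes: list[str]) -> dict[int, str]:
    """Spread primary shards round-robin so each data node holds a similar share."""
    if not data_nodes:
        raise ValueError("need at least one data node")
    return {shard: data_nodes[shard % len(data_nodes)] for shard in range(num_primary_shards)}
```

Because the routing value defaults to the document's _id, the same document always lands on the same shard, and round-robin placement keeps the number of primaries per node balanced.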
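The HBase element of claim 1 states that clients connect directly to the RegionServers. In HBase, a table's rows are split into regions, each covering a contiguous, sorted range of row keys; the client looks up the region that holds a row (via the hbase:meta table), caches the location, and then talks to that RegionServer directly. The Python sketch below models only the cached lookup step with a binary search over region start keys; the Region and RegionLocator names and the server host names are hypothetical.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: bytes  # inclusive start row key; b"" marks the table's first region
    server: str       # hypothetical RegionServer that hosts the region


class RegionLocator:
    """Toy client-side cache of region locations for one table.

    Regions cover sorted, contiguous, non-overlapping row-key ranges, so the
    region for a row is the one with the greatest start key <= the row key.
    """

    def __init__(self, regions: list[Region]) -> None:
        self._regions = sorted(regions, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self._regions]
        if not self._starts or self._starts[0] != b"":
            raise ValueError("the first region must start at the empty row key")

    def locate(self, row_key: bytes) -> str:
        # bisect_right finds the first start key > row_key; step back one.
        i = bisect.bisect_right(self._starts, row_key) - 1
        return self._regions[i].server


if __name__ == "__main__":
    locator = RegionLocator([
        Region(b"", "rs1.example:16020"),
        Region(b"g", "rs2.example:16020"),
        Region(b"p", "rs3.example:16020"),
    ])
    print(locator.locate(b"apple"))  # rs1.example:16020
```

A row key equal to a region's start key belongs to that region, which is why bisect_right (rather than bisect_left) is used.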
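Claim 1 relies on CGroups to control how much of the host each container may use. As a concrete illustration, the Python sketch below converts Docker-style limits into the values a cgroup v2 hierarchy stores in its cpu.max and memory.max files. It follows Docker's documented equivalence between --cpus=1.5 and a CFS quota of 150000 µs per 100000 µs period; the helper names are hypothetical, and the sketch only computes strings and writes nothing under /sys/fs/cgroup.

```python
from __future__ import annotations

CFS_PERIOD_US = 100_000  # 100 ms, Docker's default CFS period

_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def cpu_max(cpus: float | None, period_us: int = CFS_PERIOD_US) -> str:
    """Return the cgroup v2 cpu.max value ("$QUOTA $PERIOD") for a CPU limit.

    None means unlimited, which cgroup v2 writes as "max".
    """
    if cpus is None:
        return f"max {period_us}"
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)  # CPU time allowed per period
    return f"{quota_us} {period_us}"


def memory_max(limit: str) -> int:
    """Convert a Docker-style size such as '512m' into bytes for memory.max."""
    limit = limit.strip().lower()
    if not limit:
        raise ValueError("empty memory limit")
    unit = limit[-1]
    if unit in _UNITS:
        return int(float(limit[:-1]) * _UNITS[unit])
    return int(limit)  # plain number of bytes
```

A container started with --cpus=1.5 --memory=512m would therefore map to cpu.max = "150000 100000" and memory.max = 536870912.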
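Claim 1 states that Flannel re-plans the container network so that every container receives a unique intranet IP. Flannel does this by leasing each node a separate subnet of one cluster-wide network (its commonly used default manifest configures 10.244.0.0/16, with a /24 per node), so Pod addresses assigned locally on different nodes cannot collide. The Python sketch below imitates that allocation with the standard ipaddress module; it is a toy model, not Flannel's code, and reserving .1 for the node's bridge is an assumption of the sketch.

```python
from __future__ import annotations

import ipaddress


class SubnetAllocator:
    """Toy Flannel-style allocator.

    Each node leases its own subnet of the cluster network, and Pod IPs are
    handed out only from that node's subnet, so no two Pods in the cluster
    can receive the same address.
    """

    def __init__(self, network: str = "10.244.0.0/16", subnet_len: int = 24) -> None:
        self._free = ipaddress.ip_network(network).subnets(new_prefix=subnet_len)
        self.leases: dict[str, ipaddress.IPv4Network] = {}
        self._next_host: dict[str, int] = {}

    def lease(self, node: str) -> ipaddress.IPv4Network:
        """Return the node's subnet, leasing the next free one on first use."""
        if node not in self.leases:
            try:
                self.leases[node] = next(self._free)
            except StopIteration:
                raise RuntimeError("cluster network has no free subnets") from None
            self._next_host[node] = 2  # assumption: .1 is kept for the node's bridge
        return self.leases[node]

    def pod_ip(self, node: str) -> ipaddress.IPv4Address:
        """Assign the next unused address in the node's subnet to a new Pod."""
        subnet = self.lease(node)
        ip = subnet.network_address + self._next_host[node]
        if ip >= subnet.broadcast_address:
            raise RuntimeError(f"subnet {subnet} has no free addresses")
        self._next_host[node] += 1
        return ip
```

With the defaults, node-1 leases 10.244.0.0/24 and node-2 leases 10.244.1.0/24, so their first Pods receive 10.244.0.2 and 10.244.1.2; packets between the subnets are then carried by Flannel's backend (for example VXLAN).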
CN202010068966.4A 2020-01-21 2020-01-21 Cloud computing data platform construction method based on Kubernetes Pending CN111327681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010068966.4A CN111327681A (en) 2020-01-21 2020-01-21 Cloud computing data platform construction method based on Kubernetes

Publications (1)

Publication Number Publication Date
CN111327681A 2020-06-23

Family

ID=71171276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010068966.4A Pending CN111327681A (en) 2020-01-21 2020-01-21 Cloud computing data platform construction method based on Kubernetes

Country Status (1)

Country Link
CN (1) CN111327681A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623