WO2024011627A1 - Big data cluster deployment method and data processing method based on big data cluster - Google Patents

Big data cluster deployment method and data processing method based on big data cluster

Info

Publication number: WO2024011627A1
Authority: WIPO (PCT)
Prior art keywords: deployment, node, deployed, server, container
Application number: PCT/CN2022/106091
Other languages: English (en), French (fr)
Inventors: 张宁, 樊林, 关蕊, 何文, 褚虓, 李想
Original Assignee: 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Application filed by 京东方科技集团股份有限公司
Priority to CN202280002227.2A (CN117716335A)
Priority to PCT/CN2022/106091 (WO2024011627A1)
Priority to CN202380009266.XA (CN117716338A)
Priority to PCT/CN2023/097480 (WO2024012082A1)
Publication of WO2024011627A1

Classifications

    • G06F8/38 — Creation or generation of source code for implementing user interfaces
    • G06F8/61 — Software deployment; Installation
    • G06F8/65 — Software deployment; Updates
    • G06F9/50 — Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present invention relates to the field of computer technology, and in particular to a big data cluster deployment method and a data processing method based on the big data cluster.
  • the deployment and operation of big data clusters are mainly achieved through distributed computing platforms (such as Hadoop), so that high-speed computing and storage of data can be achieved through deployed services.
  • the present invention provides a big data cluster deployment method and a data processing method based on the big data cluster to solve the deficiencies in related technologies.
  • a big data cluster deployment method which method includes:
  • in response to a node creation operation on the deployment interface, the nodes to be deployed are displayed in the temporary resource pool area of the deployment interface;
  • the nodes are services included in the big data component and used to provide data management functions;
  • in response to the drag operation on the node to be deployed in the temporary resource pool area, the node to be deployed is displayed in the physical pool in the deployment resource pool area of the deployment interface;
  • in response to the start deployment operation in the deployment interface, a container corresponding to the node to be deployed is created on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located, and the container is used to provide big data cluster services.
  • the deployment interface includes a node creation area, the node creation area includes a node creation control and at least one big data component;
  • the nodes to be deployed are displayed in the temporary resource pool area of the deployment interface, including:
  • the node to be deployed corresponding to the selected big data component is displayed in the temporary resource pool area.
  • the node creation area also includes a node parameter setting control, which is used to set the version of the node to be deployed;
  • the nodes to be deployed corresponding to the selected big data components are displayed in the temporary resource pool area, including:
  • the node to be deployed corresponding to the version set through the node parameter setting control is displayed in the temporary resource pool area.
  • the big data components include at least HDFS components, YARN components, Hive components, and Clickhouse components.
  • the deployment resource pool area includes at least one physical pool
  • the node to be deployed is displayed in the physical pool in the deployment resource pool area of the deployment interface, including:
  • in response to the drag operation on the node to be deployed, the node to be deployed is displayed in the physical pool indicated at the end of the drag operation.
  • deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool includes:
  • the container corresponding to the node to be deployed is deployed on the server corresponding to the physical pool through the target interface, including:
  • the target plug-in is a binary package, and the target plug-in is stored at a set location in the big data cluster;
  • the acquisition process of the target plug-in includes:
  • deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool includes:
  • the first request message is used to instruct the container corresponding to the node to be deployed to be deployed on the server corresponding to the physical pool;
  • the deployment operation types include new node, move node, and unchanged node;
  • the container is deployed on the server corresponding to the physical pool.
  • in response to starting the deployment operation, after generating the first request message based on the node to be deployed and the physical pool in which the node to be deployed is located, the method further includes:
  • container deployment is performed on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the container to be deleted among the deployed containers, including:
  • the deployment operation type is move node
  • in response to starting the deployment operation, after generating the first request message based on the node to be deployed and the physical pool in which the node to be deployed is located, the method further includes at least one of the following:
  • the deployment data carried in the first request message is verified according to the preset deployment rules.
  • the method further includes at least one of the following:
  • a container deployment record corresponding to the node to be deployed is generated in the second deployment table, and the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
  • the method further includes at least one of the following:
  • the deployment status includes at least undeployed, deployed and deployment error.
  • the nodes to be deployed include multiple types, and the method further includes:
  • the target data is used to indicate the number of data items to be stored per second by the containers to be deployed;
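  • As an illustration of how such a recommendation could be computed, the following Python sketch maps a target storage rate to node counts. The function name, the preset per-node capacities, and the rounding policy are all assumptions; the patent does not disclose the concrete formula.

    import math

    # Hypothetical per-node write capacity (data items per second); the
    # patent's actual preset parameters are not disclosed.
    PRESET_CAPACITY = {"dn": 5000, "nm": 8000, "ch": 20000}

    def recommend_node_counts(items_per_second: int) -> dict:
        """Recommend how many nodes of each type to deploy for a target
        storage rate, with a minimum of one node per type."""
        return {
            node_type: max(1, math.ceil(items_per_second / capacity))
            for node_type, capacity in PRESET_CAPACITY.items()
        }

    print(recommend_node_counts(42000))  # {'dn': 9, 'nm': 6, 'ch': 3}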
  • the deployment resource pool area includes a new physical pool control, and the method further includes:
  • in response to the triggering operation on the new physical pool control, the add physical pool interface is displayed, and the add physical pool interface includes an identification acquisition control and a password acquisition control;
  • the physical pool to be added is displayed in the deployment resource pool area.
  • the method further includes:
  • the server is used to install the installation file after receiving it, so as to enable the server to join the big data cluster.
  • the method further includes:
  • the first prompt information is displayed, and the first prompt information is used to indicate the reason why the server failed to join the big data cluster successfully.
  • the method further includes:
  • a server deployment record is generated in the third deployment table, and the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
  • the method further includes:
  • the initialization status at least includes pending initialization, initializing, initialization error, and initialization completed.
  • the method further includes:
  • the target key is sent to the server corresponding to the physical pool to be added.
  • the target key is used to implement identity authentication in subsequent communication processes.
  • the deployment resource pool area includes a delete physical pool control, and one physical pool corresponds to one delete physical pool control.
  • the method further includes:
  • the physical pool corresponding to the deleted physical pool control is no longer displayed in the deployment resource pool area.
  • the method further includes:
  • the deployment resource pool area includes a top physical pool control, and one physical pool corresponds to one top physical pool control.
  • the method further includes:
  • the physical pool corresponding to the top physical pool control is displayed at the first target position in the deployment resource pool area.
  • the method further includes:
  • the server ID of the server corresponding to the physical pool is displayed at the second target location of the physical pool, and the server corresponding to the physical pool is displayed at the third target location of the physical pool.
  • the deployment interface further includes a restore settings control
  • the method further includes:
  • a third request message is generated, and the third request message is used to request deletion of the deployed server and container;
  • the big data cluster includes at least one server, and an initial server exists among the at least one server.
  • the method includes:
  • the big data component base image is used to provide a building foundation for the container;
  • different containers in the big data cluster communicate through an Overlay network.
  • a data processing method based on a big data cluster includes:
  • a data processing request is sent to the target server.
  • the target server is used to implement the data processing process through the container included on the target server based on the data processing request.
  • the container is created on the target server according to the drag-and-drop operation on the node to be deployed in the temporary resource pool area of the deployment interface, and the container is used to provide big data cluster services.
  • a data processing request is sent to the target server through the Overlay network, including:
  • when the number of target containers is greater than or equal to 2, the at least one target container includes at least a first target container and a second target container;
  • sending the data processing request to the at least one target container through the Overlay network includes:
  • a data processing request is sent to the first target container through the Overlay network, and the first target container is used to communicate with the second target container through the Overlay network to complete the response to the data processing request.
  • the data processing request is a data storage request, a data retrieval request, or a data deletion request.
  • a big data cluster deployment system and a corresponding data processing system are provided.
  • the system includes:
  • a visual operation module, used to display the deployment interface
  • the visual operation module is also used to display the nodes to be deployed in the temporary resource pool area of the deployment interface in response to the node creation operation on the deployment interface.
  • the nodes are services included in the big data component and used to provide data management functions;
  • the visual operation module is also used to display the nodes to be deployed in the physical pool in the deployment resource pool area of the deployment interface in response to the drag operation on the nodes to be deployed in the temporary resource pool area;
  • the architecture service module is used to respond to the start deployment operation in the deployment interface and create a container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located.
  • the container is used to provide big data cluster services.
  • the deployment interface includes a node creation area, the node creation area includes a node creation control and at least one big data component;
  • the visual operation module, when displaying the nodes to be deployed in the temporary resource pool area of the deployment interface in response to the node creation operation on the deployment interface, is used to:
  • the node to be deployed corresponding to the selected big data component is displayed in the temporary resource pool area.
  • the node creation area also includes a node parameter setting control, which is used to set the version of the node to be deployed;
  • the visual operation module, when displaying the node to be deployed corresponding to the selected big data component in the temporary resource pool area in response to the triggering operation on the node creation control, is used to:
  • the node to be deployed corresponding to the version set through the node parameter setting control is displayed in the temporary resource pool area.
  • the big data components include at least HDFS components, YARN components, Hive components, and Clickhouse components.
  • the deployment resource pool area includes at least one physical pool
  • the visual operation module, when displaying the nodes to be deployed in the physical pool in the deployment resource pool area of the deployment interface in response to the drag operation on the nodes to be deployed in the temporary resource pool area, is used to:
  • in response to the drag operation on the node to be deployed, display the node to be deployed in the physical pool indicated at the end of the drag operation.
  • the architecture service module, when creating the container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located in response to the start deployment operation in the deployment interface, is used to:
  • the architecture service module, when deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool through the target interface, is used to:
  • the target plug-in is a binary package, and the target plug-in is stored at a set location in the big data cluster;
  • the process of obtaining the target plug-in includes:
  • the architecture service module, when creating the container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located in response to the start deployment operation in the deployment interface, is used to:
  • a first request message is generated based on the node to be deployed and the physical pool in which the node to be deployed is located, and the first request message is used to instruct the container corresponding to the node to be deployed to be deployed on the server corresponding to the physical pool;
  • the deployment operation types include new node, move node, and unchanged node;
  • the container is deployed on the server corresponding to the physical pool.
  • the architecture service module is also used to store the first request message in the first message queue
  • the system also includes:
  • the message module is used to obtain the first request message from the first message queue
  • the architecture service module is also used to, when the message module obtains the first request message, execute the steps of determining, based on the first request message and the containers already deployed on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
  • the architecture service module is used to deploy containers on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the container to be deleted in the deployed container, and is used to:
  • the deployment operation type is move node
  • the architecture service module is also used to verify the data format of the first request message
  • the architecture service module is also used to verify the deployment data carried in the first request message according to the preset deployment rules.
  • the system further includes a database module
  • the database module is used to generate an operation record in the first deployment table in response to the first request message, and the operation record is used to record this deployment operation;
  • the database module is also configured to respond to the first request message and generate a container deployment record corresponding to the node to be deployed in the second deployment table, where the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
  • the database module is also used to record the deployment status of this operation in the operation record
  • the database module is also used to record the deployment status of the container corresponding to the node to be deployed in the container deployment record;
  • the deployment status includes at least undeployed, deployed and deployment error
  • the nodes to be deployed include multiple types
  • the visual operation module is also used to display the deployment instruction interface
  • the visual operation module is also used to obtain the target data filled in by the user through the deployment instruction interface.
  • the target data is used to indicate the number of pieces of data stored per second of the container to be deployed;
  • the architecture service module is also used to determine the recommended number of deployments corresponding to the various types of nodes to be deployed based on the target data and preset parameters.
  • the deployment resource pool area includes a new physical pool control; the visual operation module is also used to:
  • display the add physical pool interface in response to the triggering operation on the new physical pool control, where the add physical pool interface includes an identification acquisition control and a password acquisition control;
  • the physical pool to be added is displayed in the deployment resource pool area.
  • the architecture service module is also used to generate a second request message when the password to be verified passes the verification;
  • the architecture service module is also used to store the second request message in the second message queue
  • the system also includes:
  • a message module used to obtain the second request message from the second message queue
  • the architecture service module is also used to send an installation file to the server corresponding to the physical pool to be added based on the second request message.
  • the server is used to install the installation file after receiving it, so that the server can join the big data cluster.
  • the visual operation module is also used to display first prompt information when the password to be verified fails the verification or the server fails to successfully join the big data cluster; the first prompt information is used to indicate the reason why the server failed to join the big data cluster.
  • the system further includes:
  • the database module is used to generate a server deployment record in the third deployment table when the password to be verified passes the verification.
  • the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
  • the database module is also used to record the initialization status of the server corresponding to the physical pool to be added in the server deployment record.
  • the initialization status at least includes to be initialized, initializing, initialization error, and initialization completed.
  • the deployment resource pool area includes a delete physical pool control, one physical pool corresponding to one delete physical pool control;
  • the visual operation module is also used to respond to the trigger operation of any deleted physical pool control and no longer display the physical pool corresponding to the deleted physical pool control in the deployment resource pool area.
  • the architecture service module is also configured to delete deployed containers from the server corresponding to the physical pool corresponding to the deletion of the physical pool control in response to the triggering operation of any deletion of the physical pool control.
  • the deployment resource pool area includes a top physical pool control, and one physical pool corresponds to a top physical pool control;
  • the visual operation module is also configured to display the physical pool corresponding to the top physical pool control at the first target position in the deployment resource pool area in response to a triggering operation on any top physical pool control.
  • the visual operation module is also configured to display, for any physical pool displayed in the deployment resource pool area, the server identifier of the server corresponding to the physical pool at the second target location of the physical pool, and to display the current storage usage, memory usage, and allocated memory usage of the server corresponding to the physical pool at the third target location.
  • the deployment interface also includes a restore settings control
  • the architecture service module is also used to generate a third request message in response to the triggering operation on the restore settings control, and the third request message is used to request deletion of the deployed server and container;
  • the architecture service module is also used to delete multiple deployed containers from the deployed server based on the third request message, and execute the third preset script file to separate the deployed server from the big data cluster.
  • the big data cluster includes at least one server, an initial server exists in at least one server, and the architecture service module is also used for:
  • the big data component base image is used to provide a building foundation for the container;
  • the system also includes a network module for ensuring cross-server communication between containers.
  • the network module is used to send the data processing request to the target container through the Overlay network after obtaining the data processing request.
  • the target container is used to implement the data processing process based on the data processing request.
  • the container is created on the server according to the drag-and-drop operation on the node to be deployed in the deployment interface, and the container is used to provide big data cluster services.
  • the network module, when sending data processing requests to the target container through the Overlay network, is used to:
  • when the number of target containers is greater than or equal to 2, the at least one target container includes at least a first target container and a second target container;
  • the network module, when sending data processing requests to at least one target container through the Overlay network, is used to:
  • a data processing request is sent to the first target container through the Overlay network, and the first target container is used to communicate with the second target container through the Overlay network to complete the response to the data processing request.
  • the data processing request is a data storage request, a data retrieval request, or a data deletion request.
  • the system also includes a big data component plug-in module, and the big data component plug-in module is used to start the container on the server.
  • a computing device includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the operations performed by the big data cluster deployment method provided in the above first aspect and any one of the first aspect are implemented, or the operations performed by the data processing method based on the big data cluster provided in the above second aspect and any one of the second aspect are implemented.
  • a computer-readable storage medium is provided.
  • a program is stored on the computer-readable storage medium.
  • when the program is executed by a processor, the operations performed by the big data cluster deployment method provided in the above first aspect and any one of the first aspect are implemented, or the operations performed by the data processing method based on the big data cluster provided in the above second aspect and any one of the second aspect are implemented.
  • a computer program product includes a computer program.
  • when the computer program is executed by a processor, it implements the methods provided in the above first aspect and any one of the first aspect.
  • the present invention provides a deployment interface to provide big data cluster deployment functions through the deployment interface.
  • the method includes: in response to a node creation operation in the deployment interface, displaying the node to be deployed in the temporary resource pool area of the deployment interface; in response to the drag operation on the node to be deployed in the temporary resource pool area, displaying the node to be deployed in the physical pool in the deployment resource pool area of the deployment interface; and in response to the start deployment operation in the deployment interface, creating, according to the physical pool where the node to be deployed is located, a container corresponding to the node to be deployed on the server corresponding to the physical pool, so as to provide big data cluster services through the container.
  • the present invention ensures communication between containers in the big data cluster through the Overlay network, so that when a data processing request is obtained, the data processing request can be sent to the target container through the Overlay network, so that the target container can implement data based on the data processing request. Processing process to meet the data processing needs of users.
  • Figure 1 is a flow chart of a big data cluster deployment method according to an embodiment of the present invention.
  • Figure 2 is a schematic diagram of a deployment interface according to an embodiment of the present invention.
  • Figure 3 is a schematic diagram of an interface for adding a physical pool interface according to an embodiment of the present invention.
  • Figure 4 is a schematic diagram of a deployment interface according to an embodiment of the present invention.
  • Figure 5 is a flow chart of a recommended deployment process according to an embodiment of the present invention.
  • Figure 6 is a schematic diagram of a deployment interface according to an embodiment of the present invention.
  • Figure 7 is a schematic diagram of a node allocation process according to an embodiment of the present invention.
  • Figure 8 is a schematic diagram showing the principle of a drag function according to an embodiment of the present invention.
  • Figure 9 is a schematic diagram illustrating a principle of modifying style data according to an embodiment of the present invention.
  • Figure 10 is a schematic diagram of configuration files that need to be modified when performing different operations for various big data components according to an embodiment of the present invention.
  • Figure 11 is a schematic diagram of deployment data according to an embodiment of the present invention.
  • Figure 12 is a schematic diagram of deployment data according to an embodiment of the present invention.
  • Figure 13 is a schematic flowchart of a process of restoring factory settings according to an embodiment of the present invention.
  • Figure 14 is a schematic diagram of a redis.conf configuration file according to an embodiment of the present invention.
  • Figure 15 is a schematic diagram of a Redis cluster building process according to an embodiment of the present invention.
  • Figure 16 is a flow chart of a data processing method based on a big data cluster according to an embodiment of the present invention.
  • Figure 17 is a flow chart of a module interaction process according to an embodiment of the present invention.
  • Figure 18 is a flow chart of another module interaction process according to an embodiment of the present invention.
  • Figure 19 is a flow chart of another module interaction process according to an embodiment of the present invention.
  • Figure 20 is a schematic structural diagram of a computing device according to an exemplary embodiment of the present invention.
  • the present invention provides a big data cluster deployment method and a data processing method based on the big data cluster.
  • since the big data platform occupies a large number of machines and has high deployment and usage thresholds, the big data platform can be transformed in a lightweight way and its components can be deployed in containers, reducing the number of machines required for a big data platform.
  • the method provided by the present invention can provide a visual operation interface, that is, a deployment interface, so that relevant technical personnel can deploy big data clusters through simple drag and drop operations. This reduces the technical threshold of the big data cluster deployment process, allows relevant technical personnel to quickly complete cluster deployment, expansion, contraction, reset, and other functions, improves deployment efficiency, reduces deployment costs, and allows ordinary technical personnel to complete the deployment.
  • the method provided by the present invention can be applied to computing devices, and the computing devices can be servers, such as one server, multiple servers, server clusters, etc.
  • the present invention does not limit the type and number of computing devices.
  • the big data cluster deployment method and the data processing method based on the big data cluster provided by the present invention are introduced respectively.
  • Figure 1 illustrates a big data cluster deployment method according to an embodiment of the present invention.
  • the method includes:
  • Step 101 Display the deployment interface.
  • Step 102 In response to the node creation operation on the deployment interface, the nodes to be deployed are displayed in the temporary resource pool area of the deployment interface.
  • the nodes are services included in the big data component and used to provide data management functions.
  • the temporary resource pool is equivalent to a virtual pool.
  • the temporary resource pool is set up to facilitate the user's drag and drop operation.
  • the nodes stored in the temporary resource pool are not actually deployed; they are nodes waiting to be deployed.
  • by displaying the nodes to be deployed generated by the node creation operation in the interface area corresponding to the temporary resource pool (that is, the temporary resource pool area), the nodes can subsequently be dragged and dropped into the interface area corresponding to the deployment resource pool (that is, the deployment resource pool area), so that deployment can be performed based on how the nodes are placed in the deployment resource pool.
  • Step 103 In response to the drag operation on the node to be deployed in the temporary resource pool area, display the node to be deployed in the physical pool in the deployment resource pool area of the deployment interface.
  • the nodes displayed in the deployment resource pool are the nodes that are actually deployed.
  • the deployment resource pool includes at least one physical pool. Each physical pool is an actual machine. The resources of different machines can be integrated through the deployment resource pool. Using it, relevant technical personnel can deploy containers according to actual needs.
  • Step 104 In response to the start deployment operation in the deployment interface, create a container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located.
  • the container is used to provide big data cluster services.
  • the present invention provides a deployment interface to provide big data cluster deployment functions through the deployment interface.
  • Relevant technical personnel can drag and drop nodes and trigger operations on components in the deployment interface.
  • the background can respond to the corresponding operations in the deployment interface and automatically complete the deployment of the container to provide big data cluster services through the container.
  • the above deployment process can greatly simplify the operations of relevant technical personnel, thereby reducing the deployment cost of big data clusters and improving deployment efficiency.
  • n servers can be prepared in advance, and n can be a positive integer greater than or equal to 1.
  • these n servers can be recorded as S1, S2,..., Sn.
  • any one server among the at least one server can be selected as the initial server, so that the required network environment can be pre-deployed on the initial server to realize the construction of the big data cluster through the deployed network environment.
  • the process of pre-deploying the required network environment on the initial server may include the following steps:
  • Step 1 Install the target operating environment on the initial server, and configure the interface corresponding to the target operating environment on the initial server.
  • the target running environment can be an application engine container (Docker) environment
  • the interface corresponding to the target running environment can be a Docker application programming interface (Application Programming Interface, API).
  • server S1 can be used as the initial server, so that the Docker environment can be installed on server S1 and the Docker API can be configured on server S1, so that subsequent operations on the Docker engines of other servers can be supported through the Docker API (for example, operating the Docker engine in a RESTful way).
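  • For illustration, a remotely exposed Docker API of this kind can be driven with the Docker SDK for Python; the following sketch assumes the engine on the initial server listens on TCP port 2375 (the patent does not specify a client library or port).

    import docker

    # Connect to the Docker engine exposed on the initial server
    # (assumed address and port).
    client = docker.DockerClient(base_url="tcp://10.10.177.19:2375")
    print(client.version()["Version"])  # confirm the API is reachable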
  • Step 2 Create an Overlay network corresponding to the target operating environment on the initial server, and initialize the cluster environment on the initial server.
  • the Overlay network is a logical network that uses network virtualization to establish connections on the physical infrastructure.
  • in addition to the Overlay network, big data clusters also include an Underlay network. The Underlay network is the physical network responsible for transmitting data packets; it is composed of switches, routers, and other equipment, and is driven by Ethernet protocols, routing protocols, and VLAN protocols. The Overlay network realizes the separation of the control plane and the forwarding plane to meet the cross-host communication needs of containers: one or more logical networks can be created on the existing physical network through tunneling technology, without any modification to the physical network, effectively solving many problems existing in physical data centers and realizing data center automation and intelligence.
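  • A minimal sketch of this step using the Docker SDK for Python follows; the advertise address matches the initial server mentioned later, while the network name is an assumption.

    import docker

    client = docker.from_env()

    # Initialize the Swarm cluster environment on the initial server.
    client.swarm.init(advertise_addr="10.10.177.19")

    # Create an attachable Overlay network so that containers on any
    # cluster member can communicate across hosts.
    client.networks.create("bigdata-overlay", driver="overlay", attachable=True)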
  • Step 3 Create a big data component base image on the initial server.
  • the big data component base image is used to provide a building foundation for the container.
  • a big data component basic Docker image can be created on the initial server to provide the startup function of the big data cluster service container through the big data component basic Docker image.
  • the environment and software required by various big data components can be pre-packaged into Docker image tape archive (Tar) packages, and the packaged Docker image Tar packages can be uploaded to the initial server, so that the initial server can install the Docker image Tar packages to create the base images of the big data components.
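  • Loading a pre-packaged image Tar package on the initial server could look like the following sketch; the archive file name is an assumption.

    import docker

    client = docker.from_env()

    # Load the pre-packaged big data component base image from its Tar package.
    with open("hdfs-yarn-zookeeper.tar", "rb") as f:  # assumed file name
        images = client.images.load(f.read())
    for image in images:
        print("loaded:", image.tags)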
  • big data clusters can include various types of components, such as distributed file system (Hadoop Distributed File System, HDFS) components, resource coordinator (Yet Another Resource Negotiator, YARN) components, distributed application coordination service (Zookeeper) components, database tool (Clickhouse) components, data warehouse tool (Hive) components, security management (Knox) components, monitoring tool (such as Prometheus, Grafana) components, etc. Big data clusters can also include other types of components; the present invention does not limit the specific component types.
  • the environment and software required by different types of big data components can be packaged into a single Docker image Tar package, or into different Docker image Tar packages. For example, the HDFS component, YARN component, and Zookeeper component are packaged into one Docker image Tar package, while the Clickhouse component, Hive component, Knox component, and monitoring tool components are each packaged into a Docker image Tar package of their own. The present invention does not limit the specific method used.
  • Step 4. Generate the target key file on the initial server.
  • a symmetric encryption algorithm or an asymmetric encryption algorithm can be used to generate the key.
  • other algorithms can also be used to generate the key, which is not limited by the present invention.
  • the target key can be a Secure Shell protocol (Secure Shell, SSH) public and private key.
  • by generating the target key file, when a new server is added or a new container is created in the big data cluster, the target key file can be shared with the newly added server or newly created container, so that password-free communication can subsequently be achieved between servers, or between servers and containers, through the target key.
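  • As one way of realizing this step, an SSH key pair can be generated programmatically; the sketch below uses the paramiko library, and the key type, size, and file paths are assumptions.

    import paramiko

    # Generate an RSA key pair to serve as the target key file.
    key = paramiko.RSAKey.generate(bits=2048)
    key.write_private_key_file("/root/.ssh/id_rsa")  # assumed path

    # The public half is shared with newly added servers or containers so
    # that subsequent communication needs no password.
    with open("/root/.ssh/id_rsa.pub", "w") as f:
        f.write("ssh-rsa %s bigdata-cluster\n" % key.get_base64())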
  • through the above steps, the basic network environment required to build a big data cluster is completed, so that other servers can be added to the big data cluster based on the built basic network environment to build a big data cluster including multiple servers, and containers can be deployed in the big data cluster based on the built basic network environment to provide services to users through the deployed containers.
  • a new physical pool control can be set in the deployment interface, so that a server can be added to the big data cluster through the new physical pool control.
  • Figure 2 is a schematic diagram of a deployment interface according to an embodiment of the present invention. As shown in Figure 2, the deployment interface is divided into a node creation area, a temporary resource pool area, and a deployment resource pool area, where the "Add Physical Pool" button set in the deployment resource pool area is the new physical pool control. By clicking the "Add Physical Pool" button, a server can be added to the big data cluster.
  • the following process can be used to add a new physical pool, thereby adding a server to the big data cluster:
  • Step 1 In response to the triggering operation of the new physical pool control, the add physical pool interface is displayed.
  • the added physical pool interface includes an identification acquisition control and a password acquisition control.
  • Figure 3 is a schematic diagram of an interface for adding a physical pool according to an embodiment of the present invention. After the new physical pool control is triggered, the add physical pool interface shown in Figure 3 can be displayed on the visual interface, in which the input box with the text prompt "IP" is the identification acquisition control, and the input box with the text prompt "Password" is the password acquisition control.
  • Step 2 Obtain the server ID corresponding to the physical pool to be added through the ID acquisition control, and obtain the password to be verified through the password acquisition control.
  • relevant technical personnel can enter the server ID of the server to be added to the big data cluster in the ID acquisition control and enter the preset password in the password acquisition control, so that the computing device can obtain the server ID corresponding to the physical pool to be added through the ID acquisition control and obtain the password to be verified through the password acquisition control. The password to be verified can then be verified.
  • Step 3 If the password to be verified passes the verification, the physical pool to be added is displayed in the deployment resource pool area.
  • by setting the identification acquisition control, the user can enter the server ID of the server to be added to the big data cluster, meeting the user's customization needs; by setting the password acquisition control, the user can enter the password to be verified in the add physical pool interface, so that the user's identity can be verified based on the password to be verified to determine whether the user has the right to join the server to the big data cluster, thereby ensuring the security of the big data cluster deployment process.
  • the addition of a physical pool can be achieved through the following process:
  • a second request message is generated and stored in the second message queue; the second request message is then obtained from the second message queue, and, based on the second request message, the installation file is sent to the server corresponding to the physical pool to be added, so that the server can install the installation file after receiving it and thereby join the big data cluster.
  • the second request message may be request message data in JavaScript Object Notation (JSON) format.
  • the second request message may also be other types of message data, which is not limited by the present invention.
  • the second request message can be a code in the following form:
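  • The message body itself is not reproduced in this text; the following is a hypothetical sketch of what a JSON-formatted second request message might contain, expressed with Python's json module. Every field name here is an assumption.

    import json

    # Hypothetical second request message for adding a physical pool.
    second_request = {
        "op": "add_physical_pool",       # assumed operation identifier
        "server_ip": "10.10.177.18",     # server ID entered in the interface
        "password": "<to-be-verified>",  # password to be verified
    }
    print(json.dumps(second_request, indent=2))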
  • the computing device can implement a corresponding processing process based on the second request message.
  • the installation file can include the Docker image Tar packages corresponding to the various types of components, a Red Hat Package Manager (RPM) installation package, etc.; after the installation file is received, installation of the preset scripts (including a first preset script and a second preset script) is implemented by installing the RPM package.
  • the first preset script is used to implement the environment installation function
  • the second preset script is used to implement the cluster joining function.
  • the Docker image Tar package can be installed through the first preset script to implement the environment installation on the server to be added to the cluster, and the second preset script can then be executed on that server so that it joins the Docker Swarm cluster of the initial server.
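  • The cluster-joining action performed by the second preset script could, purely for illustration, be expressed with the Docker SDK for Python as follows; the manager address and join token are placeholders, and the patent describes this step as a script rather than this exact call.

    import docker

    client = docker.from_env()  # runs on the server being added

    # Join the Docker Swarm cluster whose manager is the initial server.
    client.swarm.join(
        remote_addrs=["10.10.177.19:2377"],    # initial server's manager address
        join_token="SWMTKN-1-<worker-token>",  # placeholder worker join token
    )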
  • the computing device may be associated with a database, which may be used to store deployment records in the big data cluster.
  • the database may include a third deployment table, and the third deployment table may be used to record the operation of adding a physical pool.
  • a server deployment record may be generated in the third deployment table, and the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
  • the initialization status of the server corresponding to the physical pool to be added can be recorded in the server deployment record.
  • the initialization status at least includes to be initialized, initializing, initialization error, and initialization completed, so that the computing device can display the physical pool to be added in the deployment resource pool area based on the initialization status recorded in the server deployment record.
  • physical pools to be added can be displayed in different colors based on the initialization status recorded in the server deployment record. For example, when the initialization status recorded in the server deployment record is to be initialized or being initialized, the physical pool to be added can be displayed in blue; when the initialization status recorded in the server deployment record is initialization completed, the physical pool to be added can be displayed. The added physical pool is displayed in white; when the initialization status recorded in the server deployment record is an initialization error, the physical pool to be added can be displayed in red so that relevant technicians can visually observe the initialization status of the server.
  • For example, when the physical pool to be added is first displayed in the deployment resource pool area, the initialization status recorded in the server deployment record of the corresponding server is to be initialized, and the physical pool to be added is displayed in blue. While the recorded initialization status is initializing, the physical pool to be added remains blue. When the recorded initialization status is initialization completed, the physical pool to be added is displayed in white. When the recorded initialization status is initialization error, the physical pool to be added is displayed in red.
  • the computing device can query the server initialization status every preset time period, and update the display style of the physical pool to be added based on the queried initialization status.
  • the preset duration may be 10 seconds.
  • the preset duration may also be other durations.
  • the present invention does not limit the specific value of the preset duration.
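  • A sketch of this periodic status check follows; only the 10-second interval and the four initialization states come from the text, while the query and display functions are assumptions.

    import time

    # Map the recorded initialization status to the display color described above.
    STATUS_COLOR = {
        "to_be_initialized": "blue",
        "initializing": "blue",
        "initialization_completed": "white",
        "initialization_error": "red",
    }

    def poll_initialization_status(query_status, update_display, interval_s=10):
        """Query the server initialization status every interval_s seconds and
        refresh how the physical pool to be added is displayed."""
        while True:
            status = query_status()               # e.g. read the server deployment record
            update_display(STATUS_COLOR[status])  # recolor the physical pool
            if status in ("initialization_completed", "initialization_error"):
                break
            time.sleep(interval_s)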
  • the reason for the failure can also be recorded in the server deployment record so that relevant technical personnel can troubleshoot the problem.
  • the reason for the failure can be an incorrect IP address or password, an inability to connect to the server corresponding to the physical pool to be added, etc.
  • the request triggering time, the server identification of the server corresponding to the physical pool to be added, the time when the server corresponding to the physical pool to be added successfully joined the big data cluster, etc. can also be recorded in the server deployment record.
  • the present invention does not limit the specific content of the server deployment record.
  • if the server successfully joins, it forms a Docker Swarm cluster with the existing servers in the big data cluster. If the server cannot successfully join the big data cluster, the initialization status recorded in the server deployment record is initialization error, and the server deployment record also records the reason for the failure. When the computing device queries an initialization status of initialization error, it can obtain the failure reason recorded in the server deployment record and display the first prompt information based on it, so that the first prompt information indicates the reason why the server failed to join the big data cluster, allowing relevant technical personnel to perform targeted processing in a timely manner.
  • in one possible implementation, the password to be verified is verified based on the second request message. That is, after the second request message is generated, it can be stored in the second message queue, so that the second request message can subsequently be obtained from the second message queue and the password to be verified can be verified based on it.
  • the message queue is used to store request messages, which ensures that if there is a problem with request message processing, the request message that has not been successfully processed can be retrieved from the message queue and retried without the user having to manually re-perform operations in the interface, which simplifies user operations and thus improves the user experience.
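  • For illustration, such a request message queue could be backed by a Redis list via the redis-py client; pairing the message queue with Redis is an assumption here, although the figures do mention a Redis cluster.

    import json
    import redis

    r = redis.Redis(host="10.10.177.19", port=6379)  # assumed address

    # Producer: store the second request message in the second message queue.
    r.lpush("second_message_queue", json.dumps({"server_ip": "10.10.177.18"}))

    # Consumer: block until a message arrives, then process it; on failure
    # the message can be pushed back and retried without user involvement.
    _, raw = r.brpop("second_message_queue")
    message = json.loads(raw)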
  • when the password to be verified passes the verification, the target key can be sent to the server corresponding to the physical pool to be added, so that the target key can be used to implement identity authentication in the subsequent communication process without logging in, ensuring the security of the communication process.
  • Figure 4 is a schematic diagram of a deployment interface according to an embodiment of the present invention. As shown in Figure 4, compared with the deployment interface shown in Figure 2, two servers have been added on the basis of the initial server (server ID 10.10.177.19); the server IDs of these two servers are 10.10.177.18 and 10.10.177.20 respectively.
  • when displaying a physical pool in the deployment resource pool area, relevant information about the physical pool can also be displayed, such as the server ID of the server corresponding to the physical pool and its current storage usage, memory usage, and allocated memory usage.
  • the server ID of the server corresponding to the physical pool can be displayed at the second target location of the physical pool, and the current storage usage, memory usage, and allocated memory usage of the corresponding server can be displayed at the third target location of the physical pool.
  • the second target position may be the upper left corner of the displayed physical pool
  • the third target position may be the lower right corner of the displayed physical pool.
  • the deployment interface shown in Figure 4 displays the server ID of the corresponding server in the upper left corner of each physical pool, and displays the current storage usage, memory usage, and allocated memory usage of the corresponding server in the lower right corner of each physical pool.
  • the second target position and the third target position can also be other positions, which are not limited by the present invention.
  • the status data of each server can be obtained in real time, so that the status of each server can be displayed at the corresponding location on the physical pool based on the obtained status data.
  • by displaying the server status based on status data obtained in real time, the timeliness and validity of the data can be ensured, making the server status the user obtains from the displayed content more authentic and reliable.
  • through the above process, the hardware environment of the big data cluster can be built to obtain a big data cluster including at least one server, so that containerized deployment can be performed on the at least one server and the big data cluster can provide users with big data processing capabilities.
  • the deployment interface includes a node creation area including a node creation control and at least one big data component.
  • big data components can include HDFS components, YARN components, Clickhouse components, Hive components, Knox components, monitoring tool components and other types of components
  • some components are configured by default when configuring the initial server and require no manual operation by the user; therefore, not all of the above big data components are displayed in the node creation area.
  • the components displayed in the node creation area can include HDFS components, YARN components, Clickhouse components, and Hive components.
  • HDFS components can be used to provide data storage functions. That is, to provide users with data storage functions, the containers corresponding to the nodes of the HDFS component need to be deployed in the big data cluster, so that distributed data storage services can be provided to users through the deployed containers to meet their needs.
  • YARN components can be used to provide data analysis functions. That is, to provide users with data analysis functions, the containers corresponding to the nodes of the YARN component need to be deployed in the big data cluster, so that data can be obtained from the containers corresponding to the nodes of the HDFS component through the containers corresponding to the nodes of the YARN component, and data analysis can be performed based on the obtained data to meet the user's data analysis needs.
  • the Hive component can convert the data stored in the container corresponding to the node of the HDFS component into a queryable data table, so that data query and processing can be performed based on the data table to meet the user's data processing needs.
  • both YARN components and Hive components can provide users with data analysis functions. The difference is that to implement the data analysis process with the YARN component, a series of code must be developed to complete the data processing task; after the task is submitted to the YARN component, the corresponding data processing is carried out through the developed code based on the data processing task. To implement the data analysis process with the Hive component, only Structured Query Language (SQL) statements are needed to handle the data processing tasks.
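  • To illustrate the difference, querying data through the Hive component needs only an SQL statement; the sketch below uses the PyHive client, and the client library, connection address, and table are assumptions.

    from pyhive import hive

    # Connect to the Hive service container (assumed address and port).
    conn = hive.Connection(host="10.10.177.19", port=10000)
    cursor = conn.cursor()

    # A single SQL statement replaces the custom code a YARN job would need.
    cursor.execute("SELECT category, COUNT(*) FROM events GROUP BY category")
    for row in cursor.fetchall():
        print(row)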
  • the Clickhouse component is a columnar storage database that can be used to meet users' storage needs for large amounts of data. Compared with commonly used row storage databases, the Clickhouse component has a faster read speed. Moreover, the Clickhouse component can store data in partitions, and users can obtain the data of only one or a few partitions for processing according to their actual needs, without obtaining all the data in the database, thus reducing the data processing pressure on computing devices.
  • the big data components displayed include HDFS components, YARN components, Hive components and Clickhouse components.
  • the "Apply" button in the deployment interface shown in Figure 4 is the node creation control.
  • in step 102, displaying the nodes to be deployed in the temporary resource pool area of the deployment interface in response to the node creation operation on the deployment interface can be implemented in the following manner:
  • the node to be deployed corresponding to the selected big data component is displayed in the temporary resource pool area.
• HDFS components include NameNode (nn) nodes, DataNode (dn) nodes, and SecondaryNameNode (sn) nodes; YARN components include ResourceManager (rm) nodes and NodeManager (nm) nodes; Hive components include Hive (hv) nodes; and Clickhouse components include Clickhouse (ch) nodes.
  • the computing device can display the corresponding nodes in the temporary resource pool area as nodes to be deployed according to the selected big data components.
• the node creation area can also be provided with a node parameter setting control.
• the node parameter setting control can be used to set the version of the node to be deployed.
• in response to the triggering operation of the node creation control, the nodes to be deployed corresponding to the version set through the node parameter setting control are displayed in the temporary resource pool area.
• HDFS components and YARN components can include High Availability (HA) versions and non-HA versions; it should be noted that the versions of the HDFS component and the YARN component need to be consistent.
• the check box corresponding to "HA" under "Node Parameters" is the node parameter setting control.
• when the check box corresponding to "HA" is selected, it means that the HDFS component or YARN component to be deployed is the HA version.
• when the check box corresponding to "HA" is not selected, it means that the HDFS component or YARN component to be deployed is the non-HA version.
• Hive components and Clickhouse components do not need to distinguish between HA versions and non-HA versions.
• when such a component is selected, the mark text displayed under "Node Parameters" changes to "None", eliminating the need to distinguish between versions of the Hive component and the Clickhouse component.
• by setting the node parameter setting control in the deployment interface, users can select the version of the big data component based on actual technical requirements, thereby meeting the user's customized needs.
• in the case where the version has been set through the node parameter setting control and node deployment has started, the set version cannot be modified; if node deployment has not started, the user can still modify the set version. Accordingly, the nodes displayed in the temporary resource pool area are cleared after the version is modified, so that the user can re-create nodes.
• the node type and the number of nodes to be added each time the node creation control is triggered are preset.
  • HDFS components, YARN components, Hive components and Clickhouse components are the most widely used in big data clusters. The following mainly uses HDFS components, YARN components, Hive components and Clickhouse components as examples to illustrate.
• the node types and numbers of nodes in the initial state for different versions and different components are as follows:
• the initial state of the non-HA version of the HDFS component: 1 nn node, 1 sn node, 4 dn nodes;
• the initial state of the HA version of the HDFS component: 3 nn nodes, 4 dn nodes;
• the initial state of the non-HA version of the YARN component: 1 rm node, 1 nm node;
• the initial state of the HA version of the YARN component: 3 rm nodes, 1 nm node;
• the initial state of the Hive component: 1 hv node; the initial state of the Clickhouse component: 1 ch node.
  • the nn node is the core node of the HDFS component and is used to provide data management and control functions.
  • the non-HA version of the HDFS component only includes 1 nn node. Once the node fails, the HDFS component will no longer be able to provide the corresponding functions.
  • the HA version of the HDFS component includes 3 nn nodes, of which 1 nn node is in the active (Active) state and the other 2 nn nodes are in the standby (Standby) state.
• the nn node in the Active state works initially, and once the nn node in the Active state fails, an nn node in the Standby state can be activated to ensure the normal operation of the HDFS component, thereby achieving high availability.
  • the rm node is the core node of the YARN component and is used to provide data management and control functions.
• the non-HA version of the YARN component only includes 1 rm node, and once that node fails, the YARN component will no longer be able to provide the corresponding functions; however, the HA version of the YARN component includes 3 rm nodes, of which 1 rm node is in the Active state and the other 2 rm nodes are in the Standby state.
• the rm node in the Active state works initially, and once the rm node in the Active state fails, an rm node in the Standby state can be activated to ensure the normal operation of the YARN component, thereby achieving high availability.
• the number of nodes in the initial state is determined based on the technical requirements of the Hadoop architecture; for the dn nodes of the HDFS component, 4 dn nodes are set in the initial state because the default number of replicas of the HDFS component is 3, and the extra node ensures that moving the node holding any replica will not lose data.
• for HDFS components, users can increase the number of dn nodes according to actual technical needs, but the numbers of nn nodes and sn nodes cannot be increased; for YARN components, users can increase the number of nm nodes according to actual technical needs, but the number of rm nodes cannot be increased; for Hive components and Clickhouse components, the numbers of the corresponding nodes (that is, hv nodes and ch nodes) cannot be increased.
• in the case where the big data component is the HA version of the HDFS component, clicking the node creation control for the first time adds 3 nn nodes and 4 dn nodes to the temporary resource pool area; in the case where the big data component is the non-HA version of the HDFS component, clicking the node creation control for the first time adds 1 nn node, 1 sn node, and 4 dn nodes to the temporary resource pool area.
• in the case of the Hive component, clicking the node creation control for the first time adds 1 hv node to the temporary resource pool area, and subsequent clicks on the node creation control will not increase the number of nodes corresponding to the Hive component; in the case of the Clickhouse component, clicking the node creation control for the first time adds 1 ch node to the temporary resource pool area, and subsequent clicks on the node creation control cannot increase the number of nodes corresponding to the Clickhouse component.
• the following takes several exemplary node creation processes as examples for illustration.
• the memory usage of each server can be determined based on the estimated memory usage of each type of node. It should be emphasized that if the HDFS component and the YARN component are deployed as the HA version, although the front-end deployment interface does not display Zookeeper (zk for short) nodes, the zk nodes need to be deployed during the actual cluster deployment process, so when determining the estimated memory usage, the memory usage of 3 zk nodes needs to be added.
  • the computing device can determine the estimated memory usage based on the data shown in Table 2 and the number of nodes deployed by the user.
• the present invention can also provide a configuration recommendation function: when the user cannot determine the types of big data components that need to be deployed and the number of nodes that need to be deployed, a recommended configuration can be obtained through the configuration recommendation function provided by the present invention.
• the recommended configuration includes the types of the big data components to be deployed and the number of nodes to be deployed.
  • the configuration recommendation process can be implemented through the following process:
  • Step 1 Display the deployment instruction interface.
  • Step 2 Obtain the big data component type, component version and target data to be deployed through the deployment instruction interface.
• the target data is used to indicate the number of data entries stored per second required by the data processing requirements.
  • the deployment instruction interface can provide multiple deployable big data component options, candidate component versions and a data acquisition control.
• the big data component options can be set as check boxes, so that users can check the big data components to be deployed according to their actual needs and the computing device can obtain the types of the big data components to be deployed; the form of the candidate component versions can be the same as that of the node parameter setting control mentioned above, which will not be described again here;
• the data acquisition control can be provided as an input box, so that the user can input the required target number of data entries stored per second through the data acquisition control set on the deployment instruction interface, and the computing device can obtain the target data through the deployment instruction interface.
  • Step 3 Based on the big data component type to be deployed, component version, target data and preset parameters, determine the recommended number of deployments corresponding to various types of nodes to be deployed.
• based on the types and component versions of the big data components to be deployed, the recommended deployment numbers of nn nodes, sn nodes, rm nodes, hv nodes, and ch nodes can be determined.
  • the recommended deployment numbers of dn nodes and nm nodes can be determined as follows:
• the recommended deployment number of nm nodes can be determined based on the comparison between the target data and the preset parameter.
• when the target data is not greater than the preset parameter, the recommended deployment number of nm nodes can be determined as 1; and when the target data is greater than the preset parameter, half of the recommended deployment number of dn nodes can be determined as the recommended deployment number of nm nodes.
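• as a minimal sketch of the nm-node rule above (names are illustrative, not the invention's actual identifiers): the dn-node formula is not reproduced in this excerpt, so the recommended dn count is taken as an input, and the rounding choice is an assumption:

```typescript
// Sketch of the nm-node recommendation rule; `recommendedDnCount` is assumed
// to come from the dn-node formula that is not reproduced in this excerpt.
interface RecommendationInput {
  targetDataPerSecond: number; // target data, e.g. 400_000 for "40w/s"
  presetParameter: number;     // preset parameter, e.g. 200_000 for "20w/s"
  recommendedDnCount: number;  // output of the dn-node rule (assumed input)
}

function recommendNmCount(input: RecommendationInput): number {
  const { targetDataPerSecond, presetParameter, recommendedDnCount } = input;
  // Target data not greater than the preset parameter: a single nm node suffices.
  if (targetDataPerSecond <= presetParameter) {
    return 1;
  }
  // Otherwise: half of the recommended dn count (rounded up here as an assumption).
  return Math.ceil(recommendedDnCount / 2);
}

// Example: 40w/s target data, 20w/s preset parameter, 8 recommended dn nodes -> 4 nm nodes.
console.log(recommendNmCount({
  targetDataPerSecond: 400_000,
  presetParameter: 200_000,
  recommendedDnCount: 8,
}));
```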
  • the computing device can also determine the estimated memory usage based on the recommended deployment numbers of various types of nodes to be deployed.
• it should be emphasized that if the HDFS component and the YARN component are deployed as the HA version, although the front-end deployment interface does not display Zookeeper (zk) nodes, the zk nodes need to be deployed during the actual cluster deployment process, so when determining the estimated memory usage, the memory usage of 3 zk nodes needs to be added.
• the following takes the target data being 40w/s (number of data entries stored per second) and the preset parameter being 20w/s as an example to determine the recommended deployment numbers of the various types of nodes.
• the recommended deployment numbers of the various types of nodes are introduced in the form of a table; see Table 3 below:
  • the determined recommended deployment numbers can be displayed in the deployment instruction interface for users to view.
  • prompt information can also be displayed in the deployment instruction interface.
• the prompt information can be used to indicate that the recommended deployment numbers are for reference only, and the user can increase or decrease the number of nodes to be deployed according to the actual situation.
  • users can also deploy the nodes to be deployed into multiple physical pools according to actual conditions.
• in the case where the node to be deployed is of an HA version, users can be advised through the visual interface to set up at least 3 physical pools, so as to deploy the 3 nn nodes or 3 rm nodes to different servers respectively and truly achieve high availability of the big data cluster.
• the HA versions of the HDFS component and the YARN component require the use of a Zookeeper cluster; therefore, if the big data component selected by the user is the HA version of the HDFS component or the YARN component, the computing device deploys a 3-node Zookeeper cluster by default on the servers hosting the nodes corresponding to the component, so the Zookeeper component does not need to be displayed in the front-end deployment interface and its deployment can be completed when needed.
• for the non-HA versions of the HDFS component and the YARN component, and for the other components, there is no need to deploy the Zookeeper component.
• the deployment and use of the nodes corresponding to the YARN component need to be based on the HDFS component; that is, the nodes corresponding to the HDFS component must be deployed first, and then the nodes corresponding to the YARN component. If the nodes corresponding to the HDFS component are not deployed and the nodes corresponding to the YARN component are deployed directly, the front-end page will prompt an error. Likewise, the deployment and use of the nodes corresponding to the Hive component need to be based on the HDFS component and the YARN component; that is, the nodes corresponding to the HDFS component and the YARN component must be deployed first, and then the nodes corresponding to the Hive component, otherwise the front-end page will prompt an error.
• the node corresponding to the Clickhouse component is an independent node and has no dependency on the nodes corresponding to other components.
  • FIG. 5 is a flow chart of a recommended deployment process according to an embodiment of the present invention.
• as shown in Figure 5, the type and component version of the big data component to be deployed can be obtained through the deployment instruction interface.
• if the big data component to be deployed only includes the Clickhouse component, it can be directly determined that the recommended deployment number of ch nodes is 1, and the estimated memory usage can then be determined; if the big data component to be deployed also includes other components besides the Clickhouse component, the target data can be obtained through the deployment instruction interface to determine whether the target data is greater than the preset parameter.
• if the target data is greater than the preset parameter, the recommended node deployment numbers are determined through the formula described above; if not, the default recommended node numbers can be used as the recommended deployment numbers.
• further, it is determined based on the component version whether the nodes to be deployed include nodes of the HA version.
• if so, the related nodes of the HA version, such as zk nodes, need to be added, and the estimated memory usage is then determined based on the recommended deployment numbers and the added HA-related nodes; if there is no need to add HA-related nodes, the estimated memory usage can be determined directly based on the recommended deployment numbers.
• after node creation is completed, the nodes to be deployed can be displayed in the temporary resource pool area, and the user can drag a node to be deployed in the temporary resource pool area to a physical pool in the deployment resource pool area, so that through step 103 the computing device can, in response to the drag operation on the node to be deployed in the temporary resource pool area, display the node to be deployed in the physical pool in the deployment resource pool area of the deployment interface.
  • the deployment resource pool area may include at least one physical pool.
• in step 103, when the node to be deployed is displayed in the physical pool in the deployment resource pool area of the deployment interface in response to the drag operation on the node to be deployed in the temporary resource pool area, for any node to be deployed, the computing device may respond to the drag operation on that node and display it in the physical pool indicated at the end of the drag operation.
• the deployment interface can also be provided with an automatic allocation control, so that the user can automatically allocate the nodes to be deployed in the temporary resource pool area to the physical pools in the deployment resource pool area through the automatic allocation control.
  • FIG. 6 is a schematic diagram of a deployment interface according to an embodiment of the present invention.
  • the "automatic allocation” button in Figure 6 is the automatic allocation control, and the temporary resource pool has been added
  • the nodes corresponding to the HDFS component that is, 3 nn nodes and 4 dn nodes
  • the nodes corresponding to the YARN component that is, 3 rm nodes and 1 nm node
  • the nodes corresponding to the Hive component that is, 1 hv node
  • the node corresponding to the Clickhouse component that is, 1 ch node
  • Figure 7 is a schematic diagram of a node allocation process according to an embodiment of the present invention.
• taking the temporary resource pool area including 8 nodes as an example, these 8 nodes can be assigned to 3 physical pools through automatic allocation or drag-and-drop: for example, node 1, node 2, and node 3 are assigned to physical pool 1; node 4, node 5, and node 6 are assigned to physical pool 2; and node 7 and node 8 are assigned to physical pool 3, where one physical pool corresponds to one server in the big data cluster.
  • a node is a container in the server.
• the nodes to be deployed in the temporary resource pool area also support a drag-to-delete function: users can delete a node to be deployed by dragging it to a designated position. Still taking the deployment interface shown in Figure 2 as an example, the "Drag the node here to delete" position in the lower left corner of the interface is the designated position.
• since the temporary resource pool area stores temporary nodes, leaving the page or refreshing the page resets the page, causing all nodes in the temporary resource pool to be cleared.
• the above process mainly introduces the specific drag-and-drop process. It should be noted that, compared with the drag-and-drop function currently provided by HTML5 pages, the present invention designs an easy-to-operate drag-and-drop component, which allows developers to control the drag-and-drop process more precisely and obtain more comprehensive and concise data information, avoids generating excessive redundant code, simplifies development work, and improves code quality and code readability.
  • the process of implementing node drag based on the newly developed drag function can be as follows:
  • Step 1 Obtain the attribute data of the operation object corresponding to the node displayed on the target interface.
  • the operation object is the object defined in the program code corresponding to the target interface.
  • the target interface can be the deployment interface
• the operation object can be a Document Object Model (DOM) object.
  • Step 2 Associate the obtained attribute data to the corresponding node.
• Step 3 In response to the drag operation on the node on the target interface, modify the attribute data associated with the node based on the operation data corresponding to the drag operation, so that the node is displayed, based on the modified attribute data, at the position where the drag operation ends.
• the obtained attribute data is associated with the corresponding node, so that the node on the target interface can be dragged and dropped.
• in this way, the attribute data associated with the node can be modified directly based on the operation data corresponding to the drag operation, without the need to search for the operation object, so that the drag-and-drop display of the node can be realized through a simple operation process and the node is displayed at the end position of the drag operation based on the modified attribute data.
• the obtained attribute data can be used as a kind of marking information and marked on the corresponding node.
• alternatively, the node identifier can be used as a kind of marking information and marked on the corresponding attribute data, to achieve the association between the attribute data and the node.
  • other methods can also be used to realize the association between attribute data and nodes, and the present invention does not limit which method is specifically used.
• in this way, two-way binding between attribute data and nodes can be realized, so that the attribute data can subsequently be modified directly based on operations on the node, without the need to first modify the operation object.
• the attribute data may include position data. Therefore, in response to a drag operation on a node on the target interface, when the attribute data associated with the node is modified based on the operation data corresponding to the drag operation, the position data in the attribute data associated with the node may be modified based on the position data corresponding to the position where the drag operation ends.
  • the attribute data may also include other types of data, and the present invention does not limit the specific type of the attribute data.
• the modification of the attribute data can be achieved through the following process:
  • Step 1 In response to the drag operation on the node on the target interface, obtain the attribute data to be modified based on the operation data corresponding to the drag operation through the attribute acquisition instruction.
  • Step 2 Determine the operation object corresponding to the attribute data to be modified through the attribute setting instruction.
  • Step 3 Modify the attribute data corresponding to the determined operation object based on the attribute data to be modified.
• the attribute acquisition instruction can be a getAttribute instruction
• the attribute setting instruction can be a setAttribute instruction
  • Figure 8 is a schematic diagram of the principle of a drag function according to an embodiment of the present invention.
• taking the operation object as a DOM object as an example, by associating attribute data with the nodes displayed in the interface, after a drag operation on a node is detected, the attribute data can be directly modified synchronously based on the drag operation.
• within the drag component, the corresponding DOM object can be determined based on the operated node, and the DOM object can then be modified.
• by contrast, the native drag-and-drop function needs to obtain the operation object corresponding to the dragged node through the drag event and then find the logic required by the business through the attribute data corresponding to the operation object, for example, assigning the value of the DOM object to the variable item, obtaining the attribute data of the DOM object through the item.getAttribute instruction, and then modifying the attribute data through the item.setAttribute instruction so that the node is displayed according to the drag operation.
• the present invention can directly modify the attribute data when the node is dragged and encapsulates the process of finding and modifying the operation object into the drag component, so that the user only needs to pay attention to modifying the data; in the actual code implementation, it is only necessary to assign the attribute data to be modified to i, and the attribute data can be modified through the i.isDeploy instruction. This may look like a simplification of only one word, but in actual development with the native drag-and-drop function of HTML5 pages, a large amount of attribute data is written in the code, which makes operating the operation object very complicated and the code poorly readable, whereas operating on the data makes the code clear at a glance, focusing only on data changes without paying attention to the modification of the operation object.
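• the contrast between the native approach and the encapsulated drag component can be sketched as follows (a minimal sketch assuming a browser environment; the DragModel class, the data-is-deploy attribute name, and the bindings are illustrative, since the patent does not disclose the component's actual API):

```typescript
// Data object bound two-way to a DOM node's attribute data; callers modify
// the data only, and the component updates the underlying DOM object.
class DragModel {
  constructor(private el: HTMLElement) {}

  set isDeploy(value: boolean) {
    // Writing the property is all the caller does; the DOM update is hidden here.
    this.el.setAttribute("data-is-deploy", String(value));
  }
  get isDeploy(): boolean {
    return this.el.getAttribute("data-is-deploy") === "true";
  }
}

// Native HTML5 style: locate the operation object from the drag event, then
// read and write its attribute data explicitly.
function onDropNative(event: DragEvent): void {
  const item = event.target as HTMLElement;             // find the DOM operation object
  if (item.getAttribute("data-is-deploy") !== "true") { // attribute acquisition
    item.setAttribute("data-is-deploy", "true");        // attribute setting
  }
}

// Encapsulated style: the drag component hands the caller a data object `i`,
// and the caller only assigns to it.
function onDropEncapsulated(i: DragModel): void {
  i.isDeploy = true; // the component internally finds and modifies the DOM object
}
```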
• the above process is explained by taking move-drag as an example; that is, after the node is dragged, its display is removed from the server where the node was before being dragged, and the node is only displayed in the server where it is after being dragged. In more possible implementations, the node can also be displayed both in the server where it was before being dragged and in the server where it is after being dragged, so as to achieve a copy-drag effect.
• in this case, a temporary variable can be generated for the node in response to the drag operation on the node on the target interface, and the attribute data of the node before modification can be stored through the temporary variable.
  • temporary variables can be used to store the position data of a node before it is dragged.
  • the location data may be the server identifier corresponding to the physical pool where the node is located and the index value of the node in the server.
• since the node can be displayed both on the server where it was before being dragged and on the server where it is after being dragged, in order to facilitate user distinction, the two nodes can also be displayed in different styles.
  • the attribute data may also include style data, and the style data is used to indicate the display style of the node, such as the node's border style (solid or dotted line), the node's color, etc.
• specifically, the style data included in the attribute data stored in the temporary variable is modified into first style data, and the style data included in the attribute data associated with the node is modified into second style data, so that the node before being dragged and the node copied based on the drag operation on the node can be displayed in different styles to facilitate user distinction.
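• a minimal sketch of this copy-drag flow, with attribute field names and style values assumed for illustration, is as follows:

```typescript
// Attribute data associated with a node; fields follow the position data
// described above (server identifier + index), plus style data.
interface NodeAttributes {
  serverId: string; // server identifier of the physical pool holding the node
  index: number;    // index value of the node within that server
  style: string;    // display style, e.g. the node's border style
}

function copyDrag(
  attrs: NodeAttributes,
  dropServerId: string,
  dropIndex: number,
): { original: NodeAttributes; copy: NodeAttributes } {
  // Temporary variable storing the node's attribute data before modification.
  const temp: NodeAttributes = { ...attrs, style: "dashed" }; // first style data
  // The node's associated attribute data is updated to the drop position.
  const copy: NodeAttributes = {
    serverId: dropServerId,
    index: dropIndex,
    style: "solid", // second style data, distinguishing the copied node
  };
  return { original: temp, copy };
}
```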
  • Figure 9 is a schematic diagram illustrating a principle of modifying style data according to an embodiment of the present invention.
• taking the operation object as a DOM object as an example, by pre-setting the style data as attribute data and associating it with the nodes displayed in the interface, after a drag operation on a node is detected, modifications can be made directly based on the style data of the node; within the drag component, the corresponding DOM object can be determined based on the operated node, and the style data of the DOM object is then modified.
  • the attribute data may also include behavior data, and the behavior data is used to indicate whether the node needs to display prompt information when it is dragged.
• in response to a drag operation on a node, the attribute data associated with the dragged node can be obtained; in the case where the behavior data included in the attribute data indicates that the node needs to display prompt information when dragged, the prompt information is displayed.
• the prompt information is used to give a prompt based on this drag operation.
  • the prompt information can be a pop-up prompt, a message reminder, etc.
• step 104 can be implemented by, in response to the start deployment operation in the deployment interface, deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located.
• in step 104, in response to the start deployment operation in the deployment interface, deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located may include the following steps:
  • Step 1041 In response to starting the deployment operation, determine the target plug-in based on the component type of the big data component to which the node to be deployed belongs.
• the target plug-in can be a binary package developed by developers according to unified development specifications. After the development of the target plug-in is completed, the developer can upload the target plug-in to the initialization server of the big data cluster, so that the target plug-in is stored at a set location in the big data cluster when the server is initialized.
• the set location can be the plugins file directory in the initial server.
• under the unified development specifications, a start method is developed to start the service, a restart method is used to restart the service, and a decommission method is used to implement the node decommissioning function.
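• under these assumptions, the unified plug-in specification could be sketched as the following interface; the method names start/restart/decommission come from the text, while the parameter types and the example class are illustrative:

```typescript
// Unified plug-in specification: every component plug-in exposes the same methods.
interface BigDataPlugin {
  start(serverIp: string, params: Record<string, string>): Promise<void>;   // start the service
  restart(serverIp: string, params: Record<string, string>): Promise<void>; // restart the service
  decommission(serverIp: string, nodeName: string): Promise<void>;          // decommission a node
}

// Skeleton of an HDFS plug-in following the specification (illustrative only).
class HdfsPlugin implements BigDataPlugin {
  async start(serverIp: string, params: Record<string, string>): Promise<void> {
    // ...read configuration, create the container on serverIp, start the service...
  }
  async restart(serverIp: string, params: Record<string, string>): Promise<void> {
    // ...restart the component's service in its container...
  }
  async decommission(serverIp: string, nodeName: string): Promise<void> {
    // ...mark the node for decommission and wait for its data to be migrated...
  }
}
```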
  • Step 1042 Start the target interface located on the server corresponding to the physical pool through the target plug-in.
  • Step 1043 Deploy the container corresponding to the node to be deployed on the server corresponding to the physical pool through the target interface.
  • step 1043 can be implemented through the following process:
  • Step 1043-1 Read the first configuration file through the target plug-in to obtain the target installation environment from the first configuration file.
  • the first configuration file may be an app.json configuration file.
  • the first configuration file may include the image name, version number, Docker network name, MYSQL information for storing data, RabbitMQ information, etc.
  • the target installation environment can be determined based on the Docker network name and image name included in the first configuration file.
  • the target installation environment can be the Docker network environment corresponding to the HDFS component, the Docker network environment corresponding to the YARN component, the Docker network environment corresponding to the Hive component, the Docker network environment corresponding to the Clickhouse component, etc.
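• a minimal sketch of reading the first configuration file (app.json) and deriving the target installation environment, with field names assumed from the contents listed above, might look like this:

```typescript
import { readFileSync } from "node:fs";

// Assumed shape of app.json based on the fields listed above.
interface AppConfig {
  imageName: string;                        // image name
  version: string;                          // version number
  dockerNetwork: string;                    // Docker network name
  mysql: { host: string; port: number };    // MYSQL information for storing data
  rabbitmq: { host: string; port: number }; // RabbitMQ information
}

// The target installation environment is determined from the Docker network
// name and the image name, e.g. the Docker network environment of the HDFS component.
function resolveTargetEnvironment(path: string): string {
  const config: AppConfig = JSON.parse(readFileSync(path, "utf-8"));
  return `${config.dockerNetwork}/${config.imageName}:${config.version}`;
}

console.log(resolveTargetEnvironment("plugins/hdfs/app.json"));
```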
  • Step 1043-2 Modify the configuration file of the target installation environment of the server through the target interface to deploy the container corresponding to the node to be deployed on the server corresponding to the physical pool.
  • the program can automatically generate parameters and automatically complete the modification of the configuration file.
  • Figure 10 shows the configuration files that need to be modified for various big data components when performing different operations.
  • plug-in hot swapping and unified plug-in development can be achieved, reducing the technical threshold for cluster construction.
• plug-ins have unified template methods, configurations, and functions based on the unified development specifications, which not only increases readability but also reduces conflicts between plug-ins; for users, plug-ins are encapsulated according to the specifications, so users can operate a plug-in without understanding how the back-end plug-in executes, reducing the potential for problems.
• through plug-ins, services can be deployed in containers to realize lightweight big data clusters and solve the problem of resource waste.
  • plug-ins can realize environment construction and improve construction efficiency.
  • services can be easily moved later, thus reducing development and maintenance costs.
• after completing the modification of the configuration file, the program automatically completes operations such as copying, unified configuration between containers, and starting services to complete the deployment of the container. For example, after the configuration file is modified, the container can be deployed through the following process.
  • Step 1 Generate a first request message based on the node to be deployed and the physical pool in which the node to be deployed is located.
  • the first request message is used to instruct the container corresponding to the node to be deployed to be deployed on the server corresponding to the physical pool.
• specifically, request message data in JSON format can be generated based on the node to be deployed and the physical pool in which the node to be deployed is located, and the generated JSON request message data is used as the first request message, so that the container can subsequently be deployed based on the first request message.
  • the first request message may carry information corresponding to n nodes to be deployed. For example, it may carry the container name (containerName) and container type (containerType) of the container to be created corresponding to each node to be deployed.
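• the JSON structure of such a first request message can be sketched as follows; containerName and containerType come from the text, while the grouping of containers by server IP is an assumption based on the examples below:

```typescript
interface ContainerSpec {
  containerName: string; // empty when the container has not been created yet
  containerType: number; // e.g. 1 = NameNode, 2 = DataNode, 5 = SecondaryNameNode
}

interface DeployRequest {
  servers: Array<{
    serverIp: string;            // server corresponding to the physical pool
    containers: ContainerSpec[]; // containers to be deployed on that server
  }>;
}

// Example message: two new containers to create on one server.
const firstRequestMessage: DeployRequest = {
  servers: [
    {
      serverIp: "10.10.86.214",
      containers: [
        { containerName: "", containerType: 1 }, // new NameNode container
        { containerName: "", containerType: 5 }, // new SecondaryNameNode container
      ],
    },
  ],
};
```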
• the first request message can be stored in the first message queue, so that the first request message can subsequently be obtained from the first message queue to perform the steps of determining, based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
• before the first request message is stored, the data format of the first request message can be verified, and/or the deployment data carried in the first request message can be verified according to preset deployment rules; if the verification passes, the first request message is stored in the first message queue.
  • the legality and validity of the first request message can be ensured, thereby ensuring the security of the processing process.
  • Step 2 Based on the first request message and the deployed container on the server corresponding to the physical pool, determine the deployment operation type corresponding to the node to be deployed and the container to be deleted in the deployed container.
• the deployment operation type includes adding a node, moving a node, and leaving a node unchanged.
• specifically, the containers to be created corresponding to the nodes to be deployed carried in the first request message can be compared with the deployed containers on the server, to determine which nodes to be deployed are nodes that need to be added (that is, corresponding containers need to be created in the server), which nodes to be deployed are nodes that need to be moved (that is, their corresponding containers need to be moved from one server to another), and which nodes to be deployed are nodes that do not need to be changed; it can also be determined which deployed containers need to be deleted (that is, the corresponding deployed containers need to be deleted from the server).
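• this comparison can be sketched as follows, reusing the hypothetical request shape from the sketch above; the classification rules (empty containerName means add, changed server IP means move, missing from the request means delete) follow the examples below:

```typescript
type DeployOp = "add" | "move" | "unchanged";

interface PlacedContainer {
  containerName: string;
  containerType: number;
  serverIp: string;
}

function classify(
  requested: PlacedContainer[], // containers to create, from the first request message
  deployed: PlacedContainer[],  // containers already deployed in the cluster
): { ops: Map<PlacedContainer, DeployOp>; toDelete: PlacedContainer[] } {
  const ops = new Map<PlacedContainer, DeployOp>();
  for (const req of requested) {
    if (req.containerName === "") {
      ops.set(req, "add"); // empty name: the container has not been created yet
      continue;
    }
    const existing = deployed.find((d) => d.containerName === req.containerName);
    if (existing && existing.serverIp === req.serverIp) {
      ops.set(req, "unchanged"); // deployed before, same server IP
    } else {
      ops.set(req, "move");      // deployed before, but the server IP changed
    }
  }
  // Deployed containers that do not appear in this request need to be deleted.
  const toDelete = deployed.filter(
    (d) => !requested.some((r) => r.containerName === d.containerName),
  );
  return { ops, toDelete };
}
```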
• in Example 1, there are two servers (the server IDs are 10.10.86.214 and 10.10.86.215 respectively), and the containers that need to be deployed under each server can be seen in Figure 11, which shows a deployment data diagram according to an embodiment of the present invention.
• as shown in Figure 11, there are three containers that need to be deployed under the server 10.10.86.214: the containerName of the two containers with containerType 1 and 5 is empty, indicating that the containers corresponding to the NameNode and SecondaryNameNode nodes are new containers to be deployed, so it can be determined that the deployment operation type of the NameNode and SecondaryNameNode nodes is adding a node; the containerName of the container with containerType 6 is not empty, indicating that the container corresponding to the ClickHouse node has been deployed before.
• under the server 10.10.86.215, there are four containers that need to be deployed, all with containerType 2, which means that four new containers corresponding to DataNode nodes need to be deployed.
• in Example 2, after the deployment operation of Example 1 is completed, the deployment result data shown in Example 2 is obtained. By comparing Example 1 with Example 2, it can be determined that the container with containerType 6 in Example 2 does not appear in this deployment, so it can be judged that the corresponding component is deleted in this deployment.
• in Example 3, there are two servers (the server IDs are 10.10.86.214 and 10.10.86.215 respectively), and the containers that need to be deployed under each server can be seen in Figure 12, which shows a deployment data diagram according to an embodiment of the present invention.
• as shown in Figure 12, the two containers with containerType 1 and 5 under server 10.10.86.214 have been deployed before, and their IP addresses have not changed in this deployment, indicating that the containers corresponding to the NameNode and SecondaryNameNode nodes do not need to be redeployed this time, so it can be determined that the deployment operation type of the NameNode and SecondaryNameNode nodes is leaving the node unchanged; one container with containerType 2 was deployed before, and its IP address in this deployment has changed from 10.10.86.215 to 10.10.86.214, indicating that the deployed container is moved from server 10.10.86.215 to server 10.10.86.214, so it can be determined that the deployment operation type of that DataNode node is moving a node; the containerName of the container with containerType 7 is empty, indicating that the container corresponding to the Hive node is a new container that needs to be deployed, so it can be determined that the deployment operation type of the Hive node is adding a node.
• there are three containers that need to be deployed under the server 10.10.86.215, all with containerType 2 and with IP addresses unchanged in this deployment, which means that the deployment operation type of these three DataNode nodes is leaving the node unchanged.
• in Example 4, after the deployment operation of Example 3 is completed, the deployment result data shown in Example 4 can be obtained.
  • Step 3 Deploy the container on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the container to be deleted among the deployed containers.
• in the case where the deployment operation type is adding a node, the component plug-in corresponding to the node type of the node to be deployed is called, and a container corresponding to the node to be deployed is created on the server corresponding to the physical pool.
  • the corresponding relationship between the node type and the component plug-in is preset. Based on the node type of the node to be deployed, the corresponding component plug-in can be found, so that the container can be created through the corresponding component plug-in.
• in the case where the deployment operation type is moving a node, since the data in each container is persisted to a storage device such as a hard disk, the deployed data can be obtained from the hard disk and stored into the newly created container, so as to copy the data.
• in the case where there is a container to be deleted among the deployed containers, the container to be deleted is deleted from the server corresponding to the physical pool.
  • the database associated with the computing device may also include a first deployment table and a second deployment table.
• the first deployment table may be used to record each container deployment process, and the second deployment table may be used to record the specific deployment content of each container deployment process.
  • the big data cluster deployment method may also include the following processes:
  • a container deployment record corresponding to the node to be deployed is generated in the second deployment table, and the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
  • the deployment status of this operation can be recorded in the operation record, and the deployment status of the container corresponding to the node to be deployed can be recorded in the container deployment record.
  • the deployment status can include not deployed, deployed, deployment error, etc.
  • the computing device can update the deployment status in the container deployment record based on the deployment status of each container, and then update the deployment status in the operation record to deployment complete when each container has been deployed.
  • each node in the deployment resource pool can be displayed in different colors to facilitate user viewing.
  • nodes in the undeployed state can be displayed in gray
  • nodes in the deployed state can be displayed in green
  • nodes in the deployment error state can be displayed in red.
  • the computing device can query the deployment status of the container every preset period of time, so as to update the display mode of each node in the deployment resource pool based on the queried deployment status.
  • the preset duration may be 10 seconds.
  • the preset duration may also be other durations.
  • the present invention does not limit the specific value of the preset duration.
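• a minimal sketch of this polling loop, with the backend query function and status values assumed (the text names the states not deployed, deployed, and deployment error), could be:

```typescript
type DeployStatus = "not_deployed" | "deployed" | "deploy_error";

// Color mapping described above: gray / green / red.
const STATUS_COLOR: Record<DeployStatus, string> = {
  not_deployed: "gray",
  deployed: "green",
  deploy_error: "red",
};

function pollDeployStatus(
  queryStatus: (nodeId: string) => Promise<DeployStatus>, // hypothetical backend call
  nodeIds: string[],
  render: (nodeId: string, color: string) => void,        // updates the node's display
  intervalMs = 10_000,                                    // the preset duration, e.g. 10 seconds
): void {
  setInterval(async () => {
    for (const id of nodeIds) {
      const status = await queryStatus(id);
      render(id, STATUS_COLOR[status]);
    }
  }, intervalMs);
}
```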
  • the reason for the failure can also be recorded in the container deployment record so that relevant technicians can troubleshoot the problem.
  • the above embodiment mainly introduces the process of adding a physical pool and deploying containers corresponding to nodes in the physical pool.
• the deployment resource pool area can also be provided with a delete physical pool control, a top physical pool control, and the like, in order to provide users with more diverse functions.
  • one physical pool corresponds to a delete physical pool control.
• relevant technical personnel can trigger the delete physical pool control corresponding to any physical pool, and the computing device can, in response to the triggering operation of any delete physical pool control, no longer display the physical pool corresponding to the triggered delete physical pool control in the deployment resource pool area.
• each physical pool displayed in the deployment resource pool area of the deployment interface shown in Figure 4 has a delete button in the upper right corner, which is the delete physical pool control; users can delete the corresponding physical pool by triggering any delete button.
• in addition, in response to the triggering operation of any delete physical pool control, the computing device can delete the deployed containers from the server corresponding to the physical pool of that delete physical pool control.
• specifically, the computing device can query, through the second deployment table, the deployed containers included in the server corresponding to the triggered delete physical pool control, and then call the delete-container interface of the Docker API to delete all deployed containers from the corresponding server.
  • the deployment resource pool area includes a pinned physical pool control
  • relevant technicians can use the pinned physical pool control to change the display position of the physical pool in the deployment resource pool.
  • one physical pool corresponds to a top physical pool control.
• relevant technical personnel can trigger the top physical pool control corresponding to any physical pool, and the computing device can, in response to the triggering operation of any top physical pool control, display the physical pool corresponding to the triggered top physical pool control at the first target position in the deployment resource pool area.
  • the first target location may be the leftmost location in the deployment resource pool area.
• each physical pool displayed in the deployment resource pool area of the deployment interface shown in Figure 4 also has a top button in the upper right corner, which is the top physical pool control; users can change the display position of the corresponding physical pool by triggering any top button.
• the present invention adds a top physical pool function: when the top physical pool control corresponding to any physical pool is triggered, that physical pool is moved to the leftmost position of the deployment resource pool, that is, the first place in the deployment resource pool, and the other physical pools are moved to the right in sequence, making it easier for users to operate the nodes in the first physical pool, thereby improving user experience.
• the present invention can also provide a restore settings control in the deployment interface, so that when a user of the big data platform encounters problems and wants to restore the big data platform to its initial state and redeploy it, the user can restore the big data platform to its initial state through the restore settings control.
  • the computing device can generate a third request message in response to the triggering operation of the restore settings control.
• the third request message is used to request deletion of the deployed servers and containers; based on the third request message, multiple deployed containers are deleted from the deployed servers, and a third preset script file is executed to detach the deployed servers from the big data cluster.
• the data format of the third request message can be verified, so that the third request message is processed only if the verification passes, which ensures the legality and validity of the third request message and thus the security of the processing process.
• specifically, all deployed containers in the big data cluster can be queried through the second deployment table, and the container list is then traversed in sequence, calling the delete-container interface of the Docker API with each deployed server IP and container name to complete the deletion of all deployed containers; and, through the first deployment table, the servers deployed in the big data cluster can be queried, so that the third preset script file can be executed to detach them from the big data cluster.
• before this, the computing device can display prompt information multiple times to confirm with the user whether to restore factory settings, so that the third request message data is generated after receiving the instruction confirming that factory settings are to be restored.
• the prompt information can be of various types, such as text prompts.
  • FIG 13 is a schematic flowchart of a factory settings restore process according to an embodiment of the present invention.
• after the user triggers the factory settings restore process, multiple text prompts can be used to confirm whether the user is sure to perform the restore factory settings operation.
• the backend of the computing device (that is, the server) can perform verification based on the received request message; if the verification passes, all containers are deleted by traversing all servers, all servers are then removed from the big data cluster, and in addition all stored data in the big data cluster is deleted to restore the big data cluster to its initial state.
  • the deployment interface will also be restored to the initial state of the system.
  • the above embodiments are mainly explained around several big data components commonly used in big data clusters.
  • the present invention can also support the deployment of other types of containers.
• for example, the method provided by the present invention can also support the container deployment processes of components such as the distributed log system (Kafka) component and the remote dictionary service (Remote Dictionary Server, Redis) component.
• Redis is an open-source, log-type Key-Value database written in ANSI C that supports network access, can be memory-based with persistence, and provides APIs in multiple languages.
• a single Redis instance is unstable: when the Redis service goes down, the service becomes unavailable, and the read and write capabilities of a single Redis instance are limited. Using a Redis cluster can enhance Redis's read and write capabilities, and when one server goes down, the other servers can still work normally without affecting use.
  • developers can prepare in advance basic image files for deploying Redis clusters, and develop and deploy Redis plug-ins, so that the deployment of Redis clusters can be realized through the method provided by the present invention.
  • the Redis component can be displayed on the deployment interface.
  • the user can select Redis and trigger the node creation control.
• the computing device can respond to the user's trigger operation on the node creation control and display 6 Redis nodes to be deployed in the temporary resource pool area; the user can then drag the Redis nodes to at least one physical pool of the deployment resource pool and trigger the start deployment control.
  • the computing device can respond to the triggering operation of the start deployment control and generate JSON request message data.
  • a redis.conf configuration file can be generated for each Redis node based on the configuration file template.
  • Figure 14 is a schematic diagram of a redis.conf configuration file according to an embodiment of the present invention.
  • 6 Redis nodes correspond to 6 redis.conf configuration files.
• the value range of the {{.Port}} placeholder (that is, the port number) is 6379-6384.
  • the Redis cluster function can be used through the IP address of the server and the port number in any of the above configuration files.
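• a minimal sketch of generating the six redis.conf files from such a template (the {{.Port}} placeholder and the 6379-6384 range come from the text; the template body shown is an assumption):

```typescript
import { writeFileSync } from "node:fs";

// Assumed template body; only the {{.Port}} placeholder is taken from the text.
const template = [
  "port {{.Port}}",
  "cluster-enabled yes",
  "cluster-config-file nodes-{{.Port}}.conf",
  "appendonly yes",
].join("\n");

// One redis.conf per Redis node, ports 6379 through 6384.
for (let port = 6379; port <= 6384; port++) {
  const conf = template.split("{{.Port}}").join(String(port));
  writeFileSync(`redis-${port}.conf`, conf);
}
```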
  • FIG 15 is a schematic diagram of a Redis cluster building process according to an embodiment of the present invention.
• specifically, developers can develop the Redis plug-in in advance, that is, build the Redis base image and abstract the configuration file to develop the plug-in functions (including reading parameter items, creating configuration files, copying the configuration files to the remote target machine, starting the container on the target machine, etc.), and then compile, install, and load the plug-in, so that Redis containers can be deployed via the loaded Redis plug-in.
• when deploying, the Redis nodes are first created, the Redis nodes to be deployed are then dragged to the deployment resource pool, and the deployment operation is triggered; the server processes according to the deployment logic and calls the Redis plug-in with the corresponding parameters to set up Redis; otherwise, an error message is returned so that the Redis nodes can be redeployed.
• the above process mainly introduces the big data cluster construction process. After the big data cluster is built and the corresponding containers are deployed in it, services can be provided to users through the built big data cluster, where different containers in the big data cluster communicate through the Overlay network to jointly provide services to users.
  • Figure 16 is a flow chart of a data processing method based on a big data cluster according to an embodiment of the present invention. As shown in Figure 16, the method includes:
  • Step 1601 Obtain the data processing request.
  • Step 1602 Send a data processing request to the target container through the Overlay network.
  • the target container is used to implement the data processing process based on the data processing request.
• the container is created on the server according to the drag operation on the node to be deployed in the deployment interface, and the container is used to provide big data cluster services.
• in the present invention, communication between the containers in the big data cluster is ensured through the Overlay network, so that when a data processing request is obtained, it can be sent to the target container through the Overlay network and the target container can implement the data processing process based on the data processing request, meeting users' data processing needs.
• when sending a data processing request to the target container through the Overlay network, at least one target container can be determined based on the data processing request, so that the data processing request is sent to the at least one target container through the Overlay network.
• in a possible case, the at least one target container includes at least a first target container and a second target container; then, when sending the data processing request to the at least one target container through the Overlay network, the data processing request can be sent to the first target container through the Overlay network, and the first target container communicates with the second target container through the Overlay network to complete the response to the data processing request.
• when the first target container communicates with the second target container, the first target container can encapsulate the data to be transmitted to obtain a first data message, and then encapsulate the first data message again to obtain a second data message, where the destination IP address of the first data message is the IP address of the second target container and its source IP address is the IP address of the first target container, while the destination IP address of the second data message is the IP address of the server where the second target container is located and its source IP address is the IP address of the server where the first target container is located.
• further, the first target container can send the second data message to the second target container through the Overlay network, so that the second target container can unwrap the two layers of encapsulation of the second data message to obtain the data that actually needs to be processed.
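• this two-layer encapsulation can be modeled with a small sketch (types, field names, and the example addresses are illustrative only):

```typescript
interface Packet<T> {
  srcIp: string;
  dstIp: string;
  payload: T;
}

// First data message: addressed between the container IPs.
function encapsulateInner(data: string, srcContainerIp: string, dstContainerIp: string): Packet<string> {
  return { srcIp: srcContainerIp, dstIp: dstContainerIp, payload: data };
}

// Second data message: the first message wrapped in host (server) addressing.
function encapsulateOuter(inner: Packet<string>, srcHostIp: string, dstHostIp: string): Packet<Packet<string>> {
  return { srcIp: srcHostIp, dstIp: dstHostIp, payload: inner };
}

// The receiving side strips both layers to recover the data to be processed.
function decapsulate(outer: Packet<Packet<string>>): string {
  return outer.payload.payload;
}

const first = encapsulateInner("query data", "172.18.0.2", "172.18.0.3"); // container IPs
const second = encapsulateOuter(first, "10.10.86.214", "10.10.86.215");   // server IPs
console.log(decapsulate(second));
```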
  • the data processing request can be a data storage request, a data acquisition request or a data deletion request.
  • data processing requests may also include other types of requests, and the present invention does not limit the specific types of data processing requests.
  • Embodiments of the present invention also provide a big data cluster deployment system and a corresponding data processing system, which may include at least a visual operation module and an architecture service module.
• the visual operation module is used to provide users with a convenient big data cluster deployment operation interface, through which servers can be added and deleted, the nodes included in big data components can be deployed, moved, and deleted, cluster factory settings can be restored, and so on.
• the architecture service module can be used to provide API interface services, data rule verification, component deployment logic processing, message processing, plug-in calling, database persistence, and other functions.
  • the system can also include a message module, a database module, a network module and a big data component plug-in module.
• the message module is a message queue based on RabbitMQ; it completes message production and consumption when called by the architecture service module, and can be used in time-consuming scenarios (such as server initialization and component container deployment) to improve user experience and ensure data consistency.
• the database module uses the MYSQL database to store server status, component deployment status, and the relationship information between component deployments and servers;
• the network module is a Docker-based Overlay network, which is used when the big data service containers start, to ensure cross-server communication between containers;
• the big data component plug-in module is used to develop pluggable startup plug-ins for each big data component; a plug-in is started by combining the server IP with the plug-in parameters, to complete the installation and startup of the specified component container on the specified server.
  • the visual operation module is used to display the deployment interface
  • the visual operation module is also used to display the nodes to be deployed in the temporary resource pool area of the deployment interface in response to the node creation operation on the deployment interface.
  • the nodes are services included in the big data component and used to provide data management functions;
  • the visual operation module is also used to display the nodes to be deployed in the physical pool in the deployment resource pool area of the deployment interface in response to the drag operation on the nodes to be deployed in the temporary resource pool area;
• the architecture service module is used to, in response to the start deployment operation in the deployment interface, create a container corresponding to the node to be deployed on the server corresponding to the physical pool according to the physical pool where the node to be deployed is located, and the container is used to provide big data cluster services.
  • the deployment interface includes a node creation area, the node creation area includes a node creation control and at least one big data component;
  • the visual operation module, when responding to the node creation operation on the deployment interface and displaying the nodes to be deployed in the temporary resource pool area of the deployment interface, is used to:
  • when any big data component is selected, in response to a triggering operation on the node creation control, display the node to be deployed corresponding to the selected big data component in the temporary resource pool area.
  • the node creation area also includes a node parameter setting control, which is used to set the version of the node to be deployed;
  • the visual operation module is used to display the node to be deployed corresponding to the selected big data component in the temporary resource pool area in response to the triggering operation of the node creation control:
  • the node to be deployed corresponding to the version set through the node parameter setting control is displayed in the temporary resource pool area.
  • the big data components include at least HDFS components, YARN components, Hive components, and Clickhouse components.
  • the deployment resource pool area includes at least one physical pool
  • the visual operation module is used to display the nodes to be deployed in the physical pool in the deployment resource pool area of the deployment interface in response to the drag operation on the nodes to be deployed in the temporary resource pool area:
  • for any node to be deployed, in response to a drag operation on that node, the node is displayed in the physical pool indicated at the end of the drag operation.
  • the architecture service module, when responding to the start deployment operation in the deployment interface and deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool where the node is located, is used to: determine a target plug-in based on the component type of the big data component to which the node to be deployed belongs; start, through the target plug-in, the target interface located on the server corresponding to the physical pool; and deploy, through the target interface, the container corresponding to the node to be deployed.
  • the architecture service module, when deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool through the target interface, is used to: read the first configuration file through the target plug-in to obtain the target installation environment from it, and modify the configuration files of the server's target installation environment through the target interface, so as to deploy the container on the server corresponding to the physical pool.
  • the target plug-in is a binary package, and the target plug-in is stored at a set location in the big data cluster;
  • the acquisition process of the target plug-in includes: obtaining the target plug-in uploaded to the initial server of the big data cluster, and storing the target plug-in at the set location in the big data cluster.
  • the architecture service module, when responding to the start deployment operation in the deployment interface and deploying the container corresponding to the node to be deployed on the server corresponding to the physical pool where the node is located, is also used to:
  • a first request message is generated based on the node to be deployed and the physical pool in which the node to be deployed is located, and the first request message is used to instruct the container corresponding to the node to be deployed to be deployed on the server corresponding to the physical pool;
  • based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers are determined; the deployment operation types include adding a node, moving a node, and leaving a node unchanged;
  • containers are then deployed on the server corresponding to the physical pool according to the determined operation types and the containers to be deleted.
  • the architecture service module is also used to store the first request message in the first message queue
  • the system also includes:
  • the message module is used to obtain the first request message from the first message queue
  • the architecture service module is also used to, when the message module obtains the first request message, execute the step of determining, based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
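A minimal sketch of this produce/consume pattern with RabbitMQ, using the pika Python client (the queue name deploy_requests and the message fields are assumptions made for illustration):

    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="deploy_requests", durable=True)

    # producer side: store the first request message in the first message queue
    first_request = {"nodes": [{"containerName": "", "containerType": 1,
                                "ip": "10.10.86.214"}]}
    channel.basic_publish(exchange="", routing_key="deploy_requests",
                          body=json.dumps(first_request))

    # consumer side: the architecture service module picks the message up
    # asynchronously, so the deployment interface stays responsive
    def on_message(ch, method, properties, body):
        request = json.loads(body)
        # ... determine deployment operation types and containers to delete ...
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="deploy_requests", on_message_callback=on_message)
    channel.start_consuming()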
  • the architecture service module, when deploying containers on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers, is used to:
  • when the deployment operation type is adding a node, call the component plug-in corresponding to the node type of the node to be deployed and create the corresponding container on the server corresponding to the physical pool;
  • when the deployment operation type is moving a node, delete the node's deployed container from the server on which it was deployed, create the corresponding container on the server corresponding to the physical pool, and copy the data of the deployed container into the newly created container;
  • when the deployment operation type is leaving a node unchanged, perform no operation on the server corresponding to the physical pool;
  • when containers to be deleted exist among the deployed containers, delete them from the server corresponding to the physical pool (a sketch of this reconciliation logic follows).
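The reconciliation logic referenced above might look like the following sketch (the containerName, containerType, and ip fields match the message sketch earlier; the exact matching rule is an assumption):

    def classify(requested: list[dict], deployed: list[dict]):
        """Split requested containers into new / moved / unchanged, and collect
        deployed containers that no longer appear in the request."""
        deployed_by_name = {c["containerName"]: c for c in deployed}
        new, moved, unchanged = [], [], []
        for req in requested:
            name = req.get("containerName", "")
            if not name or name not in deployed_by_name:
                new.append(req)            # no existing container: add the node
            elif deployed_by_name[name]["ip"] != req["ip"]:
                moved.append(req)          # same container, new server: move the node
            else:
                unchanged.append(req)      # same container, same server: leave unchanged
        requested_names = {r.get("containerName") for r in requested}
        to_delete = [c for c in deployed if c["containerName"] not in requested_names]
        return new, moved, unchanged, to_delete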
  • the architecture service module is also used to verify the data format of the first request message
  • the architecture service module is also used to verify the deployment data carried in the first request message according to the preset deployment rules.
  • the system further includes a database module;
  • the database module is used to generate an operation record in the first deployment table in response to the first request message, and the operation record is used to record this deployment operation;
  • the database module is also configured to respond to the first request message and generate a container deployment record corresponding to the node to be deployed in the second deployment table, where the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
  • the database module is also used to record the deployment status of this operation in the operation record
  • the database module is also used to record the deployment status of the container corresponding to the node to be deployed in the container deployment record;
  • the deployment status includes at least undeployed, deployed and deployment error
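For illustration, the two deployment tables could be modeled as below (the table and column names are assumptions; the document specifies only what each table records and the three status values):

    FIRST_DEPLOYMENT_TABLE = """
    CREATE TABLE IF NOT EXISTS operation_record (
        id           INT AUTO_INCREMENT PRIMARY KEY,
        requested_at DATETIME,
        status       ENUM('undeployed', 'deployed', 'deployment_error')
    );
    """

    SECOND_DEPLOYMENT_TABLE = """
    CREATE TABLE IF NOT EXISTS container_deployment_record (
        id             INT AUTO_INCREMENT PRIMARY KEY,
        operation_id   INT,            -- links back to the operation record
        container_name VARCHAR(64),
        container_type INT,
        server_ip      VARCHAR(15),
        status         ENUM('undeployed', 'deployed', 'deployment_error'),
        failure_reason TEXT            -- recorded so problems can be traced
    );
    """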
  • the nodes to be deployed include multiple types
  • the visual operation module is also used to display the deployment instruction interface
  • the visual operation module is also used to obtain the target data filled in by the user through the deployment instruction interface.
  • the target data is used to indicate the number of data records to be stored per second by the containers to be deployed;
  • the architecture service module is also used to determine the recommended deployment counts for the various types of nodes to be deployed based on the target data and preset parameters.
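A sketch of one plausible recommendation rule (the 4-node default, the target/(threshold/4) scaling, and the half-of-dn ratio for nm nodes are assumptions used here for illustration):

    def recommend_counts(target_per_second: float, threshold: float) -> dict:
        """Recommend dn (DataNode) and nm (NodeManager) counts from the target
        records-stored-per-second and a preset threshold parameter."""
        if target_per_second <= threshold:
            dn, nm = 4, 1                              # defaults for light workloads
        else:
            dn = round(target_per_second / (threshold / 4))
            nm = dn // 2                               # half the DataNode count
        return {"dn": dn, "nm": nm}

    # e.g. 400k records/s against a 200k threshold yields 8 dn and 4 nm nodes
    print(recommend_counts(400_000, 200_000))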
  • the deployment resource pool area includes a new physical pool control; the visual operation module is also used to:
  • in response to a triggering operation on the new physical pool control, display the add physical pool interface, which includes an identification acquisition control and a password acquisition control;
  • obtain, through the identification acquisition control, the server identifier corresponding to the physical pool to be added, and obtain the password to be verified through the password acquisition control;
  • when the password to be verified passes verification, display the physical pool to be added in the deployment resource pool area.
  • the architecture service module is also used to generate a second request message when the password to be verified passes the verification;
  • the architecture service module is also used to store the second request message in the second message queue
  • the system also includes:
  • a message module used to obtain the second request message from the second message queue
  • the architecture service module is also used to send an installation file to the server corresponding to the physical pool to be added based on the second request message.
  • the server is used to install the installation file after receiving it, so that the server can join the big data cluster.
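For example, the second request message can be as simple as a JSON document carrying the server identifier and the password to be verified (a sketch; the two fields mirror the values obtained through the controls described above):

    import json

    second_request = {
        "ip": "10.10.177.18",   # server identifier of the physical pool to be added
        "password": "root"      # password to be verified before joining the cluster
    }

    # stored in the second message queue, then consumed to push the installation
    # files (e.g. Docker image tar packages) to the new server
    message_body = json.dumps(second_request)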
  • the visual operation module is also used to display a first prompt message when the password to be verified is not verified or the server fails to successfully join the big data cluster.
  • the first prompt message is used to indicate the reason why the server has not successfully joined the big data cluster.
  • the system further includes:
  • the database module is used to generate a server deployment record in the third deployment table when the password to be verified passes the verification.
  • the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
  • the database module is also used to record the initialization status of the server corresponding to the physical pool to be added in the server deployment record.
  • the initialization status at least includes to be initialized, initializing, initialization error, and initialization completed.
  • the deployment resource pool area includes delete physical pool controls, with one physical pool corresponding to one delete physical pool control;
  • the visual operation module is also used to, in response to a triggering operation on any delete physical pool control, no longer display the corresponding physical pool in the deployment resource pool area.
  • the architecture service module is also configured to, in response to a triggering operation on any delete physical pool control, delete the deployed containers from the server corresponding to that control's physical pool.
  • the deployment resource pool area includes pin-to-top physical pool controls, with one physical pool corresponding to one pin-to-top physical pool control;
  • the visual operation module is also configured to, in response to a triggering operation on any pin-to-top physical pool control, display the corresponding physical pool at the first target position in the deployment resource pool area.
  • the visual operation module is also configured to display, for any physical pool shown in the deployment resource pool area, the server identifier of the corresponding server at the second target position of the physical pool, and the current storage usage rate, memory usage rate, and allocated memory usage rate of that server at the third target position of the physical pool.
  • the deployment interface also includes a restore settings control
  • the architecture service module is also used to generate a third request message in response to the triggering operation of the recovery setting control, and the third request message is used to request deletion of the deployed server and container;
  • the architecture service module is also used to delete multiple deployed containers from the deployed server based on the third request message, and execute the third preset script file to separate the deployed server from the big data cluster.
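A minimal sketch of this reset path using the Docker SDK for Python (assuming each deployed server's Docker engine is reachable over TCP; error handling and the database-driven server list are omitted):

    import docker

    def reset_server(docker_host_url: str) -> None:
        """Delete the server's deployed containers, then detach it from the swarm."""
        client = docker.DockerClient(base_url=docker_host_url)
        for container in client.containers.list(all=True):
            container.remove(force=True)   # delete the deployed container
        client.swarm.leave(force=True)     # detach the server from the cluster

    # e.g. driven by the deployed-server list recorded in the database module
    reset_server("tcp://10.10.177.18:2375")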
  • the big data cluster includes at least one server, among which there is an initial server, and the architecture service module is also used to:
  • install the target runtime environment on the initial server and configure the interface corresponding to the target runtime environment;
  • create the Overlay network corresponding to the target runtime environment on the initial server and initialize the cluster environment on the initial server;
  • create the big data component base image on the initial server, where the base image is used to provide a building foundation for containers;
  • generate the target key file on the initial server.
  • the system also includes a network module for ensuring cross-server communication between containers.
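For reference, creating such a Docker Overlay network with the Docker SDK for Python might look like the following (the network name big-data-net is a placeholder):

    import docker

    client = docker.from_env()
    # an attachable overlay network lets standalone containers running on
    # different swarm nodes communicate across servers
    client.networks.create("big-data-net", driver="overlay", attachable=True)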
  • the network module is used to send the data processing request to the target container through the Overlay network after obtaining the data processing request.
  • the target container is used to implement the data processing process based on the data processing request.
  • the container is created on the server according to the drag operation on the node to be deployed in the deployment interface, and the container is used to provide big data cluster services.
  • the network module, when sending a data processing request to the target container through the Overlay network, is used to: determine at least one target container based on the data processing request, and send the data processing request to the at least one target container through the Overlay network;
  • when the number of target containers is greater than or equal to 2, the at least one target container includes at least a first target container and a second target container;
  • the network module, when sending the data processing request to the at least one target container through the Overlay network, is used to:
  • a data processing request is sent to the first target container through the Overlay network, and the first target container is used to communicate with the second target container through the Overlay network to complete the response to the data processing request.
  • the data processing request is a data storage request, a data retrieval request, or a data deletion request.
  • the system also includes a big data component plug-in module, and the big data component plug-in module is used to start the container on the server.
  • since the system embodiment basically corresponds to the method embodiment, please refer to the description of the method embodiment for relevant details.
  • the system embodiments described above are only illustrative.
  • the modules described as separate components may or may not be physically separated.
  • the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Persons of ordinary skill in the art can understand and implement this without creative effort.
  • Figure 17 is a flow chart of a module interaction process according to an embodiment of the present invention.
  • Figure 17 takes the interaction process between modules when adding a physical pool as an example.
  • the visual operation module is used to add the server to the big data cluster and to poll the server's initialization status every 10 seconds.
  • a request message in JSON format can be generated through the visual operation module and sent to the architecture service module.
  • after receiving the request message, the architecture service module tests the SSH connection and logs in to the server remotely; when the login succeeds, it sends a message to the message queue in the message module and inserts a server initialization record into the database module.
  • the architecture service module also monitors the message module to obtain messages from its message queue, performs operations on the server based on the obtained messages (such as installing the environment and joining the Docker Swarm cluster), and updates the server initialization status recorded in the database module.
  • Figure 18 is a flow chart of another module interaction process according to an embodiment of the present invention.
  • Figure 18 takes the interaction process between various modules when deploying a container as an example.
  • the visual operation module is used to deploy the containers corresponding to the big data components and to query the container deployment status every 10 seconds.
  • a request message in JSON format can be generated through the visual operation module and sent to the architecture service module.
  • after receiving the request message, the architecture service module verifies it against the deployment rules; if the verification succeeds, it determines the deployment method (that is, the deployment operation type of each node), sends a message to the message queue in the message module, and adds records to the first deployment table and the second deployment table (the deployment tables shown in Figure 18) in the database module.
  • the architecture service module also monitors the message module to obtain messages from its message queue and starts plug-ins on the server based on the obtained messages.
  • the deployment status recorded in the database module can then be updated.
  • Figure 19 is a flow chart of another module interaction process according to an embodiment of the present invention.
  • Figure 19 takes the interaction process between various modules when restoring factory settings as an example.
  • the visual operation module is used to reset the big data cluster.
  • based on the content recorded in the database module, the deployed container list can be queried to delete all deployed containers, and the server list can be queried to detach all servers from the big data cluster, completing the reset of the big data cluster.
  • FIG. 20 is a schematic structural diagram of a computing device according to an exemplary embodiment of the present invention.
  • the computing device includes a processor 2010, a memory 2020 and a network interface 2030.
  • the memory 2020 is used to store computer instructions that can be run on the processor 2010.
  • the processor 2010 is used to implement the method provided by any embodiment of the present invention when executing the computer instructions.
  • the network interface 2030 is used to implement input and output functions.
  • the computing device may also include other hardware, which is not limited by the present invention.
  • the present invention also provides a computer-readable storage medium.
  • the computer-readable storage medium can be in various forms.
  • the computer-readable storage medium can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, storage drives (such as hard drives), solid state drives, any type of storage disk (such as optical discs or DVDs), or similar storage media, or a combination thereof.
  • the computer-readable medium can also be paper or other suitable media capable of printing the program.
  • the computer program is stored on the computer-readable storage medium. When the computer program is executed by the processor, the method provided by any embodiment of the present invention is implemented.
  • the present invention also provides a computer program product, which includes a computer program.
  • when the computer program is executed by a processor, the method provided by any embodiment of the present invention is implemented.
  • the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
  • "plurality" refers to two or more, unless expressly limited otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明涉及一种大数据集群部署方法。本发明提供一个部署界面,以通过部署界面来提供大数据集群部署功能。该方法包括:响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点;响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点;响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上创建待部署节点对应的容器,以通过容器提供大数据集群服务。通过上述过程可以大大简化相关技术人员的操作,从而可以降低大数据集群的部署成本、提高部署效率。

Description

大数据集群部署方法以及基于大数据集群的数据处理方法 技术领域
本发明涉及计算机技术领域,尤其涉及一种大数据集群部署方法以及基于大数据集群的数据处理方法。
背景技术
随着计算机技术和信息技术的高速发展,行业应用系统的规模迅速扩大,行业应用所产生的数据呈指数形式增长。时至今日,数据规模达到数百太字节(TB)甚至数十拍字节(PB)或数百PB规模的行业或者企业已经出现,而为了有效地对大数据进行处理,有关大数据管理和应用方式的研究应运而生。
相关技术中,主要是通过分布式计算平台(如Hadoop),来实现大数据集群的部署和运行,以便通过所部署的服务实现数据的高速运算和存储。
然而,在通过分布式计算平台部署大数据集群的服务时,需要相关技术人员手动下载组件压缩包、安装软件开发环境、修改相关配置文件等,操作复杂,而且对相关技术人员的要求较高,从而导致大数据集群的部署成本较高、部署效率较低。
发明内容
本发明提供一种大数据集群部署方法以及基于大数据集群的数据处理方法,以解决相关技术中的不足。
根据本发明实施例的第一方面,提供一种大数据集群部署方法,该方法包括:
显示部署界面;
响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点,节点为大数据组件所包括的、用于提供数据管理功能的服务;
响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点;
响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上创建待部署节点对应的容器,容器用于提供大数据集群服务。
在一些实施例中,部署界面包括节点创建区域,节点创建区域包括节点创建控件和至少一个大数据组件;
响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点,包括:
在任一大数据组件被选中的情况下,响应于对节点创建控件的触发操作,在临时资源池区域中显示被选中的大数据组件对应的待部署节点。
在一些实施例中,节点创建区域还包括节点参数设置控件,节点参数设置控件用于设置待部署节点的版本;
响应于对节点创建控件的触发操作,在临时资源池区域中显示被选中的大数据组件对应的待部署节点,包括:
响应于对节点创建控件的触发操作,在临时资源池区域中显示通过节点参数设置控件所设置的版本对应的待部署节点。
在一些实施例中,大数据组件至少包括HDFS组件、YARN组件、Hive组件和Clickhouse组件。
在一些实施例中,部署资源池区域包括至少一个物理池;
响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点,包括:
对于任一待部署节点,响应于对待部署节点的拖拽操作,将待部署节点显示在拖拽操作结束时所指示的物理池中。
在一些实施例中,响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上部署待部署节点对应的容器,包括:
响应于开始部署操作,基于待部署节点所属的大数据组件的组件类型,确定目标插件;
通过目标插件,启动位于物理池对应的服务器上的目标接口;
通过目标接口,在物理池对应的服务器上部署待部署节点对应的容器。
在一些实施例中,通过目标接口,在物理池对应的服务器上部署待部署节点对应的容器,包括:
通过目标插件读取第一配置文件,以从第一配置文件中获取目标安装环境;
通过目标接口,对服务器的目标安装环境的配置文件进行修改,以在物理池对应的服务器上部署待部署节点对应的容器。
在一些实施例中,目标插件为二进制包,目标插件存储在大数据集群中的设定位置处;
目标插件的获取过程包括:
获取被上传至大数据集群的初始服务器中的目标插件;
将目标插件存储在大数据集群中的设定位置处。
在一些实施例中,响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上部署待部署节点对应的容器,包括:
基于待部署节点以及待部署节点所处的物理池,生成第一请求报文,第一请求报文用于指示在物理池对应的服务器上部署待部署节点对应的容器;
基于第一请求报文和物理池对应的服务器上的已部署容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器,部署操作类型包括新增节点、移动节点和不改变节点;
按照待部署节点对应的部署操作类型以及已部署容器中的待删除容器,在物理池对应的服务器上进行容器部署。
在一些实施例中,响应于开始部署操作,基于待部署节点以及待部署节点所处的物理池,生成第一请求报文之后,该方法还包括:
将第一请求报文存储至第一消息队列中;
从第一消息队列中获取第一请求报文,执行基于第一请求报文和物理池对应的服务器上的已部署容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器的步骤。
在一些实施例中,按照待部署节点对应的部署操作类型以及已部署容器中的待删除容器,在物理池对应的服务器上进行容器部署,包括:
在部署操作类型为新增节点的情况下,调用待部署节点的节点类型对应的组件插件,在物理池对应的服务器上创建待部署节点对应的容器;
在部署操作类型为移动节点的情况下,从已部署待部署节点对应容器的服务器上删除待部署节点对应的已部署容器,在物理池对应的服务器中创建待部署节点对应的容器,并将已部署容器中的数据拷贝至所创建的容器中;
在部署操作类型为不改变节点的情况下,无需在物理池对应的服务器上进行操作;
在已部署容器中存在待删除容器的情况下,从物理池对应的服务器上删除待删除容器。
在一些实施例中,响应于开始部署操作,基于待部署节点以及待部署节点所处的物理池,生成第一请求报文之后,该方法还包括下述至少一项:
对第一请求报文的数据格式进行校验;
按照预设部署规则,对第一请求报文所携带的部署数据进行校验。
在一些实施例中,该方法还包括下述至少一项:
响应于第一请求报文,在第一部署表中生成操作记录,操作记录用于记录本次部署操作;
响应于第一请求报文,在第二部署表中生成待部署节点对应的容器部署记录,容器部署记录用于记录待部署节点对应的部署操作。
在一些实施例中,该方法还包括下述至少一项:
在操作记录中记录本次操作的部署状态;
在容器部署记录中记录待部署节点对应的容器的部署状态;
其中,部署状态至少包括未部署、已部署和部署错误。
在一些实施例中,待部署节点包括多种类型,该方法还包括:
显示部署指示界面;
通过部署指示界面获取用户填写的目标数据,目标数据用于指示待部署的容器的每秒存储数据条数;
基于目标数据和预设参数,确定各种类型的待部署节点对应的推荐部署数量。
在一些实施例中,部署资源池区域包括新增物理池控件,该方法还包括:
响应于对新增物理池控件的触发操作,显示增加物理池界面,增加物理池界面包括标识获取控件和密码获取控件;
通过标识获取控件获取待增加的物理池对应的服务器标识,通过密码获取控件获取待验证密码;
在待验证密码验证通过的情况下,在部署资源池区域显示待增加的物理池。
在一些实施例中,通过标识获取控件获取待增加的物理池对应的服务器标识,通过密码获取控件获取待验证密码之后,该方法还包括:
在待验证密码验证通过的情况下,生成第二请求报文;
将第二请求报文存储至第二消息队列中;
从第二消息队列中获取第二请求报文,基于第二请求报文,向待增加的物理池对应的服务器发送安装文件,服务器用于在接收到安装文件的情况下对安装文件进行安装,以使服务器加入大数据集群。
在一些实施例中,该方法还包括:
在待验证密码未验证通过或者服务器未成功加入大数据集群的情况下,显示第一提示信息,第一提示信息用于指示服务器未成功加入大数据集群的原因。
在一些实施例中,该方法还包括:
在待验证密码验证通过的情况下,在第三部署表中生成服务器部署记录,服务器部署记录用于记录待增加的物理池对应的部署操作。
在一些实施例中,该方法还包括:
在服务器部署记录中记录待增加的物理池对应的服务器的初始化状态,初始化状态至少包括待初初始化、初始化中、初始化错误和初始化完成。
在一些实施例中,该方法还包括:
在待验证密码验证通过的情况下,向待增加的物理池对应的服务器发送目标密钥,目标密钥用于在后续通信过程中实现身份验证。
在一些实施例中,部署资源池区域包括删除物理池控件,一个物理池对应于一个删除物理池控件,该方法还包括:
响应于对任一删除物理池控件的触发操作,不再在部署资源池区域显示删除物理池控件对应的物理池。
在一些实施例中,该方法还包括:
响应于对任一删除物理池控件的触发操作,从删除物理池控件对应的物理池所对应的服务器中删除已部署的容器。
在一些实施例中,部署资源池区域包括置顶物理池控件,一个物理池对应于一个置顶物理池控件,该方法还包括:
响应于对任一置顶物理池控件的触发操作,将置顶物理池控件对应的物理池显示在部署资源池区域中的第一目标位置处。
在一些实施例中,该方法还包括:
对于部署资源池区域中所显示的任一物理池,在物理池的第二目标位置处显示物理池所对应的服务器的服务器标识,在物理池的第三目标位置处显示物理池所对应的服务器的当前存储使用率、内存占用率和分配内存占用率。
在一些实施例中,部署界面还包括恢复设置控件,该方法还包括:
响应于对恢复设置控件的触发操作,生成第三请求报文,第三请求报文用于请求删除已部署的服务器和容器;
基于第三请求报文,从已部署的服务器中删除已部署的多个容器,执行第三预设脚本文件以使已部署的服务器脱离大数据集群。
在一些实施例中,大数据集群包括至少一个服务器,至少一个服务器中存在一个初始服务器,该方法包括:
在初始服务器上安装目标运行环境,并在初始服务器上配置目标运行环境对应的接口;
在初始服务器上创建目标运行环境对应的Overlay网络,并在初始服务器上初始化集群环境;
在初始服务器上创建大数据组件基础镜像,大数据组件基础镜像用于为容器提供构建基础;
在初始服务器上生成目标密钥文件。
在一些实施例中,大数据集群的不同容器之间通过Overlay网络进行通信。
根据本发明实施例的第二方面,提供一种基于大数据集群的数据处理方法,该方法包括:
通过管理插件获取数据处理请求,数据处理请求用于指示对目标服务器上的数据进行处理;
通过Overlay网络,向目标服务器发送数据处理请求,目标服务器用于基于数据处理请求,通过目标服务器上所包括的容器实现数据处理过程,容器按照对部署节点的临时资源池区域中的待部署节点的拖拽操作在目标服务器上创建得到,容器用于提供大数据集群服务。
在一些实施例中,通过Overlay网络,向目标服务器发送数据处理请求,包括:
基于数据处理请求,从目标服务器中确定至少一个目标容器;
通过Overlay网络,向至少一个目标容器发送数据处理请求。
在一些实施例中,在目标容器的数量大于等于2的情况下,至少一个目标容器至少包括第一目标容器和第二目标容器;
通过Overlay网络,向至少一个目标容器发送数据处理请求,包括:
通过Overlay网络,向第一目标容器发送数据处理请求,第一目标容器用于通过Overlay网络与第二目标容器进行通信,以完成对数据处理请求的响应。
在一些实施例中,数据处理请求为数据存储请求、数据获取请求或数据删除请求。
根据本发明实施例的第三方面,提供一种大数据集群部署系统以及对应的数据处理系统,该系统包括:
可视化操作模块,用于显示部署界面;
可视化操作模块,还用于响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点,节点为大数据组件所包括的、用于提供数据管理功能的服务;
可视化操作模块,还用于响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点;
架构服务模块,用于响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上创建待部署节点对应的容器,容器用于提供大数据集群服务。
在一些实施例中,部署界面包括节点创建区域,节点创建区域包括节点创建控件和至少一个大数据组件;
可视化操作模块,在用于响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点时,用于:
在任一大数据组件被选中的情况下,响应于对节点创建控件的触发操作,在临时资源池区域中显示被选中的大数据组件对应的待部署节点。
在一些实施例中,节点创建区域还包括节点参数设置控件,节点参数设置控件用于设置待部署节点的版本;
可视化操作模块,在用于响应于对节点创建控件的触发操作,在临时资源池区域中显示被选中的大数据组件对应的待部署节点时,用于:
响应于对节点创建控件的触发操作,在临时资源池区域中显示通过节点参数设置控件所设置的版本对应的待部署节点。
在一些实施例中,大数据组件至少包括HDFS组件、YARN组件、Hive组件和Clickhouse组件。
在一些实施例中,部署资源池区域包括至少一个物理池;
可视化操作模块,在用于响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点时,用于:
对于任一待部署节点,响应于对待部署节点的拖拽操作,将待部署节点显示在拖拽操作结束时所指示的物理池中。
在一些实施例中,架构服务模块,在用于响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上部署待部署节点对应的容器时,用于:
响应于开始部署操作,基于待部署节点所属的大数据组件的组件类型,确定目标插件;
通过目标插件,启动位于物理池对应的服务器上的目标接口;
通过目标接口,在物理池对应的服务器上部署待部署节点对应的容器。
在一些实施例中,架构服务模块,在用于通过目标接口,在物理池对应的服务器上部署待部署节点对应的容器,包括:
通过目标插件读取第一配置文件,以从第一配置文件中获取目标安装环境;
通过目标接口,对服务器的目标安装环境的配置文件进行修改,以在物理池对应的服务器上部署待部署节点对应的容器。
在一些实施例中,目标插件为二进制包,目标插件存储在大数据集群中的设定位置处;
目标插件的获取过程包括:
获取被上传至大数据集群的初始服务器中的目标插件;
将目标插件存储在大数据集群中的设定位置处。
在一些实施例中,架构服务模块,在用于响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上部署待部署节点对应的容器时,用于:
响应于开始部署操作,基于待部署节点以及待部署节点所处的物理池,生成第一请求报文,第一请求报文用于指示在物理池对应的服务器上部署待部署节点对应的容器;
基于第一请求报文和物理池对应的服务器上的已部署容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器,部署操作类型包括新增节点、移动节点和不改变节点;
按照待部署节点对应的部署操作类型以及已部署容器中的待删除容器,在物理池对应的服务器上进行容器部署。
在一些实施例中,架构服务模块,还用于将第一请求报文存储至第一消息队列中;
该系统还包括:
消息模块,用于从第一消息队列中获取第一请求报文;
架构服务模块,还用于在消息模块获取到第一请求报文的情况下,执行基于第 一请求报文和物理池对应的服务器上的已部署容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器的步骤。
在一些实施例中,架构服务模块,在用于按照待部署节点对应的部署操作类型以及已部署容器中的待删除容器,在物理池对应的服务器上进行容器部署时,用于:
在部署操作类型为新增节点的情况下,调用待部署节点的节点类型对应的组件插件,在物理池对应的服务器上创建待部署节点对应的容器;
在部署操作类型为移动节点的情况下,从已部署待部署节点对应容器的服务器上删除待部署节点对应的已部署容器,在物理池对应的服务器中创建待部署节点对应的容器,并将已部署容器中的数据拷贝至所创建的容器中;
在部署操作类型为不改变节点的情况下,无需在物理池对应的服务器上进行操作;
在已部署容器中存在待删除容器的情况下,从物理池对应的服务器上删除待删除容器。
在一些实施例中,架构服务模块,还用于对第一请求报文的数据格式进行校验;
架构服务模块,还用于按照预设部署规则,对第一请求报文所携带的部署数据进行校验。
在一些实施例中,系统还包括数据库模块;
数据库模块,用于响应于第一请求报文,在第一部署表中生成操作记录,操作记录用于记录本次部署操作;
数据库模块,还用于响应于第一请求报文,在第二部署表中生成待部署节点对应的容器部署记录,容器部署记录用于记录待部署节点对应的部署操作。
在一些实施例中,数据库模块,还用于在操作记录中记录本次操作的部署状态;
数据库模块,还用于在容器部署记录中记录待部署节点对应的容器的部署状态;
其中,部署状态至少包括未部署、已部署和部署错误;
在一些实施例中,待部署节点包括多种类型;
可视化操作模块,还用于显示部署指示界面;
可视化操作模块,还用于通过部署指示界面获取用户填写的目标数据,目标数据用于指示待部署的容器的每秒存储数据条数;
架构服务器模块,还用于基于目标数据和预设参数,确定各种类型的待部署节点对应的推荐部署数量。
在一些实施例中,部署资源池区域包括新增物理池控件;可视化操作模块,还用于:
响应于对新增物理池控件的触发操作,显示增加物理池界面,增加物理池界面包括标识获取控件和密码获取控件;
通过标识获取控件获取待增加的物理池对应的服务器标识,通过密码获取控件获取待验证密码;
在待验证密码验证通过的情况下,在部署资源池区域显示待增加的物理池。
在一些实施例中,架构服务模块,还用于在待验证密码验证通过的情况下,生成第二请求报文;
架构服务模块,还用于将第二请求报文存储至第二消息队列中;
该装置还包括:
消息模块,用于从第二消息队列中获取第二请求报文;
架构服务模块,还用于基于第二请求报文,向待增加的物理池对应的服务器发送安装文件,服务器用于在接收到安装文件的情况下对安装文件进行安装,以使服务器加入大数据集群。
在一些实施例中,可视化操作模块,还用于在待验证密码未验证通过或者服务器未成功加入大数据集群的情况下,显示第一提示信息,第一提示信息用于指示服务器未成功加入大数据集群的原因。
在一些实施例中,该系统还包括:
数据库模块,用于在待验证密码验证通过的情况下,在第三部署表中生成服务器部署记录,服务器部署记录用于记录待增加的物理池对应的部署操作。
在一些实施例中,数据库模块,还用于在服务器部署记录中记录待增加的物理池对应的服务器的初始化状态,初始化状态至少包括待初初始化、初始化中、初始化 错误和初始化完成。
在一些实施例中,部署资源池区域包括删除物理池控件,一个物理池对应于一个删除物理池控件;
可视化操作模块,还用于响应于对任一删除物理池控件的触发操作,不再在部署资源池区域显示删除物理池控件对应的物理池。
在一些实施例中,架构服务模块,还用于响应于对任一删除物理池控件的触发操作,从删除物理池控件对应的物理池所对应的服务器中删除已部署的容器。
在一些实施例中,部署资源池区域包括置顶物理池控件,一个物理池对应于一个置顶物理池控件;
可视化操作模块,还用于响应于对任一置顶物理池控件的触发操作,将置顶物理池控件对应的物理池显示在部署资源池区域中的第一目标位置处。
在一些实施例中,可视化操作模块,还用于对于部署资源池区域中所显示的任一物理池,在物理池的第二目标位置处显示物理池所对应的服务器的服务器标识,在物理池的第三目标位置处显示物理池所对应的服务器的当前存储使用率、内存占用率和分配内存占用率。
在一些实施例中,部署界面还包括恢复设置控件;
架构服务模块,还用于响应于对恢复设置控件的触发操作,生成第三请求报文,第三请求报文用于请求删除已部署的服务器和容器;
架构服务模块,还用于基于第三请求报文,从已部署的服务器中删除已部署的多个容器,执行第三预设脚本文件以使已部署的服务器脱离大数据集群。
在一些实施例中,大数据集群包括至少一个服务器,至少一个服务器中存在一个初始服务器,架构服务模块,还用于:
在初始服务器上安装目标运行环境,并在初始服务器上配置目标运行环境对应的接口;
在初始服务器上创建目标运行环境对应的Overlay网络,并在初始服务器上初始化集群环境;
在初始服务器上创建大数据组件基础镜像,大数据组件基础镜像用于为容器提供构建基础;
在初始服务器上生成目标密钥文件。
在一些实施例中,系统还包括网络模块,用于保证容器间的跨服务器通信。
在一些实施例中,网络模块,用于在获取到数据处理请求后,通过Overlay网络,向目标容器发送数据处理请求,目标容器用于基于数据处理请求实现数据处理过程,容器按照对部署界面中的待部署节点的拖拽操作在服务器上创建得到,容器用于提供大数据集群服务。
在一些实施例中,网络模块,在用于通过Overlay网络,向目标容器发送数据处理请求时,用于:
基于数据处理请求,确定至少一个目标容器;
通过Overlay网络,向至少一个目标容器发送数据处理请求。
在一些实施例中,在目标容器的数量大于等于2的情况下,至少一个目标容器至少包括第一目标容器和第二目标容器;
网络模块,在用于通过Overlay网络,向至少一个目标容器发送数据处理请求时,用于:
通过Overlay网络,向第一目标容器发送数据处理请求,第一目标容器用于通过Overlay网络与第二目标容器进行通信,以完成对数据处理请求的响应。
在一些实施例中,数据处理请求为数据存储请求、数据获取请求或数据删除请求。
在一些实施例中,系统还包括大数据组件插件模块,大数据组件插件模块用于实现容器在服务器上的启动。
根据本发明实施例的第四方面,提供一种计算设备,计算设备包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其中,处理器执行计算机程序时实现如上述第一方面以及第一方面中任一项所提供的大数据集群部署方法所执行的操作,或者,处理器执行计算机程序时实现如上述第二方面以及第二方面中任一项的基于大数据集群的数据处理方法所执行的操作。
根据本发明实施例的第五方面,提供一种计算机可读存储介质,计算机可读存储介质上存储有程序,程序被处理器执行时,实现如第一方面以及第一方面中任一项所提供的大数据集群部署方法所执行的操作,或者,程序被处理器执行时,实现如上 述第二方面以及第二方面中任一项所提供的基于大数据集群的数据处理方法所执行的操作。
根据本发明实施例的第六方面,提供一种计算机程序产品,该计算机程序产品包括计算机程序,计算机程序被处理器执行时,实现如第一方面以及第一方面中任一项所提供的大数据集群部署方法所执行的操作,或者,计算机程序被处理器执行时,实现如上述第二方面以及第二方面中任一项所提供的基于大数据集群的数据处理方法所执行的操作。
本发明提供一个部署界面,以通过部署界面来提供大数据集群部署功能。该方法包括:响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点;响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点;响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上创建待部署节点对应的容器,以通过容器提供大数据集群服务。通过上述过程可以大大简化相关技术人员的操作,从而可以降低大数据集群的部署成本、提高部署效率。
另外,本发明通过Overlay网络保证大数据集群中各个容器之间的通信,以便在获取到数据处理请求时,可以通过Overlay网络,向目标容器发送数据处理请求,以便目标容器基于数据处理请求实现数据处理过程,以满足用户的数据处理需求。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本发明。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本发明的实施例,并与说明书一起用于解释本发明的原理。
图1是根据本发明实施例示出的一种大数据集群部署方法。
图2是根据本发明实施例示出的一种部署界面的界面示意图。
图3是根据本发明实施例示出的一种增加物理池界面的界面示意图。
图4是根据本发明实施例示出的一种部署界面的界面示意图。
图5是根据本发明实施例示出的一种推荐部署过程的流程图。
图6是根据本发明实施例示出的一种部署界面的界面示意图。
图7是根据本发明实施例示出的一种节点分配过程的示意图。
图8是根据本发明实施例示出的一种拖拽功能的原理示意图。
图9是根据本发明实施例示出的一种修改样式数据的原理示意图。
图10是根据本发明实施例示出的多种大数据组件在进行不同操作时需要修改的配置文件示意图。
图11是根据本发明实施例示出的一种部署数据示意图。
图12是根据本发明实施例示出的一种部署数据示意图。
图13是根据本发明实施例示出的一种恢复出厂设置过程的流程示意图。
图14是根据本发明实施例示出的一种redis.conf配置文件的示意图。
图15是根据本发明实施例示出的一种Redis集群的搭建过程的示意图。
图16是根据本发明实施例示出的一种基于大数据集群的数据处理方法的流程图。
图17是根据本发明实施例示出的一种模块交互过程的流程图。
图18是根据本发明实施例示出的另一种模块交互过程的流程图。
图19是根据本发明实施例示出的另一种模块交互过程的流程图。
图20是本发明根据一示例性实施例示出的一种计算设备的结构示意图。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本发明相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本发明的一些方面相一致的装置和方法的例子。
本发明提供了一种大数据集群部署方法以及基于大数据集群的数据处理方法,可以针对目前大数据平台占用机器数量多、部署和使用门槛高的问题,对大数据平台 进行轻量化改造,实现组件的容器化部署,以降低大数据平台所需的机器数量。而且,本发明所提供的方法,可以提供一种可视化的操作界面,也即是部署界面,以便相关技术人员可以通过简单的拖拽操作等实现大数据集群的部署,以降低大数据集群部署过程的技术门槛,使得相关技术人员可以快速完成集群部署、扩容、收缩、复位等功能的实现,提高部署效率,降低部署成本,让普通技术人员就可以完成部署。
本发明所提供的方法可以应用于计算设备,计算设备可以为服务器,如一台服务器、多台服务器、服务器集群等,本发明对计算设备的设备类型和设备数量不加以限定。
在介绍了本发明的实施环境之后,下面对本发明所提供的大数据集群部署方法和基于大数据集群的数据处理方法分别进行介绍。
参见图1,图1是根据本发明实施例示出的一种大数据集群部署方法,该方法包括:
步骤101、显示部署界面。
步骤102、响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点,节点为大数据组件所包括的、用于提供数据管理功能的服务。
其中,临时资源池相当于一个虚拟池,临时资源池是为了方便用户的拖拽操作所设置的,临时资源池中所存储的节点并非真正等待进行部署的节点。通过将根据节点创建操作所生成的待部署节点显示在临时资源池对应的界面区域(也即是临时资源池区域)中,以便后续将节点拖拽到部署资源池对应的界面区域(也即是部署资源池区域)中,以便基于部署资源池中的节点部署方式来进行部署。
步骤103、响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点。
其中,部署资源池中所显示的节点是真正要进行部署的节点,部署资源池中包括至少一个物理池,每个物理池都是一个实际的机器,通过部署资源池可以将不同机器的资源整合使用,相关技术人员可以根据实际需求来进行容器的部署。
步骤104、响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上创建待部署节点对应的容器,容器用于提供大数据集群服务。
本发明提供了一个部署界面,以通过部署界面来提供大数据集群部署功能。相关技术人员通过在部署界面中对节点的拖拽操作以及对组件的触发操作,后台即可响应于在部署界面中相应操作,自动完成容器的部署,以通过容器提供大数据集群服务。通过上述部署过程可以大大简化相关技术人员的操作,从而可以降低大数据集群的部署成本、提高部署效率。
在介绍了本发明所提供的大数据集群部署方法的基本实现过程之后,下面介绍本发明的各个可选实施例。
在一些实施例中,为保证本发明的顺利进行,可以预先进行一些准备工作。可选地,可以预先准备至少一台服务器,用于部署大数据集群,且保证服务器间通信畅通。例如,可以预先准备n台服务器,n可以为大于等于的正整数,为便于说明,可以将这n台服务器记为S1、S2、…、Sn。
可选地,可以在这至少一台服务器中任选一台服务作为初始服务器,以便在初始服务器上预先部署所需的网络环境,以通过所部署的网络环境,来实现大数据集群的搭建。
在一种可能的实现方式中,在初始服务器上预先部署所需的网络环境的过程可以包括如下步骤:
步骤一、在初始服务器上安装目标运行环境,并在初始服务器上配置目标运行环境对应的接口。
其中,目标运行环境可以为应用引擎容器(Docker)环境,目标运行环境对应的接口可以为Docker应用程序接口(Application Programming Interface,API)。
例如,可以以服务器S1作为初始服务器,从而可以在服务器S1上安装Docker环境,并且在服务器S1上配置Docker API,以便后续可以通过Docker API支持对其他服务器上的Docker引擎的操作(例如,支持以RESTful方式来操作Docker引擎)。
步骤二、在初始服务器上创建目标运行环境对应的Overlay网络,并在初始服务器上初始化集群环境。
其中,Overlay网络是使用网络虚拟化在物理基础设施之上建立连接的逻辑网络。相对于Overlay网络的概念,大数据集群中还包括UnderLay网络,Underlay网络是负责传递数据包的物理网络,由交换机和路由器等设备组成,借助以太网协议、路由协议和VLAN协议等驱动。相比于UnderLay网络,Overlay实现了控制平面与转发 平面的分离,以满足容器的跨主机通信需求。
通过OverLay技术,可以在对物理网络不做任何改造的情况下,通过隧道技术在现有的物理网络上创建了一个或多个逻辑网络,有效解决物理数据中心存在的诸多问题,实现了数据中心的自动化和智能化。
例如,可以在作为初始服务器的服务器S1上创建Docker环境的Overlay网络,并在初始服务器上进行Docker Swarm集群环境的初始化。
通过在初始服务器上创建目标运行环境对应的Overlay网络,使得后续大数据集群中所包括的组成部分之间可以通过Overlay网络来进行通信,以满足大叔级集群内部的通信需求。而通过在初始服务器上初始化集群环境,以提供大数据集群的搭建基础,以便后续可以基于初始服务器来实现大数据集群的搭建。
步骤三、在初始服务器上创建大数据组件基础镜像,大数据组件基础镜像用于为容器提供构建基础。
例如,可在初始服务器上创建大数据组件基础Docker镜像,以便通过大数据组件基础Docker镜像,来提供大数据集群服务容器的启动功能。
需要说明的是,为了便于后续的容器化部署,可以将各种大数据组件所需的环境和软件预先打包成Docker镜像磁带归档(Tape Archive,Tar)包,并且预先将打包好的Docker镜像Tar包上传到初始服务器中,以便初始服务器可以通过对Docker镜像Tar包进行安装,以实现大数据组件基础镜像的创建。
其中,大数据集群可以包括多种类型的组件,例如,可以包括分布式文件系统(Hadoop Distributed File System,HDFS)组件、资源协调者(Yet Another Resource Negotiator,YARN)组件、分布式应用程式协调服务(Zookeeper)组件、数据库工具(Clickhouse)组件、数据仓库工具(Hive)组件、安全管理(Knox)组件、监控工具(如Prometheus、Grafana)组件等,此外,大数据集群还可以包括其他类型的组件,本发明对具体的组件类型不加以限定。
可选地,不同类型的大数据组件所需的环境和软件可以被打包到一个Docker镜像Tar包中,或者,可以将不同类型的大数据组件所需的环境和软件打包成不同的Docker镜像Tar包,例如,将HDFS组件、YARN组件和Zookeeper组件打包成一个Docker镜像Tar包,将Clickhouse组件、Hive组件、Knox组件、监控工具组件分别打包成一个Docker镜像Tar包,本发明对具体采用哪种方式不加以限定。
相应地,由于不同类型的大数据组件可以被一起打包成一个Docker镜像Tar包,还可以被分开打包成多个Docker镜像Tar包,因而,在初始服务器上创建大数据组件基础镜像时,可以对包括多种类型的大数据组件所需的环境和软件的Docker镜像Tar包进行安装,以创建一个完整的大数据组件基础镜像,来满足多种类型的服务容器的启动需求,或者,可以对多个Docker镜像Tar包分别进行安装,以得到多个大数据组件基础镜像,以便通过各个大数据组件基础镜像,分别满足对应类型的服务容器的启动需求。
另外,需要说明的是,除了上述涉及到的大数据组件,开发人员还可以根据需求开发其他大数据组件对应的Docker镜像Tar包,并通过在初始服务器上上传并安装Docker镜像Tar包的方式热扩展增加支持的组件。
步骤四、在初始服务器上生成目标密钥文件。
可选地,可以采用对称加密算法或非对称加密算法来实现密钥的生成,此外,还可以采用其他算法来实现密钥的生成,本发明对此不加以限定。
其中,目标密钥可以为安全外壳协议(Secure Shell,SSH)公私钥。
通过生成目标密钥文件,以便后续可以在大数据集群中添加了新的服务器或创建了新的容器时,可以与新添加的服务器或新创建的容器共享目标密钥文件,使得后续大数据集群的服务器与服务器之间、或者服务器与容器之间均可以通过目标密钥实现免密通信。
通过上述过程即可完成构建大数据集群所需的基础网络环境的构建,从而使得后续可以基于已构建的基础网络环境,来将其他服务器添加至大数据集群中,以构建包括多个服务器的大数据集群。而且,还可以基于已构建的基础网络环境,在大数据集群中进行容器部署,以实现通过所部署的容器来为用户提供服务。
需要说明的是,上述过程仅介绍了对大数据集群的初始服务器的处理过程,在更多可能的实现方式中,相关技术人员还可以根据自己的实际需求,在大数据集群中添加服务器,以搭建包括多个服务器的大数据集群。
在一些实施例中,可以在部署界面中设置新增物理池控件,从而可以通过新增物理池控件,来将某个服务器加入大数据集群。例如,可以将新增物理池控件设置在部署资源池区域。参见图2,图2是根据本发明实施例示出的一种部署界面的界面示意图,如图2所示,该部署界面被划分为节点创建区域、临时资源池区域和部署资源 池区域,其中,部署资源池区域中所设置的“增加物理池”按钮即为新增物理池控件,通过“增加物理池”按钮,即可实现将服务器加入大数据集群。
通过在部署界面中设置新增物理池控件,以便用户可以根据实际的技术需求,来实现在大数据集群中增加服务器,以使所创建的大数据集群可以满足技术需求,进而保证后续数据处理过程的顺利进行。
在一种可能的实现方式中,可以通过如下过程实现物理池的新增,从而实现将某个服务器加入大数据集群:
步骤一、响应于对新增物理池控件的触发操作,显示增加物理池界面,增加物理池界面包括标识获取控件和密码获取控件。
参见图3,图3是根据本发明实施例示出的一种增加物理池界面的界面示意图,在新增物理池控件被触发后,即可在可视化界面上显示如图3所示的增加物理池界面,其中,文字提示为“IP”的输入框即为标识获取控件,文字提示为“密码”的输入框即为密码获取控件。
步骤二、通过标识获取控件获取待增加的物理池对应的服务器标识,通过密码获取控件获取待验证密码。
在一种可能的实现方式中,相关技术人员可以在标识获取控件中输入待加入大数据集群的服务器的服务器标识,在密码获取控件中输入预先设定好的密码,以便计算设备可以通过标识获取控件获取到待增加的物理池对应的服务器标识,通过密码获取控件获取到待验证密码。
可选地,在通过标识获取控件获取到待增加的物理池对应的服务器标识,通过密码获取控件获取到待验证密码之后,即可对待验证密码进行验证。
步骤三、在待验证密码验证通过的情况下,在部署资源池区域显示待增加的物理池。
通过设置标识获取控件,以便用户可以自行在标识获取控件中输入要加入大数据集群的服务器的服务器标识,以满足用户的定制化需求,而通过设置密码获取控件,以便用户可以在密码获取界面中输入待验证密码,以基于待验证密码对用户身份进行验证,以确定用户是否有权进行将服务器加入大数据集群的过程,从而保证大数据集群部署过程的安全性。
在一种可能的实现方式中,可以通过如下过程实现物理池的新增:
在待验证密码验证通过的情况下,在待验证密码验证通过的情况下,生成第二请求报文,从而将第二请求报文存储至第二消息队列中,进而从第二消息队列中获取第二请求报文,基于第二请求报文,向待增加的物理池对应的服务器发送安装文件,以便服务器可以在接收到安装文件的情况下对安装文件进行安装,以使服务器加入大数据集群。
其中,第二请求报文可以为JS对象简谱(Java Script Object Notation,JSON)格式的请求报文数据。可选地,第二请求报文还可以为其他类型的报文数据,本发明对此不加以限定。
以第二请求报文为JSON格式的请求报文数据为例,第二请求报文可以为如下形式的代码:
{
ip":"10.10.177.18",
"password":"root"
}
上述仅为一种示例性的第二请求报文,并不构成对第二请求报文的限定。
通过生成第二请求报文,以便计算设备可以基于第二请求报文实现相应的处理过程。
其中,安装文件可以包括各种类型的组件对应的Docker镜像Tar包以及红帽软件包管理器(Red-Hat Package Manager,RPM)安装包等,以便计算设备可以在接收到安装文件的情况下,通过安装RPM包实现预设脚本(包括第一预设脚本和第二预设脚本)的安装,其中,第一预设脚本用于实现环境安装功能,第二预设脚本用于实现集群加入功能,因而,计算设备可以通过第一预设脚本来对Docker镜像Tar包进行安装,以在待加入集群的服务器上实现环境安装,再在待加入集群的服务器上执行第二预设脚本,以使该服务器可以加入初始服务器的Docker Swarm集群。
可选地,计算设备可以关联有一个数据库,该数据库可以用于对大数据集群中的部署记录进行存储。例如,该数据库中可以包括第三部署表,第三部署表可以用于对增加物理池的操作进行记录。
在一些实施例中,在待验证密码验证通过的情况下,可以在第三部署表中生成服务器部署记录,服务器部署记录用于记录待增加的物理池对应的部署操作。
其中,可以在服务器部署记录中记录待增加的物理池对应的服务器的初始化状态,初始化状态至少包括待初初始化、初始化中、初始化错误和初始化完成,以便计算设备可以基于服务器部署记录中所记录的初始化状态,实现待增加的物理池在部署资源池区域的显示。
可选地,可以基于服务器部署记录中所记录的初始化状态,将待增加的物理池显示为不同的颜色。例如,在服务器部署记录中所记录的初始化状态为待初始化或初始化中时,可以将待增加的物理池显示为蓝色;在服务器部署记录中所记录的初始化状态为初始化完成时,可以将待增加的物理池显示为白色;服务器部署记录中所记录的初始化状态为初始化错误时,可以将待增加的物理池显示为红色,以便相关技术人员可以直观地观察到服务器的初始化状态。
例如,在待验证密钥刚刚验证通过,即使尚未开始在相应的服务器上进行文件安装,但部署资源池中已经显示有待增加的物理池,此时,该待增加的物理池对应的服务器在服务器部署记录中的初始化状态记录为待初始化,相应地,部署资源池区域所显示的待增加的物理池为蓝色。而在服务器上已经开始进行文件安装的情况下,该待增加的物理池对应的服务器在服务器部署记录中所记录的初始化状态为初始化中,此时,部署资源池区域所显示的待增加的物理池仍为蓝色。在服务器成功加入大数据集群的情况下,该待增加的物理池对应的服务器在服务器部署记录中所记录的初始化状态为初始化完成,相应地,部署资源池区域所显示的待增加的物理池为白色。另外,如果在初始化过程中出现任何问题导致服务器未成功加入大数据集群,则该待增加的物理池对应的服务器在服务器部署记录中所记录的初始化状态为初始化错误,相应地,部署资源池区域所显示的待增加的物理池为红色。
需要说明的是,计算设备可以每隔预设时长查询一次服务器初始化状态,从而基于查询到的初始化状态对待增加的物理池的显示方式进行更新。其中,预设时长可以为10秒,可选地,预设时长还可以为其他时长,本发明对预设时长的具体取值不加以限定。
可选地,还可以在服务器部署记录中记录失败原因,以便相关技术人员可以进行问题排查,其中,失败原因可以为IP地址或密码错误、无法连接待增加的物理池对应的服务器等。另外,还可以在服务器部署记录中记录请求触发时间、待增加的物理 池对应服务器的服务器标识、待增加的物理池对应的服务器成功加入大数据集群的时间等,本发明对服务器部署记录所包括的具体内容不加以限定。
通过在数据库中维护第三部署表,以便可以在第三部署表中记录大数据集群在服务器层面的变更,以便后续可以从第三部署表中查询到所需的操作记录,以更加全面地满足用户需求。
需要说明的是,在服务器初始化完成之后,该服务器即可与大数据集群中已有的服务器组成Docker Swarm集群。另外,在服务器初始化错误的情况下,服务器即无法成功加入大数据集群,此时,服务器部署记录中所记录的该服务器的初始化状态即为初始化错误,而且,服务器部署记录中还记录有失败原因,则计算设备在查询到初始化状态为初始化错误的情况下,即可获取服务器部署记录中所记录的失败原因,从而即可基于获取到的失败原因显示第一提示信息,以便可以通过第一提示信息指示服务器未成功加入大数据集群的原因,便于相关技术人员及时进行针对性处理。
上述过程是以生成第二请求报文后,即基于第二请求报文来进行待验证密码的校验为例来进行说明的。可选地,在生成第二请求报文后,可以将第二请求报文存储至第二消息队列中,以便后续可以从第二消息队列中获取第二请求报文,来执行基于第二请求报文,对待验证密码进行验证的过程。
通过采用消息队列来对请求报文进行存储,可以实现显示侧对用户请求的同步处理以及后台的异步处理,以保证用户可以在当前请求报文未被真正处理完成的情况下,用户可以继续通过部署界面进行操作,而不会影响后台的处理过程,以保证后续的用户请求可以得到及时的响应。
此外,通过消息队列对请求报文进行存储,可以保证请求报文处理出现问题的情况下,可以从消息队列中重新获取未被成功处理的请求报文进行重试,而无需用户重新在界面上手动进行操作,简化用户操作,从而可以提高用户体验。
另外,在一些实施例中,在待验证密码验证通过的情况下,可以向待增加的物理池对应的服务器发送目标密钥,以便通过目标密钥用于在后续通信过程中实现身份验证,无需登录即可保证通信过程的安全性。
需要说明的是,上述过程是以在大数据集群中添加一个服务器的过程为例来进行说明的,重复上述过程即可完成将多个服务器加入大数据集群,具体过程可以参见上述实施例,此处不再赘述。
参见图4,图4是根据本发明实施例示出的一种部署界面的界面示意图,如图4所示,相较于如图2所示的部署界面,大数据集群中已经在初始服务器(服务器标识为10.10.177.19)的基础上增加了两个服务器,这两个服务器的服务器标识分别为10.10.177.18和10.10.177.20。
可选地,在部署资源池区域中对物理池进行显示时,还可以显示有物理池的相关信息,如物理池所对应的服务器的服务器标识、当前存储使用率、内存占用率和分配内存占用率等。
例如,对于部署资源池区域中所显示的任一物理池,可以在物理池的第二目标位置处显示物理池所对应的服务器的服务器标识,在物理池的第三目标位置处显示物理池所对应的服务器的当前存储使用率、内存占用率和分配内存占用率。
其中,第二目标位置可以为所显示的物理池的左上角,第三目标位置可以为所显示的物理池的右下角。仍以图4所示的部署界面为例,如图4所示的部署界面即在各个物理池的左上角显示有对应服务器的服务器标识,在各个物理池的右下角显示有对应服务器的当前存储使用率、内存占用率和分配内存占用率。可选地,第二目标位置和第三目标位置还可以为其他位置,本发明对此不加以限定。
通过在物理池中对相应服务器的相关状态信息进行显示,以便用户可以直观地看到各个物理池对应的服务器的一些状态数据,如当前存储使用率、内存占用率和分配内存占用率等,以使用户可以及时获知各个服务器的状态,便于用户基于各个服务器的状态来进行容器的创建。
可选地,可以实时获取各个服务器的状态数据,从而基于获取到的状态数据,来在物理池上对应的位置处显示各个服务器的状态。
通过基于实时获取到的状态数据实现服务器状态的显示,可以保证数据的实时性和有效性,从而使得用户通过所显示的内容获取到的服务器状态更加真实可靠。
通过上述过程即可完成大数据集群的硬件环境的搭建,以得到包括至少一台服务器的大数据集群,从而即可在这至少一台服务器上进行容器化部署,以使大数据集群可以为用户提供大数据处理功能。
在一些实施例中,部署界面包括节点创建区域,节点创建区域包括节点创建控件和至少一个大数据组件。
需要说明的是,虽然大数据组件可以包括HDFS组件、YARN组件、Clickhouse 组件、Hive组件、Knox组件、监控工具组件等多种类型的组件,但有些组件是在配置初始服务器时就会默认配置好的,无需用户手动操作,因而,上述几种大数据组件并非都会显示在节点创建区域中。一般情况下,节点创建区域所显示的组件可以包括HDFS组件、YARN组件、Clickhouse组件、Hive组件。
为便于理解,下面对各个组件适用的场景进行介绍。以大数据组件包括HDFS组件、YARN组件、Clickhouse组件和Hive组件为例,HDFS组件可以用于提供数据存储功能,也即是,若要为用户提供数据存储功能,则需要在大数据集群中部署HDFS组件的节点对应的容器,以便通过所部属的容器为用户提供数据的分布式存储服务,以满足用户的需求。
YARN组件可以用于提供数据分析功能,也即是,如要为用户提供数据分析功能,则需要在大数据集群中部署YARN组件的节点对应的容器,以便通过YARN组件的节点对应的容器,从HDFS组件的节点对应的容器中获取数据,并基于获取到的数据进行数据分析,以满足用户的数据分析需求。
Hive组件可以将HDFS组件的节点对应的容器中所存储的数据转换为一张可查询的数据表,以便可以基于数据表进行数据查询及处理等过程,以满足用户的数据处理需求。
需要说明的是,虽然YARN组件和Hive组件都可以为用户提供数据分析功能,但有所不同的是,如果要用YARN组件实现数据分析过程,则需要开发一系列代码,以在将数据处理任务提交到YARN组件中之后,通过所开发的代码来基于数据处理任务进行相应的数据处理过程,而如果要用Hive组件实现数据分析过程,仅需使用结构化查询语言(Structured Query Language,SQL)语句即可实现对数据处理任务的处理。
Clickhouse组件为一种列式存储数据库,可以用于满足用户对大量数据的存储需求,相较于常用的行式存储数据库,Clickhouse组件的读取速度更快,而且,Clickhouse组件可以对数据分区进行存储,用户可以根据自己的实际需求,仅获取某一个或某几个分区中的数据来进行处理,而无需获取数据库中的所有数据,从而可以降低计算设备的数据处理压力。
仍以如图4所示的部署界面为例,在如图4所示的部署界面中,所显示的大数据组件包括HDFS组件、YARN组件、Hive组件和Clickhouse组件,如图4所示的部署界面中的“应用”按钮即为节点创建控件。
在一些实施例中,对于步骤102,在响应于在部署界面的节点创建操作,在部署界面的临时资源池区域显示待部署节点时,可以通过如下方式实现:
在任一大数据组件被选中的情况下,响应于对节点创建控件的触发操作,在临时资源池区域中显示被选中的大数据组件对应的待部署节点。
需要说明的是,不同的组件所包括的节点是不同的,下面对各个组件所包括的节点类型进行说明。
HDFS组件包括NameNode(nn)节点、DataNode(dn)节点和SecondaryNameNode(sn)节点,YARN组件包括ResourceManager(rm)节点和NodeManager(nm)节点,Hive组件包括Hive(hv)节点,Clickhouse组件包括Clickhouse(ch)节点。
基于上述组件与节点之间的关系,计算设备即可根据被选中的大数据组件,来在临时资源池区域显示相应的节点,作为待部署节点。
需要说明的是,由于大数据组件还可能有版本的区分,或者说,同一个大数据组件可能有多个可选的版本,因而,节点创建区域还可以设置有节点参数设置控件,节点参数设置控件可以用于设置待部署节点的版本。
基于此,在一些实施例中,在响应于对节点创建控件的触发操作,在临时资源池区域中显示被选中的大数据组件对应的待部署节点时,可以响应于对节点创建控件的触发操作,在临时资源池区域中显示通过节点参数设置控件所设置的版本对应的待部署节点。
例如,HDFS组件、YARN组件可以包括高可用(High Availability,HA)版本和非高可用版本。需要说明的是,HDFS组件和YARN组件的版本需要保持一致。
仍以如图4所示的部署界面为例,在被选中的组件为HDFS组件或YARN组件的情况下,标记文字为“节点参数”下“HA”对应的勾选框即为节点参数设置控件,在“HA”对应的勾选框被选中的情况下,即表示要部署的HDFS组件或YARN组件为HA版本,在“HA”对应的勾选框未被选中的情况下,即表示要部署的HDFS组件或YARN组件为非HA版本。另外,需要说明的是,由于Hive组件和Clickhouse组件无需区分HA版本和非HA版本,因此,在被选中的组件为Hive组件或Clickhouse组件的情况下,标记文字为“节点参数”下所显示的文字即会变为“无”,从而使得无需对Hive组件和Clickhouse组件的版本进行区分。
通过在部署界面中设置节点参数设置控件,以便用户可以根据实际技术需求, 对大数据组件的版本进行选择,从而可以满足用户的定制化需求。
此外,在已经通过节点参数设置控件设置了版本的情况下,如果节点已经开始部署,则无法对所设置的版本进行修改,而如果节点还未开始部署,则用户还可以对所设置的版本进行修改,相应地,在版本修改后临时资源池区域所显示的节点会被清空,以便用户重新进行节点的创建。
需要说明的是,在已选择大数据组件的情况下,每次触发节点创建控件时要添加的待部署节点的节点类型和节点数量都是预先设置好的。而一般情况下,HDFS组件、YARN组件、Hive组件和Clickhouse组件在大数据集群中的使用最为广泛,下面主要以HDFS组件、YARN组件、Hive组件和Clickhouse组件为例来进行说明。
其中,不同版本、不同组件的初始状态(也即是第一次点击节点创建控件的时候)下的节点类型和节点数量如下:
HA版本的HDFS组件的初始状态:1个nn节点、1个sn节点、4个dn节点;
非HA版本的HDFS组件的初始状态:3个nn节点、4个dn节点;
HA版本的YARN组件的初始状态:1个rm节点、1个nm节点;
非HA版本的YARN组件的初始状态:3个rm节点、1个nm节点;
Hive组件:1个hv节点;
Clickhouse组件:1个ch节点。
其中,nn节点是HDFS组件的核心节点,用于提供数据管控功能,而非HA版本的HDFS组件仅包括1个nn节点,一旦该节点发生故障,则会导致HDFS组件无法再提供相应的功能,然而,HA版本的HDFS组件包括3个nn节点,其中,1个nn节点处于激活(Active)状态,另外2个nn节点处于准备(Standby)状态,开始时可以由处于Active状态的nn节点进行工作,而一旦处于Active状态的nn节点出现故障,即可激活处于Standby状态的nn节点,以保证HDFS组件的正常工作,从而可以达到高可用的效果。
同理,rm节点是YARN组件的核心节点,用于提供数据管控功能,而非HA版本的YARN组件仅包括1个rm节点,一旦该节点发生故障,则会导致YARN组件无法再提供相应的功能,然而,HA版本的YARN组件包括3个rm节点,其中,1个rm节点处于Active状态,另外2个rm节点处于Standby状态,开始时可以由处于Active 状态的rm节点进行工作,而一旦处于Active状态的rm节点出现故障,即可激活处于Standby状态的rm节点,以保证YARN组件的正常工作,从而可以达到高可用的效果。
另外,需要说明的是,对于HDFS组件的nn节点和sn节点、YARN组件的rm节点和nm节点、Hive组件的hv节点以及Clickhouse组件的ch节点,其在初始状态下的节点数量是基于Hadoop架构的技术需求确定,而对于HDFS组件的dn节点,其在初始状态下的节点数量是由于HDFS组件的默认副本数为3,为保证在各个副本上移动节点不会丢失数据,因此设置了4个dn节点。
需要说明的是,对于HDFS组件,用户可以根据实际技术需求增加dn节点的数量,但nn节点和sn节点的数量是无法增加的;对于YARN组件,用户可以根据实际技术需求增加nm节点的数量,但rm节点的数量是无法增加的;对于Hive组件和Clickhouse组件,其对应的节点(也即是hv节点和ch节点)的数量均是无法增加的。
为便于理解,下面以表格的形式对每次触发节点创建控件时要添加的待部署节点的节点类型和节点数量进行说明,参见如下表1:
表1
Figure PCTCN2022106091-appb-000001
通过上表可以看出,在大数据组件为HA版本的HDFS组件的情况下,第一次点击节点创建控件时可以在临时资源池区域添加3个nn节点和4个dn节点,第二次点击节点创建控件时可以在临时资源池区域添加1个dn节点,以此类推,后续每次点击创建控件时均可以在临时资源池区域添加1个dn节点;在大数据组件为非HA版本的HDFS组件的情况下,第一次点击节点创建控件时可以在临时资源池区域添加1个nn节点、1个sn节点、4个dn节点,第二次点击节点创建控件时可以在临时资源池区域添加1个dn节点,以此类推,后续每次点击创建控件时均可以在临时资源池区域 添加1个dn节点;在大数据组件为HA版本的YARN组件的情况下,第一次点击节点创建控件时可以在临时资源池区域添加3个rm节点和4个nm节点,第二次点击节点创建控件时可以在临时资源池区域添加1个nm节点,以此类推,后续每次点击创建控件时均可以在临时资源池区域添加1个nm节点;在大数据组件为非HA版本的YARN组件的情况下,第一次点击节点创建控件时可以在临时资源池区域添加1个rm节点和1个nm节点,第二次点击节点创建控件时可以在临时资源池区域添加1个nm节点,以此类推,后续每次点击创建控件时均可以在临时资源池区域添加1个nm节点;在大数据组件为Hive组件的情况下,第一次点击节点创建控件时可以在临时资源池区域添加1个hv节点,后续再点击节点创建控件也无法再增加Hive组件对应的节点个数;在大数据组件为Clickhouse组件的情况下,第一次点击节点创建控件时可以在临时资源池区域添加1个ch节点,后续再点击节点创建控件也无法再增加Clickhouse组件对应的节点个数。
下面以几个示例性的节点创建过程为例,来对本发明的节点创建过程进行说明。
例如,仍以如图2所示的部署界面为例,在仅要部署非HA版本的HDFS组件对应的节点的情况下,在如图2所示的界面中选中HDFS组件,使得HDFS组件处于被选中状态,再点击“应用”按钮(也即是节点创建控件),即可在临时资源池区域显示1个nn节点、1个sn节点和4个dn节点。
又例如,仍以如图2所示的部署界面为例,在要部署HDFS组件、YARN组件、Hive组件和Clickhouse组件对应的节点的情况下,在如图2所示的界面中先选中HDFS组件,使得HDFS组件处于被选中状态,再点击“应用”按钮(也即是节点创建控件),即可在临时资源池区域显示1个nn节点、1个sn节点和4个dn节点;然后选中YARN组件,使得YARN组件处于被选中状态,再点击“应用”按钮,即可在临时资源池区域显示1个rm节点和1个nm节点;再选中Hive组件,使得Hive组件处于被选中状态,再点击“应用”按钮,即可在临时资源池区域显示1个hv节点;最后选中Clickhouse组件,使得Clickhouse组件处于被选中状态,再点击“应用”按钮,即可在临时资源池区域显示1个ch节点。
又例如,仍以如图2所示的部署界面为例,在仅要部署HA版本的HDFS组件对应的节点的情况下,在如图2所示的界面中选中HDFS组件,使得HDFS组件处于被选中状态,再勾选“HA”对应的勾选框,然后再点击“应用”按钮(也即是节点创 建控件),即可在临时资源池区域显示3个nn节点4个dn节点。
又例如,仍以如图2所示的部署界面为例,在要部署HA版本的HDFS组件、HA版本的YARN组件、Hive组件和Clickhouse组件对应的节点的情况下,在如图2所示的界面中先选中HDFS组件,使得HDFS组件处于被选中状态,再勾选“HA”对应的勾选框,然后再点击“应用”按钮(也即是节点创建控件),即可在临时资源池区域显示3个nn节点和4个dn节点;然后选中YARN组件,使得YARN组件处于被选中状态,再勾选“HA”对应的勾选框,然后再点击“应用”按钮,即可在临时资源池区域显示3个rm节点和1个nm节点;再选中Hive组件,使得Hive组件处于被选中状态,再点击“应用”按钮,即可在临时资源池区域显示1个hv节点;最后选中Clickhouse组件,使得Clickhouse组件处于被选中状态,再点击“应用”按钮,即可在临时资源池区域显示1个ch节点。
需要说明的是,在部署了各个大数据组件对应的节点后,即可基于各种类型的节点的预估占用内存,来确定各个服务器的内存占用情况。需求强调的是,如果将HDFS组件和YARN组件部署为HA版本,虽然前端部署界面并未展示出Zookeeper(简称zk)节点,但在实际的集群部署过程中zk节点是需要部署的,因而在确定预估占用内存时,需要增加3个zk节点的占用内存。
其中,各种类型的节点的预估占用内存可以参见如下表2:
表2
Figure PCTCN2022106091-appb-000002
计算设备即可根据如表2中所示的数据以及用户所部署的各个节点的个数,实现预估占用内存的确定。
上述过程是以用户根据自己的实际需求选择所需的大数据组件,并根据自己的需求来进行节点的创建,在更多可能的实现方式中,本发明还可以提供配置推荐功能,当用户无法确定需要部署的大数据组件的类型和需要部署的节点个数时,可以通过本发明所提供的配置推荐功能,来获取推荐的较优配置方式,该配置方式也即包括部署 的大数据组件的类型和需要部署的节点个数。
在一些实施例中,可以通过如下过程实现配置推荐过程:
步骤一、显示部署指示界面。
步骤二、通过部署指示界面获取待部署的大数据组件类型、组件版本和目标数据,目标数据用于指示数据处理需求所需的每秒存储数据条数。
其中,部署指示界面可以提供多个可部署的大数据组件选项、候选组件版本和一个数据获取控件。可选地,大数据组件选项可以被设置为勾选框,以便用户可以根据自己的实际需求勾选要部署的大数据组件,以便计算设备可以获取到待部署的大数据组件类型;候选组件版本的形式可以参见如上所述的节点参数设置控件,此处不再赘述;数据获取控件可以被提供为一个输入框,以便用户可以通过部署指示界面上所设置的数据获取控件,来输入指示数据处理需求所需的每秒存储数据条数的目标数据,从而使得计算设备可以通过部署指示界面获取到目标数据。
步骤三、基于待部署的大数据组件类型、组件版本、目标数据和预设参数,确定各种类型的待部署节点对应的推荐部署个数。
需要说明的是,基于待部署的大数据组件类型和组件版本,即可确定出nn节点、sn节点、rm节点、hv节点和ch节点的推荐部署数量。而对于dn节点和nm节点的推荐部署个数,可以通过如下方式确定:
为便于理解,先介绍一下预设参数的,预设参数可以为预先设置好的每秒存储数据条数阈值。基于此,在确定dn节点的推荐部署个数时,可以对目标数据和预设参数的大小进行比较,在目标数据小于等于预设参数的情况下,可以将dn节点的推荐部署个数确定为4;在目标数据大于预设参数的情况下,可以按照“dn节点的推荐部署个数=目标数据/(预设参数/4)”的公式,来确定出dn节点的推荐部署个数。而在确定nm节点的推荐部署个数时,也可以基于目标数据和预设参数的大小比较结果,来进行nm节点的推荐部署个数的确定,在目标数据小于等于预设参数的情况下,可以将nm节点的推荐部署个数确定为1;而在目标数据大于预设参数的情况下,可以将dn节点的推荐部署个数的一半确定为nm节点的推荐部署个数。
此外,计算设备还可以基于各种类型的待部署节点的推荐部署个数,确定预估占用内存。在确定预估占用内存时,需求强调的是,如果将HDFS组件和YARN组件部署为HA版本,虽然前端部署界面并未展示出Zookeeper(简称zk)节点,但在实 际的集群部署过程中zk节点是需要部署的,因而在确定预估占用内存时,需要增加3个zk节点的占用内存。
下面以部署HA版本的HDFS组件、HA版本的YARN组件、Hive组件、Clickhouse组件,目标数据为40w/s(每秒存储数据条数),预设参数为20w/s为例,确定各种类型的节点的推荐部署个数以及预估占用内存:
其中,dn节点的推荐部署个数为40/(20/4)=8,nm节点的推荐部署个数为4,为便于查看,下面以表格的形式介绍各种类型的节点的推荐部署个数,参见如下表3:
表3
Figure PCTCN2022106091-appb-000003
需要说明的是,在确定出各种类型的待部署节点的推荐部署个数后,即可在部署指示界面中显示所确定出的推荐部署个数,以便用户进行查看。
可选地,还可以在部署指示界面中显示提示信息,提示信息可以用于提示推荐部署个数仅供参考,用户可以根据实际情况,来增加或减少待部署节点的个数。
另外,用户还可以根据实际情况,来将待部署节点部署到多个物理池中。此外,需要注意的是,如果待部署节点为HA版本,则可以通过可视化界面向用户建议至少设置3个物理池,以将3个nn或3个rm分别部署到不同的服务器上,以真正实现大数据集群的高可用。
另外,需要说明的是,HA版本的HDFS组件和YARN组件需要用到Zookeeper集群,因此,若用户选择的大数据组件是HA版本的HDFS组件或YARN组件,则计算设备会在部署HDFS组件或YARN组件对应节点的服务器上默认部署一个3节点的Zookeeper集群,因而,Zookeeper组件无需展示在前端的部署界面中,也可在需要的时候完成Zookeeper组件的部署。而在HA版本的HDFS组件和YARN组件、以及其他组件中,则无需部署Zookeeper组件。
此外,需要强调的是,不同大数据组件之间是有依赖关系的,因而,在进行节点的创建时,需要相关技术人员按照组件之间的依赖关系来进行大数据组件的选择, 从而进行节点的创建。
例如,YARN组件对应节点的部署与使用需要基于HDFS组件,也即是,要先部署HDFS组件对应的节点,再部署YARN组件对应的节点,如果在未部署HDFS组件的对应节点的情况下直接部署YARN组件对应的节点,前端页面会提示错误;Hive组件对应节点的部署与使用需要基于HDFS和YARN组件,也即是,要先部署HDFS组件和YARN组件对应的节点,再部署Hive组件对应的节点,如果在未部署HDFS和YARN组件的对应节点的情况下直接部署Hive组件对应的节点,前端页面会提示错误;Clickhouse组件对应节点是独立的节点,与其他组件对应的节点之间无依赖关系。
上述推荐部署过程的流程可以参见图5,图5是根据本发明实施例示出的一种推荐部署过程的流程图,如图5所示,可以通过部署指示界面获取待部署的大数据组件类型和组件版本,如果在待部署的大数据组件仅包括Clickhouse组件的情况下,即可直接确定出ch节点的推荐部署个数为1,从而即可确定出预估占用内存;而在待部署的大数据组件还包括除Clickhouse组件外的其他组件的情况下,可以通过部署指示界面获取目标数据,从而确定目标数据是否大于预设参数,在目标数据大于预设参数的情况下可以通过上述过程中所描述的公式来进行节点推荐部署个数的确定,而在目标数据小于等于预设参数的情况下,即可采用默认推荐节点个数作为推荐部署个数;进一步地,还需要根据组件版本确定要部署的节点是否包括HA版本的节点,在包括HA版本的节点的情况下,需要增加HA版本的相关节点(如zk节点),从而再基于推荐部署个数以及增加的HA版本的相关节点来进行预估占用内存的确定,而在不需要增加HA版本的相关节点的情况下,直接基于推荐部署个数来进行预估占用内存的确定即可。
需要说明的是,无论是用户根据自己的需求自行创建待部署节点,还是根据计算设备推荐的节点个数来进行待部署节点的创建,在创建了待部署节点之后,待部署节点即可显示在临时资源池区域,用户可以将临时资源池区域的待部署节点拖拽到部署资源池区域的物理池中,以便计算设备可以通过步骤103,响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源池区域中的物理池中显示待部署节点。
其中,部署资源池区域可以包括至少一个物理池。在一些实施例中,对于步骤103,在响应于对临时资源池区域中的待部署节点的拖拽操作,在部署界面的部署资源 池区域中的物理池中显示待部署节点时,对于任一待部署节点,可以响应于对待部署节点的拖拽操作,将待部署节点显示在拖拽操作结束时所指示的物理池中。
通过为部署界面中所显示的节点提供拖拽功能,以便用户可以根据实际技术需求,来将各个待部署节点拖拽到对应的物理池中,以满足用户的定制化需求。
可选地,部署界面还可以设置有自动分配控件,以便用户可以通过自动分配控件,来将临时资源池区域中的待部署控件自动分配到部署资源池区域内的各个物理池中。
参见图6,图6是根据本发明实施例示出的一种部署界面的界面示意图,如图6所示,图6中的“自动分配”按钮即为自动分配控件,临时资源池中已添加了HDFS组件对应的节点(也即是3个nn节点和4个dn节点)、YARN组件对应的节点(也即是3个rm节点和1个nm节点)、Hive组件对应的节点(也即是1个hv节点)和Clickhouse组件对应的节点(也即是1个ch节点),通过触发“自动分配”按钮,即可将临时资源池中的这些节点自动分配到部署资源池的各个物理池中。
通过在部署界面中设置自动分配控件,使得用户无需手动拖拽即可将待部署节点分配到部署资源池的各个物理池中,从而可以简化用户操作,提高节点分配效率。
参见图7,图7是根据本发明实施例示出的一种节点分配过程的示意图,如图7所示,以临时资源池区域包括8个节点为例,可以通过自动分配或拖拽的方式,将这8个节点分配到3个物理池中,例如,将节点1、节点2和节点3分配到物理池1中,将节点4、节点5和节点6分配到物理池2中,将节点7和节点8分配到物理池3中,其中,一个物理池对应于大数据集群中的一个服务器。而每个节点在服务器上进行部署时,均会被部署为一个容器,因而,一个节点即为服务器中的一个容器。
此外,临时资源池区域中的待部署节点还支持拖拽删除功能,用户可以通过将临时资源池区域中的待部署节点拖拽到指定位置,从而实现对待部署节点的删除。仍以如图2所示的部署界面为例,界面左下角“拖动节点到此处删除”的位置即为指定位置。
需要强调的是,由于临时资源池区域中存放的都是临时节点,当离开页面或刷新页面时,页面会被重置,从而使得页面会清空临时资源池中的所有节点。
上述过程主要是有关具体的拖拽过程的介绍,需要说明的是,相较于目前HTML5页面所提供的拖拽功能,本发明通过设计一种便捷操作的拖拽组件,方便开发 者对拖拽过程进行更精准的控制,获取更全面简约的数据信息,避免过多冗余代码的产生,简化开发工作,提高代码质量与代码可读性。
在一些实施例中,基于新开发的拖拽功能实现节点拖拽的过程可以如下:
步骤一、获取目标界面上所显示的节点对应的操作对象的属性数据,操作对象为在目标界面对应的程序代码中所定义的对象。
其中,目标界面即可以为部署界面,操作对象可以为文档对象模型(Document Object Model,DOM)对象。
步骤二、将所获取到的属性数据关联到对应的节点上。
步骤三、响应于对目标界面上的节点的拖拽操作,基于拖拽操作对应的操作数据,对节点所关联的属性数据进行修改,以使节点基于修改后的属性数据,显示在拖拽操作结束的位置。
通过在获取到目标节点上所显示的节点在程序代码中对应的操作对象的属性数据之后,将所获取到的属性数据关联到对应的节点上,从而使得在对目标节点上的节点进行拖拽操作时,可以直接基于拖拽操作对应的操作数据,来对节点所关联的属性数据进行修改,而无需进行操作对象的查找,使得通过简单的操作过程,即可实现对节点的拖拽显示,以使节点基于修改后的属性数据,显示在拖拽操作结束的位置。
其中,在获取目标界面上所显示的节点对应的操作对象的属性数据时,可以基于目标界面上所显示的节点所处的位置,确定该节点对应的是哪个操作对象,从而获取该操作对象的属性数据。
在将所获取到的属性数据关联到对应的节点上时,可以将所获取到的属性数据作为一种标记信息,标记到相应的节点上,同时,也将节点标识作为一种标记信息,标记到相应的属性数据上,从而实现属性数据和节点的关联。可选地,还可以采用其他方式实现属性数据和节点的关联,本发明对具体采用哪种方式不加以限定。
通过上述过程,即可实现属性数据和节点的双向绑定,以便后续可以直接基于对节点的操作来对属性数据进行修改,而无需通过修改操作对象实现。
在一些实施例中,属性数据可以包括位置数据,因而,在响应于对目标界面上的节点的拖拽操作,基于拖拽操作对应的操作数据,对节点所关联的属性数据进行修改时,可以响应于对目标界面上的节点的拖拽操作,基于拖拽操作结束的位置所对应 的位置数据,对节点所关联的属性数据中的位置数据进行修改。
可选地,属性数据还可以包括其他类型的数据,本发明对属性数据的具体类型不加以限定。然而,无论属性数据是何种类型的数据,均可以通过如下过程实现:
步骤1、响应于对目标界面上的节点的拖拽操作,通过属性获取指令,基于拖拽操作对应的操作数据,获取待修改的属性数据。
步骤2、通过属性设置指令,确定待修改的属性数据对应的操作对象。
步骤3、基于待修改的属性数据,对所确定出的操作对象对应的属性数据进行修改。
其中,属性获取指令可以为getAtttribute指令,属性设置指令可以为setAtttribute指令。
参见图8,图8是根据本发明实施例示出的一种拖拽功能的原理示意图,如图8所示,以操作对象为DOM对象为例,通过关联属性数据和界面中所显示的节点,从而使得在检测到对节点的拖拽操作后,可以直接基于拖拽操作同步修改属性数据,而在拖拽组件内部,可以基于被操作的节点确定对应的DOM对象,从而再对DOM对象进行修改。
需要说明的是,对于HTML5页面,原生的拖拽功能,需要通过拖拽事件获取到所拖拽的节点对应的操作对象,再通过操作对象对应的属性数据查找到业务所需的逻辑,如获取到DOM对象赋值给变量item,通过item.getAttribute指令获取DOM对象的属性数据,再通过item.setAttribute指令对属性数据进行修改,才能实现按照拖拽操作显示节点。
而本发明通过预先关联属性数据和节点,从而可以在节点被拖拽时直接对属性数据进行修改,而将查找操作对象和修改操作对象的过程封装到拖拽组件内部,从而使得用户仅需关注对数据的修改即可,从而使得在实际代码实现时仅需将待修改的属性数据赋值给i,通过i.isDeploy指令即可实现属性数据的修改,在这里看似只简化了一个单词,但是在实际开发中,对于HTML5页面原生的拖拽功能,大量的属性数据写在代码里,来对操作对象进行操作显得十分复杂,且易读性差,而操作数据可以使代码一目了然,只关注数据的改动,而无需关注操作对象的修改。
需要说明的是,上述过程是以直接对节点进行移动拖拽为例来进行说明的,也即是,在节点被拖拽后,即从节点被拖拽前所处的服务器中删除所显示的节点,而仅 在被拖拽后所处的服务器中显示该节点,以达到一种移动拖拽的效果为例来进行说明的,在更多可能的实现方式中,还可以被拖拽前节点所处的服务器中、以及被拖拽后节点所处的服务器中均显示该节点,以达到一种复制拖拽的效果。
则,在一些实施例中,在节点满足设定条件的情况下,可以响应于对目标界面上的节点的拖拽操作,为节点生成临时变量,通过临时变量,对节点修改前的属性数据进行存储。
例如,通过临时变量,可以对节点被拖拽前的位置数据进行存储。其中,位置数据可以为节点所处物理池对应的服务器标识以及节点在服务器中的索引值。
需要说明的是,通过上述过程,即可使得被拖拽前节点所处的服务器中、以及被拖拽后节点所处的服务器中均显示该节点,而为了便于用户区分,还可以将这两个节点显示为不同的样式。
在一些实施例中,属性数据还可以包括样式数据,样式数据用于指示节点的显示样式,如节点的边框样式(实线或虚线)、节点的颜色等。
在一种可能的实现方式中,响应于对目标界面上的节点的拖拽操作,将临时变量中所存储的属性数据所包括的样式数据修改为第一样式数据,将节点所关联的属性数据所包括的样式数据修改为第二样式数据,从而使得被拖拽前的节点以及基于节点的拖拽操作复制出来的节点可以被显示为不同的样式,便于用户区分。
参见图9,图9是根据本发明实施例示出的一种修改样式数据的原理示意图,如图9所示,以操作对象为DOM对象为例,通过预先将样式数据作为属性数据,与界面中所显示的节点进行关联,从而使得在检测到对节点的拖拽操作后,可以直接基于对节点的样式数据进行修改,而在拖拽组件内部,可以基于被操作的节点确定对应的DOM对象,从而再对DOM对象的样式数据进行修改。
在更多可能的实现方式中,属性数据还可以包括行为数据,行为数据用于指示节点在被拖拽时是否需要显示提示信息。
因此,可以响应于对目标界面上的节点的拖拽操作,获取被拖拽的节点所关联的属性数据;在属性数据所包括的行为数据指示节点在被拖拽时是否需要显示提示信息的情况下,显示提示信息,提示信息用于基于本次拖拽操作进行提示。其中,提示信息可以为弹框提示,消息提醒等。
通过为节点关联属性数据的方式,可以更好地控制操作过程,例如,只需判断 标记属性数据中的某些值,就可以在拖拽过程中触发其他行为,如弹框提示,消息提醒等,无需多次获取节点,简化操作逻辑。
此外,通过上述方法,可以支持不同服务器中多种节点的拖拽,从而可以提高拖拽过程的灵活性。
需要说明的是,在将临时资源池区域中的待部署节点均拖拽到部署资源池区域后,即可通过步骤104,响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上部署待部署节点对应的容器。
在一些实施例中,对于步骤104,在响应于在部署界面中的开始部署操作,按照待部署节点所处的物理池,在物理池对应的服务器上部署待部署节点对应的容器时,可以包括如下步骤:
步骤1041、响应于开始部署操作,基于待部署节点所属的大数据组件的组件类型,确定目标插件。
其中,目标插件可以为开发人员根据统一开发规范开发的二进制包,开发人员可以在目标插件开发完成后,将目标插件上传至大数据集群的初始化服务器中,以便可以初始化服务器可以将目标插件存储至大数据集群中的设定位置处。其中,设定位置可以为初始服务器中的plugins文件目录下。
需要说明的是,不同类型的组件可以对应于不同的目标插件,但每个插件的开发过程都需要遵循插件开发规范,统一开发start方法用来启动服务、restart方法来重启服务、decommission方法来退役节点的功能。
步骤1042、通过目标插件,启动位于物理池对应的服务器上的目标接口。
步骤1043、通过目标接口,在物理池对应的服务器上部署待部署节点对应的容器。
在一种可能的实现方式中,上述步骤1043可以通过如下过程实现:
步骤1043-1、通过目标插件读取第一配置文件,以从第一配置文件中获取目标安装环境。
其中,第一配置文件可以为app.json配置文件,可选地,第一配置文件内可以包括镜像名、版本号、Docker网络名称、储存数据的MYSQL信息、RabbitMQ信息等。
在一种可能的实现方式中,可以基于第一配置文件所包括的Docker网络名称和镜像名,确定目标安装环境。其中,目标安装环境可以为HDFS组件对应的Docker网络环境、YARN组件对应的Docker网络环境、Hive组件对应的Docker网络环境、Clickhouse组件对应的Docker网络环境,等等。
步骤1043-2、通过目标接口,对服务器的目标安装环境的配置文件进行修改,以在物理池对应的服务器上部署待部署节点对应的容器。
需要说明的是,不同的目标安装环境对应的配置文件不同,因而,在部署不同组件对应的容器时,需要修改的配置文件也不相同。部署HDFS组件对应的容器时需要修改core-site.xml和hdfs-site.xml这两个配置文件,部署YARN组件对应的容器时需要修改配置文件yarn-site.xml,部署Clickhouse组件对应的容器时需要修改config.xml和users.xml这两个配置文件,同时,由于上述部署过程均是在大数据集群中进行操作,因而,还需要修改集群启动时需要修改的配置文件workers,等等,但是,需要强调的是,这些繁杂的部署过程都无需手动修改,程序可自动生成参数、自动完成配置文件的修改。
为便于查看,以图的形式来展示各个组件在进行不同操作时需要修改的配置文件,参见图10,图10展示出了多种大数据组件在进行不同操作时需要修改的配置文件。
通过上述插件化开发过程,可以实现插件热插拔和插件统一开发,降低集群搭建技术门槛。而且,对于开发人员来说,插件根据统一开发规范,拥有统一模板的方法、配置、功能,不仅可以增加可读性,还可以减少插件间的冲突问题;对于用户来说,插件被规范封装,用户无需理解后端插件的执行方式,即可操控插件,从而可以减少出现问题的隐患。
通过插件,即可将服务部署在容器中,实现大数据集群轻量化,解决资源浪费的问题,插件通过启动提前打好的镜像包,即可实现环境搭建,提高搭建效率。而且,使用Docker和插件,后期可以很方便地对服务进行移动,从而可以减少开发和维护成本。
完成配置文件的修改后,程序会自动完成拷贝、容器间统一配置、启动服务等操作,以完成容器的部署。例如,在修改完配置文件后,即可通过如下过程实现容器的部署。
步骤一、基于待部署节点以及待部署节点所处的物理池,生成第一请求报文,第一请求报文用于指示在物理池对应的服务器上部署待部署节点对应的容器。
在一种可能的实现方式中,可以基于待部署节点以及待部署节点所处的物理池生成JSON格式的请求报文数据,从而将所生成的JSON格式的请求报文数据作为第一请求报文,以便后续可以基于第一请求报文实现容器的部署。
需要说明的是,第一请求报文中可以携带n个待部署节点对应的信息,例如,可以携带每个待部署节点对应的待创建容器的容器名称(containerName)和容器类型(containerType)。
可选地,在生成第一请求报文后,可以将第一请求报文存储至第一消息队列中,以便后续可以从第一消息队列中获取第一请求报文,来执行基于第一请求报文和物理池对应的服务器上的已部署容器以及已部署容器中的待删除容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器的步骤。
通过采用消息队列来对请求报文进行存储,可以实现显示侧对用户请求的同步处理以及后台的异步处理,以保证用户可以在当前请求报文未被真正处理完成的情况下,用户可以继续通过部署界面进行操作,而不会影响后台的处理过程,以保证后续的用户请求可以得到及时的响应。
在更多可能的实现方式中,在生成第一请求报文后,可以对第一请求报文的数据格式进行校验,和/或,按照预设部署规则,对第一请求报文所携带的部署数据进行校验,从而在校验通过的情况下,再将第一请求报文存储至第一消息队列中。
通过对第一请求报文进行校验,可以保证第一请求报文的合法性和有效性,从而可以保证处理过程的安全性。
步骤二、基于第一请求报文和物理池对应的服务器上的已部署容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器,部署操作类型包括新增节点、移动节点和不改变节点。
在一种可能的实现方式中,可以对第一请求报文所携带的待部署节点对应的待创建容器,与服务器上的已部署容器进行比较,以确定哪些待部署节点是需要增加的节点(即需要在服务器中为其创建对应的容器)、哪些待部署节点是需要移动的节点(即需要将其对应的容器从一个服务器移动到另一个服务器中)、哪些待部署节点是无需改变的节点,并且,可以确定出哪些已部署容器是需要删除的容器(即需要在服 务器中删除对应的已部署容器)。
下面以几个具体示例为例,来对步骤二所描述的过程进行介绍。为便于理解,首先对不同containerType对应的容器类型进行介绍:
containerType=1→NameNode;
containerType=1→DataNode;
containerType=3→ResourceManager;
containerType=4→NodeManager;
containerType=5→Secondary NameNode;
containerType=6→Clickhouse;
containerType=7→Hive;
containerType=8→Zookeeper;
containerType=11→HANameNode;
containerType=12→HADataNode;
containerType=13→HAResourceManager;
containerType=14→HANodeManager。
结合上述内容,接下来对几个示例分别进行介绍:
示例一中,共两个服务器(服务器标识分别为10.10.86.214和10.10.86.215),其中,每个服务器下需要部署的容器可以参见图11,图11是根据本发明实施例示出的一种部署数据示意图,如图11所示,服务器10.10.86.214下共有3个需要部署的容器,其中,containerType为1、5两个容器的containerName为空,表示NameNode、SecondaryNameNode两个节点对应的容器为需要进行部署的新容器,从而可以确定NameNode、SecondaryNameNode两个节点的部署操作类型为新增节点;containerType为6的containerName不为空,表示ClickHouse节点对应的容器之前被部署过;而服务器10.10.86.215下共有4个需要部署的容器,containerType均为2,表示需要部署4个DataNode节点对应的新容器。
当示例一的部署操作完成后,会获取到如示例二所示的部署结果数据,而通过比较示例一和示例二的部署结果,即可确定示例二中containerType为6的容器,在本 次部署中未出现,可以判断出本次部署时该组件已被删除。
示例三中,共两个服务器(服务器标识分别为10.10.86.214和10.10.86.215),其中,每个服务器下需要部署的容器可以参见图12,图12是根据本发明实施例示出的一种部署数据示意图,如图12所示,服务器10.10.86.214下containerType为1、5两个容器之前部署过,本次部署的IP地址未改变,表示NameNode、SecondaryNameNode两个节点对应的容器本次无需重新进行部署,从而可以确定NameNode、SecondaryNameNode两个节点的部署操作类型为未改变节点;containerType为2容器之前部署过,本次部署的IP地址由10.10.86.215变更为10.10.86.214,说明部署容器由服务器10.10.86.214移动到服务器10.10.86.215,从而可以确定DataNode节点的部署操作类型为移动节点;containerType为7的容器的containerName为空,表示Hive节点对应的容器为需要进行部署的新容器,从而可以确定Hive节点的部署操作类型为新增节点;而服务器10.10.86.215下共有3个需要部署的容器,containerType均为2,且本次部署的IP地址未改变,表示4个DataNode节点的部署操作类型均为未改变节点。
当示例三的部署操作完成后,即可获取到如示例四所示的部署结果数据。
步骤三、按照待部署节点对应的部署操作类型以及已部署容器中的待删除容器,在物理池对应的服务器上进行容器部署。
在一种可能的实现方式中,在部署操作类型为新增节点的情况下,调用待部署节点的节点类型对应的组件插件,在物理池对应的服务器上创建待部署节点对应的容器。
其中,节点类型与组件插件之间的对应关系都是预先设置好的,基于待部署节点的节点类型,即可找到对应的组件插件,从而可以通过对应的组件插件来进行容器的创建。
在另一种可能的实现方式中,在部署操作类型为移动节点的情况下,从已部署待部署节点对应容器的服务器上删除待部署节点对应的已部署容器,在物理池对应的服务器中创建待部署节点对应的容器,并将已部署容器中的数据拷贝至所创建的容器中。
需要说明的是,由于各个容器中的数据均会被持久化到诸如硬盘的存储设备中,因而,在将已部署容器中的数据拷贝至所创建的容器中时,可以从硬盘中获取已 部署容器中的数据,从而将获取到的数据存储至所创建的容器中,以实现数据的拷贝。
在另一种可能的实现方式中,在部署操作类型为不改变节点的情况下,无需在物理池对应的服务器上进行操作。
在另一种可能的实现方式中,在已部署容器中存在待删除容器的情况下,从物理池对应的服务器上删除待删除容器。
可选地,计算设备所关联的数据库中还可以包括第一部署表和第二部署表,第一部署表可以用于对每次容器部署过程进行记录,第二部署表可以用于对每次容器部署过程的具体部署内容进行记录。
也即是,在一些实施例中,在基于第一请求报文和物理池对应的服务器上的已部署容器,确定待部署节点对应的部署操作类型以及已部署容器中的待删除容器之后,本发明所提供的大数据集群部署方法还可以包括下述过程:
响应于第一请求报文,在第一部署表中生成操作记录,操作记录用于记录本次部署操作;
响应于第一请求报文,在第二部署表中生成待部署节点对应的容器部署记录,容器部署记录用于记录待部署节点对应的部署操作。
其中,可以在操作记录中记录本次操作的部署状态,在容器部署记录中记录待部署节点对应的容器的部署状态。可选地,部署状态可以包括未部署、已部署和部署错误等。计算设备可以基于各个容器的部署情况,对容器部署记录中的部署状态进行更新,进而在每个容器均已部署完成的情况下,将操作记录中的部署状态更新为部署完成。
可选地,可以基于容器部署记录中所记录的部署状态,将部署资源池中的各个节点显示为不同的颜色,便于用户查看。其中,未部署状态的节点可以被显示为灰色,已部署状态的节点可以被显示为绿色)、部署错误状态的节点可以被显示为红色。
需要说明的是,计算设备可以每隔预设时长查询一次容器的部署状态,从而基于查询到的部署状态对部署资源池中各个节点的显示方式进行更新。其中,预设时长可以为10秒,可选地,预设时长还可以为其他时长,本发明对预设时长的具体取值不加以限定。
此外,还可以在容器部署记录中记录失败原因,以便相关技术人员可以进行问 题排查。
上述实施例中主要介绍了增加物理池以及在物理池中部署节点对应的容器的过程,可选地,部署资源池区域还可以设置有删除物理池控件、置顶物理池控件等,以便为用户提供更加多样化的功能。
在部署资源池区域包括删除物理池控件的情况下,相关技术人员可以通过删除物理池控件来进行物理池的删除。
在一种可能的实现方式中,一个物理池对应于一个删除物理池控件,相关技术人员可以触发任一物理池对应的删除物理池控件,计算设备即可响应于对任一删除物理池控件的触发操作,不再在部署资源池区域显示被触发的删除物理池控件对应的物理池。
仍以图4所示的部署界面为例,如图4所示的部署界面的部署资源池区域中所显示的每个物理池的右上角都设置有一个“×”按钮,该按钮即为删除物理池控件,用户即可通过触发任一“×”按钮,来实现对对应物理池的删除。
通过在部署界面设置删除物理池控件,以便用户可以根据自己的实际需求实现对任一物理池的删除,以将该物理池对应的服务器从大数据集群中剔除,从而可以满足用户的技术需求,而且,操作简便,用户仅需一个简单的控件触发操作即可完成对大数据集群的修改,大大提高了操作效率。
需要说明的是,在对物理池进行删除时,需要将当前部署节点中待删除的物理池中所显示的节点移除,保证该物理池在部署界面中显示为空时,才可以通过删除物理池控件来对物理池进行删除。另外,需要注意的是,正在初始化中的物理池无法删除。
另外,需要说明的是,在对物理池进行删除时,计算设备可以响应于对任一删除物理池控件的触发操作,从删除物理池控件对应的物理池所对应的服务器中删除已部署的容器。
在任一删除物理池控件被触发的情况下,计算设备可以通过第二部署表查询被触发的删除物理池控件对应的服务器中所包括的已部署容器,从而调用Docker API,从相应的服务器总删除已部署容器的接口,以完成已部署容器删除。
而在部署资源池区域包括置顶物理池控件的情况下,相关技术人员可以通过置顶物理池控件来改变物理池在部署资源池中的显示位置。
在一种可能的实现方式中,一个物理池对应于一个置顶物理池控件,相关技术人员可以触发任一物理池对应的置顶物理池控件,计算设备即可响应于对任一置顶物理池控件的触发操作,将置顶物理池控件对应的物理池显示在部署资源池区域中的第一目标位置处。其中,第一目标位置可以为部署资源池区域最左侧的位置。
仍以如图4所示的部署界面为例,如图4所示的部署界面的部署资源池区域中所显示的每个物理池的右上角都设置有一个“↑”按钮,该按钮即为置顶除物理池控件,用户即可通过触发任一“↑”按钮,来实现对对应物理池的显示位置的改变。
需要说明的是,当部署资源池中物理池添加到一定数量时,查找物理池时会出现一定困难,本发明通过增加置顶物理池功能,通过触发任一物理池对应的置顶物理池控件,该物理池就会被移动到部署资源池的最左侧的位置,也即是,部署资源池的第一位,其余物理池顺序右移,从而使得用户可以更加方便地对移动到第一位的物理池中的节点进行操作,从而可以提高用户体验。
另外,本发明还可以在部署界面中提供恢复设置控件,以便在使用大数据平台的用户在遇到一些问题想要将大数据平台恢复到初始状态重新进行部署时,可以通过恢复设置控件,来将大数据平台恢复到初始状态。
在一种可能的实现方式中,计算设备可以响应于对恢复设置控件的触发操作,生成第三请求报文,第三请求报文用于请求删除已部署的服务器和容器;基于第三请求报文,从已部署的服务器中删除已部署的多个容器,执行第三预设脚本文件以使已部署的服务器脱离大数据集群。
可选地,在生成第三请求报文后,可以对第三请求报文的数据格式进行校验,从而在校验通过的情况下,再对第三请求报文进行处理,以保证第三请求报文的合法性和有效性,从而可以保证处理过程的安全性。
其中,在从已部署的服务器中删除已部署的多个容器,以及执行第三预设脚本文件以使已部署的服务器脱离大数据集群时,可以通过第二部署表查询大数据集群中的所有已部署容器,从而依次遍历容器列表,通过已部署的服务器IP、容器名称调用Docker API,来删除容器接口,完成所有已部署容器的删除;并且,通过第一部署表,查询到大数据集群所包括的所有服务器,依次遍历服务器列表,执行离开Docker Swarm集群脚本,完成所有服务器脱离集群的操作。
Optionally, since the factory-reset operation is irreversible, after detecting the trigger operation on the restore-settings control, the computing device may display prompt information multiple times to confirm with the user whether the factory reset is really intended, and generate the third request message only after receiving an instruction confirming the factory reset. The prompt information may be of various types, such as text prompts.
The processing logic after the restore-settings control is triggered can be seen in FIG. 13, which is a schematic flowchart of a factory-reset process according to an embodiment of the present invention. As shown in FIG. 13, after the user triggers the factory-reset process, multiple text prompts can be used to confirm whether the user really wants to perform the factory reset. After the user confirms, the back end of the computing device (that is, the server side) verifies the received request message; when the verification passes, all containers are deleted by traversing all servers, all servers are then made to leave the big data cluster, and in addition all stored data in the big data cluster is deleted so that the cluster returns to its initial state. Correspondingly, the deployment interface also returns to the initial system state.
The above embodiments are mainly described around several big data components commonly used in big data clusters. In more possible implementations, the present invention can also support the deployment of other types of containers.
For example, the method provided by the present invention can also support the container deployment process of components such as the distributed log system (Kafka) component and the Remote Dictionary Server (Redis) component. Taking the Redis component as an example, a deployment scheme for a Redis cluster is given below.
Redis is an open-source, network-capable, memory-based and persistable log-type key-value database written in ANSI C, providing APIs in multiple languages. A single Redis component is unstable: when the Redis service goes down, the service becomes unavailable; moreover, the read/write capability of a single Redis component is limited. Using a Redis cluster can strengthen Redis's read/write capability, and when one server goes down, the other servers can still work normally without affecting use.
Therefore, developers can prepare the base image file for deploying the Redis cluster in advance, and develop and deploy the Redis plugin, so that the Redis cluster can be deployed through the method provided by the present invention.
In one possible implementation, the Redis component can be displayed on the deployment interface. The user can select Redis and trigger the node creation control; in response to the user's trigger operation on the node creation control, the computing device displays six Redis nodes to be deployed in the temporary resource pool area. The Redis nodes are then dragged into at least one physical pool of the deployment resource pool, and the start-deployment control is triggered. In response to the trigger operation on the start-deployment control, the computing device generates JSON request message data, verifies the data format of the generated request message, and verifies whether the number of Redis nodes to be deployed is six (owing to the nature of the Redis component, deployment is allowed only when the number of Redis nodes to be deployed is six). When the verification passes, Redis containers are created on the corresponding servers; for the specific process, refer to the above embodiments, which will not be repeated here.
It should be noted that a redis.conf configuration file can be generated for each Redis node according to a configuration file template. Referring to FIG. 14, which is a schematic diagram of a redis.conf configuration file according to an embodiment of the present invention, the six Redis nodes correspond to six redis.conf configuration files, in which the value range of {{.Port}} (that is, the port number) is 6379-6384. When the Redis cluster starts, each Redis node loads its own configuration file.
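Since the {{.Port}} placeholder in FIG. 14 matches Go text/template syntax, the following sketch shows one plausible way of rendering the six configuration files; the template body itself is an assumption for illustration and is not the exact template of the disclosed system.
```go
package main

import (
	"fmt"
	"os"
	"text/template"
)

// An assumed template body; only the {{.Port}} placeholder is taken from FIG. 14.
const redisConf = `port {{.Port}}
cluster-enabled yes
cluster-config-file nodes-{{.Port}}.conf
cluster-node-timeout 5000
appendonly yes
`

func main() {
	tmpl := template.Must(template.New("redis.conf").Parse(redisConf))
	// One rendered file per Redis node, ports 6379 through 6384.
	for port := 6379; port <= 6384; port++ {
		f, err := os.Create(fmt.Sprintf("redis-%d.conf", port))
		if err != nil {
			panic(err)
		}
		if err := tmpl.Execute(f, struct{ Port int }{port}); err != nil {
			panic(err)
		}
		f.Close()
	}
}
```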
It should also be noted that, after the cluster deployment is completed, the Redis cluster can be used through the IP address of the server together with the port number in any one of the above configuration files.
The construction process of the above Redis cluster can be seen in FIG. 15, which is a schematic diagram of the construction process of a Redis cluster according to an embodiment of the present invention. As shown in FIG. 15, developers can develop the Redis plugin in advance, that is, build the Redis base image, then abstract the configuration file, and develop the plugin functions (including reading parameter items, creating the configuration file, copying the configuration file to the remote target machine, starting the container on the target machine, and so on), and then compile, install, and load the plugin so that Redis containers can be deployed through the loaded Redis plugin. Specifically, nodes are deployed first, the deployed Redis nodes are then dragged into the deployment resource pool, and the start-deployment operation is triggered. When the deployment rules are satisfied, the server processes according to the deployment logic and calls the Redis plugin with the passed parameters to build Redis; otherwise, an error message is returned so that the Redis nodes can be redeployed.
The process shown in FIG. 15 is only a flow description; for the specific implementation, refer to the above embodiments, which will not be repeated here.
The above process mainly introduces some content about the construction of a big data cluster. After the big data cluster is built and the corresponding containers are deployed in it, the built cluster can provide services to users. Different containers of the big data cluster communicate with each other through an Overlay network to jointly provide services to users.
Referring to FIG. 16, which is a flowchart of a data processing method based on a big data cluster according to an embodiment of the present invention, as shown in FIG. 16, the method includes:
Step 1601: acquiring a data processing request.
Step 1602: sending the data processing request to a target container through the Overlay network, where the target container is used to implement the data processing process based on the data processing request, the container is created on the server according to a drag operation on a node to be deployed in the deployment interface, and the container is used to provide big data cluster services.
The Overlay network guarantees communication between the containers of the big data cluster, so that when a data processing request is acquired, it can be sent to the target container through the Overlay network, and the target container can implement the data processing process based on the request to satisfy the user's data processing needs.
In some embodiments, for step 1602, when sending the data processing request to the target container through the Overlay network, at least one target container can be determined based on the data processing request, and the data processing request is then sent to the at least one target container through the Overlay network.
It should be noted that, when the number of target containers is greater than or equal to 2, the at least one target container includes at least a first target container and a second target container. In this case, when sending the data processing request to the at least one target container through the Overlay network, the data processing request can be sent to the first target container through the Overlay network, and the first target container communicates with the second target container through the Overlay network to complete the response to the data processing request.
Taking the process of satisfying the data communication need between the first target container and the second target container through the Overlay network as an example, the first target container can encapsulate the data to be transmitted to obtain a first data packet, and then encapsulate the first data packet again to obtain a second data packet, where the destination IP address of the first data packet is the IP address of the second target container and its source IP address is the IP address of the first target container, while the destination IP address of the second data packet is the IP address of the server on which the second target container resides and its source IP address is the IP address of the server on which the first target container resides. The first target container then sends the second data packet to the second target container through the Overlay network, so that the second target container can unpack the two layers of encapsulation of the second data packet to obtain the data that actually needs to be processed.
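Purely as an illustration of the two-layer encapsulation just described, the following Go sketch models the inner (container-level) and outer (server-level) packets; a real overlay network (for example, Docker's VXLAN-based driver) performs this encapsulation in the kernel rather than in application code, and the struct layout here is an assumption.
```go
package main

import "fmt"

type Packet struct {
	SrcIP, DstIP string
	Payload      []byte
}

// encapsulate wraps the container-level packet (the first data packet)
// inside a server-level packet (the second data packet).
func encapsulate(data []byte, srcCtr, dstCtr, srcHost, dstHost string) Packet {
	inner := Packet{SrcIP: srcCtr, DstIP: dstCtr, Payload: data}
	return Packet{SrcIP: srcHost, DstIP: dstHost,
		Payload: []byte(fmt.Sprintf("%+v", inner))}
}

func main() {
	outer := encapsulate([]byte("query"),
		"10.0.0.2", "10.0.0.3", // assumed container IPs on the overlay network
		"10.10.86.214", "10.10.86.215") // host IPs that carry the tunnel
	fmt.Printf("outer: %s -> %s\n", outer.SrcIP, outer.DstIP)
}
```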
The data processing request may be a data storage request, a data acquisition request, or a data deletion request. Optionally, the data processing request may also include other types of requests; the present invention does not limit the specific type of the data processing request.
An embodiment of the present invention further provides a big data cluster deployment system and a corresponding data processing system. The system may include at least a visual operation module and an architecture service module. The visual operation module is used to provide the user with a convenient big data cluster deployment operation interface, through which servers can be added and deleted, the nodes included in big data components can be deployed, moved, and deleted, the cluster can be restored to factory settings, and so on. The architecture service module can be used to provide functions such as API interface services, data rule verification, component deployment logic processing, message processing, plugin invocation, and database persistence.
Optionally, the system may further include a message module, a database module, a network module, and a big data component plugin module. The message module is a RabbitMQ-based message queue that produces and consumes messages when called by the architecture service module; it is used in time-consuming scenarios (such as server initialization and component container deployment) to improve the user experience and to guarantee data consistency, stability, and reliability. The database module uses a MySQL database to store server status, component deployment status, and the relationship between component deployments and servers. The network module is a Docker-based overlay network used when the big data service containers start, guaranteeing cross-server communication between containers. The big data component plugin module is used to develop a pluggable startup plugin for each big data component; by starting the plugin with the server IP combined with the plugin parameters, the specified component container can be started on the specified server.
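Since the message module is stated to be a RabbitMQ queue, the following Go sketch illustrates one plausible produce-and-consume round trip using the maintained RabbitMQ client library; the queue name, connection URL, and message payload are assumptions for illustration only.
```go
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	q, err := ch.QueueDeclare("deploy_requests", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Producer side: store a deployment request message in the queue.
	err = ch.Publish("", q.Name, false, false, amqp.Publishing{
		ContentType: "application/json",
		Body:        []byte(`{"op":"deploy","node":"NameNode"}`),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Consumer side: the deployment logic listens on the same queue.
	msgs, err := ch.Consume(q.Name, "", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("got message: %s", (<-msgs).Body)
}
```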
The above is only a brief description of the functions of each module; the functions of each module are described in detail below.
The visual operation module is used to display the deployment interface;
the visual operation module is further used to display, in response to a node creation operation on the deployment interface, the nodes to be deployed in the temporary resource pool area of the deployment interface, where a node is a service included in a big data component and used to provide a data management function;
the visual operation module is further used to display, in response to a drag operation on a node to be deployed in the temporary resource pool area, the node to be deployed in a physical pool in the deployment resource pool area of the deployment interface;
the architecture service module is used to create, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, a container corresponding to the node to be deployed on the server corresponding to the physical pool, where the container is used to provide big data cluster services.
In some embodiments, the deployment interface includes a node creation area, and the node creation area includes a node creation control and at least one big data component;
when the visual operation module is used to display, in response to a node creation operation on the deployment interface, the nodes to be deployed in the temporary resource pool area of the deployment interface, it is used to:
when any big data component is selected, display, in response to a trigger operation on the node creation control, the nodes to be deployed corresponding to the selected big data component in the temporary resource pool area.
In some embodiments, the node creation area further includes a node parameter setting control used to set the version of the node to be deployed;
when the visual operation module is used to display, in response to a trigger operation on the node creation control, the nodes to be deployed corresponding to the selected big data component in the temporary resource pool area, it is used to:
display, in response to a trigger operation on the node creation control, the nodes to be deployed corresponding to the version set through the node parameter setting control in the temporary resource pool area.
In some embodiments, the big data components include at least an HDFS component, a YARN component, a Hive component, and a Clickhouse component.
In some embodiments, the deployment resource pool area includes at least one physical pool;
when the visual operation module is used to display, in response to a drag operation on a node to be deployed in the temporary resource pool area, the node to be deployed in a physical pool in the deployment resource pool area of the deployment interface, it is used to:
for any node to be deployed, display, in response to a drag operation on that node, the node in the physical pool indicated when the drag operation ends.
In some embodiments, when the architecture service module is used to deploy, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, the container corresponding to the node to be deployed on the server corresponding to the physical pool, it is used to:
in response to the start-deployment operation, determine a target plugin based on the component type of the big data component to which the node to be deployed belongs;
start, through the target plugin, a target interface located on the server corresponding to the physical pool;
deploy, through the target interface, the container corresponding to the node to be deployed on the server corresponding to the physical pool.
In some embodiments, when the architecture service module is used to deploy, through the target interface, the container corresponding to the node to be deployed on the server corresponding to the physical pool, this includes:
reading a first configuration file through the target plugin, so as to acquire a target installation environment from the first configuration file;
modifying, through the target interface, the configuration file of the target installation environment of the server, so as to deploy the container corresponding to the node to be deployed on the server corresponding to the physical pool.
In some embodiments, the target plugin is a binary package stored at a set position in the big data cluster;
the acquisition process of the target plugin includes:
acquiring the target plugin uploaded to an initial server of the big data cluster;
storing the target plugin at the set position in the big data cluster.
In some embodiments, when the architecture service module is used to deploy, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, the container corresponding to the node to be deployed on the server corresponding to the physical pool, it is used to:
in response to the start-deployment operation, generate a first request message based on the node to be deployed and the physical pool in which it is located, where the first request message is used to instruct deployment of the container corresponding to the node to be deployed on the server corresponding to the physical pool;
determine, based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers, where the deployment operation type includes a newly added node, a moved node, and an unchanged node;
perform container deployment on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
In some embodiments, the architecture service module is further used to store the first request message in a first message queue;
the system further includes:
a message module, used to acquire the first request message from the first message queue;
the architecture service module is further used to execute, when the message module acquires the first request message, the step of determining, based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
In some embodiments, when the architecture service module is used to perform container deployment on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers, it is used to:
when the deployment operation type is a newly added node, call the component plugin corresponding to the node type of the node to be deployed and create the container corresponding to the node to be deployed on the server corresponding to the physical pool;
when the deployment operation type is a moved node, delete the deployed container corresponding to the node to be deployed from the server on which it was previously deployed, create the container corresponding to the node to be deployed on the server corresponding to the physical pool, and copy the data in the deployed container to the created container;
when the deployment operation type is an unchanged node, perform no operation on the server corresponding to the physical pool;
when a container to be deleted exists among the deployed containers, delete the container to be deleted from the server corresponding to the physical pool.
In some embodiments, the architecture service module is further used to verify the data format of the first request message;
the architecture service module is further used to verify, according to preset deployment rules, the deployment data carried in the first request message.
In some embodiments, the system further includes a database module;
the database module is used to generate, in response to the first request message, an operation record in the first deployment table, where the operation record is used to record the current deployment operation;
the database module is further used to generate, in response to the first request message, a container deployment record corresponding to the node to be deployed in the second deployment table, where the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
In some embodiments, the database module is further used to record the deployment status of the current operation in the operation record;
the database module is further used to record the deployment status of the container corresponding to the node to be deployed in the container deployment record;
where the deployment status includes at least not deployed, deployed, and deployment error.
In some embodiments, the nodes to be deployed include multiple types;
the visual operation module is further used to display a deployment indication interface;
the visual operation module is further used to acquire, through the deployment indication interface, target data filled in by the user, where the target data is used to indicate the number of data entries stored per second for the containers to be deployed;
the architecture service module is further used to determine, based on the target data and preset parameters, the recommended deployment quantity corresponding to each type of node to be deployed.
In some embodiments, the deployment resource pool area includes a new-physical-pool control; the visual operation module is further used to:
display, in response to a trigger operation on the new-physical-pool control, an add-physical-pool interface that includes an identifier acquisition control and a password acquisition control;
acquire the server identifier corresponding to the physical pool to be added through the identifier acquisition control, and acquire the password to be verified through the password acquisition control;
display, when the password to be verified passes verification, the physical pool to be added in the deployment resource pool area.
In some embodiments, the architecture service module is further used to generate a second request message when the password to be verified passes verification;
the architecture service module is further used to store the second request message in a second message queue;
the system further includes:
a message module, used to acquire the second request message from the second message queue;
the architecture service module is further used to send, based on the second request message, an installation file to the server corresponding to the physical pool to be added, where the server is used to install the installation file upon receiving it so that the server joins the big data cluster.
In some embodiments, the visual operation module is further used to display first prompt information when the password to be verified fails verification or the server fails to join the big data cluster, where the first prompt information is used to indicate the reason why the server failed to join the big data cluster.
In some embodiments, the system further includes:
a database module, used to generate, when the password to be verified passes verification, a server deployment record in a third deployment table, where the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
In some embodiments, the database module is further used to record, in the server deployment record, the initialization status of the server corresponding to the physical pool to be added, where the initialization status includes at least to-be-initialized, initializing, initialization error, and initialization completed.
In some embodiments, the deployment resource pool area includes delete-physical-pool controls, one physical pool corresponding to one delete-physical-pool control;
the visual operation module is further used to no longer display, in response to a trigger operation on any delete-physical-pool control, the physical pool corresponding to that control in the deployment resource pool area.
In some embodiments, the architecture service module is further used to delete, in response to a trigger operation on any delete-physical-pool control, the deployed containers from the server corresponding to the physical pool of that control.
In some embodiments, the deployment resource pool area includes pin-physical-pool controls, one physical pool corresponding to one pin-physical-pool control;
the visual operation module is further used to display, in response to a trigger operation on any pin-physical-pool control, the physical pool corresponding to that control at the first target position in the deployment resource pool area.
In some embodiments, the visual operation module is further used to display, for any physical pool displayed in the deployment resource pool area, the server identifier of the server corresponding to the physical pool at a second target position of the physical pool, and to display the current storage usage rate, memory occupancy rate, and allocated memory occupancy rate of the server corresponding to the physical pool at a third target position of the physical pool.
In some embodiments, the deployment interface further includes a restore-settings control;
the architecture service module is further used to generate, in response to a trigger operation on the restore-settings control, a third request message used to request deletion of the deployed servers and containers;
the architecture service module is further used to delete, based on the third request message, the deployed containers from the deployed servers, and to execute a third preset script file to make the deployed servers leave the big data cluster.
In some embodiments, the big data cluster includes at least one server among which there is an initial server, and the architecture service module is further used to:
install a target runtime environment on the initial server, and configure the interface corresponding to the target runtime environment on the initial server;
create the Overlay network corresponding to the target runtime environment on the initial server, and initialize the cluster environment on the initial server;
create a big data component base image on the initial server, where the big data component base image is used to provide a construction basis for containers;
generate a target key file on the initial server.
In some embodiments, the system further includes a network module used to guarantee cross-server communication between containers.
In some embodiments, the network module is used to send, after a data processing request is acquired, the data processing request to a target container through the Overlay network, where the target container is used to implement the data processing process based on the data processing request, the container is created on the server according to a drag operation on a node to be deployed in the deployment interface, and the container is used to provide big data cluster services.
In some embodiments, when the network module is used to send the data processing request to the target container through the Overlay network, it is used to:
determine at least one target container based on the data processing request;
send the data processing request to the at least one target container through the Overlay network.
In some embodiments, when the number of target containers is greater than or equal to 2, the at least one target container includes at least a first target container and a second target container;
when the network module is used to send the data processing request to the at least one target container through the Overlay network, it is used to:
send the data processing request to the first target container through the Overlay network, where the first target container is used to communicate with the second target container through the Overlay network to complete the response to the data processing request.
In some embodiments, the data processing request is a data storage request, a data acquisition request, or a data deletion request.
In some embodiments, the system further includes a big data component plugin module used to implement the startup of containers on servers.
As the system embodiments basically correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments for related details. The system embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions in this specification. Those of ordinary skill in the art can understand and implement them without creative effort.
The system provided by the present invention is further described below by taking several actual processing procedures as examples.
For example, referring to FIG. 17, which is a flowchart of a module interaction process according to an embodiment of the present invention, FIG. 17 takes the interaction between the modules when adding a physical pool as an example. As shown in FIG. 17, the visual operation module is used to add a server to the big data cluster and to poll the initialization status of the server every 10 seconds. When a server is added to the big data cluster through the visual operation module, a JSON-format request message can be generated through the visual operation module and sent to the architecture service module. Upon receiving the request message, the architecture service module performs an SSH connection test and remotely logs in to the server; upon successful login, it sends a message to the message queue in the message module and inserts a server-initializing record into the database module. In addition, the architecture service module can also listen for messages from the message module, acquire messages from its message queue, and, based on the acquired messages, perform operations on the server such as environment installation and joining the Docker Swarm cluster, and can update the server initialization status recorded in the database module.
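A minimal sketch of the SSH connection test described above, using the golang.org/x/crypto/ssh package, is given below; the server IP, user, and password mirror what the add-physical-pool interface collects, and InsecureIgnoreHostKey is for illustration only (a real deployment would verify host keys).
```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/crypto/ssh"
)

func testSSH(serverIP, user, password string) error {
	cfg := &ssh.ClientConfig{
		User:            user,
		Auth:            []ssh.AuthMethod{ssh.Password(password)},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
		Timeout:         5 * time.Second,
	}
	client, err := ssh.Dial("tcp", serverIP+":22", cfg)
	if err != nil {
		return fmt.Errorf("ssh login to %s failed: %w", serverIP, err)
	}
	return client.Close()
}

func main() {
	// Verify the password collected by the add-physical-pool interface.
	if err := testSSH("10.10.86.215", "root", "secret"); err != nil {
		fmt.Println(err) // on failure, show why the server could not join
		return
	}
	fmt.Println("login ok; enqueue the initialization message")
}
```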
For another example, referring to FIG. 18, which is a flowchart of another module interaction process according to an embodiment of the present invention, FIG. 18 takes the interaction between the modules when deploying containers as an example. As shown in FIG. 18, the visual operation module is used to deploy the containers corresponding to big data components and to poll the container deployment status every 10 seconds. When the containers corresponding to big data components are deployed through the visual operation module, a JSON-format request message can be generated through the visual operation module and sent to the architecture service module. Upon receiving the request message, the architecture service module verifies it against the deployment rules; when the verification succeeds, it determines the deployment method (that is, the node deployment operation type), sends a message to the message queue in the message module, and adds records to the first deployment table (that is, the deployment table shown in FIG. 18) and the second deployment table (that is, the deployment detail table shown in FIG. 18) in the database module. In addition, the architecture service module can also listen for messages from the message module, acquire messages from its message queue, and, based on the acquired messages, start the plugins on the server, and can update the deployment status recorded in the database module.
For yet another example, referring to FIG. 19, which is a flowchart of another module interaction process according to an embodiment of the present invention, FIG. 19 takes the interaction between the modules during a factory reset as an example. As shown in FIG. 19, the visual operation module is used to reset the big data cluster. When there is a need to reset the big data cluster, the visual operation module can, based on the content recorded in the database module, query the list of deployed containers to delete all deployed containers, and query the server list to make all servers leave the big data cluster, thereby completing the reset of the big data cluster.
The above are only three exemplary descriptions and do not constitute a limitation on the present invention.
The present invention further provides a computing device. Referring to FIG. 20, which is a schematic structural diagram of a computing device according to an exemplary embodiment of the present invention, as shown in FIG. 20, the computing device includes a processor 2010, a memory 2020, and a network interface 2030. The memory 2020 is used to store computer instructions executable on the processor 2010; the processor 2010 is used to implement, when executing the computer instructions, the methods provided by any embodiment of the present invention (including the big data cluster deployment method and the data processing method based on the big data cluster); and the network interface 2030 is used to implement input/output functions. In more possible implementations, the computing device may further include other hardware, which is not limited by the present invention.
The present invention further provides a computer-readable storage medium, which may take many forms. In different examples, the computer-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as an optical disc or a DVD), a similar storage medium, or a combination thereof. In particular, the computer-readable medium may also be paper or another suitable medium on which a program can be printed. A computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the methods provided by any embodiment of the present invention.
The present invention further provides a computer program product, including a computer program, which, when executed by a processor, implements the methods provided by any embodiment of the present invention.
In the present invention, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. The term "multiple" means two or more, unless otherwise clearly defined.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the disclosure herein. The present invention is intended to cover any variations, uses, or adaptive changes that follow the general principles of the present invention and include common knowledge or customary technical means in the technical field not disclosed by the present invention. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present invention indicated by the claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (43)

  1. A big data cluster deployment method, characterized in that the method comprises:
    displaying a deployment interface;
    in response to a node creation operation on the deployment interface, displaying nodes to be deployed in a temporary resource pool area of the deployment interface, wherein a node is a service included in a big data component and used to provide a data management function;
    in response to a drag operation on a node to be deployed in the temporary resource pool area, displaying the node to be deployed in a physical pool in a deployment resource pool area of the deployment interface;
    in response to a start-deployment operation in the deployment interface, creating, according to the physical pool in which the node to be deployed is located, a container corresponding to the node to be deployed on a server corresponding to the physical pool, wherein the container is used to provide big data cluster services.
  2. The method according to claim 1, characterized in that the deployment interface comprises a node creation area, and the node creation area comprises a node creation control and at least one big data component;
    the displaying, in response to a node creation operation on the deployment interface, nodes to be deployed in a temporary resource pool area of the deployment interface comprises:
    when any big data component is selected, in response to a trigger operation on the node creation control, displaying the nodes to be deployed corresponding to the selected big data component in the temporary resource pool area.
  3. The method according to claim 2, characterized in that the node creation area further comprises a node parameter setting control used to set a version of the node to be deployed;
    the displaying, in response to a trigger operation on the node creation control, the nodes to be deployed corresponding to the selected big data component in the temporary resource pool area comprises:
    in response to a trigger operation on the node creation control, displaying, in the temporary resource pool area, the nodes to be deployed corresponding to the version set through the node parameter setting control.
  4. The method according to claim 2, characterized in that the big data components comprise at least an HDFS component, a YARN component, a Hive component, and a Clickhouse component.
  5. The method according to claim 1, characterized in that the deployment resource pool area comprises at least one physical pool;
    the displaying, in response to a drag operation on a node to be deployed in the temporary resource pool area, the node to be deployed in a physical pool in the deployment resource pool area of the deployment interface comprises:
    for any node to be deployed, in response to a drag operation on the node to be deployed, displaying the node to be deployed in the physical pool indicated when the drag operation ends.
  6. The method according to claim 1, characterized in that the deploying, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, the container corresponding to the node to be deployed on the server corresponding to the physical pool comprises:
    in response to the start-deployment operation, determining a target plugin based on a component type of the big data component to which the node to be deployed belongs;
    starting, through the target plugin, a target interface located on the server corresponding to the physical pool;
    deploying, through the target interface, the container corresponding to the node to be deployed on the server corresponding to the physical pool.
  7. The method according to claim 6, characterized in that the deploying, through the target interface, the container corresponding to the node to be deployed on the server corresponding to the physical pool comprises:
    reading a first configuration file through the target plugin, so as to acquire a target installation environment from the first configuration file;
    modifying, through the target interface, a configuration file of the target installation environment of the server, so as to deploy the container corresponding to the node to be deployed on the server corresponding to the physical pool.
  8. The method according to claim 6, characterized in that the target plugin is a binary package, and the target plugin is stored at a set position in the big data cluster;
    an acquisition process of the target plugin comprises:
    acquiring the target plugin uploaded to an initial server of the big data cluster;
    storing the target plugin at the set position in the big data cluster.
  9. The method according to claim 1, characterized in that the deploying, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, the container corresponding to the node to be deployed on the server corresponding to the physical pool comprises:
    generating a first request message based on the node to be deployed and the physical pool in which the node to be deployed is located, wherein the first request message is used to instruct deployment of the container corresponding to the node to be deployed on the server corresponding to the physical pool;
    determining, based on the first request message and deployed containers on the server corresponding to the physical pool, a deployment operation type corresponding to the node to be deployed and containers to be deleted among the deployed containers, wherein the deployment operation type comprises a newly added node, a moved node, and an unchanged node;
    performing container deployment on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
  10. The method according to claim 9, characterized in that, after generating, in response to the start-deployment operation, the first request message based on the node to be deployed and the physical pool in which the node to be deployed is located, the method further comprises:
    storing the first request message in a first message queue;
    acquiring the first request message from the first message queue, and executing the step of determining, based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
  11. The method according to claim 9, characterized in that the performing container deployment on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers comprises:
    when the deployment operation type is a newly added node, calling a component plugin corresponding to a node type of the node to be deployed, and creating the container corresponding to the node to be deployed on the server corresponding to the physical pool;
    when the deployment operation type is a moved node, deleting the deployed container corresponding to the node to be deployed from the server on which that container was deployed, creating the container corresponding to the node to be deployed on the server corresponding to the physical pool, and copying data in the deployed container to the created container;
    when the deployment operation type is an unchanged node, performing no operation on the server corresponding to the physical pool;
    when a container to be deleted exists among the deployed containers, deleting the container to be deleted from the server corresponding to the physical pool.
  12. The method according to claim 9, characterized in that, after generating, in response to the start-deployment operation, the first request message based on the node to be deployed and the physical pool in which the node to be deployed is located, the method further comprises at least one of the following:
    verifying a data format of the first request message;
    verifying, according to preset deployment rules, deployment data carried in the first request message.
  13. The method according to claim 9, characterized in that the method further comprises at least one of the following:
    in response to the first request message, generating an operation record in a first deployment table, wherein the operation record is used to record the current deployment operation;
    in response to the first request message, generating a container deployment record corresponding to the node to be deployed in a second deployment table, wherein the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
  14. The method according to claim 13, characterized in that the method further comprises at least one of the following:
    recording a deployment status of the current operation in the operation record;
    recording a deployment status of the container corresponding to the node to be deployed in the container deployment record;
    wherein the deployment status comprises at least not deployed, deployed, and deployment error.
  15. The method according to claim 1, characterized in that the nodes to be deployed comprise multiple types, and the method further comprises:
    displaying a deployment indication interface;
    acquiring, through the deployment indication interface, a big data component type to be deployed, a component version, and target data, wherein the target data is used to indicate the number of data entries stored per second required by a data processing demand;
    determining, based on the big data component type to be deployed, the component version, the target data, and preset parameters, a recommended deployment quantity corresponding to each type of node to be deployed.
  16. The method according to claim 1, characterized in that the deployment resource pool area comprises a new-physical-pool control, and the method further comprises:
    in response to a trigger operation on the new-physical-pool control, displaying an add-physical-pool interface, wherein the add-physical-pool interface comprises an identifier acquisition control and a password acquisition control;
    acquiring, through the identifier acquisition control, a server identifier corresponding to a physical pool to be added, and acquiring, through the password acquisition control, a password to be verified;
    when the password to be verified passes verification, displaying the physical pool to be added in the deployment resource pool area.
  17. The method according to claim 16, characterized in that, after acquiring, through the identifier acquisition control, the server identifier corresponding to the physical pool to be added, and acquiring, through the password acquisition control, the password to be verified, the method further comprises:
    when the password to be verified passes verification, generating a second request message;
    storing the second request message in a second message queue;
    acquiring the second request message from the second message queue, and sending, based on the second request message, an installation file to the server corresponding to the physical pool to be added, wherein the server is used to install the installation file upon receiving it so that the server joins the big data cluster.
  18. The method according to claim 17, characterized in that the method further comprises:
    when the password to be verified fails verification or the server fails to join the big data cluster, displaying first prompt information, wherein the first prompt information is used to indicate a reason why the server failed to join the big data cluster.
  19. The method according to claim 16, characterized in that the method further comprises:
    when the password to be verified passes verification, generating a server deployment record in a third deployment table, wherein the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
  20. The method according to claim 19, characterized in that the method further comprises:
    recording, in the server deployment record, an initialization status of the server corresponding to the physical pool to be added, wherein the initialization status comprises at least to-be-initialized, initializing, initialization error, and initialization completed.
  21. The method according to claim 16, characterized in that the method further comprises:
    when the password to be verified passes verification, sending a target key to the server corresponding to the physical pool to be added, wherein the target key is used to implement identity verification in subsequent communication.
  22. The method according to claim 1, characterized in that the deployment resource pool area comprises delete-physical-pool controls, one physical pool corresponding to one delete-physical-pool control, and the method further comprises:
    in response to a trigger operation on any delete-physical-pool control, no longer displaying the physical pool corresponding to the delete-physical-pool control in the deployment resource pool area.
  23. The method according to claim 22, characterized in that the method further comprises:
    in response to a trigger operation on any delete-physical-pool control, deleting deployed containers from the server corresponding to the physical pool of the delete-physical-pool control.
  24. The method according to claim 1, characterized in that the deployment resource pool area comprises pin-physical-pool controls, one physical pool corresponding to one pin-physical-pool control, and the method further comprises:
    in response to a trigger operation on any pin-physical-pool control, displaying the physical pool corresponding to the pin-physical-pool control at a first target position in the deployment resource pool area.
  25. The method according to claim 1, characterized in that the method further comprises:
    for any physical pool displayed in the deployment resource pool area, displaying, at a second target position of the physical pool, a server identifier of the server corresponding to the physical pool, and displaying, at a third target position of the physical pool, a current storage usage rate, a memory occupancy rate, and an allocated memory occupancy rate of the server corresponding to the physical pool.
  26. The method according to claim 1, characterized in that the deployment interface further comprises a restore-settings control, and the method further comprises:
    in response to a trigger operation on the restore-settings control, generating a third request message, wherein the third request message is used to request deletion of deployed servers and containers;
    based on the third request message, deleting deployed containers from the deployed servers, and executing a third preset script file to make the deployed servers leave the big data cluster.
  27. The method according to claim 1, characterized in that the big data cluster comprises at least one server, an initial server exists among the at least one server, and the method comprises:
    installing a target runtime environment on the initial server, and configuring, on the initial server, an interface corresponding to the target runtime environment;
    creating, on the initial server, an Overlay network corresponding to the target runtime environment, and initializing a cluster environment on the initial server;
    creating a big data component base image on the initial server, wherein the big data component base image is used to provide a construction basis for containers;
    generating a target key file on the initial server.
  28. The method according to claim 1, characterized in that different containers of the big data cluster communicate with each other through an Overlay network.
  29. A data processing method based on a big data cluster, characterized in that the method comprises:
    acquiring a data processing request;
    sending the data processing request to a target container through an Overlay network, wherein the target container is used to implement a data processing process based on the data processing request, the container is created on a server according to a drag operation on a node to be deployed in a deployment interface, and the container is used to provide big data cluster services.
  30. The method according to claim 29, characterized in that the sending the data processing request to a target container through an Overlay network comprises:
    determining at least one target container based on the data processing request;
    sending the data processing request to the at least one target container through the Overlay network.
  31. The method according to claim 30, characterized in that, when the number of target containers is greater than or equal to 2, the at least one target container comprises at least a first target container and a second target container;
    the sending the data processing request to the at least one target container through the Overlay network comprises:
    sending the data processing request to the first target container through the Overlay network, wherein the first target container is used to communicate with the second target container through the Overlay network to complete a response to the data processing request.
  32. The method according to claim 31, characterized in that the data processing request is a data storage request, a data acquisition request, or a data deletion request.
  33. A big data cluster deployment system and corresponding data processing system, characterized in that the system comprises:
    a visual operation module, used to display a deployment interface;
    the visual operation module, further used to display, in response to a node creation operation on the deployment interface, nodes to be deployed in a temporary resource pool area of the deployment interface, wherein a node is a service included in a big data component and used to provide a data management function;
    the visual operation module, further used to display, in response to a drag operation on a node to be deployed in the temporary resource pool area, the node to be deployed in a physical pool in a deployment resource pool area of the deployment interface;
    an architecture service module, used to create, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, a container corresponding to the node to be deployed on a server corresponding to the physical pool, wherein the container is used to provide big data cluster services.
  34. The system according to claim 33, characterized in that, when the architecture service module is used to deploy, in response to a start-deployment operation in the deployment interface and according to the physical pool in which the node to be deployed is located, the container corresponding to the node to be deployed on the server corresponding to the physical pool, it is used to:
    in response to the start-deployment operation, generate a first request message based on the node to be deployed and the physical pool in which the node to be deployed is located, wherein the first request message is used to instruct deployment of the container corresponding to the node to be deployed on the server corresponding to the physical pool;
    determine, based on the first request message and deployed containers on the server corresponding to the physical pool, a deployment operation type corresponding to the node to be deployed and containers to be deleted among the deployed containers, wherein the deployment operation type comprises a newly added node, a moved node, and an unchanged node;
    perform container deployment on the server corresponding to the physical pool according to the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
  35. The system according to claim 34, characterized in that the architecture service module is further used to store the first request message in a first message queue;
    the system further comprises:
    a message module, used to acquire the first request message from the first message queue;
    the architecture service module is further used to execute, when the message module acquires the first request message, the step of determining, based on the first request message and the deployed containers on the server corresponding to the physical pool, the deployment operation type corresponding to the node to be deployed and the containers to be deleted among the deployed containers.
  36. The system according to claim 35, characterized in that the system further comprises a database module;
    the database module is used to generate, in response to the first request message, an operation record in a first deployment table, wherein the operation record is used to record the current deployment operation;
    the database module is further used to generate, in response to the first request message, a container deployment record corresponding to the node to be deployed in a second deployment table, wherein the container deployment record is used to record the deployment operation corresponding to the node to be deployed.
  37. The system according to claim 33, characterized in that the deployment resource pool area comprises a new-physical-pool control; the visual operation module is further used to:
    in response to a trigger operation on the new-physical-pool control, display an add-physical-pool interface, wherein the add-physical-pool interface comprises an identifier acquisition control and a password acquisition control;
    acquire, through the identifier acquisition control, a server identifier corresponding to a physical pool to be added, and acquire, through the password acquisition control, a password to be verified;
    when the password to be verified passes verification, display the physical pool to be added in the deployment resource pool area.
  38. The system according to claim 37, characterized in that the architecture service module is further used to generate a second request message when the password to be verified passes verification;
    the architecture service module is further used to store the second request message in a second message queue;
    the system further comprises:
    a message module, used to acquire the second request message from the second message queue;
    the architecture service module is further used to send, based on the second request message, an installation file to the server corresponding to the physical pool to be added, wherein the server is used to install the installation file upon receiving it so that the server joins the big data cluster.
  39. The system according to claim 37, characterized in that the system further comprises:
    a database module, used to generate, when the password to be verified passes verification, a server deployment record in a third deployment table, wherein the server deployment record is used to record the deployment operation corresponding to the physical pool to be added.
  40. The system according to claim 33, characterized in that the system further comprises a network module used to guarantee cross-server communication between containers.
  41. The system according to claim 33, characterized in that the system further comprises a big data component plugin module used to implement the startup of containers on servers.
  42. A computing device, characterized in that the computing device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein, when the processor executes the computer program, the operations performed by the big data cluster deployment method according to any one of claims 1 to 28 are implemented, or, when the processor executes the computer program, the operations performed by the data processing method based on a big data cluster according to any one of claims 29 to 32 are implemented.
  43. A computer-readable storage medium, characterized in that a program is stored on the computer-readable storage medium, wherein, when the program is executed by a processor, the operations performed by the big data cluster deployment method according to any one of claims 1 to 28 are implemented, or, when the program is executed by a processor, the operations performed by the data processing method based on a big data cluster according to any one of claims 29 to 32 are implemented.
PCT/CN2022/106091 2022-07-15 2022-07-15 Big data cluster deployment method and data processing method based on big data cluster WO2024011627A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202280002227.2A CN117716335A (zh) 2022-07-15 2022-07-15 Big data cluster deployment method and data processing method based on big data cluster
PCT/CN2022/106091 WO2024011627A1 (zh) 2022-07-15 2022-07-15 Big data cluster deployment method and data processing method based on big data cluster
CN202380009266.XA CN117716338A (zh) 2022-07-15 2023-05-31 Big data cluster deployment method, apparatus, device, and medium
PCT/CN2023/097480 WO2024012082A1 (zh) 2022-07-15 2023-05-31 Big data cluster deployment method, apparatus, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/106091 WO2024011627A1 (zh) 2022-07-15 2022-07-15 Big data cluster deployment method and data processing method based on big data cluster

Publications (1)

Publication Number Publication Date
WO2024011627A1 true WO2024011627A1 (zh) 2024-01-18

Family

ID=89535335

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2022/106091 WO2024011627A1 (zh) 2022-07-15 2022-07-15 Big data cluster deployment method and data processing method based on big data cluster
PCT/CN2023/097480 WO2024012082A1 (zh) 2022-07-15 2023-05-31 Big data cluster deployment method, apparatus, device, and medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097480 WO2024012082A1 (zh) 2022-07-15 2023-05-31 Big data cluster deployment method, apparatus, device, and medium

Country Status (2)

Country Link
CN (2) CN117716335A (zh)
WO (2) WO2024011627A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102681781A (zh) * 2012-04-27 2012-09-19 华为技术有限公司 Method and apparatus for cluster reorganization
CN107070691A (zh) * 2017-01-12 2017-08-18 阿里巴巴集团控股有限公司 Cross-host communication method and system for Docker containers
US20170316114A1 (en) * 2016-04-29 2017-11-02 Accenture Global Solutions Limited System architecture with visual modeling tool for designing and deploying complex models to distributed computing clusters
CN111736994A (zh) * 2020-06-15 2020-10-02 国网电力科学研究院有限公司 Resource orchestration method, system, storage medium, and electronic device
CN113835705A (zh) * 2021-09-29 2021-12-24 北京金山云网络技术有限公司 Big data service product development method, apparatus, and system
CN114443294A (zh) * 2022-01-20 2022-05-06 苏州浪潮智能科技有限公司 Big data service component deployment method, system, terminal, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107454140A (zh) * 2017-06-27 2017-12-08 北京溢思得瑞智能科技研究院有限公司 Automated Ceph cluster deployment method and system based on a big data platform
CN110011827A (zh) * 2019-02-26 2019-07-12 中电科软件信息服务有限公司 Multi-user big data analysis service system and method for medical consortiums
CN112084009A (zh) * 2020-09-17 2020-12-15 湖南长城科技信息有限公司 Method for building and monitoring a Hadoop cluster with alarms based on containerization technology under the PK architecture
CN114443293A (zh) * 2022-01-20 2022-05-06 苏州浪潮智能科技有限公司 Deployment system and method for a big data platform


Also Published As

Publication number Publication date
CN117716338A (zh) 2024-03-15
CN117716335A (zh) 2024-03-15
WO2024012082A1 (zh) 2024-01-18


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202280002227.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950745

Country of ref document: EP

Kind code of ref document: A1