WO2024055715A1 - Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium - Google Patents

Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium

Info

Publication number
WO2024055715A1
WO2024055715A1 PCT/CN2023/105108 CN2023105108W WO2024055715A1 WO 2024055715 A1 WO2024055715 A1 WO 2024055715A1 CN 2023105108 W CN2023105108 W CN 2023105108W WO 2024055715 A1 WO2024055715 A1 WO 2024055715A1
Authority
WO
WIPO (PCT)
Prior art keywords
deployed
computer room
component
deployment
components
Prior art date
Application number
PCT/CN2023/105108
Other languages
French (fr)
Chinese (zh)
Inventor
冯伟
武文博
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司 filed Critical 华为云计算技术有限公司
Publication of WO2024055715A1 publication Critical patent/WO2024055715A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • This application relates to the field of big data technology, and in particular to a method, device, cluster and storage medium for determining a big data cluster deployment solution.
  • Big data is usually processed by a big data cluster, which can be a Hadoop cluster.
  • As the amount of data grows, a big data cluster needs to expand its scale.
  • When the original computer room lacks space, the big data cluster needs to be deployed in an off-site computer room, and the equipment in the multiple computer rooms transmits data over the network.
  • For example, the big data cluster was originally deployed in the computer room at location A.
  • After expansion, the big data cluster is also deployed in the computer room at location B.
  • The devices in the computer rooms at location A and location B transmit data over the network.
  • In the related art, when a big data cluster is deployed, it is usually divided manually and arbitrarily into multiple parts according to the information about the hosts accommodated in the multiple computer rooms, and the parts are deployed to the respective computer rooms. This approach may produce an unreasonable deployment plan, and determining the deployment plan is inefficient.
  • This application provides a method, device, cluster and storage medium for determining a big data cluster deployment plan, which can improve the efficiency of determining the deployment plan.
  • In a first aspect, this application provides a method for determining a big data cluster deployment plan.
  • The big data cluster is managed by a big data management platform.
  • The big data cluster is deployed across multiple computer rooms, and each computer room houses hosts. The method includes:
  • receiving the input deployment requirements of the big data cluster, where the deployment requirements include the deployment requirement information of the components to be deployed in the big data cluster, the computer room capacity information of the multiple computer rooms, and the parameter information of the hosts on which the big data cluster is deployed; and determining the deployment plan of the components to be deployed based on the deployment requirements and the category of the components to be deployed.
  • The deployment plan includes the computer room in which each component to be deployed is deployed and the hosts in that computer room on which it is deployed. The deployment plan is then output.
  • In this solution, the deployment requirement information of the components to be deployed, the computer room capacity information of the multiple computer rooms, and the parameter information of the hosts are all taken into account, and the computing device determines the deployment plan from this information, which improves both the rationality of the deployment plan and the efficiency of determining it.
  • Determining the deployment plan of the component to be deployed based on the deployment requirement and the category of the component to be deployed includes: determining the deployment plan of the component to be deployed based on the deployment requirement and a first deployment strategy, where the first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component, among the first type of components in the components to be deployed, in the same computer room.
  • the data storage part and the computing part of the same component in the first type of component are deployed in the same computer room, which can reduce the cross-computer room data transmission of the data storage part and the computing part of the first type component. This can save bandwidth between computer rooms.
  • Determining a deployment plan for the component to be deployed based on the deployment requirement and the first deployment strategy includes: determining a host that satisfies the deployment requirement information of the component to be deployed as the host on which the component is deployed; and determining, based on the computer room capacity information, the hosts on which the components are deployed, and the first deployment strategy, the computer room to which each of those hosts belongs.
  • Before determining the deployment plan of the component to be deployed based on the deployment requirement and the first deployment strategy, the method further includes: determining the number of hosts required by the first type of components based on the deployment requirement information and the parameter information, and determining, based on the computer room capacity information, that this number is greater than the number of hosts accommodated in each of the multiple computer rooms.
  • the components to be deployed other than the first type component are deployed in the same computer room.
  • components other than the first type of components are deployed in the same computer room, which can reduce the data transmission of these components across computer rooms, thereby saving bandwidth between computer rooms.
  • Determining a deployment plan for the component to be deployed based on the deployment requirement and the category of the component to be deployed includes: determining the number of hosts required by the first type of components based on the deployment requirement information of the components to be deployed; and, if the number of hosts accommodated in one of the multiple computer rooms is greater than or equal to the number of hosts required, determining a deployment plan for the component to be deployed based on the deployment requirement and the second deployment strategy.
  • the second deployment strategy is a strategy of deploying the first type of components in the same computer room.
  • The first type of components can be deployed in one computer room among the multiple computer rooms. Deploying the first type of components in the same computer room reduces the cross-computer-room transmission of data among the first type of components and thus saves bandwidth between computer rooms.
  • The multiple computer rooms include a first computer room and a second computer room, and determining a deployment plan for the component to be deployed based on the deployment requirement and the second deployment strategy includes: determining a host that satisfies the deployment requirement information of the component to be deployed as the host on which the component is deployed; determining that the computer room to which the hosts of the first type of components belong is the first computer room; and determining that the computer room in which the second type of components (the components to be deployed other than the first type of components) are deployed is the second computer room, where the number of hosts accommodated in the first computer room is greater than or equal to the number of hosts required by the first type of components.
  • Deploying the first type of components in one computer room and the second type of components in another reduces the cross-computer-room transmission of data among the first type of components and among the second type of components, thereby saving bandwidth between computer rooms.
  • The first type of components includes components based on the Hadoop distributed file system (HDFS) and on the resource scheduler Yet Another Resource Negotiator (YARN).
  • The method further includes: for a first computer room and a second computer room among the multiple computer rooms, determining a first data transmission volume of the first type of components between the first computer room and the second computer room; determining a second data transmission volume between the first type of components and the second type of components across the first computer room and the second computer room; determining the management plane data volume and the control plane data volume between the first computer room and the second computer room; and determining the bandwidth requirement between the first computer room and the second computer room based on the first data transmission volume, the second data transmission volume, the management plane data volume, and the control plane data volume.
  • the bandwidth requirements between computer rooms can be determined based on certain strategies, providing a reference value for bandwidth requirements for big data cluster deployers.
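As a rough illustration of the estimate described above, the following Python sketch combines the four volumes into one figure. The text only says that the bandwidth requirement is determined "based on" these volumes, so treating it as a plain sum is an assumption, as are the function name, the parameter names, and the units.

```python
def estimate_bandwidth_requirement(
    first_type_volume: float,
    cross_type_volume: float,
    management_plane_volume: float,
    control_plane_volume: float,
) -> float:
    """Return an estimated bandwidth requirement between two computer rooms.

    Assumption: the four contributions are simply added; the source does
    not state the exact combination rule.
    """
    volumes = (
        first_type_volume,
        cross_type_volume,
        management_plane_volume,
        control_plane_volume,
    )
    if any(v < 0 for v in volumes):
        raise ValueError("data volumes must be non-negative")
    return sum(volumes)


# Hypothetical figures, all in the same unit (for example Gbit/s).
requirement = estimate_bandwidth_requirement(4.0, 1.5, 0.25, 0.25)
```

A deployer could compare such a figure with the capacity of the link between the two computer rooms before committing to a plan.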
  • The method further includes: determining the data transmission volume between the first type of components and the second type of components; determining the management plane data volume and the control plane data volume between the first computer room and the second computer room; and determining the bandwidth demand information between the first computer room and the second computer room based on the data transmission volume, the management plane data volume, and the control plane data volume.
  • the bandwidth requirements between computer rooms can be determined based on certain strategies, providing a reference value for bandwidth requirements for big data cluster deployers.
  • the deployment requirement information for each component includes one or more of operating system requirement information, data volume, or throughput.
  • the parameter information includes one or more of operating system information, network information, or hardware information of various types of hosts and the number of the various types of hosts.
  • In a second aspect, this application provides a device for determining a big data cluster deployment plan.
  • The device includes at least one module, and the at least one module is used to implement the method for determining a big data cluster deployment plan provided by the first aspect or any example of the first aspect.
  • In some embodiments, the modules of the device are implemented in software and are program modules. In other embodiments, the modules of the device are implemented in hardware or firmware.
  • In a third aspect, the present application provides a computing device cluster.
  • The computing device cluster includes at least one computing device.
  • Each computing device includes a processor and a memory.
  • The processor of the at least one computing device is configured to execute the instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method for determining a big data cluster deployment plan provided by the first aspect or any example of the first aspect.
  • In a fourth aspect, the present application provides a computer-readable storage medium.
  • The computer-readable storage medium includes computer program instructions.
  • When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method for determining a big data cluster deployment plan provided by the first aspect or any example of the first aspect.
  • In a fifth aspect, the present application provides a computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster to execute the method for determining a big data cluster deployment plan provided by the first aspect or any example of the first aspect.
  • Figure 1 is a schematic diagram of off-site expansion of a big data cluster provided by an exemplary embodiment of the present application
  • Figure 2 is a schematic diagram of the system architecture provided by an exemplary embodiment of the present application.
  • Figure 3 is a schematic flowchart of a method for determining a big data cluster deployment solution provided by an exemplary embodiment of the present application
  • Figure 4 is a schematic flowchart of a method for determining a big data cluster deployment solution provided by an exemplary embodiment of the present application
  • Figure 5 is a schematic structural diagram of a device for determining a big data cluster deployment solution provided by an exemplary embodiment of the present application
  • Figure 6 is a schematic structural diagram of a computing device provided by an exemplary embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a computing device cluster provided by an exemplary embodiment of the present application.
  • Figure 8 is a schematic connection diagram of a computing device provided by an exemplary embodiment of the present application.
  • YARN is the resource manager of Hadoop. It is a general resource management system that provides unified resource management and scheduling for upper-layer applications. YARN stands for Yet Another Resource Negotiator.
  • HDFS is a distributed file system of Hadoop, which can provide a highly available distributed file system for obtaining application data.
  • HBase (Hadoop database) is a column-oriented NoSQL (non-relational) database built on HDFS, which is used to quickly read and write large amounts of data.
  • Hive is a data warehouse for big data.
  • Application scenario 1: The embodiments of this application are applied to the scenario of deploying a big data cluster in multiple computer rooms.
  • Big data clusters are often relatively large, while the space and capacity of a single computer room are limited, so big data clusters are often deployed in multiple computer rooms.
  • Application scenario 2: The embodiments of this application are applied to the scenario of off-site expansion. For example, after a big data cluster has been deployed in a computer room for a period of time, the amount of data increases and the big data cluster needs to expand its scale. In many cases, because the original computer room lacks space, the big data cluster needs to add hosts in an off-site computer room. For example, see Figure 1: when the original first computer room runs out of space, a second computer room is built at another location, and the big data cluster is deployed in both the first computer room and the second computer room.
  • the big data cluster supports cross-computer room deployment and cross-computer room expansion.
  • The big data cluster appears as a single complete cluster to the upper-layer application system and other business systems, which are not aware of the physical deployment form of the big data cluster.
  • the management platform of the big data cluster will perform unified management.
  • the system architecture 100 includes a first device 101 and a second device 102.
  • Both the first device 101 and the second device 102 can be computing devices such as terminals or servers (for computing devices, see the computing device 200 described later).
  • the first device 101 and the second device 102 are connected through a wired or wireless network.
  • The first device 101 is used to determine the deployment plan.
  • The deployment plan includes the computer rooms in which the components of the big data cluster are deployed and the hosts in those computer rooms on which they are deployed.
  • The second device 102 is used to deploy the components of the big data cluster to the corresponding hosts according to the deployment plan.
  • the first device and/or the second device may also be a server or a virtual machine in the data center of the cloud computing platform to provide users with cloud services that determine the deployment plan.
  • In one implementation, the second device 102 is responsible for deploying components. In another implementation, the second device 102 is not included in the system architecture 100, and after the first device 101 determines the deployment plan, it deploys the components of the big data cluster to the corresponding hosts according to the deployment plan.
  • Figure 3 shows the process of determining the big data cluster deployment plan; see steps 301 to 303.
  • The following description takes the first device 101 determining the deployment plan as an example.
  • Step 301: Receive the input deployment requirements of the big data cluster.
  • The deployment requirements include the deployment requirement information of the components to be deployed in the big data cluster, the computer room capacity information of the multiple computer rooms in which the big data cluster is deployed, and the parameter information of the hosts on which the big data cluster is deployed.
  • the big data cluster is a big data Hadoop cluster
  • the components to be deployed include components based on HDFS and YARN in the big data cluster, as well as components not based on HDFS and YARN.
  • Components based on HDFS and YARN include HDFS components, YARN components, HBase components, Hive components, Spark components, Flink components, and so on.
  • Components not based on HDFS and YARN include Kafka components (components of a high-throughput distributed publish-subscribe messaging system), Elastic Search components, Redis (remote dictionary server) components, Flume components (components of a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting massive logs), and so on.
  • the deployment requirements include deployment requirement information of the components to be deployed, capacity information of multiple computer rooms where the big data cluster is deployed, and parameter information of the host where the big data cluster is deployed.
  • an input interface for deployment requirements is provided to the user, and the user inputs the deployment requirements for the big data cluster through the input interface.
  • the first device may obtain the deployment requirements input by the user.
  • The input interface can be provided as a graphical interface, a command line, or an application programming interface (API) on a cloud computing platform.
  • When deploying the big data cluster, the user triggers the terminal device to send a determination request for the deployment plan to the first device.
  • The first device receives the determination request, which includes the deployment requirements of the big data cluster.
  • The first device then obtains the deployment requirements from the determination request.
  • the deployment requirement information of the component includes one or more of operating system requirement information, data volume or throughput, and the operating system requirement information indicates the operating system of the host on which the component is deployed,
  • the data volume indicates the amount of data to be processed by the component
  • the throughput indicates the amount of data transmitted by the component per unit time.
  • the parameter information of the host includes one or more of operating system information, network information, or hardware information of various types of hosts, as well as the number of various types of hosts.
  • the operating system information indicates the operating system of the host
  • the network information indicates the bandwidth of the host, etc.
  • the hardware information indicates the central processing unit (CPU) model and storage resources of the host.
  • multiple computer rooms can be set up in the same city or in different cities.
  • the computer room capacity information of each computer room includes the number of hosts accommodated.
  • the computer room capacity information of each computer room includes the number of racks accommodated and the number of hosts placed in each rack.
  • The computer room capacity information of each computer room includes the number of racks accommodated, the number of hosts accommodated by each rack, and the number of hosts accommodated individually.
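To make the three inputs of step 301 concrete, the sketch below models them as Python dataclasses. All class and field names are hypothetical, and reading "individually accommodated hosts" as hosts placed outside any rack is an assumption; the text only lists which kinds of information each input may carry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComponentRequirement:
    """Deployment requirement information of one component to be deployed."""
    name: str
    operating_system: Optional[str] = None  # operating system requirement
    data_volume: Optional[float] = None     # amount of data to be processed
    throughput: Optional[float] = None      # data transmitted per unit time


@dataclass
class HostType:
    """Parameter information of one type of host."""
    operating_system: str
    bandwidth: float   # network information
    cpu_model: str     # hardware information
    storage: float     # hardware information
    count: int         # number of hosts of this type


@dataclass
class RoomCapacity:
    """Computer room capacity information."""
    name: str
    racks: int = 0
    hosts_per_rack: int = 0
    standalone_hosts: int = 0  # assumption: hosts not placed in a rack

    def host_capacity(self) -> int:
        # Total number of hosts the computer room can accommodate.
        return self.racks * self.hosts_per_rack + self.standalone_hosts


@dataclass
class DeploymentRequirements:
    components: List[ComponentRequirement] = field(default_factory=list)
    rooms: List[RoomCapacity] = field(default_factory=list)
    host_types: List[HostType] = field(default_factory=list)


# Hypothetical room: 4 racks of 4 hosts plus 4 standalone hosts.
room = RoomCapacity(name="room-A", racks=4, hosts_per_rack=4, standalone_hosts=4)
requirements = DeploymentRequirements(rooms=[room])
```

Keeping the rack-level fields optional lets the same structure cover all three capacity formats mentioned above.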
  • Step 302: Determine a deployment plan for the component to be deployed based on the deployment requirement and the category of the component to be deployed.
  • the deployment plan includes the computer room where the component to be deployed is deployed and the hosts deployed in the computer room.
  • the category of the component to be deployed is used to distinguish the component to be deployed.
  • the first device uses the deployment requirement and the category of the component to be deployed to determine the computer room where the component to be deployed is to be deployed and the host to be deployed in the computer room, that is, the first device obtains the deployment plan of the component to be deployed.
  • In the deployment plan, the host on which a component is deployed can be identified by the model of the host or by the identifier (ID) of the host. A computer room may contain multiple hosts of the same model, but only one host with a given identifier.
  • the components to be deployed can be divided into a first type of component and a second type of component.
  • The first type of component is a component based on HDFS and YARN.
  • The second type of component is a component not based on HDFS and YARN.
  • step 302 can be as follows:
  • The first device obtains a stored first deployment strategy, where the first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component in the first type of components in the same computer room, and determines the components included in the first type of components. Based on the deployment requirements of the big data cluster and the first deployment strategy, the first device deploys the data storage part and the computing part of the same component in the first type of components in the same computer room. For example, the data storage part of the HBase component is deployed on host 1, host 2, host 3 and host 4, and the corresponding computing part of the HBase component is also deployed on host 1, host 2, host 3 and host 4. Host 1, host 2, host 3 and host 4 are deployed in computer room A.
  • The data storage part of the Hive component is deployed on host 5, host 6, host 7 and host 8.
  • The corresponding computing part of the Hive component is also deployed on host 5, host 6, host 7 and host 8.
  • Host 5, host 6, host 7 and host 8 are deployed in computer room B.
  • the processing method is as follows:
  • the first device determines a host that satisfies the deployment requirement information of the first type component.
  • For example, the deployment requirement information of component A is a Windows operating system, a data volume of 500M, and a throughput of 1G.
  • A host that satisfies the deployment requirement information of component A runs a Windows operating system, can process a data volume of 500M for component A, and can transmit a data volume of 1G for component A per unit time.
  • The first device determines the hosts that meet the deployment requirement information as the hosts on which the components to be deployed are deployed. Then, for each component in the first type of components, the first device uses the computer room capacity information and the hosts on which the component is deployed to place the hosts of the data storage part and of the computing part of that component in the same computer room, so that no cross-computer-room data transmission is needed between the data storage part and the computing part of the same component.
  • the first device deploys the hosts to which the second type components are deployed in the same computer room, thereby reducing the amount of data transmission of the second type components across computer rooms. This is because the second type of components is relatively small and can be deployed in the same computer room.
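The host selection in the component A example above can be sketched as a simple filter. The `Host` class, its field names, and the sample hosts are hypothetical; only the requirement values (Windows, 500M, 1G) come from the text, and the units are kept as the text gives them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    host_id: str
    operating_system: str
    data_capacity: float  # data volume the host can process, in M
    throughput: float     # data the host can transmit per unit time, in G


def satisfies(host: Host, operating_system: str,
              data_volume: float, throughput: float) -> bool:
    """Check one host against a component's deployment requirement information."""
    return (
        host.operating_system == operating_system
        and host.data_capacity >= data_volume
        and host.throughput >= throughput
    )


# Hypothetical host inventory.
hosts = [
    Host("host-1", "Windows", 800, 2.0),
    Host("host-2", "Linux", 800, 2.0),    # wrong operating system
    Host("host-3", "Windows", 400, 2.0),  # cannot process 500M
]

# Component A: Windows operating system, 500M data volume, 1G throughput.
candidates = [h.host_id for h in hosts if satisfies(h, "Windows", 500, 1.0)]
```

The surviving candidates would then be grouped by computer room so that the storage and computing parts of a component land in the same room.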
  • Resource pools are divided among the different components in the first type of components, and different components correspond to different resource pools.
  • A host label for the data storage part and a host label for the computing part are configured for each component.
  • The host label of the data storage part corresponds to the host label of the computing part.
  • The hosts corresponding to the host labels of the data storage part constitute the resource pool of the component.
  • the host labels configured for the data storage part of the HBase component in the first type of component are label 1, label 2, label 3 and label 4.
  • the hosts corresponding to label 1, label 2, label 3 and label 4 are deployed in the first computer room.
  • the host labels configured for the computing part corresponding to the HBase component are label 1, label 2, label 3 and label 4, so that the computing part is scheduled to the hosts corresponding to label 1, label 2, label 3 and label 4 during scheduling.
  • YARN dynamically associates computing tasks in the computing task queue with resource pools with corresponding labels based on the resource requirements of the computing task queue.
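The label scheme in the HBase example can be sketched as follows. This is an illustration only, not YARN's node label API: the function name, the two mappings, and the single-room check are assumptions layered on the description above.

```python
def build_resource_pool(storage_labels, compute_labels, label_to_host, host_to_room):
    """Return (pool_hosts, room) for one first-type component.

    The pool is formed by the hosts behind the storage labels. A ValueError
    is raised if the labelled hosts span more than one computer room, which
    would violate the first deployment strategy.
    """
    pool = [label_to_host[label] for label in storage_labels]
    compute_hosts = [label_to_host[label] for label in compute_labels]
    rooms = {host_to_room[host] for host in pool + compute_hosts}
    if len(rooms) != 1:
        raise ValueError(f"labelled hosts span several rooms: {sorted(rooms)}")
    return pool, rooms.pop()


# HBase example: label 1 to label 4, all hosts in the first computer room.
labels = ["label 1", "label 2", "label 3", "label 4"]
label_to_host = {label: f"host {i}" for i, label in enumerate(labels, start=1)}
host_to_room = {host: "first computer room" for host in label_to_host.values()}

pool, room = build_resource_pool(labels, labels, label_to_host, host_to_room)
```

Because the computing part carries the same labels as the storage part, a scheduler that honors the labels keeps computation next to the data.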
  • the processing in step 302 can be as follows:
  • the input of the deployment plan generation software is the deployment requirements of the big data cluster and the categories of components to be deployed.
  • the output is the deployment plan of the big data cluster.
  • the first device inputs the deployment requirements of the big data cluster and the categories of components to be deployed into the deployment plan generation software, and the deployment plan generation software outputs a deployment plan, and the deployment plan is the deployment plan of the big data cluster.
  • the components to be deployed include a first type of component and a second type of component.
  • The first type of component is a component based on HDFS and YARN.
  • The second type of component is a component that is not based on HDFS and YARN.
  • The second type of components can be regarded as the components to be deployed other than the first type of components.
  • Step 401: The first device uses the deployment requirement information of each component included in the first type of components and the parameter information of the hosts to determine the number of hosts required by each component, and adds up these numbers to obtain the number of hosts required by the first type of components.
  • the first device uses the computer room capacity information of each computer room in the plurality of computer rooms to determine the number of hosts accommodated in each computer room.
  • For example, the parameter information of the hosts is: 10 hosts running a Windows system, with a CPU of 2*32 cores and memory of 4*32G; and 12 hosts running a Linux system, with a CPU of 2*32 cores and memory of 8*32G. The number of hosts required by the first type of components is 13, and the computer room capacity information of the first computer room among the multiple computer rooms indicates that it accommodates 20 hosts.
  • Step 402: The first device compares the number of hosts required by the first type of components with the number of hosts accommodated in each computer room. If the number of hosts accommodated in one of the multiple computer rooms is greater than or equal to the number of hosts required, it is determined that all the first type components can be deployed in that computer room. For example, the number of hosts accommodated in the first computer room is greater than or equal to the number of hosts required.
  • Step 403: The first device obtains a stored second deployment strategy.
  • The second deployment strategy is a strategy of deploying the first type of components in the same computer room.
  • The first device determines the deployment plan of the big data cluster: the first type of components are deployed in the same computer room, and the second type of components are either deployed in a computer room other than the one in which the first type of components are deployed, or partly deployed in the computer room in which the first type of components are deployed and partly in another computer room.
  • Step 404: If no computer room among the multiple computer rooms accommodates a number of hosts greater than or equal to the number of hosts required, the first deployment strategy described above can be used to deploy the components to be deployed. For details, refer to the previous description, which is not repeated here.
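Steps 401 to 404 amount to a capacity check followed by a choice between the two strategies. The following Python sketch shows one way to express it; the function name, the return shape, and the per-component host counts are hypothetical, while the total of 13 hosts and the capacity of 20 come from the example in the text.

```python
def choose_deployment_strategy(hosts_per_component, room_capacities):
    """Pick a deployment strategy for the first type of components.

    hosts_per_component: component name -> number of hosts it requires.
    room_capacities: computer room name -> number of hosts it accommodates.
    Returns (strategy, required_hosts, room); room is None under the
    first strategy.
    """
    # Step 401: add up the hosts required by every first-type component.
    required = sum(hosts_per_component.values())
    # Step 402: look for a single room that can hold all of them.
    for room, capacity in room_capacities.items():
        if capacity >= required:
            # Step 403: second strategy, all first-type components together.
            return "second", required, room
    # Step 404: no room is large enough, fall back to the first strategy.
    return "first", required, None


# Hypothetical per-component split that adds up to 13 hosts.
first_type_hosts = {"HDFS": 5, "YARN": 4, "HBase": 4}

fits = choose_deployment_strategy(
    first_type_hosts, {"first computer room": 20, "second computer room": 10})
too_small = choose_deployment_strategy(
    first_type_hosts, {"first computer room": 10, "second computer room": 10})
```

With a 20-host first computer room the second strategy applies; with two 10-host rooms the sketch falls back to the first strategy, which keeps each component's storage and computing parts together instead.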
  • the processing method is as follows:
  • the first device determines a host that satisfies the deployment requirement information of the first type of component.
  • the first device determines the host that meets the deployment requirement information as the host where the component to be deployed is deployed.
  • The first device determines that the computer room to which the hosts of the first type of components belong is the first computer room, and that the computer room to which the hosts of the second type of components belong is the second computer room. This is because the second type of components is relatively small and can be deployed in the same computer room.
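Under the second deployment strategy, the room assignment reduces to a lookup on the component category. The sketch below assumes a fixed set of first-type component names taken from the lists earlier in the text; the set name, the function name, and the room names are hypothetical.

```python
# Category membership taken from the component lists in the text;
# any component not listed here is treated as a second-type component.
FIRST_TYPE = {"HDFS", "YARN", "HBase", "Hive", "Spark", "Flink"}


def assign_rooms(components, first_room, second_room):
    """Map each component to a computer room under the second strategy."""
    return {
        name: first_room if name in FIRST_TYPE else second_room
        for name in components
    }


plan = assign_rooms(["HDFS", "Hive", "Kafka", "Redis"], "room-1", "room-2")
```

The result is only the room-level half of a deployment plan; host selection within each room would still follow the requirement matching described earlier.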
  • the big data cluster is a Hadoop cluster.
  • the first type of components includes HDFS components, HBase components, Yarn components, Spark components, Spark2X components, Hive components, MapReduce components, Storm components (distributed real-time computing system components), ZooKeeper components (distributed, open-source application coordination service components), database (DB) service components, Kerberos (Krb) network authentication protocol server components, Hadoop user experience (Hue) components, and lightweight directory access protocol (LDAP) server components.
  • the second type of components includes Elastic Search components, enterprise-level search application server (Solr) components, Redis components, graph database (GraphBase) components, Kafka components, loader (Loader) components, file transfer protocol (FTP) server components, and Oozie components (task scheduling framework components).
  • Table 1 provides the deployment scheme of the components in the Hadoop cluster.
  • the host type indicates the type of host on which the component is deployed.
  • three different host types are shown, denoted type 1, type 2, and type 3. For details of the hosts of the three host types, see Table 2.
  • deploying the first-type components in the first computer room reduces the amount of data that the first-type components transmit between different computer rooms.
  • the management node centrally manages the components deployed in the big data cluster, and the control node is the node that performs resource scheduling and task allocation.
  • Step 303 Output the deployment plan.
  • after the first device determines the deployment plan of the component to be deployed, it can output the deployment plan to the second device, and the second device can deploy the component to be deployed to hosts in multiple computer rooms based on the deployment plan. For example, after the first device determines the deployment plan of the component to be deployed, it generates a deployment task list based on the deployment plan.
  • the deployment task list can be output to the second device in the form of an offline export spreadsheet (such as an EXCEL table).
  • the big data cluster installation software of the second device can use the deployment task list to deploy the components to be deployed to hosts in multiple computer rooms.
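Exporting a deployment task list for offline use can be sketched as follows. This sketch writes CSV text, which spreadsheet software can open, rather than a native EXCEL file. The column names (`component`, `computer_room`, `host`) and the function name are assumptions; the description does not specify the layout of the task list.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout for the deployment task list.
FIELDS = ["component", "computer_room", "host"]


def export_task_list(plan: List[Dict[str, str]]) -> str:
    """Serialize a deployment plan as CSV text, one row per deployed component."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(plan)
    return buffer.getvalue()
```

The returned string can be written to a file and handed to the installation software on the second device, which would parse the same columns.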
  • the first device may send the deployment plan to the device that sent the determination request.
  • the first device can display the deployment plan.
  • after the first device determines the deployment plan of the component to be deployed, it can use a dedicated device to access the host and deploy the component to be deployed on the host.
  • the deployment plan of the components to be deployed is determined automatically instead of being calculated manually. This improves the efficiency of determining the deployment plan, and thus the efficiency of deploying the big data cluster.
  • when the first deployment strategy is used to determine the deployment plan, the first device can also determine the bandwidth requirement between any two of the multiple computer rooms. The following considers the case where the second type of components are deployed in the same computer room.
  • the processing method for determining bandwidth requirements is as follows:
  • the first data transmission amount of the first type components between the first computer room and the second computer room is determined; the second data transmission amount between the first type components and the second type components, between the first computer room and the second computer room, is determined; and the management plane data volume and control plane data volume between the first computer room and the second computer room are determined.
  • based on the first data transmission amount, the second data transmission amount, the management plane data volume, and the control plane data volume, the bandwidth requirement between the first computer room and the second computer room is determined.
  • the first computer room and the second computer room among multiple computer rooms are used as an example for explanation.
  • management nodes and control nodes can be deployed in the same computer room or in different computer rooms.
  • three parts of data transmission are considered.
  • the first part is the management plane data volume and the control plane data volume between the first computer room and the second computer room.
  • the second part is the first data transmission amount of the first type components between the first computer room and the second computer room.
  • the third part is the second data transmission amount between the first type components and the second type components, between the first computer room and the second computer room.
  • the minimum bandwidth required between the first computer room and the second computer room is the sum of the management plane data volume, the control plane data volume, the first data transmission volume, and the second data transmission volume.
  • the bandwidth requirement between the first computer room and the second computer room is usually greater than the minimum bandwidth required between the first computer room and the second computer room.
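The relation described above can be written compactly. The symbols are introduced here for illustration only and do not appear in the original text: $D_{\text{mgmt}}$ and $D_{\text{ctrl}}$ are the management plane and control plane data volumes, $T_1$ is the first data transmission amount, $T_2$ is the second data transmission amount, and all quantities are rates between the first and second computer rooms.

```latex
B_{\min} = D_{\text{mgmt}} + D_{\text{ctrl}} + T_1 + T_2,
\qquad
B_{\text{req}} \ge B_{\min}
```

Here $B_{\text{req}}$ is the bandwidth requirement that is output, which the description says is usually greater than $B_{\min}$.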
  • the first device After determining the bandwidth requirements between each two computer rooms in the plurality of computer rooms, the first device outputs the bandwidth requirements and the deployment plan of the big data cluster to the second device. Or, send the bandwidth requirement and the deployment plan of the big data cluster to the device that sent the confirmation request, or, when displaying the deployment plan of the big data cluster, display the bandwidth requirement at the same time.
  • the above description uses deployment of the second type of components in the same computer room as an example.
  • if the second type of components are deployed in different computer rooms, the data transmission of the second type of components between those computer rooms also needs to be considered.
  • when the second deployment strategy is used to determine the deployment plan, the first device can also determine the bandwidth requirement between the first computer room and the second computer room.
  • the processing method is as follows:
  • for any component of the first type, the first device determines the data transmission amount per unit time between that component and each component of the second type.
  • the first device adds the data transmission amounts per unit time corresponding to all components of the first type to obtain the data transmission amount per unit time between the first type components and the second type components.
  • Management nodes and control nodes can be deployed in the same computer room or in different computer rooms.
  • the first device determines the control plane data amount and the management plane data amount between the first computer room and the second computer room, and adds the two to obtain the management and control plane data volume between the first computer room and the second computer room.
  • the first device adds the data transmission volume and the management and control plane data volume to obtain a value, and determines the value as the minimum bandwidth required between the first computer room and the second computer room.
  • the bandwidth requirement between the first computer room and the second computer room is usually greater than the minimum bandwidth required between the first computer room and the second computer room.
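The calculation for the second deployment strategy can be sketched as follows. The function name, parameter names, and the per-component figures in the example are hypothetical; only the structure (sum the per-component traffic, then add the management and control plane data volumes) comes from the description.

```python
from typing import Dict


def min_bandwidth_second_strategy(per_component_mbps: Dict[str, float],
                                  management_mbps: float,
                                  control_mbps: float) -> float:
    """Minimum bandwidth (Mb/s) between the first and second computer rooms.

    per_component_mbps maps each first-type component to its traffic per unit
    time towards the second-type components (hypothetical input format).
    """
    component_traffic = sum(per_component_mbps.values())
    management_and_control = management_mbps + control_mbps
    return component_traffic + management_and_control
```

For instance, two first-type components sending 200 Mb/s and 100 Mb/s, with 250 Mb/s of management plane data and 1000 Mb/s of control plane data, give a minimum of 1550 Mb/s.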
  • the throughput between a component in the first category and a component in the second category is less than 50Mb/s/node
  • the management plane data volume is 5Mb/s/node
  • the control plane data volume is 1Gb/s.
  • the total number of components is 50 to 100.
  • the number of first type components is 50 to 450.
  • the number of second type components is less than 50.
  • the control node and management node are deployed in the first computer room.
  • the first type components are deployed in the first computer room, and the second type of components are deployed in the second computer room.
  • the data volume of the control plane across the computer room is 50*5Mb/s/component+1Gb/s
  • the data transmission volume between components across the computer room is 50*50Mb/s/component
  • the minimum bandwidth is 3.75GE
  • the bandwidth requirement is 10GE
  • 1GE means 1000Mb/s.
  • the total number of components is 500 to 1000, the number of first type components is 400 to 900, and the number of second type components is less than 100.
  • the control node and management node are deployed in the first computer room, and the first type components are deployed in the first computer room.
  • the second type of component is deployed in the second computer room.
  • the data volume of the control plane across the computer room is 100*5Mb/s/component + 1Gb/s.
  • the data transmission volume between components across the computer room is 100*50Mb/s/component.
  • the minimum bandwidth is 6.5GE and the bandwidth requirement is 10GE.
  • the total number of components is 1000 to 2000, the number of the first type of components is 800 to 1800, and the number of the second type of components is within 200.
  • the control node and management node are deployed in the first computer room, and the first type of components are deployed in the first computer room.
  • the second type of component is deployed in the second computer room.
  • the data volume of the control plane across the computer room is 200*5Mb/s/component + 1Gb/s.
  • the data transmission volume between components across the computer room is 200*50Mb/s/component.
  • the minimum bandwidth is 12GE and the bandwidth requirement is 20GE.
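The three worked examples above follow the same arithmetic, which can be reproduced with a short sketch. The function and parameter names are assumptions; the default rates (5 Mb/s management plane data per component, 1 Gb/s control plane data, 50 Mb/s throughput per component) and the conversion 1GE = 1000 Mb/s are the figures given in the examples.

```python
def min_bandwidth_mbps(cross_room_components: int,
                       mgmt_mbps_per_component: float = 5.0,
                       control_plane_mbps: float = 1000.0,
                       throughput_mbps_per_component: float = 50.0) -> float:
    """Minimum cross-room bandwidth in Mb/s for the worked examples."""
    # Management and control plane data across the computer rooms.
    management_and_control = (cross_room_components * mgmt_mbps_per_component
                              + control_plane_mbps)
    # Data transmitted between components across the computer rooms.
    component_traffic = cross_room_components * throughput_mbps_per_component
    return management_and_control + component_traffic


# 50, 100 and 200 second-type components, converted to GE (1GE = 1000 Mb/s).
for count in (50, 100, 200):
    print(count, min_bandwidth_mbps(count) / 1000, "GE")  # 3.75, 6.5, 12.0
```

The output bandwidth requirements in the examples (10GE, 10GE, and 20GE) are larger than these minimums, consistent with the statement that the requirement is usually greater than the minimum bandwidth.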
  • the first device After determining the bandwidth requirement between the first computer room and the second computer room, the first device outputs the bandwidth requirement and the deployment plan of the big data cluster to the second device. Or, send the bandwidth requirement and the deployment plan of the big data cluster to the device that sent the confirmation request, or, when displaying the deployment plan of the big data cluster, display the bandwidth requirement at the same time.
  • bandwidth requirements can also be output to provide a reference for bandwidth settings between computer rooms.
  • the deployment plan for deploying a big data cluster across computer rooms can be output automatically, which effectively reduces the complexity of manual, random cross-computer-room deployment and solves the problems of complex calculation, complex operation, and long deployment time caused by manually and randomly deploying big data clusters.
  • the following describes the device for determining the big data cluster deployment solution provided by this application.
  • Figure 5 is a structural diagram of a device for determining a big data cluster deployment solution provided by an embodiment of the present application.
  • the device can be implemented as part or all of the device through software, hardware, or a combination of both.
  • the device provided by the embodiment of the present application can implement the process shown in Figure 3 of the embodiment of the present application.
  • the device includes: an acquisition module 510 and a determination module 520, wherein:
  • the acquisition module 510 is configured to receive the input deployment requirements of the big data cluster.
  • the deployment requirements include deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms in which the big data cluster is deployed, and parameter information of the hosts on which the big data cluster is deployed.
  • the acquisition module 510 can specifically be used to implement the acquisition function of step 301 and to execute the implicit steps included in step 301;
  • the determining module 520 is configured to determine a deployment plan for the component to be deployed based on the deployment requirement and the category of the component to be deployed, and to output the deployment plan.
  • the deployment plan includes the computer room where the component to be deployed is deployed and the host on which it is deployed in that computer room.
  • the determining module 520 can specifically be used to implement the functions of steps 302 and 303 and to execute the implicit steps included in steps 302 and 303.
  • the determining module 520 is used to:
  • the first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component of the first type of component among the components to be deployed in the same computer room.
  • the determining module 520 is used to:
  • based on the computer room capacity information, the host on which the component to be deployed is deployed, and the first deployment strategy, the computer room to which that host belongs is determined.
  • the determining module 520 is also used to:
  • the components to be deployed except for the first type of components are deployed in the same computer room.
  • the determining module 520 is used to:
  • the second deployment strategy is a strategy for deploying the first type of components in the same computer room.
  • the plurality of computer rooms include the first computer room and the second computer room;
  • the determination module 520 is used for:
  • the computer room to which the host of the first type of components belongs is the first computer room
  • the computer room in which the second type of components (the components to be deployed other than the first type of components) are deployed is determined to be the second computer room.
  • the number of hosts accommodated in the first computer room is greater than or equal to the number of hosts.
  • the first type of components includes components based on HDFS and YARN;
  • components other than the first type of components are components that are not based on HDFS and YARN.
  • the determining module 520 is also used to:
  • the bandwidth requirement between the first computer room and the second computer room is determined.
  • the determining module 520 is also used to:
  • bandwidth demand information between the first computer room and the second computer room is determined.
  • both the acquisition module 510 and the determination module 520 can be implemented by software, or can be implemented by hardware.
  • the implementation of the determination module 520 is introduced.
  • the implementation of the acquisition module 510 can refer to the implementation of the determination module 520 .
  • the determination module 520 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, or a container. Furthermore, the above computing instance may be one or more.
  • the determination module 520 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Furthermore, they can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. A region usually includes multiple AZs.
  • the multiple hosts/VMs/containers used to run the code can be distributed in the same virtual private cloud (VPC), or across multiple VPCs.
  • communication between two VPCs in the same region, and cross-region communication between VPCs in different regions, requires a communication gateway in each VPC, and the interconnection between VPCs is realized through the communication gateways.
  • the determination module 520 may include at least one computing device, such as a server.
  • the determination module 520 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • Multiple computing devices included in the determination module 520 may be distributed in the same region or in different regions.
  • the multiple computing devices included in the determination module 520 may be distributed in the same AZ or in different AZs.
  • multiple computing devices included in the determination module 520 may be distributed in the same VPC or in multiple VPCs.
  • the plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the acquisition module 510 can be used to perform any step in the method for determining the big data cluster deployment plan.
  • the determination module 520 may be used to perform any step in the method for determining the big data cluster deployment plan.
  • the steps that the acquisition module 510 and the determination module 520 are responsible for can be specified as needed.
  • the acquisition module 510 and the determination module 520 respectively implement different steps in the method for determining the big data cluster deployment plan, and together realize the full function of the device for determining the big data cluster deployment plan.
  • the following describes the computing device 200 provided by the embodiment of the present application.
  • computing device 200 includes: bus 1102, processor 1104, memory 1106, and communication interface 1108.
  • the processor 1104, the memory 1106 and the communication interface 1108 communicate through a bus 1102.
  • Computing device 200 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 200.
  • the bus 1102 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus and control bus. For ease of presentation, only one line is used in Figure 6, but it does not mean that there is only one bus or one type of bus.
  • Bus 1102 may include a path that carries information between various components of computing device 200 (eg, memory 1106, processor 1104, and communications interface 1108).
  • the processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • Memory 1106 may include volatile memory, such as random access memory (RAM).
  • the memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 1106 stores executable program code, and the processor 1104 executes the executable program code to implement the functions of the acquisition module 510 and the determination module 520 mentioned above, thereby realizing the method for determining the big data cluster deployment plan. That is, the memory 1106 stores instructions for executing the method for determining the big data cluster deployment plan.
  • the communication interface 1108 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 200 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, for example, the computing device may be a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the computing device cluster includes at least one computing device 200 .
  • the memory 1106 of one or more computing devices 200 in the computing device cluster may store the same instructions for executing the method for determining the big data cluster deployment plan.
  • the memory 1106 of one or more computing devices 200 in the computing device cluster may also store part of the instructions for executing the method for determining the big data cluster deployment plan.
  • a combination of one or more computing devices 200 may jointly execute the instructions for performing the method for determining the big data cluster deployment plan.
  • the memory 1106 in different computing devices 200 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the device for determining the big data cluster deployment solution mentioned above. That is, instructions stored in the memory 1106 in different computing devices 200 may implement the functions of one or more of the acquisition module 510 and the determination module 520 .
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network can be a wide area network or a local area network, etc.
  • Figure 8 shows a possible implementation.
  • two computing devices (a first computing device 200A and a second computing device 200B) are connected through a network.
  • each computing device connects to the network through its communication interface.
  • the memory 1106 in the first computing device 200A stores instructions for performing the functions of the determination module 520 .
  • instructions for performing the functions of the acquisition module 510 are stored in the memory 1106 in the second computing device 200B.
  • the connection method between computing devices shown in Figure 8 reflects the following considerations: in the method for determining the big data cluster deployment plan provided by this application, data is transmitted between the acquisition module 510 and the determination module 520, and the determination module 520 occupies a relatively large amount of space, so the functions of the determination module 520 are performed by the first computing device 200A; and because the method may interact with a terminal device, the functions of the acquisition module 510 are performed by the second computing device 200B.
  • the functions of the first computing device 200A shown in FIG. 8 can also be performed by multiple computing devices 200.
  • the functions of the second computing device 200B can also be performed by multiple computing devices 200.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • when the computer program product is run on at least one computing device, the at least one computing device is caused to execute the method for determining a big data cluster deployment plan.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can store or a data storage device such as a data center that contains one or more available media.
  • the available media may be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, digital video discs (DVD)), or semiconductor media (for example, solid state drives), etc.
  • the computer-readable storage medium includes instructions that instruct the computing device to perform a method for determining a big data cluster deployment plan.
  • "first" and "second" are used to distinguish identical or similar items with substantially the same functions. It should be understood that there is no logical or temporal dependency between "first" and "second", and that these terms do not limit number or execution order. It should also be understood that, although the description uses the terms "first", "second", etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first type of component may be referred to as a second type of component, and similarly, a second type of component may be referred to as a first type of component, without departing from the scope of the various examples. Both the first type of component and the second type of component can be components, and in some cases can be separate and distinct components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the technical field of big data, and provides a method and apparatus for determining a big data cluster deployment scheme, a cluster, and a storage medium. The method comprises: receiving an inputted deployment demand of a big data cluster, wherein the deployment demand comprises deployment demand information of a component to be deployed of the big data cluster, machine room capacity information of a plurality of machine rooms, and parameter information of hosts for deploying the big data cluster; on the basis of the deployment demand and the category of said component, determining a deployment scheme of said component, wherein the deployment scheme comprises a machine room for deploying said component and a host deployed in the machine room; and outputting the deployment scheme. By using the solution of the present application, the deployment scheme of said component is determined automatically rather than calculated manually, which improves the efficiency of determining the deployment scheme.

Description

Method, apparatus, cluster, and storage medium for determining a big data cluster deployment scheme
This application claims priority to Chinese patent application No. 202211123966.5, filed on September 15, 2022 and entitled "Method, apparatus, cluster, and storage medium for determining a big data cluster deployment scheme", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of big data technologies, and in particular to a method, apparatus, cluster, and storage medium for determining a big data cluster deployment plan.
Background
With the development of information technology, big data has been widely applied in many fields. Big data is usually processed by a big data cluster, which can be a Hadoop cluster. In some scenarios, after a big data cluster has been deployed for a period of time, the cluster needs to scale out as the amount of data grows. If the original computer room does not have enough space, the big data cluster needs to be deployed in a computer room at another site, and the devices in the multiple computer rooms transmit data over a network. For example, a big data cluster is originally deployed in a computer room at site A. When the cluster is scaled out, it is also deployed in a computer room at site B, and the devices in the computer rooms at site A and site B transmit data over the network.
In the related art, when a big data cluster is deployed, the cluster is usually split manually and at random into multiple parts according to information about the hosts accommodated in the multiple computer rooms, and the parts are deployed to the respective computer rooms. This approach may lead to an unreasonable deployment plan and low efficiency in determining the deployment plan.
Summary
This application provides a method, apparatus, cluster, and storage medium for determining a big data cluster deployment plan, which can improve the efficiency of determining the deployment plan.
In a first aspect, this application provides a method for determining a big data cluster deployment plan. The big data cluster is managed by a big data management platform, the big data cluster is deployed across multiple computer rooms, and each computer room accommodates hosts. The method includes:
receiving input deployment requirements of the big data cluster, where the deployment requirements include deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms, and parameter information of the hosts on which the big data cluster is deployed; determining, based on the deployment requirements and the category of the components to be deployed, a deployment plan for the components to be deployed, where the deployment plan includes the computer room in which each component to be deployed is deployed and the host on which it is deployed in that computer room; and outputting the deployment plan.
In the solution shown in this application, the deployment requirement information of the components to be deployed in the big data cluster, the computer room capacity information of the multiple computer rooms, and the parameter information of the hosts on which the big data cluster is deployed are all taken into account, and a computing device determines the deployment plan based on this information. This improves both the rationality of the deployment plan and the efficiency of determining it.
在一种示例中,该基于该部署需求和该待部署组件的类别,确定该待部署组件的部署方案,包括:基于该部署需求和第一部署策略,确定该待部署组件的部署方案,该第一部署策略为将该待部署组件中第一类组件中同一个组件的数据存储部分和计算部分部署在相同机房的策略。In one example, determining the deployment plan of the component to be deployed based on the deployment requirement and the category of the component to be deployed includes: determining the deployment plan of the component to be deployed based on the deployment requirement and the first deployment strategy, the The first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component of the first type of component in the component to be deployed in the same computer room.
In the solution shown in this application, the data storage part and the computing part of the same first-type component are deployed in the same computer room, which reduces cross-room data transmission between the data storage parts and the computing parts of the first-type components and thereby saves bandwidth between computer rooms.
In one example, determining the deployment scheme for the components to be deployed based on the deployment requirement and the first deployment strategy includes: determining hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which those components are deployed; and determining, based on the computer room capacity information, the hosts on which the components are deployed, and the first deployment strategy, the computer room to which each of those hosts belongs.
In one example, before determining the deployment scheme for the components to be deployed based on the deployment requirement and the first deployment strategy, the method further includes: determining the number of hosts required by the first-type components based on the deployment requirement information of the first-type components and the parameter information; and determining, based on the computer room capacity information, that this number of hosts is greater than the number of hosts that each of the multiple computer rooms can accommodate.
In the solution shown in this application, when no single computer room can hold all of the first-type components, deploying the data storage part and the computing part of the same first-type component in the same computer room reduces cross-room data transmission between the data storage parts and the computing parts of the first-type components, thereby saving bandwidth between computer rooms.
In one example, the components to be deployed other than the first-type components are deployed in the same computer room.
In the solution shown in this application, the components other than the first-type components are deployed in the same computer room, which reduces cross-room data transmission among these components and thereby saves bandwidth between computer rooms.
In one example, determining the deployment scheme for the components to be deployed based on the deployment requirement and the categories of the components to be deployed includes: determining the number of hosts required by the first-type components based on the deployment requirement information of the first-type components among the components to be deployed; and, when the multiple computer rooms include a computer room that can accommodate a number of hosts greater than or equal to the required number of hosts, determining the deployment scheme based on the deployment requirement and a second deployment strategy, where the second deployment strategy is to deploy the first-type components in the same computer room.
In the solution shown in this application, when one of the multiple computer rooms is large enough to hold the first-type components, the first-type components are deployed in that single computer room, which reduces cross-room data transmission among the first-type components and thereby saves bandwidth between computer rooms.
In one example, the multiple computer rooms include a first computer room and a second computer room, and determining the deployment scheme for the components to be deployed based on the deployment requirement and the second deployment strategy includes: determining hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which those components are deployed; determining that the computer room to which the hosts of the first-type components belong is the first computer room; and determining that the computer room in which the second-type components (the components to be deployed other than the first-type components) are deployed is the second computer room, where the number of hosts that the first computer room can accommodate is greater than or equal to the required number of hosts.
In the solution shown in this application, the first-type components are deployed in one computer room and the second-type components are deployed in another computer room, which reduces cross-room data transmission among the first-type components and among the second-type components, thereby saving bandwidth between computer rooms.
In one example, the first-type components include components based on the Hadoop distributed file system (HDFS) and yet another resource negotiator (YARN), and the components to be deployed other than the first-type components are components that are not based on HDFS and YARN.
In one example, the method further includes: for a first computer room and a second computer room among the multiple computer rooms, determining a first data transmission amount of the first-type components between the first computer room and the second computer room; determining a second data transmission amount between the first-type components and the second-type components across the first computer room and the second computer room; determining a management plane data amount and a control plane data amount between the first computer room and the second computer room; and determining the bandwidth requirement between the first computer room and the second computer room based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount.
In the solution shown in this application, the bandwidth requirement between computer rooms can be determined according to a given strategy, providing the party deploying the big data cluster with a reference value for the bandwidth requirement.
In one example, the method further includes: determining a data transmission amount between the first-type components and the second-type components; determining a management plane data amount and a control plane data amount between the first computer room and the second computer room; and determining bandwidth requirement information between the first computer room and the second computer room based on the data transmission amount, the management plane data amount, and the control plane data amount.
In the solution shown in this application, the bandwidth requirement between computer rooms can be determined according to a given strategy, providing the party deploying the big data cluster with a reference value for the bandwidth requirement.
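The two bandwidth examples above can be illustrated with a short sketch. The text only says the bandwidth requirement is determined "based on" the listed amounts, so the plain summation used here is an assumption, as are the function name `bandwidth_requirement`, the unit, and all numeric values.

```python
def bandwidth_requirement(second_transfer, management_plane, control_plane,
                          first_transfer=0.0):
    """Estimate the bandwidth needed between two computer rooms.

    All arguments share one unit (for example Gbit/s). ASSUMPTION: the
    contributions are simply added; the source does not give a formula.
    """
    return first_transfer + second_transfer + management_plane + control_plane


# First-type components span both rooms: the first transmission amount counts.
split_case = bandwidth_requirement(second_transfer=2.0, management_plane=0.5,
                                   control_plane=0.5, first_transfer=4.0)

# First-type components sit in one room: only the traffic between first-type
# and second-type components, plus management and control traffic, crosses rooms.
single_room_case = bandwidth_requirement(second_transfer=2.0,
                                         management_plane=0.5,
                                         control_plane=0.5)
```

Making `first_transfer` optional lets one function cover both examples: the four-term case when first-type components are split across rooms, and the three-term case when they are not.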
In one example, the deployment requirement information of each component includes one or more of operating system requirement information, data volume, or throughput.
In one example, the parameter information includes one or more of operating system information, network information, or hardware information of each host model, together with the number of hosts of each model.
In a second aspect, this application provides an apparatus for determining a big data cluster deployment scheme. The apparatus includes at least one module, and the at least one module is configured to implement the method for determining a big data cluster deployment scheme provided by the first aspect or any example of the first aspect.
In some embodiments, the modules in the apparatus for determining a big data cluster deployment scheme are implemented in software, in which case they are program modules. In other embodiments, the modules in the apparatus are implemented in hardware or firmware.
In a third aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, and each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method for determining a big data cluster deployment scheme provided by the first aspect or any example of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium includes computer program instructions, and when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method for determining a big data cluster deployment scheme provided by the first aspect or any example of the first aspect.
In a fifth aspect, this application provides a computer program product containing instructions. When the instructions are run by a computing device cluster, the computing device cluster performs the method for determining a big data cluster deployment scheme provided by the first aspect or any example of the first aspect.
Brief Description of the Drawings
Figure 1 is a schematic diagram of off-site expansion of a big data cluster according to an exemplary embodiment of this application;
Figure 2 is a schematic diagram of a system architecture according to an exemplary embodiment of this application;
Figure 3 is a schematic flowchart of a method for determining a big data cluster deployment scheme according to an exemplary embodiment of this application;
Figure 4 is a schematic flowchart of a method for determining a big data cluster deployment scheme according to an exemplary embodiment of this application;
Figure 5 is a schematic structural diagram of an apparatus for determining a big data cluster deployment scheme according to an exemplary embodiment of this application;
Figure 6 is a schematic structural diagram of a computing device according to an exemplary embodiment of this application;
Figure 7 is a schematic structural diagram of a computing device cluster according to an exemplary embodiment of this application;
Figure 8 is a schematic connection diagram of computing devices according to an exemplary embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
Some terms and concepts involved in the embodiments of this application are explained below.
1. YARN is Hadoop's resource manager, a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications. The name stands for "yet another resource negotiator".
2. HDFS is Hadoop's distributed file system, which provides highly available access to application data.
3. Hadoop database (HBase) is a column-oriented, non-relational (not only structured query language, NoSQL) database built on HDFS, used to read and write large amounts of data quickly.
4. Hive is a data warehouse infrastructure built on Hadoop. Hive provides a set of tools for extract-transform-load (ETL) processing of data.
The application scenarios of this application are described below.
Application scenario 1: the embodiments of this application apply to deploying a big data cluster across multiple computer rooms. For example, a big data cluster is often large, while the floor space and capacity of a single computer room are limited, so the cluster is often deployed across multiple computer rooms.
Application scenario 2: the embodiments of this application apply to off-site expansion. For example, after a big data cluster has been deployed in a computer room for some time, the cluster needs to grow as the data volume increases. In many cases, because the original computer room lacks floor space, the cluster has to add hosts in a computer room at another site. For example, referring to Figure 1, when the original first computer room lacks floor space, a second computer room is built at another location, and the big data cluster is deployed in both the first computer room and the second computer room.
In application scenario 1 and application scenario 2, the big data cluster supports cross-room deployment and cross-room expansion. To upper-layer application systems and other service systems, the big data cluster is a single complete cluster, and they are unaware of its physical deployment form. A newly deployed big data cluster, or a big data cluster after expansion, is connected to the management platform of the big data cluster and is managed uniformly by that platform.
It should be noted that although application scenario 2 concerns off-site expansion, it is in essence still a cross-room deployment of a big data cluster.
The system architecture of the embodiments of this application is described below.
An embodiment of this application provides a system architecture 100. As shown in Figure 2, the system architecture 100 includes a first device 101 and a second device 102. Each of the first device 101 and the second device 102 may be a computing device such as a terminal or a server (see the computing device 200 described later), and the two devices are connected through a wired or wireless network. The first device 101 is configured to determine the deployment scheme, which includes the computer rooms in which the components of the big data cluster are deployed and the hosts on which they are deployed in those computer rooms. The second device 102 is configured to deploy the components of the big data cluster to the corresponding hosts according to the deployment scheme. Optionally, the first device and/or the second device may also be a server or a virtual machine in a data center of a cloud computing platform, so as to provide users with a cloud service for determining a deployment scheme.
In Figure 2, the second device 102 is responsible for deploying the components. In another implementation, the system architecture 100 does not include the second device 102; after determining the deployment scheme, the first device 101 itself deploys the components of the big data cluster to the corresponding hosts according to the deployment scheme.
The procedure of the method for determining a big data cluster deployment scheme in the embodiments of this application is described below.
Figure 3 shows the procedure of the method for determining a big data cluster deployment scheme; see step 301 to step 303. In Figure 3, the solution is described using the example in which the first device 101 determines the deployment scheme.
Step 301: Receive an input deployment requirement for the big data cluster, where the deployment requirement includes deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms in which the big data cluster is deployed, and parameter information of the hosts on which the big data cluster is deployed.
Here, the big data cluster is a Hadoop big data cluster, and the components to be deployed include components in the big data cluster that are based on HDFS and YARN as well as components that are not based on HDFS and YARN. For example, components based on HDFS and YARN include the HDFS component, the YARN component, the HBase component, the Hive component, the Spark component, and the Flink component. Components not based on HDFS and YARN include the Kafka component (a high-throughput distributed publish-subscribe messaging system), the Elastic Search component, the Redis (remote dictionary server) component, and the Flume component (a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of logs). The deployment requirement includes the deployment requirement information of the components to be deployed, the capacity information of the multiple computer rooms in which the big data cluster is deployed, and the parameter information of the hosts on which the big data cluster is deployed.
In this embodiment, when a big data cluster is to be deployed, the user is provided with an input interface for the deployment requirement and enters the deployment requirement of the big data cluster through it. The first device can then obtain the deployment requirement entered by the user. The input interface may be provided as a graphical interface, a command line, an application programming interface (API) on a cloud computing platform, or the like.
Alternatively, when a big data cluster is to be deployed, the user triggers a terminal device to send a deployment scheme determination request to the first device. The first device receives the determination request, which includes the deployment requirement of the big data cluster, and obtains the deployment requirement from it.
In one example, for any component, the deployment requirement information of the component includes one or more of operating system requirement information, data volume, or throughput. The operating system requirement information indicates the operating system of the host on which the component is to be deployed, the data volume indicates the amount of data the component has to process, and the throughput indicates the amount of data the component transmits per unit of time.
In one example, the parameter information of the hosts includes one or more of operating system information, network information, or hardware information of each host model, together with the number of hosts of each model. The operating system information indicates the operating system of the host, the network information indicates items such as the bandwidth of the host, and the hardware information indicates items such as the central processing unit (CPU) model and storage resources of the host.
In one example, the multiple computer rooms may be located in the same city or in different cities. If hosts are placed in a computer room without racks, the computer room capacity information of that computer room includes the number of hosts it accommodates. If hosts are placed in a computer room in racks, the computer room capacity information includes the number of racks it accommodates and the number of hosts placed in each rack. If some hosts are placed directly in the computer room and others are placed in racks, the computer room capacity information includes the number of racks, the number of hosts each rack accommodates, and the number of hosts accommodated individually.
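The three forms of computer room capacity information described above all reduce to a single host count. The following minimal sketch shows one way to normalize them; the `RoomCapacity` class, its field names, and the sample figures are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RoomCapacity:
    """Capacity information for one computer room (illustrative model only)."""
    racks: int = 0             # number of racks the room accommodates
    hosts_per_rack: int = 0    # number of hosts placed in each rack
    standalone_hosts: int = 0  # hosts placed directly in the room, without racks

    def total_hosts(self) -> int:
        # Rack-mounted hosts plus individually placed hosts.
        return self.racks * self.hosts_per_rack + self.standalone_hosts


no_racks = RoomCapacity(standalone_hosts=20)
racks_only = RoomCapacity(racks=4, hosts_per_rack=10)
mixed = RoomCapacity(racks=2, hosts_per_rack=8, standalone_hosts=3)
```

With a common `total_hosts()` value, later steps can compare any room against the number of hosts a group of components requires without caring how the room is laid out.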
Step 302: Determine a deployment scheme for the components to be deployed based on the deployment requirement and the categories of the components to be deployed, where the deployment scheme includes the computer room in which each component is deployed and the hosts on which it is deployed in that computer room.
Here, the category of a component to be deployed is used to distinguish the components to be deployed from one another.
In this embodiment, the first device uses the deployment requirement and the categories of the components to be deployed to determine the computer room in which each component is to be deployed and the hosts on which it is to be deployed in that computer room, that is, to obtain the deployment scheme for the components to be deployed. Here, the host to which a component is to be deployed may be specified either by host model or by host identifier. A computer room may contain multiple hosts of the same model, but only one host with a given identifier.
In one example, the components to be deployed can be divided into first-type components and second-type components. Optionally, the first-type components are components based on HDFS and YARN, and the second-type components are components not based on HDFS and YARN.
To reduce the amount of data that the same first-type component transmits between different computer rooms, the data storage part and the computing part of the same first-type component can be deployed in the same computer room. The processing of step 302 can be as follows:
The first device obtains the stored first deployment strategy, which is to deploy the data storage part and the computing part of the same first-type component in the same computer room, and determines which components belong to the first type. Based on the deployment requirement of the big data cluster and the first deployment strategy, the first device deploys the data storage part and the computing part of the same first-type component in the same computer room. For example, the data storage part of the HBase component is deployed on host 1, host 2, host 3, and host 4, the corresponding computing part of the HBase component is also deployed on host 1, host 2, host 3, and host 4, and these four hosts are deployed in computer room A. The data storage part of the Hive component is deployed on host 5, host 6, host 7, and host 8, the corresponding computing part of the Hive component is also deployed on host 5, host 6, host 7, and host 8, and these four hosts are deployed in computer room B.
Optionally, when components are deployed using the first deployment strategy, the processing is as follows:
The first device determines the hosts that satisfy the deployment requirement information of the first-type components. For example, for component A among the first-type components, the deployment requirement information is a Windows operating system, a data volume of 500M, and a throughput of 1G. A host that satisfies the deployment requirement information of component A runs the Windows operating system, can process a data volume of 500M for component A, and can transmit a data volume of 1G for component A per unit of time.
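The host check in the component A example can be sketched as a simple predicate. This is a minimal sketch, not the application's implementation: the `Requirement` and `Host` types, their field names, the host list, and the reading of "500M" and "1G" as 500 MB and 1024 MB are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    os: str              # required operating system
    data_volume_mb: int  # data the component must process (assumed unit: MB)
    throughput_mb: int   # data transmitted per unit of time (assumed unit: MB)


@dataclass(frozen=True)
class Host:
    host_id: str
    os: str
    data_capacity_mb: int  # data volume the host can process for a component
    throughput_mb: int     # data the host can transmit per unit of time


def satisfies(host: Host, req: Requirement) -> bool:
    """True if the host meets every item of the deployment requirement."""
    return (host.os == req.os
            and host.data_capacity_mb >= req.data_volume_mb
            and host.throughput_mb >= req.throughput_mb)


# Component A from the example: Windows, 500M data volume, 1G throughput.
component_a = Requirement(os="windows", data_volume_mb=500, throughput_mb=1024)

hosts = [
    Host("h1", "windows", 800, 2048),  # meets all three conditions
    Host("h2", "linux", 800, 2048),    # wrong operating system
    Host("h3", "windows", 400, 2048),  # cannot process 500M of data
]
matching = [h.host_id for h in hosts if satisfies(h, component_a)]
```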
The first device determines the hosts that satisfy the deployment requirement information as the hosts on which the components to be deployed are deployed. Then, for each first-type component, the first device uses the computer room capacity information and the hosts on which the components are deployed to place the hosts of that component's data storage part and computing part in the same computer room, so that no cross-room data transmission is needed between the data storage part and the computing part of the same component. For the second-type components, that is, the components to be deployed other than the first-type components, the first device places the hosts to which they are deployed in the same computer room, which reduces the amount of cross-room data transmission of the second-type components. This takes into account that the second-type components are relatively few and can fit in a single computer room.
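The room assignment described above can be sketched as follows. This is a minimal sketch under stated assumptions: each component is represented only by the number of hosts it needs (covering both its data storage part and its computing part), each room by its free host slots, and the first-fit choice of room is an illustrative heuristic that the source does not prescribe. The function name `place_components` and all figures are hypothetical.

```python
def place_components(first_type, second_type, room_free):
    """Assign components to computer rooms.

    first_type / second_type: {component name: hosts needed}
    room_free: {room name: free host slots}
    Returns {component name: room name}.
    """
    free = dict(room_free)
    plan = {}
    # Each first-type component goes to a single room, so its data storage
    # part and computing part never exchange data across rooms.
    for name, need in sorted(first_type.items(), key=lambda kv: -kv[1]):
        room = next((r for r, f in free.items() if f >= need), None)
        if room is None:
            raise ValueError(f"no single room can hold component {name}")
        free[room] -= need
        plan[name] = room
    # All second-type components share one room.
    second_need = sum(second_type.values())
    if second_need:
        room = next((r for r, f in free.items() if f >= second_need), None)
        if room is None:
            raise ValueError("no single room can hold the second-type components")
        for name in second_type:
            plan[name] = room
    return plan


plan = place_components(
    first_type={"HBase": 4, "Hive": 4},
    second_type={"Kafka": 2, "Redis": 1},
    room_free={"A": 5, "B": 8},
)
```

With these sample figures, HBase lands in room A and Hive in room B, matching the earlier HBase and Hive example, and the two second-type components share room B.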
Optionally, resource pools are partitioned for the different components among the first-type components, with different components corresponding to different resource pools. Host labels are configured for the data storage part of each component and for the computing part of each component. The host labels of the data storage part correspond to those of the computing part, and the hosts corresponding to the host labels of the data storage part form the resource pool of that component. For example, the host labels configured for the data storage part of the HBase component among the first-type components are label 1, label 2, label 3, and label 4, and the hosts corresponding to these labels are deployed in the first computer room. The host labels configured for the computing part of the HBase component are also label 1, label 2, label 3, and label 4, so that the computing part is scheduled to the hosts corresponding to these labels. When computing tasks are executed, YARN dynamically associates the computing tasks in a computing task queue with the resource pool that has the corresponding labels, according to the resource needs of the queue.
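The label-based resource pool can be illustrated with plain dictionaries. The sketch below does not use any real YARN or HDFS API; the mappings `host_label` and `component_labels` and the helper functions are hypothetical and only show how matching storage and compute labels pin a component to one set of hosts.

```python
# Hypothetical mapping from host to its label; host5 belongs to another pool.
host_label = {
    "host1": "label1", "host2": "label2", "host3": "label3",
    "host4": "label4", "host5": "label5",
}

# Labels configured for each component's storage part and computing part.
component_labels = {
    "HBase": {
        "storage": {"label1", "label2", "label3", "label4"},
        "compute": {"label1", "label2", "label3", "label4"},
    },
}


def hosts_with_labels(labels):
    """Hosts whose label is in the given label set, in a stable order."""
    return sorted(h for h, label in host_label.items() if label in labels)


def resource_pool(component):
    # The pool is formed by the hosts matching the storage-part labels.
    return hosts_with_labels(component_labels[component]["storage"])


def schedulable_hosts(component):
    # Computing tasks may only be scheduled to hosts with the compute labels.
    return hosts_with_labels(component_labels[component]["compute"])


hbase_pool = resource_pool("HBase")
hbase_compute = schedulable_hosts("HBase")
```

Because both label sets are identical, computing tasks for HBase can only be scheduled onto hosts that also hold its data, which is what keeps the traffic inside one computer room.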
In another example, when the data storage part and the computing part of the same first-type component are to be deployed in the same computer room, the processing of step 302 can be as follows:
A deployment scheme generation program is developed in advance. Its input is the deployment requirement of the big data cluster and the categories of the components to be deployed, and its output is the deployment scheme of the big data cluster. The program is configured with the goal of deploying the data storage part and the computing part of the same first-type component in the same computer room.
第一设备将大数据集群的部署需求和待部署组件的类别输入到该部署方案生成软件,该部署方案生成软件输出部署方案,该部署方案即为大数据集群的部署方案。The first device inputs the deployment requirements of the big data cluster and the categories of components to be deployed into the deployment plan generation software, and the deployment plan generation software outputs a deployment plan, and the deployment plan is the deployment plan of the big data cluster.
In another example, the components to be deployed include first-type components and second-type components. The first-type components are components based on HDFS and YARN, and the second-type components are components not based on HDFS and YARN; the second-type components can be regarded as the components to be deployed other than the first-type components. To reduce the amount of data that the different first-type components transmit between computer rooms, deploying the first-type components on hosts in the same computer room is considered. The processing is as follows:
Referring to Figure 4, in step 401, the first device uses the deployment requirement information of each component included in the first-type components and the parameter information of the hosts to determine the number of hosts required by each component, and adds these numbers to obtain the number of hosts required by the first-type components. The first device uses the computer room capacity information of each of the multiple computer rooms to determine the number of hosts that each computer room accommodates. For example, the parameter information of the hosts is: 10 hosts run Windows, with CPU hardware of 2*32 cores and memory of 4*32G; 12 hosts run Linux, with CPU hardware of 2*32 cores and memory of 8*32G. The number of hosts required by the first-type components is 13, and the computer room capacity information of the first computer room indicates that it accommodates 20 hosts.
Step 402: the first device compares the number of hosts required by the first-type components with the number of hosts accommodated by each computer room. If at least one of the multiple computer rooms accommodates a number of hosts greater than or equal to the required number of hosts, the first device determines that all first-type components can be deployed in one of the multiple computer rooms. For example, the number of hosts accommodated by the first computer room is greater than or equal to the required number of hosts.
Step 403: the first device obtains the stored second deployment strategy, which is a strategy of deploying the first-type components in the same computer room. Based on the deployment requirements of the big data cluster and the second deployment strategy, the first device determines the deployment plan of the big data cluster, that is, the first-type components are deployed in the same computer room. The second-type components are deployed in a computer room other than the one in which the first-type components are deployed, or are deployed partly in the computer room of the first-type components and partly in another computer room.
Step 404: if none of the multiple computer rooms accommodates a number of hosts greater than or equal to the required number of hosts, the first deployment strategy described above may be used to deploy the components to be deployed. For details, refer to the foregoing description, which is not repeated here.
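Steps 401 to 404 amount to a capacity check followed by a strategy choice. The following is a minimal sketch under stated assumptions: the function name `pick_strategy`, the dictionary inputs, and the per-component host counts are hypothetical, and the per-component host counts are taken as already derived from the deployment requirement information and host parameters.

```python
from typing import Dict, Optional, Tuple


def pick_strategy(hosts_per_component: Dict[str, int],
                  room_capacity: Dict[str, int]) -> Tuple[str, Optional[str]]:
    """Choose a deployment strategy for the first-type components.

    Returns ("second", room) when one computer room can hold all first-type
    components, otherwise ("first", None).
    """
    # Step 401: sum the hosts required by each first-type component.
    required = sum(hosts_per_component.values())
    # Step 402: look for a room whose capacity is >= the required host count.
    for room, capacity in room_capacity.items():
        if capacity >= required:
            # Step 403: the second strategy places all first-type components here.
            return "second", room
    # Step 404: no single room is large enough, fall back to the first strategy.
    return "first", None


# Assumed split of the 13 hosts from the example above across three components.
example = pick_strategy({"HDFS": 5, "HBase": 4, "Yarn": 4},
                        {"room1": 20, "room2": 10})
```

With the figures from the example (13 required hosts, a first computer room holding 20), the sketch selects the second deployment strategy and the first computer room.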
Optionally, when components are deployed using the second deployment strategy, the processing is as follows:
When the multiple computer rooms include a first computer room and a second computer room, the first device determines the hosts that satisfy the deployment requirement information of the components to be deployed, and determines those hosts as the hosts on which the components to be deployed are deployed.
When the number of hosts accommodated by the first computer room is greater than the required number of hosts, the first device determines that the hosts on which the first-type components are deployed belong to the first computer room, and that the hosts on which the second-type components are deployed belong to the second computer room. This takes into account that there are relatively few second-type components, so they can be deployed in a single computer room.
For example, the big data cluster is a Hadoop cluster. The first-type components include the HDFS component, the HBase component, the Yarn component, the Spark component, the Spark2X component, the Hive component, the MapReduce component, the Storm component (a distributed real-time computing system), the ZooKeeper component (a distributed, open-source coordination service for distributed applications), the database service (DBService) component, the Kerberos network authentication protocol server (KrbServer) component, the Hadoop user experience (Hue) component, and the lightweight directory access protocol server (LdapServer) component. The second-type components include the Elasticsearch component, the enterprise search application server (Solr) component, the Redis component, the graph database (GraphBase) component, the Kafka component, the Loader component, the file transfer protocol server (FTP-Server) component, and the Oozie component (a task scheduling framework). Table 1 provides the deployment plan of the components in this Hadoop cluster.
Table 1
In Table 1, the host type indicates the type of host on which a component is deployed. Table 1 shows three different host types, denoted type 1, type 2, and type 3. Details of the hosts of the three host types are given in Table 2.
Table 2
With this deployment plan, when the first computer room can accommodate all first-type components, deploying the first-type components in the first computer room reduces the amount of data that the first-type components transmit between computer rooms.
It should be noted that, in a big data capacity expansion scenario, if the hosts in an existing computer room can no longer be moved, the number of hosts that the existing computer room can accommodate can be obtained directly. In Table 2, the management node centrally manages the components deployed in the big data cluster, and the control node is the node that performs resource scheduling and task allocation.
Step 303: output the deployment plan.
In this embodiment, after determining the deployment plan of the components to be deployed, the first device may output the deployment plan to a second device, and the second device may deploy the components to be deployed onto hosts in the multiple computer rooms based on the deployment plan. For example, after determining the deployment plan, the first device generates a deployment task list based on it, and the deployment task list may be output to the second device as a spreadsheet exported offline (for example, an Excel sheet). The big data cluster installation software on the second device may use the deployment task list to deploy the components to be deployed onto hosts in the multiple computer rooms.
Alternatively, after determining the deployment plan of the components to be deployed, the first device may send the deployment plan to the device that sent the determination request.
Alternatively, after determining the deployment plan of the components to be deployed, the first device may display the deployment plan.
Alternatively, after the first device determines the deployment plan of the components to be deployed, a dedicated apparatus may be used to access a host and deploy on that host the components assigned to it.
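The offline export of the deployment task list can be illustrated as follows. This is only a sketch: the application mentions a spreadsheet such as an Excel sheet but does not define its columns, so the CSV format, the column names `component`, `computer_room`, and `host`, and the function `export_task_list` are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout; the application does not specify one.
FIELDS = ["component", "computer_room", "host"]


def export_task_list(plan: List[Dict[str, str]]) -> str:
    """Serialize a deployment plan to CSV text that a spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for task in plan:
        writer.writerow(task)
    return buffer.getvalue()


task_list = export_task_list([
    {"component": "HBase", "computer_room": "room1", "host": "host-01"},
    {"component": "Kafka", "computer_room": "room2", "host": "host-21"},
])
```

Installation software on the second device could then read the rows one by one and deploy each component to the listed host.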
With the process shown in Figure 3, the deployment plan of the components to be deployed is determined automatically based on the deployment requirements of the big data cluster and the categories of the components to be deployed, rather than being calculated manually. This improves the efficiency of determining the deployment plan and, in turn, the efficiency of deploying the big data cluster.
In one example, when the first deployment strategy is used to determine the deployment plan, the first device may also determine the bandwidth requirement between any two of the multiple computer rooms. Assuming that the second-type components are deployed in the same computer room, the bandwidth requirement is determined as follows:
For a first computer room and a second computer room among the multiple computer rooms, the first device determines a first data transmission amount of the first-type components between the first computer room and the second computer room, determines a second data transmission amount between the first-type components and the second-type components across the first computer room and the second computer room, and determines the management plane data amount and the control plane data amount between the first computer room and the second computer room. Based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount, the first device determines the bandwidth requirement between the first computer room and the second computer room.
In this embodiment, the determination of the bandwidth requirement between two computer rooms is described using the first computer room and the second computer room among the multiple computer rooms as an example. In a big data cluster, the management node and the control node may be deployed in the same computer room or in different computer rooms.
When the management node and the control node are both deployed in the first computer room or both in the second computer room, or when they are deployed in the first computer room and the second computer room respectively, the data transmitted between the first computer room and the second computer room consists of three parts. The first part is the management plane data amount and the control plane data amount between the two computer rooms. The second part is the first data transmission amount of the first-type components between the two computer rooms. The third part is the second data transmission amount between the first-type components and the second-type components across the two computer rooms. The minimum bandwidth required between the first computer room and the second computer room is the sum of the management plane data amount, the control plane data amount, the first data transmission amount, and the second data transmission amount. To ensure that the bandwidth between the two computer rooms is sufficient for the big data cluster, the bandwidth requirement between them is usually greater than this minimum bandwidth.
When neither a management node nor a control node is deployed in the first computer room or the second computer room, there is no management plane data amount or control plane data amount between the two computer rooms. When a management node is deployed in the first or second computer room and no control node is deployed in either, there is no control plane data amount between the two computer rooms. When a control node is deployed in the first or second computer room and no management node is deployed in either, there is no management plane data amount between the two computer rooms. Here, a data amount that does not exist can be regarded as 0.
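The minimum-bandwidth rule above, including the cases where a plane contributes nothing, can be sketched as a single sum. The function name, parameter names, and the sample figures are hypothetical; all amounts are assumed to be expressed in the same unit (here Mb/s). The optional `third_transfer` term models second-type components spread over different computer rooms and defaults to 0 for the case where they share one room.

```python
def min_bandwidth_mbps(first_transfer: float,
                       second_transfer: float,
                       mgmt_plane: float,
                       ctrl_plane: float,
                       mgmt_node_present: bool = True,
                       ctrl_node_present: bool = True,
                       third_transfer: float = 0.0) -> float:
    """Minimum bandwidth between two computer rooms, in Mb/s.

    mgmt_node_present / ctrl_node_present state whether a management or control
    node is deployed in either of the two rooms; if not, that plane counts as 0.
    """
    mgmt = mgmt_plane if mgmt_node_present else 0.0
    ctrl = ctrl_plane if ctrl_node_present else 0.0
    return first_transfer + second_transfer + third_transfer + mgmt + ctrl


# Assumed sample figures, not taken from the application.
with_both_nodes = min_bandwidth_mbps(2000, 2500, 250, 1000)
without_nodes = min_bandwidth_mbps(2000, 2500, 250, 1000,
                                   mgmt_node_present=False,
                                   ctrl_node_present=False)
```

The bandwidth requirement actually provisioned would normally be chosen above the returned minimum to leave headroom.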
After determining the bandwidth requirement between every two of the multiple computer rooms, the first device outputs the bandwidth requirement to the second device together with the deployment plan of the big data cluster. Alternatively, the first device sends the bandwidth requirement and the deployment plan to the device that sent the determination request, or displays the bandwidth requirement together with the deployment plan.
It should be noted that the bandwidth requirement calculation above is described for the case in which the second-type components are deployed in the same computer room. When the second-type components are deployed in different computer rooms, a third data transmission amount of the second-type components between the different computer rooms also needs to be considered.
In one example, when the second deployment strategy is used to determine the deployment plan, the first device may also determine the bandwidth requirement between the first computer room and the second computer room. The processing is as follows:
The first device determines the data transmission amount between the first-type components and the second-type components, determines the management plane data amount and the control plane data amount between the first computer room and the second computer room, and determines the bandwidth requirement between the first computer room and the second computer room based on the data transmission amount, the management plane data amount, and the control plane data amount.
In this embodiment, for any one of the first-type components, the first device determines the data transmission amount per unit time between that component and each of the second-type components. The first device adds the per-unit-time data transmission amounts of all first-type components to obtain the data transmission amount per unit time between the first-type components and the second-type components.
A management node and a control node are also provided in the big data cluster. They may be deployed in the same computer room or in different computer rooms. The first device determines the control plane data amount and the management plane data amount between the first computer room and the second computer room, and adds the two to obtain the management and control plane data amount between the first computer room and the second computer room.
The first device adds the data transmission amount and the management and control plane data amount, and takes the resulting value as the minimum bandwidth required between the first computer room and the second computer room. To ensure that the bandwidth between the two computer rooms is sufficient for the big data cluster, the bandwidth requirement between them is usually greater than this minimum bandwidth.
For example, assume that the throughput between one first-type component and one second-type component is less than 50Mb/s per node, the management plane data amount is 5Mb/s per node, and the control plane data amount is 1Gb/s.
Referring to Table 3, when the total number of components is 50~100, the number of first-type components is 50~450, and the number of second-type components is within 50, with the control node and the management node deployed in the first computer room, the first-type components deployed in the first computer room, and the second-type components deployed in the second computer room, the cross-room management and control plane data amount is 50*5Mb/s + 1Gb/s and the cross-room data transmission amount between components is 50*50Mb/s. The minimum bandwidth is 3.75GE and the bandwidth requirement is 10GE, where 1GE means 1000Mb/s.
When the total number of components is 500~1000, the number of first-type components is 400~900, and the number of second-type components is within 100, with the same placement of nodes and components, the cross-room management and control plane data amount is 100*5Mb/s + 1Gb/s and the cross-room data transmission amount between components is 100*50Mb/s. The minimum bandwidth is 6.5GE and the bandwidth requirement is 10GE.
When the total number of components is 1000~2000, the number of first-type components is 800~1800, and the number of second-type components is within 200, with the same placement of nodes and components, the cross-room management and control plane data amount is 200*5Mb/s + 1Gb/s and the cross-room data transmission amount between components is 200*50Mb/s. The minimum bandwidth is 12GE and the bandwidth requirement is 20GE.
Table 3
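The minimum-bandwidth figures in this example follow directly from the stated per-component rates. The sketch below reproduces the arithmetic; the function name and parameter names are hypothetical, and the default rates (5Mb/s management plane per component, 1Gb/s control plane, 50Mb/s inter-component throughput, 1GE = 1000Mb/s) are the assumptions given in the example.

```python
def min_bandwidth_ge(second_type_count: int,
                     mgmt_mbps_per_component: float = 5.0,
                     ctrl_mbps: float = 1000.0,
                     transfer_mbps_per_component: float = 50.0) -> float:
    """Minimum cross-room bandwidth in GE for the second deployment strategy."""
    # Management and control plane data crossing the two computer rooms.
    management_control = second_type_count * mgmt_mbps_per_component + ctrl_mbps
    # Data exchanged between first-type and second-type components.
    component_transfer = second_type_count * transfer_mbps_per_component
    # 1GE corresponds to 1000Mb/s.
    return (management_control + component_transfer) / 1000.0


# One row per cluster size in the example: 50, 100, and 200 second-type components.
rows = {count: min_bandwidth_ge(count) for count in (50, 100, 200)}
```

For 50 components the sum is 250 + 1000 + 2500 = 3750Mb/s, that is 3.75GE; the 100-component and 200-component cases give 6.5GE and 12GE in the same way, matching the values stated above.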

After determining the bandwidth requirement between the first computer room and the second computer room, the first device outputs the bandwidth requirement to the second device together with the deployment plan of the big data cluster. Alternatively, the first device sends the bandwidth requirement and the deployment plan to the device that sent the determination request, or displays the bandwidth requirement together with the deployment plan.
In this way, the bandwidth requirement can also be output, providing a reference for configuring the bandwidth between computer rooms.
In the embodiments of this application, a deployment plan for deploying a big data cluster across computer rooms can be output automatically. This effectively reduces the complexity of ad hoc manual cross-room deployment and solves the problems of complex calculation, complex operation, and long deployment time caused by deploying a big data cluster manually and arbitrarily.
Moreover, when a big data cluster is deployed across computer rooms, components are not placed in computer rooms arbitrarily; instead, the categories of the components in the big data cluster are taken into account. In this way, the amount of data transmitted between computer rooms can be minimized without reducing the computing performance of the big data cluster, which lowers the bandwidth requirement between computer rooms and, in turn, the network cost between computer rooms.
The following describes the apparatus for determining a big data cluster deployment plan provided by the embodiments of this application.
Figure 5 is a structural diagram of an apparatus for determining a big data cluster deployment plan according to an embodiment of this application. The apparatus may be implemented, in whole or in part, by software, hardware, or a combination of the two. The apparatus provided by the embodiment of this application can implement the process shown in Figure 3 of the embodiments of this application. The apparatus includes an acquisition module 510 and a determination module 520, where:
the acquisition module 510 is configured to receive the input deployment requirements of the big data cluster, the deployment requirements including the deployment requirement information of the components to be deployed in the big data cluster, the computer room capacity information of the multiple computer rooms in which the big data cluster is deployed, and the parameter information of the hosts on which the big data cluster is deployed; specifically, it may be used to implement the acquisition function of step 301 and to perform the implicit steps included in step 301;
the determination module 520 is configured to determine the deployment plan of the components to be deployed based on the deployment requirements and the categories of the components to be deployed, the deployment plan including the computer rooms in which the components to be deployed are deployed and the hosts on which they are deployed within those computer rooms;
and to output the deployment plan; specifically, it may be used to implement the functions of step 302 and step 303 and to perform the implicit steps included in step 302 and step 303.
In one example, the determination module 520 is configured to:
determine the deployment plan of the components to be deployed based on the deployment requirements and a first deployment strategy;
the first deployment strategy being a strategy of deploying the data storage part and the computing part of the same component among the first-type components of the components to be deployed in the same computer room.
In one example, the determination module 520 is configured to:
determine the hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which the components to be deployed are deployed;
determine, based on the computer room capacity information, the hosts on which the components to be deployed are deployed, and the first deployment strategy, the computer rooms to which those hosts belong.
In one example, the determination module 520 is further configured to:
before determining the deployment plan of the components to be deployed based on the deployment requirements and the first deployment strategy, determine the number of hosts required by the first-type components based on the deployment requirement information of the first-type components and the parameter information;
determine, based on the computer room capacity information, that the number of hosts is greater than the number of hosts accommodated by each of the multiple computer rooms.
In one example, the components to be deployed other than the first-type components are deployed in the same computer room.
In one example, the determination module 520 is configured to:
determine the number of hosts required by the first-type components based on the deployment requirement information of the first-type components among the components to be deployed;
when at least one of the multiple computer rooms accommodates a number of hosts greater than or equal to the number of hosts, determine the deployment plan of the components to be deployed based on the deployment requirements and a second deployment strategy;
the second deployment strategy being a strategy of deploying the first-type components in the same computer room.
In one example, the multiple computer rooms include the first computer room and a second computer room;
the determination module 520 is configured to:
determine the hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which the components to be deployed are deployed;
determine that the computer room to which the hosts of the first-type components belong is the first computer room, and that the computer room in which the second-type components, that is, the components to be deployed other than the first-type components, are deployed is the second computer room, the number of hosts accommodated by the first computer room being greater than or equal to the number of hosts.
In one example, the first-type components include components based on HDFS and YARN;
the components to be deployed other than the first-type components are components not based on the HDFS and the YARN.
In one example, the determination module 520 is further configured to:
for a first computer room and a second computer room among the multiple computer rooms, determine a first data transmission amount of the first-type components between the first computer room and the second computer room;
determine a second data transmission amount between the first-type components and the second-type components across the first computer room and the second computer room;
determine the management plane data amount and the control plane data amount between the first computer room and the second computer room;
determine the bandwidth requirement between the first computer room and the second computer room based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount.
In one example, the determination module 520 is further configured to:
determine the data transmission amount between the first-type components and the second-type components, and determine the management plane data amount and the control plane data amount between the first computer room and the second computer room;
determine bandwidth requirement information between the first computer room and the second computer room based on the data transmission amount, the management plane data amount, and the control plane data amount.
The acquisition module 510 and the determination module 520 may each be implemented in software or in hardware. By way of example, the implementation of the determination module 520 is described next; the acquisition module 510 may be implemented in a similar way.
As an example in which a module is a software functional unit, the determination module 520 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, or a container, and there may be one or more such computing instances. For example, the determination module 520 may include code running on multiple hosts, virtual machines, or containers. The multiple hosts, virtual machines, or containers that run the code may be distributed in the same region or in different regions. They may likewise be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. A region usually includes multiple AZs.
Similarly, the multiple hosts, virtual machines, or containers that run the code may be distributed in the same virtual private cloud (VPC) or across multiple VPCs. A VPC is usually set up within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway needs to be set up in each VPC, and the VPCs are interconnected through these gateways.
As an example in which a module is a hardware functional unit, the determination module 520 may include at least one computing device, such as a server. Alternatively, the determination module 520 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the determination module 520 may be distributed in the same region or in different regions, and in the same AZ or in different AZs. Similarly, they may be distributed in the same VPC or across multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
It should be noted that, in other embodiments, the acquisition module 510 may be used to perform any step of the method for determining a big data cluster deployment plan, and the determination module 520 may likewise be used to perform any step of that method. The steps that the acquisition module 510 and the determination module 520 are each responsible for can be specified as needed; by having the two modules implement different steps of the method, all functions of the apparatus for determining a big data cluster deployment plan are realized.
It should also be noted that the division into modules in the embodiments of this application is schematic and is only a division by logical function; other divisions are possible in an actual implementation.
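As a rough illustration of the logic the determination module might carry, the Python sketch below chooses between the two deployment strategies described in this application. When some computer room can hold all hosts needed by the first-type components, they are placed together in that room and the second-type components go to another room. Otherwise the first-type components are spread over the rooms, with each component kept whole in one room so that its storage and compute parts stay together. The function name, the input shapes, the tightest-fit and first-fit-decreasing heuristics, and the omission of second-type host counts are all assumptions of this sketch.

```python
def determine_plan(first_type, second_type, rooms):
    """Return (strategy, {component: room}).

    first_type: {component name: number of hosts it needs}
    second_type: list of component names (host counts ignored here)
    rooms: {room name: number of hosts the room can accommodate}
    """
    needed = sum(first_type.values())
    free = dict(rooms)  # remaining host slots per room
    plan = {}

    fitting = [room for room, cap in rooms.items() if cap >= needed]
    if fitting:
        # Second strategy: all first-type components share one room.
        strategy = "second"
        first_room = min(fitting, key=lambda r: rooms[r])  # assumed: tightest fit
        for comp in first_type:
            plan[comp] = first_room
        free[first_room] -= needed
        others = [room for room in rooms if room != first_room]
        second_room = max(others, key=lambda r: free[r]) if others else first_room
    else:
        # First strategy: spread components, keeping each one in a single room.
        strategy = "first"
        for comp, n in sorted(first_type.items(), key=lambda kv: -kv[1]):
            room = next((r for r in free if free[r] >= n), None)
            if room is None:
                raise ValueError(f"no room can hold component {comp!r}")
            plan[comp] = room
            free[room] -= n
        # Second-type components are all placed in one room.
        second_room = max(free, key=lambda r: free[r])

    for comp in second_type:
        plan[comp] = second_room
    return strategy, plan
```

A real implementation would also match hosts against operating system, network, and hardware requirements before assigning rooms; that step is left out to keep the sketch short.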
The computing device 200 provided by the embodiments of this application is described below.
An embodiment of this application further provides a computing device 200. As shown in Figure 6, the computing device 200 includes a bus 1102, a processor 1104, a memory 1106, and a communication interface 1108. The processor 1104, the memory 1106, and the communication interface 1108 communicate with one another through the bus 1102. The computing device 200 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 200.
The bus 1102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of illustration, the bus is drawn as a single line in Figure 6, which does not mean that there is only one bus or one type of bus. The bus 1102 may include a path for transferring information between the components of the computing device 200 (for example, the memory 1106, the processor 1104, and the communication interface 1108).
The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), or other processors.
The memory 1106 may include volatile memory, for example, random access memory (RAM). The memory 1106 may also include non-volatile memory, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1106 stores executable program code, and the processor 1104 executes this code to implement the functions of the acquisition module 510 and the determination module 520 described above, thereby implementing the method for determining a big data cluster deployment plan. In other words, the memory 1106 stores instructions for executing the method for determining a big data cluster deployment plan.
The communication interface 1108 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 200 and other devices or a communication network.
The computing device cluster provided by the embodiments of this application is described below.
An embodiment of this application further provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example, a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
As shown in Figure 7, the computing device cluster includes at least one computing device 200. The memories 1106 of one or more computing devices 200 in the cluster may store the same instructions for executing the method for determining a big data cluster deployment plan.
In some possible implementations, the memories 1106 of one or more computing devices 200 in the cluster may instead each store part of the instructions for executing the method for determining a big data cluster deployment plan. In other words, a combination of one or more computing devices 200 may jointly execute the instructions for the method.
It should be noted that the memories 1106 of different computing devices 200 in the cluster may store different instructions, each used to perform part of the functions of the apparatus for determining a big data cluster deployment plan described above. That is, the instructions stored in the memories 1106 of different computing devices 200 may implement the functions of one or more of the acquisition module 510 and the determination module 520.
In some possible implementations, one or more computing devices in the cluster may be connected through a network, which may be a wide area network, a local area network, or the like. Figure 8 shows one possible implementation. As shown in Figure 8, two computing devices (a first computing device 200A and a second computing device 200B) are connected through a network; specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 1106 of the first computing device 200A stores instructions for performing the functions of the determination module 520, and the memory 1106 of the second computing device 200B stores instructions for performing the functions of the acquisition module 510.
The connection arrangement shown in Figure 8 reflects two considerations. First, in the method for determining a big data cluster deployment plan provided by this application, data is transmitted between the acquisition module 510 and the determination module 520, and the determination module 520 occupies a relatively large amount of space, so the functions of the determination module 520 are assigned to the first computing device 200A. Second, the method may interact with terminal devices, so the functions of the acquisition module 510 are assigned to the second computing device 200B.
It should be understood that the functions of the first computing device 200A shown in Figure 8 may also be performed by multiple computing devices 200. Likewise, the functions of the second computing device 200B may also be performed by multiple computing devices 200.
An embodiment of this application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, it causes the at least one computing device to execute the method for determining a big data cluster deployment plan.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can store data on, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive). The computer-readable storage medium includes instructions that instruct a computing device to execute the method for determining a big data cluster deployment plan.
Those of ordinary skill in the art will appreciate that the method steps and units described in connection with the embodiments disclosed in this application can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the steps and components of each embodiment have been described above in general terms according to their functions. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In this application, terms such as "first" and "second" are used to distinguish between identical or similar items whose roles and functions are substantially the same. It should be understood that "first" and "second" imply no logical or temporal dependency and place no limit on quantity or order of execution. It should also be understood that although the description uses the terms "first", "second", and so on to describe various elements, these elements should not be limited by the terms, which serve only to distinguish one element from another. For example, without departing from the scope of the various examples, a first type of component could be called a second type of component, and similarly a second type of component could be called a first type of component. Both the first type of component and the second type of component may be components, and in some cases they may be separate and distinct components.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of protection of the technical solutions of the embodiments of this application.

Claims (24)

  1. A method for determining a big data cluster deployment plan, applied to a computing device, characterized in that the big data cluster is managed by a big data management platform, the big data cluster is deployed across a plurality of computer rooms, each computer room accommodates hosts, and the method comprises:
    receiving input deployment requirements of the big data cluster, the deployment requirements including deployment requirement information of components to be deployed in the big data cluster, computer room capacity information of the plurality of computer rooms, and parameter information of hosts on which the big data cluster is to be deployed;
    determining a deployment plan for the components to be deployed based on the deployment requirements and the categories of the components to be deployed, the deployment plan including the computer room in which each component to be deployed is deployed and the hosts on which it is deployed in that computer room;
    outputting the deployment plan.
  2. The method according to claim 1, characterized in that determining the deployment plan for the components to be deployed based on the deployment requirements and the categories of the components to be deployed comprises:
    determining the deployment plan for the components to be deployed based on the deployment requirements and a first deployment strategy;
    the first deployment strategy being a strategy of deploying the data storage part and the computing part of the same component, among a first type of components of the components to be deployed, in the same computer room.
  3. The method according to claim 2, characterized in that determining the deployment plan for the components to be deployed based on the deployment requirements and the first deployment strategy comprises:
    determining hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which the components to be deployed are deployed;
    determining, based on the computer room capacity information, the hosts on which the components to be deployed are deployed, and the first deployment strategy, the computer rooms to which those hosts belong.
  4. The method according to claim 3, characterized in that, before determining the deployment plan for the components to be deployed based on the deployment requirements and the first deployment strategy, the method further comprises:
    determining the number of hosts required by the first type of components based on the deployment requirement information of the first type of components and the parameter information;
    determining, based on the computer room capacity information, that the number of hosts is greater than the number of hosts accommodated by each of the plurality of computer rooms.
  5. The method according to any one of claims 2 to 4, characterized in that, among the components to be deployed, the components other than the first type of components are deployed in the same computer room.
  6. The method according to claim 1, characterized in that determining the deployment plan for the components to be deployed based on the deployment requirements and the categories of the components to be deployed comprises:
    determining the number of hosts required by a first type of components among the components to be deployed based on the deployment requirement information of the first type of components;
    when the plurality of computer rooms include a computer room that accommodates a number of hosts greater than or equal to the number of hosts required, determining the deployment plan for the components to be deployed based on the deployment requirements and a second deployment strategy;
    the second deployment strategy being a strategy of deploying the first type of components in the same computer room.
  7. The method according to claim 6, characterized in that the plurality of computer rooms include a first computer room and a second computer room;
    determining the deployment plan for the components to be deployed based on the deployment requirements and the second deployment strategy comprises:
    determining hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which the components to be deployed are deployed;
    determining that the computer room to which the hosts of the first type of components belong is the first computer room, and that the computer room in which a second type of components, namely the components to be deployed other than the first type of components, is deployed is the second computer room, the number of hosts accommodated by the first computer room being greater than or equal to the number of hosts required.
  8. The method according to any one of claims 2 to 7, characterized in that the first type of components includes components based on the distributed file system HDFS and the big data resource scheduler YARN;
    among the components to be deployed, the components other than the first type of components are components that are not based on HDFS or YARN.
  9. The method according to claim 5, characterized in that the method further comprises:
    for a first computer room and a second computer room among the plurality of computer rooms, determining a first data transmission amount of the first type of components between the first computer room and the second computer room;
    determining a second data transmission amount exchanged between the first type of components and the second type of components across the first computer room and the second computer room;
    determining a management plane data amount and a control plane data amount between the first computer room and the second computer room;
    determining, based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount, a bandwidth requirement between the first computer room and the second computer room.
  10. The method according to claim 7, characterized in that the method further comprises:
    determining a data transmission amount between the first type of components and the second type of components, and determining a management plane data amount and a control plane data amount between the first computer room and the second computer room;
    determining, based on the data transmission amount, the management plane data amount, and the control plane data amount, bandwidth requirement information between the first computer room and the second computer room.
  11. The method according to any one of claims 1 to 10, characterized in that the deployment requirement information of each component includes one or more of operating system requirement information, data volume, or throughput.
  12. The method according to any one of claims 1 to 11, characterized in that the parameter information includes, for each host model, one or more of operating system information, network information, or hardware information, together with the number of hosts of each model.
  13. An apparatus for determining a big data cluster deployment plan, applied to a computing device, characterized in that the big data cluster is managed by a big data management platform, the big data cluster is deployed across a plurality of computer rooms, each computer room accommodates hosts, and the apparatus comprises:
    an acquisition module, configured to receive input deployment requirements of the big data cluster, the deployment requirements including deployment requirement information of components to be deployed in the big data cluster, computer room capacity information of the plurality of computer rooms, and parameter information of hosts on which the big data cluster is to be deployed;
    a determination module, configured to determine a deployment plan for the components to be deployed based on the deployment requirements and the categories of the components to be deployed, the deployment plan including the computer room in which each component to be deployed is deployed and the hosts on which it is deployed in that computer room;
    and to output the deployment plan.
  14. The apparatus according to claim 13, characterized in that the determination module is configured to:
    determine the deployment plan for the components to be deployed based on the deployment requirements and a first deployment strategy;
    the first deployment strategy being a strategy of deploying the data storage part and the computing part of the same component, among a first type of components of the components to be deployed, in the same computer room.
  15. The apparatus according to claim 14, characterized in that the determination module is configured to:
    determine hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which the components to be deployed are deployed;
    determine, based on the computer room capacity information, the hosts on which the components to be deployed are deployed, and the first deployment strategy, the computer rooms to which those hosts belong.
  16. The apparatus according to claim 15, characterized in that the determination module is further configured to:
    before determining the deployment plan for the components to be deployed based on the deployment requirements and the first deployment strategy, determine the number of hosts required by the first type of components based on the deployment requirement information of the first type of components and the parameter information;
    determine, based on the computer room capacity information, that the number of hosts is greater than the number of hosts accommodated by each of the plurality of computer rooms.
  17. The apparatus according to any one of claims 14 to 16, characterized in that, among the components to be deployed, the components other than the first type of components are deployed in the same computer room.
  18. The apparatus according to claim 13, characterized in that the determination module is configured to:
    determine the number of hosts required by a first type of components among the components to be deployed based on the deployment requirement information of the first type of components;
    when the plurality of computer rooms include a computer room that accommodates a number of hosts greater than or equal to the number of hosts required, determine the deployment plan for the components to be deployed based on the deployment requirements and a second deployment strategy;
    the second deployment strategy being a strategy of deploying the first type of components in the same computer room.
  19. The apparatus according to claim 18, characterized in that the plurality of computer rooms include a first computer room and a second computer room;
    the determination module is configured to:
    determine hosts that satisfy the deployment requirement information of the components to be deployed as the hosts on which the components to be deployed are deployed;
    determine that the computer room to which the hosts of the first type of components belong is the first computer room, and that the computer room in which a second type of components, namely the components to be deployed other than the first type of components, is deployed is the second computer room, the number of hosts accommodated by the first computer room being greater than or equal to the number of hosts required.
  20. The apparatus according to any one of claims 15 to 19, characterized in that the first type of components includes components based on the distributed file system HDFS and the big data resource scheduler YARN;
    among the components to be deployed, the components other than the first type of components are components that are not based on HDFS or YARN.
  21. The apparatus according to claim 17, wherein the determining module is further configured to:
    for a first computer room and a second computer room among the plurality of computer rooms, determine a first data transmission amount of the first-type components between the first computer room and the second computer room;
    determine a second data transmission amount between the first-type components and the second-type components across the first computer room and the second computer room;
    determine a management plane data amount and a control plane data amount between the first computer room and the second computer room;
    determine a bandwidth requirement between the first computer room and the second computer room based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount.
  22. The apparatus according to claim 19, wherein the determining module is further configured to:
    determine a data transmission amount between the first-type components and the second-type components, and determine a management plane data amount and a control plane data amount between the first computer room and the second computer room;
    determine bandwidth requirement information between the first computer room and the second computer room based on the data transmission amount, the management plane data amount, and the control plane data amount.
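The bandwidth determinations in claims 21 and 22 can be illustrated with a minimal sketch. The claims state only that the bandwidth requirement is determined "based on" the listed amounts; the plain sum used here, the function name `bandwidth_requirement`, the `headroom` factor, and the assumption that all inputs share one unit (for example Gbit/s) are hypothetical choices made for illustration.

```python
def bandwidth_requirement(cross_type_amount, management_plane, control_plane,
                          first_type_amount=0.0, headroom=1.0):
    """Estimate the inter-room bandwidth requirement (same unit as the inputs).

    cross_type_amount: traffic between first-type and second-type components.
    first_type_amount: traffic among first-type components split across rooms
        (claim 21); left at 0 when they all sit in one room (claim 22).
    headroom: hypothetical safety multiplier, not taken from the source.
    """
    # Assumption: the requirement is the plain sum of all traffic classes.
    total = first_type_amount + cross_type_amount + management_plane + control_plane
    return total * headroom


# Claim 21 case: first-type components span both rooms.
spread = bandwidth_requirement(4.0, 0.5, 0.5, first_type_amount=10.0)

# Claim 22 case: first-type components share one room, so no first-type term.
colocated = bandwidth_requirement(4.0, 0.5, 0.5)
```

Under these assumptions the co-located layout of claim 22 needs less inter-room bandwidth, because traffic among first-type components stays inside a single computer room.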
  23. A computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory;
    wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1 to 12.
  24. A computer-readable storage medium, comprising computer program instructions, wherein when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to any one of claims 1 to 12.
PCT/CN2023/105108 2022-09-15 2023-06-30 Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium WO2024055715A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211123966.5 2022-09-15
CN202211123966.5A CN117742931A (en) 2022-09-15 2022-09-15 Method and device for determining big data cluster deployment scheme, clusters and storage medium

Publications (1)

Publication Number Publication Date
WO2024055715A1 (en) 2024-03-21

Family

ID=90253232

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/105108 WO2024055715A1 (en) 2022-09-15 2023-06-30 Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium

Country Status (2)

Country Link
CN (1) CN117742931A (en)
WO (1) WO2024055715A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294445A (en) * 2015-05-27 2017-01-04 华为技术有限公司 The method and device stored based on the data across machine room Hadoop cluster
CN107463582A (en) * 2016-06-03 2017-12-12 中兴通讯股份有限公司 The method and device of distributed deployment Hadoop clusters
CN110708369A (en) * 2019-09-25 2020-01-17 深圳市网心科技有限公司 File deployment method and device for equipment nodes, scheduling server and storage medium
CN110704382A (en) * 2019-09-25 2020-01-17 深圳市网心科技有限公司 File deployment method, device, server and storage medium
US10841152B1 (en) * 2017-12-18 2020-11-17 Pivotal Software, Inc. On-demand cluster creation and management
CN114647501A (en) * 2020-12-17 2022-06-21 顺丰科技有限公司 Mycat system deployment, operation and maintenance method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN117742931A (en) 2024-03-22

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23864452

Country of ref document: EP

Kind code of ref document: A1