WO2024055715A1

WO2024055715A1 - Method and apparatus for determining big data cluster deployment scheme, cluster, and storage medium

Info

Publication number: WO2024055715A1
Application number: PCT/CN2023/105108
Authority: WO
Inventors: 冯伟; 武文博
Original assignee: 华为云计算技术有限公司
Priority date: 2022-09-15
Filing date: 2023-06-30
Publication date: 2024-03-21
Also published as: CN117742931A

Abstract

The present application relates to the technical field of big data, and provides a method and apparatus for determining a big data cluster deployment scheme, a cluster, and a storage medium. The method comprises: receiving an inputted deployment demand of a big data cluster, wherein the deployment demand comprises deployment demand information of a component to be deployed of the big data cluster, machine room capacity information of a plurality of machine rooms, and parameter information of hosts for deploying the big data cluster; on the basis of the deployment demand and the category of said component, determining a deployment scheme of said component, wherein the deployment scheme comprises a machine room for deploying said component and a host deployed in the machine room; and outputting the deployment scheme. By using the solution of the present application, the deployment scheme of said component can be determined automatically rather than calculated manually, and the efficiency of determining the deployment scheme can be improved.

Description

Method, device, cluster and storage medium for determining a big data cluster deployment plan

This application claims priority to Chinese patent application No. 202211123966.5, titled "Method, device, cluster and storage medium for determining big data cluster deployment scheme" and filed on September 15, 2022, the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of big data technology, and in particular to a method, device, cluster and storage medium for determining a big data cluster deployment solution.

Background technique

With the development of information technology, big data has been widely used in many fields. Big data is usually processed by a big data cluster, which can be a Hadoop cluster. In some scenarios, after a big data cluster has been deployed for a period of time, the cluster needs to be scaled out as the amount of data increases. When the original computer room of the big data cluster has insufficient space, the cluster also needs to be deployed in an off-site computer room, and the equipment in the multiple computer rooms transmits data over the network. For example, the big data cluster was originally deployed in the computer room at location A; when it is scaled out, it is also deployed in the computer room at location B. The devices in the computer rooms at location A and location B transmit data over the network.

In related technologies, when a big data cluster is deployed, it is usually divided manually and arbitrarily into multiple parts according to the hosts that the multiple computer rooms can accommodate, and the parts are deployed to the respective computer rooms. This approach may produce an unreasonable deployment plan, and determining the deployment plan is inefficient.

Summary of the invention

This application provides a method, device, cluster and storage medium for determining a big data cluster deployment plan, which can improve the efficiency of determining the deployment plan.

In a first aspect, this application provides a method for determining a big data cluster deployment plan. The big data cluster is managed by a big data management platform, the big data cluster is deployed across multiple computer rooms, and each computer room houses hosts. The method includes:

Receiving input deployment requirements of the big data cluster, where the deployment requirements include deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms, and parameter information of the hosts on which the big data cluster is deployed; determining a deployment plan for the components to be deployed based on the deployment requirements and the category of the components to be deployed, where the deployment plan includes the computer room in which each component to be deployed is deployed and the hosts on which it is deployed in that computer room; and outputting the deployment plan.

In the solution shown in this application, the deployment requirement information of the components to be deployed in the big data cluster, the computer room capacity information of the multiple computer rooms, and the parameter information of the hosts on which the big data cluster is deployed are all taken into account, and the computing device determines the deployment plan based on this information. This improves both the rationality of the deployment plan and the efficiency of determining it.

In one example, determining the deployment plan of the component to be deployed based on the deployment requirement and the category of the component to be deployed includes: determining the deployment plan of the component to be deployed based on the deployment requirement and a first deployment strategy, where the first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component, among the first type of components in the components to be deployed, in the same computer room.

In the solution shown in this application, the data storage part and the computing part of the same component in the first type of component are deployed in the same computer room, which can reduce the cross-computer room data transmission of the data storage part and the computing part of the first type component. This can save bandwidth between computer rooms.

In one example, determining a deployment plan for the component to be deployed based on the deployment requirement and the first deployment strategy includes: determining a host that satisfies the deployment requirement information of the component to be deployed as the host on which the component is deployed; and determining, based on the computer room capacity information, the host on which the component is deployed, and the first deployment strategy, the computer room to which that host belongs.

In one example, before determining the deployment plan of the component to be deployed based on the deployment requirement and the first deployment strategy, the method further includes: determining, based on the deployment requirement information and the parameter information, the number of hosts required by the first type of components; and determining, based on the computer room capacity information, that the required number of hosts is greater than the number of hosts accommodated in each of the multiple computer rooms.

In the solution shown in this application, when no single computer room can hold all of the first type of components, deploying the data storage part and the computing part of the same component in the same computer room reduces cross-computer-room data transmission between the data storage part and the computing part of the first type of components, thereby saving bandwidth between computer rooms.

In one example, the components to be deployed other than the first type component are deployed in the same computer room.

In the solution shown in this application, components other than the first type of components are deployed in the same computer room, which can reduce the data transmission of these components across computer rooms, thereby saving bandwidth between computer rooms.

In one example, determining a deployment plan for the component to be deployed based on the deployment requirement and the category of the component to be deployed includes: determining, based on the deployment requirement information of the component to be deployed, the number of hosts required by the first type of components; and, if the number of hosts accommodated in a computer room among the multiple computer rooms is greater than or equal to the required number of hosts, determining a deployment plan for the component to be deployed based on the deployment requirement and a second deployment strategy. The second deployment strategy is a strategy of deploying the first type of components in the same computer room.

In the solution shown in this application, when the first type of components can all be deployed in one of the multiple computer rooms, deploying them in the same computer room reduces cross-computer-room transmission of data between the first type of components, and thus saves bandwidth between computer rooms.

In one example, the multiple computer rooms include a first computer room and a second computer room, and determining a deployment plan for the component to be deployed based on the deployment requirement and the second deployment strategy includes: determining a host that satisfies the deployment requirement information of the component to be deployed as the host on which the component is deployed; determining that the computer room to which the hosts of the first type of components belong is the first computer room; and determining that the computer room in which the second type of components (the components to be deployed other than the first type of components) are deployed is the second computer room, where the number of hosts accommodated in the first computer room is greater than or equal to the required number of hosts.

In the solution shown in this application, deploying the first type of components in one computer room and the second type of components in another computer room can reduce cross-computer-room transmission of data among the first type of components and among the second type of components, thereby saving bandwidth between computer rooms.

In one example, the first type of components includes components based on the big data file system (Hadoop distributed file system, HDFS) and the big data resource scheduler (yet another resource negotiator, YARN). The components to be deployed other than the first type of components are components that are not based on HDFS and YARN.

In one example, the method further includes, for a first computer room and a second computer room among the multiple computer rooms: determining a first data transmission volume of the first type of components between the first computer room and the second computer room; determining a second data transmission volume between the first type of components and the second type of components across the first computer room and the second computer room; determining the management plane data volume and the control plane data volume between the first computer room and the second computer room; and determining the bandwidth requirement between the first computer room and the second computer room based on the first data transmission volume, the second data transmission volume, the management plane data volume, and the control plane data volume.

In the solution shown in this application, the bandwidth requirements between computer rooms can be determined based on certain strategies, providing a reference value for bandwidth requirements for big data cluster deployers.

In one example, the method further includes: determining the data transmission volume between the first type of components and the second type of components; determining the management plane data volume and the control plane data volume between the first computer room and the second computer room; and determining bandwidth demand information between the first computer room and the second computer room based on the data transmission volume, the management plane data volume, and the control plane data volume.
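
As a rough illustration of the two bandwidth examples above, the following Python sketch combines the data volumes into a single bandwidth figure. The source states only that the requirement is determined "based on" these volumes. Summing them, the `headroom` factor, the class and function names, and the unit (assumed to be Gbit/s) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterRoomTraffic:
    """Estimated traffic between two computer rooms (hypothetical model).

    All values share one unit, assumed here to be Gbit/s.
    """
    cross_type_volume: float        # first-type <-> second-type component traffic
    management_plane_volume: float  # management plane data volume
    control_plane_volume: float     # control plane data volume
    # Traffic among first-type components split across the two rooms.
    # Stays at zero when all first-type components sit in a single room.
    first_type_volume: float = 0.0


def bandwidth_requirement(traffic: InterRoomTraffic, headroom: float = 1.0) -> float:
    """Return an estimated bandwidth requirement between the two rooms.

    Assumption: the volumes are simply added and scaled by a safety
    factor; the source does not specify the combining rule.
    """
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    total = (
        traffic.first_type_volume
        + traffic.cross_type_volume
        + traffic.management_plane_volume
        + traffic.control_plane_volume
    )
    return total * headroom
```

Leaving `first_type_volume` at its default models the case where the first type of components is not split across the two computer rooms, so only the traffic between the two component types plus the management and control planes contributes.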

In one example, the deployment requirement information for each component includes one or more of operating system requirement information, data volume, or throughput.

In one example, the parameter information includes one or more of operating system information, network information, or hardware information of various types of hosts and the number of the various types of hosts.

In a second aspect, this application provides a device for determining a big data cluster deployment plan. The device includes at least one module, and the at least one module is used to implement the method for determining a big data cluster deployment plan provided by the first aspect or any one of the examples of the first aspect.

In some embodiments, the modules in the device for determining a big data cluster deployment plan are implemented by software, in which case they are program modules. In other embodiments, the modules in the device are implemented by hardware or firmware.

In a third aspect, the present application provides a computing device cluster. The computing device cluster includes at least one computing device, and each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method for determining a big data cluster deployment plan provided by the first aspect or any example of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium. The computer-readable storage medium includes computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method for determining a big data cluster deployment plan provided by the first aspect or any example of the first aspect.

In a fifth aspect, the present application provides a computer program product containing instructions that, when executed by a computing device cluster, cause the computing device cluster to execute the method for determining a big data cluster deployment plan provided by the first aspect or any one of the examples of the first aspect.

Description of drawings

Figure 1 is a schematic diagram of off-site expansion of a big data cluster provided by an exemplary embodiment of the present application;

Figure 2 is a schematic diagram of the system architecture provided by an exemplary embodiment of the present application;

Figure 3 is a schematic flowchart of a method for determining a big data cluster deployment solution provided by an exemplary embodiment of the present application;

Figure 4 is a schematic flowchart of a method for determining a big data cluster deployment solution provided by an exemplary embodiment of the present application;

Figure 5 is a schematic structural diagram of a device for determining a big data cluster deployment solution provided by an exemplary embodiment of the present application;

Figure 6 is a schematic structural diagram of a computing device provided by an exemplary embodiment of the present application;

Figure 7 is a schematic structural diagram of a computing device cluster provided by an exemplary embodiment of the present application;

Figure 8 is a schematic connection diagram of a computing device provided by an exemplary embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application clearer, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

Some terms and concepts involved in the embodiments of this application are explained below.

1. YARN is the resource manager of Hadoop. It is a general resource management system that can provide unified resource management and scheduling for upper-layer applications. The name YARN stands for "yet another resource negotiator".

2. HDFS is a distributed file system of Hadoop, which can provide a highly available distributed file system for obtaining application data.

3. The big data column-oriented database (Hadoop database, HBase) is a column-oriented, non-relational (NoSQL) database built on HDFS, which is used to quickly read and write large amounts of data.

4. Big data data warehouse (Hive) is a basic data warehouse framework built on Hadoop. Hive provides a series of tools for data extraction, transformation and loading (extract-transform-load, ETL).

The application scenarios of this application are described below.

Application scenario 1: The embodiment of this application is applied to the scenario of deploying big data clusters in multiple computer rooms. For example, the scale of big data clusters is often relatively large, and the space and capacity of a single computer room are limited, so big data clusters are often deployed in multiple computer rooms.

Application scenario 2: The embodiment of this application is applied to the scenario of off-site expansion. For example, after a big data cluster has been deployed in a computer room for a period of time, the cluster needs to be scaled out as the amount of data increases. In many cases, because the original computer room has insufficient space, the big data cluster needs to add host capacity in an off-site computer room. For example, as shown in Figure 1, when the original first computer room has insufficient space, a second computer room is built at another location, and the big data cluster is deployed in both the first computer room and the second computer room.

In application scenario 1 and application scenario 2, the big data cluster supports cross-computer-room deployment and cross-computer-room expansion. To the upper-layer application system and other business systems, the big data cluster is a single complete cluster; these systems are not aware of the physical deployment form of the big data cluster. When a newly deployed or expanded big data cluster is connected to the management platform of the big data cluster, the management platform manages it in a unified manner.

It should be noted that although application scenario 2 is off-site expansion, in essence it is still cross-computer-room deployment of a big data cluster.

The system architecture of the embodiment of this application is described below.

The embodiment of this application provides a system architecture 100. As shown in Figure 2, the system architecture 100 includes a first device 101 and a second device 102. Both the first device 101 and the second device 102 can be computing devices such as terminals or servers (for computing devices, see the computing device 200 described later), and the first device 101 and the second device 102 are connected through a wired or wireless network. The first device 101 is used to determine the deployment plan, which includes the computer room in which each component of the big data cluster is deployed and the hosts on which it is deployed in that computer room. The second device 102 is used to deploy the components of the big data cluster to the corresponding hosts according to the deployment plan. Optionally, the first device and/or the second device may also be a server or a virtual machine in the data center of a cloud computing platform, providing users with a cloud service for determining the deployment plan.

In Figure 2, the second device 102 is responsible for deploying components. In another implementation, the system architecture 100 does not include the second device 102; after the first device 101 determines the deployment plan, it deploys the components of the big data cluster to the corresponding hosts according to the deployment plan.

The following describes the process of determining the big data cluster deployment plan in the embodiment of this application.

Figure 3 shows the process of determining the big data cluster deployment plan; see step 301 to step 303. In Figure 3, the solution is described using an example in which the first device 101 determines the deployment plan.

Step 301: Receive the input deployment requirements of the big data cluster. The deployment requirements include the deployment requirement information of the components to be deployed in the big data cluster, the computer room capacity information of the multiple computer rooms in which the big data cluster is deployed, and the parameter information of the hosts on which the big data cluster is deployed.

The big data cluster is a Hadoop big data cluster, and the components to be deployed include components of the big data cluster that are based on HDFS and YARN as well as components that are not based on HDFS and YARN. For example, components based on HDFS and YARN include the HDFS component, the YARN component, the HBase component, the Hive component, the Spark component, the Flink component, and so on. Components not based on HDFS and YARN include the Kafka component (a high-throughput distributed publish-subscribe messaging system), the Elasticsearch component, the Redis (remote dictionary server) component, the Flume component (a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting massive logs), and so on. The deployment requirements include deployment requirement information of the components to be deployed, capacity information of the multiple computer rooms in which the big data cluster is deployed, and parameter information of the hosts on which the big data cluster is deployed.

In this embodiment, when deploying a big data cluster, an input interface for deployment requirements is provided to the user, and the user inputs the deployment requirements for the big data cluster through the input interface. The first device may obtain the deployment requirements input by the user. The input interface can be provided in the form of a graphical interface, a command line, or an application programming interface (API) on a cloud computing platform.

Alternatively, when deploying the big data cluster, the user triggers the terminal device to send a determination request for the deployment plan to the first device. The first device receives the determination request, which includes the deployment requirements of the big data cluster, and obtains the deployment requirements from the determination request.

In one example, for any component, the deployment requirement information of the component includes one or more of operating system requirement information, data volume, or throughput. The operating system requirement information indicates the operating system of the host on which the component is deployed, the data volume indicates the amount of data to be processed by the component, and the throughput indicates the amount of data transmitted by the component per unit time.

In one example, the parameter information of the host includes one or more of operating system information, network information, or hardware information of various types of hosts, as well as the number of various types of hosts. The operating system information indicates the operating system of the host, the network information indicates the bandwidth of the host, etc., and the hardware information indicates the central processing unit (CPU) model and storage resources of the host.
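
The deployment requirements received in step 301 can be pictured as a small data model. The sketch below is one possible Python representation, not a format defined by the source: the class names, field names, and units are hypothetical, and the sample values only mirror figures that appear in this description's examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ComponentRequirement:
    """Deployment requirement information for one component.

    Any of the three requirement fields may be absent ("one or more of").
    """
    component: str
    os: Optional[str] = None           # required host operating system
    data_volume: Optional[int] = None  # amount of data to process (unit assumed: M)
    throughput: Optional[int] = None   # data per unit time (unit assumed: M)


@dataclass(frozen=True)
class HostType:
    """Parameter information for one type of host."""
    os: str
    cpu_cores: int
    memory_gb: int
    count: int                         # number of hosts of this type


@dataclass(frozen=True)
class DeploymentRequirements:
    components: List[ComponentRequirement]
    room_capacity: Dict[str, int]      # computer room name -> hosts accommodated
    host_types: List[HostType]

    def total_hosts(self) -> int:
        """Total number of hosts available across all host types."""
        return sum(t.count for t in self.host_types)


# Illustrative instance; treating 1G as 1000M is an assumption.
requirements = DeploymentRequirements(
    components=[ComponentRequirement("A", os="Windows", data_volume=500, throughput=1000)],
    room_capacity={"first": 20},
    host_types=[
        HostType(os="Windows", cpu_cores=2 * 32, memory_gb=4 * 32, count=10),
        HostType(os="Linux", cpu_cores=2 * 32, memory_gb=8 * 32, count=12),
    ],
)
```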

In one example, the multiple computer rooms can be located in the same city or in different cities. The content of the computer room capacity information depends on how hosts are placed in the computer room. When hosts are placed in a computer room without racks, the capacity information of that computer room includes the number of hosts accommodated. When hosts are placed in a computer room in racks, the capacity information includes the number of racks accommodated and the number of hosts placed in each rack. When some hosts are placed directly in the computer room and the other hosts are placed in racks, the capacity information includes the number of racks accommodated, the number of hosts accommodated by each rack, and the number of hosts accommodated individually outside the racks.
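
The three forms of computer room capacity information can be reduced to a single host count. The following is a minimal sketch under the assumption that rack capacity is given per rack; `RoomCapacity` and its fields are hypothetical names, not part of the source.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoomCapacity:
    """Capacity information of one computer room (hypothetical model)."""
    name: str
    hosts_per_rack: Tuple[int, ...] = ()  # one entry per rack; empty if no racks
    standalone_hosts: int = 0             # hosts placed directly in the room

    @property
    def rack_count(self) -> int:
        return len(self.hosts_per_rack)

    def total_hosts(self) -> int:
        """Hosts in racks plus hosts placed outside racks."""
        return sum(self.hosts_per_rack) + self.standalone_hosts


no_racks = RoomCapacity("room-1", standalone_hosts=20)
racks_only = RoomCapacity("room-2", hosts_per_rack=(8, 8, 4))
mixed = RoomCapacity("room-3", hosts_per_rack=(8, 8), standalone_hosts=4)
```

All three sample rooms accommodate 20 hosts, so later steps can compare rooms by `total_hosts()` regardless of how the hosts are physically placed.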

Step 302: Determine a deployment plan for the component to be deployed based on the deployment requirement and the category of the component to be deployed. The deployment plan includes the computer room where the component to be deployed is deployed and the hosts deployed in the computer room.

Among them, the category of the component to be deployed is used to distinguish the component to be deployed.

In this embodiment, the first device uses the deployment requirements and the category of the component to be deployed to determine the computer room in which the component is to be deployed and the hosts on which it is to be deployed in that computer room; that is, the first device obtains the deployment plan of the component to be deployed. In the deployment plan, a host on which the component is deployed can be specified either by host model or by host identifier (ID). A computer room may contain multiple hosts of the same model, but only one host with a given identifier.

In an example, the components to be deployed can be divided into a first type of component and a second type of component. Optionally, the first type of component is a component based on HDFS and YARN, and the second type of component is a component not based on HDFS and YARN.
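
The split into two categories can be sketched as a simple lookup. The set below contains the components this description lists as based on HDFS and YARN; the function name, the dictionary keys, and the choice to treat every unlisted component as second type are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Components listed in this description as based on HDFS and YARN.
FIRST_TYPE = frozenset({"HDFS", "YARN", "HBase", "Hive", "Spark", "Flink"})


def classify(components: Iterable[str]) -> Dict[str, List[str]]:
    """Group component names into first-type and second-type components.

    Any component not in FIRST_TYPE is treated as second type, following
    the description of the second type as "components other than the
    first type".
    """
    groups: Dict[str, List[str]] = {"first_type": [], "second_type": []}
    for name in components:
        key = "first_type" if name in FIRST_TYPE else "second_type"
        groups[key].append(name)
    return groups
```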

In order to reduce the amount of data transmission of the same component in the first type of component in different computer rooms, the data storage part and the computing part of the same component in the first type of component can be deployed in the same computer room. The processing of step 302 can be as follows:

The first device obtains the stored first deployment strategy, which is a strategy of deploying the data storage part and the computing part of the same component in the first type of components in the same computer room, and determines the components included in the first type of components. Based on the deployment requirements of the big data cluster and the first deployment strategy, the first device deploys the data storage part and the computing part of the same component in the first type of components in the same computer room. For example, for the HBase component, the data storage part is deployed on host 1, host 2, host 3 and host 4, the corresponding computing part is also deployed on host 1, host 2, host 3 and host 4, and host 1, host 2, host 3 and host 4 are deployed in computer room A. For the Hive component, the data storage part is deployed on host 5, host 6, host 7 and host 8, the corresponding computing part is also deployed on host 5, host 6, host 7 and host 8, and host 5, host 6, host 7 and host 8 are deployed in computer room B.

Optionally, when deploying the component using the first deployment strategy, the processing method is as follows:

The first device determines a host that satisfies the deployment requirement information of the first type of component. For example, for component A in the first type of components, the deployment requirement information is a Windows operating system, a data volume of 500M, and a throughput of 1G. A host that satisfies the deployment requirement information of component A runs a Windows operating system, can process a data volume of 500M for component A, and can transmit a data volume of 1G for component A per unit time.
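
The matching rule in the component A example can be written as a predicate over hosts. This is a minimal sketch: the `Requirement` and `Host` types, their field names, and the shared unit (M, with 1G treated as 1000M) are assumptions, since the source does not define how host capability is recorded.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Requirement:
    os: str
    data_volume: int  # amount of data to process, unit assumed: M
    throughput: int   # data transmitted per unit time, unit assumed: M


@dataclass(frozen=True)
class Host:
    host_id: str
    os: str
    max_data_volume: int  # data volume the host can process
    max_throughput: int   # data the host can transmit per unit time


def satisfies(host: Host, req: Requirement) -> bool:
    """True if the host meets every part of the requirement."""
    return (
        host.os == req.os
        and host.max_data_volume >= req.data_volume
        and host.max_throughput >= req.throughput
    )


def matching_hosts(hosts: Iterable[Host], req: Requirement) -> List[Host]:
    """Return the hosts that can be used to deploy the component."""
    return [h for h in hosts if satisfies(h, req)]
```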

The first device determines the hosts that meet the deployment requirement information as the hosts on which the components to be deployed are deployed. Then, for each component in the first type of components, the first device uses the computer room capacity information and the hosts on which the component is deployed to place the hosts of the data storage part and the computing part of that component in the same computer room, so that no cross-computer-room data transmission is needed between the data storage part and the computing part of the same component. For the second type of components, that is, the components to be deployed other than the first type of components, the first device places the hosts on which they are deployed in the same computer room, thereby reducing the amount of data the second type of components transmit across computer rooms. This is possible because the second type of components is relatively small in scale and can be deployed in a single computer room.
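
One way to picture the first deployment strategy is as a packing problem in which each component, with its data storage part and computing part on the same hosts, must fit entirely inside one computer room. The sketch below uses a first-fit-decreasing heuristic; this heuristic, the function name, and the input shapes are assumptions, as the source does not prescribe a placement algorithm.

```python
from typing import Dict, Optional


def place_components(
    host_demand: Dict[str, int],
    room_capacity: Dict[str, int],
) -> Optional[Dict[str, str]]:
    """Assign each component to a single computer room.

    host_demand maps a component to the number of hosts it needs for its
    storage and computing parts together. Components are placed largest
    first into the first room with enough free hosts. Returns None if
    the heuristic finds no room for some component (it may miss a
    feasible placement that an exhaustive search would find).
    """
    free = dict(room_capacity)
    placement: Dict[str, str] = {}
    for component, need in sorted(host_demand.items(), key=lambda kv: -kv[1]):
        room = next((r for r, cap in free.items() if cap >= need), None)
        if room is None:
            return None
        free[room] -= need
        placement[component] = room
    return placement
```

The second type of components can be handled the same way by passing them as one combined entry, so that they land in a single computer room together.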

Optionally, resource pools are divided among the components in the first type of components, with different components corresponding to different resource pools. For each component, host labels are configured for the data storage part and for the computing part, and the host labels of the data storage part correspond to the host labels of the computing part. The hosts corresponding to the host labels of the data storage part constitute the resource pool of the component. For example, the host labels configured for the data storage part of the HBase component in the first type of components are label 1, label 2, label 3 and label 4, and the hosts corresponding to label 1, label 2, label 3 and label 4 are deployed in the first computer room. The host labels configured for the computing part of the HBase component are also label 1, label 2, label 3 and label 4, so that during scheduling the computing part is scheduled onto the hosts corresponding to label 1, label 2, label 3 and label 4. When executing computing tasks, YARN dynamically associates the computing tasks in a computing task queue with the resource pool that has the corresponding labels, based on the resource requirements of the computing task queue.
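
The label correspondence can be modeled in plain Python without touching any scheduler API. In the sketch below, the function, its parameters, and the two lookup tables are hypothetical; it only checks that the storage and computing labels match and that the resulting pool stays inside one computer room.

```python
from typing import Dict, List, Sequence


def build_resource_pool(
    storage_labels: Sequence[str],
    compute_labels: Sequence[str],
    label_to_host: Dict[str, str],
    host_to_room: Dict[str, str],
) -> List[str]:
    """Return the hosts that form a component's resource pool.

    Raises ValueError if the computing labels do not correspond to the
    storage labels, or if the pool would span several computer rooms.
    """
    if set(storage_labels) != set(compute_labels):
        raise ValueError("storage and computing labels must correspond")
    pool = [label_to_host[label] for label in storage_labels]
    rooms = {host_to_room[host] for host in pool}
    if len(rooms) > 1:
        raise ValueError(f"pool spans several computer rooms: {sorted(rooms)}")
    return pool


labels = ["label1", "label2", "label3", "label4"]
label_to_host = {f"label{i}": f"host{i}" for i in range(1, 5)}
host_to_room = {f"host{i}": "first" for i in range(1, 5)}
hbase_pool = build_resource_pool(labels, labels, label_to_host, host_to_room)
```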

In another example, when the data storage part and the computing part of the same component in the first type of component are deployed in the same computer room, the processing in step 302 can be as follows:

Deployment plan generation software is developed in advance. Its input is the deployment requirements of the big data cluster and the categories of the components to be deployed, and its output is the deployment plan of the big data cluster. The deployment plan generation software is configured with the goal of deploying the data storage part and the computing part of the same component in the first type of components in the same computer room.

The first device inputs the deployment requirements of the big data cluster and the categories of components to be deployed into the deployment plan generation software, and the deployment plan generation software outputs a deployment plan, and the deployment plan is the deployment plan of the big data cluster.

In another example, the components to be deployed include a first type of component and a second type of component. The first type of component is a component based on HDFS and YARN, and the second type of component is a component not based on HDFS and YARN; the second type of components can be regarded as the components to be deployed other than the first type of components. To reduce the amount of data that different components in the first type of components transmit between computer rooms, the first type of components can be deployed on hosts in the same computer room. The processing is as follows:

Referring to Figure 4, in step 401, the first device uses the deployment requirement information of each component included in the first type of component and the parameter information of the hosts to determine the number of hosts required by each component, and adds up the number of hosts required by each component to obtain the number of hosts required by the first type of component. The first device uses the computer room capacity information of each computer room in the plurality of computer rooms to determine the number of hosts accommodated in each computer room. For example, the parameter information of the hosts is as follows: 10 hosts run the Windows system, with CPU hardware of 2*32 cores and 4*32G of memory; 12 hosts run the Linux system, with CPU hardware of 2*32 cores and 8*32G of memory. The number of hosts required by the first type of component is 13, and the capacity information of the first computer room among the multiple computer rooms indicates that it accommodates 20 hosts.

Step 402: The first device compares the number of hosts required by the first type of component with the number of hosts accommodated in each computer room. If the number of hosts accommodated in a computer room among the multiple computer rooms is greater than or equal to the number of hosts required by the first type of component, it is determined that all the first type components can be deployed in one of the multiple computer rooms. For example, the number of hosts accommodated in the first computer room is greater than or equal to the number of hosts required.

Step 403: The first device obtains the stored second deployment strategy. The second deployment strategy is a strategy for deploying the first type of component in the same computer room. Based on the deployment requirements of the big data cluster and the second deployment strategy, the first device determines the deployment plan of the big data cluster, that is, the first type of component is deployed in the same computer room. The second type of component is either deployed in a computer room other than the computer room where the first type of component is deployed, or partly deployed in the computer room where the first type of component is deployed and partly deployed in a computer room other than that computer room.

Step 404: If the number of hosts accommodated in each of the multiple computer rooms is less than the number of hosts required by the first type of component, the first deployment strategy described above can be used to deploy the components to be deployed. For a detailed description, refer to the previous description, which is not repeated here.
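Steps 401 to 404 amount to a capacity check that selects between the two deployment strategies. The following is a minimal sketch under stated assumptions: the function name `choose_strategy`, the dictionary inputs, and the per-component host counts are hypothetical, and how the host count of each component is derived from the deployment requirement information is not modelled.

```python
from typing import Dict, Optional, Tuple


def choose_strategy(hosts_per_component: Dict[str, int],
                    room_capacity: Dict[str, int]) -> Tuple[str, Optional[str]]:
    """Return (strategy, room).

    "second" comes with a room that can hold all first-type components;
    "first" comes with None because no single room is large enough.
    """
    # Step 401: add up the hosts required by each first-type component.
    required = sum(hosts_per_component.values())
    # Step 402: compare the total with the capacity of each computer room.
    for room, capacity in room_capacity.items():
        if capacity >= required:
            # Step 403: one room fits everything, use the second strategy.
            return "second", room
    # Step 404: fall back to the first deployment strategy.
    return "first", None
```

Using the figures from the example (13 hosts required, 20 hosts accommodated in the first computer room), the sketch selects the second deployment strategy and the first computer room. The split of the 13 hosts across components in any call is an illustrative assumption.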

Optionally, when deploying components using the second deployment strategy, the processing method is as follows:

In the case where the multiple computer rooms include a first computer room and a second computer room, the first device determines a host that satisfies the deployment requirement information of the first type of component. The first device determines the host that meets the deployment requirement information as the host where the component to be deployed is deployed.

When the number of hosts accommodated in the first computer room is greater than the number of hosts required by the first type of component, the first device determines the first computer room as the computer room to which the hosts for the first type of component belong, and determines the second computer room as the computer room to which the hosts for the second type of component belong. This is because the number of second type components is relatively small, so they can be deployed in the same computer room.
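The room assignment under the second deployment strategy can be sketched as follows. This is an illustration only: `FIRST_TYPE` is a small, hypothetical subset of the first type of component, the room names are placeholders, and the capacity check uses "greater than or equal to" as in step 402.

```python
from typing import Dict, Iterable, Set

# Illustrative subset of the first type of component (based on HDFS and YARN).
FIRST_TYPE: Set[str] = {"HDFS", "HBase", "Yarn", "Spark", "Hive"}


def assign_rooms(components: Iterable[str],
                 required_hosts: int,
                 first_room_capacity: int,
                 first_room: str = "room1",
                 second_room: str = "room2") -> Dict[str, str]:
    """Place first-type components in the first room, the rest in the second."""
    # Assumption: the check mirrors step 402 (capacity >= required hosts).
    if first_room_capacity < required_hosts:
        raise ValueError("first room cannot hold all first-type components")
    return {
        name: first_room if name in FIRST_TYPE else second_room
        for name in components
    }
```

For example, `assign_rooms(["HDFS", "Kafka", "Hive", "Redis"], 13, 20)` places HDFS and Hive in the first room and Kafka and Redis in the second room.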

For example, the big data cluster is a Hadoop cluster. The first type of component includes the HDFS component, the HBase component, the Yarn component, the Spark component, the Spark2X component, the Hive component, the MapReduce component, the Storm component (a distributed real-time computing system component), the ZooKeeper component (a distributed, open source coordination service component for distributed applications), the database service (DBService) component, the network authentication protocol (Kerberos, Krb) Server component, the Hadoop user experience (Hue) component, and the lightweight directory access protocol (Ldap) Server component. The second type of component includes the Elastic Search component, the enterprise-level search application server (Solr) component, the Redis component, the graph database (GraphBase) component, the Kafka component, the Loader component, the file transfer protocol (FTP) Server component, and the Oozie component (a task scheduling framework component). Table 1 provides the deployment scheme of the components in the Hadoop cluster.

Table 1

In Table 1, the host type indicates the type of host on which the component is deployed. Table 1 shows three different host types, represented by type 1, type 2 and type 3 respectively. For the details of the hosts of the three host types, see Table 2.

Table 2

Using this deployment plan, when the first computer room can accommodate all the first type components, deploying the first type components in the first computer room can reduce the amount of data transmitted between different computer rooms by the components in the first type of component.

It should be noted that in a big data cluster expansion scenario, if the hosts in an already built computer room cannot be moved, the number of hosts that the built computer room can accommodate can be obtained directly. In Table 2, the management node centrally manages the components deployed in the big data cluster, and the control node is the node that performs resource scheduling and task allocation.

Step 303: Output the deployment plan.

In this embodiment, after the first device determines the deployment plan of the component to be deployed, it can output the deployment plan to the second device, and the second device can deploy the component to be deployed to hosts in multiple computer rooms based on the deployment plan. For example, after the first device determines the deployment plan of the component to be deployed, it generates a deployment task list based on the deployment plan. The deployment task list can be output to the second device in the form of an offline export spreadsheet (such as an EXCEL table). The big data cluster installation software of the second device can use the deployment task list to deploy the components to be deployed to hosts in multiple computer rooms.
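As an illustration of exporting a deployment task list, the sketch below serializes a plan as CSV text. CSV is used here as a stand-in for the spreadsheet export mentioned above, and the column names (`component`, `room`, `host`), the function name `export_task_list`, and the sample row are hypothetical.

```python
import csv
import io
from typing import Dict, List


def export_task_list(plan: List[Dict[str, str]]) -> str:
    """Serialize a deployment plan as CSV text, one deployment task per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["component", "room", "host"],  # hypothetical columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(plan)
    return buffer.getvalue()
```

The resulting text can be written to a file and handed to the installation software on the second device. The actual format expected by that software is not specified in this application.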

Alternatively, after determining the deployment plan of the component to be deployed, the first device may send the deployment plan to the device that sent the determination request.

Alternatively, after determining the deployment plan of the component to be deployed, the first device can display the deployment plan.

Alternatively, after the first device determines the deployment plan of the component to be deployed, it can use a dedicated device to access the host, and deploy the component to be deployed in the host.

Using the process shown in Figure 3, the deployment plan of the components to be deployed is determined automatically, based on the deployment requirements of the big data cluster and the categories of the components to be deployed, instead of being calculated manually. This can improve the efficiency of determining the deployment plan, and thus improve the deployment efficiency of the big data cluster.

In one example, when the first deployment strategy is used to determine the deployment plan, the first device can also determine the bandwidth requirement between any two of the multiple computer rooms. Assuming that the second type of component is deployed in the same computer room, the processing method for determining the bandwidth requirement is as follows:

For the first computer room and the second computer room in the plurality of computer rooms, the first data transmission amount of the first type component between the first computer room and the second computer room is determined; the second data transmission amount between the first type component and the second type component between the first computer room and the second computer room is determined; and the management plane data amount and the control plane data amount between the first computer room and the second computer room are determined. Based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount, the bandwidth requirement between the first computer room and the second computer room is determined.

In this embodiment, when determining the bandwidth requirement between two computer rooms, the first computer room and the second computer room among multiple computer rooms are used as an example for explanation. In a big data cluster, management nodes and control nodes can be deployed in the same computer room or in different computer rooms.

In the case where the management node and the control node are both deployed in the first computer room or the second computer room, or in the case where the management node and the control node are deployed in the first computer room and the second computer room respectively, the data transmission volume between the first computer room and the second computer room is considered in three parts. The first part is the management plane data volume and the control plane data volume between the first computer room and the second computer room. The second part is the first data transmission amount of the first type component between the first computer room and the second computer room. The third part is the second data transmission amount between the first type component and the second type component between the first computer room and the second computer room. The minimum bandwidth required between the first computer room and the second computer room is the sum of the management plane data volume, the control plane data volume, the first data transmission amount, and the second data transmission amount. To make the bandwidth between the first computer room and the second computer room sufficient for the big data cluster, the bandwidth requirement between the first computer room and the second computer room is usually greater than the minimum bandwidth required between the first computer room and the second computer room.

When neither a management node nor a control node is deployed in the first computer room or the second computer room, there is no management plane data volume or control plane data volume between the first computer room and the second computer room. When the management node is deployed in the first computer room or the second computer room, and no control node is deployed in either the first computer room or the second computer room, there is no control plane data volume between the first computer room and the second computer room. When the control node is deployed in the first computer room or the second computer room, and no management node is deployed in either the first computer room or the second computer room, there is no management plane data volume between the first computer room and the second computer room. Here, where a management plane data amount or a control plane data amount does not exist, that amount can be considered to be 0.
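The minimum bandwidth under the first deployment strategy can be written as a sum in which the plane data amounts are zeroed when the corresponding node is absent from both computer rooms. This is a minimal sketch: the function name, the parameter names, and the use of Mb/s as the unit are assumptions, and the figures in any call are illustrative.

```python
def min_bandwidth_first_strategy(first_amount: float,
                                 second_amount: float,
                                 mgmt_amount: float,
                                 ctrl_amount: float,
                                 mgmt_node_in_rooms: bool,
                                 ctrl_node_in_rooms: bool) -> float:
    """Minimum bandwidth (Mb/s) between two computer rooms.

    first_amount:  traffic among first-type components across the two rooms
    second_amount: traffic between first-type and second-type components
    mgmt_amount / ctrl_amount: management and control plane data amounts
    """
    # No management node in either room: management plane amount is 0.
    if not mgmt_node_in_rooms:
        mgmt_amount = 0.0
    # No control node in either room: control plane amount is 0.
    if not ctrl_node_in_rooms:
        ctrl_amount = 0.0
    return first_amount + second_amount + mgmt_amount + ctrl_amount
```

The bandwidth requirement that is finally output would normally be chosen above this minimum. The margin is not specified here, so the sketch stops at the minimum.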

After determining the bandwidth requirement between every two computer rooms in the plurality of computer rooms, the first device outputs the bandwidth requirements and the deployment plan of the big data cluster to the second device. Alternatively, the first device sends the bandwidth requirements and the deployment plan of the big data cluster to the device that sent the determination request, or displays the bandwidth requirements together with the deployment plan of the big data cluster.

It should be noted that the bandwidth requirement calculation above takes the case where the second type of component is deployed in the same computer room as an example. When the second type of component is deployed in different computer rooms, it is also necessary to consider a third data transmission amount between the second type components deployed in different computer rooms.

In one example, when the second deployment strategy is used to determine the deployment plan, the first device can also determine the bandwidth requirement between the first computer room and the second computer room. The processing method is as follows:

Determine the data transmission volume between the first type component and the second type component, and determine the management plane data volume and the control plane data volume between the first computer room and the second computer room. Based on the data transmission volume, the management plane data volume, and the control plane data volume, determine the bandwidth requirement between the first computer room and the second computer room.

In this embodiment, for any component in the first type of component, the first device determines the data transmission amount per unit time between any component and each component in the second type of component. The first device adds the data transmission amount per unit time corresponding to all components in the first type component to obtain the data transmission amount per unit time between the first type component and the second type component.

In the big data cluster, there are also management nodes and control nodes. Management nodes and control nodes can be deployed in the same computer room or in different computer rooms. The first device determines the control plane data amount between the first computer room and the second computer room, determines the management plane data amount between the first computer room and the second computer room, and adds the control plane data amount to the management plane data amount to obtain the management and control plane data volume between the first computer room and the second computer room.

The first device adds the data transmission volume and the management and control plane data volume to obtain a value, and determines the value as the minimum bandwidth required between the first computer room and the second computer room. To make the bandwidth between the first computer room and the second computer room sufficient for the big data cluster, the bandwidth requirement between the first computer room and the second computer room is usually greater than the minimum bandwidth required between the first computer room and the second computer room.
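The calculation for the second deployment strategy can be sketched as a sum over component pairs plus the management and control plane data. The representation of the per-pair rates as a dictionary keyed by (first-type component, second-type component), the function name, and the Mb/s unit are assumptions made for illustration.

```python
from typing import Dict, Tuple


def min_bandwidth_second_strategy(pair_rates: Dict[Tuple[str, str], float],
                                  mgmt_amount: float,
                                  ctrl_amount: float) -> float:
    """Minimum bandwidth (Mb/s) between the first and second computer rooms.

    pair_rates maps (first-type component, second-type component) to the
    data transmission amount per unit time between that pair.
    """
    # Add the per-unit-time amounts of all first-type/second-type pairs.
    component_traffic = sum(pair_rates.values())
    # Add the control plane amount to the management plane amount.
    plane_traffic = mgmt_amount + ctrl_amount
    return component_traffic + plane_traffic
```

For instance, with hypothetical rates of 30 Mb/s between Hive and Kafka and 20 Mb/s between Spark and Redis, a management plane amount of 10 Mb/s and a control plane amount of 1000 Mb/s, the minimum bandwidth is 1060 Mb/s.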

For example, assume that the throughput between a component in the first category and a component in the second category is less than 50Mb/s/node, the management plane data volume is 5Mb/s/node, and the control plane data volume is 1Gb/s.

Refer to Table 3. When the total number of components is 50 to 100, the number of first type components is 50 to 450, and the number of second type components is less than 50, the control node and the management node are deployed in the first computer room, the first type components are deployed in the first computer room, and the second type components are deployed in the second computer room. The management and control plane data volume across computer rooms is 50*5Mb/s/component + 1Gb/s, the data transmission volume between components across computer rooms is 50*50Mb/s/component, the minimum bandwidth is 3.75GE, and the bandwidth requirement is 10GE, where 1GE means 1000Mb/s.

When the total number of components is 500 to 1000, the number of first type components is 400 to 900, and the number of second type components is less than 100, the control node and the management node are deployed in the first computer room, the first type components are deployed in the first computer room, and the second type components are deployed in the second computer room. The management and control plane data volume across computer rooms is 100*5Mb/s/component + 1Gb/s, the data transmission volume between components across computer rooms is 100*50Mb/s/component, the minimum bandwidth is 6.5GE, and the bandwidth requirement is 10GE.

When the total number of components is 1000 to 2000, the number of first type components is 800 to 1800, and the number of second type components is within 200, the control node and the management node are deployed in the first computer room, the first type components are deployed in the first computer room, and the second type components are deployed in the second computer room. The management and control plane data volume across computer rooms is 200*5Mb/s/component + 1Gb/s, the data transmission volume between components across computer rooms is 200*50Mb/s/component, the minimum bandwidth is 12GE, and the bandwidth requirement is 20GE.

Table 3
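The minimum bandwidth figures quoted for Table 3 can be reproduced with a short calculation. The sketch below assumes, as in the example, 5Mb/s of management plane data and 50Mb/s of cross-room traffic per second type component, a fixed 1Gb/s of control plane data, and 1GE = 1000Mb/s. The function and parameter names are hypothetical.

```python
def minimum_bandwidth_ge(second_type_count: int,
                         mgmt_mbps_per_component: float = 5.0,
                         control_mbps: float = 1000.0,
                         traffic_mbps_per_component: float = 50.0) -> float:
    """Minimum cross-room bandwidth in GE (1GE = 1000Mb/s)."""
    # Management plane data scales with the component count; control is fixed.
    plane = second_type_count * mgmt_mbps_per_component + control_mbps
    # Cross-room traffic between first-type and second-type components.
    traffic = second_type_count * traffic_mbps_per_component
    return (plane + traffic) / 1000.0
```

For 50, 100 and 200 second type components this gives 3.75GE, 6.5GE and 12GE respectively, matching the minimum bandwidth values above. The bandwidth requirements of 10GE, 10GE and 20GE are then chosen above these minimums.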

After determining the bandwidth requirement between the first computer room and the second computer room, the first device outputs the bandwidth requirement and the deployment plan of the big data cluster to the second device. Alternatively, the first device sends the bandwidth requirement and the deployment plan of the big data cluster to the device that sent the determination request, or displays the bandwidth requirement together with the deployment plan of the big data cluster.

In this way, bandwidth requirements can also be output to provide a reference for bandwidth settings between computer rooms.

In the embodiment of this application, the deployment plan for deploying a big data cluster across computer rooms can be output automatically. This effectively reduces the complexity of manual random deployment across computer rooms, and solves the problems of complex calculation, complex operation, and long deployment time caused by manual random deployment of big data clusters.

Moreover, when a big data cluster is deployed across computer rooms, components are not deployed randomly in the computer rooms. Instead, the categories of the components in the big data cluster are considered. In this way, the amount of data transmitted between computer rooms can be minimized without reducing the computing performance of the big data cluster, thereby reducing the bandwidth requirements between computer rooms. Reducing the bandwidth requirements also reduces the network cost of the computer rooms.

The following describes the device for determining the big data cluster deployment solution provided by this application.

Figure 5 is a structural diagram of a device for determining a big data cluster deployment solution provided by an embodiment of the present application. The device can be implemented as part or all of the device through software, hardware, or a combination of both. The device provided by the embodiment of the present application can implement the process shown in Figure 3 of the embodiment of the present application. The device includes: an acquisition module 510 and a determination module 520, wherein:

The acquisition module 510 is configured to receive the input deployment requirements of the big data cluster. The deployment requirements include deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms in which the big data cluster is deployed, and parameter information of the hosts on which the big data cluster is deployed. The acquisition module 510 can specifically be used to implement the acquisition function of step 301 and to execute the implicit steps included in step 301;

The determination module 520 is configured to determine a deployment plan for the component to be deployed based on the deployment requirements and the category of the component to be deployed, where the deployment plan includes the computer room where the component to be deployed is deployed and the hosts deployed in the computer room;

and to output the deployment plan. The determination module 520 can specifically be used to implement the determination and output functions of steps 302 and 303 and to execute the implicit steps included in steps 302 and 303.

In an example, the determining module 520 is used to:

Based on the deployment requirements and the first deployment strategy, determine a deployment plan for the component to be deployed;

The first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component of the first type of component among the components to be deployed in the same computer room.

In an example, the determining module 520 is used to:

Determine the host that meets the deployment requirement information of the component to be deployed as the host where the component to be deployed is deployed;

Based on the computer room capacity information, the host where the component to be deployed is deployed, and the first deployment strategy, determine the computer room to which the host where the component to be deployed is deployed belongs.

In one example, the determining module 520 is also used to:

Before determining the deployment plan of the component to be deployed based on the deployment requirements and the first deployment strategy, determine the number of hosts required by the first type of component based on the deployment requirement information of the first type of component and the parameter information;

Based on the computer room capacity information, it is determined that the number of hosts is greater than the number of hosts accommodated by each of the plurality of computer rooms.

In one example, the components to be deployed except for the first type of components are deployed in the same computer room.

In an example, the determining module 520 is used to:

Determine the number of hosts required for the first type of component based on the deployment requirement information of the first type of component among the components to be deployed;

If the number of hosts accommodated in the multiple computer rooms is greater than or equal to the number of hosts, determine a deployment plan for the component to be deployed based on the deployment requirements and the second deployment strategy;

The second deployment strategy is a strategy for deploying the first type of components in the same computer room.

In one example, the plurality of computer rooms include the first computer room and the second computer room;

The determination module 520 is used for:

Determine that the computer room to which the hosts of the first type of component belong is the first computer room, and determine that the computer room in which the second type of component, that is, the components to be deployed other than the first type of component, is deployed is the second computer room, where the number of hosts accommodated in the first computer room is greater than or equal to the number of hosts.

In one example, the first type of components includes components based on HDFS and YARN;

Among the components to be deployed, components other than the first type of components are components that are not based on the HDFS and the YARN.

In one example, the determining module 520 is also used to:

For the first computer room and the second computer room in the plurality of computer rooms, determine the first data transmission amount of the first type component between the first computer room and the second computer room;

Determine the second data transmission amount of the first type component and the second type component between the first computer room and the second computer room;

Determine the amount of management plane data and the amount of control plane data between the first computer room and the second computer room;

Based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount, the bandwidth requirement between the first computer room and the second computer room is determined.

In one example, the determining module 520 is also used to:

Determine the amount of data transmission between the first type component and the second type component, and determine the management plane data amount and control plane data amount between the first computer room and the second computer room;

Based on the data transmission volume, the management plane data volume, and the control plane data volume, bandwidth demand information between the first computer room and the second computer room is determined.

Wherein, both the acquisition module 510 and the determination module 520 can be implemented by software, or can be implemented by hardware. Illustratively, next, taking the determination module 520 as an example, the implementation of the determination module 520 is introduced. Similarly, the implementation of the acquisition module 510 can refer to the implementation of the determination module 520 .

With the module as an example of a software functional unit, the determination module 520 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, or a container. Furthermore, there may be one or more such computing instances. For example, the determination module 520 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. Usually, a region can include multiple AZs.

Likewise, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or across multiple VPCs. Usually, a VPC is set up in one region. Communication between two VPCs in the same region, and cross-region communication between VPCs in different regions, requires a communication gateway in each VPC, and the interconnection between VPCs is realized through the communication gateways.

With the module as an example of a hardware functional unit, the determination module 520 may include at least one computing device, such as a server. Alternatively, the determination module 520 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

Multiple computing devices included in the determination module 520 may be distributed in the same region or in different regions. The multiple computing devices included in the determination module 520 may be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the determination module 520 may be distributed in the same VPC or in multiple VPCs. The plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

It should be noted that in other embodiments, the acquisition module 510 can be used to perform any step in the method for determining the big data cluster deployment plan, and the determination module 520 can also be used to perform any step in the method for determining the big data cluster deployment plan. The steps that the acquisition module 510 and the determination module 520 are responsible for can be specified as needed. The acquisition module 510 and the determination module 520 respectively implement different steps in the method for determining the big data cluster deployment plan, so as to realize all the functions of the device for determining the big data cluster deployment plan.

It should also be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division. In actual implementation, there may be other division methods.

The following describes the computing device 200 provided by the embodiment of the present application.

An embodiment of the present application also provides a computing device 200. As shown in Figure 6, computing device 200 includes: bus 1102, processor 1104, memory 1106, and communication interface 1108. The processor 1104, the memory 1106 and the communication interface 1108 communicate through a bus 1102. Computing device 200 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 200.

The bus 1102 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into address bus, data bus and control bus. For ease of presentation, only one line is used in Figure 6, but it does not mean that there is only one bus or one type of bus. Bus 1102 may include a path that carries information between various components of computing device 200 (eg, memory 1106, processor 1104, and communications interface 1108).

The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).

The memory 1106 may include volatile memory, such as random access memory (RAM). The memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a mechanical hard disk drive (HDD), or a solid state drive (SSD).

The memory 1106 stores executable program code, and the processor 1104 executes the executable program code to respectively implement the functions of the acquisition module 510 and the determination module 520 mentioned above, thereby realizing the method for determining the big data cluster deployment plan. That is, the memory 1106 stores instructions for executing the method for determining the big data cluster deployment plan.

The communication interface 1108 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 200 and other devices or communication networks.

The following describes the computing device cluster provided by the embodiment of the present application.

An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example, the computing device may be a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

As shown in FIG. 7, the computing device cluster includes at least one computing device 200. The memory 1106 of one or more computing devices 200 in the computing device cluster may store the same instructions for executing the method for determining the big data cluster deployment plan.

In some possible implementations, the memory 1106 of one or more computing devices 200 in the computing device cluster may also each store part of the instructions for executing the method for determining the big data cluster deployment plan. In other words, a combination of one or more computing devices 200 may jointly execute the instructions for performing the method for determining the big data cluster deployment plan.

It should be noted that the memory 1106 in different computing devices 200 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the device for determining the big data cluster deployment solution mentioned above. That is, instructions stored in the memory 1106 in different computing devices 200 may implement the functions of one or more of the acquisition module 510 and the determination module 520 .

In some possible implementations, one or more computing devices in a computing device cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 8 shows one possible implementation. As shown in FIG. 8, two computing devices (a first computing device 200A and a second computing device 200B) are connected through a network. Specifically, each computing device connects to the network through its communication interface. In this type of possible implementation, the memory 1106 in the first computing device 200A stores instructions for performing the functions of the determination module 520, while the memory 1106 in the second computing device 200B stores instructions for performing the functions of the acquisition module 510.

The connection between the computing devices shown in FIG. 8 reflects two considerations. First, in the method for determining the big data cluster deployment plan provided by this application, data is transmitted between the acquisition module 510 and the determination module 520, and the determination module 520 occupies a relatively large amount of space, so the functions of the determination module 520 are assigned to the first computing device 200A. Second, because the method for determining the big data cluster deployment plan provided by this application may interact with a terminal device, the functions of the acquisition module 510 are assigned to the second computing device 200B.

It should be understood that the functions of the first computing device 200A shown in FIG. 8 can also be completed by multiple computing devices 200. Likewise, the functions of the second computing device 200B may also be performed by multiple computing devices 200.
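The division of work between the two computing devices in FIG. 8 can be illustrated with a minimal sketch. It is not the implementation described in this application: the JSON message format, the field names (`components`, `room_capacity`, `host_params`, `storage_tb_per_host`, `data_tb`, `category`), and the rule that the host count is the data volume divided by per-host storage are all assumptions made for illustration. The sketch only shows the acquisition function validating and serializing the deployment requirements, and the determination function choosing between the first and second deployment strategies by comparing the host count required by first-type components with the capacity of each computer room.

```python
import json
import math


def acquisition_module(raw_input: dict) -> str:
    """Stand-in for module 510 on the second computing device (200B).

    Validates the deployment requirements and serializes them so they
    can be sent over the network. All field names are hypothetical.
    """
    required = ("components", "room_capacity", "host_params")
    missing = [key for key in required if key not in raw_input]
    if missing:
        raise ValueError(f"missing deployment requirement fields: {missing}")
    return json.dumps(raw_input)


def determination_module(message: str) -> dict:
    """Stand-in for module 520 on the first computing device (200A)."""
    demand = json.loads(message)
    per_host_tb = demand["host_params"]["storage_tb_per_host"]

    # Assumed sizing rule: hosts = data volume / per-host storage, rounded up.
    hosts_needed = sum(
        math.ceil(component["data_tb"] / per_host_tb)
        for component in demand["components"]
        if component["category"] == "first"
    )

    # Rooms that can hold every host needed by the first-type components.
    fitting_rooms = sorted(
        room
        for room, capacity in demand["room_capacity"].items()
        if capacity >= hosts_needed
    )

    # Second strategy: all first-type components in one room.
    # First strategy: no single room is large enough, so components are
    # spread over rooms while each component's storage and compute parts
    # stay together.
    strategy = "second" if fitting_rooms else "first"
    return {
        "hosts_needed": hosts_needed,
        "strategy": strategy,
        "candidate_rooms": fitting_rooms,
    }
```

For example, with one first-type component holding 100 TB, hosts of 20 TB each, and computer rooms holding 8 and 4 hosts, the sketch computes 5 required hosts and selects the second strategy with the 8-host room as the only candidate. In a real deployment the string returned by `acquisition_module` would travel over the network between the two devices; here it is simply passed as an argument.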

An embodiment of the present application also provides a computer program product containing instructions. The computer program product may be a software or program product that contains instructions and is capable of running on a computing device or being stored in any available medium. When the computer program product is run on at least one computing device, the at least one computing device is caused to execute the method for determining the big data cluster deployment plan.

An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media. The available media may be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, digital video discs (DVD)), or semiconductor media (for example, solid state drives), etc. The computer-readable storage medium includes instructions that instruct the computing device to perform the method for determining the big data cluster deployment plan.

Those of ordinary skill in the art will appreciate that the method steps and units described in conjunction with the embodiments disclosed in this application can be implemented with electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the steps and compositions of each embodiment have been described above in general terms according to their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. One of ordinary skill in the art may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

In this application, the terms "first" and "second" are used to distinguish identical or similar items that have substantially the same functions and effects. It should be understood that there is no logical or temporal dependency between "first" and "second", and that these terms do not limit the number of items or their execution order. It should also be understood that, although the description uses the terms "first", "second", etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first type of component may be referred to as a second type of component, and similarly, a second type of component may be referred to as a first type of component, without departing from the scope of the various examples. Both the first type of component and the second type of component may be components, and in some cases may be separate and distinct components.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of the present application.

Claims

A method for determining a big data cluster deployment plan, applied to a computing device, characterized in that the big data cluster is managed by a big data management platform, the computer rooms for deploying the big data cluster include multiple computer rooms, each computer room accommodates hosts, and the method includes:

Receive the input deployment requirements of the big data cluster, where the deployment requirements include deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms, and parameter information of the hosts on which the big data cluster is deployed;

Based on the deployment requirements and the category of the component to be deployed, determine a deployment plan for the component to be deployed, where the deployment plan includes a computer room where the component to be deployed is deployed and a host deployed in the computer room;

Output the deployment plan.
The method according to claim 1, characterized in that, based on the deployment requirements and the category of the component to be deployed, determining the deployment plan of the component to be deployed includes:

Based on the deployment requirements and the first deployment strategy, determine a deployment plan for the component to be deployed;

The first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component of the first type of component among the components to be deployed in the same computer room.
The method according to claim 2, characterized in that, based on the deployment requirements and the first deployment strategy, determining the deployment plan of the component to be deployed includes:

Determine the host that meets the deployment requirement information of the component to be deployed as the host where the component to be deployed is deployed;

Based on the computer room capacity information, the host on which the component to be deployed is deployed, and the first deployment strategy, determine the computer room to which that host belongs.
The method according to claim 3, characterized in that, before calculating the deployment plan of the component to be deployed based on the deployment requirement and the first deployment strategy, the method further includes:

Determine the number of hosts required for the first type component based on the deployment requirement information and the parameter information of the first type component;

Based on the computer room capacity information, it is determined that the number of hosts is greater than the number of hosts accommodated by each of the plurality of computer rooms.
The method according to any one of claims 2 to 4, characterized in that, among the components to be deployed, components other than the first type of components are deployed in the same computer room.
The method according to claim 1, characterized in that, based on the deployment requirements and the category of the component to be deployed, determining the deployment plan of the component to be deployed includes:

Determine the number of hosts required for the first type of component based on the deployment requirement information of the first type of component among the components to be deployed;

If the number of hosts accommodated in the multiple computer rooms is greater than or equal to the required number of hosts, determine a deployment plan for the component to be deployed based on the deployment requirements and the second deployment strategy;

The second deployment strategy is a strategy for deploying the first type of components in the same computer room.
The method according to claim 6, wherein the plurality of computer rooms include the first computer room and the second computer room;

Determining a deployment plan for the component to be deployed based on the deployment requirement and the second deployment strategy includes:

Determine the host that meets the deployment requirement information of the component to be deployed as the host where the component to be deployed is deployed;

Determine that the computer room to which the hosts of the first type of components belong is the first computer room, and determine that the computer room in which the second type of components (the components to be deployed other than the first type of components) are deployed is the second computer room, where the number of hosts accommodated in the first computer room is greater than or equal to the required number of hosts.
The method according to any one of claims 2 to 7, characterized in that the first type of components includes components based on the distributed file system HDFS and the big data resource scheduler YARN;

Among the components to be deployed, components other than the first type of components are components that are not based on the HDFS and the YARN.
The method of claim 5, further comprising:

For the first computer room and the second computer room in the plurality of computer rooms, determine the first data transmission amount of the first type component between the first computer room and the second computer room;

Determine the second data transmission amount of the first type component and the second type component between the first computer room and the second computer room;

Determine the amount of management plane data and the amount of control plane data between the first computer room and the second computer room;

Based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount, the bandwidth requirement between the first computer room and the second computer room is determined.
The method of claim 7, further comprising:

Determine the amount of data transmission between the first type component and the second type component, and determine the management plane data amount and control plane data amount between the first computer room and the second computer room;

Based on the data transmission volume, the management plane data volume, and the control plane data volume, bandwidth demand information between the first computer room and the second computer room is determined.
The method according to any one of claims 1 to 10, characterized in that the deployment requirement information of each component includes one or more of operating system requirement information, data volume or throughput.
The method according to any one of claims 1 to 11, characterized in that the parameter information includes the number of hosts of each model and one or more of operating system information, network information, or hardware information of the hosts of each model.
A device for determining a big data cluster deployment plan, applied to a computing device, characterized in that the big data cluster is managed by a big data management platform, the computer rooms for deploying the big data cluster include multiple computer rooms, each computer room accommodates hosts, and the device includes:

An acquisition module, configured to receive the input deployment requirements of the big data cluster, where the deployment requirements include deployment requirement information of the components to be deployed in the big data cluster, computer room capacity information of the multiple computer rooms, and parameter information of the hosts on which the big data cluster is deployed;

A determination module, configured to determine a deployment plan for the component to be deployed based on the deployment requirement and the category of the component to be deployed, where the deployment plan includes a computer room where the component to be deployed is deployed and a host deployed in the computer room;

Output the deployment plan.
The device according to claim 13, characterized in that the determining module is used to:

Based on the deployment requirements and the first deployment strategy, determine a deployment plan for the component to be deployed;

The first deployment strategy is a strategy of deploying the data storage part and the computing part of the same component of the first type of component among the components to be deployed in the same computer room.
The device according to claim 14, characterized in that the determining module is used to:

Determine the host that meets the deployment requirement information of the component to be deployed as the host where the component to be deployed is deployed;

Based on the computer room capacity information, the host on which the component to be deployed is deployed, and the first deployment strategy, determine the computer room to which that host belongs.
The device according to claim 15, characterized in that the determining module is also used to:

Before determining the deployment plan of the component to be deployed based on the deployment requirements and the first deployment strategy, determine the number of hosts required by the first type component based on the deployment requirement information and the parameter information of the first type component;

Based on the computer room capacity information, it is determined that the number of hosts is greater than the number of hosts accommodated by each of the plurality of computer rooms.
The device according to any one of claims 14 to 16, characterized in that, among the components to be deployed, components other than the first type of components are deployed in the same computer room.
The device according to claim 13, characterized in that the determining module is used to:

Determine the number of hosts required for the first type of component based on the deployment requirement information of the first type of component among the components to be deployed;

If the number of hosts accommodated in the multiple computer rooms is greater than or equal to the required number of hosts, determine a deployment plan for the component to be deployed based on the deployment requirements and the second deployment strategy;

The second deployment strategy is a strategy for deploying the first type of components in the same computer room.
The device according to claim 18, wherein the plurality of computer rooms include the first computer room and the second computer room;

The determination module is used for:

Determine the host that meets the deployment requirement information of the component to be deployed as the host where the component to be deployed is deployed;

Determine that the computer room to which the hosts of the first type of components belong is the first computer room, and determine that the computer room in which the second type of components (the components to be deployed other than the first type of components) are deployed is the second computer room, where the number of hosts accommodated in the first computer room is greater than or equal to the required number of hosts.
The device according to any one of claims 15 to 19, wherein the first type of components includes components based on the distributed file system HDFS and the big data resource scheduler YARN;

Among the components to be deployed, components other than the first type of components are components that are not based on the HDFS and the YARN.
The device according to claim 17, characterized in that the determining module is also used to:

For the first computer room and the second computer room in the plurality of computer rooms, determine the first data transmission amount of the first type component between the first computer room and the second computer room;

Determine the second data transmission amount of the first type component and the second type component between the first computer room and the second computer room;

Determine the amount of management plane data and the amount of control plane data between the first computer room and the second computer room;

Based on the first data transmission amount, the second data transmission amount, the management plane data amount, and the control plane data amount, the bandwidth requirement between the first computer room and the second computer room is determined.
The device according to claim 19, characterized in that the determining module is also used to:

Determine the amount of data transmission between the first type component and the second type component, and determine the management plane data amount and control plane data amount between the first computer room and the second computer room;

Based on the data transmission volume, the management plane data volume, and the control plane data volume, bandwidth demand information between the first computer room and the second computer room is determined.
A computing device cluster, characterized by including at least one computing device, each computing device including a processor and a memory;

The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the cluster of computing devices performs the method according to any one of claims 1 to 12.
A computer-readable storage medium, characterized in that it includes computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to any one of claims 1 to 12.
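The bandwidth determination recited in the claims above can be illustrated with a short sketch. The claims only state that the bandwidth requirement between the first and second computer rooms is determined based on four quantities; treating the requirement as their sum, the unit (Gbit/s), the `headroom` safety factor, and all class and field names below are assumptions made for illustration, not details taken from this application.

```python
from dataclasses import dataclass


@dataclass
class CrossRoomTraffic:
    """Traffic between the first and second computer rooms, in Gbit/s.

    Field names and units are hypothetical.
    """
    first_type: float       # first data transmission amount
    first_to_second: float  # second data transmission amount
    management_plane: float
    control_plane: float


def bandwidth_requirement(traffic: CrossRoomTraffic, headroom: float = 1.0) -> float:
    """Return an assumed bandwidth requirement between the two rooms.

    Assumption: the requirement is the sum of the four quantities,
    scaled by an optional safety factor (headroom >= 1.0).
    """
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    total = (
        traffic.first_type
        + traffic.first_to_second
        + traffic.management_plane
        + traffic.control_plane
    )
    return total * headroom
```

Under the second deployment strategy, where all first-type components sit in one computer room, the same function can be used with `first_type` set to 0, leaving only the traffic between first-type and second-type components plus the management plane and control plane data.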