TW201327205A - Managing method for hardware performance and cloud computing system

Info

Publication number: TW201327205A
Application number: TW100147600A
Authority: TW
Inventors: Ying-Chih Lu
Original assignee: Inventec Corp
Priority date: 2011-12-21
Filing date: 2011-12-21
Publication date: 2013-07-01

Abstract

A method for managing hardware performance and a cloud computing system are provided. The cloud computing system includes a plurality of node devices deployed in a plurality of node resource pools and a management node that performs the managing method. The method includes the following steps. The load of the node resource pools is detected to identify a bottleneck and the corresponding bottleneck resource pool. At least one conversion node is evaluated and selected from the node devices of the node resource pools other than the bottleneck resource pool. The at least one conversion node is then reassigned from its original node resource pool to the bottleneck resource pool.

Description

Hardware performance management method and cloud computing system

The present invention relates to performance management technology for cloud computing, and in particular to a method for managing hardware performance and a cloud computing system.

Cloud computing combines a large number of servers (also called nodes) over the Internet to form an integrated computer with high-speed computing and large storage capacity. It emphasizes using the network to obtain remote computing resources, storage resources, or services when local resources are limited. Through technologies such as virtualization and automation, cloud computing lets these nodes share resources or divide work, and users operate the web pages of these services through terminal interfaces such as a network and a browser to carry out various computations and tasks.

These numerous nodes are aggregated into a server group. Because the number of nodes is large, an important issue for many cloud computing systems today is how the server group can automatically relieve a bottleneck, and thereby provide higher performance, when a bottleneck in some kind of node resource degrades the overall performance of the system.

The invention provides a method for managing hardware performance and a cloud computing system. They detect whether a bottleneck has occurred in any of the node resource pools and automatically adjust and reassign the roles of the servers, so that the bottleneck is resolved automatically and the cloud computing system delivers higher performance.

The invention provides a method for managing hardware performance that is suitable for a cloud computing system. The cloud computing system includes a plurality of node devices, and the node devices are assigned to a plurality of node resource pools. The managing method includes the following steps. The load of the node resource pools is detected to identify a bottleneck condition and the bottleneck resource pool in which it occurs, the bottleneck resource pool being one of the node resource pools. At least one conversion node is evaluated and selected from the node devices of the node resource pools other than the bottleneck resource pool. The conversion node is then changed so that it is reassigned from its original node resource pool to the bottleneck resource pool.

In an embodiment of the invention, the managing method further includes the following steps. A normal threshold and a bottleneck threshold are set for each node resource pool. When the load of one of the node resource pools is lower than the corresponding normal threshold, that node resource pool is in a normal state. When the load of one of the node resource pools is higher than the corresponding bottleneck threshold, the bottleneck condition has occurred in that node resource pool, which becomes the bottleneck resource pool.

In an embodiment of the invention, evaluating the conversion node includes the following step. Based on the bottleneck thresholds, it is estimated whether, after the conversion node is reassigned from the original node resource pool to the bottleneck resource pool, the load of the original node resource pool and the load of the bottleneck resource pool will each be lower than its corresponding bottleneck threshold.

In an embodiment of the invention, changing the conversion node includes the following steps. A node-related database is searched to obtain the node-related data of the conversion node. The node-related data of the conversion node is adjusted so that the conversion node is changed from the original node resource pool to the bottleneck resource pool. The conversion node is isolated from the cloud computing system. The conversion node is adjusted according to the bottleneck resource pool. Finally, the conversion node rejoins the cloud computing system.
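The five sub-steps above can be sketched as a single ordered routine. This is only an illustrative sketch: the `Node` class, the `node_db` dictionary standing in for the node-related database, and the `POOL_SERVICES` mapping are hypothetical names that do not appear in the source.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    pool: str                       # pool the node currently serves
    in_cluster: bool = True         # False while the node is isolated
    services: list = field(default_factory=list)

# Hypothetical in-memory stand-in for the node-related database DB.
node_db = {}

# Hypothetical service sets per pool; the source does not list them.
POOL_SERVICES = {"storage": ["storage-daemon"], "compute": ["hypervisor"]}

def change_conversion_node(node: Node, target_pool: str) -> list:
    """Run the five sub-steps in order and return a log of what was done."""
    log = []
    record = node_db[node.name]          # 1. retrieve the node-related data
    log.append("retrieve")
    record["node_type"] = target_pool    # 2. rewrite the pool membership
    log.append("update-record")
    node.in_cluster = False              # 3. isolate the node from the system
    node.services.clear()
    log.append("isolate")
    node.pool = target_pool              # 4. reconfigure for the new pool
    node.services = list(POOL_SERVICES[target_pool])
    log.append("reconfigure")
    node.in_cluster = True               # 5. rejoin the cloud computing system
    log.append("rejoin")
    return log

node_122_2 = Node("122_2", "compute", services=["hypervisor"])
node_db["122_2"] = {"node_type": "compute"}
steps = change_conversion_node(node_122_2, "storage")
```

Updating the database record before isolating the node follows the order in which the embodiment lists the steps; a real system might order or guard these steps differently.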

In an embodiment of the invention, isolating the conversion node includes the following steps. The virtual machines on the conversion node are migrated from the conversion node to other node devices of the original node resource pool, and the service programs executed by the conversion node are shut down.
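A minimal sketch of this isolation step follows. The dictionary layout and the round-robin placement of virtual machines are assumptions made for illustration; the source only says the virtual machines move to other node devices of the original pool.

```python
from itertools import cycle

def isolate(node, peers):
    """Move every VM off `node` onto `peers` round-robin, then stop its services."""
    if node["vms"] and not peers:
        raise ValueError("no peer node available to receive virtual machines")
    targets = cycle(peers)
    while node["vms"]:
        # Migrate the oldest VM to the next peer in the rotation.
        next(targets)["vms"].append(node["vms"].pop(0))
    # Shut down every service program the node was running.
    node["services"].clear()

conversion = {"vms": ["vm-a", "vm-b", "vm-c"], "services": ["hypervisor"]}
peer_1 = {"vms": [], "services": ["hypervisor"]}
peer_2 = {"vms": ["vm-x"], "services": ["hypervisor"]}
isolate(conversion, [peer_1, peer_2])
```

A production scheduler would more likely place each virtual machine according to the free capacity of the receiving node rather than in strict rotation.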

In an embodiment of the invention, isolating the conversion node further includes the following step. The node-related database is configured to isolate the node-related data of the conversion node.

In an embodiment of the invention, the load of the node resource pools includes the computing load of each node resource pool, its space load, and/or a combination thereof.

In an embodiment of the invention, the node resource pools include a service resource pool, a computing resource pool, a storage resource pool, and/or a combination thereof.

From another point of view, the invention provides a cloud computing system that includes a plurality of node devices and a management node. The node devices are coupled to one another through a network and are assigned to a plurality of node resource pools. The management node is coupled to the node devices through the network in order to detect the load of the node resource pools and to identify a bottleneck condition and the bottleneck resource pool in which it occurs, the bottleneck resource pool being one of the node resource pools. The management node evaluates and selects at least one conversion node from the node devices of the node resource pools other than the bottleneck resource pool, and changes the conversion node so that it is reassigned from its original node resource pool to the bottleneck resource pool.

For the remaining implementation details of the cloud computing system, refer to the description above; they are not repeated here.

In summary, the cloud computing system of the embodiments sets a separate load upper limit for each node resource pool and monitors the operating condition of each node resource pool. When a bottleneck occurs in a particular node resource pool and no redundant node is available to help, the cloud computing system can select some nodes from a node resource pool that is operating normally and has no bottleneck, and put them into the affected node resource pool (in other words, it reassigns the roles of some nodes), thereby reducing the occurrence of bottlenecks. By automatically adjusting and reassigning the roles of these servers, the cloud computing system can resolve bottlenecks automatically and improve its hardware operating performance.

To make the above features and advantages of the invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, elements, components, and symbols bearing the same reference numerals in the drawings and the embodiments denote the same or similar parts.

FIG. 1 is a schematic diagram of a cloud computing system 100 according to an embodiment of the invention. For example, this embodiment uses a container data center that provides Infrastructure as a Service (IaaS) as the cloud computing system 100. As shown in FIG. 1, the cloud computing system 100 of this embodiment may include at least one container. Each container includes multiple racks, each rack has multiple slots, and each slot may hold one or more servers (also called node devices). Because the containers have similar compositions, one container is used as the example in this embodiment for ease of description.

Referring to FIG. 1, the cloud computing system 100 includes a plurality of node devices. When the cloud operating system of the cloud computing system 100 is deployed, these node devices are already assigned to several kinds of node resource pools. In other words, the node devices can be classified into three node types, namely a service resource pool 110, a computing resource pool 120, and a storage resource pool 130, and the service resource pool 110 can be further subdivided according to its service functions. The node resource pools therefore include the service resource pool 110, the computing resource pool 120, the storage resource pool 130, and/or a combination thereof.

In this embodiment, the service resource pool 110 includes i node devices 112_1~112_i, the computing resource pool 120 includes j node devices 122_1~122_j, and the storage resource pool 130 includes k node devices 132_1~132_k, where i, j, and k are all non-negative integers. In this embodiment the node devices 112_1~112_i, 122_1~122_j, and 132_1~132_k may also be called service nodes 112_1~112_i, computing nodes 122_1~122_j, and storage nodes 132_1~132_k, respectively. All of these node devices are coupled to a layer 2 switch 140, so that they are coupled to one another, communicate, and exchange information through a local area network. Those applying this embodiment may also couple the node devices through other kinds of networks, such as the Internet or a wireless network, which are not described further here.

The service resource pool 110 and the service nodes in it can be subdivided by service function, for example a physical installer service, a physical manager service, a log processing service, a virtual manager service, an Application Programming Interface (API) service, a virtual resource provisioning service, a database service, a storage manager service, a load balance service, and a security service. The computing resource pool 120 and the computing nodes in it provide computing services. The storage resource pool 130 and the storage nodes in it provide storage services.

In other words, the service nodes 112_1~112_i mainly provide users with the services of many virtual machines (VMs). These virtual machines run on the computing resource pool 120 formed by the computing nodes 122_1~122_j, and the storage space they require is provided by the storage resource pool 130 formed by the storage nodes 132_1~132_k. Each service node 112_1~112_i provides users with a different service depending on the software it runs. By contrast, the computing nodes 122_1~122_j in the computing resource pool 120, and likewise the storage nodes 132_1~132_k in the storage resource pool 130, each run similar software, which makes them easy to integrate with one another for large-scale computation or data storage.

The cloud computing system also includes a management node that monitors and adjusts the load of each node device. The management node may be one of the node devices or a separate monitoring device independent of the node devices. In this embodiment, the node device 112_2 in the service resource pool 110 serves as the management node. The management node 112_2 includes a bottleneck monitoring module 150, a node selection module 160, a data access module 170, a node isolation module 180, a node deployment module 190, and a node addition module 195; these functional modules are described in detail below. In addition, when the cloud operating system deploys the node devices, it obtains the node-related data corresponding to each node device and stores this data together in a node-related database DB for reference by the management node 112_2. In this embodiment the node-related database DB is located in the service node 112_1, but the invention is not limited to this; in other embodiments the node-related database DB may be placed in any node device.

Although some node devices are dedicated to a specific role in the cloud computing system 100 (for example, dedicated to a specific node resource pool), other kinds of node devices can support several kinds of cloud resources and are not restricted to one role. For example, some node devices have computing performance far superior to that of other node devices but are far weaker at providing services or storing data; such node devices can be assigned specifically to the computing resource pool 120 as computing nodes. Many node devices, however, have good computing performance and can also provide good service and data storage, so they can serve as service nodes, storage nodes, or computing nodes. That is, the hardware design of such a node device does not restrict it to a single role in the cloud computing system 100. Although such a versatile node device is assigned to a specific node resource pool when the cloud operating system is configured, the management node 112_2 can change the role of these node devices under certain circumstances.

A "bottleneck" refers to the situation in which the performance load or space load of a node resource pool of the cloud computing system 100 is too heavy and no spare node is available to help; the cloud computing system 100 is then said to be in a bottleneck condition. For example, when the average utilization of the central processing units (CPUs) of the computing nodes 122_1~122_j across the whole computing resource pool 120 is too high, the condition is called a performance bottleneck. As another example, when the storage space remaining on the storage nodes 132_1~132_k in the storage resource pool 130 is about to run out, the condition is called a space bottleneck.

When a bottleneck has been detected in the cloud computing system 100 and the whole container has no spare node left, the cloud computing system 100 of this embodiment can select certain node devices from a node resource pool that has no bottleneck and change their roles, so that the selected node devices become members of the node resource pool in which the bottleneck occurred, thereby eliminating the bottleneck of the whole cloud computing system 100. Naturally, the embodiment must consider whether the node devices being changed can carry the workload of the node resource pool they are switched into.

The method for managing hardware performance is described below in terms of the cloud computing system 100 to which it applies. FIG. 2 is a flowchart of the method for managing hardware performance according to an embodiment of the invention. Referring to FIG. 1 and FIG. 2 together, in step S210 the bottleneck monitoring module 150 in the management node 112_2 detects the load of the node resource pools 110~130, and in step S220 the bottleneck monitoring module 150 determines whether a bottleneck has occurred and, if so, in which node resource pool. The node resource pool in which the bottleneck occurs is called the bottleneck resource pool.

In this embodiment, the management node 112_2 sets a normal threshold and a bottleneck threshold for each of the node resource pools 110~130 in order to judge the current operating condition of each node resource pool. In detail, the bottleneck monitoring module 150 of this embodiment detects the load of each node device, which includes the node device's computing load and its space load (that is, its used storage space), and uses the node-related database DB to compute the average load of each node resource pool 110~130. The load of a node resource pool thus includes the average computing load of the node devices in that pool, their space load, and/or a combination thereof.
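The aggregation of per-node loads into pool loads might look like the sketch below. The sample tuples and the `pool_loads` helper are hypothetical; the space load follows the (used space / total storage space) ratio that this embodiment uses for the space bottleneck.

```python
from collections import defaultdict

# Hypothetical per-node samples: (pool, CPU utilization, used space, total space).
samples = [
    ("compute", 0.60, 200, 1000),
    ("compute", 0.80, 300, 1000),
    ("storage", 0.20, 900, 1000),
    ("storage", 0.30, 950, 1000),
]

def pool_loads(samples):
    """Return {pool: (average CPU utilization, used space / total space)}."""
    cpu = defaultdict(list)
    used = defaultdict(int)
    total = defaultdict(int)
    for pool, cpu_util, used_space, total_space in samples:
        cpu[pool].append(cpu_util)
        used[pool] += used_space
        total[pool] += total_space
    return {p: (sum(cpu[p]) / len(cpu[p]), used[p] / total[p]) for p in cpu}

loads = pool_loads(samples)
```

With these made-up samples, the storage pool has a low average CPU utilization but a space load above 90%, which is the kind of situation the embodiment describes as a space bottleneck.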

In this embodiment, the normal threshold for the performance bottleneck is set to 70% and the bottleneck threshold for the performance bottleneck is set to 80%. That is, when the average CPU utilization of a node resource pool is below 70% and one of its node devices undergoes a role change, the average CPU utilization of that original node resource pool must still be below 70% after the change. Conversely, when the average CPU utilization of a node resource pool is above 80% and a node device is reassigned to it, the average CPU utilization of that bottleneck resource pool after the change should be below 80% for the evaluation to pass.

In this embodiment, the normal threshold for the space bottleneck is set to 80% and the bottleneck threshold for the space bottleneck is set to 90%. That is, when the ratio (used space / total storage space) of a node resource pool is below 80% and one of its node devices undergoes a role change, the ratio for that original node resource pool must still be below 80% after the change. Conversely, when the ratio (used space / total storage space) of a node resource pool is above 90% and a node device is reassigned to it, the ratio for that bottleneck resource pool after the change should be below 90% for the evaluation to pass.

After the load of each node resource pool has been computed, the bottleneck monitoring module 150 judges a node resource pool to be in the normal state, with no bottleneck, when its load is below the corresponding normal threshold. When the load of a node resource pool is above the corresponding normal threshold but below the corresponding bottleneck threshold, the bottleneck monitoring module 150 judges the node resource pool to be under high load without yet having reached the bottleneck condition. When the load of a node resource pool has reached or exceeded its bottleneck threshold, the corresponding load of the pool is close to full, and the bottleneck monitoring module 150 judges that a bottleneck has occurred in that node resource pool. A bottleneck is, for example, the case where the processing capacity of the computing resource pool 120 is insufficient for the current amount of computation, or where the available storage space in the storage resource pool 130 has fallen below the preset reserve.
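The three-way classification can be sketched as follows, using the threshold values this embodiment gives (70%/80% for performance, 80%/90% for space). The `THRESHOLDS` table and the `classify` function are hypothetical names, and treating a load exactly equal to the normal threshold as high load is an assumption, since the text does not cover that case.

```python
# (normal threshold, bottleneck threshold) per kind of load, from this embodiment.
THRESHOLDS = {
    "performance": (0.70, 0.80),
    "space": (0.80, 0.90),
}

def classify(load: float, kind: str) -> str:
    """Map a pool load in the range 0.0 to 1.0 onto one of the three states."""
    normal, bottleneck = THRESHOLDS[kind]
    if load >= bottleneck:
        return "bottleneck"   # reached or exceeded the bottleneck threshold
    if load < normal:
        return "normal"       # below the normal threshold
    return "high-load"        # between the two thresholds (assumption at equality)

states = [
    classify(0.65, "performance"),
    classify(0.75, "performance"),
    classify(0.80, "performance"),
    classify(0.85, "space"),
    classify(0.925, "space"),
]
```

Keeping a gap between the two thresholds gives a band in which a pool neither donates nodes nor receives them, which matches the role the high-load state plays in the selection step.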

For ease of description, this embodiment assumes that a space bottleneck has occurred in the storage resource pool 130. Once the bottleneck monitoring module 150 has determined that a bottleneck has occurred and has identified the corresponding node resource pool (here the storage resource pool 130), the flow proceeds from step S220 to step S230, in which the node selection module 160 evaluates and selects at least one conversion node from the node devices of the node resource pools other than the bottleneck resource pool. In other words, the node selection module 160 selects node devices that can serve as storage nodes from the other node resource pools that have no bottleneck (for example, the service resource pool 110 or the computing resource pool 120), and evaluates whether changing the roles of these node devices will reliably leave the cloud computing system 100 free of bottlenecks.

To keep node devices from being switched back and forth between two node resource pools that are both already under high load, the node selection module 160 of this embodiment selects the node devices whose roles are to be changed only from node resource pools in the normal state (that is, pools whose load is below the normal threshold) and never from node resource pools under high load. In addition, the node selection module 160 must estimate, based on the bottleneck thresholds, whether after the role change (that is, after the conversion node is reassigned from the original node resource pool to the bottleneck resource pool) the loads of both the original node resource pool and the bottleneck resource pool will be below their corresponding bottleneck thresholds, so that the cloud computing system 100 can be expected to have no bottleneck after the role change. The node selection module 160 of this embodiment can directly compute the average CPU load of the node devices in the same node resource pool, or directly check the used storage space, to determine whether the bottleneck threshold has been reached or exceeded.
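A sketch of this evaluation is shown below. The projection model is an assumption, not something the source specifies: it supposes that a pool's total work (or used space) stays constant and is spread evenly over equally sized nodes, so the average load scales with the node count. `Pool`, `projected`, and `can_donate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    load: float        # current average load, 0.0 to 1.0
    nodes: int         # number of node devices in the pool
    normal: float      # normal threshold
    bottleneck: float  # bottleneck threshold

def projected(pool: Pool, delta: int) -> float:
    """Average load after the pool gains (+1) or loses (-1) a node.

    Assumes total work stays constant and spreads evenly (illustrative only).
    """
    return pool.load * pool.nodes / (pool.nodes + delta)

def can_donate(donor: Pool, target: Pool) -> bool:
    """True if moving one node from donor to target passes the evaluation."""
    if donor.nodes <= 1:
        return False                      # never empty a pool
    if donor.load >= donor.normal:
        return False                      # donor must be in the normal state
    return (projected(donor, -1) < donor.bottleneck
            and projected(target, +1) < target.bottleneck)

compute = Pool("compute", 0.50, 5, 0.70, 0.80)
service = Pool("service", 0.75, 6, 0.70, 0.80)
storage = Pool("storage", 0.92, 4, 0.80, 0.90)
donors = [p.name for p in (compute, service) if can_donate(p, storage)]
```

In this made-up example the service pool is excluded because it is already above its normal threshold, while the compute pool qualifies because both projected loads stay under the bottleneck thresholds.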

For example, suppose the node selection module 160 selects the computing node 122_2 from the computing resource pool 120, which is in the normal state. If the computing node 122_2 can serve as a storage node and the evaluation by the node selection module 160 shows that it achieves the effect described above, the node selection module 160 treats this node device as the conversion node and continues with the steps below.

In step S240, the management node 112_2 changes the node-related data of the conversion node 122_2 in the node-related database DB, thereby reassigning the conversion node 122_2 from its original node resource pool (the computing resource pool 120) to the bottleneck resource pool (the storage resource pool 130). Step S240 can be broken down into steps S250 to S290, which are described one by one below with reference to the node-related information table in the node-related database DB.

When the cloud operating system of the cloud computing system 100 configures the node devices, it records the node-related information of each node device in the node-related information table. This information can be obtained in several ways. For example, when the basic input/output system (BIOS) of each node device runs its power-on self-test (POST), it dynamically obtains node-related data (for example, data on the central processing unit, memory, hard disks, and network cards), obtains further node-related data (for example, the product data of the node device, BIOS information, and node type) through, for example, SMBIOS data structures (types 0, 1, 2, and OEM types) and the MAC address in the network card EEPROM, and finally sends this data to the baseboard management controller (BMC) of the node device via IPMI OEM commands. The BMC can also dynamically obtain information about the BMC network card, such as its MAC address, IP address, and bandwidth, to supplement the node-related information. Table (1) is used below as an example of the node-related database DB and the node-related information in it. The five records in Table (1) are taken, in order, from the service nodes 112_1 and 112_i, the computing nodes 122_1 and 122_2, and the storage node 132_1 of FIG. 1.

Table (1) has 10 columns, which record for each node device the MAC address of the baseboard management controller (BMC) network card, the IP address and bandwidth of the BMC network card, the MAC address of the system network card, the IP address and bandwidth of the system network card, the processor information (model and clock rate), the memory information, the hard disk information, the node location, the node type, and the server type. The IP address of the system network card is obtained through network boot.

Taking the node-related information of the node device 112_1 as an example, the MAC address of its BMC network card is "00:A0:D1:EC:F8:B1", the IP address assigned to the BMC network card is "10.1.0.1", and the bandwidth of the BMC network card is 100 Mbps (bps = bits per second). The MAC address of the system network card of the node device 112_1 is "00:A0:D1:EA:34:E1", its IP address is "10.1.0.11", and its bandwidth is 1000 Mbps. The model of the central processing unit of the node device 112_1 is "Intel(R) Xeon(R) CPU E5540", and its operating frequency is 2530 MHz. The node device 112_1 includes four memory modules, DIMM1~DIMM4, each with a capacity of 8 GB. In addition, the hard disk of the node device 112_1 has carrier number 1, is of type SAS (Serial Attached SCSI; SCSI = Small Computer System Interface), and has a capacity of 1 TB, a rotation speed of 7200 RPM (revolutions per minute), and a cache capacity of 16 MB.
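As an illustration only, the node-related information described above for the node device 112_1 could be held in a record such as the following. The class and field names are hypothetical and do not come from the embodiment; the values are those stated in the text, and fields whose values the text does not give are left unset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nic:
    mac: str
    ip: str
    bandwidth_mbps: int


@dataclass
class Disk:
    carrier: int
    interface: str
    capacity_tb: int
    rpm: int
    cache_mb: int


@dataclass
class NodeRecord:
    """One row of the node-related information table (field names are hypothetical)."""
    bmc_nic: Nic
    system_nic: Nic
    cpu_model: str
    cpu_mhz: int
    dimm_gb: List[int]
    disks: List[Disk]
    node_type: str
    location: Optional[str] = None      # value not stated in the text
    server_type: Optional[str] = None   # value not stated in the text


node_112_1 = NodeRecord(
    bmc_nic=Nic("00:A0:D1:EC:F8:B1", "10.1.0.1", 100),
    system_nic=Nic("00:A0:D1:EA:34:E1", "10.1.0.11", 1000),
    cpu_model="Intel(R) Xeon(R) CPU E5540",
    cpu_mhz=2530,
    dimm_gb=[8, 8, 8, 8],               # DIMM1~DIMM4, "8G" read here as 8 GB
    disks=[Disk(carrier=1, interface="SAS", capacity_tb=1, rpm=7200, cache_mb=16)],
    node_type="service node",           # 112_1 is a service node in FIG. 1
)

total_memory_gb = sum(node_112_1.dimm_gb)
```

Grouping the two network cards and the hard disk into their own small types keeps the ten fields of the table recognizable while letting a node carry more than one disk.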

Returning to FIG. 2 with reference to FIG. 1, in step S250 the data access module 170 searches the node-related database DB and obtains the node-related data of the conversion node 122_2. That is, the data access module 170 can obtain from the node-related database DB node-related data such as that shown in Table (2) below.

In step S260, the data access module 170 adjusts the node-related data of the conversion node 122_2 in Table (2) and stores the adjusted data back to the node-related database DB in the service node 112_1. In this embodiment, the fields "node type" and "server type" in Table (2) are changed from the original "computing node" to "storage node" (as shown in Table (3) below), so that the conversion node 122_2 is modified from the computing node 122_2 of the original node resource pool (the computing resource pool 120) to a storage node of the bottleneck resource pool (the storage resource pool 130). In other embodiments, the data access module 170 also sets the node-related database DB at this time to isolate the node-related data of the conversion node 122_2, so that other node devices cannot access the conversion node 122_2.
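A minimal sketch of steps S250 and S260, assuming the node-related database DB can be modeled as an in-memory dictionary keyed by node identifier. The function name, the dictionary layout, and the `isolated` flag are assumptions made for illustration and are not part of the embodiment.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the node-related database DB.
node_db: Dict[str, dict] = {
    "122_2": {
        "node_type": "computing node",
        "server_type": "computing node",
        "isolated": False,
    },
}


def retype_node(db: Dict[str, dict], node_id: str, new_type: str,
                isolate: bool = True) -> dict:
    """Retrieve a node record, change its type fields, and store it back."""
    record = dict(db[node_id])        # S250: retrieve a copy of the record
    record["node_type"] = new_type    # S260: adjust "node type"
    record["server_type"] = new_type  # S260: adjust "server type"
    record["isolated"] = isolate      # optional: hide the record from other nodes
    db[node_id] = record              # store the adjusted record back to the DB
    return record


updated = retype_node(node_db, "122_2", "storage node")
```

Working on a copy and writing it back in one assignment mirrors the retrieve, adjust, and store-back order of the text.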

Then, in step S270, the node isolation module 180 isolates the conversion node 122_2 from the cloud computing system 100. In detail, the node isolation module 180 performs several procedures to isolate the conversion node 122_2 from the cloud computing system 100. For example, the node isolation module 180 migrates the virtual machines (VMs) running on the conversion node 122_2 from the conversion node to the other node devices 122_1~122_j of the computing resource pool 120. After the virtual machines have been migrated, the node isolation module 180 shuts down all service programs executing on the conversion node 122_2.
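The isolation procedure described above can be sketched as follows. The text does not say how a destination node is chosen for each virtual machine, so the "fewest virtual machines first" rule here is an assumption, as are the function name and the dictionary-based bookkeeping.

```python
from typing import Dict, List


def isolate_node(vms_by_node: Dict[str, List[str]],
                 services_by_node: Dict[str, List[str]],
                 node_id: str) -> None:
    """Migrate every VM off node_id, then stop the services running on it."""
    targets = [n for n in vms_by_node if n != node_id]
    if not targets:
        raise ValueError("no other node device can receive the virtual machines")
    while vms_by_node[node_id]:
        vm = vms_by_node[node_id].pop(0)
        # Assumed placement policy: the node currently hosting the fewest VMs.
        target = min(targets, key=lambda n: len(vms_by_node[n]))
        vms_by_node[target].append(vm)
    # Services are shut down only after all VMs have been moved.
    services_by_node[node_id].clear()


vms = {"122_1": ["vm-a"], "122_2": ["vm-b", "vm-c", "vm-d"], "122_3": []}
services = {"122_2": ["svc-1", "svc-2"]}
isolate_node(vms, services, "122_2")
```

A real migration would also have to check that the destination has enough capacity; that check is omitted here.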

In step S280, the node deployment module 190 adjusts the conversion node 122_2 according to the bottleneck resource pool (the storage resource pool 130). The node deployment module 190 redeploys the conversion node 122_2 according to the adjusted node type/server type. That is, it installs on the conversion node 122_2 the operating system required by the bottleneck resource pool (the storage resource pool 130) and, after the operating system has been installed, installs the service packages that every storage node must have, so that the conversion node 122_2 meets the requirements of the storage resource pool 130.

Finally, FIG. 3 is another schematic diagram illustrating the cloud computing system 100 according to an embodiment of the present invention. As shown in FIG. 3, and with reference to FIG. 2, in step S290 the node adding module 195 rejoins the conversion node 122_2 to the cloud computing system 100, converting it from the computing node 122_2 of the original node resource pool (the computing resource pool 120) into the storage node 132_x in the bottleneck resource pool (the storage resource pool 130), as indicated by the dashed arrow 300. At this time, the data access module 170 also sets the node-related database DB to reopen the node-related data of the former conversion node 122_2 (that is, the storage node 132_x of FIG. 3), so that other node devices can access the storage node 132_x.
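Taken together, steps S250 through S290 form a fixed sequence. The sketch below shows only that ordering; the pool and database structures, the field names, and the function name are hypothetical, and the operating system and service package installation of step S280 is reduced to a single marker field.

```python
from typing import Dict, List


def convert_node(pools: Dict[str, List[str]], db: Dict[str, dict],
                 node_id: str, source: str, target: str) -> List[str]:
    """Move node_id from the source pool to the target pool, step by step."""
    steps: List[str] = []

    record = db[node_id]               # S250: retrieve node-related data
    steps.append("S250")

    record["node_type"] = target       # S260: adjust the recorded node type
    steps.append("S260")

    pools[source].remove(node_id)      # S270: isolate the node from the system
    record["online"] = False
    steps.append("S270")

    record["deployed_for"] = target    # S280: stands in for OS and package install
    steps.append("S280")

    pools[target].append(node_id)      # S290: rejoin the system in the new pool
    record["online"] = True
    steps.append("S290")
    return steps


pools = {"computing": ["122_1", "122_2"], "storage": ["132_1"]}
db = {"122_2": {"node_type": "computing", "online": True}}
performed = convert_node(pools, db, "122_2", "computing", "storage")
```

The sketch keeps the identifier 122_2 after the move for simplicity, whereas FIG. 3 relabels the node as storage node 132_x.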

In summary, the cloud computing system of the embodiments of the present invention sets a separate load upper limit for each node resource pool and monitors the operation of each node resource pool. When a bottleneck occurs in a particular node resource pool and no redundant nodes are available to support it, the cloud computing system can select some nodes from the node resource pools that are operating normally without a bottleneck and put them into that particular node resource pool (in other words, it reassigns the roles of some nodes), thereby reducing the occurrence of bottlenecks. By automatically adjusting and reassigning the roles of these servers, the cloud computing system can resolve bottlenecks automatically and improve its hardware performance.
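The threshold-based detection and node selection summarized above can be sketched as follows. The embodiment does not fix a load formula, so this example assumes identical nodes and defines a pool's load as its total demand divided by its node count. The `Pool` type, the "intermediate" label for loads between the two thresholds, and the function names are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    name: str
    nodes: int
    demand: float                # assumed unit: capacity of one node
    normal_threshold: float
    bottleneck_threshold: float

    def load(self, nodes: Optional[int] = None) -> float:
        """Load of the pool, optionally evaluated for a different node count."""
        count = self.nodes if nodes is None else nodes
        return self.demand / count


def classify(pool: Pool) -> str:
    load = pool.load()
    if load > pool.bottleneck_threshold:
        return "bottleneck"
    if load < pool.normal_threshold:
        return "normal"
    return "intermediate"        # label chosen here; not named in the text


def nodes_to_convert(donor: Pool, bottleneck: Pool) -> Optional[int]:
    """Smallest node count that leaves both pools under their bottleneck thresholds."""
    for k in range(1, donor.nodes):          # always leave the donor one node
        donor_ok = donor.load(donor.nodes - k) < donor.bottleneck_threshold
        target_ok = bottleneck.load(bottleneck.nodes + k) < bottleneck.bottleneck_threshold
        if donor_ok and target_ok:
            return k
    return None


storage = Pool("storage", nodes=4, demand=3.8,
               normal_threshold=0.6, bottleneck_threshold=0.9)
computing = Pool("computing", nodes=10, demand=4.0,
                 normal_threshold=0.6, bottleneck_threshold=0.9)
needed = nodes_to_convert(computing, storage)
```

Checking the donor pool as well as the bottleneck pool reflects the requirement that the reallocation must not create a new bottleneck in the pool that gives up nodes.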

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

100: cloud computing system

110: service resource pool

112_1~112_i: service nodes

120: computing resource pool

122_1~122_j: computing nodes

130: storage resource pool

132_1~132_k: storage nodes

140: switch

150: bottleneck monitoring module

160: node selection module

170: data access module

180: node isolation module

190: node deployment module

195: node adding module

300: dashed arrow

DB: node-related database

S210~S290: steps

FIG. 1 is a schematic diagram illustrating a cloud computing system according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a hardware performance management method according to an embodiment of the present invention.

FIG. 3 is another schematic diagram illustrating the cloud computing system according to an embodiment of the present invention.

Claims

A hardware performance management method, applicable to a cloud computing system that includes a plurality of node devices configured in a plurality of node resource pools, the management method comprising: detecting a load of the node resource pools to determine a bottleneck phenomenon and a bottleneck resource pool corresponding to the bottleneck phenomenon, wherein the node resource pools include the bottleneck resource pool; evaluating and selecting at least one conversion node from the node devices of the node resource pools other than the bottleneck resource pool; and changing the at least one conversion node so that the at least one conversion node is reallocated from its original node resource pool to the bottleneck resource pool.

The management method of claim 1, further comprising: setting a normal threshold and a bottleneck threshold for each of the node resource pools; when the load of one of the node resource pools is lower than the corresponding normal threshold, determining that the node resource pool is in a normal state; and when the load of one of the node resource pools is higher than the corresponding bottleneck threshold, determining that the bottleneck phenomenon occurs in that node resource pool, which becomes the bottleneck resource pool.

The management method of claim 2, wherein evaluating the at least one conversion node comprises: estimating the at least one conversion node according to the bottleneck thresholds, such that after the at least one conversion node is reallocated from the original node resource pool to the bottleneck resource pool, the load of the original node resource pool and the load of the bottleneck resource pool are each smaller than the corresponding bottleneck threshold.

The management method of claim 1, wherein changing the at least one conversion node comprises: retrieving a node-related database and obtaining node-related data of the at least one conversion node; adjusting the node-related data of the at least one conversion node so that the at least one conversion node is modified from the original node resource pool to the bottleneck resource pool; isolating the at least one conversion node from the cloud computing system; adjusting the at least one conversion node according to the bottleneck resource pool; and rejoining the at least one conversion node to the cloud computing system.

The management method of claim 4, wherein isolating the at least one conversion node comprises: migrating a plurality of virtual machines of the at least one conversion node from the at least one conversion node to the other node devices of the original node resource pool; and shutting down a plurality of service programs executed by the at least one conversion node.

The management method of claim 5, wherein isolating the at least one conversion node further comprises: setting the node-related database to isolate the node-related data of the at least one conversion node.

The management method of claim 1, wherein the load of the node resource pools includes an operational load, a spatial load, and/or a combination thereof of the node resource pools.

The management method of claim 1, wherein the node resource pools comprise a service resource pool, a computing resource pool, a storage resource pool, and/or a combination thereof.

A cloud computing system, comprising: a plurality of node devices coupled to each other through a network and configured in a plurality of node resource pools; and a management node coupled to the node devices, which detects a load of the node resource pools to determine a bottleneck phenomenon and a bottleneck resource pool corresponding to the bottleneck phenomenon, wherein the node resource pools include the bottleneck resource pool, and wherein the management node evaluates and selects at least one conversion node from the node devices of the node resource pools other than the bottleneck resource pool, and changes the at least one conversion node so that the at least one conversion node is reallocated from its original node resource pool to the bottleneck resource pool.

The cloud computing system of claim 9, wherein the management node comprises: a bottleneck monitoring module that sets a normal threshold and a bottleneck threshold for each of the node resource pools, determines that one of the node resource pools is in a normal state when the load of that node resource pool is lower than the corresponding normal threshold, and determines that the bottleneck phenomenon occurs in one of the node resource pools, which becomes the bottleneck resource pool, when the load of that node resource pool is higher than the corresponding bottleneck threshold; a node selection module that estimates the at least one conversion node according to the bottleneck thresholds, wherein after the at least one conversion node is reallocated from the original node resource pool to the bottleneck resource pool, the load of the original node resource pool and the load of the bottleneck resource pool are each smaller than the corresponding bottleneck threshold; a data access module that retrieves a node-related database to obtain node-related data of the at least one conversion node and adjusts the node-related data so that the at least one conversion node is modified from the original node resource pool to the bottleneck resource pool; a node isolation module that isolates the at least one conversion node from the cloud computing system; a node deployment module that adjusts the at least one conversion node according to the bottleneck resource pool; and a node adding module that rejoins the at least one conversion node to the cloud computing system.