TWI516956B

TWI516956B - Adaptive quick response controlling system for software defined storage system for improving performance parameter

Info

Publication number: TWI516956B
Application number: TW103122094A
Authority: TW
Inventors: 黃明仁; 黃純芳; 石宗民; 陳文賢
Original assignee: 先智雲端數據股份有限公司
Priority date: 2014-06-26
Filing date: 2014-06-26
Publication date: 2016-01-11
Also published as: TW201600977A

Description

Adaptive quick response control system for a software-defined storage system to improve performance parameters

The present invention relates to a control system for software-defined storage, and more particularly to a control system that enables software-defined storage to meet the specific performance metrics required by a service level agreement.

Cloud services have become very popular over the last decade. Built on cloud computing, they deliver services or products without adding to the burden on the client side. Cloud computing involves a large number of host computers connected to one another through a communication network such as the Internet, and it relies on resource sharing to achieve coherence and economies of scale. Conceptually, cloud computing is an infrastructure that combines network infrastructure with shared-resource services. Among all shared services, memory and storage are the two in greatest demand, because popular applications such as video streaming require huge amounts of data storage. While a cloud service is running, memory and storage management is therefore essential to maintaining normal quality of service for customers.

For example, a server that provides cloud services typically manages or is connected to several hard disks. When customers use the server, data is read from or written to those hard disks. Response-time delays caused by the limitations of the hard disk system create problems in meeting service requirements. In normal operation, response times become excessive when the access speed demanded by applications (i.e., the workload) exceeds what the hard disk system can deliver. The maximum load the hard disk system can sustain is usually the bottleneck of the whole cloud service system. In other words, the input/output operations per second (IOPS) of the hard disk system cannot meet external demand. To address this, workload must be removed or reduced so that the required server performance is achieved and improved. In practice, part of the workload can be taken over by other servers (if any) or by additional hard disks, which are brought online automatically or manually to support the existing hard disks. Whichever approach is used, the added cost consists of the large number of hard disks held in reserve for unpredictable workloads and the extra power consumed by the additional hardware. Economically, this is not worthwhile. However, a system's service level agreement usually specifies a maximum allowable latency or a minimum IOPS, and these must be met. For operators running cloud services on limited funds, reducing cost is an important issue.

Notably, the workload of a server (hard disk system) can be predicted, to a greater or lesser degree, for some period into the future from historical records; the demand of cloud service workloads is foreseeable. The hard disks in the hard disk system can therefore be reconfigured to meet workload demand at minimum cost. However, a machine cannot by itself learn how and when to reconfigure the hard disks. In many cases the task is performed by authorized personnel, based on real-time status or on a fixed schedule, and the results may not be good.

Like cloud services, software-defined storage is another rapidly growing demand. Software-defined storage refers to computer data storage technology that separates the storage hardware from the software that manages the storage infrastructure. Under software-defined storage, features such as deduplication, replication, thin provisioning, snapshots, and backup can be enabled to provide policy management. Several prior disclosures use software-defined storage to address the problems above. For example, U.S. Patent Application Publication No. 20130297907 discloses a method for reconfiguring a storage system. The method has two main steps: receiving the user's requirement information for a storage device and automatically generating from it the feature settings of the storage device and a device profile for the storage device; and using the feature settings to automatically reconfigure the storage device into one or more logical devices with independent behavioral characteristics. That application describes a new way of reconfiguring storage devices based on the software-defined storage concept. Its method and system also allow the user to dynamically adjust the configuration of one or more logical devices to satisfy the user's requirement information more flexibly. However, that application does not provide a system that automatically learns how to reconfigure the storage devices as application demand (i.e., workload) changes.

Accordingly, the present invention discloses a new system that implements automatic learning and resource reallocation for software-defined storage. The system uses adaptive control and operates without manual intervention.

Because the prior art cannot make a storage system automatically learn how to reconfigure its storage devices as application demand changes, existing storage systems have difficulty meeting Service Level Agreement (SLA) or Quality of Service (QoS) requirements. The inventors therefore propose an invention that solves this problem by combining a neural network algorithm with the operation of a software-defined storage system.

According to one aspect of the present invention, an adaptive quick response control system for a software-defined storage system to improve a performance parameter includes: a traffic monitoring module for obtaining an observed value of the performance parameter at a storage node; an adaptive dual neural module for learning, from historical records of storage device configurations and the associated observed values, the optimal configuration of a plurality of storage devices in the storage node under different difference values between the observed value and a specified value of the performance parameter, and for providing the optimal configuration when a current difference value is not less than a threshold value; and a quick response control module for changing, if the current difference value is not less than the threshold value, the current configuration of the storage devices in the storage node to the optimal configuration provided by the adaptive dual neural module. The storage node is operated by software-defined storage software, and the current difference value decreases after the optimal configuration is adopted.

The adaptive dual neural module includes: a fixed neural network element for providing optimal configurations when the current difference value is not less than a tolerance value, these optimal configurations being preset before the adaptive quick response control system runs; and an adaptive neural network element for learning, from the historical records of storage device configurations and the associated observed values over a long period, the optimal configuration of the storage devices in the storage node under different difference values, and for providing the optimal configuration when the current difference value is less than the tolerance value but not less than the threshold value.

According to the present invention, the adaptive neural network element stops operating while the fixed neural network element operates, and the fixed neural network element stops operating while the adaptive neural network element operates. The tolerance value is less than or equal to a preset value, and the preset value is 3 seconds. The long period ranges from tens of seconds to the entire history period, and the observed values are not recorded continuously within the long period. The change between the optimal configuration provided by the fixed neural network element and the current configuration is greater than the change between the optimal configuration provided by the adaptive neural network element and the current configuration. Learning the optimal configuration of the storage devices is achieved by a neural network algorithm. The specified value is the requirement of a service level agreement or a quality-of-service requirement. The performance parameter is input/output operations per second (IOPS), latency, or throughput. The storage devices are hard disks, solid-state drives, random access memory, or a combination thereof, and the optimal configuration is either the percentage of each type of storage device or a fixed number of a single type of storage device in use.

The adaptive quick response control system further includes a calculation module for calculating the difference value and passing the calculated difference value to the adaptive dual neural module and the quick response control module. The traffic monitoring module, adaptive dual neural module, quick response control module, and calculation module are each either hardware or software executed on at least one processor in the storage node.

With the hardware implementation above, the system can automatically learn how to reconfigure its storage devices so as to meet the requirements of a service level agreement or a quality-of-service requirement.

10‧‧‧Adaptive quick response control system

100‧‧‧Storage node

102‧‧‧Management server

104‧‧‧Hard disk

106‧‧‧Solid-state drive

120‧‧‧Traffic monitoring module

140‧‧‧Calculation module

160‧‧‧Adaptive dual neural module

162‧‧‧Fixed neural network element

164‧‧‧Adaptive neural network element

180‧‧‧Quick response control module

FIG. 1 is a block diagram of an adaptive quick response control system according to an embodiment of the present invention.

FIG. 2 shows the architecture of a storage node.

FIG. 3 is a flow chart of the operation of the adaptive dual neural module.

FIG. 4 is a table of the optimal configurations provided by the adaptive dual neural module.

The present invention will be described more specifically with reference to the following embodiments.

Please refer to FIGS. 1 to 4, which disclose an embodiment of the present invention. FIG. 1 is a block diagram of an adaptive quick response control system 10 according to the embodiment. The system can improve the performance parameters of a software-defined storage system in a network, such as IOPS, latency, or throughput. In the embodiment, the software-defined storage system is a storage node 100, and the latency of retrieving data from the software-defined storage system is used as the example. The network may be the Internet. The storage node 100 may thus be a database server that manages numerous storage devices and provides cloud services to customers. It may also be a file server or a mail server with dedicated storage devices. The network may equally be a local area network for a laboratory or a wide area network for a multinational enterprise; the invention does not limit the application of the storage node 100. However, the storage node 100 must be software-defined storage. In other words, the hardware (storage devices) of the storage node 100 must be separable from the software that manages the storage node 100. The storage node 100 is operated by software-defined storage software. Reconfiguration of the storage devices in the storage node 100 can therefore be carried out by separate software or hardware.

Refer to FIG. 2, which shows the architecture of the storage node 100. The storage node 100 includes one management server 102, ten hard disks 104, and ten solid-state drives 106. The management server 102 can receive instructions to reconfigure the hard disks 104 and solid-state drives 106. Different configurations of the storage node 100, that is, different percentages of hard disks 104 and solid-state drives 106 in use, can maintain a given latency under different workloads. The solid-state drives 106 are faster than the hard disks 104. For the same capacity, however, a solid-state drive 106 costs far more than a hard disk 104. Put differently, for the same cost the hard disks 104 offer about ten times the storage capacity of the solid-state drives 106. For such a storage node 100, serving entirely from the solid-state drives 106 is uneconomical because their service life would shorten very quickly, and once all the solid-state drives 106 are in use, storage capacity becomes a problem. When the configuration of the storage node 100 includes both some hard disks 104 and some solid-state drives 106, the storage node 100 can still run smoothly and avoid those problems as long as the latency meets the Service Level Agreement (SLA) or Quality of Service (QoS) requirement.

The adaptive quick response control system 10 includes a traffic monitoring module 120, a calculation module 140, an adaptive dual neural module 160, and a quick response control module 180. The traffic monitoring module 120 obtains observed values of the latency of the storage node 100. The calculation module 140 calculates the difference value between an observed value and a specified value of latency, and passes the calculated difference value to the adaptive dual neural module 160 and the quick response control module 180. Here, the specified value of latency is the value required by the SLA or QoS requirement. It is the longest latency the storage node 100 should exhibit when providing service under normal use (excluding start-up of the storage node 100 or extremely heavy workloads). In this embodiment the specified value is 2 seconds. Any specified value is possible; the invention does not limit it.
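The calculation performed by the calculation module 140 can be illustrated with a minimal Python sketch. The function and constant names are hypothetical, and the sign convention (observed value minus specified value, so that a positive result means the requirement is exceeded) is inferred from the way the threshold is described, not stated explicitly in the text.

```python
# Specified latency from the embodiment (the SLA/QoS requirement), in seconds.
SPECIFIED_LATENCY_S = 2.0


def difference_value(observed_latency_s: float,
                     specified_latency_s: float = SPECIFIED_LATENCY_S) -> float:
    """Return how far the observed latency exceeds the specified value.

    Assumed convention: positive means the latency requirement is exceeded,
    zero or negative means the requirement is currently met.
    """
    return observed_latency_s - specified_latency_s


# An observed latency of 2.5 s exceeds the 2 s requirement by half a second.
print(difference_value(2.5))  # 0.5
```

The result would then be handed to both the adaptive dual neural module 160 and the quick response control module 180.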

The adaptive dual neural module 160 learns the optimal configuration of the hard disks 104 and solid-state drives 106 in the storage node 100 under different difference values, from historical records of the configurations of the hard disks 104 and solid-state drives 106 and the associated observed values. The difference value is the difference between the observed value and the specified value of latency. The adaptive dual neural module 160 also provides the optimal configuration to the quick response control module 180. It operates only when the current difference value is not less than a threshold value. The current difference value is the latest difference between the observed value from the traffic monitoring module 120 and the specified latency (2 seconds). The threshold value is a preset amount of time in excess of the specified latency. When latency exceeds the specified value by too little, it is not worth changing the configuration of the hard disks 104 and solid-state drives 106 to reduce latency, and the current configuration continues to operate. In this embodiment the threshold value is 0.2 seconds. It can of course be changed for the different services the storage node 100 provides.

To carry out its operation, the adaptive dual neural module 160 further includes two main parts: a fixed neural network element 162 and an adaptive neural network element 164. The fixed neural network element 162 provides optimal configurations that are preset before the adaptive quick response control system 10 runs, and it is activated when the current difference value is not less than a tolerance value. Here, the tolerance value is an additional amount of time beyond the specified latency. Once the tolerance value is reached, emergency handling is required to shorten the latency quickly, so that customers do not have to keep waiting for a reply from the storage node 100 over the next several seconds. The operation of the fixed neural network element 162 can be regarded as a restraining action against latency that grows as the workload increases. In practice, the tolerance value should be less than or equal to a preset value, preferably 3 seconds. The embodiment therefore uses 3 seconds as the tolerance value.

The adaptive neural network element 164 learns the optimal configuration of the hard disks 104 and solid-state drives 106 in the storage node 100 under different difference values, from the historical records of the configurations of the hard disks 104 and solid-state drives 106 and the associated observed values over a long period. It also provides the optimal configuration. The adaptive neural network element 164 operates only when the current difference value is less than the tolerance value but not less than the threshold value. The long period can be as short as tens of seconds or as long as the entire history period of the storage node 100. Any record of the storage node 100 that can be supplied to the adaptive neural network element 164 for learning the optimal configuration of the hard disks 104 and solid-state drives 106 is usable. The records of the entire history period are preferred. It is understood that within the long period some observed values are not recorded continuously and some data may be missing, but the adaptive neural network element 164 can still use these discontinuous records.
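As an illustration of how discontinuous history records over a long period might be prepared for learning, the following Python sketch filters a record list down to the usable entries. The record layout (`HistoryRecord`) and the helper `training_records` are hypothetical; the source does not describe how records are stored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryRecord:
    """One history entry: a configuration and the latency observed under it."""
    timestamp_s: float
    hdd_percent: int
    ssd_percent: int
    observed_latency_s: Optional[float]  # None when the observation is missing


def training_records(history: List[HistoryRecord],
                     now_s: float,
                     long_period_s: Optional[float] = None) -> List[HistoryRecord]:
    """Return the usable records inside the long period.

    long_period_s=None means the entire history period. Records with a
    missing observation are skipped, so gaps in the history are tolerated.
    """
    return [
        r for r in history
        if r.observed_latency_s is not None
        and (long_period_s is None or now_s - r.timestamp_s <= long_period_s)
    ]


history = [
    HistoryRecord(0.0, 50, 50, 1.8),
    HistoryRecord(100.0, 40, 60, None),   # observation lost
    HistoryRecord(190.0, 40, 60, 2.3),
]
print(len(training_records(history, now_s=200.0)))                      # 2
print(len(training_records(history, now_s=200.0, long_period_s=30.0)))  # 1
```

Passing `long_period_s=None` corresponds to the preferred case of using the records of the entire history period.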

Because of the complexity of the hardware of the storage node 100 and the varying workloads generated by customer demand, the storage node 100 exhibits different latencies, and latency and workload have no fixed relationship over time. For the adaptive quick response control system 10, the best way to manage the storage node 100 is to learn this changing relationship by itself. A neural network algorithm is a good way to reach that goal, and learning the optimal configuration of the hard disks 104 and solid-state drives 106 can be achieved with one. Although many neural network algorithms exist today, the invention is not limited to any particular one. In the model of each algorithm, the parameter settings of the different layers can be established from experience with other systems.
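The source does not prescribe a particular neural network algorithm, so the following is only a minimal sketch of the general idea: a network with one small hidden layer, trained by plain gradient descent in pure Python, that maps a latency difference value to the fraction of solid-state drives in use. The architecture, the hyperparameters, the use of the difference value as the only input, and all names are assumptions made for illustration; the three training pairs are taken from the tiers of the embodiment's configuration table.

```python
import math
import random


def train_tiny_network(samples, hidden=4, epochs=2000, lr=0.02, seed=0):
    """Fit y ~ f(x) with one tanh hidden layer and full-batch gradient descent.

    Returns (predict, initial_loss, final_loss), where the losses are mean
    squared errors over the samples before and after training.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output
    b2 = [0.0]  # output bias, held in a list so the closures see updates

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2[0]

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial_loss = loss()
    n = len(samples)
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, t in samples:
            h, y = forward(x)
            dy = 2.0 * (y - t) / n                   # d(loss)/d(output)
            gb2 += dy
            for j in range(hidden):
                gw2[j] += dy * h[j]
                dz = dy * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                gw1[j] += dz * x
                gb1[j] += dz
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2[0] -= lr * gb2

    return (lambda x: forward(x)[1]), initial_loss, loss()


# (latency difference in seconds, SSD fraction) pairs from the embodiment's tiers
samples = [(0.3, 0.6), (0.7, 0.7), (2.0, 0.8)]
predict, initial_loss, final_loss = train_tiny_network(samples)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

A real deployment would presumably feed in many more records and additional inputs such as workload indicators; the sketch only shows that such a mapping can be learned from recorded pairs.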

To see how the adaptive dual neural module 160 operates, refer to FIG. 3, a flow chart of its operation. After the traffic monitoring module 120 obtains an observed value of latency (S01) and the calculation module 140 calculates the current difference value of latency (S02), the adaptive dual neural module 160 determines whether the current difference value is not less than the threshold value of 0.2 seconds (S03). If not, the current configuration of the hard disks 104 and solid-state drives 106 is kept unchanged (S04). If so, the adaptive dual neural module 160 determines whether the current difference value is not less than the tolerance value of 3 seconds (S05). If not, the adaptive neural network element 164 operates (S06); if so, the fixed neural network element 162 operates (S07). Clearly, the adaptive neural network element 164 stops operating while the fixed neural network element 162 operates, and the fixed neural network element 162 stops operating while the adaptive neural network element 164 operates.
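The decision flow of steps S03 to S07 can be sketched in Python as follows, using the threshold (0.2 seconds) and tolerance (3 seconds) values of the embodiment. The function name and the returned labels are hypothetical, chosen only for illustration.

```python
THRESHOLD_S = 0.2  # threshold value in the embodiment
TOLERANCE_S = 3.0  # tolerance value in the embodiment


def select_action(difference_s: float) -> str:
    """Decide which part of the adaptive dual neural module acts (S03 to S07)."""
    if difference_s < THRESHOLD_S:   # S03 answered "no"
        return "keep_current"        # S04: configuration stays unchanged
    if difference_s < TOLERANCE_S:   # S05 answered "no"
        return "adaptive"            # S06: adaptive neural network element 164
    return "fixed"                   # S07: fixed neural network element 162


for d in (0.1, 0.2, 2.9, 3.0):
    print(d, select_action(d))
```

Because each call returns exactly one label, the two elements are mutually exclusive, which matches the statement that one stops operating while the other operates.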

If the current difference value is not less than the threshold value, the quick response control module 180 changes the current configuration of the hard disks 104 and solid-state drives 106 in the storage node 100 to the optimal configuration provided by the adaptive dual neural module 160. The quick response control module 180 can thus always use the optimal configuration provided by the adaptive dual neural module 160 to adjust the configuration of the storage node 100. After the optimal configuration is adopted, the current difference value becomes smaller.

Refer to FIG. 4, the table of optimal configurations provided by the adaptive dual neural module 160 in this embodiment. While the storage node 100 operates with a latency below 2 seconds, the configuration consists of 50% hard disks 104 and 50% solid-state drives 106. Even when the latency difference value is within 0.2 seconds (latency up to 2.2 seconds), the difference value is still below the threshold value, so the adaptive dual neural module 160 does not operate and the configuration stays as it is. When the latency difference value rises past 0.2 seconds, the adaptive neural network element 164 operates, learning the optimal configuration of the hard disks 104 and solid-state drives 106 from the historical records and newly received data; the newly received data is in turn treated as historical record for learning. Meanwhile, based on what it has learned so far, the adaptive neural network element 164 provides the quick response control module 180 with an optimal configuration of 40% hard disks 104 and 60% solid-state drives 106 when the latency difference value is not less than 0.2 seconds but less than 0.5 seconds; 30% hard disks 104 and 70% solid-state drives 106 when it is not less than 0.5 seconds but less than 1.0 second; and 20% hard disks 104 and 80% solid-state drives 106 when it is not less than 1.0 second but less than 3.0 seconds. Of course, because customer behavior may change in the future, the optimal configurations may change further as learning from the historical records continues. Once the new optimal configuration for a given latency difference value is applied, latency quickly falls below the specified value of 2 seconds. Note that the total number of tiers of optimal configurations is not limited to the six described here; it may be more or fewer than six. For example, the number of tiers of the latency difference value between the threshold value and the tolerance value could be five; in other words, one tier for every 0.5 seconds. The total number of tiers in this embodiment would then be eight rather than six. This is because the optimal configurations learned by the adaptive dual neural module 160 depend on the type of application (i.e., workload) demand and on the hardware specifications of the hard disks and solid-state drives in the storage node 100.

When the latency difference value is not less than the tolerance value, it is too late for a moderate adjustment of the configuration. In this case a forceful measure must be taken to reduce latency quickly. The fixed neural network element 162 therefore operates and the adaptive neural network element 164 stops operating. The fixed neural network element 162 provides preset optimal configurations for the hard disks 104 and solid-state drives 106. In this embodiment, when the latency difference value is not less than 3.0 seconds but less than 5.0 seconds, the optimal configuration is 10% hard disks 104 and 90% solid-state drives 106; when the latency difference value is not less than 5.0 seconds, the optimal configuration is 0% hard disks 104 and 100% solid-state drives 106. In this extreme case, all of the solid-state drives 106 are used.
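The six tiers of the embodiment (FIG. 4) can be expressed as a simple lookup, sketched below in Python. The percentages and boundaries come from the description of the embodiment; the names and the use of a sorted-boundary search are illustrative assumptions, and the first entry is the embodiment's baseline configuration, which applies while the difference value stays below the threshold.

```python
from bisect import bisect_right

# Lower bounds (seconds) of the tiers at or above the 0.2 s threshold.
TIER_LOWER_BOUNDS_S = [0.2, 0.5, 1.0, 3.0, 5.0]

# (hard disk %, solid-state drive %) per tier. Index 0 is the baseline used
# below the threshold; indexes 1-3 come from the adaptive neural network
# element 164 and indexes 4-5 are the presets of the fixed element 162.
TIER_CONFIGS = [(50, 50), (40, 60), (30, 70), (20, 80), (10, 90), (0, 100)]


def optimal_configuration(difference_s: float) -> tuple:
    """Return (HDD %, SSD %) for a latency difference value in seconds.

    bisect_right counts how many lower bounds are <= difference_s, which is
    exactly the tier index under the "not less than" wording of the text.
    """
    return TIER_CONFIGS[bisect_right(TIER_LOWER_BOUNDS_S, difference_s)]


for d in (0.1, 0.2, 0.7, 2.0, 3.0, 6.0):
    print(d, optimal_configuration(d))
```

Adding or removing tiers, as the description allows, would only mean editing the two lists so that `TIER_CONFIGS` always has one more entry than `TIER_LOWER_BOUNDS_S`.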

However, although both the fixed neural network element 162 and the adaptive neural network element 164 can provide optimal configurations, FIG. 4 shows that the change between the optimal configuration provided by the fixed neural network element 162 and the current configuration (50% hard disks 104 and 50% solid-state drives 106) is greater than the change between the optimal configuration provided by the adaptive neural network element 164 and the current configuration.

As mentioned above, the delay time is only one of the performance parameters required by the service level agreement. Other performance parameters can be changed in the same manner, by adjusting the configuration of the hard disk 104 and the solid state hard disk 106. For example, the input/output operations per second (IOPS) and the throughput can increase as the share of solid state hard disks 106 increases.

It should be emphasized that the storage devices are not limited to hard disks and solid state hard disks; random access memory can also be used. Thus, a combination of hard disks and random access memory, or of solid state hard disks and random access memory, is also applicable. The optimal configuration in the embodiment is the percentage of each type of storage device in use. It can also be a fixed number of storage devices of a single type in use (for example, the storage node contains only solid state hard disks, and reconfiguration is accomplished by adding new or standby solid state hard disks). Most importantly, the traffic monitoring module 120, the computing module 140, the adaptive dual neural module 160, and the rapid response control module 180 can be implemented in hardware or executed by at least one processor in the storage node 100.
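
The two forms of optimal configuration described here, percentages across device types or a fixed count of a single type, could be represented as follows. This is a minimal sketch under assumptions: the class names, field names, and device labels such as "hdd", "ssd", and "ram" are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class PercentageConfig:
    """Share of each storage device type in use, in percent."""
    shares: Dict[str, float]

    def __post_init__(self) -> None:
        if any(share < 0 for share in self.shares.values()):
            raise ValueError("shares must be non-negative")
        if abs(sum(self.shares.values()) - 100.0) > 1e-9:
            raise ValueError("shares must sum to 100%")


@dataclass(frozen=True)
class FixedCountConfig:
    """Fixed number of devices of a single type in use."""
    device_type: str
    count: int

    def __post_init__(self) -> None:
        if self.count < 0:
            raise ValueError("count must be non-negative")


StorageConfig = Union[PercentageConfig, FixedCountConfig]


def describe(config: StorageConfig) -> str:
    """Render a configuration as a short human-readable string."""
    if isinstance(config, PercentageConfig):
        return " / ".join(f"{pct:g}% {name}" for name, pct in config.shares.items())
    return f"{config.count} x {config.device_type}"


print(describe(PercentageConfig({"hdd": 20, "ssd": 80})))  # 20% hdd / 80% ssd
print(describe(PercentageConfig({"ssd": 60, "ram": 40})))  # 60% ssd / 40% ram
print(describe(FixedCountConfig("ssd", 12)))               # 12 x ssd
```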

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Those of ordinary skill in the art may make minor modifications and refinements without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

Claims

An adaptive rapid response control system for a software defined storage system to improve performance parameters, comprising: a traffic monitoring module for obtaining an observed value of a performance parameter at a storage node; an adaptive dual neural module for determining whether to provide a preset optimal configuration or an adaptive optimal configuration, the preset optimal configuration being provided when an existing difference value is not less than a tolerance value, and the adaptive optimal configuration being provided when the existing difference value is less than the tolerance value but not less than a threshold value, wherein the adaptive optimal configuration is learned, for different difference values between the observed value and a specific value of the performance parameter, from a historical record of storage device configurations and the associated observed values; and a rapid response control module for changing, if the existing difference value is not less than the threshold value, the existing configuration of the storage devices in the storage node to the optimal configuration of the storage devices provided by the adaptive dual neural module, wherein the storage node is operated by software defined storage software, and the existing difference value decreases once the optimal configuration is in use.

The adaptive rapid response control system according to claim 1, wherein the adaptive dual neural module comprises: a fixed neural network component for providing the preset optimal configuration when the existing difference value is not less than the tolerance value, the preset optimal configuration being preset before the adaptive rapid response control system operates; and an adaptive neural network component for learning the adaptive optimal configuration of the storage devices in the storage node, for different difference values, from the historical record of storage device configurations and associated observed values over a long period, and for providing the adaptive optimal configuration when the existing difference value is less than the tolerance value but not less than the threshold value.

The adaptive rapid response control system of claim 2, wherein the adaptive neural network component stops operating when the fixed neural network component operates, or the fixed neural network component stops operating when the adaptive neural network component operates.

The adaptive rapid response control system of claim 2, wherein the tolerance value is less than or equal to a preset value.

The adaptive rapid response control system of claim 4, wherein the preset value is 3 seconds.

The adaptive rapid response control system of claim 2, wherein the long period ranges from tens of seconds to the entire history period.

The adaptive rapid response control system of claim 2, wherein the observed values are not recorded continuously during the long period.

The adaptive rapid response control system of claim 2, wherein the amount of change between the preset optimal configuration provided by the fixed neural network component and the existing configuration is greater than the amount of change between the adaptive optimal configuration provided by the adaptive neural network component and the existing configuration.

The adaptive rapid response control system of claim 2, wherein learning the adaptive optimal configuration of the storage devices is achieved by a neural network algorithm.

The adaptive rapid response control system of claim 1, wherein the specific value is specified by a service level agreement or a quality of service requirement.

The adaptive rapid response control system of claim 1, wherein the performance parameter is input/output operations per second (IOPS), delay time, or throughput.

The adaptive rapid response control system of claim 1, wherein the storage device is a hard disk, a solid state hard disk, or a combination thereof.

The adaptive rapid response control system of claim 1, wherein the preset optimal configuration and the adaptive optimal configuration are each percentages of different types of storage devices in use or a fixed number of storage devices of a single type in use.

The adaptive rapid response control system of claim 1, further comprising a computing module for calculating the difference value and transmitting the calculated difference value to the adaptive dual neural module and the rapid response control module.

The adaptive rapid response control system according to claim 1, wherein the traffic monitoring module, the adaptive dual neural module, the rapid response control module, or the computing module is hardware.

The adaptive rapid response control system of claim 1, wherein the traffic monitoring module, the adaptive dual neural module, the rapid response control module, or the computing module is software executed on at least one processor of the storage node.