TWI608358B - Method for data protection in cloud-based service system - Google Patents

Publication number
TWI608358B
Authority
TW
Taiwan
Prior art keywords
data
storage device
life
model
storage devices
Prior art date
Application number
TW105124795A
Other languages
Chinese (zh)
Other versions
TW201810063A (en)
Inventor
陳文賢
黃純芳
黃明仁
Original Assignee
先智雲端數據股份有限公司
Application filed by 先智雲端數據股份有限公司
Priority to TW105124795A
Application granted
Publication of TWI608358B
Publication of TW201810063A

Landscapes

  • Debugging And Monitoring (AREA)

Description

Method for data protection in a cloud service system

The present invention relates to a method for data protection, and in particular to a method for data protection in a cloud service system.

Workloads such as MongoDB run on cloud service systems built from clustered nodes. A workload may run on a single node or on several nodes of the cloud service system, and each node is assigned at least one disk to store the data it accesses. For a workload running on a single node, when the assigned disk fails, the workload cannot run until the backed-up data has been restored. For a workload running on several nodes, when one of the assigned disks fails, or even when an entire node fails, the performance of the cloud service system may degrade because data must be migrated to a new node, and the performance of the workload suffers as well. Clearly, the health of the disks in the cloud service system and a well-planned archiving scheme for data recovery are important factors in protecting a workload's data.

In fact, many existing techniques address these needs, and most of them concern predicting the life of storage devices. For example, one conventional method for monitoring storage device life includes the steps of: setting up a database that records multiple training records, each containing operating-habit information and a corresponding operating life value; obtaining operating-habit information from the corresponding storage devices; building a storage device life prediction model from the operating-habit information and the operating life values of the corresponding training records; and inputting the operating-habit information of the storage devices into the model to produce a predicted life value for each storage device. The storage device life prediction model can also be built using predicted life values. When a first storage device among the storage devices fails, its actual life is recorded and used to build the storage device life prediction model.

Although many methods can predict the life of a storage device so that data protection can be driven by the prediction, applying them still faces several challenges. First, the failure probability of a storage device (hard disk or solid-state drive) rises sharply as the device approaches the end of its service life. Because the method above relies only on training data of operating life values, sudden failures before the end of service life are hard to predict. Second, storage device failure is a consequence of the applied workload: heavier workload demand leads to shorter device life. Earlier methods do not account for the effect of the workload. In addition, data protection should include a suitable plan for backing up the data stored on the storage devices. If backups run too often, the performance of the associated workloads may drop; if they run too rarely, a systemic failure of the workload may occur. With a predicted storage device life, this problem can be resolved.

The present invention therefore discloses a method for data protection in a cloud service system that addresses the problems above. Most importantly, the invention introduces the concept of a "near-failure probability", which considers the probability of failure as a disk approaches the end of its service life. The invention can thus predict more accurately when a disk is likely to fail, and it provides a novel method of data protection in a cloud service system.

This section extracts and summarizes certain features of the invention; other features are disclosed in the subsequent paragraphs. It is intended to cover various modifications and similar arrangements within the spirit and scope of the appended claims.

To solve the problems above, the invention discloses a method for data protection in a cloud service system, comprising the steps of: A. collecting historical operational data of the storage devices in the cloud service system; B. building a life expectancy model and a next-7-day failure probability model from the collected operational data; C. inputting the operational data of the past 24 hours for each storage device into the life expectancy model and the next-7-day failure probability model to obtain, for each device, the group (range) of its expected life and the corresponding failure probability; and D. backing up the data on the storage devices according to the results of step C.

According to the invention, the operational data may be performance data, SMART (Self-Monitoring, Analysis and Reporting Technology) data, the available capacity of the storage devices, the total capacity of the storage devices, or device metadata. The performance data may be latency, throughput, CPU (Central Processing Unit) load, memory usage, or IOPS (Input/Output Operations Per Second). The storage devices may be hard disks or solid-state drives. The life expectancy model and the next-7-day failure probability model are continuously updated with newly collected operational data. The interval for collecting historical operational data of the storage devices may be 1 hour. The life expectancy model is built by the following steps: B1. dividing the storage devices into good and failed ones; B2. classifying the failed storage devices into different life ranges and assigning all good storage devices to one specific life range; B3. binning the operational data of the storage devices into a plurality of groups according to the life ranges; and B4. normalizing the operational data from each storage device across all groups. The life expectancy model is operated by the following steps: B3'. binning the operational data of the storage devices into a plurality of groups according to the life ranges; and B4'. normalizing the operational data from each storage device across all groups. The next-7-day failure probability model is built by the following steps: B5. sorting the operational data; B6. for the failed storage devices and a plurality of randomly selected good storage devices, obtaining the operational data within 7 days of the most recent collection time; and B7. normalizing the operational data from each storage device. The ratio of failed to good storage devices whose operational data within 7 days of the most recent collection time is used may be 1:1.

According to the invention, the method may further comprise the step of: A1. collecting historical operational data of storage devices that are brand new or have just been added to the cloud service system.

The life expectancy model may use an ANN (Artificial Neural Network) algorithm to process the input operational data of the past 24 hours together with the historical operational data in order to predict the ranges of expected life. The next-7-day failure probability model may likewise use an ANN algorithm on the input operational data of the past 24 hours and the historical operational data to predict the corresponding failure probabilities.

Step D may further back up the data on storage devices whose expected life is below a corresponding specified life and/or whose failure probability exceeds a specified percentage. Step D may alternatively back up the data on the storage devices by running snapshot jobs at a calculated snapshot interval. In the latter case, the snapshot interval is calculated by feeding the results of step C into a fuzzy system.

The fuzzy system is formed by the following steps: E1. defining linguistic values for the bins, the failure probability, and the snapshot interval; E2. constructing membership functions that describe the degree of membership of every bin, failure probability, and snapshot interval; and E3. constructing fuzzy rules for the bins, failure probabilities, and snapshot intervals. The fuzzy system operates by the following steps: F1. receiving a bin and a failure probability; F2. applying the bin and the failure probability to the membership functions of the fuzzy rules to perform fuzzification, fuzzy inference, and result aggregation; and F3. defuzzifying to obtain a snapshot interval.
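Operating steps F1 to F3 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the system of Figures 7 to 10: the two linguistic values per input, the ramp-shaped membership functions and their breakpoints, the two rules, and the singleton outputs of 1 hour and 24 hours are all hypothetical choices made for brevity.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 at or below lo, 0 at or above hi, linear between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Complement of ramp_down over the same breakpoints."""
    return 1.0 - ramp_down(x, lo, hi)


def snapshot_interval_hours(life_bin, failure_prob):
    # F1: receive a bin and a failure probability.
    # F2 (fuzzification): degrees of membership; breakpoints are assumed.
    life_short = ramp_down(life_bin, 2, 12)
    life_long = ramp_up(life_bin, 2, 12)
    prob_low = ramp_down(failure_prob, 0.2, 0.8)
    prob_high = ramp_up(failure_prob, 0.2, 0.8)

    # F2 (inference): two assumed rules, with OR as max and AND as min.
    #   IF life is SHORT OR probability is HIGH THEN interval is SHORT
    #   IF life is LONG AND probability is LOW THEN interval is LONG
    fire_short = max(life_short, prob_high)
    fire_long = min(life_long, prob_low)

    # F2 (aggregation) and F3 (defuzzification): weighted average of
    # assumed singleton outputs (1 hour for SHORT, 24 hours for LONG).
    total = fire_short + fire_long
    return (fire_short * 1.0 + fire_long * 24.0) / total
```

With these assumed rules, a device in a low bin or with a high failure probability is snapshotted close to hourly, while a healthy device drifts toward one snapshot a day.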

With the life expectancy model and the next-7-day failure probability model, the expected life of any storage device and its probability of failing within the next 7 days can be determined. Once these results are available, the schedule of data backups (snapshot jobs) can be determined as well, so the problems above are solved together.

10: Cloud service system

100: Server

101: Central processing unit

102: Storage device I/O unit

103: Database

104: Network I/O unit

110: Operational data collector

200(1): First storage device

200(2): Second storage device

200(3): Third storage device

200(N): Nth storage device

200(N+1): (N+1)th storage device

Figure 1 shows a typical cloud service system to which the method can be applied.

Figure 2 is a flowchart of a method for data protection in a cloud service system according to the invention.

Figure 3 shows the workflow for building the life expectancy model.

Figure 4 shows the workflow for building the next-7-day failure probability model.

Figure 5 is a table showing the inputs and outputs of the life expectancy model and the next-7-day failure probability model.

Figure 6 is a flowchart of a method for forming the fuzzy system.

Figure 7 shows the linguistic values and fuzzy rules for the bins, failure probabilities, and snapshot intervals.

Figures 8, 9, and 10 show the membership functions of the fuzzy system.

The invention is described more specifically with reference to the following embodiments.

The disclosed method applies to data protection in a cloud service system, that is, an architecture on which workloads such as e-mail services, video streaming, and ERP systems run. A typical cloud service system 10 to which the method can be applied is shown in Figure 1. The cloud service system 10 comprises a server (host) 100 and several storage devices 200. The server 100 essentially has a central processing unit 101, a storage device I/O unit 102, a database 103, and a network I/O unit 104. The central processing unit 101 manages the operation of the cloud service system 10 and the workloads running on it. It can also track and record operational data coming from the storage devices 200 through the storage device I/O unit 102 and from the network I/O unit 104. The storage device I/O unit 102 is hardware for internal data transfer that conforms to any industry standard used in the cloud service system 10, such as PCI Express (Peripheral Component Interconnect Express), IDE (Integrated Drive Electronics), SATA (Serial Advanced Technology Attachment), or USB (Universal Serial Bus). The network I/O unit 104 is hardware for wired or wireless connection to external client devices such as personal computers, tablets, or smartphones; it may be a USB port, an RJ45 port, a fiber-optic connector, a Wi-Fi module, or a Bluetooth module. The database 103 refers to a database or structured data set, created permanently or temporarily on a hard disk, a solid-state drive, or the DRAM of the server 100, that workloads do not access directly and that supports the application of the invention. In this embodiment there are N storage devices 200: a first storage device 200(1), a second storage device 200(2), a third storage device 200(3), ..., and an Nth storage device 200(N).

The operational data may be performance data, SMART (Self-Monitoring, Analysis and Reporting Technology) data, the available capacity of the storage devices 200, the total capacity of the storage devices 200, or device metadata. Performance data is physical information about the operation of the cloud service system 10 while it executes workloads. For example, performance data may be latency, throughput, CPU (Central Processing Unit) load, memory usage, or IOPS (Input/Output Operations Per Second), and it may be obtained through the storage device I/O unit 102 connected to the storage devices 200, through the network I/O unit 104 connected to external client devices, or directly from the data flow of the central processing unit 101. SMART data uses a series of codes (numbers) to indicate a drive failure that may be imminent, and it can be obtained by installing monitoring software on each storage device 200. In keeping with the definition of SMART, a storage device 200 may therefore be a hard disk or a solid-state drive. Data other than performance data and SMART data may also be used in the invention, provided it can be obtained easily while the cloud service system 10 is running, before all storage devices are installed.
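One way to picture a single hourly sample of operational data is the record below. It is an illustrative sketch only: the class name, field names, units, and the flattening into a feature list are assumptions, since the text lists the kinds of data but does not prescribe a layout.

```python
from dataclasses import dataclass, field


@dataclass
class OperationalRecord:
    """One collected sample for one storage device (hypothetical layout)."""
    device_id: str
    latency_ms: float             # performance data
    throughput_mbps: float        # performance data
    cpu_load: float               # performance data, 0.0 to 1.0
    memory_usage: float           # performance data, 0.0 to 1.0
    iops: float                   # performance data
    available_capacity_gb: float
    total_capacity_gb: float
    smart: dict = field(default_factory=dict)     # SMART attribute id -> value
    metadata: dict = field(default_factory=dict)  # e.g. type and model

    def features(self):
        """Flatten the numeric fields into a list a model could consume."""
        base = [
            self.latency_ms,
            self.throughput_mbps,
            self.cpu_load,
            self.memory_usage,
            self.iops,
            self.available_capacity_gb / self.total_capacity_gb,
        ]
        # Sort SMART ids so every record yields the same column order.
        return base + [float(self.smart[key]) for key in sorted(self.smart)]
```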

The above describes a standard cloud service system to which the invention can be applied. Implementing the proposed method requires an operational data collector 110. In this embodiment, the operational data collector 110 is a hardware unit installed in the server 100 and connected to the central processing unit 101; it collects operational data and stores it in the database 103 (usually in database form). In practice, software with the same function can instead be installed on the server 100 and run by the central processing unit 101. The operational data collector 110 works together with the central processing unit 101 to carry out the steps of the invention.

Refer to Figure 2, a flowchart of a method for data protection in the cloud service system 10. The first step is to use the operational data collector 110 to collect historical operational data of the storage devices 200 in the cloud service system 10 (S01). The cloud service system 10 may already have been running for some time before the method is applied, so the collected operational data reflects the data-access load imposed by the workloads (when and how often the workloads access the storage devices 200). If no historical operational data is available, for example because the data was lost or the cloud service system 10 has only just been set up, the operational data for the method can be obtained by collecting the relevant data of the individual storage devices 200 that the cloud service system 10 will use.

The second step is to build a life expectancy model and a next-7-day failure probability model from the collected operational data (S02). Both models are stored in the database 103 in database form, where they are executed and periodically updated. The steps for building the two models are described below.

Refer to Figure 3, which shows the workflow for building the life expectancy model. First, the historical operational data already collected from the storage devices 200 of the cloud service system 10 is obtained (S11). Historical operational data may arrive batch by batch: some of the historical operational data used to build the life expectancy model may already be in a database while another batch has just been added. Newly obtained historical operational data, for example from the last half hour, can be treated as new training material so that predictions move closer to reality. The operational data is used to build the life expectancy model and to keep updating it. It may take some time for "failed storage devices" to appear, and the method needs to know how the life distribution of the storage devices 200 evolves over time. Next, the storage devices 200 are divided into good and failed ones (S12). When a storage device 200 is good and still working, its collected historical operational data only reflects the harsh conditions the device can tolerate (the workloads applied, the management mode of the cloud service system 10, the physical state of the hardware of the cloud service system 10, and so on). When a storage device 200 has failed and no longer works, its collected historical operational data can be regarded as a record of its whole life. Any identical storage device 200 that meets the same conditions the failed device went through, and for which the same or similar operational data is tracked, is likely to fail as well. The failed storage devices 200 are classified into different life ranges (S13). Here a life range is a span of consecutive days, for example 0 days (DOA, Dead on Arrival) to 90 days, 91 to 180 days, 181 to 270 days, and so on. Each failed storage device 200 is assigned to a life range according to the number of days it worked before failing. The good storage devices 200 are still healthy, so all of them are assigned to one specific life range (S14). This specific life range has no upper limit, for example "more than 1081 days", where 1081 days may be the time the cloud service system 10 has been running or the good storage devices 200 have been working; that is, the good storage devices 200 have operated normally for at least 1081 days. The lower bound of the specific life range is not limited to 1081 days, which is only an example.
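The assignment of devices to life ranges in steps S13 and S14 can be sketched as a small function. The 90-day width follows the example ranges in the text; the index 13 for the open-ended range of good devices, and the choice to place a failed device that outlived all finite ranges into that same last bin, are assumptions made only for illustration.

```python
BIN_WIDTH_DAYS = 90   # matches the example ranges 0-90, 91-180, 181-270, ...
OPEN_ENDED_BIN = 13   # hypothetical index of the range with no upper limit


def life_range_bin(days_worked, failed):
    """Return the 1-based life-range index for one storage device."""
    if not failed:
        # S14: every good device goes to the single open-ended range.
        return OPEN_ENDED_BIN
    # S13: failed devices are ranked by days worked before failure.
    # Day 0 (dead on arrival) through day 90 is bin 1, 91-180 is bin 2, ...
    finite_bin = max(1, (days_worked - 1) // BIN_WIDTH_DAYS + 1)
    return min(finite_bin, OPEN_ENDED_BIN)
```

For example, a device that failed after 300 days falls into bin 4, which agrees with the 271 to 360 day range.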

Next, the operational data of the storage devices 200 is binned into several groups according to the life ranges (S15). Data binning is a data preprocessing technique used to reduce the effect of minor observation errors: original data values that fall within a given small interval, a bin, are replaced by a value representative of that interval, typically a central value, which is a form of quantization. The operational data of every storage device 200, good or failed, is binned according to the life ranges defined in steps S13 and S14. When a storage device 200 is assigned to a group, such as bin #4 (271 to 360 days), all of its operational data is assigned to that group too. For simplicity, the bin values (the representative values of the intervals) are numbered in order starting from the first bin (0 to 90 days). Finally, the operational data from each storage device 200 is normalized across all groups (S16). The storage devices 200 in a group (bin) may not be of the same type (solid-state drive or hard disk) or the same model (the same specification or production model from one manufacturer), so it is important that the life expectancy model used for prediction compares like with like. Predictions should be specific to a model rather than shared across all models: what holds for solid-state drives may not hold for hard disks, what holds for a 512G solid-state drive may not hold for a 1G solid-state drive, and what holds for a Toshiba 1G solid-state drive may not hold for a Samsung 1G solid-state drive. Once these steps are complete, the life expectancy model is ready to provide a life prediction for each storage device 200 by reporting the group (bin) it falls into. Note that the prediction step may run once a day, even though operational data for training may be collected 24 times in that day.
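The text does not say which normalization is used in step S16. The sketch below assumes min-max scaling computed separately for each device model, which is one simple way to keep the comparison like with like; the function name and the (model, features) input layout are hypothetical.

```python
from collections import defaultdict


def normalize_by_model(records):
    """Min-max scale features within each device model.

    records: list of (model_name, feature_list) pairs.
    Returns the scaled feature lists in the same order as the input.
    """
    rows_by_model = defaultdict(list)
    for model, features in records:
        rows_by_model[model].append(features)

    # Per model, find the (min, max) of every feature column.
    bounds = {}
    for model, rows in rows_by_model.items():
        columns = list(zip(*rows))
        bounds[model] = [(min(col), max(col)) for col in columns]

    scaled_rows = []
    for model, features in records:
        scaled = []
        for value, (lo, hi) in zip(features, bounds[model]):
            # A constant column carries no information; map it to 0.0.
            scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        scaled_rows.append(scaled)
    return scaled_rows
```

Scaling within a model means a latency that is ordinary for a hard disk is not read as alarming merely because solid-state drives in the same bin are faster.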

Note that the description above of building the life expectancy model is called the learning phase, meaning that the data for these steps is used repeatedly before the intended workloads go online, or even before the cloud service system 10 starts operating. The life expectancy model can then be applied in a running phase, in which it operates while taking the impact of the online workloads into account. Operating the life expectancy model reduces to: obtaining the historical operational data already collected from the storage devices 200 of the cloud service system 10 (S11); binning the operational data of the storage devices 200 into several groups according to the life ranges (S15); and normalizing the operational data from each storage device 200 across all groups (S16). In this phase only steps S11, S15, and S16 need to be repeated, and the expected bin is found by consulting the life expectancy model.

For building the next-7-day failure probability model, refer to Figure 4, which shows the workflow. First, the historical operational data already collected from the storage devices 200 of the cloud service system 10 is obtained (S21). As before, historical operational data may arrive batch by batch: some may already be in a database while another batch has just been added, and newly obtained data can be treated as new training material so that predictions move closer to reality. Not all of the historical operational data can be used, however. The next-7-day failure probability model first needs the operational data to be sorted (S22), to establish which operational data comes from good storage devices and which comes from failed ones. Then the operational data within 7 days of the most recent collection time is obtained for the failed storage devices 200 (S23) and for several randomly selected good storage devices 200 (S24). If the most recent collection time was one hour ago, the operational data window is counted from one hour ago and covers 168 hours. Importantly, according to the invention the ratio of failed storage devices 200 to good storage devices 200 whose operational data within 7 days of the most recent collection time is used is 1:1, which gives a balanced basis for predicting the failure probability of the storage devices 200. Because good storage devices 200 always outnumber failed ones, step S24 uses several randomly selected good storage devices 200 rather than all of them. Finally, the operational data from each storage device 200 is normalized (S25). As before, normalization makes the failure probability prediction more accurate for each model of storage device 200. According to the invention, the life expectancy model and the next-7-day failure probability model are continuously updated with newly collected operational data. The interval for collecting historical operational data of the storage devices 200 is preferably 1 hour.
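Steps S23 and S24 amount to keeping a 168-sample hourly window per device and drawing as many good devices at random as there are failed ones. The sketch below illustrates that selection; the function names, the fixed seed, and the assumption that samples are stored oldest first are all hypothetical.

```python
import random

WINDOW_HOURS = 7 * 24  # 168 hourly samples cover the 7-day window


def balanced_devices(failed_ids, good_ids, seed=0):
    """Pick failed and good devices in a 1:1 ratio (good ones at random)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    count = min(len(failed_ids), len(good_ids))
    sampled_good = rng.sample(list(good_ids), count)
    return list(failed_ids)[:count], sampled_good


def last_window(hourly_samples):
    """Keep the most recent 168 hourly samples (input is oldest first)."""
    return hourly_samples[-WINDOW_HOURS:]
```

Sampling the good devices down to the number of failed ones keeps the training set from being dominated by healthy examples, which is the balance the 1:1 ratio is meant to provide.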

As with the life expectancy model, the description above of the next-7-day failure probability model covers the learning phase, meaning that the data for the above steps is used repeatedly before the expected workload goes online, or even before the cloud service system 10 starts operating. The model can then be applied in a running phase, in which it takes the impact of the online workload into account. In the running phase, the model operates on the historical operational data already collected from the storage devices 200 of the cloud service system 10, and the failure probability is obtained by consulting the model.

The third step of the disclosed method for data protection in the cloud service system 10 is to input, for each storage device 200, the operational data of the past 24 hours into the life expectancy model and the next-7-day failure probability model, to obtain the range of expected life (as a bin) and the corresponding failure probability (S03). See Fig. 5, a table showing the inputs and outputs of the two models. The operational data is input once both models are ready to provide predictions. There should be N records of operational data, but only 3 are shown for illustration. The first storage device 200(1) has an expected life in bin #18 (3061 to 3240 hours) and a failure probability of 35%. The second storage device 200(2) has an expected life in bin #21 (3601 to 3780 hours) and a failure probability of 21%. The third storage device 200(3) has an expected life in bin #2 (181 to 360 hours) and a failure probability of 95%. The third storage device 200(3) thus has a short expected life and a high chance of failing within the next 7 days, so the data stored in it should be backed up to avoid loss. This is the last step of the method: backing up the data in the storage devices according to the result of step S03 (S04).
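The three bins quoted from Fig. 5 are consistent with bins that are each 180 hours wide and numbered from 1. The sketch below encodes that reading. The 180-hour width is inferred only from those three examples, and the function names are hypothetical; other bins in the actual model may be defined differently.

```python
BIN_WIDTH_HOURS = 180  # inferred from bins #2, #18 and #21 in Fig. 5 (assumption)


def bin_range(bin_number, width=BIN_WIDTH_HOURS):
    """Return (first_hour, last_hour) of expected life covered by a bin."""
    if bin_number < 1:
        raise ValueError("bin numbers start at 1")
    return ((bin_number - 1) * width + 1, bin_number * width)


def bin_for_hours(hours, width=BIN_WIDTH_HOURS):
    """Return the bin number whose range contains `hours` (hours >= 1)."""
    if hours < 1:
        raise ValueError("expected life must be at least 1 hour")
    return (hours - 1) // width + 1
```

Under this assumption, `bin_range(18)` gives `(3061, 3240)`, matching the range stated for the first storage device 200(1).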

According to the invention, the life expectancy model uses an ANN (Artificial Neural Network) algorithm that computes on the input operational data of the past 24 hours and the historical operational data to predict the ranges of expected life. Similarly, the next-7-day failure probability model uses an ANN algorithm on the same inputs to predict the corresponding failure probabilities. The ANN algorithm applied to the life expectancy model and the one applied to the next-7-day failure probability model may be the same or different. Many existing ANN algorithms are applicable, as long as they can compute the parameters relating the input operational data to the collected historical operational data. The life expectancy model outputs a group (bin number), and the next-7-day failure probability model outputs a probability value for each storage device 200.
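The two kinds of output described above (a bin number and a probability) can be sketched as two small feed-forward networks. This is only one possible arrangement, since the text leaves the ANN algorithm open: the single hidden layer, the ReLU activation, the softmax and sigmoid output layers, and every function name here are assumptions for illustration, and the weights would come from training, which is not shown.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weights[j][i] connects input i to unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    # Two branches avoid overflow for large negative inputs.
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)
    return e / (1.0 + e)


def predict_bin(features, hidden, output):
    """Life expectancy model: return the 1-based bin number with the top score.

    `hidden` and `output` are (weights, biases) pairs.
    """
    h = relu(dense(features, *hidden))
    scores = softmax(dense(h, *output))
    return scores.index(max(scores)) + 1


def predict_failure_probability(features, hidden, output):
    """Failure probability model: return a value between 0 and 1."""
    h = relu(dense(features, *hidden))
    return sigmoid(dense(h, *output)[0])
```

A softmax output suits the life expectancy model because the bins are mutually exclusive classes, while a single sigmoid unit suits the failure probability model because it yields one probability per storage device.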

If a new type of storage device or a new storage device, such as an (N+1)th storage device 200(N+1), is to be used in the cloud service system 10 but the cloud service system 10 has no record of it, then according to the invention one more step should be added after step S01: collecting historical operational data of storage devices 200 that are brand new or have just been added to the cloud service system 10 (S01'). As described above, before the (N+1)th storage device 200(N+1) goes online, the other storage devices 200(1) to 200(N) have already accumulated a large amount of operational data, and an existing life expectancy model and an existing next-7-day failure probability model are available. The historical operational data of the (N+1)th storage device 200(N+1) can be obtained from other data centers or test sites and provided to the cloud service system 10. After steps S01 to S02 are completed, a new life expectancy model and a new next-7-day failure probability model can be created. For the (N+1)th storage device 200(N+1), it must then be decided which set of models (existing or new) predicts its performance more accurately. This decision can be made manually by the administrator of the server 100 or automatically by the operational data collector 110. The operational data collector 110 acts as an arbiter and decides based on the future performance of the (N+1)th storage device 200(N+1). Reaching the decision may take a long time. Until it is made, either the existing models or the new models can serve as the default models in the cloud service system 10. If the operational data collector 110 finds that the predictions of both sets of models for the (N+1)th storage device 200(N+1) deviate from reality, it can follow the steps of this method to create yet another, newer set of models, until a set of models produces predictions within an acceptable range.
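The arbitration between the existing and the new set of models can be sketched as a comparison of prediction errors against what was later observed. The text does not specify the error measure or the acceptable range, so the mean absolute error, the `tolerance` parameter, and the function names below are assumptions.

```python
def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and observed values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def choose_model_set(existing_pred, new_pred, observed, tolerance):
    """Return "existing", "new", or "retrain" for the (N+1)th storage device.

    existing_pred / new_pred: predictions made by each set of models.
    observed: what actually happened (for example 1 for failed, 0 for good).
    tolerance: largest error still considered acceptable (assumed parameter).
    """
    err_existing = mean_abs_error(existing_pred, observed)
    err_new = mean_abs_error(new_pred, observed)
    if min(err_existing, err_new) > tolerance:
        return "retrain"  # both sets deviate: build a newer set of models
    return "existing" if err_existing <= err_new else "new"
```

For example, `choose_model_set([0.2, 0.8], [0.5, 0.5], [0, 1], 0.3)` selects the existing models, because their predictions are closer to the observed outcomes and within the tolerance.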

For data protection, it is important to back up the data in storage devices 200 that have a high failure probability or a short expected life. The only remaining question is the backup frequency (in this example, whether to back up during the following day). A simple way to implement step S04 is to back up the data in any storage device 200 whose expected life is below a corresponding specific life and/or whose failure probability exceeds a specific percentage. For example, because the first storage device 200(1) is a solid-state drive, it can be configured so that a backup is performed once its expected life falls within bin #18 and its predicted failure probability exceeds 90%. Since the failure probability in Fig. 5 is only 35%, the data in the first storage device 200(1) will not be backed up between 13:45 on 2016/5/12 and 13:45 on 2016/5/13. The time interval is not limited to 1 day (24 hours), as determined and explained below.
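The threshold rule for step S04 can be written as a short Python sketch. The class and field names are hypothetical, and treating "falls within bin #18" as "predicted bin at or below 18" is an interpretation, not something the text states. The `require_both` flag covers the "and/or" wording.

```python
from dataclasses import dataclass


@dataclass
class BackupPolicy:
    max_bin: int               # back up when the predicted bin is at or below this
    min_failure_prob: float    # back up when the probability exceeds this
    require_both: bool = True  # True: both conditions ("and"); False: either ("or")


def should_back_up(predicted_bin, failure_prob, policy):
    """Decide whether a device's data is backed up in the coming interval."""
    short_life = predicted_bin <= policy.max_bin
    likely_to_fail = failure_prob > policy.min_failure_prob
    if policy.require_both:
        return short_life and likely_to_fail
    return short_life or likely_to_fail


# Policy from the example for the first storage device 200(1), a solid-state drive.
ssd_policy = BackupPolicy(max_bin=18, min_failure_prob=0.90)
```

With this policy, `should_back_up(18, 0.35, ssd_policy)` is false, which matches the statement that the first storage device is not backed up in the following 24 hours.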

Of course, the backup may be a snapshot of the storage device 200. In another embodiment, the invention provides a further step that refines step S04: backing up the data in the storage devices 200 by performing snapshot jobs at a calculated snapshot interval (S04'). The snapshot interval is calculated by inputting the result of step S03 into a fuzzy system. The fuzzy system is built by the following steps (see Fig. 6): defining linguistic values for the bins, the failure probability, and the snapshot interval (S31); constructing membership functions that describe the degree of membership for all bins, failure probabilities, and snapshot intervals (S32); and constructing fuzzy rules for the bins, failure probabilities, and snapshot intervals (S33). For a better understanding, see Fig. 7.

Fig. 7 shows the linguistic values and fuzzy rules for the bins, the failure probability, and the snapshot interval. The linguistic values for the bins (expected life) are very long, long, neutral, short, and very short. The linguistic values for the failure probability are likely, neutral, and unlikely. The linguistic values for the snapshot interval are very long, long, neutral, short, and very short. Each fuzzy rule is given in the cell where a bin row meets a failure probability column. For example, if the expected life is "long" and the failure probability is "likely", the snapshot interval is "short". The fuzzy rules are defined based on the policy of the workload running on the cloud service system 10 and can change with the requirements (SLA) of the workload. The membership functions describing the degrees for the bins, the failure probability, and the snapshot interval are shown in Fig. 8, Fig. 9, and Fig. 10, respectively.

The fuzzy system operates as follows. First, a bin and a failure probability are received (S41); these are the results of step S03. Next, the bin and the failure probability are input into the membership functions of the fuzzy rules to perform fuzzification, fuzzy inference, and result aggregation (S42). Many existing techniques can implement fuzzification, fuzzy inference, and result aggregation, and the invention does not limit which is used. As in other fuzzy systems, defuzzification is performed last to obtain a snapshot interval (S43). Likewise, the defuzzification method may vary with the fuzzification method and is not limited by the invention. The calculated snapshot interval can be applied to a storage device 200 immediately. Of course, the snapshot interval of each storage device 200 may be set to once a day, or the snapshot interval may even be 0 (no data protection is needed at present).
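Steps S41 to S43 can be sketched as a small fuzzy controller in Python. Almost everything concrete here is an assumption, because Figs. 7 to 10 are not reproduced in the text: the evenly spaced triangular membership functions, the interval length in hours attached to each linguistic value, the `max_bin` parameter, the weighted-average defuzzification, and all rules except "long" and "likely" giving "short", which is the one rule the description states.

```python
def tri(x, center, half_width):
    """Triangular membership centered at `center`, zero beyond `half_width`."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


# Centers of the membership functions on a 0..1 scale (assumed shapes).
LIFE_TERMS = {"very short": 0.0, "short": 0.25, "neutral": 0.5,
              "long": 0.75, "very long": 1.0}
PROB_TERMS = {"unlikely": 0.0, "neutral": 0.5, "likely": 1.0}
# Representative snapshot interval per linguistic value, in hours (assumed).
INTERVAL_HOURS = {"very short": 1, "short": 6, "neutral": 24,
                  "long": 72, "very long": 168}

# (life, probability) -> snapshot interval. Only ("long", "likely") -> "short"
# comes from the description; the other entries are illustrative.
RULES = {
    ("very short", "likely"): "very short", ("very short", "neutral"): "very short",
    ("very short", "unlikely"): "short",
    ("short", "likely"): "very short", ("short", "neutral"): "short",
    ("short", "unlikely"): "neutral",
    ("neutral", "likely"): "short", ("neutral", "neutral"): "neutral",
    ("neutral", "unlikely"): "long",
    ("long", "likely"): "short", ("long", "neutral"): "long",
    ("long", "unlikely"): "very long",
    ("very long", "likely"): "neutral", ("very long", "neutral"): "long",
    ("very long", "unlikely"): "very long",
}


def snapshot_interval_hours(bin_number, failure_prob, max_bin):
    """S41 to S43: map a bin and a failure probability to a snapshot interval."""
    if max_bin < 2 or not 1 <= bin_number <= max_bin:
        raise ValueError("bin_number must lie between 1 and max_bin (max_bin >= 2)")
    if not 0.0 <= failure_prob <= 1.0:
        raise ValueError("failure_prob must lie between 0 and 1")
    life = (bin_number - 1) / (max_bin - 1)  # scale the bin number to 0..1
    strengths = {}
    for (life_term, prob_term), out_term in RULES.items():
        # S42: fuzzification and inference, using min as the fuzzy "and".
        degree = min(tri(life, LIFE_TERMS[life_term], 0.25),
                     tri(failure_prob, PROB_TERMS[prob_term], 0.5))
        # S42: aggregation, keeping the strongest rule per output value.
        strengths[out_term] = max(strengths.get(out_term, 0.0), degree)
    # S43: defuzzification as a weighted average of representative intervals.
    total = sum(strengths.values())
    return sum(INTERVAL_HOURS[t] * s for t, s in strengths.items()) / total
```

With a hypothetical `max_bin` of 21, the third storage device from Fig. 5 (bin #2, 95%) gets a much shorter interval than the first (bin #18, 35%), which is the qualitative behavior the fuzzy rules are meant to produce.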

Although the embodiments above collect data from one cloud service system to train and update the life expectancy model and the next-7-day failure probability model, the models need not be applied only in that cloud service system. More broadly, the models can be trained in one data center or cloud service system and then applied to other cloud service systems with storage devices of the same or similar configuration, which has the advantage of implementing the method with limited resources.

Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection is defined by the appended claims.

Claims (15)

1. A method for data protection in a cloud service system, comprising the steps of: A. collecting historical operational data of a plurality of storage devices in the cloud service system; B. building a life expectancy model and a next-7-day failure probability model from the collected historical operational data, wherein the storage devices are classified as good or failed, the failed storage devices are categorized by different life ranges, the good storage devices are assigned a specific life range, the historical operational data of the storage devices is then binned according to the life ranges into a plurality of groups, each group representing a different life range, and the operational data from each storage device in all of the groups is normalized to build the life expectancy model; C. storing the operational data of each storage device for the past 24 hours into the life expectancy model and the next-7-day failure probability model to obtain the expected life range in the groups and the corresponding failure probability; and D. backing up the data in the storage devices according to the result of step C, wherein the next-7-day failure probability model is built by the steps of: sorting the historical operational data; obtaining, from the failed storage devices and a plurality of randomly selected good storage devices, operational data of the storage devices within 7 days counted from the most recent collection point; and normalizing the operational data from each storage device.

2. The method according to claim 1, wherein the operational data is performance data, SMART (Self-Monitoring, Analysis and Reporting Technology) data, the available capacity of the storage devices, the total capacity of the storage devices, or device metadata.

3. The method according to claim 2, wherein the performance data is latency, throughput, CPU (Central Processing Unit) load, memory usage, or IOPS (Input/Output Operations Per Second).

4. The method according to claim 1, wherein the storage device is a hard disk drive or a solid-state drive.

5. The method according to claim 1, wherein the life expectancy model and the next-7-day failure probability model are continuously updated with operational data newly collected in the future.

6. The method according to claim 1, wherein the interval for collecting the historical operational data of the storage devices is 1 hour.

7. The method according to claim 1, wherein the ratio of failed storage devices to good storage devices whose operational data is collected within 7 days counted from the most recent collection point is 1:1.

8. The method according to claim 1, further comprising the step of: A1. collecting historical operational data of storage devices that are brand new or have just been added to the cloud service system.

9. The method according to claim 1, wherein the life expectancy model uses an ANN (Artificial Neural Network) algorithm to compute on the input operational data of the past 24 hours and the historical operational data to predict the ranges of expected life.

10. The method according to claim 1, wherein the next-7-day failure probability model uses an ANN algorithm to compute on the input operational data of the past 24 hours and the historical operational data to predict the corresponding failure probability.

11. The method according to claim 1, wherein step D further backs up the data in storage devices whose expected life is below a corresponding specific life and/or whose failure probability exceeds a specific percentage.

12. The method according to claim 1, wherein step D further backs up the data in the storage devices by performing snapshot jobs at a calculated snapshot interval.

13. The method according to claim 12, wherein the snapshot interval is calculated by inputting the result of step C into a fuzzy system.

14. The method according to claim 13, wherein the fuzzy system is formed by the steps of: E1. defining linguistic values for the bins, the failure probability, and the snapshot interval; E2. constructing membership functions to describe the degrees of all bins, failure probabilities, and snapshot intervals; and E3. constructing fuzzy rules for the bins, the failure probability, and the snapshot interval.

15. The method according to claim 14, wherein the fuzzy system comprises the operating steps of: F1. receiving a bin and a failure probability; F2. inputting the bin and the failure probability into the membership functions of the fuzzy rules to perform fuzzification, fuzzy inference, and result aggregation; and F3. defuzzifying to obtain a snapshot interval.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW105124795A TWI608358B (en) 2016-08-04 2016-08-04 Method for data protection in cloud-based service system

Publications (2)

Publication Number Publication Date
TWI608358B true TWI608358B (en) 2017-12-11
TW201810063A TW201810063A (en) 2018-03-16

Family

ID=61230825

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105124795A TWI608358B (en) 2016-08-04 2016-08-04 Method for data protection in cloud-based service system

Country Status (1)

Country Link
TW (1) TWI608358B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201014262A (en) * 2008-09-23 2010-04-01 Apacer Technology Inc Remote monitoring system and the monitoring method using the same
TW201301024A (en) * 2011-06-23 2013-01-01 Aten Int Co Ltd Active server monitoring apparatus and active server monitoring method
CN103297264A (en) * 2013-04-19 2013-09-11 无锡成电科大科技发展有限公司 Cloud platform failure recovery method and system
CN104767806A (en) * 2015-03-31 2015-07-08 重庆大学 Method, device and system for backup of cloud data central task
US20150193325A1 (en) * 2013-06-19 2015-07-09 Continuware Corporation Method and system for determining hardware life expectancy and failure prevention
TWI510916B (en) * 2015-02-05 2015-12-01 緯創資通股份有限公司 Storage device lifetime monitoring system and storage device lifetime monitoring method thereof
CN105204961A (en) * 2015-09-21 2015-12-30 重庆大学 Method, device and system for setting check point of cloud data center host

