TW201224805A - A method of building the index of the data blocks - Google Patents

A method of building the index of the data blocks

Info

Publication number
TW201224805A
Authority
TW
Taiwan
Prior art keywords
block
data
file
hash value
index
Prior art date
Application number
TW99144092A
Other languages
Chinese (zh)
Inventor
Yun-Song Wang
Ming-Sheng Zhu
Chih-Feng Chen
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp
Priority to TW99144092A
Publication of TW201224805A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of building an index of data blocks for a data deduplication process. The method comprises the following steps. An index file is loaded. The index file includes a plurality of location blocks, each location block includes a plurality of storage items, and each storage item stores the main hash value of a corresponding data block. A first hash process is performed on each main hash value to output a block number. A second hash process is performed on the main hash value to output an item number. A location list is loaded and checked to determine whether it already contains the item number. If the item number is not in the location list, the main hash value is written into the location list.

Description

VI. Description of the Invention

[Technical Field]

The invention relates to a method of building an index of data blocks, and in particular to a method, applied in a data deduplication process, of building an index for the data blocks produced by the chunking procedure of that process.

[Prior Art]

Data deduplication is a data reduction technique, commonly used in disk-based backup systems, whose main purpose is to reduce the storage capacity consumed in a storage system. It works by searching, within a given time period, for duplicate variable-size data blocks at different positions in different files; each duplicate block is replaced with a pointer. Storage systems are always full of redundant data, so deduplication has naturally become a focus of attention as a way to save space. Deduplication can shrink stored data to a fraction of its original size, which frees more backup space, lets backup data be retained on the storage system for longer, and saves much of the bandwidth needed for offline storage.

Figure 1 is a schematic diagram of deduplicated access in the prior art. To keep track of the stored file data, the server records the data blocks of each input file in a hash list, which holds the hash value corresponding to each data block. Because a hash algorithm is a one-way transform, each data block has exactly one hash value. The deduplication process relies on this property and treats data blocks with the same hash value as identical. The storage device therefore only needs to store one copy of each data block and record which files share it.
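The prior-art hash list described above can be illustrated with a minimal sketch. It assumes the files have already been split into data blocks and uses SHA-1 as the hash algorithm; the names `deduplicate`, `block_store`, and `file_map` are hypothetical and do not come from the patent.

```python
import hashlib


def deduplicate(files):
    """Store each distinct data block once and map every file to its block hashes."""
    block_store = {}  # hash value -> the single stored copy of the block
    file_map = {}     # file name -> ordered list of hash values
    for name, blocks in files.items():
        hashes = []
        for block in blocks:
            digest = hashlib.sha1(block).hexdigest()  # SHA-1 is an assumption here
            block_store.setdefault(digest, block)     # keep only the first copy
            hashes.append(digest)
        file_map[name] = hashes
    return block_store, file_map


# Two hypothetical files that share the block b"world".
files = {
    "a.txt": [b"hello", b"world"],
    "b.txt": [b"world", b"again"],
}
block_store, file_map = deduplicate(files)
```

Four blocks come in, but only three distinct blocks are stored, and both files refer to the shared block through the same hash value.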
However, as the amount of data keeps growing, the hash list grows with it, and the server takes longer to search it.

[Summary of the Invention]

In view of the above problem, the invention provides a method of building an index of data blocks. The method is applied in a data deduplication process and builds an index file for the data blocks produced by the chunking procedure of that process.

To achieve this, the disclosed method includes the following steps: loading an index file, where the index file includes a plurality of location blocks, each location block includes a plurality of storage items, and each storage item records the main hash value of a corresponding data block; performing a first hash procedure on the main hash value of a data block to compute a block number; performing a second hash procedure on the main hash value of the same data block to compute an item number; loading a location conflict list; comparing the item number with the item numbers in the location conflict list to determine whether the same item number is already stored there; and, if the item number is not in the location conflict list, writing the main hash value into the storage item identified by the block number and the item number.

The hierarchical index file proposed by the invention records where each data block is located, which improves the efficiency with which the deduplication process looks up the index file in memory (or on disk).

The features and implementation of the invention are described in detail below through a preferred embodiment, with reference to the drawings.

[Embodiments]

Figure 2 is a schematic diagram of the architecture of the invention. The invention includes a client 210 and a server 220.
The client 210 can connect to the server 220 over the Internet or an intranet, or the client 210 and the server 220 can run on the same computer. The client 210 performs the deduplication process on the input files, and the server 220 builds, according to the invention, the index file 221 for the data blocks of those input files.

The server 220 stores the index file 221 and the location conflict list 222. The index file 221 records the hash values of many data blocks. The method below for building the index file 221 is proposed to make lookups in the index file 221 more efficient and to reduce the time spent moving it between memory and cache. Figure 3A is a flowchart of building the index file, and Figure 3B is a schematic diagram of the index file structure.

Step S310: load the index file, where the index file includes a plurality of location blocks, each location block includes a plurality of storage items, and each storage item records the main hash value of a corresponding data block;
Step S320: perform a first hash procedure on the main hash value of the data block to compute a block number;
Step S330: perform a second hash procedure on the main hash value of the same data block to compute an item number;
Step S340: create a location conflict list for recording identical item numbers;
Step S350: compare the item number with the item numbers in the location conflict list to determine whether the same item number is already stored there; and
Step S360: if the item number is not in the location conflict list, write the main hash value into the storage item identified by the block number and the item number.
As shown in Figure 3B, the index file 221 includes a plurality of location blocks, each location block includes a plurality of storage items, and each storage item records the main hash value of a corresponding data block. All storage items in the index file 221 have the same fixed length. In the invention, the number of storage items per location block is given by Equation 1:

N = (capacity of a location block) / (capacity of a storage item)    (Equation 1)

where N is the number of storage items. The number of location blocks is given by Equation 2:

M = (number of data blocks) / N    (Equation 2)

where M is the number of location blocks.

The index file 221 is divided into a plurality of location blocks of fixed capacity (M location blocks in the description below). The main hash value of a data block (which can be obtained with the SHA1 or SHA256 algorithm) is processed by the first hash procedure so that the resulting block numbers are spread over the range of M block numbers. To achieve this, the main hash value can be reduced modulo M (mod M), so that the remainder is guaranteed to fall within the range of M block numbers (as shown in Figure 3B, it selects the corresponding location block). The hash value produced by the first hash procedure is used only to decide where the main hash value is stored, so the result (the block number) occupies no actual memory or disk space.

A second hash procedure is then performed on the main hash value, and the resulting second hash value is used as the item number of the corresponding data block. The item number identifies a specific storage item within the location block. Likewise, to spread the item numbers over the range of N storage items (as shown in Figure 3B, the item number selects the corresponding storage item), the main hash value can be reduced modulo N (mod N).
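As a worked illustration of Equations 1 and 2 and of the two modulo reductions, here is a minimal sketch. The capacities, the data block count, rounding Equation 2 up to a whole block, and the function names are all assumptions made for the example rather than values or details taken from the patent.

```python
import hashlib

BLOCK_CAPACITY = 4096  # bytes per location block (assumed)
ITEM_CAPACITY = 32     # bytes per storage item (assumed; fits a SHA-256 digest)


def items_per_block(block_capacity, item_capacity):
    """Equation 1: N = capacity of a location block / capacity of a storage item."""
    return block_capacity // item_capacity


def location_block_count(data_block_count, n):
    """Equation 2: M = number of data blocks / N, rounded up (rounding is assumed)."""
    return -(-data_block_count // n)


def locate(main_hash, m, n):
    """Reduce the main hash modulo M and modulo N to get (block number, item number)."""
    value = int.from_bytes(main_hash, "big")
    return value % m, value % n


N = items_per_block(BLOCK_CAPACITY, ITEM_CAPACITY)  # 4096 / 32 = 128
M = location_block_count(1100, N)                   # ceil(1100 / 128) = 9
main_hash = hashlib.sha256(b"example data block").digest()
block_number, item_number = locate(main_hash, M, N)
```

One design point worth noting: when M divides N, the block number computed this way equals the item number modulo M, so the two numbers are no longer independent. The example picks a block count that gives M = 9, which shares no factor with N = 128.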
Once the main hash value has been reduced modulo N, its remainder can only fall within the range of N storage items. This completes the building of the index file 221.

Figure 4 is a flowchart of querying the index file 221. Querying the index file 221 includes the following steps:

Step S410: the client receives a block query request, which asks whether the corresponding data block exists in the index file;
Step S420: if the data block named in the block query request is not in the index file, create a temporary index file in memory and record in it the number of times the data block has been queried; and
Step S430: when the number of times the data block has been queried meets a threshold, create the corresponding block number and item number for the data block in the index file.

First, when the client 210 sends the server 220 a query for an input file, the server 220 uses the index file 221 to check whether the same data blocks already exist on the server 220.

If the second hash value being queried (that is, the main hash value after the second hash procedure) is already present in the index file 221, the item number of the main hash value is saved in the location conflict list 222. The item number is recorded in the location conflict list 222, and an address pointer records the data block corresponding to that item number. In other words, the list is kept as a linked list: every record has a field that holds the record number of the next record that collides with it. If no conflicting record follows, this field is set to an invalid value.

When a second hash value collides with that of an earlier main hash value, the main hash value is hashed once more and placed in the location conflict list 222.
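The query flow of steps S410 to S430 can be sketched as follows. This is only an illustration under stated assumptions: a Python set stands in for the index file 221, a dictionary stands in for the temporary in-memory index, "meets the threshold" is read as "greater than or equal to", and the class and attribute names are hypothetical.

```python
class QueryTracker:
    """Counts queries for unknown blocks and indexes them once a threshold is met."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.index = set()    # stand-in for the index file 221
        self.temp_index = {}  # temporary in-memory index: main hash -> query count

    def query(self, main_hash):
        """Return True if the block was already indexed when the query arrived."""
        # S410: check whether the index already holds the block.
        if main_hash in self.index:
            return True
        # S420: not indexed, so count this query in the temporary index.
        count = self.temp_index.get(main_hash, 0) + 1
        self.temp_index[main_hash] = count
        # S430: once the count meets the threshold, move the block into the index.
        if count >= self.threshold:
            self.index.add(main_hash)
            del self.temp_index[main_hash]
        return False


tracker = QueryTracker(threshold=3)
results = [tracker.query("abc") for _ in range(4)]
```

With a threshold of 3, the first three queries miss and the third one promotes the block, so only the fourth query finds it in the index.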
In the invention, the linked list can be handled as follows. Assuming the main hash value is reduced modulo N (mod), the location conflict list 222 has N entries, and its records are organized as in Table 1.

Record number    Main hash value          Next record number
1                main hash value 1        N+1
2                main hash value 2        invalid value 0
3                main hash value 3        invalid value 0
...              ...                      ...
N                main hash value N        invalid value 0
N+1              main hash value N+1      N+3
N+2              main hash value N+2      invalid value 0
N+3              main hash value N+3      invalid value 0

Table 1. Location conflict list

First, the remainder of "main hash value 1" modulo N is taken, and its second hash value is stored in the first record of the location conflict list 222. However, the second hash value of "main hash value N+1", obtained by taking the remainder modulo N, also maps to the first record, so a collision occurs.

At this point the first record already has content ("main hash value 1"), and the two main hash values differ ("main hash value 1" and "main hash value N+1"). The second hash value of "main hash value N+1" is therefore appended to the end of the location conflict list 222, and its record number, "N+1", is written into the first record to link the two.

Similarly, suppose "main hash value N+3" also collides with "main hash value 1" after the remainder modulo N is taken. Following the conflict record number "N+1" stored in the first record leads to "main hash value N+1"; the comparison shows that the main hash values differ again, so the new value is also appended to the location conflict list 222, and its record number, "N+3", is written into record "N+1" to link them. Record "N+2" is added in the same way.
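The chained records of Table 1 can be sketched as a small linked list held in an array. In this sketch plain integers stand in for main hash values, the home record is chosen with a hypothetical modulo-N mapping, and N is set to 4; none of these choices come from the patent, and the labels in Table 1 should not be read as literal integers.

```python
INVALID = 0  # "invalid value 0": no conflicting record follows


class LocationConflictList:
    """Linked list of records; each record is [main hash, next record number]."""

    def __init__(self, n):
        self.n = n
        # Records are numbered from 1 as in Table 1, so slot 0 is unused.
        self.records = [None] + [[None, INVALID] for _ in range(n)]

    def _home(self, main_hash):
        # Hypothetical second hash procedure: map the hash to records 1..N.
        return (main_hash - 1) % self.n + 1

    def insert(self, main_hash):
        """Store main_hash and return the record number that holds it."""
        number = self._home(main_hash)
        if self.records[number][0] is None:
            self.records[number][0] = main_hash  # home record is still empty
            return number
        while True:
            record = self.records[number]
            if record[0] == main_hash:
                return number                    # same main hash, already recorded
            if record[1] == INVALID:
                break                            # end of the conflict chain
            number = record[1]                   # follow the link to the next record
        self.records.append([main_hash, INVALID])  # append at the tail of the list
        new_number = len(self.records) - 1
        self.records[number][1] = new_number       # link the previous record to it
        return new_number


conflicts = LocationConflictList(4)
numbers = [conflicts.insert(h) for h in (1, 2, 3, 4, 5, 9)]
```

Here 5 and 9 both map to record 1, so they are appended as records 5 and 6 and chained as 1, then 5, then 6, which mirrors the chain from record 1 to N+1 to N+3 in Table 1.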
In the location conflict list 222, a record number of "0" (the invalid value) indicates that no conflicting record follows the record.

When the main hash value being queried is new data, it is not written to disk immediately; it is first kept in the cache. The server 220 keeps a count and writes to disk only when the count for the new data exceeds the threshold or when the cache grows beyond a certain size. This avoids frequent disk writes.

The hierarchical index file 221 proposed by the invention records where each data block is located, which improves the efficiency with which the deduplication process looks up the index file 221 in memory (or on disk).

Although the invention has been disclosed above through the preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of patent protection is defined by the claims attached to this specification.

[Brief Description of the Drawings]

Figure 1 is a schematic diagram of deduplicated access in the prior art.
Figure 2 is a schematic diagram of the architecture of the invention.
Figure 3A is a flowchart of building the index file of the invention.
Figure 3B is a schematic diagram of the index file structure of the invention.
Figure 4 is a flowchart of querying the index file of the invention.

[Description of Main Reference Numerals]

Client 210
Server 220
Index file 221
Location conflict list 222

Claims (1)

VII. Claims:

1. A method of building an index of data blocks, applied in a data deduplication process, for building an index file for a data block produced by a chunking procedure of the data deduplication process, the method comprising the following steps:
loading an index file, wherein the index file includes a plurality of location blocks, each location block includes a plurality of storage items, and each storage item records a main hash value of a corresponding data block;
performing a first hash procedure on the main hash value of the data block to compute a block number;
performing a second hash procedure on the main hash value of the same data block to compute an item number;
creating a location conflict list for recording identical item numbers;
comparing the item number with the item numbers in the location conflict list to determine whether the same item number is already stored in the location conflict list; and
if the item number is not in the location conflict list, writing the main hash value into the storage item identified by the block number and the item number.

2. The method of building an index of data blocks of claim 1, wherein, when the item number is in the location conflict list, the method further comprises:
recording the item number in the location conflict list and using an address pointer to record the data block corresponding to the item number.

3. The method of building an index of data blocks of claim 1, further comprising, after the index file is completed:
receiving a block query request that asks whether the corresponding data block exists in the index file;
if the data block named in the block query request is not in the index file, creating a temporary index file in a memory and recording in the temporary index file the number of times the data block has been queried; and
when the number of times the data block has been queried meets a threshold, creating the corresponding block number and item number for the data block in the index file.
TW99144092A 2010-12-15 2010-12-15 A method of building the index of the data blocks TW201224805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99144092A TW201224805A (en) 2010-12-15 2010-12-15 A method of building the index of the data blocks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW99144092A TW201224805A (en) 2010-12-15 2010-12-15 A method of building the index of the data blocks

Publications (1)

Publication Number Publication Date
TW201224805A true TW201224805A (en) 2012-06-16

Family

ID=46725965

Family Applications (1)

Application Number Title Priority Date Filing Date
TW99144092A TW201224805A (en) 2010-12-15 2010-12-15 A method of building the index of the data blocks

Country Status (1)

Country Link
TW (1) TW201224805A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873504A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 System enabling data blocks to be stored in distributed server and method thereof
CN111414367A (en) * 2020-03-31 2020-07-14 中国建设银行股份有限公司 Method and device for acquiring parameters
