TW202301146A - Index node allocation method, data processing device and computer-readable medium - Google Patents
- Publication number
- TW202301146A
- Authority
- TW
- Taiwan
- Prior art keywords
- directory
- sequence
- subsequences
- file
- index node
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
The present invention relates to index nodes (inodes) of a file system, and in particular to an inode allocation method, a data processing device, and a computer-readable medium.
Currently, some file systems, such as ZFS, use inodes to store file metadata such as creation time, modification time, access mode, and size, as well as the location of the file data on a storage device such as a hard disk. A hash algorithm can be used to speed up lookup of a file's metadata. For example, a directory can be divided into multiple blocks; the uniform distribution property of the hash algorithm maps the files in the same directory evenly onto the blocks, and each block records the inode number of each file mapped to it. To query the metadata of a file, only the directory block corresponding to the file's hash value needs to be scanned to obtain the file's inode number and read the metadata; the entire directory need not be scanned.
However, the scattering property of the hash algorithm works against directory traversal. In addition, a typical file system allocates consecutive free inodes to files one by one in the order in which the files are created, and consecutively created files are not necessarily in the same directory. The inodes of the files in one directory are therefore unrelated in position and are scattered. When all files of a large directory containing many files must be traversed together with their metadata, inodes scattered over a large storage space must be read randomly, which makes processing slow.
In addition, the metadata may include a file's extended attributes (EA) and/or access control list (ACL). Because these metadata have no length limit, a file with a large number of extended attributes and/or ACL entries can easily exceed what the header of its inode can store, so additional inodes must be allocated to store them. This makes the inodes more numerous and more scattered, further slowing directory traversal.
In the prior art, inodes are enlarged so that they can store a large amount of metadata, such as extended attributes and/or access control lists; such metadata is said to be embedded in the inode. However, storage space is wasted when there is little metadata, so a suitable inode size is hard to estimate. Moreover, larger inodes require more storage space, which widens the range over which inodes are scattered and worsens the scattering problem. A technique is therefore needed that solves at least the above problems.
To solve the above problems, the present invention provides an inode allocation method for a file system, comprising the following steps: when a directory order mode is activated, allocating to a directory in the file system a sequence comprising a plurality of consecutive inodes, wherein the files in the directory are ordered according to their hash values; dividing the sequence into a plurality of subsequences; and, when a new file is created in the directory, selecting one of the subsequences according to the hash value of the new file and allocating an inode of the selected subsequence for use by the new file.
The present invention further provides a data processing device on which a file system is installed to execute the above inode allocation method.
The present invention also provides a computer-readable medium, for use in a data processing device, that stores instructions for executing the above inode allocation method.
In the present invention, the inodes of the files in the same directory are allocated contiguously in directory traversal order, which speeds up the traversal of large directories and the processing that depends on it.
200, 300: directory
210, 310, 410, 610: sequence of inodes
file37, file99: file
S100~S124: method steps
FIG. 1 is a flowchart of an inode allocation method according to an embodiment of the present invention.
FIG. 2 to FIG. 6 are schematic diagrams of allocations produced by the inode allocation method according to different embodiments of the present invention.
The following specific embodiments illustrate how the present invention may be implemented. Those of ordinary skill in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification.
Refer to FIG. 1, a flowchart of an inode allocation method according to an embodiment of the present invention. In this embodiment, the method is executed by a file system and can be applied to any directory in that file system.
In addition, to speed up directory traversal, the files in the directory may be ordered according to their hash values, where the hash value of a file may be the hash of its name or a hash computed from its metadata or other related data. The flow of the method in FIG. 1 is described below.
First, in step S100, the file system checks whether the directory meets the activation condition of the directory order mode. For example, the activation condition may be that the total number of inodes used by all files in the directory exceeds a preset threshold. If the directory does not meet the activation condition, the flow proceeds to step S110; if it does, the flow proceeds to step S120.
In step S110, the file system does not activate the directory order mode. Consecutive free inodes are allocated to files one by one in the order in which files are created across the entire file system.
In step S120, the file system activates the directory order mode.
In step S121, with the directory order mode activated, the file system allocates, from the unused inodes, a sequence of k consecutive inodes for use by the files in the directory. In one embodiment, the sequence can be written as $(I_x, I_{x+k-1})$, where $I_x$ and $I_{x+k-1}$ are the first and last inodes of the sequence, x is the number of the first inode, k is an integer greater than 1, and x+k-1 is the number of the last inode.
In step S122, the sequence is divided into a plurality of subsequences. In one embodiment, every subsequence contains the same number of inodes. For example, the k inodes of the sequence can be divided into L subsequences, where L is an integer greater than 1, and the subsequences can be written as $(I_x, I_{x+k/L-1}), (I_{x+k/L}, I_{x+2k/L-1}), \ldots, (I_{x+(L-1)k/L}, I_{x+k-1})$.
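The equal-size partition of step S122 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function name `split_sequence` is hypothetical, and the sketch assumes k is an exact multiple of L. The sample values (first inode 1025, k = 128, L = 16) are those of the FIG. 2 embodiment.

```python
def split_sequence(x, k, L):
    """Split inodes x .. x+k-1 into L equal subsequences.

    Returns a list of (first, last) inode numbers, one pair per
    subsequence, following (I_{x+rk/L}, I_{x+(r+1)k/L-1}) for r = 0..L-1.
    Assumption of this sketch: k is a multiple of L.
    """
    if L <= 1 or k % L != 0:
        raise ValueError("this sketch requires L > 1 and k divisible by L")
    size = k // L  # inodes per subsequence
    return [(x + r * size, x + (r + 1) * size - 1) for r in range(L)]


# FIG. 2 embodiment: inodes 1025..1152 split into 16 subsequences of 8.
bounds = split_sequence(1025, 128, 16)
print(bounds[0], bounds[1], bounds[-1])
# prints: (1025, 1032) (1033, 1040) (1145, 1152)
```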
In step S123, when a new file is created in the directory, one of the subsequences is selected according to the hash value of the new file.
In step S124, an inode of the selected subsequence is allocated for use by the new file. For example, the r-th subsequence $(I_{x+rk/L}, I_{x+(r+1)k/L-1})$ can be selected, where r = F % L and F is the hash value of the new file's name. In other words, r is the remainder of the new file's hash value divided by the number L of subsequences.
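Steps S123 and S124 can be illustrated as follows. This is a hedged sketch: the text does not name a hash function or say which free inode of the selected subsequence is chosen, so `zlib.crc32` is only a stand-in hash and the lowest-free-inode policy is an assumption. The names `name_hash`, `select_subsequence`, and `allocate_inode` are hypothetical.

```python
import zlib


def name_hash(name):
    # Stand-in hash (assumption): the source does not specify the function.
    return zlib.crc32(name.encode("utf-8"))


def select_subsequence(F, x, k, L):
    """Step S123: pick subsequence r = F % L; return (r, first, last)."""
    size = k // L
    r = F % L
    return r, x + r * size, x + (r + 1) * size - 1


def allocate_inode(F, x, k, L, used):
    """Step S124: allocate an inode of the selected subsequence.

    Assumption: the lowest-numbered free inode is taken.
    Returns None when the subsequence is full.
    """
    _, first, last = select_subsequence(F, x, k, L)
    for ino in range(first, last + 1):
        if ino not in used:
            used.add(ino)
            return ino
    return None


# A hash ending in binary 0001 selects subsequence 1 (inodes 1033..1040).
used = {1033}  # suppose inode 1033 is already taken
print(allocate_inode(0b0001, 1025, 128, 16, used))  # prints: 1034

# Deriving r from a file name with the stand-in hash:
r = name_hash("example.txt") % 16
```

With a different hash function the same file would generally land in a different subsequence; only the F % L selection rule comes from the text.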
In another embodiment, the subsequences need not all contain the same number of inodes, and may all differ in size. When a new file is created in the directory, the r-th subsequence can still be selected and an inode in it allocated for use by the new file.
For example, when the subsequences of the sequence do not all contain the same number of inodes, they can be written as $(I_x, I_{x+s_1-1}), (I_{x+s_1}, I_{x+s_2-1}), \ldots, (I_{x+s_{L-1}}, I_{x+s_L-1})$, and a table records the parameters $(s_1, s_2, \ldots, s_L)$ that determine the length of each subsequence. However, when file hash values are sufficiently randomly distributed, the equal-size scheme described above allocates inodes more efficiently.
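For the unequal-size variant, the bounds of a subsequence can be read from the parameter table. In the sketch below the parameters are interpreted as cumulative end offsets from the first inode, which is how the formula above reads; that interpretation, the function name `subsequence_bounds`, and the sample table `[4, 12, 16]` are assumptions made for illustration.

```python
def subsequence_bounds(x, offsets, r):
    """Return (first, last) inode numbers of subsequence r (0-based).

    offsets is the table [s1, s2, ..., sL], read here as cumulative
    end offsets, so subsequence r spans x + s_r .. x + s_{r+1} - 1
    with s_0 taken as 0.
    """
    start = x if r == 0 else x + offsets[r - 1]
    end = x + offsets[r] - 1
    return start, end


# Hypothetical table: three subsequences of 4, 8 and 4 inodes.
table = [4, 12, 16]
for r in range(len(table)):
    print(r, subsequence_bounds(1025, table, r))
# prints:
# 0 (1025, 1028)
# 1 (1029, 1036)
# 2 (1037, 1040)
```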
In addition, after the file system activates the directory order mode in step S120, the previously scattered inodes of the files in the directory can be moved together into the subsequences corresponding to those files, according to the file-to-subsequence mapping described above, so that they conform to the directory order mode. If the algorithm the file system uses to compute hash values is sufficiently random, the subsequences will be used evenly and fully.
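FIG. 2 is a schematic diagram of an allocation produced by the inode allocation method according to an embodiment of the present invention. As shown in FIG. 2, directory 200 includes a plurality of files and a plurality of blocks.
In one embodiment, in the table in block 0 of directory 200, the files are ordered by the suffix (i.e., the last four bits) of the hash value of their names. Every block other than block 0 records the inode numbers allocated to files, while the table in block 0 records, for each hash suffix, the number of the block that records the corresponding inode numbers. For example, the file system allocates to directory 200 a sequence 210 comprising 128 consecutive inodes numbered 1025 to 1152, for use by the files of directory 200, and divides sequence 210 into 16 subsequences of 8 consecutive inodes each. Subsequence 0 of sequence 210 comprises the 8 consecutive inodes numbered 1025 to 1032, and subsequence 1 comprises the 8 consecutive inodes numbered 1033 to 1040.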
In this embodiment, the hash suffix of a file is 4 bits long, and $2^4 = 16$ equals the number L of subsequences, so the hash suffix of each file equals the remainder r of the file's hash value divided by L. In other words, files with the same hash suffix have their inodes in the same subsequence.
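The equivalence between the 4-bit hash suffix and the remainder modulo 16 holds for any non-negative integer hash, because 16 is a power of two. A short check, with the helper name `hash_suffix` and the sample hash values chosen only for illustration:

```python
L = 16           # number of subsequences in this embodiment
SUFFIX_BITS = 4  # 2**4 == 16


def hash_suffix(F):
    """Return the last SUFFIX_BITS bits of the hash value F."""
    return F & ((1 << SUFFIX_BITS) - 1)


# The low 4 bits equal the remainder modulo 16 for every sample value.
for F in (0b0001, 0b10110001, 99, 12345):
    assert hash_suffix(F) == F % L

print(hash_suffix(0b10110001))  # prints: 1
```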
Take file file99 as an example. The suffix of file99's hash value is 0001, which corresponds to subsequence 1 and, in the table of block 0, to block 5. The file system can therefore allocate an inode in subsequence 1 of sequence 210 to file99. For example, the file system allocates inode 1034 to file99 and records file99's inode number 1034 in block 5. Later, when file99's metadata needs to be accessed, the table in block 0 shows that block 5 records file99's inode number, block 5 shows that the inode number is 1034, and the metadata is then accessed at inode 1034.
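The lookup path for file99 can be modeled with plain dictionaries. This is a toy model under stated assumptions, not the on-disk layout: the dictionary names and the placeholder metadata are hypothetical, and the hash value is passed in directly because the text does not specify the hash function. The mapping of suffix 0000 to block 10 is the one given for directory 200 in the FIG. 6 embodiment.

```python
# Block 0: hash suffix -> number of the block holding the inode numbers.
block0 = {0b0000: 10, 0b0001: 5}

# Other blocks: file name -> inode number.
blocks = {5: {"file99": 1034}, 10: {}}

# Inode table: inode number -> metadata (placeholder values).
inodes = {1034: {"mode": 0o644}}


def lookup(name, F):
    """Follow block 0 -> directory block -> inode for one file."""
    block_no = block0[F & 0xF]     # suffix = last four bits of the hash
    ino = blocks[block_no][name]   # scan only that one directory block
    return ino, inodes[ino]


ino, meta = lookup("file99", 0b0001)
print(ino)  # prints: 1034
```

Only one directory block is consulted per lookup, which is the property the hashed directory layout is meant to provide.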
In addition, when all inodes of any subsequence in a first sequence previously allocated to a directory have been allocated to files of that directory, the file system can allocate, from the unused inodes, a second sequence that likewise comprises k consecutive inodes, for use by the files in the directory. In one embodiment, the second sequence can be written as $(I_y, I_{y+k-1})$, where $I_y$ and $I_{y+k-1}$ are the first and last inodes of the second sequence, y is the number of its first inode, and y+k-1 is the number of its last inode. The file system divides the second sequence into a plurality of subsequences.
In one embodiment, every subsequence contains the same number of inodes. For example, the k inodes of the second sequence can likewise be divided into L subsequences, which can be written as $(I_y, I_{y+k/L-1}), (I_{y+k/L}, I_{y+2k/L-1}), \ldots, (I_{y+(L-1)k/L}, I_{y+k-1})$.
Furthermore, when a file in the directory needs more inodes than its selected subsequence in the first sequence can provide (for example, the file has just been created and needs one inode, but all inodes of the selected subsequence are already allocated to other files), the file system selects one of the subsequences of the second sequence according to the file's hash value (for example, the hash of the file name) and allocates an inode in that selected subsequence for use by the file. For example, the r-th subsequence $(I_{y+rk/L}, I_{y+(r+1)k/L-1})$ can likewise be selected, where r is the remainder of the file's hash value divided by the number L of subsequences of the second sequence.
In another embodiment, the subsequences of the second sequence need not all contain the same number of inodes. They can be written as $(I_y, I_{y+s_1-1}), (I_{y+s_1}, I_{y+s_2-1}), \ldots, (I_{y+s_{L-1}}, I_{y+s_L-1})$, and a table records the parameters $(s_1, s_2, \ldots, s_L)$ that determine the length of each subsequence. However, when file hash values are sufficiently randomly distributed, the equal-size scheme described above allocates inodes more efficiently.
Thereafter, when a new file is created in the directory and its selected subsequence in the first sequence still has an inode available, the file system allocates an inode of that subsequence to the new file. Otherwise, if all inodes of that subsequence have been allocated to other files, the file system allocates an inode of the file's corresponding selected subsequence in the second sequence. Likewise, when all inodes of any subsequence in the second sequence have been allocated to files of the directory, the file system can allocate yet another sequence for the directory's files, and so on.
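The fallback from the first sequence to later sequences can be sketched as a small allocator. Several things here are assumptions for illustration only: the class name `DirectoryAllocator`, the first-fit choice within a subsequence, and leaving the decision of when to add a new sequence to the caller (the text describes the file system adding one once any subsequence is full). The inode numbers are those of the FIG. 3 embodiment.

```python
class DirectoryAllocator:
    """Per-directory inode allocator over a chain of equal-size sequences."""

    def __init__(self, k, L):
        self.k, self.L = k, L
        self.sequences = []  # first inode number of each sequence, in order
        self.used = set()    # inode numbers already allocated

    def add_sequence(self, first):
        self.sequences.append(first)

    def allocate(self, F):
        """Try subsequence r = F % L of each sequence in turn."""
        size = self.k // self.L
        r = F % self.L
        for first in self.sequences:
            start = first + r * size
            for ino in range(start, start + size):
                if ino not in self.used:
                    self.used.add(ino)
                    return ino
        return None  # every matching subsequence is full


# FIG. 3 embodiment: sequence 210 starts at 1025, sequence 310 at 1153.
alloc = DirectoryAllocator(k=128, L=16)
alloc.add_sequence(1025)
alloc.add_sequence(1153)
alloc.used.update(range(1033, 1041))  # subsequence 1 of sequence 210 is full
print(alloc.allocate(0b0001))         # prints: 1161
```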
FIG. 3 is a schematic diagram of an allocation produced by the inode allocation method according to an embodiment of the present invention. As shown in FIG. 3, the file system allocates to directory 300 a sequence 210 comprising 128 consecutive inodes numbered 1025 to 1152, for use by the files of directory 300, and divides sequence 210 into 16 subsequences of 8 consecutive inodes each.
Later, once all inodes of at least one subsequence of sequence 210 have been allocated to files of directory 300, the file system further allocates to directory 300 a sequence 310 comprising 128 consecutive inodes numbered 1153 to 1280, for use by the files of directory 300, and divides sequence 310 into 16 subsequences of 8 consecutive inodes each.
Take file file37 as an example. File file37 needs an inode, but all inodes of its corresponding subsequence 1 in sequence 210 have been allocated to other files. The file system therefore allocates to file37 an inode of its corresponding subsequence 1 in sequence 310. For example, the file system allocates inode 1161 to file37 and records file37's inode number 1161 in block 5.
In one embodiment, a file in the directory may have too much metadata to store entirely in the file's inode. The file system can therefore allocate, from the unused inodes, a third sequence comprising wk consecutive inodes for storing file metadata. In one embodiment, the third sequence can be written as $(I_z, I_{z+wk-1})$, where $I_z$ and $I_{z+wk-1}$ are the first and last inodes of the third sequence, w is a positive integer, z is the number of its first inode, and z+wk-1 is the number of its last inode. The file system divides the third sequence into a plurality of subsequences.
In one embodiment, every subsequence contains the same number of inodes. For example, the wk inodes of the third sequence can likewise be divided into L subsequences, which can be written as $(I_z, I_{z+wk/L-1}), (I_{z+wk/L}, I_{z+2wk/L-1}), \ldots, (I_{z+(L-1)wk/L}, I_{z+wk-1})$.
In addition, when a file in the directory has too much metadata to store entirely in the file's original inode, the file system selects one of the subsequences of the third sequence according to the file's hash value (for example, the hash of the file name) and allocates w inodes in that selected subsequence for storing the file's metadata, where the metadata may include the file's extended attributes and/or access control list. For example, the r-th subsequence $(I_{z+rwk/L}, I_{z+(r+1)wk/L-1})$ can likewise be selected, where r is the remainder of the file's hash value divided by the number L of subsequences of the third sequence.
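Allocation from the third sequence can be sketched in the same style. The function name `allocate_metadata_inodes` is hypothetical, and taking the w lowest-numbered free inodes of the selected subsequence is an assumption; the text only says that w inodes of that subsequence are allocated. The sample values (z = 1153, w = 2, k = 128, L = 16) are those of the FIG. 4 and FIG. 5 embodiment.

```python
def allocate_metadata_inodes(F, z, w, k, L, used):
    """Allocate w inodes from subsequence r = F % L of the third sequence.

    The third sequence holds w*k inodes starting at z, so each of its
    L subsequences holds w*k/L inodes.
    Returns the list of allocated inode numbers, or None if fewer than
    w free inodes remain in the selected subsequence.
    """
    size = w * k // L
    r = F % L
    first = z + r * size
    free = [i for i in range(first, first + size) if i not in used]
    if len(free) < w:
        return None
    chosen = free[:w]  # assumption: take the lowest free inodes
    used.update(chosen)
    return chosen


used = set()
print(allocate_metadata_inodes(0b0001, 1153, 2, 128, 16, used))
# prints: [1169, 1170]
```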
In another embodiment, the subsequences of the third sequence need not all contain the same number of inodes. They can be written as $(I_z, I_{z+s_1-1}), (I_{z+s_1}, I_{z+s_2-1}), \ldots, (I_{z+s_{L-1}}, I_{z+s_L-1})$, and a table records the parameters $(s_1, s_2, \ldots, s_L)$ that determine the length of each subsequence. However, when file hash values are sufficiently randomly distributed, the equal-size scheme described above allocates inodes more efficiently.
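FIG. 4 and FIG. 5 are schematic diagrams of an allocation produced by the inode allocation method according to an embodiment of the present invention. As shown in FIG. 4 and FIG. 5, the file system first allocates to directory 200 a sequence 210 comprising 128 consecutive inodes numbered 1025 to 1152, for use by the files of directory 200, and then allocates to directory 200 a sequence 410 comprising 256 consecutive inodes numbered 1153 to 1408 (w equals 2), for storing the metadata of its files. Sequence 210 is divided into 16 subsequences of 8 consecutive inodes each. Sequence 410 is likewise divided into 16 subsequences, each of 16 consecutive inodes; for example, subsequence 1 comprises the 16 consecutive inodes numbered 1169 to 1184.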
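As a worked check of the numbers in this embodiment, take $z = 1153$, $w = 2$, $k = 128$, and $L = 16$, all from the text above. Subsequence $r$ of sequence 410 spans

$$
\left(I_{z + r\,wk/L},\; I_{z + (r+1)\,wk/L - 1}\right), \qquad \frac{wk}{L} = \frac{2 \cdot 128}{16} = 16 .
$$

For $r = 1$ this gives $I_{1153+16} = I_{1169}$ through $I_{1153+32-1} = I_{1184}$, and the last subsequence ($r = 15$) ends at $I_{1153+256-1} = I_{1408}$, which matches the stated range of sequence 410.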
Take file file99 as an example. The suffix of file99's hash value is 0001, which corresponds to subsequence 1 and, in the table of block 0, to block 5. The file system therefore allocates inode 1034 of subsequence 1 of sequence 210 to file99 and records file99's inode number 1034 in block 5. Then, because file99 has too much metadata to store entirely in inode 1034, the file system allocates the two inodes 1169 and 1170 of subsequence 1 of sequence 410 to file99 to store its extended attributes and/or access control list, and records in inode 1034 that file99's extended attributes and/or access control list are stored in inodes 1169 and 1170.
Later, when file99's extended attributes and/or access control list need to be accessed, the table in block 0 shows that block 5 records file99's inode number, block 5 shows that the inode number is 1034, and inode 1034 shows that the extended attributes and/or access control list are stored in inodes 1169 and 1170, where they can then be accessed.
Besides the third sequence described above, the file system can also allocate, from the unused inodes, a fourth sequence comprising w'k consecutive inodes for storing file metadata. In one embodiment, the fourth sequence can be written as $(I_{z'}, I_{z'+w'k-1})$, where $I_{z'}$ and $I_{z'+w'k-1}$ are the first and last inodes of the fourth sequence, w' is a positive integer, z' is the number of its first inode, and z'+w'k-1 is the number of its last inode. The fourth sequence is divided and allocated in the same way as the third sequence, just as the second sequence is divided and allocated in the same way as the first.
In addition, when a file has too much metadata to store entirely in the w inodes allocated to it in the third sequence, the file system can allocate w' inodes of the file's corresponding selected subsequence in the fourth sequence to store the part of the metadata that exceeds the capacity of those w inodes. If the fourth sequence is still not enough to store all the metadata, the file system can allocate yet another sequence to store the additional metadata, and so on.
FIG. 6 is a schematic diagram of an allocation produced by the inode allocation method according to an embodiment of the present invention. As shown in FIG. 6, the file system allocates to directory 200 a sequence 210 comprising 128 consecutive inodes numbered 1025 to 1152, for use by the files of directory 200, and divides sequence 210 into 16 subsequences of 8 consecutive inodes each. Next, the file system allocates to directory 200 a sequence 410 comprising 256 consecutive inodes numbered 1153 to 1408, for storing metadata of the files of directory 200 such as extended attributes and/or access control lists, and divides sequence 410 into 16 subsequences of 16 consecutive inodes each. Then, once all inodes of at least one subsequence of sequence 210 have been allocated to files of directory 200, the file system further allocates to directory 200 a sequence 610 comprising 128 consecutive inodes numbered 1409 to 1536, for use by the files of directory 200, and divides sequence 610 into 16 subsequences of 8 consecutive inodes each.
In this embodiment, the hash suffix of a file is 4 bits long, and $2^4 = 16$ equals the number L of subsequences, so the hash suffix of each file equals the remainder r of the file's hash value divided by L. In other words, files with the same hash suffix correspond to the same directory block and to the same subsequence number. For example, files whose hash suffix is 0000 correspond to directory block 10 and to subsequence 0 of sequences 210, 410, and 610; files whose hash suffix is 0001 correspond to directory block 5 and to subsequence 1 of sequences 210, 410, and 610; and so on.
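As described above, the files in directory 200 are ordered by hash suffix, and files with the same hash suffix correspond to the same subsequence number. When the file system traverses directory 200, it therefore only needs to prefetch the subsequences of each sequence in order: first subsequence 0 of sequences 210, 410, and 610, then subsequence 1 of sequences 210, 410, and 610, and so on. The inode of every file is thus obtained quickly in directory order, without random reads of many scattered inodes.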
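The prefetch order used during traversal can be generated mechanically. The sketch below only computes the inode ranges to read and does not perform any I/O; the function name `prefetch_order` and the representation of a sequence as a (first inode, inodes per subsequence) pair are assumptions made for illustration. The three sequences are 210, 410, and 610 from the FIG. 6 embodiment.

```python
def prefetch_order(sequences, L):
    """Yield (r, first, last) inode ranges in directory traversal order.

    sequences is a list of (first_inode, inodes_per_subsequence) pairs.
    For each subsequence number r, the matching range of every sequence
    is visited before moving on to r + 1.
    """
    for r in range(L):
        for first, size in sequences:
            start = first + r * size
            yield r, start, start + size - 1


# Sequences 210 (8 per subsequence), 410 (16) and 610 (8).
seqs = [(1025, 8), (1153, 16), (1409, 8)]
order = list(prefetch_order(seqs, 16))
for entry in order[:6]:
    print(entry)
# prints:
# (0, 1025, 1032)
# (0, 1153, 1168)
# (0, 1409, 1416)
# (1, 1033, 1040)
# (1, 1169, 1184)
# (1, 1417, 1424)
```

Each range is contiguous, so a traversal issues a small number of sequential reads per hash suffix instead of one random read per file.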
The present invention further provides a data processing device, such as a smartphone, a computer, a server, or any other electronic device with data processing and storage functions. A file system is installed on the data processing device to execute the above inode allocation method.
The present invention also provides a computer-readable medium, such as a memory, a floppy disk, a hard disk, or an optical disc. The computer-readable medium is used in the data processing device and stores instructions for executing the above inode allocation method.
In summary, each file in a directory of the file system of the present invention is mapped to a block of the directory according to its hash suffix. To look up the metadata of a single file, only the directory block corresponding to that file needs to be scanned to obtain its inode number; the entire directory need not be scanned. During directory traversal, only the subsequences of each sequence need to be prefetched in order to obtain the inode of every file quickly in directory order, without random reads of many scattered inodes. The present invention therefore performs both single-file lookup and full directory traversal quickly.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone of ordinary skill in the art may modify and change the above embodiments without departing from the spirit and scope of the present invention. The scope of protection of the present invention shall therefore be as set out in the claims below.
Claims (12)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/646,455 US20220405240A1 (en) | 2021-06-16 | 2021-12-29 | Index node allocation method, data processing device and computer-readable medium |
CN202210129194.XA CN115481085A (en) | 2021-06-16 | 2022-02-11 | Index node configuration method, data processing device and computer readable medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163211002P | 2021-06-16 | 2021-06-16 | |
US63/211,002 | 2021-06-16 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202301146A true TW202301146A (en) | 2023-01-01 |
TWI835039B TWI835039B (en) | 2024-03-11 |
Family
ID=86658108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW110147495A TWI835039B (en) | 2021-06-16 | 2021-12-17 | Index node allocation method, data processing device and computer-readable medium |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI835039B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7689547B2 (en) * | 2006-09-06 | 2010-03-30 | Microsoft Corporation | Encrypted data search |
2021
- 2021-12-17: TW application TW110147495A, patent TWI835039B, status active
Also Published As
Publication number | Publication date |
---|---|
TWI835039B (en) | 2024-03-11 |