CN104537050A

CN104537050A - Method for batch and rapid establishment of metadata and data of file system

Info

Publication number: CN104537050A
Application number: CN201410826066.6A
Authority: CN
Inventors: 曹强; 钱璐; 谭诗诗; 谢长生
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2014-12-25
Filing date: 2014-12-25
Publication date: 2015-04-22
Anticipated expiration: 2034-12-25
Also published as: CN104537050B

Abstract

The invention discloses a method for rapid batch creation of file system metadata and data. The method targets working sets that are known in advance, such as directory copies and the decompression of compressed files. While preserving the reliability of the file system, it first modifies the superblock, block group descriptors, data bitmap, and Inode bitmap held in a metadata area, and then writes Inode information and data into the metadata area and a data area, respectively, in the order of the operations. The method spares the file system the frequent random small writes that writing back metadata pages would otherwise cause. It lengthens the write delay so that the metadata area and the data area absorb and merge more I/O requests, and repeated updates of the same metadata object on disk are replaced by a single disk update. Because Inodes and data blocks are created sequentially in the metadata area and the data area, the number of head seeks and positioning operations is reduced.

Description

A method for rapid batch creation of file system metadata and data

Technical field

The invention belongs to the technical field of computer storage systems and, more specifically, relates to a method for rapid batch creation of file system metadata and data.

Background art

The file system is the core component of a storage system; its primary responsibility is to organize files and data on a storage device. A file system consists of three parts: the file system interface, the set of software that operates on and manages objects, and the objects and their attributes.

Over the past decades, file system research has concentrated mainly on improving performance, including optimizing the on-disk layout of the file system to make full use of disk bandwidth and reduce seeks, and improving memory efficiency to reduce disk I/O operations. However, as system complexity and the volume of stored data have grown, balancing file system reliability against performance has gradually become a research focus in both academia and industry.

In a disk-based file system, meeting reliability requirements means that metadata and data in memory must be updated to disk so that they remain consistent and persistent. Frequent update operations, however, generate a large number of scattered small writes, which degrade file system performance. From the hardware perspective, scattered small writes increase head seek time and slow down updates. From the software perspective, the system serializes the synchronization of metadata and data, so disk bandwidth cannot be fully used and the update speed is limited further.

In the big data era, copying and moving large numbers of files has become increasingly frequent. Such operations create a large number of directories and files and therefore produce a large amount of metadata and data. With existing methods, to guarantee consistency and persistence, all directory and file creation operations are executed serially, and the corresponding metadata and data must be written to disk in order. This produces a large amount of small random I/O and severely limits copy speed.

At present there are two main approaches to this problem. One is to moderately lengthen the write delay so that modifications to metadata and data can be merged; the other is research on log-structured file systems. However, the write delay approach yields only limited performance gains, and the design of a log-structured file system requires additional garbage collection to reclaim the disk space occupied by stale, invalid data. When disk space utilization is high, performance drops quickly, which has kept log-structured file systems from being widely adopted among disk file systems. Neither approach adequately solves the long processing times of copying and moving large numbers of files.

Summary of the invention

In view of the above defects or improvement needs of the prior art, the invention provides a method and system for rapid batch creation of file system metadata and data. Its purpose is to solve the technical problem that, in existing ext-series file systems, the serial creation of metadata and data used when copying and moving large numbers of files limits the processing speed of the file system.

To achieve the above object, according to one aspect of the invention, a method for rapid batch creation of file system metadata and data is provided, comprising the following steps:

(1) In a directory copy or compressed-file decompression operation executed by the user, traverse the data set known in advance for the operation and count the sub-directories and files under the directory being copied or decompressed, to obtain the number of Inode nodes N_{inodes}, which equals the sum of the two counts;

(2) In the memory of the disk file system, build a metadata area and a data area, each occupying a contiguous address space, and copy the metadata on disk into the metadata area;

(3) In the metadata area and data area thus built, create metadata and data in turn according to the known data set until the operations on the data set are finished, then update the metadata and data in the in-memory metadata area and data area to the corresponding metadata zone and data zone on disk in rapid batches.
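
Step (1) can be illustrated with a short sketch. It assumes the known data set is a directory tree on a local file system and that the top-level directory itself is not counted; the function name `count_inodes` is hypothetical and does not come from the source.

```python
import os


def count_inodes(root: str) -> int:
    """Return the number of sub-directories plus files under root.

    Assumption: the root directory itself is not counted, only its
    descendants, following the wording of step (1).
    """
    n_dirs = 0
    n_files = 0
    # os.walk visits every directory once and lists its direct children.
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_dirs + n_files
```

For a compressed archive the same count would come from the archive's member list rather than from a directory walk.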

Preferably, the copied metadata comprises the superblock, block group descriptors, Inode bitmap, and data bitmap.

Preferably, step (2) comprises the following sub-steps:

(2-1) Calculate the size S_{metadata_chunk} of the metadata area to be built, and build in the memory of the disk file system a metadata area of this size with a contiguous address space;

(2-2) Copy the metadata on disk into the metadata area thus built;

(2-3) Set the size S_{data_chunk} of the data area to be built, and build in the memory of the disk file system a data area of size S_{data_chunk} with a contiguous address space.

Preferably, in step (2-1), the metadata area size is calculated with the following formula:

S_{metadata_chunk} = \frac{N_{inodes} \times S_{inode}}{1024} + S_{super_block} + S_{inode_bitmap} + S_{block_bitmap} \quad (KB)

where S_{inode} is the size of a single Inode node, S_{super_block} is the size of the superblock, S_{inode_bitmap} is the size of the Inode bitmap, and S_{block_bitmap} is the size of the data bitmap.

Preferably, the data bitmap size S_{block_bitmap} is calculated with the following formula:

S_{block_bitmap} = \frac{S_{disk_capacity}}{8 \times S_{disk_block_size}}

where S_{disk_capacity} is the capacity of the disk and S_{disk_block_size} is the disk block size;

The Inode bitmap size S_{inode_bitmap} is calculated with the following formula:

S_{inode_bitmap} = \frac{S_{block_bitmap}}{N_{block_per_inode}}

where N_{block_per_inode} is the number of data blocks occupied by each Inode node on disk.
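
The three formulas above can be transcribed directly into code. The following is a minimal sketch that follows them literally; the function and parameter names are hypothetical, the sample values are made up for illustration, and the units are simply taken as the formulas assume them.

```python
def block_bitmap_size(disk_capacity: float, disk_block_size: float) -> float:
    """S_block_bitmap = S_disk_capacity / (8 * S_disk_block_size)."""
    return disk_capacity / (8 * disk_block_size)


def inode_bitmap_size(block_bitmap: float, n_block_per_inode: float) -> float:
    """S_inode_bitmap = S_block_bitmap / N_block_per_inode."""
    return block_bitmap / n_block_per_inode


def metadata_chunk_size(n_inodes: int, inode_size: float, super_block: float,
                        inode_bitmap: float, block_bitmap: float) -> float:
    """S_metadata_chunk per the formula for step (2-1)."""
    return n_inodes * inode_size / 1024 + super_block + inode_bitmap + block_bitmap


# Illustrative inputs only (not taken from the source).
bb = block_bitmap_size(disk_capacity=1048576, disk_block_size=4)
ib = inode_bitmap_size(bb, n_block_per_inode=4)
total = metadata_chunk_size(n_inodes=1024, inode_size=128, super_block=1,
                            inode_bitmap=ib, block_bitmap=bb)
```

Only the first term depends on the working set; the superblock and bitmap terms are fixed by the disk geometry, so they can be computed once per file system.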

Preferably, step (3) comprises the following sub-steps:

(3-1) Write the Inode node information and the corresponding data of the directory copy or compressed-file decompression operation executed by the user into the in-memory metadata area and data area, respectively, and at the same time update the superblock, block group descriptors, data block bitmap, and Inode bitmap that were copied into the metadata area;

(3-2) Update the metadata and data in the in-memory metadata area and data area to the corresponding metadata zone and data zone on disk in rapid batches.

Preferably, step (3-1) comprises the following sub-steps:

(3-1-1) Set a counter i=1;

(3-1-2) Write the i-th Inode node information and its corresponding data of the directory copy or compressed-file decompression operation executed by the user into the in-memory metadata area and data area, respectively, and update the superblock, block group descriptors, data block bitmap, and Inode bitmap that were copied into the metadata area;

(3-1-3) Determine whether the ratio of the amount of data currently held in the data area to the data area size S_{data_chunk} has reached a threshold; if so, go to step (3-1-4), otherwise go to step (3-1-5);

(3-1-4) Allocate in memory a new data area of size S_{data_chunk} with a contiguous address space, into which the data corresponding to the remaining Inode node information is written, and at the same time write the contents of the former data area back in a batch, with the whole area as the transfer unit, to the corresponding data zone on disk;

(3-1-5) Set i=i+1 and determine whether i equals N_{inodes}; if so, the process ends, otherwise return to step (3-1-2).
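
The loop of sub-steps (3-1-1) to (3-1-5) can be sketched as follows. This is an illustration under simplifying assumptions, not the patented implementation: the working set is modelled as a sequence of `(inode_info, data)` byte pairs, the bitmap and superblock updates are omitted, the fill ratio is checked after each entry so a data area may briefly exceed its nominal size, and the sketch simply iterates over every entry of the working set. The names `stage_working_set` and `write_back` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def stage_working_set(
    items: Iterable[Tuple[bytes, bytes]],
    data_chunk_size: int,
    threshold: float,
    write_back: Callable[[bytes], None],
) -> Tuple[List[bytes], bytes]:
    """Stage inode info and data in memory, flushing full data areas.

    Returns the staged inode info and the data not yet written back.
    """
    metadata_area: List[bytes] = []   # inode info, in creation order
    data_area = bytearray()           # current in-memory data area
    for inode_info, data in items:                       # (3-1-1), (3-1-5)
        metadata_area.append(inode_info)                 # (3-1-2)
        data_area += data
        if len(data_area) / data_chunk_size >= threshold:  # (3-1-3)
            write_back(bytes(data_area))                 # (3-1-4): whole area at once
            data_area = bytearray()                      # fresh data area
    return metadata_area, bytes(data_area)
```

Whatever remains in the returned data area, together with the staged metadata, is what the final batch write-back of step (3-2) has to handle.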

Preferably, step (3-2) comprises the following sub-steps:

(3-2-1) Write the data in the in-memory data area that has not yet been written back to disk back in a batch to the data zone on disk;

(3-2-2) Write the superblock, block group descriptors, data block bitmap, and Inode bitmap in the in-memory metadata area back to the corresponding superblock, block group descriptor, data block bitmap, and Inode bitmap regions on disk;

(3-2-3) Write the Inode node information in the in-memory metadata area back in a batch to the corresponding Inode table region on disk.
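
The ordering of sub-steps (3-2-1) to (3-2-3) can be made explicit with a small sketch. It assumes the metadata area is represented as a dictionary of byte strings and that disk access is abstracted behind a `write(target, payload)` callback; both the representation and all names are hypothetical.

```python
from typing import Callable, Dict


def final_write_back(
    pending_data: bytes,
    metadata_area: Dict[str, object],
    write: Callable[[str, bytes], None],
) -> None:
    """Issue the final batch writes in the order of step (3-2)."""
    # (3-2-1): data that has not yet been written back goes first.
    if pending_data:
        write("data_zone", pending_data)
    # (3-2-2): one write per global metadata structure.
    for name in ("superblock", "group_descriptors", "block_bitmap", "inode_bitmap"):
        write(name, metadata_area[name])
    # (3-2-3): all staged inode info as one contiguous batch.
    write("inode_table", b"".join(metadata_area["inodes"]))
```

Each global structure is written once here regardless of how many files were created, which is where the reduction in synchronizations comes from.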

According to another aspect of the invention, a system for rapid batch creation of file system metadata and data is provided, comprising:

a first module, configured to, in a directory copy or compressed-file decompression operation executed by the user, traverse the data set known in advance for the operation and count the sub-directories and files under the directory being copied or decompressed, to obtain the number of Inode nodes N_{inodes}, which equals the sum of the two counts;

a second module, configured to build in the memory of the disk file system a metadata area and a data area, each occupying a contiguous address space, and to copy the metadata on disk into the metadata area;

a third module, configured to create metadata and data in turn in the metadata area and data area thus built according to the known data set until the operations on the data set are finished, and to update the metadata and data in the in-memory metadata area and data area to the corresponding metadata zone and data zone on disk in rapid batches.

In general, compared with the prior art, the above technical solution conceived by the invention achieves the following beneficial effects:

1. Because step (3-1) is adopted, the file system lengthens the write delay, so that more I/O requests are merged in the in-memory metadata area and data area. Repeated updates of the same metadata object that would each be issued to disk are merged into a single disk operation, which reduces the number of synchronizations with disk and avoids the scattered small writes that frequent synchronization brings.

2. Because step (3-2) is also adopted, Inode node information and data blocks are created sequentially in the in-memory metadata area and data area, so they can be written back in rapid batches to the corresponding contiguous address spaces on disk. This reduces the number of head seeks and positioning operations and greatly increases the speed at which the file system copies and moves large numbers of files.
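
The merging effect described in item 1 can be illustrated with a toy model. The sketch assumes each update is a pair `(metadata_object, new_value)` and that only the last value of each object needs to reach disk; the function name `coalesce` and the sample updates are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def coalesce(updates: Iterable[Tuple[Hashable, object]]) -> Dict[Hashable, object]:
    """Keep only the latest in-memory value of each metadata object."""
    latest: Dict[Hashable, object] = {}
    for obj, value in updates:
        latest[obj] = value  # later updates overwrite earlier ones in memory
    return latest


# Three file creations, each touching the superblock and the Inode bitmap.
updates = [
    ("superblock", 1), ("inode_bitmap", 1),
    ("superblock", 2), ("inode_bitmap", 2),
    ("superblock", 3), ("inode_bitmap", 3),
]
writes_if_synchronized_each_time = len(updates)   # one disk write per update
writes_after_coalescing = len(coalesce(updates))  # one disk write per object
```

In this model, synchronizing after every update costs six disk writes, while delaying the write-back until the end costs two.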

Brief description of the drawings

Fig. 1 is a diagram of the internal structure of the memory to which the invention is applied;

Fig. 2 is a diagram of the internal structure of the disk to which the invention is applied;

Fig. 3 is an architecture diagram of the method of the invention for rapid batch creation of file system metadata and data;

Fig. 4 is an overall flow chart of the method of the invention for rapid batch creation of file system metadata and data;

Fig. 5 is a detailed flow chart of step (2) of the method of the invention;

Fig. 6 is a detailed flow chart of step (3) of the method of the invention;

Fig. 7 is a detailed flow chart of step (3-1) of the method of the invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

The basic idea of the invention is to provide a method for rapid batch creation of file system metadata and data that specifically targets a data set known in advance. This data set contains the information about all directories and files that need to be created, for example the source directory of a copy or the set of files in a compressed archive. While preserving file system reliability, the creation operations for metadata and for data are concentrated in two separate regions of memory, each occupying a contiguous address space: the one for metadata is called the metadata area, and the one for data is called the data area. During the operation, the Inode node information and the corresponding data of the directory copy or compressed-file decompression operation executed by the user are written into the in-memory metadata area and data area, respectively, while the superblock, block group descriptors, data block bitmap, and Inode bitmap copied into the metadata area are updated. Once the operation has finished, the metadata and data created in the in-memory metadata area and data area are written back as a whole to the corresponding metadata zone and data zone on disk, completing the copy or move of the large number of files.

The method for batch creation of file system metadata and data specifically targets a data set known in advance, which contains the information about all directories and files that need to be created. The method also targets specific operations, including directory copy and compressed-file decompression.

First, some of the terms used in the invention are explained:

File system metadata: file system metadata refers to the data structures that manage space and the metadata describing the file system's files and directory tree. It includes the data structures that manage the whole file system, the data structures that manage space allocation, the attributes and pointers of files, and the attributes and contents of directories. Taking EXT2 as an example, the superblock, block group descriptors, data block bitmap, Inode bitmap, Inode table, and directory entry pages all belong to the file system's metadata. The Inode table is the collection of Inode node information, covering both file Inodes and directory Inodes; a directory entry page is a collection of directory entries that records information such as the names and Inode numbers of all files and directories under the directory.

File system data: file system data refers to the contents of files. Taking EXT2 as an example, the file system's data is the file content stored in the data zone.

Metadata area: the metadata area is a section of memory with contiguous addresses allocated in memory. It caches the superblock, block group descriptors, Inode bitmap, data bitmap, and the Inode node information created during directory copy and compressed-file decompression operations. The size of the metadata area is calculated from the number of directories and files in the data set, known in advance, that is to be created.

Data area: the data area is a section of memory with contiguous addresses allocated in memory. It caches the data created during directory copy and compressed-file decompression operations. The size S_{data_chunk} of the data area is a preset default value (for example 64 MB), which the user can adjust dynamically for the specific scenario.
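
A contiguous metadata area of the kind just defined can be modelled as one buffer carved into named sections. The sketch below is only an illustration: the section names, the sizes in the usage note, and the function name `build_metadata_area` are assumptions, and a Python `bytearray` stands in for a contiguous region of kernel memory.

```python
from typing import Dict, Tuple


def build_metadata_area(
    section_sizes: Dict[str, int],
) -> Tuple[bytearray, Dict[str, memoryview]]:
    """Allocate one contiguous buffer and return per-section views into it."""
    area = bytearray(sum(section_sizes.values()))  # single contiguous region
    view = memoryview(area)
    sections: Dict[str, memoryview] = {}
    offset = 0
    for name, size in section_sizes.items():
        # Each view aliases the shared buffer; writes go straight into `area`.
        sections[name] = view[offset:offset + size]
        offset += size
    return area, sections
```

For example, passing `{"superblock": 1024, "group_descriptors": 32, "inode_bitmap": 16, "block_bitmap": 64, "inodes": 512}` yields one 1648-byte buffer, and a write through `sections["group_descriptors"]` lands at offset 1024 of that buffer. Keeping everything in a single buffer is what lets the whole area be handed to the write-back path as one unit.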

To accelerate file system operations and ease the performance bottleneck of slow disks, the system caches part of the file system's metadata and data in memory. Because memory is volatile storage, maintaining file system metadata requires that metadata pages be written back to non-volatile storage in a timely manner.

Fig. 1 shows the internal structure of memory according to the invention. As shown in Fig. 1, the superblock, block group descriptors, data block bitmap, Inode bitmap, Inode node information, and data are cached in memory. For a specific directory copy or compressed-file decompression operation, the system can allocate in memory a metadata area and a data area with contiguous address spaces, used respectively to record the metadata and the data of the copy or decompression operation.

Fig. 2 shows the internal structure of the disk according to the invention. As shown in Fig. 2, the superblock, block group descriptors, data block bitmap, Inode bitmap, Inode table, and data blocks are stored persistently on disk. The volatile metadata and data cached in memory must be synchronized in a timely manner with the corresponding metadata and data on disk, which produces a large number of scattered small writes, causes frequent head seeks and positioning, and slows the processing speed of the file system.

As shown in Fig. 4, the method of the invention for rapid batch creation of file system metadata and data comprises the following steps:

(1) In a directory copy or compressed-file decompression operation executed by the user, traverse the data set known in advance for the operation and count the sub-directories and files under the directory being copied or decompressed, to obtain the number of Inode nodes N_{inodes}, which equals the sum of the two counts;

(2) In the memory of the disk file system, build a metadata area and a data area, each occupying a contiguous address space, and copy the metadata on disk into the metadata area, wherein the copied metadata comprises the superblock, block group descriptors, Inode bitmap, and data bitmap;

As shown in Fig. 5, step (2) comprises the following sub-steps:

(2-1) Calculate the size S_{metadata_chunk} of the metadata area to be built, and build in the memory of the disk file system a metadata area of this size with a contiguous address space;

Specifically, the metadata area size is calculated with the following formula:

S_{metadata_chunk} = \frac{N_{inodes} \times S_{inode}}{1024} + S_{super_block} + S_{inode_bitmap} + S_{block_bitmap} \quad (KB)

where S_{inode} is the size of a single Inode node in bytes; S_{super_block} is the size of the superblock in KB, typically 1 KB; S_{inode_bitmap} is the size of the Inode bitmap in KB; and S_{block_bitmap} is the size of the data bitmap in KB;

The data bitmap size S_{block_bitmap} is calculated with the following formula:

S_{block_bitmap} = \frac{S_{disk_capacity}}{8 \times S_{disk_block_size}}

where S_{disk_capacity} is the capacity of the disk in KB and S_{disk_block_size} is the disk block size in KB;

The Inode bitmap size S_{inode_bitmap} is calculated with the following formula:

S_{inode_bitmap} = \frac{S_{block_bitmap}}{N_{block_per_inode}}

where N_{block_per_inode} is the number of data blocks occupied by each Inode node on disk;

(2-2) Copy the metadata on disk (comprising the superblock, block group descriptors, Inode bitmap, and data bitmap) into the metadata area thus built;

(2-3) Set the size S_{data_chunk} of the data area to be built, and build in the memory of the disk file system a data area of size S_{data_chunk} with a contiguous address space. Specifically, the size S_{data_chunk} of the data area is a preset default value (for example 64 MB), which the user can adjust dynamically for the specific scenario: the larger the files handled by the copy or decompression operation, the larger this value, and vice versa.

(3) In the metadata area and data area thus built, create metadata and data in turn according to the known data set until the operations on the data set are finished, then update the metadata and data in the in-memory metadata area and data area to the corresponding metadata zone and data zone on disk in rapid batches:

As shown in Fig. 6, this step comprises the following sub-steps:

(3-1) Write the Inode node information and the corresponding data of the directory copy or compressed-file decompression operation executed by the user into the in-memory metadata area and data area, respectively, and at the same time update the superblock, block group descriptors, data block bitmap, and Inode bitmap that were copied into the metadata area. Specifically, because the method targets directory copy or compressed-file decompression operations whose working set to be created is known in advance, the Inode node information and the data in this process are all newly created content. They are written sequentially into the metadata area and the data area in the order of the operations, with Inode nodes and data blocks allocated in order, so that the address spaces that the Inode node information and the data occupy on disk are also contiguous, which allows the contents of the metadata area and data area to be written back to disk in batches.

Specifically, as shown in Fig. 7, this step comprises the following sub-steps:

(3-1-1) Set a counter i=1;

(3-1-2) Write the i-th Inode node information and its corresponding data of the directory copy or compressed-file decompression operation executed by the user into the in-memory metadata area and data area, respectively, and update the superblock, block group descriptors, data block bitmap, and Inode bitmap that were copied into the metadata area;

(3-1-3) Determine whether the ratio of the amount of data currently held in the data area to the data area size S_{data_chunk} has reached a threshold; if so, go to step (3-1-4), otherwise go to step (3-1-5). Specifically, the threshold can be set freely by the user;

(3-1-4) Allocate in memory a new data area of size S_{data_chunk} with a contiguous address space, into which the data corresponding to the remaining Inode node information is written, and at the same time write the contents of the former data area back in a batch, with the whole area as the transfer unit, to the corresponding data zone on disk;

(3-1-5) Set i=i+1 and determine whether i equals N_{inodes}; if so, the process ends, otherwise return to step (3-1-2);

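(3-2) Update the metadata and data in the in-memory metadata area and data area to the corresponding metadata zone and data zone on disk in rapid batches. This step comprises the following sub-steps:

(3-2-1) Write the data in the in-memory data area that has not yet been written back to disk back in a batch to the data zone on disk;

(3-2-2) Write the superblock, block group descriptors, data block bitmap, and Inode bitmap in the in-memory metadata area back to the corresponding superblock, block group descriptor, data block bitmap, and Inode bitmap regions on disk;

(3-2-3) Write the Inode node information in the in-memory metadata area back in a batch to the corresponding Inode table region on disk.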

The advantages of the invention are as follows:

Because step (3-1) is adopted, the file system lengthens the write delay, so that more I/O requests are merged in the in-memory metadata area and data area. Repeated updates of the same metadata object that would each be issued to disk are merged into a single disk operation, which reduces the number of synchronizations with disk and avoids the scattered small writes that frequent synchronization brings. In addition, because step (3-2) is adopted, Inode node information and data blocks are created sequentially in the in-memory metadata area and data area, so they can be written back in rapid batches to the corresponding contiguous address spaces on disk. This reduces the number of head seeks and positioning operations and thus greatly increases the speed at which the file system copies and moves large numbers of files.

Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A method for rapid batch creation of file system metadata and data, characterized by comprising the following steps:

2. The method according to claim 1, characterized in that the copied metadata comprises the superblock, block group descriptors, Inode bitmap, and data bitmap.

3. The method according to claim 1, characterized in that step (2) comprises the following sub-steps:

(2-2) Copy the metadata on disk into the metadata area thus built;

4. The method according to claim 3, characterized in that, in step (2-1), the metadata area size is calculated with the following formula:

S_{metadata_chunk} = \frac{N_{inodes} \times S_{inode}}{1024} + S_{super_block} + S_{inode_bitmap} + S_{block_bitmap} \quad (KB)

5. The method according to claim 4, characterized in that

the data bitmap size S_{block_bitmap} is calculated with the following formula:

S_{block_bitmap} = \frac{S_{disk_capacity}}{8 \times S_{disk_block_size}}

and the Inode bitmap size S_{inode_bitmap} is calculated with the following formula:

S_{inode_bitmap} = \frac{S_{block_bitmap}}{N_{block_per_inode}}

6. The method according to claim 1, characterized in that step (3) comprises the following sub-steps:

7. The method according to claim 6, characterized in that step (3-1) comprises the following sub-steps:

(3-1-1) Set a counter i=1;

8. The method according to claim 7, characterized in that step (3-2) comprises the following sub-steps:

9. A system for rapid batch creation of file system metadata and data, characterized by comprising: