CN110263005B

CN110263005B - Storage system management system for realizing data content locality read-write optimization

Info

Publication number: CN110263005B
Application number: CN201910499391.9A
Authority: CN
Inventors: 殷树; 胡冠洲
Original assignee: ShanghaiTech University
Current assignee: ShanghaiTech University
Priority date: 2019-06-11
Filing date: 2019-06-11
Publication date: 2022-11-25
Anticipated expiration: 2039-06-11
Also published as: CN110263005A

Abstract

The invention relates to a storage system management system for realizing data content locality read-write optimization. On the basis of the traditional spatio-temporal locality of data, the invention proposes the concept of content locality and provides a design for a file system middle layer that clusters and stores tagged data. A data analysis module interfaces with the user to capture data tag information, and a back-end interface module interfaces with the API (application program interface) of an underlying parallel or distributed storage system, so that data is clustered and stored by tag and subsequent data access patterns that exhibit content locality are converted into spatial locality. The invention has the following advantages: 1) The performance of tagged data access is significantly improved. 2) The sequentiality of tagged data access is optimized. 3) The original structured data format is not destroyed, and the correctness of the data is ensured.

Description

Storage system management system for realizing data content locality read-write optimization

Technical Field

The invention relates to a storage management scheme for large-scale structured data, in particular to a storage system management system for realizing data content locality read-write optimization.

Background

With the advent of the big data era, the information collection and computation capabilities at the front end of computer systems have improved greatly. Larger volumes of data files can be generated in less time, which puts great pressure on the storage system behind them, so that file system input/output (I/O) has become one of the major bottlenecks limiting computer system performance.

One of the ways in which conventional file storage systems improve performance is to exploit the locality of data access. The locality of data can be divided into two categories:

1. Temporal locality: within a period of time, an application often accesses data at the same position in a file multiple times, for example, reading and writing a certain calculation result repeatedly.

2. Spatial locality: after accessing data at a certain position in a file, an application often goes on to access related data stored near that position, for example, reading an array sequentially.

Through a disk-to-memory data caching mechanism, a block of nearby data is prefetched into memory after a user application accesses a certain position in a file, so that the number of file access requests that actually reach the disk can be significantly reduced and the performance of interaction with the storage system is improved.

In scientific computing, different applications have their own structured data file formats, such as the Bag file (.bag) used in the robot operating system ROS and the XTC file (.xtc) used in the biomolecular dynamics visualization software VMD, among others. Such a data file contains a large number of data entries, which can be clustered according to certain characteristics, such as the data source (e.g. infrared sensor, position sensor) or the data attribute (e.g. water molecules, protein molecules); such characteristics are referred to as the "tags" of the data entries. It can be expected that, when extracting and analyzing the data, a user will need to access and process related data of a particular tag within the same time period. Only if the data entries are stored clustered by tag can spatio-temporal locality take effect for such accesses.

However, as the amount of data collected by the front end keeps increasing, front ends often adopt a log-style write mode that simply appends data with various tags to the data file in time order. Constrained by this write-optimized storage mode, in most current structured data file formats the data entries of different tag categories are not clustered together but are intricately interleaved in time order, so that subsequent data accesses by the user no longer exhibit spatio-temporal locality.

In this situation, a traditional file system based on the spatio-temporal locality assumption cannot improve performance through its cache mechanism; instead it introduces a large amount of useless data prefetching and seriously degrades the performance of the storage path. In addition, such a frequent, small-volume, random data access pattern is unfriendly to existing storage hardware (especially mechanical hard disks), bringing high I/O latency, low bandwidth, and frequent head seeks that make the hardware prone to wear.
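The effect can be illustrated with a small sketch. It assumes a hypothetical prefetch block of four records and three made-up tag names (`ir`, `pos`, `cam`), none of which come from the original text, and counts how many blocks must be fetched to read every record of one tag under each layout.

```python
from typing import List

BLOCK_SIZE = 4  # records per prefetched block (hypothetical value)


def blocks_touched(layout: List[str], tag: str, block_size: int = BLOCK_SIZE) -> int:
    """Count the distinct blocks that hold at least one record of `tag`."""
    return len({pos // block_size for pos, t in enumerate(layout) if t == tag})


# Log-style layout: records of three tags interleaved in arrival order.
interleaved = ["ir", "pos", "cam"] * 8  # 24 records in total

# Clustered layout: the same records grouped by tag.
clustered = sorted(interleaved)

interleaved_blocks = blocks_touched(interleaved, "ir")
clustered_blocks = blocks_touched(clustered, "ir")
print(interleaved_blocks, clustered_blocks)
```

Under these assumptions the interleaved layout touches all six blocks of the file to read the `ir` records, while the clustered layout touches only two, which is the gap that tag-based clustering is meant to close.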

Disclosure of Invention

The invention aims to provide a management system of a storage system based on content locality.

In order to achieve the above object, the technical solution of the present invention is to provide a storage system management system for realizing data content locality read-write optimization, which is located in a file system middle layer that clusters and stores tagged data and is configured to convert a data access pattern exhibiting content locality into spatial locality, where content locality refers to the tendency of a user application, after accessing data of a certain tag content, to continue accessing data of the same or related content, characterized in that the management system includes a data analysis module interfacing with a user program, a back-end interface module interfacing with an underlying parallel storage system, and a virtual file system middle layer that is transparent to the underlying storage system and semi-transparent to the user application program;

the data analysis module is used for capturing a request of a user for writing a tagged data file into the file system, then analyzing data records in the tagged data file, sequentially extracting the tag type Topic of each data record to be written, and simultaneously acquiring the Offset of the data in the original tagged data file;

the back-end interface module implements the interface calls of the back-end parallel or distributed storage system and is used for actually writing the data processed by the virtual file system middle layer to the storage hardware in a manner conforming to the selected back-end file system;

the virtual file system middle layer is implemented as a user-space library; its upper layer interfaces with the user program and wraps the default POSIX read and write system call functions, and its lower layer interfaces with the back-end storage system; a write request from the user is captured by the data analysis module, and the extracted data records, together with their tag category Topic and Offset, are processed by the virtual file system middle layer, which clusters data of the same tag category Topic into the same sub-data file and, at the same time, records the one-to-one correspondence between the Offset of each record in the original tagged data file and its Offset in the sub-data file as a table stored in the associated sub-index file; a request from the user to read data of a specific tag category Topic is served through a function interface provided by the virtual file system middle layer, which directly reads the corresponding sub-data file in the back end.

Preferably, the data analysis module treats metadata that is not explicitly classified in the tagged data file as a special class of tag category Topic.

Preferably, the virtual file system middle layer includes an extensibility scheme for the file system middle layer: by unifying the standard of the interface from the data analysis module to the virtual file system middle layer and the standard of the interface from the virtual file system middle layer to the back-end interface module, only the corresponding data analysis module and back-end interface module need to be provided for different user applications and different back-end storage systems, that is, the corresponding modules are linked in a pluggable manner, so that compatibility and extension are realized.

Preferably, for ordinary read requests and file traversal operations issued by the user on the tagged data file, translation and merging operations are performed through the index table stored in the sub-index file, so that the user is given the illusion that the file is still stored according to its original structure, and the original storage structure logic that the algorithm of the user program may depend on is not affected.

The invention proposes the concept of content locality on the basis of traditional spatio-temporal locality. Content locality of data access means that a user application, after accessing data of a certain tag content, tends to continue accessing data of the same or related content. Content locality is a data access pattern that appears in structured, tagged data in the current big data environment, and is an extension of traditional spatio-temporal locality.

The invention has the following beneficial effects:

1) The performance of tagged data access is significantly improved. By reorganizing the data storage structure, the content locality of user data access is converted into spatial locality, and the conflict between large-scale tagged data reads and writes and the traditional caching mechanism based on spatio-temporal locality is resolved. Taking the Bag data message file of the robot operating system ROS as an example, the scheme raises file opening efficiency to five times that of the original API; for file read operations, reads of related message records are turned into reads of whole blocks of data, so that the traditional data caching mechanism takes effect, and read latency is reduced by 15 percent.

2) The sequentiality of tagged data access is optimized. For reading and collecting data of related tags, the original interleaved, random access pattern is converted into a sequential, block-wise access pattern, so that the performance of the physical storage hardware, particularly mechanical hard disk drives (HDDs), is exploited more fully and the probability of failure is reduced.

3) The correctness of the data is ensured. The invention only classifies and arranges the data records and is implemented as a user-space middle layer. On the one hand, it does not interfere with the workflow of the mature parallel or distributed storage system used at the bottom layer, and the back-end storage still provides parallel reads and writes, consistency and redundancy, fault recovery and the like, thereby ensuring the correctness of the data; on the other hand, files can still be presented to the user in their original format, so the data format requirements of the user program are not broken.

Drawings

FIG. 1 is a flow chart of the structure of the present invention.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention can be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the claims appended to the present application.

The invention provides a design for a file system middle layer that clusters and stores tagged data and converts a data access pattern exhibiting content locality into spatial locality. Specifically, the storage system management system for realizing data content locality read-write optimization provided by the invention comprises:

A data analysis module that interfaces with the user program. The data analysis module captures a request from a user to write a tagged data file into the file system, then parses the data records in the tagged data file and extracts, in order, the tag category of each data record to be written; the tag category is called the Topic. At the same time, the Offset of the data in the original tagged data file is obtained. Since not all data in the original tagged data file is necessarily true content data, metadata that is not explicitly classified, such as the version number of the data file, is treated as a special Topic category.

A back-end interface module that interfaces with the underlying parallel storage system. The back-end interface module implements the interface calls of the back-end parallel or distributed storage system and is used for actually writing the data processed by the virtual file system middle layer to the storage hardware in a manner conforming to the selected back-end file system.

A virtual file system middle layer that is transparent to the underlying storage system and semi-transparent to user applications. The virtual file system middle layer is implemented as a user-space library; at its upper layer it interfaces with the user program and wraps the default POSIX read and write system call functions, and at its lower layer it interfaces with the back-end storage system. A write request from the user is captured by the data analysis module, and the extracted data records, together with their tag categories and original offsets, are processed by the virtual file system middle layer. The middle layer clusters data of the same tag category into the same sub-data file and, at the same time, maps the offset of each record in the original tagged data file one-to-one to its offset in the sub-data file, forming a table that is stored in the associated sub-index file. A request from the user to read data of a specific tag is served through the function interface provided by the middle layer, which directly reads the corresponding sub-data file in the back end.
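The role of the data analysis module can be sketched as follows. The real Bag format is considerably more involved, so this sketch assumes a hypothetical toy format (a fixed header line followed by records of the form: 1-byte Topic length, Topic, 4-byte payload length, payload); the names `META_TOPIC`, `MAGIC`, `parse` and `make_record` are illustrative and do not appear in the original text.

```python
import struct
from typing import Iterator, NamedTuple

META_TOPIC = "__meta__"  # hypothetical name for the special metadata Topic
MAGIC = b"#TOYBAG1\n"    # hypothetical header, standing in for e.g. a version line


class Record(NamedTuple):
    topic: str   # tag category of the record
    offset: int  # byte offset of the record in the original file
    data: bytes  # raw bytes of the whole record, kept unmodified


def make_record(topic: str, payload: bytes) -> bytes:
    """Encode one record in the toy format (helper for building sample input)."""
    t = topic.encode("utf-8")
    return struct.pack("<B", len(t)) + t + struct.pack("<I", len(payload)) + payload


def parse(buf: bytes) -> Iterator[Record]:
    """Yield (Topic, Offset, raw bytes) for the header and every record in order."""
    if not buf.startswith(MAGIC):
        raise ValueError("not a toy tagged file")
    # Unclassified metadata is emitted under the special Topic.
    yield Record(META_TOPIC, 0, buf[:len(MAGIC)])
    pos = len(MAGIC)
    while pos < len(buf):
        start = pos
        (tlen,) = struct.unpack_from("<B", buf, pos)
        pos += 1
        topic = buf[pos:pos + tlen].decode("utf-8")
        pos += tlen
        (plen,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + plen
        yield Record(topic, start, buf[start:pos])


sample = (MAGIC + make_record("imu", b"\x01\x02")
          + make_record("gps", b"abc") + make_record("imu", b"\x03"))
records = list(parse(sample))
```

Because each `Record` keeps the raw bytes and the original offset, concatenating the records in order reproduces the input file exactly, which is what later allows the original layout to be presented to the user.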

The invention also provides an extensibility scheme for the file system middle layer. By unifying the standard of the interface from the data analysis module to the middle layer and the standard of the interface from the middle layer to the back-end module, the corresponding analysis module and back-end module can be linked in a pluggable manner for different user applications and different back-end storage systems, giving strong compatibility and extensibility.

In addition, for ordinary read requests and file traversal operations issued by the user on the tagged data file, the invention performs translation and merging operations through the stored index table and can give the user the illusion that the file is still stored according to its original structure, so that the original storage structure logic that the algorithm of the user program may depend on is not affected and the correctness of the storage optimization scheme is ensured.
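One way to picture the pluggable scheme is as two abstract interfaces that the middle layer depends on. The following is a minimal sketch under that assumption; the class names (`Parser`, `Backend`, `LineParser`, `MemoryBackend`, `MiddleLayer`), the `topic:payload` line format and the `<topic>.data` naming are all hypothetical and are not taken from the original text or from PLFS.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, int, bytes]  # (Topic, offset in original file, raw record bytes)


class Parser(ABC):
    """Unified interface from a data analysis module to the middle layer."""

    @abstractmethod
    def parse(self, buf: bytes) -> Iterator[Record]:
        ...


class Backend(ABC):
    """Unified interface from the middle layer to a back-end interface module."""

    @abstractmethod
    def append(self, name: str, data: bytes) -> int:
        """Append data to the named object and return the offset it was written at."""


class LineParser(Parser):
    """Toy parser: each record is one line of the form topic:payload."""

    def parse(self, buf: bytes) -> Iterator[Record]:
        pos = 0
        for line in buf.splitlines(keepends=True):
            topic = line.split(b":", 1)[0].decode("utf-8")
            yield topic, pos, line
            pos += len(line)


class MemoryBackend(Backend):
    """Toy back end that keeps every object in memory."""

    def __init__(self) -> None:
        self.files: Dict[str, bytearray] = {}

    def append(self, name: str, data: bytes) -> int:
        buf = self.files.setdefault(name, bytearray())
        offset = len(buf)
        buf.extend(data)
        return offset


class MiddleLayer:
    """Depends only on the two interfaces, so either side can be swapped."""

    def __init__(self, parser: Parser, backend: Backend) -> None:
        self.parser = parser
        self.backend = backend
        self.index: Dict[str, List[Tuple[int, int]]] = {}

    def write(self, buf: bytes) -> None:
        for topic, orig_off, data in self.parser.parse(buf):
            sub_off = self.backend.append(f"{topic}.data", data)
            self.index.setdefault(topic, []).append((orig_off, sub_off))


backend = MemoryBackend()
layer = MiddleLayer(LineParser(), backend)
layer.write(b"imu:1\ngps:2\nimu:3\n")
```

Supporting another file format or storage system would then mean supplying another `Parser` or `Backend` subclass, with `MiddleLayer` left unchanged.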

Referring to FIG. 1, the invention is illustrated below with an implementation in which the user application is the ROS Bag API, the back-end storage is Ceph, and the middle layer uses the PLFS framework:

1. Mounting the middle-layer virtual file system

1. The plfsrc configuration file is edited according to the syntax defined by PLFS;

2. The back-end storage is defined in the configuration file as the Ceph parallel storage system;

3. The plfs command is used to mount the middle layer at the specified directory.

2. Writing tagged data

1. The user writes a .bag format data file into the mount directory;

2. The data analysis module captures the written data and calls a parser;

3. The parser parses the written data according to the Bag format;

4. Each data message obtained by parsing is appended to the data file of the corresponding tag;

5. The index mapping of each piece of data is recorded in the index file of the corresponding tag;

6. Non-data metadata is recorded uniformly under a special tag;

7. The write interface of the back-end Ceph is called to perform the actual write;

8. The write operation is complete.
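Steps 4 to 7 above can be sketched as follows. This is an illustration only: the records are assumed to have already been parsed into (Topic, original offset, raw bytes) tuples, the back end is replaced by a plain callable, and the record length stored in each index entry, the `__meta__` Topic name and the `<topic>.data` naming are assumptions that the original text does not specify.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int, bytes]    # (Topic, offset in original file, raw bytes)
IndexEntry = Tuple[int, int, int]  # (original offset, sub-data offset, length)


def write_tagged(
    records: Iterable[Record],
    backend_write: Callable[[str, bytes], None],
) -> Dict[str, List[IndexEntry]]:
    """Cluster records by Topic, build per-Topic index tables, then write."""
    sub_data: Dict[str, bytearray] = defaultdict(bytearray)
    sub_index: Dict[str, List[IndexEntry]] = defaultdict(list)
    for topic, orig_off, data in records:
        # Steps 4 to 6: append to the Topic's data and record the offset mapping.
        sub_index[topic].append((orig_off, len(sub_data[topic]), len(data)))
        sub_data[topic] += data
    for topic, blob in sub_data.items():
        # Step 7: hand each clustered sub-data file to the back end.
        backend_write(f"{topic}.data", bytes(blob))
    return dict(sub_index)


stored: Dict[str, bytes] = {}  # stands in for the back-end storage
records = [
    ("__meta__", 0, b"v1"),    # non-data metadata under a special Topic
    ("imu", 2, b"AA"),
    ("gps", 4, b"BBB"),
    ("imu", 7, b"C"),
]
index = write_tagged(records, stored.__setitem__)
```

In a fuller version the returned index tables would also be persisted as the sub-index files; here they are simply returned so the mapping can be inspected.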

3. Opening data files

1. The user calls the open interface defined by the middle layer;

2. The middle layer skips the steps of the original ROS API, such as fully scanning the file and building indexes, and directly loads the index files to obtain the index mapping;

3. Opening is complete and the call returns.
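The idea behind step 2 can be sketched as loading a pre-built offset table instead of scanning the data. The on-disk layout assumed here (a flat sequence of little-endian 64-bit offset pairs) and the function names are hypothetical; the original text does not describe the index file format.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Tuple

# Hypothetical index entry: (offset in original file, offset in sub-data file).
ENTRY = struct.Struct("<QQ")


def dump_index(entries: List[Tuple[int, int]], fh: BinaryIO) -> None:
    """Serialize an index table (done once, at write time)."""
    for orig_off, sub_off in entries:
        fh.write(ENTRY.pack(orig_off, sub_off))


def load_index(fh: BinaryIO) -> List[Tuple[int, int]]:
    """Deserialize an index table without touching any data file."""
    return list(ENTRY.iter_unpack(fh.read()))


def open_tagged(index_files: Dict[str, BinaryIO]) -> Dict[str, List[Tuple[int, int]]]:
    """Open a tagged file by loading one index table per Topic."""
    return {topic: load_index(fh) for topic, fh in index_files.items()}


# In-memory stand-ins for two sub-index files.
imu_idx, gps_idx = io.BytesIO(), io.BytesIO()
dump_index([(0, 0), (12, 6)], imu_idx)
dump_index([(6, 0)], gps_idx)
imu_idx.seek(0)
gps_idx.seek(0)

mapping = open_tagged({"imu": imu_idx, "gps": gps_idx})
```

The cost of opening then scales with the size of the index tables rather than with the size of the recorded data.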

4. Reading data of a specific tag

1. The user calls the read interface defined by the middle layer and passes in the tag to be read;

2. The middle layer responds to the read operation and directly locates the data file corresponding to the tag in the back end;

3. The read interface of the back-end Ceph is called to perform the actual read, and only that data file is read;

4. After the read operation is finished, the data under the tag is returned to the user.
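As a rough illustration of the steps above, the sketch below uses a local temporary directory in place of the Ceph back end and assumes a hypothetical `<topic>.data` naming convention; `read_topic` is an illustrative name, not an interface defined by the original text.

```python
import tempfile
from pathlib import Path


def read_topic(container: Path, topic: str) -> bytes:
    """Return all records of `topic` by reading only that Topic's sub-data file."""
    return (container / f"{topic}.data").read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    container = Path(tmp)
    # Two clustered sub-data files, as the write path would have produced them.
    (container / "imu.data").write_bytes(b"imu-1imu-2")
    (container / "gps.data").write_bytes(b"gps-1")

    imu_bytes = read_topic(container, "imu")
```

The read is a single sequential pass over one file, and the data of other tags is never touched.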

5. Legacy POSIX read support

1. The user calls a traditional POSIX read interface;

2. The middle layer captures the read operation and locates the tag of the requested content according to the index mapping;

3. The data is acquired from the data file corresponding to the tag;

4. After the read operation is finished, the requested data is returned to the user.
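The translation and merging behind these steps can be sketched as a byte-range lookup. The sketch assumes that the per-tag index tables have been merged into one list sorted by original offset and that each entry also stores the record length; both are assumptions made for illustration, and `pread` here is a hypothetical helper rather than the POSIX call itself.

```python
import bisect
from typing import Dict, List, Tuple

# (original offset, length, Topic, offset in the Topic's sub-data file),
# sorted by original offset.
Entry = Tuple[int, int, str, int]


def pread(index: List[Entry], sub_data: Dict[str, bytes], offset: int, size: int) -> bytes:
    """Serve a read of `size` bytes at `offset` of the original file layout."""
    starts = [e[0] for e in index]
    # Last record starting at or before `offset` (or the first record).
    i = max(bisect.bisect_right(starts, offset) - 1, 0)
    end = offset + size
    out = bytearray()
    while i < len(index) and index[i][0] < end:
        orig_off, length, topic, sub_off = index[i]
        lo = max(offset, orig_off)        # overlap of the request with this record
        hi = min(end, orig_off + length)
        if lo < hi:
            start = sub_off + (lo - orig_off)
            out += sub_data[topic][start:start + (hi - lo)]
        i += 1
    return bytes(out)


# Original interleaved file and its clustered representation.
original = b"AAbbbCdd"
sub_data = {"imu": b"AAC", "gps": b"bbbdd"}
index = [(0, 2, "imu", 0), (2, 3, "gps", 0), (5, 1, "imu", 2), (6, 2, "gps", 3)]

chunk = pread(index, sub_data, 1, 5)
```

A request that spans several records is split at record boundaries, each piece is fetched from the sub-data file of its tag, and the pieces are merged in original order, so the caller sees the same bytes as if the file were still stored in its original layout.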

Claims

1. A storage system management system for realizing data content locality read-write optimization, located in a file system middle layer that clusters and stores tagged data and used for converting a data access pattern exhibiting content locality into spatial locality, wherein content locality means that a user application, after accessing data of a certain tag content, tends to continue accessing data of the same or related content, characterized in that the storage system management system comprises a data analysis module, a back-end interface module and a virtual file system middle layer, wherein the data analysis module interfaces with a user program, the back-end interface module interfaces with an underlying parallel storage system, and the virtual file system middle layer is transparent to the underlying parallel storage system and semi-transparent to the user application program;

2. The storage system management system for realizing data content locality read-write optimization according to claim 1, wherein the data analysis module treats metadata that is not explicitly classified in the tagged data file as a special tag category Topic.

3. The storage system management system for realizing data content locality read-write optimization according to claim 1, wherein the virtual file system middle layer includes an extensibility scheme for the file system middle layer, and by unifying the standard of the interface from the data analysis module to the virtual file system middle layer and the standard of the interface from the virtual file system middle layer to the back-end interface module, only the corresponding data analysis module and back-end interface module need to be provided for different user applications and different back-end storage systems, that is, the corresponding data analysis module and back-end interface module are linked in a pluggable manner, so as to realize compatibility and extension.

4. The storage system management system for realizing data content locality read-write optimization according to claim 1, wherein for ordinary read requests and file traversal operations issued by the user on the tagged data file, translation and merging operations are performed through the index table stored in the sub-index file, which gives the user the illusion that the file is still stored according to its original structure, so as not to affect the original storage structure logic that the algorithm of the user program may depend on.