CN107291870B

CN107291870B - Method for reading files in distributed storage in batch

Info

Publication number: CN107291870B
Application number: CN201710451855.XA
Authority: CN
Inventors: 张书扬
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2017-06-15
Filing date: 2017-06-15
Publication date: 2021-03-09
Anticipated expiration: 2037-06-15
Also published as: CN107291870A

Abstract

The invention discloses a method for reading files in batch in distributed storage, comprising the following steps: when an open operation is performed on a file, sending a request to the metadata server (mds) to acquire metadata information of the current file and of a plurality of files to be read after the current file; storing the index number (ino) and the directory entry (dentry) from the metadata information into a dentry_map structure; locating, according to the metadata information, the object storage device (osd) that stores the file data, and reading the data of the file to be read from the osd; and, after the data of the file to be read has been read, performing file pre-reading according to the storage order in the dentry_map structure. The method shortens the IO flow of file reading and improves reading speed.

Description

Method for reading files in distributed storage in batch

Technical Field

The invention relates to the technical field of file reading, in particular to a method for reading files in batch in distributed storage.

Background

At present, with the rapid development of information technology and internet technology, the volume of data that enterprises need to transmit and store has increased dramatically. In massive-small-file scenarios such as social and shopping websites, radio and television, and online video, the system generates huge numbers of small files such as texts, pictures, and music. These files have the following characteristics: the number of files is large, and each file is generally smaller than 1 MB; files are generally read sequentially.

In a distributed data storage system, when a file is read, the client first sends a request to the mds to acquire the permissions and metadata information of the file. The client then locates the osd that stores the file data according to the acquired metadata information and reads the file data from that osd. The whole read therefore passes through a long IO flow. When a large number of files are read in batch, each file goes through the entire IO flow independently, so metadata is requested from the mds frequently and the read interface in objectcatcher is called frequently to read data from the osd. Requests are sent and processed repeatedly during file reading, which puts heavy pressure on the system.

In such a distributed file system, every file read requires the client to request the metadata information of the file from the mds and, only after obtaining it, to fetch the file data from the osd. Each read therefore traverses a long IO flow, and the reading speed is low.

Disclosure of Invention

The invention aims to provide a method for reading files in batch in distributed storage, so as to shorten the IO (input/output) flow of file reading and improve reading speed.

In order to solve the above technical problem, the present invention provides a method for reading files in batch in distributed storage, which is applied to a client, and comprises:

when an open operation is performed on a file, sending a request to the metadata server mds to acquire metadata information of the current file and of a plurality of files to be read after the current file;

storing the index number ino and the directory entry dentry in the metadata information into a dentry_map structure;

locating, according to the metadata information, the object storage device osd that stores the file data, and reading the data of the file to be read from the osd;

and after the data of the file to be read has been read, performing file pre-reading according to the storage order in the dentry_map structure.

Preferably, storing the index number ino and the directory entry dentry in the metadata information into the dentry_map structure includes:

storing the index number ino and its corresponding directory entry dentry in the metadata information into the dentry_map structure of the client in ascending order of the index number ino.

Preferably, performing file pre-reading according to the storage order in the dentry_map structure includes:

acquiring the corresponding inode structure according to the dentry structure, and reading a specified number of files from the corresponding osd according to the inode structure.

Preferably, the request acquires the metadata of adjacent files at the same time as the metadata of the current file.

Preferably, the amount of the metadata information is dynamically configured through a configuration item.

Preferably, after the data of the file to be read has been read and file pre-reading has been performed according to the storage order in the dentry_map structure, the method further includes:

sequentially reading the next file, hitting the client cache, and acquiring the file data from the client cache.

Preferably, after sequentially reading the next file, hitting the client cache, and acquiring the file data from the client cache, the method further includes:

if the file to be read does not hit the client cache, pre-reading a specified number of files from the osd into the client cache.

The invention provides a method for reading files in batch in distributed storage, which is applied to a client. When an open operation is performed on a file, a request is sent to the metadata server mds to acquire metadata information of the current file and of a plurality of files to be read after the current file; the index number ino and the directory entry dentry in the metadata information are stored into a dentry_map structure; the object storage device osd that stores the file data is located according to the metadata information, and the data of the file to be read is read from the osd; and after the data of the file to be read has been read, file pre-reading is performed according to the storage order in the dentry_map structure. Therefore, when files are read in batch, each metadata request to the mds obtains the metadata information of a plurality of files at once, which reduces the number of interactions between the client and the mds, relieves the request-processing pressure on the mds, and shortens the IO flow of file reading. In addition, after the current file has been read, file pre-reading is performed in the order stored in dentry_map, so that in a sequential-read scenario the pre-read cache is hit with high probability, the IO flow of file reading is shortened, and reading is accelerated.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flowchart of a method for reading files in batch in distributed storage according to the present invention;

FIG. 2 is a schematic view of a batch reading process of files;

FIG. 3 is a schematic diagram of a file read;

FIG. 4 is a schematic diagram of batch reading of files.

Detailed Description

The core of the invention is to provide a method for reading files in batch in distributed storage, so as to shorten the IO (input/output) flow of file reading and improve reading speed.

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms are explained as follows:

mds: a metadata server for managing file metadata information;

osd: an object storage device, used for storing data information.

Referring to fig. 1, fig. 1 is a flowchart of a method for reading files in batch in distributed storage according to the present invention, where the method is applied to a client and includes:

step S11: when an open operation is performed on a file, sending a request to the metadata server mds to acquire metadata information of the current file and of a plurality of files to be read after the current file;

step S12: storing the index number ino and the directory entry dentry in the metadata information into a dentry_map structure;

step S13: locating, according to the metadata information, the object storage device osd that stores the file data, and reading the data of the file to be read from the osd;

step S14: after the data of the file to be read has been read, performing file pre-reading according to the storage order in the dentry_map structure.

Therefore, when files are read in batch, each metadata request to the mds obtains the metadata information of a plurality of files at once, which reduces the number of interactions between the client and the mds, relieves the request-processing pressure on the mds, and shortens the IO flow of file reading. In addition, after the current file has been read, file pre-reading is performed in the order stored in dentry_map, so that in a sequential-read scenario the pre-read cache is hit with high probability, the IO flow of file reading is shortened, and reading is accelerated.

Based on the above method, further, step S12 specifically includes: storing the index number ino and its corresponding directory entry dentry in the metadata information into the dentry_map structure of the client in ascending order of the index number ino.
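As an illustration of step S12, the following is a minimal sketch of a client-side map kept in ascending ino order. The class name `DentryMap`, its methods, and the use of a file name as a stand-in for the dentry are assumptions made for the example; the source only states that the structure is a map ordered by increasing ino.

```python
import bisect


class DentryMap:
    """Hypothetical client-side map from ino to dentry, kept in ascending ino order."""

    def __init__(self):
        self._inos = []       # sorted list of index numbers
        self._dentries = {}   # ino -> dentry (a file name stands in for the dentry)

    def insert(self, ino, dentry):
        # Keep the ino list sorted so that storage order equals ascending ino order.
        if ino not in self._dentries:
            bisect.insort(self._inos, ino)
        self._dentries[ino] = dentry

    def successors(self, ino, count):
        """Return up to `count` (ino, dentry) pairs that follow `ino` in storage order."""
        start = bisect.bisect_right(self._inos, ino)
        return [(i, self._dentries[i]) for i in self._inos[start:start + count]]


dentry_map = DentryMap()
# Metadata may arrive in any order; the map still iterates by ascending ino.
for ino, name in [(1003, "c.jpg"), (1001, "a.jpg"), (1002, "b.jpg"), (1004, "d.jpg")]:
    dentry_map.insert(ino, name)

next_two = dentry_map.successors(1001, 2)
print(next_two)
```

Keeping the keys sorted means the candidates for pre-reading after a given file can be found with a single binary search, which is one plausible reason for ordering the map by ino.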

Further, in step S14, performing file pre-reading according to the storage order in the dentry_map structure specifically includes: acquiring the corresponding inode structure according to the dentry structure, and reading a specified number of files from the corresponding osd according to the inode structure.

In step S11, the request acquires the metadata of adjacent files at the same time as the metadata of the current file.

In step S11, the amount of metadata information is dynamically configured by the configuration item.

In step S11, the metadata information acquired comprises that of the current file and that of a plurality of files to be read after the current file.

Further, after step S14, the method further includes: sequentially reading the next file, hitting the client cache, and acquiring the file data from the client cache.

After sequentially reading the next file, hitting the client cache, and acquiring the file data from the client cache, the method further includes: if the file to be read does not hit the client cache, pre-reading a specified number of files from the osd into the client cache.

In a small-file scenario, when a file is opened, the metadata information of the current file and of a specified number of adjacent files is obtained from the mds. When the file is read, after the data of the current file has been read, the specified number of adjacent files are read from the osd according to the acquired metadata information. Subsequent sequential reads can then hit the client cache, which shortens the overall IO flow of file reading and improves reading speed. In other words, each file read also brings the adjacent files into the client cache in order, so that later reads that hit the cache avoid the full IO flow.

In massive-small-file scenarios, files are generally created sequentially in batches, so the inos of adjacent files are, with high probability, also adjacent. When files are read sequentially in batch, the inos of the files to be read can therefore be regarded as increasing in order.

Based on the method, in detail: when a file is opened, a request is sent to the mds to obtain the metadata information of the currently opened file and of a plurality of files after it. A single metadata request thus returns a batch of metadata information. The amount of metadata acquired per request can be dynamically configured through a configuration item.
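The batched metadata request can be sketched as follows. This is a toy model under stated assumptions, not the patented implementation: `FakeMds`, `open_batch`, the `Metadata` record, and the configuration item name `METADATA_BATCH_COUNT` are all hypothetical, and inos are assumed to ascend in creation order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    ino: int
    dentry: str   # simplified: the file name stands in for the directory entry


class FakeMds:
    """Toy metadata server; files are assumed to be created in order, so inos ascend."""

    def __init__(self, names, first_ino=1000):
        self._files = [Metadata(first_ino + i, n) for i, n in enumerate(names)]
        self.requests = 0

    def open_batch(self, name, batch_count):
        """Return metadata of `name` plus up to `batch_count` files that follow it."""
        self.requests += 1
        for idx, meta in enumerate(self._files):
            if meta.dentry == name:
                return self._files[idx: idx + 1 + batch_count]
        raise FileNotFoundError(name)


# Hypothetical configuration item controlling how much metadata one request returns.
METADATA_BATCH_COUNT = 3

mds = FakeMds([f"img_{i}.jpg" for i in range(6)])
batch = mds.open_batch("img_1.jpg", METADATA_BATCH_COUNT)
print([(m.ino, m.dentry) for m in batch], mds.requests)
```

In this sketch one request returns metadata for four files (the opened file and the three after it), so the three subsequent opens would not need to contact the mds again.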

Then, after the metadata information of the files to be read under the current directory has been obtained, each file's ino and its corresponding dentry are stored, in ascending order of ino, into a map structure named dentry_map. The information of the files under the current directory is thereby stored in the directory structure of the client.

Then, when a file is read, if the file is not in the client cache and its data must be read from the osd by calling the read interface in objectcatcher, then after the current file has been read, file pre-reading is performed in the order stored in dentry_map, and a specified number of files are read from the osd. In a sequential-read scenario, the pre-read cache is hit with high probability, which shortens the IO flow of file reading and speeds up reading. Referring to fig. 2, fig. 2 is a schematic view illustrating the batch reading process of files.

With this method, the number of interactions between the client and the mds during file reading is reduced, the metadata-request pressure on the mds when a large number of files are read is relieved, the IO flow of file reading is shortened, the client cache hit rate is improved, and file reading is accelerated.

As shown in fig. 3, in the ordinary file reading process, the client must first request metadata information from the mds for each file. After the mds has processed the request and replied, the client locates the osd that stores the data according to the acquired metadata information and reads the data from that osd. The client thus interacts with both the mds and the osd for every file. When the number of files is large, the number of interactions between the client and the mds and osd rises sharply, which puts heavy processing pressure on the system.

As shown in fig. 4, when files are read in batch, each metadata request to the mds obtains the metadata information of a plurality of files at once, which reduces the number of interactions between the client and the mds, relieves the request-processing pressure on the mds, and shortens the IO flow of file reading. When files are read from the osd, each time the currently specified file has been read successfully, the specified number of subsequent files are pre-read in order according to the metadata information obtained from the mds. In a sequential-read scenario, when the specified pre-read number is n, one batch read from the osd allows the next n file reads to hit the client cache.
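The effect described above can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes an idealized, strictly sequential read of `num_files` files in which every pre-read file is later used and metadata for the pre-read files is always available; the function name and parameters are illustrative and do not come from the source.

```python
import math


def request_counts(num_files, metadata_batch, prefetch_count):
    """Estimate client requests for a strictly sequential read of num_files files.

    metadata_batch: number of extra files whose metadata one mds request returns
    prefetch_count: number n of files pre-read after each read that misses the cache
    """
    # Without batching, every file costs one mds request and one osd read.
    baseline = {"mds": num_files, "osd": num_files}
    batched = {
        # One mds request covers the opened file plus metadata_batch followers.
        "mds": math.ceil(num_files / (metadata_batch + 1)),
        # One cache miss is followed by n cache hits, so misses occur every n + 1 files.
        "osd_misses": math.ceil(num_files / (prefetch_count + 1)),
    }
    batched["cache_hits"] = num_files - batched["osd_misses"]
    return baseline, batched


baseline, batched = request_counts(num_files=1000, metadata_batch=9, prefetch_count=4)
print(baseline)
print(batched)
```

Under these assumptions, reading 1000 files with 9 extra metadata entries per request and n = 4 needs 100 mds requests instead of 1000, and 800 of the 1000 reads are served from the client cache. Real workloads that are not perfectly sequential would see fewer hits.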

In massive-small-file scenarios, files are numerous, small, and generally operated on sequentially. Addressing these characteristics, the method provides batch reading for small files, which reduces request interaction between the client and the mds, shortens the IO flow of file reading, and improves file reading speed. Based on the method, the specific implementation steps are as follows:

1. when a client reads a file, it must first acquire the file metadata information, so the client sends a metadata request to the mds; the mds reads the metadata information of the file from the osd and returns it to the client; after acquiring the metadata information, the client opens the file;

2. in the open operation, the client sends a metadata information request for the current file to the mds and obtains batch metadata information from the mds, the batch metadata information comprising the metadata information of the current file and of a plurality of files to be read after the current file;

3. the ino and the corresponding dentry in the metadata information acquired from the mds are stored into the client's dentry_map structure;

4. according to the metadata information of the file, the osd that stores the file data is located, and the data of the file to be read is read from the osd;

5. after the data of the file to be read has been read, file pre-reading is performed according to the storage order in dentry_map;

the corresponding inode is acquired according to the dentry structure, and a specified number of files are read from the corresponding osd according to the inode structure;

6. when the next file is read in sequence, that file has already been pre-read, so the client cache is hit and the file data is obtained from the cache;

this continues until a file to be read misses the client cache, at which point a specified number of files are again pre-read from the osd into the client cache.
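The six steps above can be sketched end to end with toy stand-ins for the mds and the osd. Everything named here (`Mds`, `Osd`, `Client`, `open_batch`, `read_many`, `meta_batch`, `prefetch`) is hypothetical, and the sketch assumes that an open first checks the client's dentry_map before contacting the mds, which the source implies but does not spell out.

```python
class Mds:
    """Toy metadata server; inos ascend in creation order (assumption)."""

    def __init__(self, names):
        self.names = list(names)
        self.requests = 0

    def open_batch(self, name, extra):
        # Step 2: one request returns the current file plus `extra` followers.
        self.requests += 1
        i = self.names.index(name)
        return [(1000 + j, self.names[j])
                for j in range(i, min(i + 1 + extra, len(self.names)))]


class Osd:
    """Toy object store mapping ino -> file data."""

    def __init__(self, data):
        self.data = data
        self.requests = 0

    def read_many(self, inos):
        self.requests += 1
        return {ino: self.data[ino] for ino in inos}


class Client:
    def __init__(self, mds, osd, meta_batch=3, prefetch=3):
        self.mds, self.osd = mds, osd
        self.meta_batch, self.prefetch = meta_batch, prefetch
        self.dentry_map = {}   # ino -> dentry
        self.cache = {}        # ino -> file data
        self.cache_hits = 0

    def open(self, name):
        # Reuse metadata already held in dentry_map; otherwise ask the mds (steps 2-3).
        for ino, dentry in self.dentry_map.items():
            if dentry == name:
                return ino
        for ino, dentry in self.mds.open_batch(name, self.meta_batch):
            self.dentry_map[ino] = dentry
        return next(i for i, d in self.dentry_map.items() if d == name)

    def read(self, name):
        ino = self.open(name)
        if ino in self.cache:                      # step 6: pre-read file hits the cache
            self.cache_hits += 1
            return self.cache[ino]
        data = self.osd.read_many([ino])[ino]      # step 4: read the requested file
        self.cache[ino] = data
        # Step 5: pre-read the next files in ascending ino order.
        following = [i for i in sorted(self.dentry_map) if i > ino][:self.prefetch]
        if following:
            self.cache.update(self.osd.read_many(following))
        return data


names = [f"f{i}" for i in range(8)]
mds = Mds(names)
osd = Osd({1000 + i: f"data{i}".encode() for i in range(8)})
client = Client(mds, osd)
contents = [client.read(n) for n in names]
print(mds.requests, osd.requests, client.cache_hits)
```

In this run, reading 8 files in order takes 2 mds requests and 4 osd requests with 6 cache hits, compared with 8 of each in the per-file flow. The cache here never evicts entries, which a real client would need to handle.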

In summary, in a massive-small-file scenario where a large number of files are read sequentially, the method obtains the metadata information of adjacent files from the mds in batch within a single request, reads the files adjacent to the current file from the osd, and accelerates file reading through file pre-reading.

The method for reading files in batch in distributed storage provided by the invention is described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A method for reading files in a distributed storage in batch is applied to a client and comprises the following steps:

storing the index number ino and the directory entry dentry in the metadata information into a dentry_map structure; the dentry_map structure is a map structure;

after the data reading of the file to be read is completed, performing file pre-reading according to the storage sequence in the dentry_map structure;

wherein, the storing the index number ino and the directory entry dentry in the metadata information into the dentry_map structure includes:

2. The method of claim 1, wherein the pre-reading of the file according to the storage order in the dentry_map structure comprises:

3. The method of claim 1, wherein the request is a request to obtain current file metadata while obtaining adjacent file metadata.

4. The method of claim 3, wherein the amount of metadata information is dynamically configured by a configuration item.

5. The method according to any one of claims 1 to 4, wherein after completing the data reading of the file to be read, and performing file pre-reading according to the storage sequence in the dentry_map structure, the method further comprises:

6. The method of claim 5, wherein the sequentially reading the next file, hitting the client cache, and after retrieving the file data from the client cache, further comprises: