CN107291870B - Method for reading files in distributed storage in batch - Google Patents
- Publication number: CN107291870B
- Application number: CN201710451855.XA
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/182: Distributed file systems
  (hierarchy: G: PHYSICS / G06: COMPUTING; CALCULATING OR COUNTING / G06F: ELECTRIC DIGITAL DATA PROCESSING / G06F16/00: Information retrieval; Database structures therefor; File system structures therefor / G06F16/10: File systems; File servers / G06F16/18: File system types)
- G06F16/172: Caching, prefetching or hoarding of files
  (hierarchy: G: PHYSICS / G06: COMPUTING; CALCULATING OR COUNTING / G06F: ELECTRIC DIGITAL DATA PROCESSING / G06F16/00: Information retrieval; Database structures therefor; File system structures therefor / G06F16/10: File systems; File servers / G06F16/17: Details of further file system functions)
Abstract
The invention discloses a method for reading files in distributed storage in batch, comprising the following steps: when a file is opened, sending a request to the metadata server mds to acquire the metadata information of the current file and of a plurality of files to be read after the current file; storing the index number ino and the directory entry dentry from the metadata information in a dent_map structure; locating, according to the metadata information, the object storage device osd that stores the file data, and reading the data of the file to be read from that osd; and, after the data of the file to be read has been read, pre-reading files in the order in which they are stored in the dent_map structure. The method shortens the IO path of file reading and improves reading speed.
Description
Technical Field
The invention relates to the technical field of file reading, in particular to a method for reading files in batch in distributed storage.
Background
With the rapid development of information and internet technology, the volume of data that enterprises must transmit and store has grown dramatically. In scenarios with massive numbers of small files, such as social and shopping websites, radio and television, and online video, systems generate huge numbers of small files such as text, pictures, and music. These files share the following characteristics: they are numerous, each is generally smaller than 1 MB, and they are generally read sequentially.
In a distributed data storage system, to read a file the client first sends a request to the mds to acquire the file's permissions and metadata information, then uses that metadata to locate the osd that stores the file data, and reads the file data from that osd. Every file read therefore traverses a long IO path. When a large number of files are read in batch, each file traverses the whole path independently: the client repeatedly requests metadata from the mds and repeatedly calls the read interface of the ObjectCacher to read data from the osd. These repeated requests and their processing place heavy load on the system.
In a distributed file system, every file read requires the client to request the file's metadata information from the mds and, once it has that metadata, to fetch the file's data from an osd. This long IO path makes reading slow.
Disclosure of Invention
The object of the invention is to provide a method for reading files in distributed storage in batch that shortens the IO path of file reading and improves reading speed.
To solve the above technical problem, the present invention provides a method for reading files in distributed storage in batch, applied to a client, comprising:
when a file is opened, sending a request to the metadata server mds to acquire the metadata information of the current file and of a plurality of files to be read after the current file;
storing the index number ino and the directory entry dentry from the metadata information in a dent_map structure;
locating, according to the metadata information, the object storage device osd that stores the file data, and reading the data of the file to be read from the osd;
and, after the data of the file to be read has been read, pre-reading files in the order in which they are stored in the dent_map structure.
Preferably, storing the index number ino and the directory entry dentry from the metadata information in the dent_map structure includes:
storing each index number ino and its corresponding dentry in the client's dent_map structure in increasing order of ino.
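The ordering rule can be pictured with a minimal Python sketch. It is not the client's actual data structure: `Dentry` and `DentMap` are hypothetical names, and a sorted list maintained with `bisect.insort` stands in for an ordered map, so iterating the map visits dentries in increasing ino order regardless of the order in which the metadata arrived.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Dentry:
    """Hypothetical directory entry as returned by the mds (name + index number)."""
    name: str
    ino: int


class DentMap:
    """Sketch of a client-side dent_map: dentries keyed by ino, iterated in increasing ino order."""

    def __init__(self):
        self._inos = []      # index numbers, kept sorted
        self._by_ino = {}    # ino -> Dentry

    def insert(self, dentry):
        # Place a new ino at its sorted position; a repeated ino only refreshes its dentry.
        if dentry.ino not in self._by_ino:
            bisect.insort(self._inos, dentry.ino)
        self._by_ino[dentry.ino] = dentry

    def in_order(self):
        # Dentries in increasing ino order, i.e. the order later used for pre-reading.
        return [self._by_ino[ino] for ino in self._inos]


# Metadata may arrive in any order; dent_map still yields it sorted by ino.
dent_map = DentMap()
for d in (Dentry("c.jpg", 1003), Dentry("a.jpg", 1001), Dentry("b.jpg", 1002)):
    dent_map.insert(d)
print([d.name for d in dent_map.in_order()])  # ['a.jpg', 'b.jpg', 'c.jpg']
```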
Preferably, pre-reading files in the order in which they are stored in the dent_map structure includes:
obtaining the corresponding inode structure from each dentry structure, and reading a specified number of files from the corresponding osd according to the inode structure.
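A minimal sketch of the pre-read selection, under stated assumptions: `pre_read`, `inode_of` and `osd_read` are hypothetical names, the inode is reduced to an object name, and `osd_read` is any callable that returns the data for that object. The function takes the `count` inos that follow the current file in dent_map order, resolves each one to its inode and reads its data.

```python
import bisect


def pre_read(sorted_inos, current_ino, count, inode_of, osd_read):
    """Return {ino: data} for the `count` files that follow `current_ino` in dent_map order.

    sorted_inos -- dent_map keys in increasing ino order
    inode_of    -- hypothetical lookup from ino to its inode (here: an object name)
    osd_read    -- hypothetical callable that fetches an inode's data from its osd
    """
    start = bisect.bisect_right(sorted_inos, current_ino)  # first ino after the current file
    prefetched = {}
    for ino in sorted_inos[start:start + count]:
        inode = inode_of[ino]              # dentry -> inode
        prefetched[ino] = osd_read(inode)  # inode -> file data on the osd
    return prefetched


inos = [1001, 1002, 1003, 1004, 1005]
inode_of = {ino: f"obj.{ino}" for ino in inos}
cache = pre_read(inos, 1002, 2, inode_of, lambda obj: f"<data of {obj}>")
print(cache)  # {1003: '<data of obj.1003>', 1004: '<data of obj.1004>'}
```

Because the slice stops at the end of the list, pre-reading near the end of a directory simply returns fewer files.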
Preferably, the request acquires the metadata of adjacent files together with the metadata of the current file.
Preferably, the amount of metadata information acquired per request is dynamically configurable through a configuration item.
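The batched metadata request and its configurable size can be illustrated with a mock metadata server. `MockMds`, `lookup_batch` and the `meta_batch_size` key are hypothetical; the sketch only shows that one request returns metadata for the current file and the files after it, up to the configured count.

```python
from dataclasses import dataclass


@dataclass
class MockMds:
    """Hypothetical single-directory metadata server: ino -> file name."""
    directory: dict
    requests: int = 0

    def lookup_batch(self, ino, count):
        # One request answers for `ino` plus the files that follow it, `count` entries in total.
        self.requests += 1
        wanted = sorted(i for i in self.directory if i >= ino)[:count]
        return {i: self.directory[i] for i in wanted}


# Hypothetical configuration item: number of files whose metadata one request returns.
config = {"meta_batch_size": 4}

mds = MockMds({1000 + k: f"img{k:03d}.jpg" for k in range(10)})
batch = mds.lookup_batch(1002, config["meta_batch_size"])
print(list(batch), mds.requests)  # [1002, 1003, 1004, 1005] 1
```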
Preferably, after the data of the file to be read has been read and files have been pre-read in the order stored in the dent_map structure, the method further includes:
reading the next file in sequence, which hits the client cache, so that the file data is obtained from the client cache.
Preferably, after the next file has been read in sequence from the client cache, the method further includes:
if a file to be read misses the client cache, pre-reading the specified number of files from the osd into the client cache.
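The cache behaviour described in the last two paragraphs can be sketched as follows, assuming a hypothetical `CachingReader` whose `osd` is a plain dictionary and whose pre-read count is a fixed parameter. A hit is served from the cache; a miss reads the file from the osd and pre-reads the files that follow it.

```python
class CachingReader:
    """Sketch of sequential reads served from a client cache (hypothetical names throughout).

    `osd` is a dict (ino -> data) standing in for object storage, and
    `sorted_inos` stands in for the dent_map key order.
    """

    def __init__(self, sorted_inos, osd, pre_read_count):
        self.sorted_inos = sorted_inos
        self.osd = osd
        self.pre_read_count = pre_read_count
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, ino):
        if ino in self.cache:                # hit: data comes from the client cache
            self.hits += 1
            return self.cache[ino]
        self.misses += 1                     # miss: read this file from the osd ...
        data = self.osd[ino]
        pos = self.sorted_inos.index(ino)
        for nxt in self.sorted_inos[pos + 1:pos + 1 + self.pre_read_count]:
            self.cache[nxt] = self.osd[nxt]  # ... and pre-read the following files
        return data


inos = list(range(1000, 1010))
reader = CachingReader(inos, {i: f"data-{i}" for i in inos}, pre_read_count=3)
results = [reader.read(i) for i in inos]
print(reader.hits, reader.misses)  # 7 3
```

With ten sequential reads and a pre-read count of 3, only the reads of the 1st, 5th and 9th files miss the cache; the other seven are served from it.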
The invention provides a method for reading files in distributed storage in batch, applied to a client. When a file is opened, the client sends a request to the metadata server mds to acquire the metadata information of the current file and of a plurality of files to be read after it; stores the index number ino and the directory entry dentry from the metadata information in a dent_map structure; locates, according to the metadata information, the object storage device osd that stores the file data and reads the data of the file to be read from that osd; and, after that data has been read, pre-reads files in the order in which they are stored in the dent_map structure. Thus, when files are read in batch, each metadata request to the mds returns the metadata information of several files at once, which reduces the number of interactions between the client and the mds, lowers the request-processing load on the mds, and shortens the IO path of file reading. Moreover, because files are pre-read in the order stored in the dent_map once the current file has been read, sequential reads hit the pre-read cache with high probability, which further shortens the IO path and speeds up reading.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a method for reading files in batch in distributed storage according to the present invention;
FIG. 2 is a schematic view of a batch reading process of files;
FIG. 3 is a schematic diagram of a file read;
FIG. 4 is a schematic diagram of batch reading of files.
Detailed Description
The core of the invention is to provide a method for reading files in distributed storage in batch that shortens the IO path of file reading and improves reading speed.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms are explained as follows:
mds: the metadata server, which manages file metadata information;
osd: the object storage device, which stores file data.
Referring to fig. 1, fig. 1 is a flowchart of a method for reading files in batch in distributed storage according to the present invention, where the method is applied to a client and includes:
step S11: when a file is opened, sending a request to the metadata server mds to acquire the metadata information of the current file and of a plurality of files to be read after the current file;
step S12: storing the index number ino and the directory entry dentry from the metadata information in a dent_map structure;
step S13: locating, according to the metadata information, the object storage device osd that stores the file data, and reading the data of the file to be read from the osd;
step S14: after the data of the file to be read has been read, pre-reading files in the order in which they are stored in the dent_map structure.
Thus, when files are read in batch, each metadata request to the mds returns the metadata information of several files at once, which reduces the number of interactions between the client and the mds, lowers the request-processing load on the mds, and shortens the IO path of file reading. In addition, because files are pre-read in the order stored in the dent_map once the current file has been read, sequential reads hit the pre-read cache with high probability, which further shortens the IO path and speeds up reading.
Building on the above method, step S12 specifically comprises: storing each index number ino and its corresponding dentry from the metadata information in the client's dent_map structure in increasing order of ino.
Further, in step S14, pre-reading files in the order in which they are stored in the dent_map structure specifically comprises: obtaining the corresponding inode structure from each dentry structure, and reading a specified number of files from the corresponding osd according to the inode structure.
In step S11, the request acquires the metadata of adjacent files together with the metadata of the current file.
In step S11, the amount of metadata information acquired is dynamically configurable through a configuration item.
In step S11, both the metadata information of the current file and that of a plurality of files to be read after it are acquired.
Further, after step S14, the method further comprises: reading the next file in sequence, which hits the client cache, so that the file data is obtained from the client cache.
After the next file has been read in sequence from the client cache, the method further comprises: if a file to be read misses the client cache, pre-reading the specified number of files from the osd into the client cache.
In a small-file scenario, when a file is opened, the metadata information of the current file and of a specified number of adjacent files is obtained from the mds. When the file is read, once the data of the current file has been read, the specified number of adjacent files are read from the osd according to the metadata already acquired. Sequential reads then hit the client cache, which shortens the overall IO path of file reading and improves reading speed. In other words, each read brings the following adjacent files into the client cache, so any later read that hits the cache avoids the full IO path.
In a scenario with massive numbers of small files, files are generally created sequentially in batch, so adjacent files very likely have adjacent inos. When files are read sequentially in batch, the inos of the files to be read can therefore be assumed to increase sequentially.
In detail, when a file is opened, a request is sent to the mds to obtain the metadata information of several files following the currently opened file, so that a single metadata request returns metadata for a whole batch. The number of files whose metadata is acquired per request can be configured dynamically through a configuration item.
Next, after the metadata information of the files to be read in the current directory has been obtained, each file's ino and its corresponding dentry are stored, in increasing order of ino, in a map structure, the dent_map. The file information of the current directory is thus held in the client's directory structure.
Then, when a file is read and it is not in the client cache, its data must be read from the osd by calling the read interface of the ObjectCacher. After the current file has been read, files are pre-read in the order stored in the dent_map, and the specified number of files are read from the osd. In a sequential-read scenario the pre-read cache is hit with high probability, which shortens the IO path of file reading and speeds up reading. Referring to fig. 2, fig. 2 is a schematic view of the batch reading process of files.
With this method, the number of interactions between the client and the mds during file reading is reduced, the metadata request-processing load on the mds when many files are read is relieved, the IO path of file reading is shortened, the client cache hit rate rises, and files are read faster.
As shown in fig. 3, in the per-file reading process each file must first request its metadata information from the mds; after the mds has processed the request and replied, the client uses the acquired metadata to locate the osd that stores the data and reads the data from that osd. The client thus interacts with both the mds and an osd for every file. When the number of files is large, the number of these interactions rises sharply, placing a heavy processing load on the system.
As shown in fig. 4, when files are read in batch, each request to the mds returns the metadata information of several files at once, which reduces the number of interactions between the client and the mds, lowers the request-processing load on the mds, and shortens the IO path of file reading. When files are read from the osd, each time the currently requested file has been read successfully, the specified number of following files are pre-read sequentially according to the metadata information obtained from the mds. In a sequential-read scenario with a pre-read count of n, one batch read from the osd lets the next n file reads hit the client cache.
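The effect described for fig. 3 and fig. 4 can be estimated with a back-of-the-envelope model. For N files read in sequence, a metadata batch of m files and a pre-read count of n, batch reading needs about ceil(N/m) mds requests and ceil(N/(n+1)) cache misses, versus N mds requests and N osd-bound reads for per-file reading. The function name, its parameters and the example values below are illustrative assumptions, not figures from the patent.

```python
import math


def interaction_counts(num_files, meta_batch, pre_read):
    """Estimate mds requests and cache misses for one sequential pass over `num_files` files.

    Assumptions for illustration only: files are read once, in increasing ino order;
    one batched mds request covers `meta_batch` files; and every cache miss reads the
    missed file and pre-reads the next `pre_read` files, which then hit the cache.
    """
    # Per-file reading (fig. 3): one mds request and one osd-bound read for every file.
    per_file = {"mds_requests": num_files, "cache_misses": num_files, "cache_hits": 0}
    misses = math.ceil(num_files / (pre_read + 1))
    batched = {
        "mds_requests": math.ceil(num_files / meta_batch),
        "cache_misses": misses,
        "cache_hits": num_files - misses,
    }
    return per_file, batched


per_file, batched = interaction_counts(1000, meta_batch=32, pre_read=7)
print(per_file)  # {'mds_requests': 1000, 'cache_misses': 1000, 'cache_hits': 0}
print(batched)   # {'mds_requests': 32, 'cache_misses': 125, 'cache_hits': 875}
```

The model ignores the case where the pre-read window runs past the metadata already fetched; it is meant only to show how the two configurable counts drive the reduction in round trips.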
In scenarios with massive numbers of small files, the files are numerous and small and are generally accessed sequentially. For these characteristics, the method reads small files in batch, which reduces request interactions between the client and the mds, shortens the IO path of file reading, and improves reading speed. The specific implementation steps are as follows:
1. When the client reads a file, it must first acquire the file's metadata information, so it sends a metadata request to the mds; the mds reads the file's metadata information from the osd and returns it to the client; having acquired the metadata, the client opens the file;
2. during the open operation, the client sends a metadata request for the current file to the mds and receives batch metadata information, comprising the metadata of the current file and of a plurality of files to be read after it;
3. the ino and corresponding dentry in the metadata acquired from the mds are stored in the client's dent_map structure;
4. according to the file's metadata information, the osd that stores the file data is located, and the data of the file to be read is read from it;
5. after the data of the file to be read has been read, files are pre-read in the order stored in the dent_map:
the corresponding inode is obtained from each dentry structure, and a specified number of files are read from the corresponding osd according to the inode structure;
6. when the next file is read in sequence, it has already been pre-read, so it hits the client cache and its data is obtained from the cache;
when a file to be read misses the client cache, the specified number of files are again pre-read from the osd into the client cache.
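The six steps can be tied together in a small end-to-end simulation. Everything in it is hypothetical: `MockCluster` replaces the real mds and osd with a dictionary, the pre-read batch is modeled as a single osd request, and `meta_batch` and `pre_read` are illustrative values for the configuration items the text mentions. It only shows the control flow and counts round trips; it is not the patented client implementation.

```python
import bisect


class MockCluster:
    """Hypothetical in-memory mds + osd for one directory: ino -> (name, data)."""

    def __init__(self, files):
        self.files = files
        self.mds_requests = 0
        self.osd_requests = 0

    def mds_lookup_batch(self, ino, count):
        # Steps 1-2: one request returns metadata for `ino` and the files after it.
        self.mds_requests += 1
        inos = sorted(i for i in self.files if i >= ino)[:count]
        return {i: self.files[i][0] for i in inos}

    def osd_read(self, inos):
        # One osd request may fetch several objects (used for the pre-read batch).
        self.osd_requests += 1
        return {i: self.files[i][1] for i in inos}


class BatchReadClient:
    def __init__(self, cluster, meta_batch, pre_read):
        self.cluster = cluster
        self.meta_batch = meta_batch   # hypothetical configuration items
        self.pre_read = pre_read
        self.dent_inos = []            # dent_map keys in increasing ino order
        self.dent_names = {}           # dent_map values: ino -> dentry name
        self.cache = {}                # client data cache: ino -> data
        self.hits = self.misses = 0

    def open(self, ino):
        # Steps 2-3: ask the mds only when the dentry is not in dent_map yet.
        if ino in self.dent_names:
            return
        for i, name in self.cluster.mds_lookup_batch(ino, self.meta_batch).items():
            if i not in self.dent_names:
                bisect.insort(self.dent_inos, i)
            self.dent_names[i] = name

    def read(self, ino):
        self.open(ino)
        if ino in self.cache:                                 # step 6: cache hit
            self.hits += 1
            return self.cache[ino]
        self.misses += 1
        data = self.cluster.osd_read([ino])[ino]              # step 4: read from the osd
        start = bisect.bisect_right(self.dent_inos, ino)
        following = self.dent_inos[start:start + self.pre_read]
        if following:                                         # step 5: batched pre-read
            self.cache.update(self.cluster.osd_read(following))
        return data


files = {1000 + k: (f"img{k:03d}.jpg", f"bytes-{k}") for k in range(10)}
cluster = MockCluster(files)
client = BatchReadClient(cluster, meta_batch=4, pre_read=3)
payloads = [client.read(ino) for ino in sorted(files)]
print(cluster.mds_requests, cluster.osd_requests, client.hits, client.misses)  # 3 6 7 3
```

With 10 files, a metadata batch of 4 and a pre-read count of 3, the sketch issues 3 mds requests instead of 10 and serves 7 of the 10 reads from the client cache.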
In summary, when a large number of files are read sequentially in a scenario with massive numbers of small files, the method obtains the metadata of adjacent files from the mds in batch within a single read request, reads those adjacent files from the osd, and uses file pre-reading to accelerate file reading.
The method for reading files in batch in distributed storage provided by the invention is described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Claims (6)
1. A method for reading files in distributed storage in batch, applied to a client, comprising the following steps:
when a file is opened, sending a request to a metadata server mds to acquire the metadata information of the current file and of a plurality of files to be read after the current file;
storing the index number ino and the directory entry dentry from the metadata information in a dent_map structure, the dent_map structure being a map structure;
locating, according to the metadata information, the object storage device osd that stores the file data, and reading the data of the file to be read from the osd;
after the data of the file to be read has been read, pre-reading files in the order in which they are stored in the dent_map structure;
wherein storing the index number ino and the directory entry dentry from the metadata information in the dent_map structure comprises:
storing each index number ino and its corresponding dentry in the client's dent_map structure in increasing order of ino.
2. The method of claim 1, wherein pre-reading files in the order in which they are stored in the dent_map structure comprises:
obtaining the corresponding inode structure from each dentry structure, and reading a specified number of files from the corresponding osd according to the inode structure.
3. The method of claim 1, wherein the request acquires the metadata of adjacent files together with the metadata of the current file.
4. The method of claim 3, wherein the amount of metadata information is dynamically configured through a configuration item.
5. The method according to any one of claims 1 to 4, wherein, after the data of the file to be read has been read and files have been pre-read in the order stored in the dent_map structure, the method further comprises:
reading the next file in sequence, which hits the client cache, so that the file data is obtained from the client cache.
6. The method of claim 5, wherein, after the next file has been read in sequence from the client cache, the method further comprises:
if a file to be read misses the client cache, pre-reading the specified number of files from the osd into the client cache.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710451855.XA CN107291870B (en) | 2017-06-15 | 2017-06-15 | Method for reading files in distributed storage in batch |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710451855.XA CN107291870B (en) | 2017-06-15 | 2017-06-15 | Method for reading files in distributed storage in batch |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107291870A CN107291870A (en) | 2017-10-24 |
CN107291870B true CN107291870B (en) | 2021-03-09 |
Family
ID: 60097445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710451855.XA Active CN107291870B (en) | 2017-06-15 | 2017-06-15 | Method for reading files in distributed storage in batch |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291870B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271363B (en) * | 2018-09-17 | 2023-05-26 | Ping An Technology (Shenzhen) Co., Ltd. | File storage method and device |
CN117112104B (en) * | 2023-08-24 | 2024-03-29 | Zhejiang Yuansuan Technology Co., Ltd. | Local storage mapping method, equipment and medium based on remote desktop gateway |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103916465A (en) * | 2014-03-21 | 2014-07-09 | Institute of Computing Technology, Chinese Academy of Sciences | Data pre-reading device based on distributed file system and method thereof |
CN104123359A (en) * | 2014-07-17 | 2014-10-29 | Jiangsu Posts and Telecommunications Planning and Design Institute Co., Ltd. | Resource management method of distributed object storage system |
CN106777047A (en) * | 2016-12-09 | 2017-05-31 | Zhengzhou Yunhai Information Technology Co., Ltd. | Metadata reading method and device for a distributed system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040133606A1 (en) * | 2003-01-02 | 2004-07-08 | Z-Force Communications, Inc. | Directory aggregation for files distributed over a plurality of servers in a switched file system |
CN102541985A (en) * | 2011-10-25 | 2012-07-04 | Dawning Information Industry (Beijing) Co., Ltd. | Organization method of client directory cache in distributed file system |
US9715348B2 (en) * | 2015-09-09 | 2017-07-25 | Netapp, Inc. | Systems, methods and devices for block sharing across volumes in data storage systems |
CN106776759A (en) * | 2016-11-17 | 2017-05-31 | Zhengzhou Yunhai Information Technology Co., Ltd. | Small-file pre-reading method and system for a distributed file system |
Non-Patent Citations (1)
Title |
---|
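Analysis and implementation of an ARM-based embedded flash driver and the UBIFS file system; Lai Shangxiao; China Master's Theses Full-text Database, Information Science and Technology; 2016-03-15 (Issue 3); pp. 43-44 * |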
Also Published As
Publication number | Publication date |
---|---|
CN107291870A (en) | 2017-10-24 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210204. Address after: Building 9, No.1, Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province. Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd. Address before: Room 1601, floor 16, 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province. Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |