CN107291870B - Method for reading files in distributed storage in batch - Google Patents


Info

Publication number: CN107291870B
Authority: CN (China)
Prior art keywords: file, reading, read, files, metadata information
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201710451855.XA
Other languages: Chinese (zh)
Other versions: CN107291870A
Inventor: 张书扬
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2017-06-15 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2017-06-15
Publication date: 2021-03-09

2017-06-15: Application filed by Suzhou Inspur Intelligent Technology Co Ltd; priority to CN201710451855.XA
2017-10-24: Publication of CN107291870A
Application granted
2021-03-09: Publication of CN107291870B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Abstract

The invention discloses a method for reading files in distributed storage in batches, comprising the following steps: when a file is opened, sending a request to the metadata server (mds) to acquire the metadata of the current file and of a number of files to be read after it; storing the index numbers (ino) and directory entries (dentry) from that metadata in a dent_map structure; locating, according to the metadata, the object storage device (osd) that stores the file data, and reading the data of the file to be read from the osd; and, after the data of the file to be read has been read, pre-reading files in the order in which they are stored in the dent_map structure. The method shortens the IO path of file reads and increases read speed.

Description

Method for reading files in distributed storage in batch
Technical Field
The invention relates to the technical field of file reading, and in particular to a method for reading files in batches in distributed storage.
Background
At present, with the rapid development of information and internet technology, the volume of data that enterprises must transmit and store has grown dramatically. In massive-small-file scenarios, such as social shopping websites, radio and television, and online video, systems generate huge numbers of small files such as text, pictures and music. These files share the following characteristics: there are very many of them, each is generally smaller than 1 MB, and they are generally read sequentially.
In a distributed data storage system, to read a file the client first sends a request to the mds to obtain the file's access permissions and metadata, then uses that metadata to locate the osd that stores the file data, and reads the data from that osd. Each file read therefore traverses a long IO path. When a large number of files are read in batch, every file goes through the whole path independently: the client repeatedly requests file metadata from the mds, and repeatedly calls the read interface in ObjectCacher to read data from the osd. These frequent, repeated requests and their processing place heavy load on the system.
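The per-file flow just described can be sketched as a toy cost model. This is a minimal illustration under stated assumptions, not the actual client implementation: `MdsStub`, `OsdStub` and `read_files_one_by_one` are hypothetical names, and each method call stands in for one network round trip.

```python
from dataclasses import dataclass


@dataclass
class MdsStub:
    """Hypothetical metadata server: maps file name -> ino."""
    table: dict
    requests: int = 0

    def lookup(self, name):
        # One metadata request per call (one client <-> mds round trip).
        self.requests += 1
        return self.table[name]


@dataclass
class OsdStub:
    """Hypothetical object store: maps ino -> file data."""
    objects: dict
    reads: int = 0

    def read(self, ino):
        # One data read per call (one client <-> osd round trip).
        self.reads += 1
        return self.objects[ino]


def read_files_one_by_one(names, mds, osd):
    """Per-file flow: every file takes its own mds and osd round trip."""
    return {name: osd.read(mds.lookup(name)) for name in names}


names = [f"img{i:03d}.jpg" for i in range(100)]
mds = MdsStub({name: 1000 + i for i, name in enumerate(names)})
osd = OsdStub({1000 + i: b"x" * 16 for i in range(100)})
read_files_one_by_one(names, mds, osd)
print(mds.requests, osd.reads)  # 100 100
```

With 100 files the client makes 100 mds round trips and 100 osd round trips, which is the cost the batch method sets out to cut.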
In a distributed file system, every time a file is read, the client must request that file's metadata from the mds and, once it has the metadata, obtain the file's data from the osd. Each read therefore needs a long IO path, which keeps read speed low.
Disclosure of Invention
The object of the invention is to provide a method for reading files in distributed storage in batches that shortens the IO path of file reads and increases read speed.
To solve the above technical problem, the present invention provides a method for reading files in batches in distributed storage, applied to a client, comprising:
when a file is opened, sending a request to the metadata server (mds) to acquire the metadata of the current file and of a number of files to be read after it;
storing the index numbers (ino) and directory entries (dentry) from the metadata in a dent_map structure;
locating, according to the metadata, the object storage device (osd) that stores the file data, and reading the data of the file to be read from the osd;
and, after the data of the file to be read has been read, pre-reading files in the order in which they are stored in the dent_map structure.
Preferably, storing the index numbers (ino) and directory entries (dentry) from the metadata in the dent_map structure includes:
storing each ino and its corresponding dentry in the client's dent_map structure in increasing order of ino.
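The ordering requirement can be illustrated with a small sorted map. This is a sketch under assumptions: `Dentry` and `DentMap` are hypothetical names, and a sorted Python list plus a dict stands in for whatever ordered map the real client uses; the text only specifies that entries are kept in increasing ino order.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Dentry:
    """Hypothetical directory entry: a file name and the ino it refers to."""
    name: str
    ino: int


class DentMap:
    """Sketch of a dent_map: ino -> Dentry, iterated in increasing ino order."""

    def __init__(self):
        self._inos = []    # sorted list of inos (the map's keys)
        self._by_ino = {}  # ino -> Dentry

    def insert(self, dentry):
        # Keep keys sorted; a repeated ino simply overwrites its entry.
        if dentry.ino not in self._by_ino:
            bisect.insort(self._inos, dentry.ino)
        self._by_ino[dentry.ino] = dentry

    def __iter__(self):
        # Yield entries in increasing ino order.
        for ino in self._inos:
            yield self._by_ino[ino]

    def __len__(self):
        return len(self._inos)


dmap = DentMap()
for name, ino in [("c.txt", 12), ("a.txt", 10), ("b.txt", 11)]:
    dmap.insert(Dentry(name, ino))
print([d.name for d in dmap])  # ['a.txt', 'b.txt', 'c.txt']
```

Insertion into a sorted list costs O(n) per entry; for large directories a balanced tree would be the usual choice. That is a design assumption, not something the text specifies.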
Preferably, pre-reading files in the order stored in the dent_map structure includes:
obtaining the corresponding inode structure from each dentry, and reading a specified number of files from the corresponding osd according to the inode structure.
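Selecting what to pre-read then reduces to a successor query on the ordered inos. In this sketch `next_to_prefetch`, `prefetch`, the `inode_table` layout and the `obj.<hex ino>` object names are all hypothetical; only the idea of following dent_map order from the current ino, via each file's inode, comes from the text.

```python
import bisect


def next_to_prefetch(sorted_inos, current_ino, n):
    """Return up to n inos that follow current_ino in increasing-ino order."""
    start = bisect.bisect_right(sorted_inos, current_ino)
    return sorted_inos[start:start + n]


def prefetch(sorted_inos, inode_table, osd_read, current_ino, n):
    """For each following ino, look up its inode, then read the data it points to."""
    cache = {}
    for ino in next_to_prefetch(sorted_inos, current_ino, n):
        inode = inode_table[ino]                # hypothetical inode record
        cache[ino] = osd_read(inode["object"])  # where the file's data lives
    return cache


inos = [100, 101, 102, 103, 104]
inode_table = {i: {"object": f"obj.{i:x}"} for i in inos}
store = {f"obj.{i:x}": f"data-{i}".encode() for i in inos}
cached = prefetch(inos, inode_table, store.__getitem__, current_ino=101, n=2)
print(sorted(cached))  # [102, 103]
```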
Preferably, the request retrieves the metadata of adjacent files in the same request as the current file's metadata.
Preferably, the amount of metadata retrieved per request is dynamically configurable through a configuration item.
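A configurable batch size might look like the following. The option name `batch_metadata_count`, the `BatchingMds` stub and `open_with_batch` are hypothetical, not real settings or APIs of any file system; the sketch only shows one request returning the current file plus the files after it, with the count read from configuration at request time.

```python
DEFAULT_CONFIG = {"batch_metadata_count": 8}  # hypothetical option name


class BatchingMds:
    """Hypothetical mds stub that answers one request with several entries."""

    def __init__(self, directory):
        # directory: list of (name, ino) pairs in the order they will be read
        self.directory = directory
        self.position = {name: i for i, (name, _) in enumerate(directory)}
        self.requests = 0

    def lookup_batch(self, name, count):
        """Return `name` plus up to count - 1 entries that follow it."""
        self.requests += 1
        start = self.position[name]
        return self.directory[start:start + count]


def open_with_batch(mds, name, config):
    # Read the batch size at call time, so changing the config takes effect
    # on the next open without restarting the client.
    count = max(1, int(config.get("batch_metadata_count", 1)))
    return mds.lookup_batch(name, count)


directory = [(f"f{i}", 500 + i) for i in range(20)]
mds = BatchingMds(directory)
config = dict(DEFAULT_CONFIG)
print(len(open_with_batch(mds, "f0", config)))  # 8
config["batch_metadata_count"] = 4              # reconfigure at runtime
print(open_with_batch(mds, "f8", config))
# [('f8', 508), ('f9', 509), ('f10', 510), ('f11', 511)]
```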
Preferably, after the data of the file to be read has been read and files have been pre-read in the order stored in the dent_map structure, the method further includes:
reading the next file sequentially; the read hits the client cache, and the file data is obtained from the client cache.
Preferably, after the next file is read sequentially and its data is obtained from the client cache on a hit, the method further includes:
if the file to be read misses the client cache, pre-reading the specified number of files from the osd into the client cache.
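The hit/miss behaviour can be modelled as follows. `ReadAheadClient` and its parameters are hypothetical; for clarity each pre-read is a separate `osd_read` call, whereas the text describes pre-reading the specified number of files from the osd in a batch.

```python
class ReadAheadClient:
    """Sketch of the cache-hit / cache-miss behaviour described above.

    order: inos in increasing (dent_map) order; osd_read: callable ino -> bytes.
    Both are hypothetical stand-ins for the real client structures.
    """

    def __init__(self, order, osd_read, prefetch_count):
        self.order = order
        self.pos = {ino: i for i, ino in enumerate(order)}
        self.osd_read = osd_read
        self.n = prefetch_count
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, ino):
        if ino in self.cache:
            # Hit: the data was pre-read earlier, no osd round trip needed.
            self.hits += 1
            return self.cache[ino]
        # Miss: read the requested file, then pre-read the next n files.
        self.misses += 1
        data = self.osd_read(ino)
        i = self.pos[ino]
        for nxt in self.order[i + 1:i + 1 + self.n]:
            if nxt not in self.cache:
                self.cache[nxt] = self.osd_read(nxt)
        return data


client = ReadAheadClient(list(range(1, 11)), lambda ino: b"data%d" % ino, prefetch_count=3)
for ino in range(1, 11):
    client.read(ino)
print(client.hits, client.misses)  # 7 3
```

Reading ten files in order with a pre-read count of 3 gives misses on files 1, 5 and 9 and hits on the other seven.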
The invention provides a method for reading files in distributed storage in batches, applied to a client. When a file is opened, a request is sent to the metadata server (mds) to acquire the metadata of the current file and of a number of files to be read after it; the index numbers (ino) and directory entries (dentry) from the metadata are stored in a dent_map structure; the object storage device (osd) that stores the file data is located according to the metadata, and the data of the file to be read is read from the osd; and after that data has been read, files are pre-read in the order stored in the dent_map structure. Thus, when files are read in batch, each metadata request to the mds returns the metadata of several files at once, which reduces the number of client-mds interactions, relieves the request-processing load on the mds, and shortens the IO path of file reads. In addition, because files are pre-read in dent_map order after the current file has been read, sequential reads hit the pre-read cache with high probability, which further shortens the IO path and increases read speed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for reading files in batch in distributed storage according to the present invention;
FIG. 2 is a schematic diagram of the batch file-reading process;
FIG. 3 is a schematic diagram of reading files one at a time;
FIG. 4 is a schematic diagram of batch reading of files.
Detailed Description
The core of the invention is to provide a method for reading files in batches in distributed storage that shortens the IO path of file reads and increases read speed.
To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms are explained as follows:
mds: the metadata server, which manages file metadata;
osd: the object storage device, which stores file data.
Referring to FIG. 1, which is a flowchart of the method for reading files in batches in distributed storage according to the present invention, the method is applied to a client and includes:
step S11: when a file is opened, send a request to the metadata server (mds) to acquire the metadata of the current file and of a number of files to be read after it;
step S12: store the index numbers (ino) and directory entries (dentry) from the metadata in a dent_map structure;
step S13: locate, according to the metadata, the object storage device (osd) that stores the file data, and read the data of the file to be read from the osd;
step S14: after the data of the file to be read has been read, pre-read files in the order in which they are stored in the dent_map structure.
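Steps S11 to S14 can be combined into one toy client to see how the round trips add up. Everything here is a hypothetical stand-in (`Mds.lookup_batch`, `Osd.read_many`, `BatchClient`), and two simplifications are assumed: a consumed cache entry is dropped, and pre-reading is limited to files whose metadata is already in dent_map.

```python
import bisect


class Mds:
    """Hypothetical mds stub: one request returns up to `count` entries."""

    def __init__(self, entries):
        self.entries = entries  # (name, ino) pairs in directory order
        self.index = {name: i for i, (name, _) in enumerate(entries)}
        self.requests = 0

    def lookup_batch(self, name, count):
        self.requests += 1
        i = self.index[name]
        return self.entries[i:i + count]


class Osd:
    """Hypothetical osd stub: each read_many call counts as one round trip."""

    def __init__(self, data):
        self.data = data
        self.requests = 0

    def read_many(self, inos):
        self.requests += 1
        return {ino: self.data[ino] for ino in inos}


class BatchClient:
    """Toy client following steps S11-S14."""

    def __init__(self, mds, osd, batch_count, prefetch_count):
        self.mds, self.osd = mds, osd
        self.k, self.n = batch_count, prefetch_count
        self.name_to_ino = {}  # metadata already held by the client
        self.dent_inos = []    # dent_map keys, kept in increasing ino order
        self.cache = {}        # ino -> data pre-read from the osd

    def open(self, name):
        # S11: unless an earlier batch already delivered it, fetch metadata
        # for this file and the files after it in a single mds request.
        if name not in self.name_to_ino:
            for entry_name, ino in self.mds.lookup_batch(name, self.k):
                if entry_name not in self.name_to_ino:
                    self.name_to_ino[entry_name] = ino
                    bisect.insort(self.dent_inos, ino)  # S12: increasing ino
        return self.name_to_ino[name]

    def read(self, name):
        ino = self.open(name)
        if ino in self.cache:
            return self.cache.pop(ino)  # hit: served from the client cache
        data = self.osd.read_many([ino])[ino]  # S13: read the requested file
        start = bisect.bisect_right(self.dent_inos, ino)
        ahead = [x for x in self.dent_inos[start:start + self.n] if x not in self.cache]
        if ahead:  # S14: pre-read the next n files in dent_map order
            self.cache.update(self.osd.read_many(ahead))
        return data


entries = [(f"f{i:02d}", 700 + i) for i in range(24)]
mds = Mds(entries)
osd = Osd({ino: b"d%d" % ino for _, ino in entries})
client = BatchClient(mds, osd, batch_count=8, prefetch_count=3)
for name, _ in entries:
    client.read(name)
print(mds.requests, osd.requests)  # 3 12
```

Reading these 24 files one by one would take 24 mds and 24 osd round trips; with a metadata batch of 8 and a pre-read count of 3 this sketch needs 3 and 12. The reduction relies entirely on the reads being sequential in ino order.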
Thus, when files are read in batch, each metadata request to the mds returns the metadata of several files at once, which reduces the number of client-mds interactions, relieves the request-processing load on the mds, and shortens the IO path of file reads. In addition, because files are pre-read in dent_map order after the current file has been read, sequential reads hit the pre-read cache with high probability, which further shortens the IO path and increases read speed.
Building on the above method, step S12 specifically includes storing each ino and its corresponding dentry from the metadata in the client's dent_map structure in increasing order of ino.
Further, in step S14, pre-reading files in the order stored in the dent_map structure specifically includes obtaining the corresponding inode structure from each dentry and reading a specified number of files from the corresponding osd according to that inode structure.
In step S11, the request retrieves the metadata of adjacent files in the same request as the current file's metadata.
In step S11, the amount of metadata retrieved is dynamically configurable through a configuration item.
In step S11, both the metadata of the current file and the metadata of a number of files to be read after it are obtained.
Further, after step S14, the method includes reading the next file sequentially; this read hits the client cache, and the file data is obtained from the client cache.
After the next file is read sequentially and served from the client cache, the method further includes: if the file to be read misses the client cache, pre-reading the specified number of files from the osd into the client cache.
In a small-file scenario, when a file is opened, the metadata of the current file and of a specified number of adjacent files is obtained from the mds. When the file is read, once the current file's data has been read, the specified number of adjacent files is read from the osd using the metadata already obtained. Sequential reads then hit the client cache, which shortens the overall IO path and increases read speed. In short, because each read also brings the adjacent files into the client cache, later reads that hit the cache take a shorter IO path, and files are read faster.
In massive-small-file scenarios, files are generally created sequentially in batches, so adjacent files very likely have adjacent inos. When files are read sequentially in batch, the inos of the files to be read can therefore be assumed to increase.
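The adjacency assumption can be illustrated with a toy ino counter. This is a hypothetical model: `itertools.count` stands in for the file system's ino allocator, `create_sequentially` is an invented helper, and real allocators may hand out numbers differently. The second case shows why the text says "with high probability" rather than "always".

```python
import itertools


def create_sequentially(counter, names):
    """Hypothetical: one writer creating files back to back gets increasing inos."""
    return {name: next(counter) for name in names}


counter = itertools.count(1000)  # stand-in for the ino allocator

# One writer, one batch: ino order matches creation (and later read) order.
batch = create_sequentially(counter, [f"clip_{i:03d}.mp4" for i in range(6)])
print(sorted(batch, key=batch.get) == list(batch))  # True

# Two writers interleaving: inos still increase per writer but are no longer
# adjacent, so "the next n inos" after a0 includes b0.
mixed = {}
for a, b in zip(["a0", "a1", "a2"], ["b0", "b1", "b2"]):
    mixed[a] = next(counter)
    mixed[b] = next(counter)
print(sorted(mixed, key=mixed.get)[:3])  # ['a0', 'b0', 'a1']
```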
In detail: when a file is opened, a request is sent to the mds to obtain the metadata of several files after the currently opened file, so that a single metadata request returns a batch of metadata. The number of entries obtained per metadata request can be configured dynamically through a configuration item.
Then, once the metadata of the files to be read in the current directory has been obtained, each file's ino and its corresponding dentry are stored, in increasing order of ino, in a map structure called dent_map, so that the file information under the current directory is kept in the client's directory structure.
Next, when a file is read and it is not in the client cache, its data must be read from the osd by calling the read interface in ObjectCacher. After the current file has been read, files are pre-read in the order stored in dent_map, and the specified number of files is read from the osd. In sequential-read scenarios the pre-read cache is hit with high probability, which shortens the IO path and increases read speed. FIG. 2 is a schematic diagram of this batch reading process.
With this method, the client interacts with the mds fewer times during file reading, the mds bears less metadata-request load when many files are read, the IO path of file reads is shorter, the client cache hit rate is higher, and files are read faster.
As shown in FIG. 3, when files are read one at a time, each file first requires a metadata request to the mds; after the mds processes the request and replies, the client uses the metadata to locate the osd that stores the data and reads the data from it. The client must interact with both the mds and the osd throughout. When there are many files, the number of client interactions with the mds and the osd rises sharply, placing heavy processing load on the system.
As shown in FIG. 4, when files are read in batch, each metadata request to the mds returns the metadata of several files at once, which reduces the number of client-mds interactions, relieves the request-processing load on the mds, and shortens the IO path of file reads. When files are read from the osd, each time the currently requested file has been read successfully, the following specified number of files is pre-read in sequence using the metadata obtained from the mds. In a sequential-read scenario with a pre-read count of n, the files are read from the osd in one batch, and the next n file reads hit the client cache.
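The counting argument can be written as a small cost function. `batch_read_cost` is a hypothetical helper; it assumes a strictly sequential scan, and that the metadata batch size is a multiple of n + 1 so that a pre-read is never cut short for lack of metadata. Without batching, each file costs one mds request and one osd read.

```python
import math


def batch_read_cost(num_files, batch_count, prefetch_count):
    """Estimate round trips for reading num_files strictly in sequence.

    Assumes batch_count is a multiple of prefetch_count + 1, so a pre-read
    is never cut short because metadata has not been fetched yet.
    """
    # Each group of prefetch_count + 1 files has one miss and prefetch_count hits.
    misses = math.ceil(num_files / (prefetch_count + 1))
    return {
        "mds_requests": math.ceil(num_files / batch_count),
        "cache_misses": misses,
        "cache_hits": num_files - misses,
    }


print(batch_read_cost(1000, batch_count=32, prefetch_count=7))
# {'mds_requests': 32, 'cache_misses': 125, 'cache_hits': 875}
```

With 1000 files, a metadata batch of 32 and n = 7, the client would make 32 metadata requests instead of 1000, and only 125 of the 1000 reads would have to wait on the osd.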
In massive-small-file scenarios, files are numerous and small and are generally accessed sequentially. For these characteristics, the method reads small files in batches, which reduces request interactions between the client and the mds, shortens the IO path of file reads, and increases read speed. The concrete implementation steps are as follows:
1. To read a file, the client must first obtain its metadata: the client sends a metadata request to the mds, the mds reads the file's metadata from the osd and returns it, and after obtaining the metadata the client opens the file;
2. During the open operation, the client sends the mds a metadata request for the current file and obtains a batch of metadata containing the current file and several files to be read after it;
3. The ino and corresponding dentry from the metadata obtained from the mds are stored in the client's dent_map structure;
4. Using the file's metadata, the client locates the osd that stores the file data and reads the data of the file to be read from that osd;
5. After the file's data has been read, files are pre-read in the order stored in dent_map:
the corresponding inode is obtained from each dentry, and the specified number of files is read from the corresponding osd according to the inode;
6. Because those files were pre-read, reading the next file in sequence hits the client cache and its data is obtained from the cache;
when a file to be read misses the client cache, the specified number of files is again pre-read from the osd into the client cache.
In essence, when a large number of files are read sequentially in a scenario with massive numbers of small files, the method obtains the metadata of adjacent files from the mds in batch within a single read request, reads those adjacent files from the osd, and uses file pre-reading to accelerate file reading.
The method for reading files in batches in distributed storage provided by the invention has been described in detail above. Specific examples have been used to explain the principles and embodiments of the invention; they are intended only to help in understanding the method and its core idea. Those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications also fall within the scope of the claims.

Claims (6)

1. A method for reading files in distributed storage in batches, applied to a client, comprising the following steps:
when a file is opened, sending a request to a metadata server (mds) to acquire the metadata of the current file and of a plurality of files to be read after the current file;
storing the index numbers (ino) and directory entries (dentry) from the metadata in a dent_map structure, the dent_map structure being a map structure;
locating, according to the metadata, the object storage device (osd) that stores the file data, and reading the data of the file to be read from the osd;
after the data of the file to be read has been read, pre-reading files in the storage order of the dent_map structure;
wherein storing the index numbers (ino) and directory entries (dentry) from the metadata in the dent_map structure comprises:
storing each ino and its corresponding dentry from the metadata in the client's dent_map structure in increasing order of ino.
2. The method of claim 1, wherein pre-reading files in the storage order of the dent_map structure comprises:
obtaining the corresponding inode structure from each dentry, and reading a specified number of files from the corresponding osd according to the inode structure.
3. The method of claim 1, wherein the request obtains the metadata of adjacent files while obtaining the metadata of the current file.
4. The method of claim 3, wherein the amount of metadata is dynamically configured through a configuration item.
5. The method according to any one of claims 1 to 4, wherein, after the data of the file to be read has been read and files have been pre-read in the storage order of the dent_map structure, the method further comprises:
reading the next file sequentially, hitting the client cache, and obtaining the file data from the client cache.
6. The method of claim 5, wherein, after the next file is read sequentially, the client cache is hit, and the file data is obtained from the client cache, the method further comprises:
if the file to be read misses the client cache, pre-reading the specified number of files from the osd into the client cache.

Priority Applications (1)

CN201710451855.XA (CN107291870B): priority date 2017-06-15, filing date 2017-06-15, Method for reading files in distributed storage in batch

Applications Claiming Priority (1)

CN201710451855.XA (CN107291870B): priority date 2017-06-15, filing date 2017-06-15, Method for reading files in distributed storage in batch

Publications (2)

CN107291870A, published 2017-10-24
CN107291870B, published 2021-03-09

Family ID: 60097445

Family Applications (1)

CN201710451855.XA (CN107291870B, active): Method for reading files in distributed storage in batch; priority date 2017-06-15, filing date 2017-06-15

Country Status (1)

CN (1): CN107291870B

Families Citing this family (2)

* Cited by examiner, † Cited by third party
CN109271363B * (priority date 2018-09-17, published 2023-05-26) 平安科技(深圳)有限公司: File storage method and device
CN117112104B * (priority date 2023-08-24, published 2024-03-29) 浙江远算科技有限公司: Local storage mapping method, equipment and medium based on remote desktop gateway

Citations (3)

CN103916465A * (priority date 2014-03-21, published 2014-07-09) 中国科学院计算技术研究所: Data pre-reading device based on distributed file system and method thereof
CN104123359A * (priority date 2014-07-17, published 2014-10-29) 江苏省邮电规划设计院有限责任公司: Resource management method of distributed object storage system
CN106777047A * (priority date 2016-12-09, published 2017-05-31) 郑州云海信息技术有限公司: Metadata reading method and device for a distributed system

Family Cites Families (4)

US20040133606A1 * (priority date 2003-01-02, published 2004-07-08) Z-Force Communications, Inc.: Directory aggregation for files distributed over a plurality of servers in a switched file system
CN102541985A * (priority date 2011-10-25, published 2012-07-04) 曙光信息产业(北京)有限公司: Organization method of client directory cache in distributed file system
US9715348B2 * (priority date 2015-09-09, published 2017-07-25) Netapp, Inc.: Systems, methods and devices for block sharing across volumes in data storage systems
CN106776759A * (priority date 2016-11-17, published 2017-05-31) 郑州云海信息技术有限公司: Small-file pre-reading method and system for a distributed file system

Non-Patent Citations (1)

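Analysis and implementation of an ARM-based embedded flash driver and the UBIFS file system (基于ARM的嵌入式闪存驱动与UBIFS文件系统的分析与实现); 赖尚校; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2016-03-15 (No. 3); pp. 43-44 *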

Also Published As

CN107291870A, published 2017-10-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 2021-02-04

Address after: Building 9, No. 1, Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: Room 1601, floor 16, 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant