WO2014153931A1 - File storage method and device, access client and metadata server system - Google Patents

File storage method and device, access client and metadata server system

Info

Publication number
WO2014153931A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
copy
files
location information
file access
Prior art date
Application number
PCT/CN2013/083689
Other languages
French (fr)
Chinese (zh)
Inventor
胡剑华
朱鹏
俞超
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • Embodiments of the present invention relate to the field of cloud storage, and in particular to a file storage method, apparatus, access client, and metadata server system.

Background

  • A Distributed File System (DFS) divides files into several CHUNKs and stores multiple copies on different servers.
  • A DFS must support the storage of files of various sizes, from files of a few bytes up to files of tens of gigabytes, and storage performance should not differ between them. Under the existing mechanism, however, no matter how small a file is, the file access server stores it separately in its own copy on disk, which unnecessarily increases disk fragmentation when the file access server holds more than one such file.

Summary of the invention

  • the embodiment of the invention provides a file storage method, which is applied to a metadata server system, and the method includes:
  • the at least two files are respectively stored to offset positions of the at least two files in the copy.
  • the offset positions of the at least two files in the copy respectively include:
  • the method further includes:
  • the embodiment of the present invention further provides a file storage method, which is applied to a file access client, and the method includes:
  • The storage location information includes the replica location information and the offset location information of each of the at least two files in one copy;
  • The storage location information is determined by the metadata server system; interacting with the file access server according to the storage location information, causing the file access server to store the at least two files respectively at the offset positions of the at least two files in the copy.
  • the offset positions of the at least two files in the copy respectively include:
  • the interacting with the file access server according to the storage location information, so that the offset location in the copy includes:
  • the method further includes:
  • the embodiment of the invention further provides a file storage device, which is applied to a metadata server system, and the device includes:
  • a determining module configured to determine storage location information of the at least two files, wherein the storage location information includes the replica location information and the offset location information of each of the at least two files in one copy;
  • a sending module configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information, causing the file access server to store the at least two files respectively at the offset positions of the at least two files in the copy.
  • The sending module includes:
  • a sending unit configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information, causing the file access server to write the at least two files into one cache block and, after the at least two files are written, to write the data in the cache block into the copy, so that after the data is written the at least two files are respectively stored at their offset positions in the copy.
  • the embodiment of the present invention further provides a file storage device, which is applied to a file access client, and the device includes:
  • a receiving module configured to receive storage location information of at least two files sent by the metadata server system, where the storage location information includes the replica location information and the offset location information of each of the at least two files in one copy; the storage location information is determined by the metadata server system;
  • an interaction module configured to interact with the file access server according to the storage location information, so that the file access server stores the at least two files respectively at the offset positions of the at least two files in the copy.
  • The interaction module includes: an interaction unit configured to, for each of the at least two files, write the shared memory page in which the file is cached to a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page is written the file access server writes the data in the cache block into the copy and, after the data is written, each file is stored at its offset position in the copy.
  • Embodiments of the present invention also provide a metadata server system including the file storage device described above.
  • Embodiments of the present invention also provide a file access client including the file storage device described above.
  • The embodiments of the present invention have at least the following beneficial effect: different files can be stored in the same copy, which, compared with the prior art in which different files cannot be stored in the same copy, avoids an unnecessary increase in the fragmentation of the storage medium of the file access server.
  • FIG. 1 is a flowchart of the steps of a file storage method according to Embodiment 1 of the present invention;
  • FIG. 2 is a flowchart of the steps of another file storage method according to Embodiment 2 of the present invention;
  • FIG. 3 is a schematic diagram of small file aggregation of a preferred embodiment;
  • FIG. 4 is a schematic flowchart of a small file aggregation write of a preferred embodiment;
  • FIG. 5 is a schematic flowchart of a small file aggregation read of a preferred embodiment.

Detailed description

  • A first embodiment of the present invention provides a file storage method, which includes the following steps: Step 101: Determine storage location information of at least two files, where the storage location information includes the replica location information and the offset location information of each of the at least two files in one copy; Step 102: Send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information, causing the file access server to store the at least two files respectively at the offset positions of the at least two files in the copy.
  • This method is applied to a metadata server system.
  • the copy is located on the storage medium of the file access server.
  • the storage medium is, for example, a magnetic disk.
  • the copy location information includes: File access server ID and storage media ID in the file access server.
  • the file access server identifier is, for example: IP of the file access server: 10.47.107.111; storage medium identifier such as: IP of the file access server: 10.47.107.111.
  • the sum of the sizes of the at least two files is not greater than the copy size.
  • each of the at least two files is no larger than the shared cache page (PAGE) size in the file access client.
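  • The two size constraints above can be checked with a few lines of code. The following is a minimal sketch, not the patented implementation; the function name `can_aggregate` and its parameters are hypothetical and chosen only for illustration.

```python
def can_aggregate(file_sizes, page_size, copy_size):
    """Check the two size constraints stated in the text (sizes in bytes).

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    if len(file_sizes) < 2:
        return False  # the method concerns at least two files
    # Each file must be no larger than one shared cache page (PAGE).
    each_fits_page = all(size <= page_size for size in file_sizes)
    # The sum of the file sizes must not exceed the copy size.
    total_fits_copy = sum(file_sizes) <= copy_size
    return each_fits_page and total_fits_copy
```

  • With a 32k PAGE and a 256k copy, for instance, two files of 1k and 2k pass the check, while a pair containing a 40k file does not.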
  • the copy location corresponds to the copy handle.
  • The copy location is determined by the metadata server system according to the copy handle; the copy handle is allocated by the metadata server system according to the file name of the first file of the at least two files.
  • The first file is any one of the at least two files; or, considering that the file access client sends to the metadata server system an open file request for each of the at least two files, the first file is the file corresponding to the first open file request that the metadata server system receives from the file access client.
  • The metadata server system comprises a metadata management server and a plurality of metadata storage servers. The copy handle is allocated by the metadata management server, and the copy location is determined, according to the copy handle, by a first metadata storage server among the plurality of metadata storage servers; the first metadata storage server is selected from the plurality of metadata storage servers by the metadata management server according to the file name of the first file.
  • At least two files have different offset positions in the copy.
  • The offset position here refers to the position at which the file is stored in the copy; the starting position of that position either coincides with or is offset from the starting position of the copy.
  • The offset position information of each of the at least two files in the copy includes information on the size of the offset between the starting position at which the file is stored in the copy and the starting position of the copy, for example a number of bytes or bits, a PAGE size together with a number of PAGEs, or, given that each of the at least two files is no larger than one PAGE, simply a number of PAGEs.
  • the offset position information of each of the at least two files in the copy can be determined as follows:
  • The first method is timing-based.
  • The metadata server system receives open file requests from the file access client one after another; the number of these requests equals the number of the at least two files, and the requests correspond one-to-one to the files. The metadata server system assigns offset positions to the corresponding files, starting from the start position of the copy, in the order in which the open file requests are received.
  • For example, the offset position assigned to the file of the first received open file request is the start position of the copy; the offset position assigned to the file of the second received open file request differs from the start position of the copy by one shared cache page size; the offset position assigned to the file of the third received open file request differs from the start position of the copy by two shared cache page sizes; and so on.
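  • The timing-based assignment can be sketched as a small allocator that hands out page slots in arrival order. This is an illustrative sketch under stated assumptions: the class name `ArrivalOrderAllocator`, the method `on_open_request`, and the choice to express offsets in bytes are all hypothetical and not taken from the source.

```python
class ArrivalOrderAllocator:
    """Assign offsets within one copy in the order open requests arrive.

    Hypothetical sketch: names and the byte-based offset are assumptions.
    """

    def __init__(self, page_size, pages_per_copy):
        self.page_size = page_size
        self.pages_per_copy = pages_per_copy
        self.assigned = {}  # file name -> page index within the copy

    def on_open_request(self, file_name):
        """Return the byte offset from the start of the copy for this file."""
        if file_name not in self.assigned:
            if len(self.assigned) >= self.pages_per_copy:
                raise RuntimeError("copy is full")
            # The n-th distinct request gets the n-th page slot (0-based).
            self.assigned[file_name] = len(self.assigned)
        return self.assigned[file_name] * self.page_size
```

  • The first request receives offset 0, the second one page size, and the third two page sizes, matching the example above; a repeated request for the same file name returns the offset already assigned.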
  • The second method is based on the serial number in the file name.
  • The offset positions of the files are assigned according to the serial numbers contained in the file names, in a predetermined order of magnitude (ascending or descending).
  • For example, the metadata server system receives three open file requests from the file access client, corresponding to the files named 010, 001, and 003, and assigns offset positions to the three files in ascending order of serial number: the offset position assigned to file 001 is the start position of the copy; the offset position assigned to file 003 differs from the start position of the copy by one shared cache page size; and the offset position assigned to file 010 differs from the start position of the copy by two shared cache page sizes.
  • As another example, the metadata server system receives three open file requests from the file access client, corresponding to the files named 112, 111, and 113, and assigns offset positions to the three files in descending order of serial number: the offset position assigned to file 113 is the start position of the copy; the offset position assigned to file 112 differs from the start position of the copy by one shared cache page size; and the offset position assigned to file 111 differs from the start position of the copy by two shared cache page sizes.
  • The second method is better suited to scenarios in which the file names follow a fixed scheme.
  • For example, the file names carry a series of consecutive serial numbers.
  • The system can put 3 files into one copy.
  • Files 001, 002, and 003 are then placed together, with offset positions of no offset, one shared cache page size, and two shared cache page sizes respectively; likewise, files 010, 011, and 012 share one copy, with offset positions of no offset, one shared cache page size, and two shared cache page sizes.
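  • The serial-number-based assignment amounts to sorting the file names by their numeric value and giving each file the page slot of its rank. A minimal sketch follows; the function name `assign_by_serial` and the assumption that the whole file name is the serial number are illustrative only.

```python
def assign_by_serial(file_names, page_size, descending=False):
    """Map each file name to a byte offset based on its serial-number rank.

    Hypothetical sketch: assumes every file name is a purely numeric
    serial number such as "001" or "113".
    """
    ordered = sorted(file_names, key=int, reverse=descending)
    return {name: rank * page_size for rank, name in enumerate(ordered)}
```

  • Called with the names 010, 001, and 003 in ascending order, this places 001 at the start of the copy, 003 one page size in, and 010 two page sizes in; with 112, 111, and 113 in descending order, 113 comes first, then 112, then 111.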
  • Causing the file access server to store the at least two files respectively at the offset positions of the at least two files in the copy includes:
  • First, in order to remain compatible with the existing mechanism by which the file access client writes files to the file access server, thereby reducing the implementation cost and complexity of the embodiment of the present invention, interacting with the file access server according to the storage location information, so that the file access server stores the at least two files respectively at the offset positions of the at least two files in the copy, includes:
  • For each of the at least two files, according to the copy location information and the offset location information of the file in the copy, writing the shared memory page in which the file is cached to a cache block of the file access server, so that after the shared memory page is written the file access server writes the data in the cache block into the copy, and after the data is written each file is stored at its offset position in the copy.
  • In one case, the shared memory pages caching different files are written to different cache blocks, and the different cache blocks are written into the copy at different times.
  • In another case, the shared memory pages caching different files are written into the same cache block.
  • The method further includes: recording the correspondence among the file name, the copy location information, and the offset position information in the copy of each of the at least two files;
  • the first information includes the copy location information and the offset location information of the file to be read in the copy;
  • Step 201: Receive storage location information of at least two files sent by the metadata server system, where the storage location information includes the replica location information and the offset location information of each of the at least two files in one copy; the storage location information is determined by the metadata server system;
  • Step 202: Interact with the file access server according to the storage location information, so that the file access server stores the at least two files respectively at the offset positions of the at least two files in the copy.
  • This method is applied to the file access client.
  • Causing the file access server to store the at least two files respectively at the offset positions of the at least two files in the copy includes:
  • Enabling the file access server to write the at least two files to one cache block and, after the at least two files are written, to write the data in the cache block into the copy, so that after the data is written the at least two files are respectively stored at their offset positions in the copy.
  • In order to remain compatible with the existing mechanism by which the file access client writes files to the file access server, thereby reducing the implementation cost and complexity of the embodiment of the present invention, interacting with the file access server according to the storage location information, so that the file access server stores the at least two files respectively at the offset positions of the at least two files in the copy, includes:
  • For each of the at least two files, writing the shared memory page in which the file is cached to a cache block of the file access server, so that the file access server writes the data in the cache block into the copy and, after the data is written, each file is stored at its offset position in the copy.
  • The method further includes: sending a request to the metadata server system, where the request includes the file name of the file to be read; and receiving first information sent by the metadata server system;
  • The first information includes the copy location information and the offset location information in the copy of the file to be read;
  • The first information is determined by the metadata server system according to the file name of the file to be read and the correspondence among the file name, the copy location information, and the offset position information in the copy of each of the at least two files;
  • The correspondence is recorded by the metadata server system;
  • According to the first information, interacting with the file access server and reading the file to be read from the file access server.
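  • The recorded correspondence and the lookup that yields the first information can be pictured as a simple table keyed by file name. The sketch below is an assumption-laden illustration: the names `FileLocation` and `CorrespondenceTable`, the field names, and the identifier "disk0" in the usage are hypothetical, and the source stores this data in a database rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLocation:
    """Copy location plus offset for one aggregated file (hypothetical)."""
    server_id: str    # file access server identifier
    medium_id: str    # storage medium identifier on that server
    page_offset: int  # offset of the file within the copy, in PAGEs


class CorrespondenceTable:
    """In-memory stand-in for the correspondence kept by the metadata side."""

    def __init__(self):
        self._by_name = {}

    def record(self, file_name, location):
        # Called when a file is stored: remember where it lives.
        self._by_name[file_name] = location

    def lookup(self, file_name):
        # Called on a read request: returns the "first information",
        # or None if the file name was never recorded.
        return self._by_name.get(file_name)
```

  • A read request for FILE#002 would then resolve, for example, to `FileLocation("10.47.107.111", "disk0", 1)`, which the file access client uses to fetch the right PAGE from the file access server.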
  • File Access Client (FAC): provides DFS-oriented applications with an interface calling service similar to that of a standard file system; the data read and written for the application layer is managed in units of page (PAGE) size.
  • Metadata server system: responsible for managing the metadata of all DFS files, such as file names and copy information, which is kept in a database, and for providing metadata write and query operations to the file access client.
  • The metadata server system includes a metadata management server and a plurality of metadata storage servers.
  • File Access Server (FAS): responsible for interacting with its own storage medium in units of cache blocks and for performing read and write operations on the cache blocks.
  • The FAS manages data in units of cache block (BLK) size;
  • In response to data read and write requests from the file access client, the FAS reads data from the storage medium and returns it to the file access client, or takes data from the file access client and writes it to the storage medium;
  • Storage medium: generally an ordinary SCSI disk or SATA disk, where CHUNKs are actually stored. A CHUNK is at least one BLK in size; its maximum size can be configured, and it grows in increments of one BLK.
  • Each of the at least two files is a small file.
  • A small file here means a file whose size is not greater than one PAGE.
  • The size of the PAGE can be configured.
  • Under the existing mechanism, a small file corresponds to one PAGE and one BLK, and to one copy on the disk; in the preferred embodiment, after aggregation, multiple PAGEs may correspond to the same BLK and to the same copy.
  • the offset size information between the starting position where the file is stored in the copy and the starting position of the copy is the number of PAGEs.
  • The following takes a degree of aggregation of 3 as an example and illustrates the small file aggregation process in conjunction with FIG. 3. As shown in FIG. 3, the at least two files include a file FILE#001, a file FILE#002, and a file FILE#003.
  • When the file FILE#001 is created, the FAC writes PAGE#1, which carries FILE#001, to the first 1/3 of the buffer space of a BLK (FAS_BLK#1) of the FAS; after PAGE#1 is written, when the file FILE#002 is created, the FAC writes PAGE#2, which carries FILE#002, to the second 1/3 of the buffer space of FAS_BLK#1; after PAGE#2 is written, when the file FILE#003 is created, the FAC writes PAGE#3, which carries FILE#003, to the third 1/3 of the buffer space of FAS_BLK#1.
  • The FAS then flushes its cache block FAS_BLK#1, i.e., the data in FAS_BLK#1 is written to a copy on the disk, FAS_BLK#1-CHKFILE.
  • The files FILE#001, FILE#002, and FILE#003 are thereby stored at their respective offset positions in the copy: the starting position at which FILE#001 is stored in the copy is the starting position of the copy, and the corresponding offset position is 0; the starting position of FILE#002 in the copy differs from the starting position of the copy by one PAGE size, and the corresponding offset position is 1; the starting position of FILE#003 in the copy differs from the starting position of the copy by two PAGE sizes, and the corresponding offset position is 2.
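  • The layout in this example, with three PAGEs occupying consecutive thirds of one BLK, can be sketched with plain byte buffers. The page size of 4 bytes, the zero padding of short files, and the helper names `pack_pages` and `read_page` are assumptions made purely to keep the illustration small; they are not taken from the source.

```python
PAGE_SIZE = 4  # deliberately tiny so the layout is easy to inspect
DEGREE = 3     # degree of aggregation: PAGEs per BLK in this example


def pack_pages(pages):
    """Place each page's data at page_index * PAGE_SIZE inside one block."""
    if len(pages) > DEGREE:
        raise ValueError("more pages than the block can hold")
    block = bytearray(PAGE_SIZE * DEGREE)  # zero-filled block
    for index, data in enumerate(pages):
        if len(data) > PAGE_SIZE:
            raise ValueError("file is larger than one PAGE")
        start = index * PAGE_SIZE
        block[start:start + len(data)] = data
    return bytes(block)


def read_page(block, page_offset):
    """Return the PAGE-sized slice stored at the given page offset."""
    start = page_offset * PAGE_SIZE
    return block[start:start + PAGE_SIZE]
```

  • Offsets 0, 1, and 2 therefore select the first, second, and third PAGE of the block, which is the same addressing the text uses for FILE#001, FILE#002, and FILE#003.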
  • The preferred embodiment stores a certain number of files, each smaller than one PAGE size, in the same BLK on the file access server and in the same CHUNK on the disk.
  • On the one hand, compared with the prior art, the number of CHUNKs stored on the disk is greatly reduced, disk fragmentation is effectively reduced, a certain amount of disk space is saved, and the overall read and write performance of the disk is improved.
  • On the other hand, reading or writing multiple small files at the application layer requires only one disk I/O. Compared with the multiple disk I/Os required in the prior art, this reduces the constraint that the disk's limited IOPS capability places on application-layer reads and writes, which in turn increases the IOPS of the application layer.
  • The copy location information of CHUNK files held in the metadata, that is, the file access server where the copy is located and the volume information, is also greatly reduced.
  • The bitmap information in the metadata storage server may be extended with a bitmap sized to the degree of aggregation, recording the PAGE number information corresponding to the file.
  • For example, the offset of the file FILE#001 is 0 and its bitmap is 001 (binary); the offset of the file FILE#002 is 1 and its bitmap is 010 (binary); the offset of the file FILE#003 is 2 and its bitmap is 100 (binary).
  • As a further example with an 8-bit bitmap, a file whose PAGE-number offset is 0 has the bitmap 00000001 (binary); a file whose PAGE-number offset is 3 has the bitmap 00001000 (binary); a file whose PAGE-number offset is 7 has the bitmap 10000000 (binary).
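  • In every example above, the bitmap has exactly one bit set, at the position given by the offset. That relationship can be written as a one-bit left shift. The function name `offset_bitmap` is hypothetical, and returning the bitmap as a binary string is a presentational choice for this sketch only.

```python
def offset_bitmap(page_offset, degree):
    """Return the bitmap for a page offset as a binary string.

    Hypothetical helper: `degree` is the bitmap width (the degree of
    aggregation); the bit at index `page_offset` is set, counting from
    the least significant bit.
    """
    if not 0 <= page_offset < degree:
        raise ValueError("page offset outside the bitmap width")
    return format(1 << page_offset, f"0{degree}b")
```

  • For a degree of 3, offsets 0, 1, and 2 give 001, 010, and 100; for a degree of 8, offsets 0, 3, and 7 give 00000001, 00001000, and 10000000.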
  • For example, the PAGE size is 32k and the BLK size is 256k.
  • A file file003 is newly written with a size of 1k, which satisfies the condition for small file aggregation (its size is smaller than one PAGE size), and the metadata management server opens the file.
  • The offset is determined based on the small files that preceded it.
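  • The arithmetic behind this example can be sketched as follows. The function names are hypothetical. Reading the ratio of BLK size to PAGE size as the number of small files one BLK can hold is an assumption (the source does not state it, although the ratio of 8 matches the 8-bit bitmap example), and the eligibility test uses "not greater than one PAGE", the definition of a small file given earlier.

```python
PAGE_SIZE = 32 * 1024   # "the PAGE size is 32k"
BLK_SIZE = 256 * 1024   # "the BLK size is 256k"


def pages_per_block(blk_size, page_size):
    """Number of whole PAGEs that fit in one BLK (assumed aggregation limit)."""
    if blk_size % page_size != 0:
        raise ValueError("BLK size is assumed to be a multiple of PAGE size")
    return blk_size // page_size


def is_small_file(file_size, page_size):
    """A small file is one whose size is not greater than one PAGE."""
    return file_size <= page_size
```

  • With these sizes a BLK holds 8 PAGEs, and the 1k file file003 qualifies for aggregation.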
  • a small file aggregation write process includes:
  • Step 401 The application layer initiates an open file request to the file access client.
  • Step 402: The file access client initiates an open file request (carrying the file name and a creation flag) to the metadata management server, and the database determines, according to the file name, the metadata storage server to which the file belongs, the copy handle to which the file belongs, and the offset position information in the copy. Multiple files within the same copy belong to the same metadata storage server.
  • Step 403 After the file access client receives the response from the metadata management server, if the given copy handle is not 0, the copy handle and the offset location information are recorded in the file management global structure of the file access client.
  • Step 404 The file access client opens the file to the corresponding metadata storage server, and the metadata storage server replies to the file access client, and the file access client responds to the application layer after receiving the response.
  • Step 405: The application layer sends a write request to the file access client. After receiving the write request, the file access client first determines whether the copy handle recorded in the file management global structure is 0. If it is not 0, the file access client uses the copy handle to send a request for obtaining the copy location to the metadata storage server; if it is 0, the file access client sends the request for obtaining the copy location to the metadata storage server using an identifier generated from the file identifier plus the calculated copy number of the written page.
  • Step 406 After receiving the request, the metadata storage server obtains the copy location from the database and returns it to the file access client.
  • Step 407 After receiving the copy location of the metadata storage server, the file access client writes the data of the application layer to the shared memory page of the file access client, and responds to the application layer.
  • Step 408 the file access client's write thread writes the shared memory page to the file access server according to the copy handle and the offset location information.
  • The file access client can thus reuse its existing mechanism for writing file data to the file access server in order to write the data of different small files.
  • The FAC only needs to write the data of different small files as if it were the data of one file buffered in different PAGEs. The preferred embodiment therefore does not require any modification of the existing file access server, which saves the upgrade cost of the cloud service system.
  • On the file access server, if the cache block is missed, a new cache block is requested and the data page is written into that cache block. The data of the multiple files is written to the same cache block before it is flushed to the disk, and is finally stored in the same copy on the disk.
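  • The server-side behaviour described here (allocate a cache block on a miss, collect pages from several files in it, then flush the whole block into one copy) can be sketched as below. The class `FileAccessServerSketch`, its method names, the use of dictionaries for the cache and the disk, and the zero padding of short pages are all illustrative assumptions rather than details of the actual file access server.

```python
class FileAccessServerSketch:
    """Toy model of the cache-block path on the file access server."""

    def __init__(self, page_size, pages_per_block):
        self.page_size = page_size
        self.pages_per_block = pages_per_block
        self.cache = {}  # copy handle -> bytearray (cache block)
        self.disk = {}   # copy handle -> bytes (copy on disk)

    def write_page(self, copy_handle, page_offset, data):
        if len(data) > self.page_size:
            raise ValueError("data is larger than one PAGE")
        if not 0 <= page_offset < self.pages_per_block:
            raise ValueError("page offset outside the cache block")
        block = self.cache.get(copy_handle)
        if block is None:
            # Cache miss: request a new, zero-filled cache block.
            block = bytearray(self.page_size * self.pages_per_block)
            self.cache[copy_handle] = block
        start = page_offset * self.page_size
        # Overwrite the whole page slot, padding short data with zeros.
        block[start:start + self.page_size] = data.ljust(self.page_size, b"\x00")

    def flush(self, copy_handle):
        # Write the entire cache block into a single copy on disk.
        self.disk[copy_handle] = bytes(self.cache[copy_handle])
```

  • Writing three pages with the same copy handle and then flushing once leaves a single copy on the simulated disk that contains all three files at their page offsets.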
  • FIG. 5 is a schematic flowchart of a small file aggregation read according to a preferred embodiment.
  • the small file aggregation read process includes the following steps:
  • Step 501 The application layer initiates an open file request to the file access client.
  • Step 502 The file access client initiates an open file request to the metadata management server, and the database determines, according to the file name, the metadata storage server to which the file belongs, and the copy to which the file belongs and the offset location information in the copy. Multiple files within the same copy belong to the same metadata storage server.
  • Step 503 After the file access client receives the response from the metadata storage server, if the given copy handle is not 0, the copy handle and the offset are recorded in the file management global structure.
  • Step 504 The file access client opens the file on the metadata storage server, and the metadata storage server responds to the file access client, and the file access client responds to the application layer after receiving the response.
  • Step 505: The application layer sends a read request to the file access client. After receiving the read request, the file access client first determines whether the copy handle recorded in the file management global structure is 0. If it is not 0, the file access client uses the copy handle to send a request for obtaining the copy location to the metadata storage server; if it is 0, the file access client sends the request for obtaining the copy location to the metadata storage server using an identifier generated from the file identifier plus the calculated copy number of the written page.
  • Step 506: After receiving the request, the metadata storage server obtains the copy location information from the database and returns it to the file access client.
  • Step 507: After receiving the copy location information from the metadata storage server, the file access client first reads its own shared memory according to the copy location information and the offset location information, and determines whether a PAGE corresponding to the offset location information is present. On a hit, it returns the file in the PAGE to the application layer; otherwise, it continues to read from the cache block of the file access server and returns on a hit; otherwise, the corresponding copy is read from the disk. Because one copy stores the data of multiple files, reading one cache block's worth of data means that the next read of another file may hit the cache directly, with no need to read from the disk. Data read into the cache block is always read from the beginning of the cache block.
  • Step 508: The cache block data read from the disk is loaded into the cache of the file access server, and the file access client reads the PAGE corresponding to the offset location information into its shared memory and returns it to the application layer.
  • Since the file access client reads a small file in the same manner as reading a PAGE, the existing file access server does not need to be modified, which saves the upgrade cost of the cloud service system.
  • Step 509 the application layer sends a close request, and the file access client responds.
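  • The lookup order in steps 507 and 508 (client shared memory first, then the server's cache block, then the copy on disk) can be sketched as a three-tier read. The function `read_small_file`, its parameters, the dictionary-based caches, and the returned source labels are hypothetical and serve only to make the order of checks visible.

```python
def read_small_file(copy_handle, page_offset, page_size,
                    fac_pages, fas_cache, disk):
    """Return (page_data, source) following the three-tier lookup.

    Hypothetical sketch: fac_pages maps (copy handle, page offset) to page
    data; fas_cache and disk map a copy handle to a whole block of bytes.
    """
    key = (copy_handle, page_offset)
    # Tier 1: shared memory of the file access client.
    if key in fac_pages:
        return fac_pages[key], "fac_shared_memory"
    # Tier 2: cache block on the file access server.
    block = fas_cache.get(copy_handle)
    source = "fas_cache_block"
    if block is None:
        # Tier 3: read the whole copy from disk into the server cache,
        # so later reads of neighbouring files can hit the cache.
        block = disk[copy_handle]
        fas_cache[copy_handle] = block
        source = "disk"
    start = page_offset * page_size
    page = block[start:start + page_size]
    fac_pages[key] = page  # keep the PAGE in client shared memory
    return page, source
```

  • After one file of a copy has been read from disk, reading a different file of the same copy is served from the server's cache block, and re-reading the first file is served from the client's shared memory.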
  • The preferred embodiment is suited to application scenarios in which mostly small files, or only small files, are stored.
  • a third embodiment of the present invention provides a file storage device, which is applied to a metadata server system, where the device includes:
  • a determining module configured to determine storage location information of at least two files; wherein the storage location information includes offset location information and replica location information of each of the at least two files in one copy;
  • a sending module configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information, causing the file access server to store the at least two files respectively at the offset positions of the at least two files in the copy.
  • The sending module includes:
  • a sending unit configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information, causing the file access server to write the at least two files into one cache block and, after the at least two files are written, to write the data in the cache block into the copy, so that after the data is written the at least two files are respectively stored at their offset positions in the copy.
  • The third embodiment of the present invention is an apparatus embodiment corresponding to the first embodiment of the present invention (a method embodiment). For the parts not described in detail in the third embodiment, refer to the description of the relevant parts of the first embodiment; to save space, they are not repeated here.
  • The determining module in the third embodiment of the present invention may be implemented by a central processing unit (CPU, Central Processing Unit), a microprocessor (MPU, Micro Processing Unit), or a digital signal processor (DSP) of the metadata server system.
  • the fourth embodiment of the present invention provides another file storage device, which is applied to a file access client, and the device includes:
  • a receiving module configured to receive storage location information of at least two files sent by the metadata server system, where the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy, and the storage location information is determined by the metadata server system;
  • the interaction module is configured to interact with the file access server according to the storage location information, so that the file access server stores the at least two files separately to an offset position of each of the at least two files in the copy.
  • the interaction module includes:
  • the interaction unit is configured to, for each of the at least two files, write the shared memory page in which the file is cached into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page has been written the file access server writes the data in the cache block into the copy, and once the data has been written the file is stored at its offset position in the copy.
  • the fourth embodiment of the present invention is a device embodiment corresponding to the second embodiment of the present invention (a method embodiment); for the parts not described in detail in the fourth embodiment, refer to the description of the relevant parts of the first and second embodiments, which is not repeated here.
  • the determining module in the fourth embodiment of the present invention may be implemented by the CPU, the MPU or the DSP of the file access client; the interaction module and the interaction unit in the sending module may be implemented by a chip having an interactive function in the file access client.
  • a fifth embodiment of the present invention provides a metadata server system.
  • the metadata server system includes a file storage device according to Embodiment 3 of the present invention.
  • the sixth embodiment of the present invention provides a file access client, and the file access client includes another file storage device provided in Embodiment 4 of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a file storage method and device, an access client and a metadata server system. The method comprises: determining storage location information about at least two files, the storage location information comprising respective offset location information about the at least two files in a copy and location information about the copy; and sending the storage location information to a file access client to enable the file access client to interact with a file access server according to the storage location information and enable the file access server to respectively store the at least two files in the respective offset locations of the at least two files in the copy. The embodiments of the present invention avoid the unnecessary increase of fragments of a storage medium of the file access server.

Description

File storage method, device, access client and metadata server system

Technical field

Embodiments of the present invention relate to the field of cloud storage, and in particular to a file storage method, a file storage device, a file access client and a metadata server system.

Background

A multi-replica distributed file system (DFS, Distributed File System) splits a file into several CHUNKs and stores multiple copies on different servers. To suit a wider range of cloud storage scenarios, a DFS must support files of all sizes, from a few bytes to tens of gigabytes, with no difference in storage performance. Under the existing mechanism, however, the file access server stores every file, no matter how small, on its own in a separate copy on disk, so whenever the file access server holds more than one such file, disk fragmentation increases unnecessarily.

Summary of the invention
In view of this, the embodiments of the present invention aim to provide a file storage method, a file storage device, a file access client and a metadata server system that avoid an unnecessary increase in fragmentation of the file access server's storage medium.
To solve the above technical problem, the embodiments of the present invention provide the following solutions:
An embodiment of the present invention provides a file storage method applied to a metadata server system, the method including:
determining storage location information of at least two files, wherein the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy;
sending the storage location information to a file access client, so that the file access client interacts with a file access server according to the storage location information and the file access server stores each of the at least two files at its own offset position in the copy.
Causing the file access server to store each of the at least two files at its own offset position in the copy includes:
enabling the file access server to write the at least two files into one cache block and, after the at least two files have been written, to write the data in the cache block into the copy, such that once the data has been written each of the at least two files is stored at its own offset position in the copy.
Preferably, the method further includes:
recording a correspondence among the file name of each of the at least two files, the copy location information, and the offset location information of each of the at least two files in the copy;
receiving, from the file access client, the file name of a file to be read, the file to be read being one of the at least two files;
determining first information according to the correspondence and the file name of the file to be read, wherein the first information includes the copy location information and the offset location information of the file to be read in the copy;
sending the first information to the file access client, so that the file access client interacts with the file access server according to the first information and reads the corresponding first file from the file access server.
An embodiment of the present invention further provides a file storage method applied to a file access client, the method including:
receiving storage location information of at least two files sent by a metadata server system, wherein the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy, and the storage location information is determined by the metadata server system;
interacting with a file access server according to the storage location information, so that the file access server stores each of the at least two files at its own offset position in the copy.
Causing the file access server to store each of the at least two files at its own offset position in the copy includes:
enabling the file access server to write the at least two files into one cache block and, after the at least two files have been written, to write the data in the cache block into the copy, such that once the data has been written each of the at least two files is stored at its own offset position in the copy.
Preferably, interacting with the file access server according to the storage location information, so that the file access server stores each of the at least two files at its own offset position in the copy, includes:
for each of the at least two files, writing the shared memory page in which the file is cached into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page has been written the file access server can write the data in the cache block into the copy, and once the data has been written the file is stored at its offset position in the copy.
Preferably, the method further includes:
sending a request to the metadata server system, the request including the file name of a file to be read;
receiving first information sent by the metadata server system, wherein the first information includes the copy location information and the offset location information of the file to be read in the copy; the first information is determined by the metadata server system according to the file name of the file to be read and a correspondence among the file name of each of the at least two files, the copy location information, and the offset location information of each of the at least two files in the copy; and the correspondence is recorded by the metadata server system;
interacting with the file access server according to the first information, and reading the corresponding first file from the file access server.
An embodiment of the present invention further provides a file storage device applied to a metadata server system, the device including:
a determining module configured to determine storage location information of at least two files, wherein the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy;
a sending module configured to send the storage location information to a file access client, so that the file access client interacts with a file access server according to the storage location information and the file access server stores each of the at least two files at its own offset position in the copy.
Preferably, the sending module includes:
a sending unit configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information and the file access server writes the at least two files into one cache block and, after the files have been written, writes the data in the cache block into the copy, such that once the data has been written each of the at least two files is stored at its own offset position in the copy.
An embodiment of the present invention further provides a file storage device applied to a file access client, the device including:
a receiving module configured to receive storage location information of at least two files sent by a metadata server system, wherein the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy, and the storage location information is determined by the metadata server system;
an interaction module configured to interact with a file access server according to the storage location information, so that the file access server stores each of the at least two files at its own offset position in the copy.
Preferably, the interaction module includes:
an interaction unit configured to, for each of the at least two files, write the shared memory page in which the file is cached into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page has been written the file access server can write the data in the cache block into the copy, and once the data has been written the file is stored at its offset position in the copy.
An embodiment of the present invention further provides a metadata server system including the file storage device described above.
An embodiment of the present invention further provides a file access client including the file storage device described above.
As can be seen from the above, the embodiments of the present invention have at least the following beneficial effect:
different files can be stored in the same copy, which, compared with the prior art in which different files cannot be stored in the same copy, avoids an unnecessary increase in fragmentation of the file access server's storage medium.

Brief description of the drawings
FIG. 1 is a flowchart of the steps of a file storage method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the steps of another file storage method according to Embodiment 2 of the present invention;
FIG. 3 is a schematic diagram of small-file aggregation in a preferred implementation;
FIG. 4 is a schematic flowchart of an aggregated small-file write in a preferred implementation;
FIG. 5 is a schematic flowchart of an aggregated small-file read in a preferred implementation.

Detailed description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1 of the present invention provides a file storage method, which includes the following steps:
Step 101: determine storage location information of at least two files, wherein the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy;
Step 102: send the storage location information to a file access client, so that the file access client interacts with a file access server according to the storage location information and the file access server stores each of the at least two files at its own offset position in the copy.
The method is applied to a metadata server system.
In this way, different files can be stored in the same copy, which, compared with the prior art in which different files cannot be stored in the same copy, avoids an unnecessary increase in fragmentation of the file access server's storage medium.
The copy is located on a storage medium of the file access server, for example a magnetic disk.
The copy location information includes a file access server identifier and an identifier of the storage medium in the file access server. An example of the file access server identifier is the IP address of the file access server: 10.47.107.111; the example given for the storage medium identifier is likewise the IP address of the file access server: 10.47.107.111.
In Embodiment 1 of the present invention, the sum of the sizes of the at least two files is not greater than the size of the copy. Preferably, the size of each of the at least two files is not greater than the size of a shared cache page (PAGE) in the file access client.
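The two size conditions above can be sketched as a small check. This is only an illustration under stated assumptions: the function name `can_share_copy` and the byte-based units are hypothetical and do not come from the patent text.

```python
def can_share_copy(file_sizes, copy_size, page_size=None):
    """Return True if the files satisfy the size conditions for sharing one copy.

    file_sizes: sizes of the candidate files, in bytes (assumed unit).
    copy_size:  size of the copy, in bytes.
    page_size:  optional shared cache page size; when given, the preferred
                per-file condition (each file fits in one page) is also checked.
    """
    # Required condition: the files together must fit in the copy.
    if sum(file_sizes) > copy_size:
        return False
    # Preferred condition: no single file is larger than one shared cache page.
    if page_size is not None and any(size > page_size for size in file_sizes):
        return False
    return True


fits = can_share_copy([100, 200], copy_size=4096)
too_big = can_share_copy([3000, 2000], copy_size=4096)
over_page = can_share_copy([5000, 100], copy_size=8192, page_size=4096)
```

Passing `page_size` only when the preferred condition is wanted keeps the required condition and the optional one separate, as in the text.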
In Embodiment 1, the copy location corresponds to a copy handle. Specifically, the copy location is determined by the metadata server system according to the copy handle, and the copy handle is allocated by the metadata server system according to the file name of a first file among the at least two files. The first file may be any one of the at least two files; alternatively, given that the file access client sends a separate open-file request to the metadata server system for each of the at least two files, the first file is the file corresponding to the first open-file request that the metadata server system receives from the file access client.
Preferably, the metadata server system includes a metadata management server and a plurality of metadata storage servers. In that case the copy handle is allocated by the metadata management server, and the copy location is determined according to the copy handle by a first metadata storage server among the plurality of metadata storage servers; the first metadata storage server is selected from the plurality of metadata storage servers by the metadata management server according to the file name of the first file.
In Embodiment 1, the at least two files have different offset positions in the copy. The offset position here is the position at which a file is stored in the copy; the start of that position either coincides with the start of the copy or is offset from it. The offset location information of each of the at least two files in the copy therefore includes the size of the offset between the start position at which the file is stored in the copy and the start position of the copy, expressed for example as a number of bytes or bits, as a PAGE size and a number of PAGEs, or, where no file is larger than the PAGE size, as a number of PAGEs alone.
The offset location information of each of the at least two files in the copy can be determined in either of the following ways.
Mode 1
Mode 1 is based on time order.
In Mode 1, the metadata server system receives open-file requests from the file access client one after another; the number of open-file requests equals the number of the at least two files, and the requests correspond one-to-one to the files. Starting from the start position of the copy, the metadata server system assigns offset positions to the corresponding files in the order in which the open-file requests are received.
For example, the file corresponding to the first open-file request received is assigned the start position of the copy; the file corresponding to the second open-file request received is assigned an offset position one shared cache page size from the start of the copy; the file corresponding to the third open-file request received is assigned an offset position two shared cache page sizes from the start of the copy; and so on.
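A minimal sketch of Mode 1 follows, assuming one page-sized slot per file. The class name `ArrivalOrderAllocator`, the 4096-byte page size and the file names are hypothetical; the patent text gives no concrete values or interfaces.

```python
PAGE_SIZE = 4096  # assumed shared cache page size in bytes; not given in the source


class ArrivalOrderAllocator:
    """Assigns each newly opened file the next page-sized slot in one copy."""

    def __init__(self, copy_size, page_size=PAGE_SIZE):
        self.copy_size = copy_size
        self.page_size = page_size
        self.offsets = {}  # file name -> byte offset from the start of the copy

    def open_file(self, file_name):
        """Return the offset for file_name, allocating a slot on first open."""
        if file_name in self.offsets:
            return self.offsets[file_name]
        # The n-th distinct open-file request gets an offset of n pages.
        offset = len(self.offsets) * self.page_size
        if offset + self.page_size > self.copy_size:
            raise ValueError("copy is full")
        self.offsets[file_name] = offset
        return offset


allocator = ArrivalOrderAllocator(copy_size=3 * PAGE_SIZE)
first = allocator.open_file("a.txt")   # first request: start of the copy
second = allocator.open_file("b.txt")  # second request: one page in
third = allocator.open_file("c.txt")   # third request: two pages in
```

Raising an error when the copy is full is a choice made for this sketch; the source does not say what happens once a copy has no free slot.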
Mode 2
Mode 2 is based on the serial number in the file name.
In Mode 2, the offset positions of the files are assigned according to the serial numbers contained in their file names, in a predetermined order of magnitude.
For example, the metadata server system receives three open-file requests from the file access client, for the files named 010, 001 and 003. Offset positions are assigned to the three files in ascending order of serial number: file 001 is assigned the start position of the copy, file 003 is assigned an offset position one shared cache page size from the start of the copy, and file 010 is assigned an offset position two shared cache page sizes from the start of the copy.
As another example, the metadata server system receives three open-file requests from the file access client, for the files named 112, 111 and 113. Offset positions are assigned to the three files in descending order of serial number: file 113 is assigned the start position of the copy, file 112 is assigned an offset position one shared cache page size from the start of the copy, and file 111 is assigned an offset position two shared cache page sizes from the start of the copy.
Mode 2 is better suited to scenarios with fixed file names that contain a run of consecutive serial numbers. For example, if the system can place three files in one copy, files 001, 002 and 003 are placed together, with offsets of zero, one shared cache page size and two shared cache page sizes respectively; likewise, files 010, 011 and 012 share one copy, with offsets of zero, one shared cache page size and two shared cache page sizes respectively.
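The two Mode 2 examples above can be reproduced with a short sketch. It assumes that each file name consists only of its serial number and that every file occupies one page-sized slot; the function name `allocate_by_serial` and the 4096-byte page size are hypothetical.

```python
PAGE_SIZE = 4096  # assumed shared cache page size in bytes; not given in the source


def allocate_by_serial(file_names, page_size=PAGE_SIZE, descending=False):
    """Map each file name to a byte offset, ordered by the serial number in the name.

    Assumes every file name is a purely numeric serial such as "001".
    """
    ordered = sorted(file_names, key=int, reverse=descending)
    return {name: index * page_size for index, name in enumerate(ordered)}


# Ascending order, as in the 010 / 001 / 003 example.
ascending = allocate_by_serial(["010", "001", "003"])

# Descending order, as in the 112 / 111 / 113 example.
descending = allocate_by_serial(["112", "111", "113"], descending=True)
```

Unlike Mode 1, the result does not depend on the order in which the open-file requests arrive, only on the serial numbers themselves.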
In Embodiment 1, to reduce unnecessary consumption of the storage medium's IOPS (Input/Output Operations Per Second, the number of read and write operations performed per second), causing the file access server to store each of the at least two files at its own offset position in the copy includes:
enabling the file access server to write the at least two files into one cache block and, after the at least two files have been written, to write the data in the cache block into the copy, such that once the data has been written each of the at least two files is stored at its own offset position in the copy.
In Embodiment 1, to remain compatible with the existing mechanism by which a file access client writes files to a file access server, and thereby reduce the implementation cost and complexity of the embodiments, interacting with the file access server according to the storage location information so that the file access server stores each of the at least two files at its own offset position in the copy includes:
for each of the at least two files, writing the shared memory page in which the file is cached into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the write completes the file access server writes the data in the cache block into the copy, and once the data has been written the file is stored at its offset position in the copy.
The write times of different files among the at least two files may differ considerably, in which case the shared memory pages caching different files are written into different cache blocks, and the different cache blocks are written into the copy at different times. Alternatively, to reduce unnecessary consumption of storage medium IOPS, the shared memory pages caching different files are written into the same cache block.
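The single-cache-block variant can be sketched as follows. This is an in-memory illustration only: the `CacheBlock` class, the use of a `bytearray` to stand in for the copy on the storage medium, and the 16-byte page size are all assumptions made for the example.

```python
class CacheBlock:
    """In-memory staging area for one copy, flushed to storage with a single write."""

    def __init__(self, copy_size):
        self.buffer = bytearray(copy_size)
        self.dirty = False

    def write_page(self, offset, page):
        """Place one file's page at its offset inside the cache block."""
        if offset < 0 or offset + len(page) > len(self.buffer):
            raise ValueError("page does not fit in the copy")
        self.buffer[offset:offset + len(page)] = page
        self.dirty = True

    def flush(self, storage):
        """Write the whole block to the backing copy; return the number of storage writes."""
        if not self.dirty:
            return 0
        storage[:] = self.buffer  # one write for all aggregated files
        self.dirty = False
        return 1


PAGE_SIZE = 16  # deliberately tiny page size for the example
copy_on_disk = bytearray(3 * PAGE_SIZE)  # stands in for the copy on the storage medium

block = CacheBlock(len(copy_on_disk))
block.write_page(0, b"file-001")          # first file at the start of the copy
block.write_page(PAGE_SIZE, b"file-002")  # second file one page in
writes = block.flush(copy_on_disk)
```

Both files reach the copy through one storage write, which is the IOPS saving the text describes; with one cache block per file the same data would take two writes.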
In Embodiment 1, to support reading files from the copy, the method further includes:
recording a correspondence among the file name of each of the at least two files, the copy location information, and the offset location information of each of the at least two files in the copy;
receiving, from the file access client, the file name of a file to be read, the file to be read being one of the at least two files;
determining first information according to the correspondence and the file name of the file to be read, wherein the first information includes the copy location information and the offset location information of the file to be read in the copy;
sending the first information to the file access client, so that the file access client interacts with the file access server according to the first information and reads the first file from the file access server.
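On the metadata server side, the recorded correspondence amounts to a lookup table keyed by file name. The sketch below assumes an in-memory dictionary; the names `FileLocation` and `MetadataIndex`, the medium identifier `"disk-0"` and the 4096-byte offset are hypothetical, while the server address is the example value from the text.

```python
from typing import NamedTuple


class FileLocation(NamedTuple):
    """The 'first information': where the copy is and where the file sits in it."""
    server_id: str  # file access server identifier
    medium_id: str  # storage medium identifier within that server
    offset: int     # byte offset of the file from the start of the copy


class MetadataIndex:
    """Records file name -> location and answers read requests by file name."""

    def __init__(self):
        self._by_name = {}

    def record(self, file_name, location):
        self._by_name[file_name] = location

    def lookup(self, file_name):
        # Raises KeyError if the file name was never recorded.
        return self._by_name[file_name]


index = MetadataIndex()
index.record("001", FileLocation("10.47.107.111", "disk-0", 0))
index.record("002", FileLocation("10.47.107.111", "disk-0", 4096))

first_information = index.lookup("002")
```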
Corresponding to the file storage method of Embodiment 1, Embodiment 2 of the present invention provides another file storage method, which includes the following steps:
Step 201: receive storage location information of at least two files sent by a metadata server system, wherein the storage location information includes offset location information of each of the at least two files in one copy and location information of the copy, and the storage location information is determined by the metadata server system;
Step 202: interact with a file access server according to the storage location information, so that the file access server stores each of the at least two files at its own offset position in the copy.
The method is applied to a file access client.
In this way, different files can be stored in the same copy, which, compared with the prior art in which different files cannot be stored in the same copy, avoids an unnecessary increase in fragmentation of the file access server's storage medium.
In Embodiment 2, to reduce unnecessary consumption of storage medium IOPS, causing the file access server to store each of the at least two files at its own offset position in the copy includes:
enabling the file access server to write the at least two files into one cache block and, after the at least two files have been written, to write the data in the cache block into the copy, such that once the data has been written each of the at least two files is stored at its own offset position in the copy.
In Embodiment 2, to remain compatible with the existing mechanism by which a file access client writes files to a file access server, and thereby reduce the implementation cost and complexity of the embodiments, interacting with the file access server according to the storage location information so that the file access server stores each of the at least two files at its own offset position in the copy includes:
for each of the at least two files, writing the shared memory page in which the file is cached into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page has been written the file access server writes the data in the cache block into the copy, and once the data has been written the file is stored at its offset position in the copy.
In the second embodiment of the present invention, to support reading files in the copy, the method further includes:
sending a request to the metadata server system, the request including the file name of the file to be read;
receiving first information sent by the metadata server system, wherein the first information includes the copy location information and the offset location information of the file to be read in the copy; the first information is determined by the metadata server system according to the file name of the file to be read and a correspondence among the file name of each of the at least two files, the copy location information, and the offset location information of each of the at least two files in the copy; and the correspondence is recorded by the metadata server system;
interacting with the file access server according to the first information, and reading the corresponding first file from the file access server.
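The lookup that produces the first information can be illustrated with a minimal Python sketch. It only shows a table keyed by file name; the class name `MetadataServerSketch`, the `FirstInfo` tuple, and the string form of the copy location are assumptions made for illustration and are not taken from the embodiments.

```python
from typing import Dict, NamedTuple


class FirstInfo(NamedTuple):
    """The 'first information': where the copy is and the file's offset in it."""
    copy_location: str   # hypothetical encoding, e.g. "server/volume/copy"
    offset_pages: int    # offset of the file in the copy, counted in PAGEs


class MetadataServerSketch:
    """Records the file-name -> (copy location, offset) correspondence."""

    def __init__(self) -> None:
        self._table: Dict[str, FirstInfo] = {}

    def record(self, file_name: str, copy_location: str, offset_pages: int) -> None:
        self._table[file_name] = FirstInfo(copy_location, offset_pages)

    def lookup(self, file_name: str) -> FirstInfo:
        # Raises KeyError if no correspondence was recorded for this name.
        return self._table[file_name]


mds = MetadataServerSketch()
# Three small files aggregated in one (hypothetical) copy.
for offset, name in enumerate(["FILE#001", "FILE#002", "FILE#003"]):
    mds.record(name, "fas-1/vol-0/chunk-42", offset)

info = mds.lookup("FILE#002")
```

Here `info` carries the shared copy location together with the offset of the requested file, which is all the client needs in order to ask the file access server for the right PAGE.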
It should be noted that, because the file storage method provided in the second embodiment corresponds to the file storage method provided in the first embodiment, the specific meanings of the terms and technical means used above in describing the method of the second embodiment may be understood by reference to those set forth in the first embodiment and, to save space, are not repeated here.
To describe the two mutually corresponding file storage methods of the first and second embodiments more clearly, a preferred implementation of the two methods is provided below: a small-file aggregation implementation for a multi-copy distributed file system (DFS).
In this preferred implementation, the file access client (File Access Client, FAC) provides the applications served by the DFS with an interface calling service similar to that of a standard file system, and manages application-layer read and write data in units of the page (PAGE) size.
Metadata server system: manages the metadata of all DFS files, such as file names and copy information, stores it in a database, and provides the file access client with operations such as metadata writing and querying. The metadata server system includes a metadata management server and a plurality of metadata storage servers.
File access server (File Access Server, FAS): interacts with its own storage medium in units of cache blocks to read and write cache blocks, the FAS managing data in units of the cache block (BLK) size; responds to data read and write requests from the file access client by reading data from the storage medium and returning it to the file access client; and reads data from the file access client and writes it to the storage medium.
Storage medium: typically an ordinary SCSI or SATA disk, where CHUNKs are actually stored. A CHUNK is at least one BLK in size, has a configurable maximum size, and grows in increments of the BLK size.
Each of the at least two files is a small file. A small file here is a file whose size does not exceed the size of one PAGE. The PAGE size is configurable. The aggregation degree of a CHUNK is the BLK size divided by the PAGE size; for example, with a PAGE size of 32K and a BLK size of 1024K, the number of files aggregated in one CHUNK is 1024/32 = 32.
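The arithmetic above can be written as a short Python sketch. The function names are hypothetical, and the check that the BLK size is an exact multiple of the PAGE size is an assumption added for the illustration; the text itself only gives the ratio.

```python
def aggregation_degree(blk_size_kb: int, page_size_kb: int) -> int:
    """Number of PAGE-sized slots (small files) that fit in one BLK."""
    if page_size_kb <= 0 or blk_size_kb % page_size_kb != 0:
        # Assumption: the BLK size is a positive multiple of the PAGE size.
        raise ValueError("BLK size must be a positive multiple of PAGE size")
    return blk_size_kb // page_size_kb


def is_small_file(file_size_bytes: int, page_size_kb: int) -> bool:
    """A small file is one whose size does not exceed one PAGE."""
    return file_size_bytes <= page_size_kb * 1024


degree = aggregation_degree(blk_size_kb=1024, page_size_kb=32)
```

With the values from the text, `degree` is 32; with a 256K BLK and a 32K PAGE it would be 8.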
In the prior art, one small file corresponds to one PAGE and one BLK, and to one copy on disk. In this preferred implementation, after aggregation, multiple PAGEs may correspond to the same BLK and to the same copy.
The offset between the starting position at which a file is stored in the copy and the starting position of the copy is expressed as a number of PAGEs.
The small-file aggregation process is described below with reference to FIG. 3, taking an aggregation degree of 3 as an example. As shown in FIG. 3, the at least two files include file FILE#001, file FILE#002 and file FILE#003.
When file FILE#001 is created, the FAC writes PAGE#1, which carries FILE#001, into the first third of the cache space of one BLK of the FAS (FAS_BLK#1). After PAGE#1 has been written, when file FILE#002 is created, the FAC writes PAGE#2, which carries FILE#002, into the second third of the cache space of FAS_BLK#1. After PAGE#2 has been written, when file FILE#003 is created, the FAC writes PAGE#3, which carries FILE#003, into the final third of the cache space of FAS_BLK#1.
After PAGE#3 has been written, the FAS flushes its cache block FAS_BLK#1, that is, it writes the data in FAS_BLK#1 to one copy on disk, FAS_BLK#1_CHKFILE. After the flush, files FILE#001, FILE#002 and FILE#003 are stored at their respective offset positions in the copy: FILE#001 starts at the starting position of the copy, so its offset position is 0; FILE#002 starts one PAGE size after the starting position of the copy, so its offset position is 1; FILE#003 starts two PAGE sizes after the starting position of the copy, so its offset position is 2.
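The aggregation-degree-3 example can be sketched in Python as follows. This is only an illustration under stated assumptions: the `CacheBlock` class, the tiny 8-byte PAGE size, the zero padding of unused bytes, and the `read_file` helper are all hypothetical and chosen to keep the example short.

```python
class CacheBlock:
    """One FAS cache block (BLK) divided into PAGE-sized slots."""

    def __init__(self, page_size: int, degree: int) -> None:
        self.page_size = page_size
        self.degree = degree
        self.buf = bytearray(page_size * degree)  # zero-filled block
        self.used = 0  # number of slots already filled

    def write_page(self, data: bytes) -> int:
        """Place one small file in the next free slot; return its offset in PAGEs."""
        if len(data) > self.page_size:
            raise ValueError("file is larger than one PAGE")
        if self.used >= self.degree:
            raise RuntimeError("cache block is full")
        offset = self.used
        start = offset * self.page_size
        self.buf[start:start + len(data)] = data
        self.used += 1
        return offset

    def flush(self) -> bytes:
        """Return the block contents as they would reach the copy in one disk write."""
        return bytes(self.buf)


PAGE_SIZE = 8  # tiny illustrative value, not the 32K used in the text
blk = CacheBlock(PAGE_SIZE, degree=3)
offsets = {name: blk.write_page(body) for name, body in [
    ("FILE#001", b"alpha"), ("FILE#002", b"beta"), ("FILE#003", b"gamma")]}
copy_on_disk = blk.flush()  # a single write covers all three files


def read_file(copy: bytes, offset: int) -> bytes:
    """Slice one PAGE out of the copy; zero padding is stripped for the demo only."""
    return copy[offset * PAGE_SIZE:(offset + 1) * PAGE_SIZE].rstrip(b"\x00")
```

The three files end up at offsets 0, 1 and 2 of the same copy, and each can be recovered from the copy using nothing but its offset and the PAGE size.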
It can thus be seen that this preferred implementation stores a certain number of files, each smaller than one PAGE, in the same BLK on the file access server and in the same CHUNK on disk. On the one hand, when the number of small files is large and fixed, this greatly reduces the number of CHUNKs stored on disk compared with the prior art, which effectively reduces disk fragmentation, saves some disk space, and improves the overall read/write performance of the disk. On the other hand, application-layer reads and writes of multiple small files require only one disk I/O rather than the multiple disk I/Os required in the prior art, which lessens the degree to which the limited IOPS capability of the disk constrains application-layer read/write IOPS and correspondingly raises application-layer IOPS. In addition, the copy location information of CHUNK files in the metadata, that is, the file access server and the volume on which each copy resides, is also greatly reduced.
In this preferred implementation, specifically, a bitmap whose width equals the aggregation degree may be added to the copy information in the metadata storage server to record the PAGE number corresponding to each file. For the aggregation-degree-3 example above, file FILE#001 has offset 0 and bitmap 001 (binary); file FILE#002 has offset 1 and bitmap 010 (binary); file FILE#003 has offset 2 and bitmap 100 (binary). Taking an aggregation degree of 8 as a further example, the file whose PAGE number is 0 has offset 0 and bitmap 00000001 (binary); the file whose PAGE number is 3 has offset 3 and bitmap 00001000 (binary); the file whose PAGE number is 7 has offset 7 and bitmap 10000000 (binary).
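In every bitmap listed above, exactly one bit is set, at the position given by the offset. A minimal Python sketch of that encoding follows; the function names are hypothetical, and the one-bit-per-file reading is inferred from the listed values rather than stated as a rule in the text.

```python
def offset_bitmap(offset: int, degree: int) -> str:
    """Binary string of width `degree` with only bit `offset` set."""
    if not 0 <= offset < degree:
        raise ValueError("offset must lie in [0, degree)")
    return format(1 << offset, f"0{degree}b")


def bitmap_offset(bitmap: str) -> int:
    """Inverse mapping: recover the offset from a one-bit bitmap string."""
    value = int(bitmap, 2)
    if value == 0 or value & (value - 1):
        raise ValueError("bitmap must have exactly one bit set")
    return value.bit_length() - 1
```

For example, `offset_bitmap(2, 3)` gives `"100"` and `offset_bitmap(3, 8)` gives `"00001000"`, matching the values in the text.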
Taking an aggregation degree of 8 as an example, with a page size of 32K and a BLK size of 256K: suppose a new file file003 of size 1K is written. It satisfies the condition for small-file aggregation (its size is less than one page size), and when the file is opened at the metadata management server, its offset is determined by its ordinal position among the small files in the copy.
The small-file aggregation process is described below using the file write flow and the file read flow as examples.
FIG. 4 is a schematic flowchart of aggregated small-file writing in this preferred implementation. Referring to FIG. 4, the flow includes:
Step 401: The application layer sends an open-file request to the file access client.
Step 402: The file access client sends an open-file request (carrying the file name and a create flag) to the metadata management server. The database determines, from the file name, the metadata storage server to which the file belongs, the handle of the copy to which the file belongs, and the offset location information of the file within the copy. Multiple files in the same copy belong to the same metadata storage server.
Step 403: After receiving the response from the metadata management server, the file access client records the copy handle and the offset location information in its global file-management structure if the returned copy handle is not 0.
Step 404: The file access client opens the file on the corresponding metadata storage server. The metadata storage server replies to the file access client, which responds to the application layer on receiving the reply.
Step 405: The application layer sends a write request to the file access client. On receiving it, the file access client first checks whether the copy handle recorded in the global file-management structure is 0. If it is not 0, the client uses that copy handle to request the copy location from the metadata storage server. If it is 0, the client generates a copy handle from its own file identifier plus the computed sequence number of the copy containing the page to be written, and uses that handle to request the copy location from the metadata storage server.
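The branch in step 405 can be sketched in Python as below. The text does not say how the file identifier and the copy sequence number are combined, nor how the sequence number is computed, so both the bit-packing `(file_id << 32) | seq` and the formula `page_index // pages_per_copy` are assumptions made purely for illustration.

```python
def resolve_copy_handle(recorded_handle: int, file_id: int,
                        page_index: int, pages_per_copy: int) -> int:
    """Pick the copy handle used to request the copy location."""
    if recorded_handle != 0:
        # Aggregated small file: reuse the handle recorded when the file was opened.
        return recorded_handle
    # Otherwise derive a handle from the file's own identifier and the
    # sequence number of the copy that holds the page (hypothetical encoding).
    copy_seq = page_index // pages_per_copy
    return (file_id << 32) | copy_seq


aggregated = resolve_copy_handle(77, file_id=5, page_index=0, pages_per_copy=32)
ordinary = resolve_copy_handle(0, file_id=5, page_index=40, pages_per_copy=32)
```

In the first call the recorded handle 77 is returned unchanged; in the second, the handle is derived from file 5 and copy sequence number 1.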
Step 406: After receiving the request, the metadata storage server obtains the copy location from the database and returns it to the file access client.
Step 407: After receiving the copy location from the metadata storage server, the file access client writes the application-layer data into a shared memory page of the file access client and responds to the application layer.
Step 408: The write thread of the file access client writes the shared memory page to the file access server according to the copy handle and the offset location information.
Because each file is a small file and can be cached in one PAGE, the file access client can reuse its existing mechanism for writing file data to the file access server in order to write the data of different small files. Specifically, the FAC need only treat the data of different small files as data of one file cached in different PAGEs when performing the write. This preferred implementation therefore requires no changes to the existing file access server, which saves upgrade cost for the cloud service system.
On the file access server, if the cache block is missed, a new cache block is allocated and the data page is written into it. Before the block is flushed to disk, the data of multiple files is written into the same cache block, and is finally stored in the same copy on disk.
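The server-side behaviour described above (allocate a cache block on a miss, collect pages from several files, then write the block to one copy) can be sketched in Python. The class `FileAccessServerSketch`, the integer copy handles, and in particular the policy of flushing as soon as every slot is filled are assumptions for illustration; the text does not specify when the flush is triggered.

```python
from typing import Dict, Set, Tuple


class FileAccessServerSketch:
    def __init__(self, page_size: int, degree: int) -> None:
        self.page_size = page_size
        self.degree = degree
        # copy handle -> (cache block, set of slots already written)
        self.cache: Dict[int, Tuple[bytearray, Set[int]]] = {}
        self.disk: Dict[int, bytes] = {}  # copy handle -> copy contents
        self.disk_writes = 0

    def write_page(self, copy_handle: int, offset: int, data: bytes) -> None:
        if not 0 <= offset < self.degree or len(data) > self.page_size:
            raise ValueError("offset or data size out of range")
        if copy_handle not in self.cache:
            # Cache miss: allocate a new, zero-filled cache block.
            self.cache[copy_handle] = (bytearray(self.page_size * self.degree), set())
        buf, filled = self.cache[copy_handle]
        start = offset * self.page_size
        buf[start:start + len(data)] = data
        filled.add(offset)
        if len(filled) == self.degree:
            self.flush(copy_handle)  # assumed policy: flush once the block is full

    def flush(self, copy_handle: int) -> None:
        """Write the whole cache block to its copy in a single disk write."""
        buf, _ = self.cache[copy_handle]
        self.disk[copy_handle] = bytes(buf)
        self.disk_writes += 1


fas = FileAccessServerSketch(page_size=4, degree=3)
for offset, body in enumerate([b"aa", b"bb", b"cc"]):
    fas.write_page(copy_handle=7, offset=offset, data=body)
```

Three page writes from three different small files result in one disk write, which is the IOPS saving the preferred implementation aims for.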
Step 409: The application layer sends a close request, and the file access client responds.
FIG. 5 is a schematic flowchart of aggregated small-file reading in this preferred implementation. Referring to FIG. 5, the flow includes the following steps:
Step 501: The application layer sends an open-file request to the file access client.
Step 502: The file access client sends an open-file request to the metadata management server. The database determines, from the file name, the metadata storage server to which the file belongs, the copy to which the file belongs, and the offset location information of the file within the copy. Multiple files in the same copy belong to the same metadata storage server.
Step 503: After receiving the response from the metadata storage server, the file access client records the copy handle and the offset in the global file-management structure if the returned copy handle is not 0.
Step 504: The file access client opens the file on the metadata storage server. The metadata storage server replies to the file access client, which responds to the application layer on receiving the reply.
Step 505: The application layer sends a read request to the file access client. On receiving it, the file access client first checks whether the copy handle recorded in the global file-management structure is 0. If it is not 0, the client uses that copy handle to request the copy location from the metadata storage server. If it is 0, the client generates a copy handle from its own file identifier plus the computed sequence number of the copy containing the page, and uses that handle to request the copy location from the metadata storage server.
Step 506: After receiving the request, the metadata storage server obtains the copy location information from the database and returns it to the file access client.
Step 507: After receiving the copy location information from the metadata storage server, the file access client first looks in its own shared memory, according to the copy location information and the offset location information, for a PAGE corresponding to the offset location information. On a hit, it returns the file in that PAGE to the application layer. Otherwise, it goes on to read from the cache block of the file access server and, on a hit, returns the data. Otherwise, the corresponding copy is read from disk. Because one copy stores the data of multiple files, reading one cache block of data means that a later read of another file may hit the cache directly without a disk read. Reads from a cache block always start at the starting position of the cache block.
Step 508: The file access client reads the cache block data read from disk into the cache of the file access server, reads the PAGE corresponding to the offset location information into the shared memory of the file access client, and returns it to the application layer.
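The three-level lookup in steps 507 and 508 can be sketched in Python. The function below is a hedged illustration only: the dictionaries standing in for the FAC shared memory, the FAS cache and the disk, the integer copy handle, and the tiny 4-byte PAGE size are all assumptions, and the returned source label exists only to make the path visible.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4  # tiny illustrative value


def read_small_file(copy_handle: int, offset: int,
                    fac_pages: Dict[Tuple[int, int], bytes],
                    fas_cache: Dict[int, bytes],
                    disk: Dict[int, bytes]) -> Tuple[bytes, str]:
    """Return (page data, where it was found)."""
    key = (copy_handle, offset)
    if key in fac_pages:
        # Level 1: PAGE already in the client's shared memory.
        return fac_pages[key], "fac_shared_memory"
    if copy_handle in fas_cache:
        # Level 2: the whole block is already cached on the server.
        source = "fas_cache"
    else:
        # Level 3: one disk read loads the whole copy into the server cache.
        fas_cache[copy_handle] = disk[copy_handle]
        source = "disk"
    block = fas_cache[copy_handle]
    page = block[offset * PAGE_SIZE:(offset + 1) * PAGE_SIZE]
    fac_pages[key] = page  # keep the PAGE in client shared memory
    return page, source


disk = {7: b"aaaabbbbcccc"}  # one copy holding three aggregated files
fac_pages: Dict[Tuple[int, int], bytes] = {}
fas_cache: Dict[int, bytes] = {}
first = read_small_file(7, 0, fac_pages, fas_cache, disk)
second = read_small_file(7, 1, fac_pages, fas_cache, disk)
third = read_small_file(7, 0, fac_pages, fas_cache, disk)
```

Only the first read touches the disk; the read of a second file in the same copy is served from the server cache, and a repeated read of the first file is served from the client's shared memory.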
It can be seen that, in this preferred implementation, because the file access client reads a small file in the same way as it reads a PAGE, no changes to the existing file access server are needed, which saves upgrade cost for the cloud service system.
Step 509: The application layer sends a close request, and the file access client responds.
Preferably, this preferred implementation is used in application scenarios that store many small files or only small files.
It should be noted that this preferred implementation is also applicable to other distributed file systems that store files based on copies.
The third embodiment of the present invention provides a file storage apparatus, applied to a metadata server system, the apparatus including:
a determining module, configured to determine storage location information of at least two files, wherein the storage location information includes offset location information of each of the at least two files in one copy and copy location information;
a sending module, configured to send the storage location information to a file access client, so that the file access client interacts with a file access server according to the storage location information and the file access server stores the at least two files at their respective offset positions in the copy.
It can be seen that the above approach allows different files to be stored in the same copy. Compared with the prior art, in which different files cannot be stored in the same copy, this avoids an unnecessary increase in fragmentation of the storage medium of the file access server.
In the third embodiment of the present invention, the sending module includes:
a sending unit, configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information and the file access server writes the at least two files into one cache block and, after the at least two files have been written, writes the data in the cache block into the copy, so that once the data has been written the at least two files are stored at their respective offset positions in the copy.
It should be noted that the third embodiment of the present invention is an apparatus embodiment corresponding to the first embodiment (a method embodiment). For parts not described in detail in the third embodiment, refer to the description of the relevant parts of the first embodiment; to save space, they are not repeated here. In addition, the determining module in the third embodiment may be implemented by a central processing unit (CPU, Central Processing Unit), a microprocessor (MPU, Micro Processing Unit) or a digital signal processor (DSP, Digital Signal Processor) of the metadata server system; the sending module and the sending unit in the sending module may be implemented by a chip of the metadata server system that has an external communication function.
The fourth embodiment of the present invention provides another file storage apparatus, applied to a file access client, the apparatus including:
a receiving module, configured to receive storage location information of at least two files sent by a metadata server system, wherein the storage location information includes offset location information of each of the at least two files in one copy and copy location information, and the storage location information is determined by the metadata server system;
an interaction module, configured to interact with a file access server according to the storage location information, so that the file access server stores the at least two files at their respective offset positions in the copy.
It can be seen that the above approach allows different files to be stored in the same copy. Compared with the prior art, in which different files cannot be stored in the same copy, this avoids an unnecessary increase in fragmentation of the storage medium of the file access server.
In the fourth embodiment of the present invention, the interaction module includes:
an interaction unit, configured to, for each of the at least two files, write the shared memory page that caches the file into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page has been written the file access server writes the data in the cache block into the copy, and once the data has been written each file is stored at its offset position in the copy.
It should be noted that the fourth embodiment of the present invention is an apparatus embodiment corresponding to the second embodiment (a method embodiment). For parts not described in detail in the fourth embodiment, refer to the description of the relevant parts of the first and second embodiments, which are not repeated here. The receiving module in the fourth embodiment may be implemented by a CPU, an MPU or a DSP of the file access client; the interaction module and the interaction unit in the interaction module may be implemented by a chip of the file access client that has an interaction function.
The fifth embodiment of the present invention provides a metadata server system, which includes the file storage apparatus provided in the third embodiment of the present invention.
The sixth embodiment of the present invention provides a file access client, which includes the other file storage apparatus provided in the fourth embodiment of the present invention.
The foregoing describes only implementations of the embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and refinements without departing from the principles of the embodiments of the present invention, and such improvements and refinements shall also fall within the protection scope of the embodiments of the present invention.

Claims

1. A file storage method, applied to a metadata server system, the method comprising:
determining storage location information of at least two files, wherein the storage location information includes offset location information of each of the at least two files in one copy and copy location information;
sending the storage location information to a file access client, so that the file access client interacts with a file access server according to the storage location information and the file access server stores the at least two files at their respective offset positions in the copy.
2. The method according to claim 1, wherein causing the file access server to store the at least two files at their respective offset positions in the copy comprises:
enabling the file access server to write the at least two files into one cache block and, after the at least two files have been written, to write the data in the cache block into the copy, so that once the data has been written the at least two files are stored at their respective offset positions in the copy.
3. The method according to claim 1, wherein the method further comprises:
recording a correspondence among the file name of each of the at least two files, the copy location information, and the offset location information of each of the at least two files in the copy;
receiving, from the file access client, the file name of a file to be read among the at least two files that the file access client requests to read;
determining first information according to the correspondence and the file name of the file to be read, wherein the first information includes the copy location information and the offset location information of the file to be read in the copy;
sending the first information to the file access client, so that the file access client interacts with the file access server according to the first information and reads the corresponding first file from the file access server.
4. A file storage method, applied to a file access client, the method comprising:
receiving storage location information of at least two files sent by a metadata server system, wherein the storage location information includes offset location information of each of the at least two files in one copy and the copy location information, and the storage location information is determined by the metadata server system;
interacting with a file access server according to the storage location information, so that the file access server stores the at least two files at their respective offset positions in the copy.
5. The method according to claim 4, wherein causing the file access server to store the at least two files at their respective offset positions in the copy comprises:
enabling the file access server to write the at least two files into one cache block and, after the at least two files have been written, to write the data in the cache block into the copy, so that once the data has been written the at least two files are stored at their respective offset positions in the copy.
6. The method according to claim 4, wherein interacting with the file access server according to the storage location information, so that the file access server stores the at least two files at their respective offset positions in the copy, comprises:
for each of the at least two files, writing the shared memory page that caches the file into a cache block of the file access server according to the copy location information and the offset location information of the file in the copy, so that after the shared memory page has been written the file access server writes the data in the cache block into the copy, and once the data has been written each file is stored at its offset position in the copy.
7. The method according to claim 4, wherein the method further comprises:
sending a request to the metadata server system, the request including the file name of a file to be read;
receiving first information sent by the metadata server system, wherein the first information includes the copy location information and the offset location information of the file to be read in the copy; the first information is determined by the metadata server system according to the file name of the file to be read and a correspondence among the file name of each of the at least two files, the copy location information, and the offset location information of each of the at least two files in the copy; and the correspondence is recorded by the metadata server system;
interacting with the file access server according to the first information, and reading the corresponding first file from the file access server.
8. A file storage device, applied to a metadata server system, the device comprising:
a determining module, configured to determine storage location information of at least two files, wherein the storage location information includes offset location information of each of the at least two files in one copy and the copy location information; and
a sending module, configured to send the storage location information to a file access client, so that the file access client interacts with a file access server according to the storage location information, so that the file access server stores the at least two files respectively at their respective offset positions in the copy.
9. The device according to claim 8, wherein the sending module comprises:
a sending unit, configured to send the storage location information to the file access client, so that the file access client interacts with the file access server according to the storage location information, so that the file access server writes the at least two files into one cache block and, after the at least two files have been written, writes the data in the cache block into the copy, and, after the data has been written, the at least two files are stored at their respective offset positions in the copy.
10. A file storage device, applied to a file access client, the device comprising:
a receiving module, configured to receive storage location information of at least two files sent by a metadata server system, wherein the storage location information includes offset location information of each of the at least two files in one copy and the copy location information, and the storage location information is determined by the metadata server system; and
an interaction module, configured to interact with a file access server according to the storage location information, so that the file access server stores the at least two files respectively at their respective offset positions in the copy.
11. The device according to claim 10, wherein the interaction module comprises:
an interaction unit, configured to, for each of the at least two files, write, according to the copy location information and the offset location information of that file in the copy, the shared memory page in which that file is cached into a cache block of the file access server, so that the file access server, after the writing of the shared memory page is complete, writes the data in the cache block into the copy, and, after the data has been written, each file is stored at its offset position in the copy.
12. A metadata server system, comprising the file storage device according to claim 8 or 9.
13. A file access client, comprising the file storage device according to claim 10 or 11.
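For illustration only, the write path of claims 6 and 8 to 11 and the read path of claim 7 can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the implementation disclosed in the application: the class and function names (`MetadataServer`, `FileAccessServer`, `store_files`, `read_file`), the `copy-0` identifier, and the sequential offset allocation are all hypothetical, and the client-side shared memory page is modeled simply as a `bytes` value.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical stand-in for the metadata server system."""
    copy_id: str = "copy-0"   # assumed identifier for the copy location
    next_offset: int = 0      # next free byte offset inside the copy
    index: dict = field(default_factory=dict)  # file name -> (copy_id, offset, length)

    def allocate(self, sizes):
        """Give each file its own offset in the same copy and record the mapping."""
        placements = {}
        for name, size in sizes.items():
            placements[name] = (self.copy_id, self.next_offset, size)
            self.next_offset += size
        self.index.update(placements)
        return placements

    def lookup(self, name):
        """Return (copy_id, offset, length) for a previously stored file name."""
        return self.index[name]


@dataclass
class FileAccessServer:
    """Hypothetical stand-in for the file access server."""
    cache_block: list = field(default_factory=list)  # pending (copy_id, offset, data)
    copies: dict = field(default_factory=dict)       # copy_id -> bytearray

    def write_to_cache(self, copy_id, offset, data):
        """Buffer one file's data in the cache block without touching the copy."""
        self.cache_block.append((copy_id, offset, bytes(data)))

    def flush(self):
        """Write everything in the cache block into the copies, then empty it."""
        for copy_id, offset, data in self.cache_block:
            copy = self.copies.setdefault(copy_id, bytearray())
            end = offset + len(data)
            if len(copy) < end:
                copy.extend(b"\x00" * (end - len(copy)))  # grow the copy if needed
            copy[offset:end] = data
        self.cache_block.clear()

    def read(self, copy_id, offset, length):
        """Read `length` bytes from the copy starting at `offset`."""
        return bytes(self.copies[copy_id][offset:offset + length])


def store_files(files, mds, fas):
    """Client-side write: obtain offsets, fill the cache block, then flush once."""
    placements = mds.allocate({name: len(data) for name, data in files.items()})
    for name, data in files.items():
        copy_id, offset, _ = placements[name]
        fas.write_to_cache(copy_id, offset, data)
    fas.flush()
    return placements


def read_file(name, mds, fas):
    """Client-side read: resolve the file name to (copy, offset), then read."""
    copy_id, offset, length = mds.lookup(name)
    return fas.read(copy_id, offset, length)
```

In this sketch, storing `{"a.txt": b"hello", "b.txt": b"world!"}` places both files in the same copy at consecutive offsets, and `read_file("b.txt", mds, fas)` resolves the file name through the recorded correspondence before reading from the file access server. The single `flush()` call reflects the idea that several small files reach the copy in one write of the cache block.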
PCT/CN2013/083689 2013-03-27 2013-09-17 File storage method and device, access client and metadata server system WO2014153931A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310102382.4A CN104079600B (en) 2013-03-27 2013-03-27 File memory method, device, access client and meta data server system
CN201310102382.4 2013-03-27

Publications (1)

Publication Number Publication Date
WO2014153931A1 true WO2014153931A1 (en) 2014-10-02

Family

ID=51600642

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/083689 WO2014153931A1 (en) 2013-03-27 2013-09-17 File storage method and device, access client and metadata server system

Country Status (2)

Country Link
CN (1) CN104079600B (en)
WO (1) WO2014153931A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105141685A (en) * 2015-08-18 2015-12-09 浪潮(北京)电子信息产业有限公司 File read-write system and meta data memory thereof as well as method and device for reading and writing files

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331428B (en) * 2014-10-20 2017-07-04 暨南大学 The storage of a kind of small documents and big file and access method
KR102030786B1 (en) 2014-12-27 2019-10-10 후아웨이 테크놀러지 컴퍼니 리미티드 Data processing method, apparatus and system
CN106911743B (en) * 2015-12-23 2019-03-26 中兴通讯股份有限公司 Small documents write polymerization, read polymerization and system and client
CN107451070B (en) * 2016-06-01 2020-08-04 腾讯科技(深圳)有限公司 Data processing method and server
CN106250212A (en) * 2016-07-29 2016-12-21 努比亚技术有限公司 Resource access method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101167047A (en) * 2005-04-22 2008-04-23 微软公司 Local thumbnail cache
US8095577B1 (en) * 2008-03-31 2012-01-10 Emc Corporation Managing metadata
CN102707900A (en) * 2011-03-11 2012-10-03 微软公司 Virtual disk storage techniques
CN102855239A (en) * 2011-06-28 2013-01-02 清华大学 Distributed geographical file system

Also Published As

Publication number Publication date
CN104079600A (en) 2014-10-01
CN104079600B (en) 2018-10-12

Similar Documents

Publication Publication Date Title
US10782880B2 (en) Apparatus and method for providing storage for providing cloud services
US10346081B2 (en) Handling data block migration to efficiently utilize higher performance tiers in a multi-tier storage environment
CN106547476B (en) Method and apparatus for data storage system
US8504797B2 (en) Method and apparatus for managing thin provisioning volume by using file storage system
US11397668B2 (en) Data read/write method and apparatus, and storage server
US9182912B2 (en) Method to allow storage cache acceleration when the slow tier is on independent controller
US8290911B1 (en) System and method for implementing data deduplication-aware copying of data
WO2014153931A1 (en) File storage method and device, access client and metadata server system
JP2009181148A (en) Storage subsystem
US20180203637A1 (en) Storage control apparatus and storage control program medium
WO2013107029A1 (en) Data processing method, device and system based on block storage
WO2017148242A1 (en) Method for accessing shingled magnetic recording (smr) hard disk, and server
US11899580B2 (en) Cache space management method and apparatus
WO2023169235A1 (en) Data access method and system, device, and storage medium
US20190294590A1 (en) Region-integrated data deduplication implementing a multi-lifetime duplicate finder
US11226778B2 (en) Method, apparatus and computer program product for managing metadata migration
CN113360098A (en) Data writing method, device and system, electronic equipment and storage medium
US20220129346A1 (en) Data processing method and apparatus in storage system, and storage system
JP4502375B2 (en) File system and control method thereof
JP2019028954A (en) Storage control apparatus, program, and deduplication method
CN109144403B (en) Method and equipment for switching cloud disk modes
US20220350779A1 (en) File system cloning method and apparatus
WO2012171363A1 (en) Method and equipment for data operation in distributed cache system
US20210311654A1 (en) Distributed Storage System and Computer Program Product
CN108984432B (en) Method and device for processing IO (input/output) request

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13880301; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 13880301; Country of ref document: EP; Kind code of ref document: A1)