WO2015100627A1 - Data processing method and device in a distributed file storage system - Google Patents

Data processing method and device in a distributed file storage system

Info

Publication number
WO2015100627A1
WO2015100627A1 PCT/CN2013/091143 CN2013091143W WO2015100627A1 WO 2015100627 A1 WO2015100627 A1 WO 2015100627A1 CN 2013091143 W CN2013091143 W CN 2013091143W WO 2015100627 A1 WO2015100627 A1 WO 2015100627A1
Authority
WO
WIPO (PCT)
Prior art keywords
stripe
blocks
file
data
target file
Prior art date
Application number
PCT/CN2013/091143
Other languages
English (en)
French (fr)
Inventor
郭洪星
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CA2897129A priority Critical patent/CA2897129C/en
Priority to CN201380002274.8A priority patent/CN104272274B/zh
Priority to EP13900799.1A priority patent/EP2933733A4/en
Priority to JP2015559412A priority patent/JP6106901B2/ja
Priority to PCT/CN2013/091143 priority patent/WO2015100627A1/zh
Priority to AU2013409624A priority patent/AU2013409624B2/en
Publication of WO2015100627A1 publication Critical patent/WO2015100627A1/zh
Priority to US14/806,064 priority patent/US10127233B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1727 Details of free space management performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/1827 Management specifically adapted to NAS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00 Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10 Indexing scheme relating to G06F11/10
    • G06F2211/1002 Indexing scheme relating to G06F11/1076
    • G06F2211/1028 Distributed, i.e. distributed RAID systems with parity

Definitions

  • The present invention relates to the field of storage technologies, and in particular to a data processing method and device in a distributed file storage system.
  • Background
  • A distributed file storage system includes multiple storage servers, which are interconnected by a low-latency, high-throughput network (such as an IB network or 10G Ethernet) to form a large network RAID (Redundant Array of Inexpensive Disks), and all storage servers provide data read and write services at the same time.
  • The file data is striped by an algorithm such as a cross-node RAID algorithm (for example RAID5, RAID6, or RAIDZ) or an Erasure Code algorithm.
  • The file data is divided into a plurality of data stripe blocks (Strip), the corresponding check stripe blocks are generated, and the data stripe blocks and check stripe blocks are then stored on the corresponding nodes.
  • When the stored file data is read, a certain number of data stripe blocks and check stripe blocks are read from the storage server nodes to reconstruct the original file data that the user needs to read.
  • The number of data stripe blocks into which file data is divided during striping keeps growing, and the disk I/O and network I/O of read and write operations increase accordingly. In the small file scenario, this places a heavy burden on the access performance of the distributed file storage system.
  • The technical problem to be solved by the present invention is how to improve the access performance of a distributed file storage system in a small file scenario.
  • A first aspect of the present invention provides a data processing method applied to a distributed file storage system, the method comprising: a client agent receiving a data processing request of a user, where the data processing request carries information such as a file identifier, an offset address, and a file length of a target file, and the target file is the file to be processed in the data processing request; the client agent obtaining redundancy ratio information according to the file identifier of the target file carried in the data processing request, where the redundancy ratio information includes the number N of data stripe blocks of the distributed file storage system and the number M of check stripe blocks of the distributed file storage system; determining, according to the offset address and length information of the target file carried in the data processing request, the number DSC of valid stripe blocks of the target file, a valid stripe block being a stripe block that contains data of the target file; determining the number N' of actual stripe blocks of the target file according to the number DSC of valid stripe blocks and the number M of check stripe blocks; and determining the corresponding stripe blocks according to the number N' of actual stripe blocks.
  • There may be multiple groups of the number M of check stripe blocks and the number N of data stripe blocks, each stored in the information table of the corresponding directory.
  • Determining and processing the corresponding stripe blocks according to N' further includes:
  • the consistent label information may be a timestamp or a version number
  • When the data processing request is a data read request, the method further includes: acquiring, according to the file identifier, the distribution information of the stripe blocks of the target file; determining and processing the corresponding stripe blocks according to the number N' of actual stripe blocks is specifically:
  • sending a data block read request, according to the acquired distribution information of the stripe blocks of the target file, to the storage server nodes that store the actual stripe blocks; receiving the response messages of the storage server nodes that store the actual stripe blocks, where each response message is a success response message (readable) or a failure response message (unreadable), and a success response message carries the consistency label information of the actual stripe block and the number DSC of valid stripe blocks; and determining, according to the received response messages, whether the target file can be read.
  • Determining, according to the received response messages, whether the target file can be read is specifically: if the number of received success response messages equals the number N' of actual stripe blocks, and the consistency label information and the number DSC of valid stripe blocks carried in the success response messages are all the same, the target file can be read;
  • if the number of received success response messages is less than the number N' of actual stripe blocks, it is determined whether the number of received success response messages is greater than the number M of check stripe blocks; if it is, it is determined whether the number of success response messages is greater than or equal to the number DSC of valid stripe blocks of the target file and whether the consistency label information and the number of valid stripe blocks carried in the success response messages are the same; if yes, the target file can be read; otherwise, the data block read request is sent, according to the acquired distribution information, to the storage server nodes that store the check stripe blocks;
  • if the number of success response messages returned by the storage server nodes storing the check stripe blocks is greater than or equal to the number DSC of valid stripe blocks of the target file, and the consistency label information and the number of valid stripe blocks carried in the success response messages are the same, the target file can be read;
  • if the number of success response messages returned by the storage server nodes storing the check stripe blocks is smaller than the number DSC of valid stripe blocks of the target file, or the consistency label information or the number DSC of valid stripe blocks carried in the success response messages differs, the target file cannot be read.
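The read decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Response` type, the `can_read` function, and the `read_check_blocks` callback are hypothetical names. Two points are assumptions of the sketch: the successes returned by the check stripe block nodes are counted together with the successes already received, and the check stripe block nodes are consulted whenever the actual stripe blocks alone do not suffice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Response:
    ok: bool      # True for a success response, False for a failure response
    tag: int = 0  # consistency label (timestamp or version number)
    dsc: int = 0  # number of valid stripe blocks reported by the node


def _agree(responses: List[Response]) -> bool:
    """True when all responses carry one consistency label and one DSC value."""
    return len({(r.tag, r.dsc) for r in responses}) == 1


def can_read(actual: List[Response], n_actual: int, m: int,
             read_check_blocks: Callable[[], List[Response]]) -> bool:
    ok = [r for r in actual if r.ok]
    # Case 1: every one of the N' actual stripe blocks answered successfully.
    if len(ok) == n_actual:
        return _agree(ok)
    # Case 2: fewer than N' successes, but more than M and at least DSC.
    if len(ok) > m and _agree(ok) and len(ok) >= ok[0].dsc:
        return True
    # Case 3 (assumption: combined count): also ask the check stripe block nodes.
    ok += [r for r in read_check_blocks() if r.ok]
    return bool(ok) and _agree(ok) and len(ok) >= ok[0].dsc
```

For example, with N' = 3, M = 2, and DSC = 2, a single successful actual stripe block plus two consistent check stripe blocks is treated as readable, while mismatched consistency labels are not.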
  • A device 30 for implementing a data processing method in a distributed file storage system, the device 30 communicating with a storage server node 101 in the distributed file storage system 10, and the device including a receiving module 301, a processing module 303, and a sending module 305:
  • the receiving module 301 is configured to receive a data processing request of the user, where the data processing request carries information such as a file identifier, an offset address, and a file length of the target file, and the target file is the file to be processed in the data processing request;
  • the processing module 303 is configured to:
  • obtain redundancy ratio information from the storage server node according to the file identifier of the target file carried in the data processing request, where the redundancy ratio information includes the number N of data stripe blocks of the distributed file storage system and the number M of check stripe blocks of the distributed file storage system;
  • determine and process the corresponding stripe blocks according to the number N' of actual stripe blocks; the sending module 305 is configured to feed back the processing result to the user.
  • Determining the number N' of actual stripe blocks of the target file according to the number DSC of valid stripe blocks and the number M of check stripe blocks is specifically:
  • the number N' of actual stripe blocks of the target file is equal to the number DSC of valid stripe blocks, that is, N' = DSC.
  • Determining and processing the corresponding stripe blocks according to the number N' of actual stripe blocks further includes:
  • the consistent label information may be a timestamp or a version number
  • the N' actual stripe blocks and the M check stripe blocks are written to corresponding storage server nodes.
  • Determining and processing the corresponding stripe blocks according to N' further includes:
  • sending a data block read request, according to the acquired distribution information of the stripe blocks of the target file, to the storage server nodes that store the actual stripe blocks;
  • determining, according to the response messages, whether the target file can be read.
  • the receiving module is further configured to receive a data deletion request of the user, where the data deletion request carries a file identifier of the target file;
  • the target file is the file that needs to be deleted;
  • the processing module obtains redundancy ratio information from the storage server node according to the file identifier, where the redundancy ratio information includes the number N of data stripe blocks of the distributed file storage system and the number M of check stripe blocks of the distributed file storage system;
  • the response message is one of: a deletion success response message, a response message indicating that the object to be deleted does not exist, and a deletion failure response message;
  • the sending module is configured to feed back the result of the deletion success or the deletion failure to the user.
  • The valid stripe blocks are determined according to the size of the target file, and the actual stripe blocks of the target file are determined from them.
  • This also reduces the number of empty stripe blocks in the distributed file storage system 10, which can save a large amount of network read/write I/O and disk read/write I/O in a small file scenario and improve the performance of the distributed storage system 10.
  • FIG. 2 is a schematic diagram of the distribution of stripe blocks.
  • FIG. 3 is a schematic flowchart of a method for implementing a data processing request according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a method in which the data processing request is a data write request according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method in which the data processing request is a data read request according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a method in which the data processing request is a data deletion request according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an apparatus for implementing a data processing method in a distributed file storage system according to an embodiment of the present invention.
  • The main components of the distributed file storage system 10, as shown in FIG. 1, include a plurality of storage servers 101, which are interconnected through a low-latency, high-throughput network (for example, an IB network or 10G Ethernet) to form a cluster.
  • The distributed file storage system 10 also includes a front-end switch 50 and a back-end switch 60.
  • The front-end switch 50 handles external service requests and data interaction between user data and the cluster.
  • The back-end switch 60 handles internal requests and data interaction between the storage server nodes within the cluster.
  • the application server 20 communicates with the distributed file storage system 10 via the front end switch 50.
  • each application typically interacts with the storage server 101 in two ways.
  • In the first way, each application directly accesses the file system client agent (CA) 30 deployed on the application server 20 through the standard Portable Operating System Interface (POSIX).
  • The client agent 30 acts as the portal through which the distributed file storage system 10 provides external services; after receiving an application request, it interacts with the storage servers 101 inside the cluster.
  • In the second way, each application accesses a corresponding network attached storage server (NAS server) through a commonly used NAS protocol (such as NFS/CIFS); the NAS server and the storage server 101 are deployed together, and the NAS server accesses the file system client agent 30 deployed on the server node to implement the storage service.
  • the first access mode is used for specific description, and the second access mode adopts a similar implementation principle.
  • the service system includes two application servers 20, and the application server 20 communicates with the distributed file storage system 10 through the front-end switch 50.
  • the client agent 30 is deployed in the application server 20, and the user's data processing request is first sent to the client agent 30 of the application server 20, and the client agent 30 performs corresponding processing on the data processing request.
  • The application server 20 transmits a data processing request through the front-end switch 50 to the client agent 30 in the corresponding storage server node 101, and the data processing request is processed by the client agent 30 in the storage server node 101.
  • the storage servers 101 in the distributed file storage system 10 are interconnected to form a cluster, which constitutes a large network RAID, and the stored data adopts a N+M redundancy protection mechanism.
  • M is the number of check strip blocks of the distributed file storage system 10 for redundantly protecting the stored file data.
  • the specific value of M can be set to a fixed value according to business needs.
  • N is the number of data stripe blocks generated when the file data is striped; N is calculated from the number of storage server nodes of the distributed file storage system 10 and the value of M, and may also be set to a fixed value according to service requirements.
  • the distributed file storage system 10 can set a uniform value of N and M, and can also set different values of N and M for a certain directory, according to service requirements.
  • N and M of the distributed file storage system 10 are stored in a file data metadata information table of the distributed file storage system 10.
  • the metadata information table is the same as the existing implementation manner, and will not be further described herein.
  • A piece of file data to be processed is called stripe data.
  • During striping, a piece of stripe data is divided into N data stripe blocks, and M check stripe blocks are generated according to the redundancy algorithm.
  • The client agent 30 performs striping on the received file data: it divides the file data into N data stripe blocks, generates M check stripe blocks according to the redundancy algorithm, stores the generated data stripe blocks and check stripe blocks on the corresponding storage server nodes 101, and records the same consistency label information, such as a timestamp or version number, in the data stripe blocks and check stripe blocks.
  • N data strip blocks and M check strip blocks are written to the corresponding storage server node 101.
  • the stripe blocks may be written to the respective storage server nodes 101 in accordance with the number order of the storage server node 101, or may be written in the corresponding server node 101 according to other rules.
  • the write rule is the same as the existing write rule and will not be described here.
  • When the user needs to read file data, the client agent 30 uses the identification information of the file data to read a certain number of data stripe blocks or check stripe blocks that carry the same consistency label information, and from them obtains the file data that the user needs to read.
  • According to the consistency principle of the redundancy algorithm, when stripe data protected by N+M redundancy is read, at least any N stripe blocks with the same consistency label information among the N+M stripe blocks must be obtained to ensure that the stripe data read is correct.
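The N-of-(N+M) consistency rule above can be illustrated with a short sketch. The function name `readable_tag` and the representation of consistency labels as integers are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, Optional


def readable_tag(tags: Iterable[int], n: int) -> Optional[int]:
    """Return the consistency label shared by at least n stripe blocks, else None.

    `tags` holds the label (timestamp or version number) of each stripe block
    that was read successfully out of the N+M blocks of one stripe.
    """
    counts = Counter(tags)
    if not counts:
        return None
    tag, count = counts.most_common(1)[0]
    return tag if count >= n else None
```

With N = 6 and M = 2, six blocks labelled 7 and two labelled 6 yield label 7, while five blocks labelled 7 are not enough and the function returns `None`.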
  • the distributed file storage system 10 as shown in FIG. 1 includes eight storage server nodes 101. It is assumed that the M of the distributed file storage system 10 is set to 2, and N is set to 6. Of course, M can be set to 2 and N is 5. In the present embodiment, N is 6 and M is 2 for exemplary description.
  • The client agent 30 in the application server 20 receives the file data to be stored, stripes the file data into six data stripe blocks, uses the redundancy algorithm to generate two check stripe blocks, and records the same consistency label information (a timestamp or version number) in the data stripe blocks and the check stripe blocks.
  • The client agent 30 stores the six data stripe blocks, in the order of the storage server nodes 101, on storage server node 1 through storage server node 6, and stores the two check stripe blocks in order on storage server node 7 and storage server node 8, for example, as shown in FIG.
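The node-order placement in this example can be sketched as a simple mapping. The function `place_stripe` and the check block names `P1`, `P2` are hypothetical; only the node order and the shared consistency label come from the text.

```python
from typing import Dict, Tuple


def place_stripe(n: int, m: int, tag: int) -> Dict[int, Tuple[str, int]]:
    """Map storage server node number -> (block name, consistency label).

    Data blocks D1..Dn go to nodes 1..n and check blocks P1..Pm to the
    following m nodes, all carrying the same consistency label.
    """
    names = [f"D{i}" for i in range(1, n + 1)] + [f"P{j}" for j in range(1, m + 1)]
    return {node: (name, tag) for node, name in enumerate(names, start=1)}


layout = place_stripe(n=6, m=2, tag=1)
```

Here `layout[1]` is `("D1", 1)` and `layout[7]` is `("P1", 1)`; other write rules are possible, as the text notes.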
  • the client agent 30 reads at least six stripe blocks having the same consistency label information from the storage server node 101 according to the identification information of the file data, and the six stripe blocks are data strips. Any 6 stripe blocks in the block and check strip blocks.
  • The valid file data may occupy only some of the data stripe blocks.
  • The remaining data stripe blocks are empty stripe blocks that contain no valid file data.
  • N = 6
  • M = 2
  • The check stripe blocks are stored on storage server node 7 and storage server node 8.
  • When the small file data is striped, it occupies only 2 data stripe blocks, such as D1 and D2, which are stored on storage server node 1 and storage server node 2 with timestamp T1 recorded.
  • The present invention proposes a new method for processing small file data in the distributed file storage system 10, which reduces operations on empty stripe blocks, reduces the disk I/O and network I/O overhead in the distributed file storage system, and improves the performance of the distributed file storage system 10.
  • the invention mainly adopts a redundancy protection mechanism of N'+M according to the size of the file to be processed. Where M is the number of check strip blocks of the distributed file storage system, and N' is the number of actual strip blocks determined according to the size of the file data when the file data is striped.
  • the size of the file data is different, and the number N' of the actual stripe blocks that are sliced may be different, realizing the dynamic adjustment of the actual number of stripe blocks of the file data.
  • This reduces the number of empty stripe blocks in a small file scenario, thereby reducing disk I/O and network I/O in the distributed file storage system and improving its performance.
  • N' must be larger than M; that is, the number of actual stripe blocks and the number of check stripe blocks must satisfy the majority principle, so that data can be recovered in any abnormal situation.
  • The structure of the distributed file storage system 10 applicable in the embodiment of the present invention is as shown in FIG. 1 and includes eight storage servers 101, which are interconnected through a low-latency, high-throughput network (for example, an IB network or 10G Ethernet) to form a cluster.
  • the client agent 30 is deployed in the application server 20 and communicates between the user data and the cluster through the front end switch 50.
  • Each storage server node 101 in the cluster communicates internally through the backend switch 60.
  • the client agent 30 can also be deployed in the various storage server nodes 101 of the distributed file storage system 10, the functions of which are similar to those of the client agent 30 deployed in the application server 20, and will not be described again.
  • M is 2 and N is 6.
  • M is the number of check strip blocks in the distributed file storage system 10 for redundantly protecting the stored file data.
  • the specific value of M may be set according to service requirements.
  • N is the number of data strips that are sliced when the file data is striped.
  • N is calculated from the number of storage server nodes in the distributed file storage system and the value of M, and may also be set to a fixed value according to business needs.
  • The distributed file storage system can set uniform values of N and M, and can also set different values of N and M for a certain directory, depending on business needs. In this embodiment, uniform N and M are taken as an example for description.
  • N and M of the distributed file storage system are stored in the file data metadata information table of the distributed file storage system 10.
  • the client agent 30 on the application server 20 receives the data processing request of the user sent by the application, and the data processing request carries information such as a file identifier FID, an offset address offset, and a file length length of the target file.
  • the target file is a file to be processed.
  • Data Strip Count (DSC). A valid stripe block is a data stripe block that contains file data.
  • The larger the distributed file storage system, the larger the value of N, and in a small file data scenario the larger the difference between the number of valid stripe blocks and the number N of data stripe blocks of the distributed file storage system.
  • When the client agent 30 is deployed on the storage server node 101 of the distributed file storage system 10, the application server 20, after receiving the data processing request, sends it to the client agent 30 on a storage server through the front-end switch 50.
  • the method by which the application server 20 sends the data processing request to the client agent on the storage server 101 is similar to the existing method and will not be described in detail herein.
  • the method of the client agent 30 on the storage server 101 for the data processing request is similar to that of the client agent 30 on the application server 20 and will not be described again.
  • When the number DSC of valid stripe blocks is greater than the number M of check stripe blocks, that is, when DSC > M,
  • the number of empty stripe blocks is 0, which means that no empty stripe blocks need to be added; this reduces the number of empty stripe blocks in the distributed file storage system.
  • The number of valid stripe blocks may be the same as the number N of data stripe blocks of the distributed file storage system.
  • In that case the number of empty stripe blocks is 0, that is, no empty stripe blocks need to be added.
  • The number N' of actual stripe blocks is used instead of the number N of data stripe blocks of the distributed file storage system, effectively reducing the number of empty stripe blocks in the distributed file storage system.
  • The read and write I/O operations on empty stripe blocks are reduced, and the performance of the distributed file storage system is improved.
  • the client agent 30 processes the N' actual stripe blocks accordingly.
  • The specific processing method depends on the type of the data processing request.
  • the storage server node 1 - the storage server node 8 form a cluster by low latency, high throughput network interconnection.
  • the client agent 30 is deployed in the application server 20 and communicates between the user data and the cluster through the front end switch 50.
  • Each storage server node 101 in the cluster communicates internally through the backend switch 60.
  • the client agent 30 can also be deployed in each storage server node 101 of the distributed file storage system 10. Its function is similar to that of the client agent 30 deployed in the application server 20 and will not be described separately.
  • the file data stored in the distributed file storage system 10 adopts a N+M redundancy protection mechanism.
  • M is the number of check strip blocks in which the distributed file storage system 10 performs redundancy protection on the stored file data.
  • N is the number of data strips that are split when the file data is striped.
  • N is calculated from the number of storage server nodes of the distributed file storage system 10 and the value of M, and may also be set to a fixed value according to business needs.
  • the distributed file storage system 10 can set a uniform value of N and M, and can also set different values of N and M for a certain directory, according to service requirements. In this embodiment, a unified N and M are taken as an example for description.
  • N and M of the distributed file storage system 10 are stored in a file data metadata information table of the distributed file storage system 10.
  • M and N set for a directory are stored in the directory metadata table.
  • The sum of N and M may be equal to the number of storage server nodes in the distributed file storage system 10, or multiple stripe blocks may be stored on one storage server node 101, in which case the sum of N and M may be larger than the number of storage server nodes.
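A lookup that prefers directory-level N and M over the system-wide values might look like the sketch below. The table contents, the directory path, and the function name `redundancy_ratio` are hypothetical; the actual metadata table layout is not specified here.

```python
from typing import Dict, Tuple

# System-wide (N, M) as used in this embodiment.
SYSTEM_DEFAULT: Tuple[int, int] = (6, 2)

# Hypothetical directory metadata table holding per-directory overrides.
DIRECTORY_TABLE: Dict[str, Tuple[int, int]] = {
    "/archive": (5, 2),
}


def redundancy_ratio(directory: str) -> Tuple[int, int]:
    """Return (N, M) for a directory, falling back to the system-wide values."""
    return DIRECTORY_TABLE.get(directory, SYSTEM_DEFAULT)
```

A directory without its own entry simply inherits the uniform N = 6 and M = 2.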
  • N of the distributed file storage system 10 is 6, and M is 2.
  • the data write request is sent to the file system client agent 30 deployed on the application server 20 via a standard portable operating system interface.
  • the client agent 30 processes the data write request and communicates with each storage server node 101 via the front end switch 50.
  • Alternatively, the application server 20 sends the data write request through the front-end switch 50 to the client agent 30 on the corresponding storage server node 101, and the client agent 30 on the storage server node 101 then processes the data write request and communicates with the storage server nodes 101 in the cluster via the back-end switch 60.
  • a data write request is initiated by the client to the client agent 30 on the application server 20.
  • The file data to be written is referred to as the target file.
  • the data write request carries information such as a file identifier FID, an offset address offset, and a file length length of the target file.
  • the file identifier FID is 485, the offset address is 0K, and the file length is 160K.
  • According to the file identifier of the target file, the client agent 30 obtains the redundancy ratio information of the distributed file storage system, that is, the values of N and M, and the size of the stripe block from the file system metadata information table. In this embodiment, the value of N is 6, the value of M is 2, and the size of the stripe block is 128K.
  • the client agent 30 performs striping processing on the target file according to the offset address, the file length information, and the size of the obtained stripe block in the data write request, to obtain the number DSC of the effective stripe block of the target file.
  • the offset address of the target file is 0K, the file size is 160K, and the size of the stripe block is 128K. Therefore, the client agent stripes the target file to generate two valid stripe blocks.
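
The striping arithmetic above can be sketched as follows. This is a minimal illustration, not the method claimed in the text: the function name `valid_stripe_block_count` is hypothetical, and it assumes that DSC counts every stripe block from the start of the file up to the end of the written range, which the source does not spell out.

```python
KIB = 1024


def valid_stripe_block_count(offset: int, length: int, stripe_size: int) -> int:
    """Return the number of stripe blocks covering bytes [0, offset + length).

    Assumption: DSC is the ceiling of (offset + length) / stripe_size.
    """
    if length <= 0:
        return 0
    end = offset + length
    return -(-end // stripe_size)  # ceiling division on integers


# Values from the embodiment: offset 0K, length 160K, stripe block size 128K.
dsc = valid_stripe_block_count(0, 160 * KIB, 128 * KIB)
print(dsc)
```

With the embodiment's values the 160K file spills 32K into a second stripe block, which is why two valid stripe blocks are produced.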
  • the client agent 30 compares the number DSC of valid stripe blocks obtained by striping the target file with the obtained number M of check stripe blocks.
  • the number DSC of valid stripe blocks is less than or equal to the number M of check stripe blocks, that is, DSC ≤ M
  • the number DSC of valid stripe blocks is 2
  • the number M of check stripe blocks is 2
  • the target file needs to be divided into three stripe blocks, that is, the number N' of actual stripe blocks of the target file is 3; since the number of valid stripe blocks of the target file is 2, only 1 empty stripe block needs to be added.
  • in the prior art, the target file needs 6 data stripe blocks; with 2 valid stripe blocks, 4 empty stripe blocks need to be added. It can be seen that with the method of the present invention, the number of empty stripe blocks can be greatly reduced.
  • the number DSC of valid stripe blocks obtained by striping is 1 and the number M of check stripe blocks is 2; the number DSC of valid stripe blocks of the target file is then smaller than the number M of check stripe blocks, that is, DSC < M.
  • when the target file is striped, it needs to be divided into 3 stripe blocks, that is, the number N' of actual stripe blocks of the target file is 3; the number DSC of valid stripe blocks of the target file is 1, so 2 empty stripe blocks need to be added.
  • the number of valid stripe blocks obtained when the target file is striped is larger than the number M of check stripe blocks, that is, DSC > M.
  • the number of valid stripe blocks is 10
  • the number N' of actual stripe blocks of the target file can be dynamically adjusted according to the size of the target file, which ensures that the target file can be read correctly under any circumstances and effectively reduces the number of empty stripe blocks. Accordingly, the read and write I/O operations for empty stripe blocks are reduced, and the performance of the distributed file storage system is improved.
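
The rule for choosing N' described above (N' = M + 1 when DSC ≤ M, otherwise N' = DSC) can be sketched as follows. The function names are hypothetical and chosen only for illustration.

```python
def actual_stripe_block_count(dsc: int, m: int) -> int:
    """N' is M + 1 when DSC <= M, and DSC otherwise."""
    return m + 1 if dsc <= m else dsc


def empty_stripe_block_count(dsc: int, m: int) -> int:
    """Empty stripe blocks that must be added to reach N'."""
    return actual_stripe_block_count(dsc, m) - dsc


M = 2
for dsc in (1, 2, 10):
    print(dsc, actual_stripe_block_count(dsc, M), empty_stripe_block_count(dsc, M))
```

For comparison, a fixed N of 6 with 2 valid stripe blocks requires 6 - 2 = 4 empty stripe blocks, whereas the rule above requires only 1.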
  • the client agent 30 generates M check stripe blocks from the obtained actual stripe blocks by using a redundancy algorithm, and adds the consistency tag information and the number DSC of valid stripe blocks to the N' actual stripe blocks and the M check stripe blocks.
  • the consistency tag information may be the same timestamp or version number information.
  • the client agent 30 stripes the target file to obtain the valid stripe blocks D1 and D2.
  • the number DSC of valid stripe blocks of the target file is 2. Comparing it with the number M of check stripe blocks, which is also 2, shows that DSC is equal to M, so the number N' of actual stripe blocks of the target file is 3, and one empty stripe block D3 needs to be added.
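
The construction of the N' actual stripe blocks in this example can be sketched as follows. This is an illustration under stated assumptions: the `StripeBlock` type and `build_actual_blocks` function are hypothetical, the write starts at offset 0, an empty stripe block is modeled as zero-length bytes, and the consistency tag is a plain integer version number. Generation of the M check stripe blocks is omitted because the source does not name the redundancy algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeBlock:
    data: bytes  # payload; b"" models an empty stripe block (assumption)
    tag: int     # consistency tag, e.g. a version number or timestamp
    dsc: int     # number of valid stripe blocks of the target file


def build_actual_blocks(payload: bytes, stripe_size: int, m: int, tag: int) -> List[StripeBlock]:
    """Split payload into valid stripe blocks and pad with empty blocks up to N'."""
    chunks = [payload[i:i + stripe_size] for i in range(0, len(payload), stripe_size)]
    dsc = len(chunks)
    n_actual = m + 1 if dsc <= m else dsc   # N' as described in the text
    chunks += [b""] * (n_actual - dsc)      # add the empty stripe blocks
    return [StripeBlock(chunk, tag, dsc) for chunk in chunks]


blocks = build_actual_blocks(b"x" * (160 * 1024), 128 * 1024, m=2, tag=1)
print([len(b.data) for b in blocks])
```

Every block carries the same tag and the same DSC, which is what later allows a reader to check that the responses it receives belong to one consistent write.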
  • the client agent 30 writes the actual stripe blocks and the check stripe blocks, each carrying the consistency tag information and the number DSC of valid stripe blocks, into the corresponding storage server nodes 101. The method of determining the storage server node 101 to which each stripe block should be written is similar to the existing implementation method and is not described in detail herein.
  • the client agent 30 saves the number N' of actual stripe blocks of the target file, the number M of check stripe blocks, and the distribution information of the stripe blocks to the metadata information table of the file, so as to facilitate reading of the target file.
  • the stripe blocks of the target file can then be read from the corresponding storage server nodes 101.
  • the client agent 30 implements operations such as storage of each stripe block through the backend switch 60; the specific implementation is the same as the existing implementation and is not explained further here.
  • when the number DSC of valid stripe blocks obtained by striping the target file is smaller than the number M of check stripe blocks of the distributed file storage system, the number N' of actual stripe blocks is equal to M+1. The value of M is generally small, so only a few empty stripe blocks need to be added.
  • when the number DSC of valid stripe blocks obtained by striping the target file is larger than the number M of check stripe blocks of the distributed file storage system, the number N' of actual stripe blocks is equal to the number of valid stripe blocks, so no empty stripe blocks need to be added.
  • the target file is small file data
  • the target file is striped to generate N' actual stripe blocks instead of the N data stripe blocks of the distributed file storage system used in the prior art. This reduces the number of stripe blocks, effectively reduces write operations for empty stripe blocks, reduces the operations performed by the distributed file storage system, and improves the performance of the distributed file storage system.
  • the performance improvement of distributed file storage systems is more obvious.
  • the following embodiment is described in detail by taking a data processing request as a data read request as an example, and the implemented method flow is as shown in FIG.
  • a data read request is a request for a user to read a target file.
  • the structure of the distributed file storage system 10 is the same as that used when the data processing request is a data write request, as shown in FIG
  • the description will be made by taking N as 6 and M as 2.
  • the values of N and M can also be set to other values as needed, and the implementation principles are the same and will not be described separately.
  • the client agent 30 is also deployed in the application server 20 as an example for description.
  • a data read request is initiated to the client agent 30 on the application server 20.
  • the data read request carries information such as a file identifier FID, an offset address offset, and a file length length of the target file.
  • the client agent 30 obtains the redundancy ratio information of the distributed file storage system 10 from the file system metadata information table according to the file identifier FID carried in the data read request.
  • the redundancy ratio information is the values of N and M.
  • M is the number of check strip blocks in the distributed file storage system in order to redundantly protect the stored file data. The specific value of M may be set according to service requirements.
  • N is the number of data strips that are sliced when the file data is striped.
  • N is calculated according to the number of storage server nodes and the value of M in the distributed file storage system, and may also be set to a fixed value according to business needs.
  • the distributed file storage system 10 can set a uniform value of N and M, and can also set different values of N and M for a certain directory, according to service requirements. In the present embodiment, a unified N and M will be described as an example.
  • N and M of the distributed file storage system 10 are stored in the file data metadata information table of the distributed file storage system 10.
  • M and N set for a directory are stored in the directory metadata table.
  • the value of N of the distributed file storage system 10 is 6, and the value of M is 2.
  • the client agent 30 on the application server 20 can obtain, according to the file identifier, the number N of data stripe blocks of the target file, the number M of check stripe blocks, and the size of the stripe block.
  • the client agent calculates the number DSC of the valid stripe blocks of the target file according to the offset address, the file length information, and the size of the stripe block carried in the data read request.
  • the specific calculation method is similar to the method in the data write request scheme, and will not be further described herein. In the following steps, the number of valid stripe blocks of the target file is 2, the number of actual stripe blocks is 3, and the number of check stripe blocks is 2.
  • the client agent 30 can look up the distribution information of the stripe blocks of the target file by using the file identifier FID carried in the data read request, and determine from the distribution information on which storage server nodes the actual stripe blocks and the check stripe blocks of the target file are stored. The storage server nodes storing the actual stripe blocks and the check stripe blocks of the target file may also be determined according to the calculated number N' of actual stripe blocks and the number of check stripe blocks.
  • the client agent 30 generates a new data block read request and sends the data block read request to the determined storage server node 101.
  • the data block read request is used to read a stripe block of the target file in the storage server node 101.
  • the client agent 30 may send a data block read request to all of the determined storage server nodes 101; the data block read request may also be sent to all of the storage server nodes 101 in the distributed file storage system 10.
  • the client agent 30 can also first send the data block read request to the storage server nodes 101 storing the actual stripe blocks of the target file and, when the responses fed back by those storage server nodes 101 show that the target file cannot be read correctly, then send the data block read request to the storage server nodes 101 storing the check stripe blocks of the target file.
  • the following description takes this case as an example: the data block read request is first sent to the storage server nodes 101 storing the actual stripe blocks of the target file, and is sent to the storage server nodes 101 storing the check stripe blocks of the target file only when the responses fed back by the former are insufficient.
  • after receiving the data block read request, the storage server node 101 determines, according to the file identifier carried in the data block read request, whether the corresponding stored stripe block is readable. If it is readable, the storage server node 101 sends the client agent a success response message carrying the consistency tag information of the stripe block and the number DSC of valid stripe blocks. If the corresponding stripe block is not stored, or is stored but corrupted and cannot be read, the storage server node 101 sends the client agent an unreadable failure response message.
  • the client agent 30 sends the data block read request to the storage server nodes 101 storing the actual stripe blocks of the target file, and receives the response to the data block read request returned by each storage server node 101. If the number of success response messages is equal to the number N' of actual stripe blocks, and the consistency tag information and the number of valid stripe blocks in the success response messages are all the same, the target file can be read: the client agent 30 reads the actual stripe blocks, constructs the target file, and sends it to the user. The way the client agent 30 reads the actual stripe blocks and constructs the target file is the same as the existing method and is not described here.
  • the client agent 30 searches the metadata information table according to the file identifier of the target file carried in the data read request, and obtains the number N of data stripe blocks of the target file, which is 6, the number M of check stripe blocks, which is 2, and the distribution information of the stripe blocks of the target file: the actual stripe block D1 is stored in the storage server node 1, the actual stripe block D2 in the storage server node 2, the actual stripe block D3 in the storage server node 3, the check stripe block D7 in the storage server node 7, and the check stripe block D8 in the storage server node 8.
  • the client agent 30 calculates the number of valid stripe blocks of the target file DSC and the number of actual stripe blocks of the target file N' based on the information carried in the data read request and the information obtained from the file identifier of the target file. For detailed calculation methods, please refer to the relevant description in the process of data write request.
  • the client agent 30 generates a new data block read request for reading the stripe blocks of the target file in the storage server nodes 101.
  • the client agent 30 sends the data block read request to the storage server nodes 1-3, which store the actual stripe blocks of the target file.
  • the storage server nodes 1-3 each return a readable success response message carrying the timestamp information of the stripe block and the number of valid stripe blocks.
  • the client agent 30 determines whether the timestamp information and the number of valid stripe blocks carried in all the response messages are the same. If they are the same, the client agent 30 reads the actual stripe blocks, constructs the target file, and sends it to the user. In the embodiment of the present invention, the client agent 30 only needs to read the stripe blocks with the same consistency tag information and number DSC of valid stripe blocks, whereas in the prior art N stripe blocks need to be read.
  • the larger the value of the number N of data stripe blocks, the more empty stripe blocks need to be operated on, and the heavier the burden on the performance of the distributed file storage system.
  • the read operation of the empty stripe block can be effectively reduced, and the performance of the entire distributed file storage system is improved.
  • if the number of success response messages fed back by the storage server nodes 101 storing the actual stripe blocks is smaller than the number N' of actual stripe blocks, or the number of success response messages carrying the same consistency tag information and number DSC of valid stripe blocks is smaller than N', it is further determined whether the number of success response messages, or the number of success response messages with the same consistency tag information and number DSC of valid stripe blocks, is greater than the number M of check stripe blocks of the distributed file storage system.
  • if it is greater than M, the client agent 30 judges whether the number of success response messages, or the number of success response messages with the same consistency tag information and number of valid stripe blocks, is greater than or equal to the number DSC of valid stripe blocks of the target file. If it is, the correct target file can be read, and the client agent 30 performs the corresponding processing.
  • the specific processing method is the same as the existing implementation method, and will not be further described herein.
  • if the number is less than DSC, the client agent 30 sends the data block read request to the storage server nodes 101 storing the check stripe blocks of the target file, and receives the response messages of those storage server nodes 101.
  • the client agent 30 further determines whether the number of all success response messages, or the number of success response messages with the same consistency tag information and number of valid stripe blocks, is greater than or equal to the number DSC of valid stripe blocks of the target file. If it is, the correct target file can be read at this time; if it is less than DSC, information that the read failed is fed back to the user.
  • if the number of success response messages, or the number of success response messages with the same consistency tag information and number of valid stripe blocks, is less than or equal to the number M of check stripe blocks of the distributed file storage system, the client agent 30 sends the data block read request to the storage server nodes 101 storing the check stripe blocks of the target file and receives the response messages of those storage server nodes 101. The client agent 30 then determines whether the number of all success response messages, or the number of success response messages with the same consistency tag information and number of valid stripe blocks, is greater than the number M of check stripe blocks of the distributed file storage system.
  • if it is greater than M, it is then determined whether the number of all success response messages, or the number of success response messages with the same consistency tag information and number of valid stripe blocks, is greater than or equal to the number DSC of valid stripe blocks of the target file. If it is, the correct target file can be read; if it is less than DSC, information that the read failed is fed back to the user.
  • if the number of all success response messages, or the number of success response messages with the same consistency tag information and number of valid stripe blocks, is smaller than the number M of check stripe blocks of the distributed file storage system, the client agent 30 feeds back to the user information that the read failed. The following is an example in which the number DSC of valid stripe blocks of the target file is 2, the number N' of actual stripe blocks is 3, and the number M of check stripe blocks is 2.
  • the valid stripe block D1 of the target file is stored in the storage server node 1, the valid stripe block D2 of the target file is stored in the storage server node 2, the empty stripe block D3 of the target file is stored in the storage server node 3, the check stripe block D7 is stored in the storage server node 7, and the check stripe block D8 is stored in the storage server node 8.
  • the client agent 30 first transmits the received data block read request of the user to the storage server node 1, the storage server node 2, and the storage server node 3.
  • the storage server node 2 feeds back a failure response message because the stripe block D2 is damaged, so only two stripe blocks respond successfully; that is, the client agent receives two success response messages, which is fewer than the number of actual stripe blocks of the target file, 3. The client agent 30 therefore needs to further compare the number of success response messages, 2, with the number of check stripe blocks of the distributed file storage system, 2.
  • the number of success response messages, 2, is equal to the number of check stripe blocks of the distributed file storage system, 2, so the client agent 30 sends the data block read request to the storage server node 7 and the storage server node 8. Both the storage server node 7 and the storage server node 8 feed back a success response message to the client agent 30.
  • the total number of success response messages with the same timestamp and number of valid stripe blocks received by the client agent 30 (the success response messages for the valid stripe block plus those for the check stripe blocks) is greater than the number M of check stripe blocks of the distributed file storage system, and is also greater than the number of valid stripe blocks of the target file, which is 2. The correct target file can therefore be read at this time, and the client agent 30 reads the stripe blocks D1, D7, and D8, constructs the target file, and sends it to the user.
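
The read decision flow walked through above can be sketched as follows. This is a simplified model rather than the implementation described in the text: the names `Response`, `consistent_count`, and `can_read` are hypothetical, a success response is modeled as a `(consistency tag, DSC)` tuple and a failure as `None`, and a final count that is not greater than M is treated as a read failure (the text says "smaller than M").

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# (consistency tag, DSC) for a success response, None for a failure response.
Response = Optional[Tuple[int, int]]


def consistent_count(responses: List[Response]) -> int:
    """Size of the largest group of success responses sharing tag and DSC."""
    groups = Counter(r for r in responses if r is not None)
    return max(groups.values(), default=0)


def can_read(actual: List[Response],
             fetch_check: Callable[[], List[Response]],
             n_actual: int, m: int, dsc: int) -> bool:
    """Decide whether the target file can be read correctly."""
    count = consistent_count(actual)
    if count == n_actual:            # all N' actual stripe blocks agree
        return True
    if count > m and count >= dsc:   # enough consistent actual blocks
        return True
    # Otherwise query the nodes holding the check stripe blocks as well.
    total = consistent_count(actual + fetch_check())
    if count > m:
        return total >= dsc
    return total > m and total >= dsc


# Example from the text: D2 on node 2 is damaged; D7 and D8 respond successfully.
actual = [(100, 2), None, (100, 2)]
check = lambda: [(100, 2), (100, 2)]
print(can_read(actual, check, n_actual=3, m=2, dsc=2))
```

Passing `fetch_check` as a callable mirrors the two-phase behavior in the text: the check stripe block nodes are contacted only when the actual stripe block responses are insufficient.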
  • the data processing request may also be a data deletion request or a data truncation request.
  • the implemented access flow is as shown in FIG. 5.
  • the structure of the distributed file storage system 10 applied is the same as that of other data processing requests, as shown in FIG.
  • the implementation method of the data deletion request is similar to the implementation method of the data truncation request.
  • the following describes the data deletion request as an example.
  • the implementation method of the data truncation request is not described separately.
  • the value of N of the distributed file storage system is 6, and the value of M is 2.
  • the values of N and M can also be set to other values as needed, and the implementation principles are the same and will not be described separately.
  • the client agent 30 is also deployed in the application server 20 as an example for description.
  • the user initiates a data deletion request to the client agent 30 on the application server 20. The file data that the user wants to delete is referred to as a target file, and the data deletion request is used to delete the stripe blocks of the target file stored in the storage server nodes.
  • the data deletion request carries the file identifier FID information of the target file.
  • the client agent 30 obtains the redundancy ratio information of the distributed file storage system and the distribution information of the stripe blocks from the file system metadata information table according to the file identifier FID carried in the data deletion request. The redundancy ratio information is the values of N and M. M is the number of check stripe blocks used in the distributed file storage system to redundantly protect the stored file data.
  • N is the number of data stripe blocks obtained when the file data is striped. N is calculated according to the number of storage server nodes and the value of M in the distributed file storage system, and may also be set to a fixed value according to business needs.
  • the distributed file storage system may set a uniform value of N and M, and may also set different values of N and M for a certain directory, according to service requirements. In this embodiment, a unified N and M are taken as an example for description.
  • N and M of the distributed file storage system are stored in the file data metadata information table of the distributed file storage system. M and N set for a directory are stored in the directory metadata table.
  • the description takes as an example the case in which the number M of check stripe blocks of the distributed file storage system is 2 and the number N of data stripe blocks of the distributed file storage system is 6.
  • the client agent sends a data deletion request to the corresponding storage server node 101 in the distributed file storage system 10 based on the obtained distribution information of the stripe blocks. If there is no stripe block of the target file or only an empty stripe block of the target file in the storage server node 101 that has received the data deletion request, a response message indicating that the deleted object does not exist is returned to the client agent 30. If the valid stripe block or the check strip block of the target file is stored in the storage server node 101 that has received the data deletion request, the stripe block is deleted and the response message of the successful deletion is returned to the client agent 30. If it cannot be deleted or not completely deleted, the response message of the deletion failure is fed back to the client agent 30.
  • after receiving the response message of each storage server node, the client agent 30 determines whether the sum of the number of response messages indicating that the deleted object does not exist and the number of deletion-success response messages is greater than or equal to the number N of data stripe blocks of the distributed file storage system. That is to say, the number of stripe blocks of the target file still stored in the distributed file storage system cannot exceed the number M of check stripe blocks of the distributed file storage system, so as to ensure that the target file can no longer be read from the distributed file storage system after it is deleted. If the sum is greater than or equal to the number N of data stripe blocks of the distributed file storage system, the client agent 30 returns a response message indicating that the deletion was successful to the user. Otherwise, the client agent 30 returns a response message to the user that the deletion failed.
  • after receiving the data deletion request, the client agent 30 searches the metadata information table according to the file identifier of the target file carried in the data deletion request, and obtains the number M of check stripe blocks of the distributed file storage system, which is 2, the number N of data stripe blocks of the distributed file storage system, which is 6, and the distribution information of the stripe blocks.
  • the client agent 30 sends the received data deletion request to the corresponding storage server nodes 101 in the distributed file storage system 10 according to the distribution information of the stripe blocks.
  • the storage server node 1 stores the valid stripe block D1 of the target file and the storage server node 2 stores the valid stripe block D2 of the target file; the client agent 30 sends the received data deletion request to the storage server node 1 and the storage server node 2.
  • the storage server node 1 and the storage server node 2 delete the stripe blocks D1 and D2, and return a success response message to the client agent 30 after successful deletion.
  • the storage server node 3 stores an empty stripe block of the target file, and the storage server node 4, the storage server node 5, and the storage server node 6 have no stripe blocks of the target file, and the storage server node 3-6 receives the data deletion request. Thereafter, a response message indicating that the deleted object does not exist is fed back to the client agent 30, respectively.
  • the storage server node 7 and the storage server node 8 store the target file check strip block, delete the corresponding strip block D7 and D8, and return a successful response message to the client agent 30 after successful deletion.
  • the client agent 30 receives, from the storage server nodes 101, 4 response messages indicating successful deletion and 4 response messages indicating that the deleted object does not exist. The 8 messages in total exceed the number N of data stripe blocks of the distributed file storage system, so the client agent 30 returns a deletion-success response to the user. When the data processing request is a data deletion request, the client agent returns a deletion-success message to the user only when the sum of the number of deletion-success response messages and the number of response messages indicating that the deleted object does not exist is greater than or equal to the number N of data stripe blocks of the distributed file storage system.
  • in the data deletion operation, the comparison is made against the number N of data stripe blocks of the distributed file storage system, whereas for data write requests and data read requests the comparison is made against the number N' of actual stripe blocks of the target file. This is mainly to confirm that the stripe blocks of the target file have been deleted and to ensure that the target file cannot be read again.
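
The deletion success rule described above can be sketched as follows. The `DeleteResponse` enum and the `deletion_succeeded` function are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Iterable


class DeleteResponse(Enum):
    DELETED = "deleted"        # stripe block deleted successfully
    NOT_FOUND = "not_found"    # deleted object does not exist on this node
    FAILED = "failed"          # stripe block could not be (fully) deleted


def deletion_succeeded(responses: Iterable[DeleteResponse], n: int) -> bool:
    """Deletion succeeds when DELETED + NOT_FOUND responses number at least N."""
    acceptable = sum(
        1 for r in responses
        if r in (DeleteResponse.DELETED, DeleteResponse.NOT_FOUND)
    )
    return acceptable >= n


# Example from the text: nodes 1, 2, 7, 8 delete a block; nodes 3-6 report no object.
responses = [DeleteResponse.DELETED] * 4 + [DeleteResponse.NOT_FOUND] * 4
print(deletion_succeeded(responses, n=6))
```

Because the system has N + M stripe block positions, at least N acceptable responses means at most M stripe blocks can remain, which is too few to reconstruct the target file.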
  • the method of deleting the stripe blocks removed by a truncation is the same as the method of deleting stripe blocks for a data deletion request, and is not further described herein.
  • a redundancy protection mechanism of N'+M is adopted; that is, when the target file is striped, a number of stripe blocks that depends on the file size is generated, namely the number N' of actual stripe blocks.
  • the present invention also provides an apparatus for implementing a data processing method in a distributed file storage system.
  • the device may be a client agent.
  • the device may be deployed in an application server connected to the distributed file storage system, or may be deployed in each storage server node in the distributed file storage system, as shown in FIG.
  • the distributed file storage system 10 includes a plurality of storage servers 101, and the storage servers 101 are interconnected by a low-latency, high-throughput network (e.g., IB network, 10G Ethernet).
  • the distributed file storage system 10 also includes a Front-End switch 50 and a Back-End switch 60.
  • the front end switch 50 is used for external service request and data interaction between the user data and the cluster.
  • the backend switch 60 is used for internal request and data interaction between the various storage server nodes 101 within the cluster.
  • the application server 20 communicates with the distributed file storage system 10 via the front end switch 50.
  • when the device is deployed in the application server 20 connected to the distributed file storage system 10, each application directly accesses, through a standard portable operating system interface (Portable Operating System Interface, POSIX), the file system client agent (Client Agent, CA) 30 deployed on the application server 20.
  • the client agent 30 serves as a portal for the distributed file storage system 10 to provide external services, and then interacts with the storage server 101 inside the cluster after receiving the application request.
  • each application accesses the corresponding network attached storage server (NAS Server) through a commonly used NAS protocol (such as NFS/CIFS), and the NAS Server is deployed together with the storage server.
  • the NAS Server then accesses the file system client agent deployed on the server node to implement the storage service.
  • the first access mode is specifically described, and the second access mode adopts a similar implementation principle.
  • the following uses the device as a client agent as an example for description.
  • the service system includes two application servers 20, and the application server 20 communicates with the distributed file storage system 10 through the front-end switch 50.
  • the client agent 30 is deployed in the application server 20, and the user's data processing request is first sent to the client agent 30 of the application server 20, and the client agent 30 performs corresponding processing on the data processing request.
  • when the client agent 30 is deployed in a storage server node 101, the application server 20 transmits the data processing request through the front end switch 50 to the client agent 30 in the corresponding storage server node 101, and the data processing request is processed by the client agent 30 in that storage server node 101.
  • the storage servers 101 in the distributed file storage system 10 are interconnected to form a cluster, which constitutes a large-scale network RAID, and the stored data adopts an N+M redundancy protection mechanism.
  • M is the number of check strip blocks that the distributed file storage system 10 uses to redundantly protect the stored file data.
  • the specific value of the M may be set to a fixed value according to service requirements.
  • N is the number of data strips that are split when the file data is striped.
  • N is calculated according to the number of storage server nodes and the value of M of the distributed file storage system 10, and may also be set to a fixed value according to service requirements.
  • the distributed file storage system 10 can set a uniform value of N and M, and can also set different values of N and M for a certain directory, according to service requirements.
  • the N and M of the distributed file storage system are stored in a file data metadata information table of the distributed file storage system.
  • for clarity, a piece of file data to be processed is called a stripe.
  • when a stripe is striped, it is divided into N data strip blocks, and M check strip blocks are generated according to the redundancy algorithm.
  • a strip block is thus one of the pieces into which the file data to be processed is divided.
  • the client agent 30 includes a receiving module 301, a processing module 303, and a sending module 305.
  • the receiving module 301 of the client proxy 30 is configured to receive a data processing request sent by the application, where the data processing request carries information such as a file identifier FID, an offset address offset, and a file length length of the target file.
  • the target file is a file to be processed.
  • the processing module 303 calculates the number of valid strip blocks (DSC) of the file data according to the offset address and length information carried in the data processing request.
  • a valid stripe block is a strip of data strips containing file data.
  • the number of valid stripe blocks that are sliced when the file data is striped is smaller than the number N of data strips of the distributed file storage system.
  • when the client agent 30 is deployed on a storage server node 101 of the distributed file storage system 10, the application server 20, after receiving the data processing request, sends the data processing request through the front end switch 50 to the client agent 30 on a storage server 101.
  • the method by which the application server 20 sends the data processing request to the client agent 30 on the storage server 101 is similar to the existing method and will not be described in detail herein.
  • the method of the client agent 30 on the storage server 101 for the data processing request is similar to that of the client agent 30 on the application server 20 and will not be described again.
  • when the number DSC of valid strip blocks is greater than the number M of check strip blocks, that is, when DSC > M, the number of actual strip blocks N' equals DSC and the number of empty strip blocks is 0; no empty strip blocks need to be added, which reduces the number of empty strip blocks in the distributed file storage system.
  • the number of valid strip blocks may equal the number N of data strip blocks of the distributed file storage system; in this case the number of empty strip blocks is also 0, that is, no empty strip blocks need to be added.
  • using the number of actual strip blocks N' instead of the number N of data strip blocks of the distributed file storage system effectively reduces the number of empty strip blocks in the distributed file storage system.
  • the read and write I/O operations on empty strip blocks are reduced, and the performance of the distributed file storage system is improved.
  • Corresponding processing is performed on N' actual stripe blocks.
  • the specific processing method differs according to the type of data processing request.
  • the sending module 305 is configured to send the result processed by the processing module 303 to the user.
  • the receiving module 301 is configured to receive the data write request.
  • the processing module 303 performs striping processing on the target file according to the offset address, the file length information, and the size of the obtained stripe block, and obtains the number DSC of the effective stripe blocks of the target file.
  • the processing module 303 compares the number DSC of the valid strip blocks sliced when the target file is stripped with the obtained number of the check strip blocks.
  • when the number DSC of valid strip blocks is less than or equal to the number M of check strip blocks, that is, when DSC ≤ M, the number of actual strip blocks N' is M+1.
  • the processing module 303 generates M check strip blocks from the obtained actual strip blocks by using a redundancy algorithm, and adds the consistency label information and the number DSC of valid strip blocks to the N' actual strip blocks and the M check strip blocks.
  • the consistency label information may be a shared timestamp or version number.
  • the processing module 303 writes the actual stripe block and the check strip block carrying the consistency label information and the number of valid stripe blocks DSC into the corresponding storage server node.
  • the method of how to confirm the storage server node that each stripe block should be written is similar to the existing implementation method, and will not be described in detail here.
  • the processing module 303 saves the number N' of actual strip blocks of the target file, the number M of check strip blocks, and the distribution information of each strip block into the metadata information table of the file, so that when the target file is read, its strip blocks can be read from the corresponding storage server nodes.
  • when the number DSC of valid strip blocks produced by striping the target file is less than or equal to the number M of check strip blocks of the distributed file storage system, the number of actual strip blocks N' is M+1; because M is generally small, only a few empty strip blocks need to be added. When DSC is greater than M, the number of actual strip blocks N' equals the number of valid strip blocks, and no empty strip blocks need to be added.
  • when the target file is small file data, striping it produces N' actual strip blocks rather than the number N of data strip blocks of the distributed file storage system used in the prior art; the smaller number of strip blocks effectively reduces write operations on empty strip blocks, reduces the I/O operations of the distributed file storage system 10, and improves the performance of the distributed file storage system 10.
  • the performance improvement of the distributed file storage system 10 is more obvious.
  • the file data to be read by the user is referred to as a target file.
  • the data read request is a request to read the strip blocks of the target file stored in the corresponding storage server nodes and to reconstruct the original target file from them.
  • the receiving module 301 is configured to receive the data read request, where the data read request carries information such as the file identifier FID, the offset address offset, and the file length length of the target file.
  • the processing module 303 obtains redundancy ratio information of the distributed file storage system from the file system metadata information table according to the file identifier FID carried in the data read request, where the redundancy ratio information consists of the values of N and M.
  • M is the number of check strip blocks in the distributed file storage system in order to redundantly protect the stored file data. The specific value of M may be set according to service requirements.
  • N is the number of data strips that are split when the file data is striped. N is calculated according to the number of storage server nodes and the value of M in the distributed file storage system, and can also be fixed according to business needs.
  • the distributed file storage system may set a uniform value of N and M, and may also set different values of N and M for a certain directory, according to service requirements.
  • a unified N and M are taken as an example for description.
  • the N and M of the distributed file storage system are stored in a file data metadata information table of the distributed file storage system.
  • M and N set for a directory are stored in the directory metadata table.
  • the processing module 303 can find the number N of data strip blocks of the target file, the number M of the check strip blocks, and the size of the strip block according to the file identifier.
  • the processing module 303 calculates the number DSC of the valid stripe blocks of the target file according to the offset address, the file length information, and the size of the stripe block carried in the data read request.
  • the specific calculation method is similar to the method in the data write request scheme, and will not be further described herein.
  • the processing module 303 may look up the distribution information of the strip blocks of the target file by using the file identifier carried in the data read request, and determine, according to the distribution information, the storage server nodes on which the actual strip blocks and the check strip blocks of the target file are stored.
  • the processing module 303 can send a data read request to all of the determined storage server nodes; the data read request can also be sent to all storage server nodes in the distributed file storage system.
  • the processing module 303 may also first send the data read request only to the storage server nodes storing the actual strip blocks of the target file and, if the responses fed back by those nodes do not allow the target file to be read correctly, then send the data read request to the storage server nodes storing the check strip blocks of the target file.
  • the last case is used for the description here: the processing module 303 first sends the data read request to the storage server nodes storing the actual strip blocks of the target file, and only when the target file cannot be read correctly from their responses is the data read request sent to the storage server nodes storing the check strip blocks of the target file.
  • the processing module generates a new data block read request and sends the data block read request to the determined storage server node 101.
  • the data block read request is for reading a strip block of the target file in the storage server node 101.
  • after receiving the data block read request, the storage server node determines, according to the file identifier carried in the data block read request, whether the corresponding stored strip block is readable. If it is readable, the node sends the processing module 303 a success response message that carries the consistency label information of the strip block and the number DSC of valid strip blocks; if the corresponding strip block is not stored, or is damaged and cannot be read, the node sends the processing module 303 a failure response message.
  • the processing module 303 sends the data block read request to the storage server nodes storing the actual strip blocks of the target file, and receives the response to the data block read request returned by each storage server node. If the number of success response messages equals the number N' of actual strip blocks, and the consistency label information and the number DSC of valid strip blocks are the same in all success response messages, the target file can be read; the processing module 303 then reads the actual strip blocks and constructs the target file for transmission to the user. The processing module 303 reads the actual strip blocks and constructs the target file in the same manner as the existing method, which is not described here.
  • if the number of success response messages fed back by the storage server nodes storing the actual strip blocks, or the number of those success response messages that carry the same consistency label information and the same number DSC of valid strip blocks, is smaller than the number N' of actual strip blocks, the processing module 303 further determines whether that number is greater than the number M of check strip blocks of the distributed file storage system.
  • if that number is greater than M, the processing module 303 determines whether it is also greater than or equal to the number DSC of valid strip blocks of the target file. If so, the correct target file can be read, and the processing module 303 performs the corresponding processing; the specific processing method is the same as the existing implementation and is not described further here.
  • otherwise, the processing module 303 sends the data block read request to the storage server nodes storing the check strip blocks of the target file and receives their response messages.
  • the processing module 303 then determines whether the number of all success response messages, or the number of all success response messages that carry the same consistency label information and the same number DSC of valid strip blocks, is greater than or equal to the number DSC of valid strip blocks of the target file. If so, the correct target file can be read; if not, read-failure information is fed back to the user.
  • if the number of success response messages, or the number of success response messages that carry the same consistency label information and the same number DSC of valid strip blocks, is less than or equal to the number M of check strip blocks of the distributed file storage system, the processing module 303 sends the data block read request to the storage server nodes storing the check strip blocks of the target file and receives their response messages. The processing module 303 then determines whether the number of all success response messages, counted in the same way, is greater than the number M of check strip blocks of the distributed file storage system.
  • if the number of all success response messages is greater than M, the processing module 303 determines whether it is also greater than or equal to the number DSC of valid strip blocks of the target file. If so, the correct target file can be read; if not, read-failure information is fed back to the user. If the number of all success response messages is not greater than M, read-failure information is fed back to the user.
  • the data processing request may also be a data deletion request or a data truncation request.
  • the implementation method of the data deletion request is similar to the implementation method of the data truncation request.
  • the implementation method of the data truncation request is not described separately.
  • the file data that the user wants to delete is called the target file, and the data deletion request is used to delete the strip blocks of the target file stored in each storage server node.
  • the data deletion request carries the file identifier FID information of the target file.
  • the receiving module 301 receives the data deletion request; the processing module 303 obtains, from the file system metadata information table and according to the file identifier FID carried in the received data deletion request, the redundancy ratio information of the distributed file storage system, that is, the values of N and M, and the distribution information of the strip blocks.
  • M is the number of check strip blocks in the distributed file storage system in order to redundantly protect the stored file data.
  • the specific value of M may be set according to service requirements.
  • N is the number of data strips that are split when the file data is striped.
  • N is calculated according to the number of storage server nodes and the value of M in the distributed file storage system, and may also be set to a fixed value according to service requirements.
  • the distributed file storage system may set a uniform value of N and M, and may also set different values of N and M for a certain directory, according to service requirements. In this embodiment, a unified N and M are taken as an example for description.
  • the N and M of the distributed file storage system are stored in a file data metadata information table of the distributed file storage system.
  • M and N set for a directory are stored in the directory metadata table.
  • the processing module 303 sends a data deletion request to the corresponding storage server nodes in the distributed file storage system according to the obtained distribution information of the strip blocks. If a storage server node that receives the data deletion request holds no strip block of the target file, or holds only an empty strip block of the target file, it returns to the processing module 303 a response message indicating that the object to be deleted does not exist. If the node holds a valid strip block or a check strip block of the target file, it deletes the strip block and returns a deletion-success response message to the processing module 303. If the strip block cannot be deleted or is not completely deleted, the node returns a deletion-failure response message to the processing module 303.
  • the processing module 303 determines whether the sum of the received object-does-not-exist response messages and deletion-success response messages is greater than or equal to the number N of data strip blocks of the distributed file storage system. In other words, the number of strip blocks of the target file that remain in the distributed file storage system must not exceed the number M of check strip blocks, which ensures that the target file can no longer be read from the distributed file storage system after it is deleted. If the sum is greater than or equal to N, a deletion-success response message is returned to the user; otherwise, a deletion-failure response message is returned to the user.
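The deletion rule described above can be sketched as follows. This is only an illustration under stated assumptions: the `DeleteReply` enum and the function name `deletion_succeeded` are hypothetical names introduced here, and each storage server node is assumed to return exactly one of the three reply types.

```python
from enum import Enum
from typing import Iterable


class DeleteReply(Enum):
    """Hypothetical reply types returned by a storage server node."""
    DELETED = "deleted"        # strip block was deleted successfully
    NOT_FOUND = "not_found"    # no strip block (or only an empty one) on the node
    FAILED = "failed"          # strip block could not be (fully) deleted


def deletion_succeeded(replies: Iterable[DeleteReply], n: int) -> bool:
    """Deletion counts as successful when at least N nodes confirm that no
    strip block of the target file remains on them, so at most M blocks
    can survive and the file can no longer be reconstructed."""
    confirmed = sum(
        1 for r in replies if r in (DeleteReply.DELETED, DeleteReply.NOT_FOUND)
    )
    return confirmed >= n
```

With N = 6 and M = 2, for example, three successful deletions plus three object-does-not-exist replies are enough even if the two remaining nodes report a failure.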
  • when writing a file into the distributed file storage system, the device provided in the embodiment of the present invention adopts an N'+M redundancy protection mechanism according to the size of the file to be written; that is, striping the target file generates a number of strip blocks, the number of actual strip blocks N', that depends on the size of the target file. This ensures that the correct target file can be obtained under any circumstances, effectively reduces the number of empty strip blocks in the distributed file storage system, reduces the disk I/O and network I/O of the distributed file storage system, and improves its performance.
  • if the function is implemented in the form of computer software and sold or used as a stand-alone product, all or part of the technical solution of the present invention (for example, the part contributing over the prior art) may be considered to be embodied in the form of a computer software product.
  • the computer software product is typically stored in a computer-readable non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Abstract

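The present invention relates to a data processing method and device 30 in a distributed file storage system. A client agent 30 receives a data processing request from a user, the data processing request carrying information such as the file identifier, offset address, and file length of a target file; the client agent 30 obtains redundancy ratio information according to the file identifier carried in the data processing request, the redundancy ratio information including the number N of data strip blocks of the distributed file storage system and the number M of check strip blocks of the distributed file storage system; the number DSC of valid strip blocks of the target file is determined according to the offset address and length information carried in the data processing request; the number N' of actual strip blocks of the target file is determined according to the number DSC of valid strip blocks and the number M of check strip blocks; and the corresponding strip blocks are determined according to the number N' of actual strip blocks and processed. By dynamically adjusting, according to the size of the file to be processed, the number of actual strip blocks generated when the target file is striped, the method ensures that the correct target file can be obtained in all cases while reducing the number of empty strip blocks in the distributed file storage system 10, which saves a large amount of network read/write I/O and disk read/write I/O in small-file scenarios and improves the performance of the distributed storage system 10.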

Description

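Data processing method and device in a distributed file storage system
Technical Field
The present invention relates to the field of storage technologies, and in particular to a data processing method and device in a distributed file storage system.
Background
With the development of computer and network technologies and the growing role of information in daily life, users need ever more storage capacity and ever higher storage performance. Storage systems have evolved from memory built into computers, to independent storage systems such as storage arrays and Network Attached Storage (NAS), and then to large distributed file storage systems. As digitization has advanced, the stored objects have shifted from mainly structured data to mainly unstructured file data such as pictures and short videos. This places higher demands on the access performance of file data in storage systems, and improving the access performance of large distributed file storage systems has become a primary task in the storage field.
A distributed file storage system contains multiple storage servers. The storage servers are interconnected through a low-latency, high-throughput network (for example, an IB network or 10G Ethernet) to form a cluster that constitutes a large network RAID (Redundant Array of Inexpensive Disks), and all storage servers provide data read and write services externally at the same time. When file data is stored in the distributed file storage system, a cross-node RAID algorithm (for example, RAID5, RAID6, or RAIDZ), an erasure code (forward error correction) algorithm, or a similar algorithm is used to stripe the file data, that is, to split the file data into multiple data strip blocks (strips) and generate the corresponding check strip blocks; the data strip blocks and check strip blocks are then stored on the storage servers of the corresponding nodes. When stored file data is read, a certain number of data strip blocks and check strip blocks are read from the storage server nodes, and the original file data that the user needs is reconstructed from them.
As cluster sizes in distributed file storage systems grow, the number of data strip blocks produced when file data is striped also grows in order to improve the space utilization of the whole system, and the disk I/O and network I/O of read and write operations increase accordingly. In small-file scenarios, this larger number of data strip blocks places a considerable burden on the access performance of the distributed file storage system.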
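Summary of the Invention
In view of this, the technical problem to be solved by the present invention is how to improve the access performance of a distributed file storage system in small-file scenarios.
To achieve the foregoing objective, the embodiments of the present invention adopt the following technical solutions:
A first aspect of the present invention provides a data processing method applied to a distributed file storage system, the method comprising: a client agent receives a data processing request from a user, where the data processing request carries information such as the file identifier, offset address, and file length of a target file, and the target file is the file to be processed by the data processing request; the client agent obtains redundancy ratio information according to the file identifier of the target file carried in the data processing request, where the redundancy ratio information includes the number N of data strip blocks of the distributed file storage system and the number M of check strip blocks of the distributed file storage system; the number DSC of valid strip blocks of the target file is determined according to the offset address and length information of the target file carried in the data processing request, where a valid strip block is a strip block that contains data of the target file; the number N' of actual strip blocks of the target file is determined according to the number DSC of valid strip blocks and the number M of check strip blocks; and the corresponding strip blocks are determined according to the number N' of actual strip blocks and processed.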
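The source does not give the formula for deriving DSC from the offset and length. The following is a minimal sketch under explicit assumptions: strip blocks have a fixed size `strip_size`, the file data occupies the bytes from the start of a single stripe up to `offset + length`, and the result is capped at N. The function name and the parameter `strip_size` are hypothetical.

```python
def valid_strip_count(offset: int, length: int, strip_size: int, n: int) -> int:
    """Number of data strip blocks that contain file data (DSC).

    Assumption (not from the source): the data fills bytes
    [0, offset + length) of one stripe, so every strip block up to the
    last written byte counts as valid, and DSC never exceeds N.
    """
    if strip_size <= 0 or n <= 0:
        raise ValueError("strip_size and n must be positive")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    end = offset + length
    blocks_touched = -(-end // strip_size)   # ceiling division without floats
    return min(n, blocks_touched)
```

For instance, a 200 KiB file written at offset 0 with 128 KiB strip blocks touches two strip blocks under these assumptions.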
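With reference to the first aspect, in one possible implementation, determining the number N' of actual strip blocks of the target file according to the number DSC of valid strip blocks and the number M of check strip blocks is specifically: if the number DSC of valid strip blocks is less than or equal to the number M of check strip blocks, the number N' of actual strip blocks of the target file is the number of check strip blocks plus 1, that is, N' = M + 1; if the number DSC of valid strip blocks is greater than the number M of check strip blocks, the number N' of actual strip blocks of the target file equals the number DSC of valid strip blocks, that is, N' = DSC.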
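The rule for N' stated above is small enough to write down directly. The sketch below uses hypothetical function names and also derives how many empty strip blocks have to be added to reach N'.

```python
def actual_strip_count(dsc: int, m: int) -> int:
    """N' = M + 1 when DSC <= M, otherwise N' = DSC."""
    if dsc < 0 or m < 1:
        raise ValueError("dsc must be >= 0 and m must be >= 1")
    return m + 1 if dsc <= m else dsc


def empty_strip_count(dsc: int, m: int) -> int:
    """Empty strip blocks that pad the valid strip blocks up to N'."""
    return actual_strip_count(dsc, m) - dsc
```

With M = 2, a file with one valid strip block gets N' = 3 (two empty strip blocks), while a file with four valid strip blocks gets N' = 4 and no empty strip blocks.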
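With reference to the first aspect and the foregoing possible implementation, in another possible implementation, there may be multiple sets of the number M of check strip blocks and the number N of data strip blocks, each stored in the corresponding directory information table.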
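With reference to the first aspect and the foregoing possible implementations, in another possible implementation, when the data processing request is a data write request, determining the corresponding strip blocks according to the number N' of actual strip blocks and processing them further includes:
striping the target file to obtain N' actual strip blocks, and generating M check strip blocks by using a redundancy algorithm;
adding consistency label information and the number DSC of valid strip blocks to the N' actual strip blocks and the M check strip blocks, where the consistency label information may be a timestamp or a version number;
writing the N' actual strip blocks and the M check strip blocks to the corresponding storage server nodes.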
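The write steps above can be illustrated with a small sketch. It is not the patent's implementation: the `StripBlock` type and function names are hypothetical, the data is assumed to fit in one stripe, and a single byte-wise XOR parity block stands in for the unspecified redundancy algorithm, so the sketch only supports M = 1 (a real system with larger M would need an erasure code such as Reed-Solomon).

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass(frozen=True)
class StripBlock:
    kind: str       # "data" (actual strip block) or "check"
    index: int      # position within its kind
    payload: bytes
    label: int      # consistency label: timestamp or version number
    dsc: int        # number of valid strip blocks of the file


def xor_parity(payloads: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized payloads (stand-in redundancy algorithm)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*payloads))


def stripe_for_write(data: bytes, strip_size: int, n: int, m: int,
                     label: int) -> List[StripBlock]:
    """Split data into N' actual strip blocks plus M check blocks, all tagged."""
    if m != 1:
        raise NotImplementedError("this sketch only implements XOR parity (M = 1)")
    if strip_size <= 0:
        raise ValueError("strip_size must be positive")
    dsc = -(-len(data) // strip_size)            # ceiling division
    if dsc > n:
        raise ValueError("data does not fit into one stripe")
    n_actual = m + 1 if dsc <= m else dsc        # N' rule from the text
    # Blocks beyond the data are empty strip blocks (zero-filled padding).
    payloads = [
        data[i * strip_size:(i + 1) * strip_size].ljust(strip_size, b"\0")
        for i in range(n_actual)
    ]
    blocks = [StripBlock("data", i, p, label, dsc) for i, p in enumerate(payloads)]
    blocks.append(StripBlock("check", 0, xor_parity(payloads), label, dsc))
    return blocks
```

Every block carries the same label and DSC, which is what lets a reader later check that the blocks it collected belong to the same write.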
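With reference to the first aspect and the foregoing possible implementations, in another possible implementation, when the data processing request is a data read request, the method further includes obtaining the distribution information of the strip blocks of the target file according to the file identifier; determining the corresponding strip blocks according to the number N' of actual strip blocks and processing them is specifically:
generating a new data block read request, where the data block read request is used to read the strip blocks of the target file in the storage server nodes; sending the data block read request, according to the obtained distribution information of the strip blocks of the target file, to the storage server nodes storing the actual strip blocks; receiving response messages from the storage server nodes storing the actual strip blocks, where a response message is either a success response message indicating that the block can be read or a failure response message indicating that it cannot, and a success response message carries the consistency label information of the actual strip block and the number DSC of valid strip blocks; and determining, according to the received response messages, whether the target file can be read.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, determining, according to the received response messages, whether the target file can be read is specifically: if the number of received success response messages equals the number N' of actual strip blocks, and the consistency label information and the number DSC of valid strip blocks carried in the success response messages are all the same, the target file can be read;
if the number of received success response messages is smaller than the number N' of actual strip blocks, it is determined whether the number of received success response messages is greater than the number M of check strip blocks; if the number of received success response messages is greater than the number M of check strip blocks, it is determined whether the number of success response messages is greater than or equal to the number DSC of valid strip blocks of the target file and whether the consistency label information and the number DSC of valid strip blocks carried in the success response messages are all the same; if so, the target file can be read; otherwise, the data block read request is sent, according to the obtained distribution information, to the storage server nodes storing the check strip blocks; if the number of success response messages returned by the storage server nodes storing the check strip blocks is greater than or equal to the number DSC of valid strip blocks of the target file, and the consistency label information and the number DSC of valid strip blocks carried in the success response messages are all the same, the target file can be read; if the number of success response messages returned by the storage server nodes storing the check strip blocks is smaller than the number DSC of valid strip blocks of the target file, or the consistency label information and the number DSC of valid strip blocks carried in the success response messages are not the same, the target file cannot be read.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, if the number of received success response messages is less than or equal to the number M of check strip blocks, the data block read request is sent, according to the obtained distribution information, to the storage server nodes storing the check strip blocks;
response messages returned by the storage server nodes storing the check strip blocks are received; and whether the target file can be read is determined according to the response messages returned by the storage server nodes storing the check strip blocks.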
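The read decision above can be sketched as a two-phase check. This is a simplified illustration rather than the patent's exact procedure: a success response is modeled as a `(label, DSC)` tuple, a failure as `None`, "responses with the same consistency label and DSC" is approximated by the largest group of identical tuples, and the second phase follows the detailed embodiment (all success responses must number more than M and at least DSC). All names are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

Tag = Tuple[int, int]        # (consistency label, DSC) from a success response
Response = Optional[Tag]     # None models a failure response


def consistent_count(responses: Sequence[Response]) -> int:
    """Size of the largest group of success responses sharing one tag."""
    tags = [r for r in responses if r is not None]
    return Counter(tags).most_common(1)[0][1] if tags else 0


def can_read(actual_responses: Sequence[Response],
             fetch_check_responses: Callable[[], Sequence[Response]],
             n_actual: int, m: int, dsc: int) -> bool:
    """Decide whether the target file can be read.

    Phase 1 uses only the nodes holding actual strip blocks; the nodes
    holding check strip blocks are queried (via the callable) only if
    phase 1 is not sufficient.
    """
    k = consistent_count(actual_responses)
    if k == n_actual:                 # every actual strip block is readable
        return True
    if k > m and k >= dsc:            # enough consistent blocks already
        return True
    # Phase 2: also ask the nodes that store the check strip blocks.
    total = consistent_count(list(actual_responses) + list(fetch_check_responses()))
    return total > m and total >= dsc
```

Passing the check-node query as a callable mirrors the point of the scheme: in the common case the check strip blocks are never read at all.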
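A second aspect of the present invention provides a device 30 for implementing the data processing method in a distributed file storage system, where the device 30 communicates with the storage server nodes 101 in the distributed file storage system 10, and the device includes a receiving module 301, a processing module 303, and a sending module 305:
the receiving module 301 is configured to receive a data processing request from a user, where the data processing request carries information such as the file identifier, offset address, and file length of a target file, and the target file is the file to be processed by the data processing request;
the processing module 303 is configured to:
obtain redundancy ratio information from the storage server nodes according to the file identifier of the target file carried in the data processing request, where the redundancy ratio information includes the number N of data strip blocks of the distributed file storage system and the number M of check strip blocks of the distributed file storage system;
determine the number DSC of valid strip blocks of the target file according to the offset address and length information of the target file carried in the data processing request, where a valid strip block is a strip block that contains data of the target file;
determine the number N' of actual strip blocks of the target file according to the number DSC of valid strip blocks and the number M of check strip blocks;
determine the corresponding strip blocks according to the number N' of actual strip blocks and process them;
the sending module 305 is configured to feed back the processing result to the user.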
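With reference to the second aspect, in one possible implementation, the device 30 is deployed in an application server 20 connected to the distributed file storage system 10 or in a storage server node 101 of the distributed file storage system 10.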
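With reference to the second aspect and the foregoing possible implementation, in another possible implementation, determining the number N' of actual strip blocks of the target file according to the number DSC of valid strip blocks and the number M of check strip blocks is specifically:
if the number DSC of valid strip blocks is less than or equal to the number M of check strip blocks, the number N' of actual strip blocks of the target file is the number of check strip blocks plus 1, that is, N' = M + 1;
if the number DSC of valid strip blocks is greater than the number M of check strip blocks, the number N' of actual strip blocks of the target file equals the number DSC of valid strip blocks, that is, N' = DSC.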
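With reference to the second aspect and the foregoing possible implementations, in another possible implementation, when the data processing request is a data write request, determining the corresponding strip blocks according to the number N' of actual strip blocks and processing them further includes:
striping the target file to obtain N' actual strip blocks, and generating M check strip blocks by using a redundancy algorithm;
adding consistency label information and the number DSC of valid strip blocks to the N' actual strip blocks and the M check strip blocks, where the consistency label information may be a timestamp or a version number;
writing the N' actual strip blocks and the M check strip blocks to the corresponding storage server nodes.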
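With reference to the second aspect and the foregoing possible implementations, in another possible implementation, when the data processing request is a data read request, determining the corresponding strip blocks according to the number N' of actual strip blocks and processing them further includes:
generating a new data block read request, where the data block read request is used to read the strip blocks of the target file in the storage server nodes; sending the data block read request, according to the obtained distribution information of the strip blocks of the target file, to the storage server nodes storing the actual strip blocks;
receiving response messages from the storage server nodes storing the actual strip blocks, where a response message is either a success response message indicating that the block can be read or a failure response message indicating that it cannot, and a success response message carries the consistency label information of the actual strip block and the number DSC of valid strip blocks;
determining, according to the received response messages, whether the target file can be read.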
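With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the receiving module is further configured to receive a data deletion request from a user, where the data deletion request carries the file identifier of a target file, and the target file is the file to be deleted;
the processing module obtains redundancy ratio information from the storage server nodes according to the file identifier, where the redundancy ratio information includes the number N of data strip blocks of the distributed file storage system and the number M of check strip blocks of the distributed file storage system;
determines, according to the number N of data strip blocks of the distributed file storage system, the storage server nodes storing the strip blocks of the target file;
sends the data deletion request to the storage server nodes storing the strip blocks of the target file;
receives response messages from the storage server nodes storing the strip blocks of the target file, where a response message is one of a deletion-success response message, a response message indicating that the object to be deleted does not exist, and a deletion-failure response message;
when the number of received deletion-success response messages and object-does-not-exist response messages exceeds the number N of data strip blocks, the deletion succeeds; otherwise, the deletion fails;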
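the sending module is configured to feed back the deletion success or deletion failure result to the user.
The embodiments of the present invention determine the valid strip blocks according to the size of the target file and then determine the actual strip blocks of the target file. While ensuring that the correct target file can be obtained in all cases, this also reduces the number of empty strip blocks in the distributed file storage system 10, which saves a large amount of network read/write I/O and disk read/write I/O in small-file scenarios and improves the performance of the distributed storage system 10.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic structural diagram of a distributed file storage system;
Fig. 2 is a schematic diagram of the distribution of strip blocks;
Fig. 3 is a schematic flowchart of a method for implementing a data processing request in an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method in an embodiment of the present invention when the data processing request is a data write request;
Fig. 5 is a schematic flowchart of a method in an embodiment of the present invention when the data processing request is a data read request;
Fig. 6 is a schematic flowchart of a method in an embodiment of the present invention when the data processing request is a data deletion request;
Fig. 7 is a schematic structural diagram of a device for implementing the data processing method in a distributed file storage system in an embodiment of the present invention.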
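Detailed Description
Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
In addition, numerous specific details are given in the following detailed description to better explain the present invention. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present invention.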
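The main components of the distributed file storage system 10 are shown in Fig. 1. The system contains multiple storage servers 101, which are interconnected through a low-latency, high-throughput network (for example, an IB network or 10G Ethernet) to form a cluster. The distributed file storage system 10 further includes a front-end switch 50 and a back-end switch 60. The front-end switch 50 is used for external service requests and data exchange between user data and the cluster. The back-end switch 60 is used for internal requests and data exchange between the storage server nodes inside the cluster. The application server 20 communicates with the distributed file storage system 10 through the front-end switch 50.
In the distributed file storage system 10, applications generally interact with the storage servers 101 in one of two ways. In the first way, each application directly accesses, through the standard Portable Operating System Interface (POSIX), the file system client agent (CA) 30 deployed on the application server 20. The client agent 30 serves as the portal through which the distributed file storage system 10 provides services externally; after receiving an application request, it interacts with the storage servers 101 inside the cluster. In the second way, each application uses a client of a commonly used NAS protocol (such as NFS or CIFS) to access the corresponding Network Attached Storage server (NAS Server); the NAS Server is deployed together with the storage server 101, and the NAS Server then accesses the file system client agent 30 deployed on that server node to implement the storage service. To explain the implementation principle of the present invention clearly, the first access mode is used for the specific description; the second access mode follows a similar implementation principle.
Take the service system shown in Fig. 1 as an example. The service system contains two application servers 20, and the application servers 20 communicate with the distributed file storage system 10 through the front-end switch 50. The client agent 30 is deployed in the application server 20; a user's data processing request is first sent to the client agent 30 of the application server 20, and the client agent 30 processes the data processing request accordingly. When the client agent 30 is deployed in a storage server node 101, the application server 20, after receiving the user's data processing request, sends the data processing request through the front-end switch 50 to the client agent 30 in the corresponding storage server node 101, and the client agent 30 in that storage server node 101 processes the data processing request.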
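The storage servers 101 in the distributed file storage system 10 are interconnected to form a cluster that constitutes a large network RAID, and the stored data adopts an N+M redundancy protection mechanism. M is the number of check strip blocks that the distributed file storage system 10 uses to redundantly protect the stored file data; the specific value of M may be set to a fixed value according to service requirements. N is the number of data strip blocks generated when file data is striped; N is calculated from the number of storage server nodes of the distributed file storage system 10 and the value of M, and may also be set to a fixed value according to service requirements. The distributed file storage system 10 may set uniform values of N and M, or may set different values of N and M for a particular directory, depending on service requirements. The N and M of the distributed file storage system 10 are stored in the file data metadata information table of the distributed file storage system 10. The metadata information table is the same as in existing implementations and is not described further here. For clarity, a piece of file data to be processed is called a stripe; when a stripe is striped, it is split into N data strip blocks, and M check strip blocks are generated according to the redundancy algorithm.
The client agent 30 stripes the received file data: it splits the file data into N data strip blocks, generates M check strip blocks according to the redundancy algorithm, stores the generated data strip blocks and check strip blocks in the corresponding storage server nodes 101, and records the same consistency label information, such as a timestamp or version number, in the data strip blocks and check strip blocks. The N data strip blocks and M check strip blocks are written to the corresponding storage server nodes 101. The strip blocks may be written to the storage server nodes 101 in the order of the node numbers, or may be written to the corresponding server nodes 101 according to other rules. The write rules are the same as existing write rules and are not described further here. When a user needs to read file data, the client agent 30 uses the identification information of the file data to read a certain number of data strip blocks or check strip blocks with the same consistency label information, and from them obtains the file data the user needs. According to the consistency principle of the redundancy algorithm, when a stripe protected with N+M redundancy is read, at least any N of the N+M strip blocks, all with the same consistency label information, must be obtained to ensure that the stripe data read is correct.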
如附图 1中所示的分布式文件存储系统 10, 包含有 8个存储服务器节点 101。 假设分布式文件存储系统 10的 M设置为 2, N设置为 6; 当然也可以设 置 M为 2, N为 5; 在本实施例中, 以 N为 6、 M为 2进行示例性说明。 当需要将 文件数据存储到分布式文件存储系统 10中时, 应用服务器 20中的客户端代理 30接收需要存储的文件数据, 并将文件数据条带化处理切分为 6个数据条带 块, 再利用冗余算法生成 2个校验条带块, 并在数据条带块和校验条带块中 记录相同的时间戳或者版本号等一致性标签信息。 客户端代理 30将 6个数据 条带块按存储服务器节点 101的顺序分别存储到存储服务器节点 1-存储服务 器节点 6, 将 2个校验条带块按顺序存储到存储服务器节点 7和存储服务器节 点 8中, 例如附图 2所示。 当需要读取文件数据时, 客户端代理 30根据文件数 据的标识信息, 从存储服务器节点 101中读一致性标签信息相同的至少 6个条 带块即可, 这 6个条带块为数据条带块和校验条带块中的任意 6个条带块。
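The read rule described above (at least N of the N+M strips must carry the same consistency label) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `can_read_stripe` and the representation of labels as plain strings are assumptions made for the example.

```python
from collections import Counter


def can_read_stripe(labels, n):
    """Return True if at least n readable strips share one consistency label.

    labels: consistency labels (timestamps or version numbers) of the strips
            that could actually be read, data and parity strips alike.
    n:      number of data strips N in the N+M scheme.
    """
    if not labels:
        return False
    # Count the strips per label and keep the size of the largest group.
    _, largest_group = Counter(labels).most_common(1)[0]
    return largest_group >= n
```

With N = 6 and M = 2, any 6 of the 8 strips with the same timestamp are enough; 5 matching strips are not.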
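In small-file scenarios, the valid file data may occupy only some of the data strips; in existing implementations, the remaining data strips are blank strips that contain no valid file data. As shown in Figures 1 and 2, a distributed file storage system 10 with N = 6 and M = 2 has 8 storage server nodes 101. The data strips are stored on storage server nodes 1 to 6, and the parity strips on storage server nodes 7 and 8. When a small file is striped, it occupies only 2 data strips, for example D1 and D2, which are stored on storage server nodes 1 and 2 with timestamp T1 recorded. The other 4 data strips, D3 to D6, contain no file data, yet existing implementations still record the same timestamp T1 on storage server nodes 3 to 6. The parity strips D7 and D8 are stored on storage server nodes 7 and 8, also with timestamp T1. To read the small file, at least 6 strips recorded with timestamp T1 must be read before the correct file data can be obtained. Writing or reading a small file therefore generates IO operations on empty strips, which consume disk IO and network IO of the distributed file storage system. A large distributed file storage system has more data strips, so small files produce correspondingly more empty strips; this wastes a large amount of disk IO and network IO resources and degrades the IO performance of the distributed file storage system.
The present invention proposes a new method for processing small-file data in the distributed file storage system 10. The method reduces operations on empty strips, reduces the disk IO and network IO overhead of the distributed file storage system, and improves its IO performance. The invention adopts an N'+M redundancy protection mechanism that depends on the size of the file to be processed. M is the number of parity strips of the distributed file storage system, and N' is the actual number of strips, determined from the size of the file data when it is striped. Files of different sizes may be split into different actual numbers of strips N', so the actual strip count of the file data is adjusted dynamically. This reduces the number of empty strips in small-file scenarios, and therefore the amount of disk IO and network IO in the distributed file storage system, and improves system performance. To ensure that the solution still yields correct file data under abnormal conditions, N' must be greater than M; that is, the number of actual data strips and the number of parity strips must satisfy the majority rule, so that the data can be recovered in any abnormal situation.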
本发明实施例中适用的分布式文件存储系统 10的结构如附图 1所示, 包 含有 8个存储服务器 101, 多个存储服务器 101之间通过低延迟、 高吞吐量的 网络 (例如 IB网络、 10G以太网) 互连形成集群。 客户端代理 30部署在应用 服务器 20中, 并通过前端交换机 50实现用户数据与集群间的通信。集群中的 各个存储服务器节点 101则通过后端交换机 60实现内部通信。 客户端代理 30 也可以部署在分布式文件存储系统 10的各个存储服务器节点 101中, 其功能 与部署在应用服务器 20中的客户端代理 30的功能类似, 不再另行描述。
分布式文件存储系统的冗余配比中 M为 2、 N为 6。 M是分布式文件存储 系统 10为了对存储的文件数据进行冗余保护的校验条带块的数量, M的具体 取值可以是根据业务需要设定的。 N是对文件数据进行条带化处理时切分的 数据条带块的数量, N是根据分布式文件存储系统的存储服务器节点数以及 M的值计算得到的, 也可以根据业务需要设定一固定的值。 所述分布式文件 存储系统可以设定统一的 N和 M的取值,也可以为某个目录设定不同的 N和 M 的取值,根据业务需求而定。在本实施例中, 以统一的 N和 M为例进行说明。 分布式文件存储系统的 N和 M存储在分布式文件存储系统 10的文件数据元数 据信息表中。
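The client agent 30 on the application server 20 receives the user's data processing request sent by an application. The request carries information such as the file identifier FID, the offset address (offset), and the file length (length) of the target file. The target file is the file to be processed. From the offset address and length carried in the data processing request, the client agent calculates the number of valid strips of the file data (Data Strip Count, DSC). A valid strip is a data strip that contains file data. In small-file scenarios, the valid strip count DSC obtained when the file data is striped is smaller than the number N of data strips of the distributed file storage system. The larger the distributed file storage system, the larger the gap between DSC and N in small-file scenarios.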
当客户端代理 30部署在分布式文件存储系统 10的存储服务器节点 101上 时, 应用服务器 20接收到数据处理请求之后, 将数据请求通过前端交换机 50 发送给某个存储服务器上的客户端代理 30。应用服务器 20将数据处理请求发 送给存储服务器 101上的客户端代理的方法与现有方法类似,在此不再详述。 存储服务器 101上的客户端代理 30对数据处理请求的方法与应用服务器 20上 的客户端代理 30的处理方式类似, 不再另行描述。
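The client agent 30 on the application server 20 compares the calculated valid strip count DSC with the number M of parity strips. Under the majority rule of the redundancy algorithm, to ensure that correct file data can be obtained whatever fault occurs during processing, more strips than the number M of parity strips must be read. When the valid strip count DSC is less than or equal to the number M of parity strips, that is, DSC ≤ M, the actual strip count N' equals the number of parity strips plus one, that is, N' = M + 1. The difference between the actual strip count N' and the valid strip count DSC is made up with empty strips, so the number of empty strips to add is ESC = N' - DSC = (M + 1) - DSC. This reduces the number of empty strips in the distributed file storage system, correspondingly reduces the read and write IO operations on empty strips, and improves the IO performance of the distributed file storage system.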
当有效条带块的数量 DSC大于校验条带块的数量 M时, 即 DSC>M时, 实 际条带块的数量 N'等于有效条带块的数量 DSC, 即 N'=DSC。 此时空条带块 的数量为 0, 也就是说不需要添加空条带块, 减少了分布式文件存储系统中 的空条带块的数量。
当文件数据比较大时, 有效条带块的数量 DSC可能与分布式文件存储系 统的数据条带块的数量 N相同,此时的空条带块的数量为 0, 即不需要添加空 条带块。
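By dynamically adjusting the actual strip count N' according to the size of the file data, and in particular by using N' instead of the number N of data strips of the distributed file storage system in small-file scenarios, the number of empty strips in the distributed file storage system is effectively reduced. Read and write IO operations on empty strips decrease accordingly, and the IO performance of the distributed file storage system improves.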
客户端代理 30对 N'个实际条带块进行相应的处理。具体处理方式与数据 处理请求的类型不同而各异。
下面以数据处理请求为数据写请求为例进行详细的说明, 实现的方法流 程如附图 3所示。所适用的分布式文件存储系统仍然以附图 1中所示的分布式 文件存储系统 10为例进行说明。
在附图 1所示的分布式文件存储系统 10中, 存储服务器节点 1-存储服务 器节点 8之间通过低延迟、 高吞吐量的网络互连形成集群。 客户端代理 30部 署在应用服务器 20中, 并通过前端交换机 50实现用户数据与集群间的通信。 集群中的各个存储服务器节点 101则通过后端交换机 60实现内部通信。 客户 端代理 30也可以部署在分布式文件存储系统 10的各个存储服务器节点 101中, 其功能与部署在应用服务器 20中的客户端代理 30的功能类似, 不再另行描述。 分布式文件存储系统 10中存储的文件数据采取 N+M的冗余保护机制。 M 是分布式文件存储系统 10为了对存储的文件数据进行冗余保护的校验条带 块的数量, M的具体取值可以是根据业务需要设定的。 N是对文件数据进行 条带化时切分的数据条带块的数量, N是根据分布式文件存储系统 10的存储 服务器节点数以及 M的值计算得到的, 也可以根据业务需要设定一固定的值。 所述分布式文件存储系统 10可以设定统一的 N和 M的取值, 也可以为某个目 录设定不同的 N和 M的取值, 根据业务需求而定。 在本实施例中, 以统一的 N和 M为例进行说明。分布式文件存储系统 10的 N和 M存储在分布式文件存储 系统 10的文件数据元数据信息表中。 为某个目录设备的 M和 N存储在目录元 数据表中。 N与 M的和可以等于分布式文件存储系统 10中存储服务器节点的 数量总和, 也可以在一个存储服务器节点 101中存储多个条带块, 即 N与 M的 和也可以大于存储服务器节点的数量总和。 在本实施例中, 分布式文件存储 系统 10的 N为 6, M为 2。 数据写请求通过标准的可移植操作系统接口发送给 部署在应用服务器 20上的文件系统客户端代理 30。客户端代理 30对所述数据 写请求进行处理, 再通过前端交换机 50与各存储服务器节点 101进行通信。 如果客户端代理 30部署在存储服务器节点 101上, 应用服务器 20将数据写请 求通过前端交换机 50发送给对应存储服务器节点 101上的客户端代理 30, 存 储服务器节点 101上的客户端代理 30再对数据写请求进行处理, 再通过后端 交换机 60与集群中的存储服务器节点 101进行通信。
当用户有文件数据需要写到分布式文件存储系统 10中时,通过客户端向 应用服务器 20上的客户端代理 30发起数据写请求。 为了方便说明, 将待写入 的文件数据称之为目标文件。所述数据写请求中携带有所述目标文件的文件 标识 FID、 偏移地址 offset、 文件长度 length等信息。 在本实施例中, 文件标 识 FID为 485, 偏移地址为 0K, 文件长度为 160K。 客户端代理 30根据目标文 件的文件标识从文件系统元数据信息表中获得分布式文件存储系统的冗余 配比信息以及条带块的大小, 所述冗余配比信息即 N和 M的值。 在本实施例 中, N的值为 6, M的值为 2, 条带块的大小为 128K。
客户端代理 30根据数据写请求中携带的偏移地址、文件长度信息以及获 取到的条带块的大小, 对目标文件进行条带化处理, 得到目标文件的有效条 带块的数量 DSC。 在本实施例中, 目标文件的偏移地址为 0K, 文件大小为 160K, 条带块的大小为 128K, 因此, 客户端代理对目标文件条带化处理, 生成 2个有效条带块。
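The calculation of the valid strip count DSC from the offset, the file length, and the strip size can be sketched as follows. The text does not spell out the formula, so this sketch assumes that DSC is the number of strips touched by the byte range [offset, offset + length); the function name `valid_strip_count` is hypothetical.

```python
def valid_strip_count(offset, length, strip_size):
    """Number of strips that contain file data for the given byte range.

    Assumption (not stated in the source): a strip counts as valid when the
    range [offset, offset + length) overlaps it.
    """
    if length <= 0:
        return 0
    first_strip = offset // strip_size
    last_strip = (offset + length - 1) // strip_size
    return last_strip - first_strip + 1


KIB = 1024
# Values from this embodiment: offset 0K, length 160K, strip size 128K.
dsc = valid_strip_count(0 * KIB, 160 * KIB, 128 * KIB)
```

For the embodiment's values the first 128K fill one strip and the remaining 32K spill into a second one, which gives the 2 valid strips stated in the text.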
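The client agent 30 compares the valid strip count DSC obtained when striping the target file with the obtained number M of parity strips. When DSC is less than or equal to M, that is, DSC ≤ M, the actual strip count N' of the target file equals the number of parity strips plus one, that is, N' = M + 1. Empty strips must then be added, and the number of empty strips ESC equals the actual strip count N' minus the valid strip count DSC, that is, ESC = N' - DSC = (M + 1) - DSC. When DSC is greater than M, that is, DSC > M, the actual strip count N' of the target file equals the valid strip count DSC, that is, N' = DSC, and no empty strips need to be added.
In this embodiment, the valid strip count DSC is 2 and the number M of parity strips is 2, so DSC = M. The actual strip count N' of the target file is therefore the number of parity strips plus one: N' = M + 1 = 2 + 1 = 3. The target file must be split into 3 strips when it is striped, whereas its valid strip count DSC is 2, so only 1 empty strip has to be added. In the existing implementation the target file needs 6 data strips, so with 2 valid strips it needs 4 empty strips. The method of the present invention thus greatly reduces the number of empty strips.
If striping the target file yields a valid strip count DSC of 1 and the number M of parity strips is 2, then DSC is less than M, that is, DSC < M. The actual strip count N' of the target file is again the number of parity strips plus one: N' = M + 1 = 2 + 1 = 3. The target file must be split into 3 strips, whereas its valid strip count DSC is 1, so 2 empty strips have to be added.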
另外, 还有目标文件条带化时切分出来的有效条带块的数量 DSC大于校 验条带块的数量 M的情况, 即 DSC>M。例如, 客户端代理对目标文件进行条 带化处理时切分出的有效条带块的数量 DSC为 5, 此时 DSC>M, 实际条带块 的数量 N'=DSC, 即 N'=5, 此时不需要添加空条带块。
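The rule for N' and the three worked cases above (DSC = 2, DSC = 1, and DSC = 5 with M = 2) can be checked with a short sketch. The function names are illustrative and not taken from the patent.

```python
def actual_strip_count(dsc, m):
    """N' = M + 1 when DSC <= M, otherwise N' = DSC."""
    return m + 1 if dsc <= m else dsc


def empty_strip_count(dsc, m):
    """ESC = N' - DSC: empty strips that must be added."""
    return actual_strip_count(dsc, m) - dsc


def empty_strips_saved(dsc, n, m):
    """Empty strips avoided compared with always using N data strips."""
    return (n - dsc) - empty_strip_count(dsc, m)


M = 2
cases = {dsc: (actual_strip_count(dsc, M), empty_strip_count(dsc, M))
         for dsc in (1, 2, 5)}
```

Here `cases` maps each DSC to the pair (N', ESC). The `empty_strips_saved` helper is an added convenience for comparing against the fixed-N scheme; for DSC = 2, N = 6, M = 2 the fixed scheme pads with 4 empty strips and the dynamic scheme with 1.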
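It follows that the method of the present invention adjusts the actual strip count N' of the target file dynamically according to the size of the target file. This both ensures that the target file can be read correctly in all cases and effectively reduces the number of empty strips. Read and write IO operations on empty strips decrease accordingly, and the IO performance of the distributed file storage system improves.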
客户端代理 30根据得到的实际条带块使用冗余算法生成 M个校验条带 块, 并在 N'个实际条带块和 M个校验条带块中添加一致性标签信息和有效条 带块的数量 DSC。所述一致性标签信息可以是相同的时间戳 timestamp或者版 本号信息。
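For example, the client agent 30 stripes the target file and obtains the valid strips D1 and D2. The valid strip count DSC of the target file is therefore 2. Comparing the valid strip count 2 with the number of parity strips 2 shows that DSC equals M, so the actual strip count N' of the target file is 3, and one empty strip D3 must be added. The client agent generates the parity strips D7 and D8 from strips D1, D2, and D3, and adds the timestamp T1 and the valid strip count DSC = 2 to strips D1, D2, D3, D7, and D8.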
客户端代理 30将携带了一致性标签信息和有效条带块的数量 DSC的实 际条带块和校验条带块写入相应的存储服务器节点 101中。 具体如何确认各 个条带块应该写入的存储服务器节点 101的方法与现有的实现方法相似, 在 此不再详细说明。客户端代理 30将目标文件的实际条带块的数量 N'和校验条 带块的数量 M以及各条带块的分布信息保存到文件的元数据信息表中, 以便 于读取目标文件时能够到相应的存储服务器节点 101中读取目标文件的条带 块。
例如, 客户端代理 30将带有时间戳 T1和有效条带块的数量 DSC=2的 Dl、 D2和 D3分别存储到存储服务器节点 1、存储服务器节点 2和存储服务器节点 3 中, D7和 D8分别存储到存储服务器节点 7和存储服务器节点 8中。 客户端代 理 30将目标文件的实际条带块的数量 N'=3、校验条带块的数量 M=2以及各条 带块的分布信息保存到文件的元数据信息表中。
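The write-side bookkeeping in this example (which strips exist, what each one carries, and where it goes) can be sketched as below. The sketch only plans the strips; it does not compute parity data, because the text does not name a specific redundancy algorithm. The `StripBlock` type, the `plan_write` function, and the placement rule "data strips on nodes 1..N', parity strips on nodes N+1..N+M" are assumptions that happen to reproduce the example; the text allows other placement rules.

```python
from dataclasses import dataclass


@dataclass
class StripBlock:
    name: str    # e.g. "D1"
    kind: str    # "valid", "empty", or "parity"
    label: str   # consistency label (timestamp or version number)
    dsc: int     # valid strip count recorded in every strip
    node: int    # storage server node number (assumed placement)


def plan_write(dsc, n, m, label):
    """Plan the N' actual strips and M parity strips for one target file."""
    n_actual = m + 1 if dsc <= m else dsc
    blocks = []
    for i in range(n_actual):
        kind = "valid" if i < dsc else "empty"
        blocks.append(StripBlock(f"D{i + 1}", kind, label, dsc, i + 1))
    for j in range(m):
        # Parity strips keep their positions after the N data-strip slots.
        blocks.append(StripBlock(f"D{n + j + 1}", "parity", label, dsc, n + j + 1))
    return blocks


plan = plan_write(dsc=2, n=6, m=2, label="T1")
```

For the embodiment's values the plan contains D1 and D2 (valid), D3 (empty), and D7 and D8 (parity), all tagged with T1 and DSC = 2, so nodes 4 to 6 receive nothing for this file.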
对于客户端代理 30部署在存储服务器节点 101的分布式文件存储系统 10, 客户端代理 30则通过后端交换机 60来实现各条带块的存储等操作, 具体实现 方式与已有的实现方式相同, 在此不再另行说明。
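It follows that, when the target file is a small file and the valid strip count DSC obtained by striping it is less than or equal to the number M of parity strips of the distributed file storage system, the actual strip count N' equals the number of parity strips plus one (M + 1). Because M is generally small, only a few empty strips have to be added. When the valid strip count DSC is greater than M, the actual strip count N' equals DSC and no empty strips are needed. For a small target file, striping therefore produces N' actual strips rather than the N data strips of the prior-art distributed file storage system. This reduces the number of empty strips, effectively reduces write operations on empty strips, reduces the IO operations of the distributed file storage system, and improves its performance. The improvement is more pronounced in large distributed file storage systems.
The following embodiment describes in detail the case in which the data processing request is a data read request; the method flow is shown in Figure 4. A data read request is a request from a user to read a target file. The structure of the distributed file storage system 10 is the same as that used for the data write request, as shown in Figure 1. This method embodiment likewise uses N = 6 and M = 2. N and M may be set to other values as required; the principle is the same and is not described again. In this embodiment, the client agent 30 is again assumed to be deployed on the application server 20.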
当用户需要从分布式文件存储系统 10中读取存储的文件数据时, 向应用 服务器 20上的客户端代理 30发起数据读请求。 为了描述清楚, 将用户待读取 的文件数据称之为目标文件。所述数据读请求中携带有目标文件的文件标识 FID、 偏移地址 offset、 文件长度 length等信息。 客户端代理 30接收到所述数 据读请求之后, 根据所述数据读请求中携带的文件标识 FID从文件系统元数 据信息表中获得分布式文件存储系统 10的冗余配比信息,所述冗余配比信息 即 N和 M的值。 M是分布式文件存储系统为了对存储的文件数据进行冗余保 护的校验条带块的数量, M的具体取值可以是根据业务需要设定的。 N是对 文件数据进行条带化处理时切分的数据条带块的数量, N是根据分布式文件 存储系统的存储服务器节点数以及 M的值计算得到的, 也可以根据业务需要 设定一固定的值。所述分布式文件存储系统 10可以设定统一的 N和 M的取值, 也可以为某个目录设定不同的 N和 M的取值, 根据业务需求而定。 在本实施 例中, 以统一的 N和 M为例进行说明。分布式文件存储系统 10的 N和 M存储在 分布式文件存储系统 10的文件数据元数据信息表中。 为某个目录设备的 M和 N存储在目录元数据表中。 在本实施例中, 分布式文件存储系统 10的 N的值 为 6, M的值为 2。
应用服务器 20上的客户端代理 30接收到数据读请求之后,可以根据文件 标识获取目标文件的数据条带块的数量 N、 校验条带块的数量 M和条带块的 大小。 客户端代理根据所述数据读请求中携带的偏移地址、 文件长度信息以 及条带块的大小计算得到目标文件的有效条带块的数量 DSC。具体计算方法 与数据写请求方案中的方法类似, 在此不再另行说明。 下面的步骤中, 以目 标文件的有效条带块的数量为 2、 实际条带块的数量为 3、 校验条带块的数量 为 2进行说明。客户端代理 30可以通过数据读请求中携带的文件标识 FID查找 到目标文件的条带块的分布信息, 根据所述分布信息确定目标文件的实际条 带块和校验条带块存储在哪些存储服务器节点上, 也可以根据计算出来的实 际条带块的数量 N'以及校验条带块的数量确认存储有目标文件的实际条带 块和校验条带块的服务器节点的位置。
客户端代理 30生成新的数据块读请求, 并将所述数据块读请求发送给确 定的存储服务器节点 101。所述数据块读请求用于读取存储服务器节点 101中 的所述目标文件的条带块。
客户端代理 30可以将数据块读请求发送给确定的所有存储服务器节点 101; 也可以将数据块读请求发送给分布式文件存储系统 10中所有的存储服 务器节点 101。 客户端代理 30还可以将数据块读请求先发送给存储目标文件 的实际条带块的存储服务器节点 101, 在存储有目标文件实际条带块的存储 服务器节点 101反馈的响应不能正确读取到目标文件时, 再将数据块读请求 发送给存储目标文件的校验条带块的存储服务器节点 101。 本实施例中以最 后一种情况进行说明, 即先将数据块读请求先发送给存储目标文件实际条带 块的存储服务器节点 101, 在存储有目标文件实际条带块的存储服务器节点 反馈的响应不能正确读取到目标文件时,再将数据块读请求发送给存储目标 文件校验条带块的存储服务器节点 101。
存储服务器节点 101接收到数据块读请求之后, 根据数据块读请求中携 带的文件标识判断存储的对应的条带块是否可以读取, 如果可以读取, 则向 客户端代理发送可以读取的成功响应消息, 成功响应消息中携带有条带块的 一致性标签信息和有效条带块的数量 DSC信息; 如果没有存储对应的条带块 或者存储的对应的条带块损坏无法读取,则向客户端发送无法读取的失败响 应消息。
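The node-side handling of a data block read request described above can be sketched as follows. The dictionary-based strip store, the `corrupt` flag, and the response fields are assumptions chosen for illustration; the text only states what a success response carries (the consistency label and the DSC) and when a failure response is returned.

```python
def handle_block_read(strip_store, file_id):
    """Answer a data block read request on one storage server node.

    strip_store: hypothetical mapping file_id -> strip metadata held by the node.
    Returns a success response carrying the consistency label and DSC, or a
    failure response when the strip is missing or damaged.
    """
    strip = strip_store.get(file_id)
    if strip is None or strip.get("corrupt", False):
        return {"ok": False}
    return {"ok": True, "label": strip["label"], "dsc": strip["dsc"]}


# FID 485 is the file identifier used in the write embodiment.
node_store = {485: {"label": "T1", "dsc": 2}}
damaged_store = {485: {"label": "T1", "dsc": 2, "corrupt": True}}
```

A node holding an intact strip for FID 485 answers with the label T1 and DSC = 2; a node with no strip, or with a damaged one, answers with a failure.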
客户端代理 30将数据块读请求发送给存储目标文件实际条带块的存储 服务器节点 101, 并接收各存储服务器节点 101返回的数据块读请求的响应信 息。如果成功响应消息的数量与实际条带块的数量 N'相同, 并且成功响应消 息中的一致性标签信息和有效条带块的数量 DSC都相同, 此时可以读取到目 标文件, 客户端代理 30读取实际条带块并构造出目标文件发送给用户。 客户 端代理 30读取实际条带块以及构造目标文件的实现方法与已有的方法相同, 在此不再另行说明。
例如客户端代理 30收到数据读请求之后,根据数据读请求中携带的目标 文件的文件标识查找元数据信息表,得到目标文件的数据条带块的数量 N为 6、 校验条带块的数量 M为 2, 以及目标文件条带块的分布信息, 即实际条带块 D1存储在存储服务器节点 1, 实际条带块 D2存储在存储服务器节点 2, 实际 条带块 D3存储在存储服务器节点 3, 校验条带块 D7存储在存储服务器节点 7, 校验条带块 D8存储在存储服务器节点 8。 客户端代理 30根据数据读请求中携 带的信息以及通过目标文件的文件标识得到的信息计算出目标文件的有效 条带块的数量 DSC和目标文件的实际条带块的数量 N'。详细的计算方法请参 考数据写请求的流程中的相关描述。
客户端代理 30生成新的数据块读请求,所述数据块读请求用于读取存储 服务器节点 101中的所述目标文件的条带块。
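The client agent 30 sends the data block read request to storage server nodes 1 to 3, which store the actual strips of the target file. Storage server nodes 1 to 3 return success response messages indicating that the strips can be read; each message carries the timestamp of the strip and the valid strip count. The client agent 30 determines whether the timestamps and valid strip counts carried in all the response messages are the same. If they are, the client agent 30 reads the actual strips, reconstructs the target file, and sends it to the user. In this embodiment of the invention, the client agent 30 only needs to read 3 strips whose consistency label information and valid strip count DSC are all the same, whereas the prior art needs to read N (N = 6) strips with the same consistency label information and the same valid strip count to obtain the correct target file. The prior art therefore reads 3 more empty strips, which wastes IO operations of the distributed file storage system. In a large distributed file storage system the number N of data strips is larger, so more empty strips must be handled and the burden on IO performance is heavier. The method of this embodiment effectively reduces read operations on empty strips and improves the IO performance of the whole distributed file storage system.
If the number of success response messages returned by the storage server nodes 101 that store the actual strips of the target file is less than the actual strip count N', or the number of success response messages carrying the same consistency label information and valid strip count DSC is less than N', the client agent further determines whether the number of success response messages, or the number of success response messages carrying the same consistency label information and valid strip count DSC, is greater than the number M of parity strips of the distributed file storage system.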
如果成功响应消息的数量并且带有相同一致性标签信息和有效条带块 数量信息的成功响应消息的数量大于分布式文件存储系统的校验条带块的 数量 M, 客户端代理 30则判断成功响应消息的数量并且带有相同一致性标签 信息和有效条带块数量信息的成功响应消息的数量是否大于或者等于目标 文件的有效条带块的数量 DSC, 如果大于或者等于目标文件的有效条带块的 数量 DSC,此时可以读取到正确的目标文件,客户端代理 30进行相应的处理, 具体的处理方法与已有的实现方法相同, 此处不再另行说明。 如果成功响应 消息的数量或者带有相同一致性标签信息和有效条带块数量信息的成功响 应消息的数量小于目标文件的有效条带块的数量 DSC, 客户端代理 30则将所 述数据块读请求发送给存储目标文件校验条带块的存储服务器节点 101, 并 接收存储目标文件校验条带块的存储服务器节点 101的响应消息。 客户端代 理 30再判断所有成功响应消息的数量并且带有相同一致性标签信息和有效 条带块数量信息的成功响应消息的数量是否大于或者等于目标文件的有效 条带块的数量 DSC, 如果大于或者等于目标文件的有效条带块的数量 DSC, 此时可以读取到正确的目标文件, 如果小于目标文件的有效条带块的数量 DSC, 则向用户反馈读取失败的信息。
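If the number of success response messages, or the number of success response messages carrying the same consistency label information and valid strip count information, is less than or equal to the number M of parity strips of the distributed file storage system, the client agent 30 sends the data block read request to the storage server nodes 101 that store the parity strips of the target file and receives their response messages. The client agent 30 then determines whether the total number of success response messages carrying the same consistency label information and valid strip count information is greater than M. If it is, the client agent determines whether that total is greater than or equal to the valid strip count DSC of the target file: if so, the correct target file can be read; if it is less than DSC, a read failure is reported to the user. If the total is still not greater than M, the client agent 30 reports a read failure to the user.
The following example uses a target file with valid strip count DSC = 2, actual strip count N' = 3, and M = 2 parity strips. The valid strip D1 of the target file is stored on storage server node 1, the valid strip D2 on storage server node 2, the empty strip D3 of the target file on storage server node 3, the parity strip D7 on storage server node 7, and the parity strip D8 on storage server node 8. The consistency label information in strips D1, D2, D3, D7, and D8 is the timestamp T1, and the valid strip count information is DSC = 2. The client agent 30 first sends the data block read request to storage server nodes 1, 2, and 3. If all three nodes return success response messages, the client agent 30 finds 3 success responses whose timestamps and DSC values are the same, which equals the actual strip count 3 of the target file. The correct target file can then be read: the client agent 30 reads strips D1, D2, and D3, reconstructs the target file, and sends it to the user. If instead storage server nodes 1 and 3 return success responses, carrying the timestamp T1 and DSC = 2 of strips D1 and D3, but storage server node 2 returns a failure response because its strip D2 is damaged, only 2 strips respond successfully. The client agent has received 2 success responses, fewer than the actual strip count 3, so it compares the number of success responses (2) with the number of parity strips of the distributed file storage system (2). The two are equal, so the client agent 30 sends the data block read request to storage server nodes 7 and 8. Both nodes return success responses carrying the timestamp T1 and DSC = 2 of strips D7 and D8. The client agent 30 adds up the success responses with the same timestamp and valid strip count, here the success responses for valid strips plus those for parity strips (1 + 2 = 3), and compares the sum with the valid strip count 2 of the target file. The sum is greater than 2, so the correct target file can be read: the client agent 30 reads strips D1, D7, and D8, reconstructs the target file, and sends it to the user.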
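The two-stage read decision described above can be sketched as follows. This is one reading of the flow, under stated assumptions: a response is modelled as a (label, DSC) tuple for success or `None` for failure, only the largest group of mutually consistent success responses is counted, and `fetch_parity` is a hypothetical callable that queries the parity nodes only when needed.

```python
from collections import Counter


def consistent_successes(responses):
    """Size of the largest group of success responses with equal (label, dsc)."""
    ok = [r for r in responses if r is not None]
    if not ok:
        return 0
    return Counter(ok).most_common(1)[0][1]


def can_read(data_responses, fetch_parity, n_actual, dsc, m):
    """Decide whether the target file can be read.

    data_responses: responses from the nodes holding the N' actual strips.
    fetch_parity:   callable returning the responses of the M parity nodes.
    """
    got = consistent_successes(data_responses)
    if got == n_actual:
        return True                      # all actual strips answered consistently
    if got > m and got >= dsc:
        return True                      # enough strips without touching parity
    # Second stage: include the parity nodes and re-evaluate.
    total = consistent_successes(list(data_responses) + list(fetch_parity()))
    return total > m and total >= dsc


parity_calls = []
parity = lambda: parity_calls.append(1) or [("T1", 2), ("T1", 2)]

# D2 is damaged: nodes 1 and 3 succeed, node 2 fails, parity nodes 7 and 8 succeed.
degraded = can_read([("T1", 2), None, ("T1", 2)], parity, n_actual=3, dsc=2, m=2)
```

In the degraded case the first stage sees 2 consistent successes, which is not greater than M = 2, so the parity nodes are queried and the read succeeds. When all three actual strips answer consistently, the parity nodes are never contacted.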
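The data processing request may also be a data delete request or a data truncate request. The method flow is shown in Figure 5, and the structure of the distributed file storage system 10 is the same as for the other data processing requests, as shown in Figure 1. A data delete request is implemented in a similar way to a data truncate request; the following description uses the data delete request as the example and does not describe the data truncate request separately. In this embodiment, N is 6 and M is 2 for the distributed file storage system. N and M may be set to other values as required; the principle is the same and is not described again. In this embodiment, the client agent 30 is again assumed to be deployed on the application server 20.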
用户向应用服务器 20上的客户端代理 30发起数据删除请求,用户想删除 的文件数据称为目标文件,所述数据删除请求用于删除所述目标文件的存储 在各存储服务器节点中的相关条带块。所述数据删除请求中携带有目标文件 的文件标识 FID信息。 客户端代理 30接收到所述数据删除请求之后, 根据所 述数据删除请求中携带的文件标识 FID从文件系统元数据信息表中获得分布 式文件存储系统的冗余配比信息以及条带块的分布信息,所述冗余配比信息 即 N和 M的值。 M是分布式文件存储系统为了对存储的文件数据进行冗余保 护的校验条带块的数量, M的具体取值可以是根据业务需要设定的。 N是对 文件数据进行条带化时切分的数据条带块的数量, N是根据分布式文件存储 系统的存储服务器节点数以及 M的值计算得到的, 也可以根据业务需要设定 一固定的值。 所述分布式文件存储系统可以设定统一的 N和 M的取值, 也可 以为某个目录设定不同的 N和 M的取值,根据业务需求而定。在本实施例中, 以统一的 N和 M为例进行说明。分布式文件存储系统的 N和 M存储在分布式文 件存储系统的文件数据元数据信息表中。 为某个目录设备的 M和 N存储在目 录元数据表中。
这里以目标文件的分布式文件存储系统的校验条带块的数量 M为 2、 分 布式文件存储系统的数据条带块的数量 N为 6进行说明。
客户端代理根据获得的条带块的分布信息将数据删除请求发送给分布 式文件存储系统 10中对应的存储服务器节点 101。 如果接收到数据删除请求 的存储服务器节点 101中没有目标文件的条带块或者只有目标文件的空条带 块, 则向客户端代理 30返回删除对象不存在的响应消息。 如果接收到数据删 除请求的存储服务器节点 101中存储有目标文件的有效条带块或者校验条带 块, 则删除条带块后向客户端代理 30返回删除成功的响应消息。 如果无法删 除或者没有完全删除, 则向客户端代理 30反馈删除失败的响应消息。 客户端 代理 30接收到各存储服务器节点的响应消息后,判断接收到的删除对象不存 在的响应消息和删除成功的响应消息的总和是否大于或等于分布式文件存 储系统的数据条带块的数量 N。 也就是说, 目标文件存储在所述分布式文件 存储系统中的条带块的数量不能超过分布式文件存储系统的校验条带块的 数量 M, 这样才能确保目标文件被删除后, 无法再从分布式文件存储系统中 读取到。如果接收到的条带不存在的响应消息和删除成功的响应消息的总和 大于或等于分布式文件存储系统的数据条带块的数量 N, 客户端代理 30向用 户返回删除成功的响应消息。 否则, 客户端代理 30向用户返回删除失败的响 应消息。
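For example, after receiving a data delete request, the client agent 30 looks up the metadata information table using the file identifier of the target file carried in the request, and obtains the number of parity strips M = 2, the number of data strips N = 6, and the distribution information of the strips. The client agent 30 sends the received data delete request, according to the strip distribution information, to the corresponding storage server nodes 101 in the distributed file storage system 10. For example, storage server node 1 stores the valid strip D1 of the target file and storage server node 2 stores the valid strip D2. The client agent 30 sends the data delete request to storage server nodes 1 and 2; these nodes delete strips D1 and D2 and, after successful deletion, return delete-success response messages to the client agent 30. Storage server node 3 stores an empty strip of the target file, and storage server nodes 4, 5, and 6 hold no strip of the target file; after receiving the data delete request, storage server nodes 3 to 6 each return an object-not-found response message to the client agent 30. Storage server nodes 7 and 8 store the parity strips of the target file; they delete strips D7 and D8 and, after successful deletion, return delete-success response messages to the client agent 30. The client agent 30 thus receives 4 delete-success response messages and 4 object-not-found response messages from the storage server nodes 101, 8 in total, which is greater than the number N of data strips of the distributed file storage system, and the client agent 30 returns a delete-success response to the user. For a data delete request, the client agent can return a delete-success message to the user only when the number of delete-success response messages plus the number of object-not-found response messages is greater than or equal to the number N of data strips of the distributed file storage system. A delete operation is compared with N, whereas data write and data read requests are compared with the actual strip count N' of the target file. The purpose is to make sure that the target file can no longer be read once its strips have been deleted.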
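The delete acknowledgement rule above (delete-success plus object-not-found responses must reach N) can be sketched as follows. The response strings are illustrative placeholders, not message formats from the patent.

```python
def delete_succeeded(responses, n):
    """Report success only if deleted + not-found responses reach N.

    responses: one entry per storage server node, each "deleted",
               "not_found", or "failed" (hypothetical encodings).
    n:         number of data strips N of the distributed file storage system.
    """
    settled = sum(1 for r in responses if r in ("deleted", "not_found"))
    return settled >= n


# The example above: nodes 1, 2, 7, 8 delete a strip; nodes 3 to 6 hold nothing to delete.
example = ["deleted", "deleted", "not_found", "not_found",
           "not_found", "not_found", "deleted", "deleted"]
```

With N = 6 and M = 2, at most 2 of the 8 nodes may fail to settle. That leaves at most M strips behind, which is fewer than the more-than-M strips a read requires.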
同样的, 由截断引起的分条数据的删除方法, 与删除数据条带块的方法 相同, 在此不再另行说明。
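With the method provided by the present invention, when a file is written to the distributed file storage system, an N'+M redundancy protection mechanism is applied according to the size of the file to be written: when the target file is striped, the number of strips generated depends on its size, and this number is the actual strip count N'. This both ensures that the correct target file can be obtained in all cases and effectively reduces the number of empty strips in the distributed file storage system, which reduces the amount of disk IO and network IO and improves the performance of the distributed file storage system.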
本发明还提供了一种实现分布式文件存储系统中数据处理方法的设备。 在本发明实施例中, 该设备可以为客户端代理。 所述设备可以部署在与所述 分布式文件存储系统相连的应用服务器中, 也可以部署在所述分布式文件存 储系统中的各个存储服务器节点中, 如附图 1所示。
如附图 1所示, 分布式文件存储系统 10包含有多个存储服务器 101, 多个 存储服务器 101之间通过低延迟、高吞吐量的网络(例如 IB网络、 10G以太网) 互连形成集群。 分布式文件存储系统 10还包括前端 (Front-End) 交换机 50 和后端 (Back-End)交换机 60。 前端交换机 50用于用户数据与集群之间进行 外部业务请求与数据交互。后端交换机 60用于集群内部各个存储服务器节点 101之间内部请求与数据交互。 应用服务器 20通过所述前端交换机 50与所述 分布式文件存储系统 10进行通信。
所述设备部署在与所述分布式文件存储系统 10相连的应用服务器 20中 时,各应用通过标准的可移植操作系统接口(英文: Portable Operating System Interface, 缩写: POSIX) 直接访问部署在应用服务器 20上的文件系统客户 端代理 CA (Client Agent) 30。 客户端代理 30作为分布式文件存储系统 10对 外提供服务的门户, 收到应用的请求后再与集群内部的存储服务器 101交互。 当所述设备部署在所述分布式文件存储系统中的各个存储服务器节点 101中 时, 各应用通过常用的 NAS协议(如 NFS/CIFS等)客户端访问相应的网络附 加存储的服务器端 (Network Attached Storage Server, NAS Server) , 而 NAS Server与存储服务器部署在一起, NAS Server再访问部署在该 Server节点上的 文件系统客户端代理实现存储业务。 为了清楚说明本发明的实现原理, 现采 用第一种访问方式进行具体说明, 第二种访问方式采用类似的实现原理。 为了清楚的描述所述设备的数据处理方式, 以下以所述设备为客户端代 理为例进行说明。
以附图 1中所示的业务系统为例, 业务系统中包含有 2个应用服务器 20, 所述应用服务器 20通过前端交换机 50与分布式文件存储系统 10进行通信。所 述客户端代理 30部署在所述应用服务器 20中,用户的数据处理请求先发送到 应用服务器 20的客户端代理 30, 客户端代理 30对数据处理请求进行相应的处 理。 在客户端代理 30部署在存储服务器节点 101的情况下, 应用服务器 20接 收到用户的数据处理请求之后,通过前端交换机 50将数据处理请求发送到对 应的存储服务器节点 101中的客户端代理 30, 由存储服务器节点 101中的客户 端代理 30对数据处理请求进行处理。
分布式文件存储系统 10中的存储服务器 101互连形成集群, 构成一个大 型的网络 RAID, 存储的数据采取 N+M的冗余保护机制。 其中, M是分布式 文件存储系统 10为了对存储的文件数据进行冗余保护的校验条带块的数量, M的具体取值可以根据业务需要设定一个固定的值。 N是对文件数据进行条 带化时切分的数据条带块的数量, N是根据分布式文件存储系统 10的存储服 务器节点数以及 M的值计算得到的, 也可以根据业务需要设定一固定的值。 所述分布式文件存储系统 10可以设定统一的 N和 M的取值, 也可以为某个目 录设定不同的 N和 M的取值, 根据业务需求而定。 分布式文件存储系统的 N 和 M存储在分布式文件存储系统的文件数据元数据信息表中。 为了使描述更 清楚, 将一个待处理的文件数据称为一个分条数据, 一个分条数据条带化时 会被切分为 N个数据条带块, 并根据冗余算法生成 M个校验条带块。
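The client agent 30 includes a receiving module 301, a processing module 303, and a sending module 305.
The receiving module 301 of the client agent 30 is configured to receive a data processing request sent by an application. The request carries information such as the file identifier FID, the offset address (offset), and the file length (length) of the target file. The target file is the file to be processed. The processing module 303 calculates the number of valid strips of the file data (Data Strip Count, DSC) from the offset address and length carried in the data processing request. A valid strip is a data strip that contains file data. In small-file scenarios, the valid strip count DSC obtained when the file data is striped is smaller than the number N of data strips of the distributed file storage system. The larger the distributed file storage system 10, the larger the gap between DSC and N in small-file scenarios.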
当客户端代理 30部署在分布式文件存储系统 10的服务器节点 101上时, 应用服务器 20接收到数据处理请求之后,将数据请求通过前端交换机 50发送 给某个存储服务器 101上的客户端代理 30。 应用服务器 20将数据处理请求发 送给存储服务器 101上的客户端代理 30的方法与现有方法类似, 在此不再详 述。 存储服务器 101上的客户端代理 30对数据处理请求的方法与应用服务器 20上的客户端代理 30的处理方式类似, 不再另行描述。
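The processing module 303 compares the calculated valid strip count DSC with the number M of parity strips. Under the majority rule of the redundancy algorithm, to ensure that correct file data can be obtained whatever fault occurs during processing, more strips than the number M of parity strips must be read. When the valid strip count DSC is less than or equal to the number M of parity strips, that is, DSC ≤ M, the actual strip count N' equals the number of parity strips plus one, that is, N' = M + 1. The difference between N' and DSC is made up with empty strips, so the number of empty strips to add is ESC = N' - DSC = (M + 1) - DSC. This reduces the number of empty strips in the distributed file storage system, correspondingly reduces the read and write IO operations on empty strips, and improves the IO performance of the distributed file storage system.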
当有效条带块的数量 DSC大于校验条带块的数量 M时, 即 DSC>M时, 实 际条带块的数量 N'等于有效条带块的数量 DSC, 即 N'=DSC。 此时空条带块 的数量为 0, 也就是说不需要添加空条带块, 减少了分布式文件存储系统中 的空条带块的数量。 当文件数据比较大时, 有效条带块的数量 DSC可能与分布式文件存储系 统的数据条带块的数量 N相同,此时的空条带块的数量为 0, 即不需要添加空 条带块。
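By dynamically adjusting the actual strip count N' according to the size of the file data, and in particular by using N' instead of the number N of data strips of the distributed file storage system in small-file scenarios, the number of empty strips in the distributed file storage system is effectively reduced. Read and write IO operations on empty strips decrease accordingly, and the IO performance of the distributed file storage system improves.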
对 N'个实际条带块进行相应的处理。具体处理方式与数据处理请求的类 型不同而各异。
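The sending module 305 is configured to send the result produced by the processing module 303 to the user.
When the data processing request is a data write request, the receiving module 301 is configured to receive the data write request.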
所述处理模块 303根据数据写请求中携带的偏移地址、 文件长度信息以 及获取到的条带块的大小, 对目标文件进行条带化处理, 得到目标文件的有 效条带块的数量 DSC。
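The processing module 303 compares the valid strip count DSC obtained when striping the target file with the obtained number M of parity strips. When DSC is less than or equal to M, that is, DSC ≤ M, the actual strip count N' of the target file equals the number of parity strips plus one, that is, N' = M + 1. Empty strips must then be added, and the number of empty strips ESC equals the actual strip count N' minus the valid strip count DSC, that is, ESC = N' - DSC = (M + 1) - DSC. When DSC is greater than M, that is, DSC > M, the actual strip count N' of the target file equals the valid strip count DSC, that is, N' = DSC, and no empty strips need to be added.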
所述处理模块 303根据得到的实际条带块使用冗余算法生成 M个校验条 带块, 并在 N'个实际条带块和 M个校验条带块中添加一致性标签信息和有效 条带块的数量 DSC。所述一致性标签信息可以是相同的时间戳 timestamp或者 版本号信息。
所述处理模块 303将携带了一致性标签信息和有效条带块的数量 DSC的 实际条带块和校验条带块写入相应的存储服务器节点中。具体如何确认各个 条带块应该写入的存储服务器节点的方法与现有的实现方法相似,在此不再 详细说明。 所述处理模块 303将目标文件的实际条带块的数量 N'和校验条带 块的数量 M以及各条带块的分布信息保存到文件的元数据信息表中, 以便于 读取目标文件时能够到相应的存储服务器节点中读取目标文件的条带块。
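It follows that, when the target file is a small file and the valid strip count DSC obtained by striping it is less than or equal to the number M of parity strips of the distributed file storage system, the actual strip count N' equals the number of parity strips plus one (M + 1). Because M is generally small, only a few empty strips have to be added. When the valid strip count DSC is greater than M, the actual strip count N' equals DSC and no empty strips are needed. For a small target file, striping therefore produces N' actual strips rather than the N data strips of the prior-art distributed file storage system. This reduces the number of empty strips, effectively reduces write operations on empty strips, reduces the IO operations of the distributed file storage system 10, and improves its performance. The improvement is more pronounced in a large distributed file storage system 10.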
当数据处理请求为数据读请求时,将用户待读取的文件数据称之为目标 文件。所述数据读请求即是需要读取到目标文件存储在相应服务器节点中的 各条带块, 并构造还原出原目标文件的请求。
所述接收模块 301用于接收所述数据读请求, 所述数据读请求中携带有 目标文件的文件标识 FID、 偏移地址 offset、 文件长度 length等信息。 所述处 理模块 303根据所述数据读请求中携带的文件标识 FID从文件系统元数据信 息表中获得分布式文件存储系统的冗余配比信息, 所述冗余配比信息即 N和 M的值。 M是分布式文件存储系统为了对存储的文件数据进行冗余保护的校 验条带块的数量, M的具体取值可以是根据业务需要设定的。 N是对文件数 据进行条带化时切分的数据条带块的数量, N是根据分布式文件存储系统的 存储服务器节点数以及 M的值计算得到的, 也可以根据业务需要设定一固定 的值。 所述分布式文件存储系统可以设定统一的 N和 M的取值, 也可以为某 个目录设定不同的 N和 M的取值, 根据业务需求而定。 在本实施例中, 以统 一的 N和 M为例进行说明。分布式文件存储系统的 N和 M存储在分布式文件存 储系统的文件数据元数据信息表中。 为某个目录设备的 M和 N存储在目录元 数据表中。
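The processing module 303 can use the file identifier to look up the number N of data strips, the number M of parity strips, and the strip size for the target file. From the offset address and file length carried in the data read request, together with the strip size, the processing module 303 calculates the valid strip count DSC of the target file. The calculation is similar to that in the data write request solution and is not described again. The processing module 303 can use the file identifier carried in the data read request to find the distribution information of the strips of the target file and, from that distribution information, determine which storage server nodes store the actual strips and the parity strips. Alternatively, it can determine the locations of those server nodes from the calculated actual strip count N' and the number of parity strips. The processing module 303 may send the data read request to all of the determined storage server nodes, or to all storage server nodes in the distributed file storage system. The processing module 303 may also first send the data read request to the storage server nodes that store the actual strips of the target file and, if the responses from those nodes do not allow the target file to be read correctly, then send the data read request to the storage server nodes that store the parity strips of the target file. This embodiment describes the last case: the processing module 303 first sends the data read request to the storage server nodes that store the actual strips of the target file, and sends it to the storage server nodes that store the parity strips only when the responses from the first group do not allow the target file to be read correctly.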
所述处理模块生成新的数据块读请求, 并将所述数据块读请求发送给确 定的存储服务器节点 101。所述数据块读请求用于读取存储服务器节点 101中 的所述目标文件的条带块。
存储服务器节点接收到数据块读请求之后, 根据数据块读请求中携带的 文件标识判断存储的对应的条带块是否可以读取, 如果可以读取, 则向所述 处理模块 303发送可以读取的成功响应消息, 成功响应消息中携带有条带块 的一致性标签信息和有效条带块的数量 DSC信息; 如果没有存储对应的条带 块或者存储的对应的条带块损坏无法读取, 则向所述处理模块 303发送无法 读取的失败响应消息。
所述处理模块 303将数据块读请求发送给存储目标文件实际条带块的存 储服务器节点, 并接收各存储服务器节点返回的数据块读请求的响应信息。 如果成功响应消息的数量与实际条带块的数量 N'相同,并且成功响应消息中 的一致性标签信息和有效条带块的数量 DSC都相同, 此时可以读取到目标文 件, 所述处理模块 303读取实际条带块并构造出目标文件发送给用户。 所述 处理模块 303读取实际条带块以及构造目标文件的实现方法与已有的方法相 同, 在此不再另行说明。 如果存储目标文件实际条带块的存储服务器节点反 馈的成功响应消息的数量小于实际条带块的数量 N'或者带有相同的一致性 标签信息和有效条带块的数量信息的成功响应消息的数量小于实际条带块 的数量 N',则进一步判断成功响应消息的数量或者带有相同的一致性标签信 息和有效条带块的数量信息的成功响应消息的数量是否大于分布式文件存 储系统的校验条带块的数量 M。
如果成功响应消息的数量并且带有相同一致性标签信息和有效条带块 数量信息的成功响应消息的数量大于分布式文件存储系统的校验条带块的 数量 M, 所述处理模块 303则判断成功响应消息的数量或者带有相同一致性 标签信息和有效条带块数量信息的成功响应消息的数量是否大于或者等于 目标文件的有效条带块的数量 DSC, 如果成功响应消息的数量并且带有相同 一致性标签信息和有效条带块数量信息的成功响应消息的数量大于或者等 于目标文件的有效条带块的数量 DSC, 此时可以读取到正确的目标文件, 所 述处理模块 303进行相应的处理, 具体的处理方法与已有的实现方法相同, 此处不再另行说明。如果成功响应消息的数量或者带有相同一致性标签信息 和有效条带块数量信息的成功响应消息的数量小于目标文件的有效条带块 的数量 DSC, 所述处理模块 303则将所述数据读请求发送给存储目标文件校 验条带块的存储服务器节点, 并接收存储目标文件校验条带块的存储服务器 节点的响应消息。 所述处理模块 303再判断所有成功响应消息的数量或者带 有相同一致性标签信息和有效条带块数量信息的成功响应消息的数量是否 大于或者等于目标文件的有效条带块的数量 DSC, 所有成功响应消息的数量 并且带有相同一致性标签信息和有效条带块数量信息的成功响应消息的数 量如果大于或者等于目标文件的有效条带块的数量 DSC, 此时可以读取到正 确的目标文件, 如果所有成功响应消息的数量或者带有相同一致性标签信息 和有效条带块数量信息的成功响应消息的数量小于目标文件的有效条带块 的数量 DSC, 则向用户反馈读取失败的信息。
如果成功响应消息的数量或者带有相同一致性标签信息和有效条带块 数量信息的成功响应消息的数量小于或者等于分布式文件存储系统的校验 条带块的数量 M, 所述处理模块 303则将所述数据块读请求发送给存储目标 文件校验条带块的存储服务器节点, 并接收存储目标文件校验条带块的存储 服务器节点的响应消息。 所述处理模块 303再判断所有成功响应消息的数量 或者带有相同一致性标签信息和有效条带块数量信息的成功响应消息的数 量是否大于所述于分布式文件存储系统的校验条带块的数量 M。 如果所有成 功响应的数量并且带有相同一致性标签信息和有效条带块数量信息的成功 响应消息的数量大于所述于分布式文件存储系统的校验条带块的数量 M, 则 判断所有成功响应消息的数量或者带有相同一致性标签信息和有效条带块 数量信息的成功响应消息的数量是否大于或者等于目标文件的有效条带块 的数量 DSC。如果所有成功响应消息的数量并且带有相同一致性标签信息和 有效条带块数量信息的成功响应消息的数量大于或者等于目标文件的有效 条带块的数量 DSC, 此时可以读取到正确的目标文件; 如果所有成功响应消 息的数量或者带有相同一致性标签信息和有效条带块数量信息的成功响应 消息的数量小于目标文件的有效条带块的数量 DSC, 则向用户反馈读取失败 的信息。如果所有成功响应消息的数量或者带有相同一致性标签信息和有效 条带块数量信息的成功响应消息的数量小于所述于分布式文件存储系统的 校验条带块的数量 M, 向用户反馈读取失败的信息。
数据处理请求还可以是数据删除请求或者数据截断请求, 数据删除请求 的实现方法与数据截断请求的实现方法类似, 下面以数据删除请求为例进行 说明, 数据截断请求的实现方法不再另行说明。
用户想删除的文件数据称为目标文件,所述数据删除请求用于删除所述 目标文件的存储在各存储服务器节点中的相关条带块。所述数据删除请求中 携带有目标文件的文件标识 FID信息。所述接收模块 301接收所述数据删除请 求;所述处理模块 303根据接收到的所述数据删除请求中携带的文件标识 FID 从文件系统元数据信息表中获得分布式文件存储系统的冗余配比信息以及 条带块的分布信息, 所述冗余配比信息即 N和 M的值。 M是分布式文件存储 系统为了对存储的文件数据进行冗余保护的校验条带块的数量, M的具体取 值可以是根据业务需要设定的。 N是对文件数据进行条带化时切分的数据条 带块的数量, N是根据分布式文件存储系统的存储服务器节点数以及 M的值 计算得到的, 也可以根据业务需要设定一固定的值。 所述分布式文件存储系 统可以设定统一的 N和 M的取值,也可以为某个目录设定不同的 N和 M的取值, 根据业务需求而定。 在本实施例中, 以统一的 N和 M为例进行说明。 分布式 文件存储系统的 N和 M存储在分布式文件存储系统的文件数据元数据信息表 中。 为某个目录设备的 M和 N存储在目录元数据表中。
所述处理模块 303根据获得的条带块的分布信息将数据删除请求发送给 分布式文件存储系统中对应的存储服务器节点。如果接收到数据删除请求的 存储服务器节点中没有目标文件的条带块或者只有目标文件的空条带块, 则 向所述处理模块 303返回删除对象不存在的响应消息。 如果接收到数据删除 请求的存储服务器节点中存储有目标文件的有效条带块或者校验条带块, 则 删除条带块后向处理模块 303返回删除成功的响应消息。 如果无法删除或者 没有完全删除, 则向处理模块 303反馈删除失败的响应消息。 所述处理模块 303接收到各存储服务器节点的响应消息后, 判断接收到的删除对象不存在 的响应消息和删除成功的响应消息的总和是否大于或等于分布式文件存储 系统的数据条带块的数量 N。 也就是说, 目标文件存储在所述分布式文件存 储系统中的条带块的数量不能超过分布式文件存储系统的校验条带块的数 量 M, 这样才能确保目标文件被删除后, 无法再从分布式文件存储系统中读 取到。如果接收到的条带不存在的响应消息和删除成功的响应消息的总和大 于或等于分布式文件存储系统的数据条带块的数量 N, 向用户返回删除成功 的响应消息。 否则, 向用户返回删除失败的响应消息。
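With the device provided in the embodiments of the present invention, when a file is written to the distributed file storage system, an N'+M redundancy protection mechanism is applied according to the size of the file to be written: when the target file is striped, the number of strips generated depends on its size, and this number is the actual strip count N'. This both ensures that the correct target file can be obtained in all cases and effectively reduces the number of empty strips in the distributed file storage system, which reduces the amount of disk IO and network IO and improves the performance of the distributed file storage system.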
本领域普通技术人员可以意识到,本文所描述的实施例中的各示例性单 元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。 这些功能究竟以硬件还是软件形式来实现, 取决于技术方案的特定应用和设 计约束条件。专业技术人员可以针对特定的应用选择不同的方法来实现所描 述的功能, 但是这种实现不应认为超出本发明的范围。
如果以计算机软件的形式来实现所述功能并作为独立的产品销售或使 用时, 则在一定程度上可认为本发明的技术方案的全部或部分(例如对现有 技术做出贡献的部分)是以计算机软件产品的形式体现的。 该计算机软件产 品通常存储在计算机可读取的非易失性存储介质中,包括若干指令用以使得 计算机设备(可以是个人计算机、 服务器、 或者网络设备等)执行本发明各 实施例方法的全部或部分步骤。 而前述的存储介质包括 U盘、 移动硬盘、 只 读存储器 (ROM, Read-Only Memory )、 随机存取存储器 (RAM, Random Access Memory), 磁碟或者光盘等各种可以存储程序代码的介质。
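The foregoing describes only specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.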

Claims

权利要求书
1、 一种应用于分布式文件存储系统的数据处理方法, 其特征在于, 所 述方法包括:
客户端代理接收用户的数据处理请求,所述数据处理请求中携带有目标 文件的文件标识、 偏移地址和文件长度等信息; 所述目标文件为所述数据处 理请求中需要处理的文件;
客户端代理根据所述数据处理请求中携带的所述目标文件的文件标识 获得冗余配比信息,所述冗余配比信息包括所述分布式文件存储系统的数据 条带块的数量 N和所述分布式文件存储系统的校验条带块的数量 M;
根据所述数据处理请求中携带的所述目标文件的偏移地址和长度信息 确定所述目标文件的有效条带块的数量 DSC, 所述有效条带块为包含有所述 目标文件的数据的条带块;
根据所述有效条带块的数量 DSC和所述校验条带块的数量 M确定所述 目标文件的实际条带块的数量 N';
根据所述实际条带块的数量 N'确定对应的条带块并进行处理。
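2. The method according to claim 1, wherein the determining the actual strip count N' of the target file according to the valid strip count DSC and the number M of parity strips is specifically:
if the valid strip count DSC is less than or equal to the number M of parity strips, the actual strip count N' of the target file is the number of parity strips plus one, that is, N' = M + 1;
if the valid strip count DSC is greater than the number M of parity strips, the actual strip count N' of the target file equals the valid strip count DSC, that is, N' = DSC.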
3、 根据权利要求 1或 2所述的方法, 其特征在于, 所述校验条带块的数 量 M和数据条带块的数量 N可以有多组, 分别存储在对应的目录信息表中。
4、 根据权利要求 1-3任一所述的方法, 其特征在于, 当所述数据处理请 求为数据写请求时,所述根据所述实际条带块的数量 N'确定对应的条带块并 进行处理还包括:
对所述目标文件进行条带化处理, 得到 N'个实际条带块, 并使用冗余算 法生成 M个校验条带块;
在所述 N'个实际条带块和所述 M个校验条带块中添加一致性标签信息 和有效条带块的数量 DSC信息; 所述一致性标签信息可以为时间戳或者版本 号;
将所述 N'个实际条带块和所述 M个校验条带块写到对应的存储服务器 节点中。
5、 根据权利要求 1-3任一所述的方法, 其特征在于, 当所述数据处理请 求为数据读请求时, 所述方法还包括, 根据所述文件标识获取所述目标文件 的条带块的分布信息;所述根据所述实际条带块的数量 N'确定对应的条带块 并进行处理具体为:
生成新的数据块读请求,所述数据块读请求用于读取存储服务器节点中 的所述目标文件的条带块;
将所述数据块读请求根据获取到的目标文件的条带块的分布信息发送 给存储实际条带块的存储服务器节点;
接收所述存储实际条带块的存储服务器节点的响应消息;所述响应消息 为可以读取的成功响应消息或无法读取的失败响应消息,所述成功响应消息 中携带有实际条带块的一致性标签信息和有效条带块的数量 DSC信息; 根据接收到的所述响应消息判断是否可以读取到所述目标文件。
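6. The method according to claim 5, wherein the determining, according to the received response messages, whether the target file can be read is specifically:
if the number of received success response messages equals the actual strip count N', and the consistency label information and the valid strip count DSC information carried in the success response messages are all the same, the target file can be read;
if the number of received success response messages is less than the actual strip count N', determining whether the number of received success response messages is greater than the number M of parity strips;
if the number of received success response messages is greater than the number M of parity strips, determining whether the number of success response messages is greater than or equal to the valid strip count DSC of the target file and whether the consistency label information and the valid strip count DSC information carried in the success response messages are all the same; if so, the target file can be read; otherwise, sending the data block read request, according to the obtained distribution information, to the storage server nodes that store the parity strips; if the number of success response messages returned by the storage server nodes that store the parity strips is greater than or equal to the valid strip count DSC of the target file, and the consistency label information and the valid strip count DSC information carried in the success response messages are all the same, the target file can be read; if the number of success response messages returned by the storage server nodes that store the parity strips is less than the valid strip count DSC of the target file, or the consistency label information and the valid strip count DSC information carried in the success response messages are not the same, the target file cannot be read.
7. The method according to claim 6, wherein the method further comprises:
if the number of received success response messages is less than or equal to the number M of parity strips, sending the data block read request, according to the obtained distribution information, to the storage server nodes that store the parity strips;
receiving the response messages returned by the storage server nodes that store the parity strips;
determining, according to the response messages returned by the storage server nodes that store the parity strips, whether the target file can be read.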
8、 一种实现分布式文件存储系统中数据处理方法的设备 30, 其特征在 于: 所述设备 30与所述分布式文件存储系统 10中的存储服务器节点 101进行 通信, 所述设备包括接收模块 301、 处理模块 303和发送模块 305 :
所述接收模块 301用于接收用户的数据处理请求, 所述数据处理请求中 携带有目标文件的文件标识、 偏移地址和文件长度等信息; 所述目标文件为 所述数据处理请求中需要处理的文件;
所述处理模块 303用于:
根据所述数据处理请求中携带的所述目标文件的文件标识从所述存 储服务器节点中获得冗余配比信息, 所述冗余配比信息包括所述分布式 文件存储系统的数据条带块的数量 N和所述分布式文件存储系统的校验 条带块的数量 M;
根据所述数据处理请求中携带的所述目标文件的偏移地址和长度信 息确定所述目标文件的有效条带块的数量 DSC, 所述有效条带块为包含 有所述目标文件的数据的条带块;
根据所述有效条带块的数量 DSC和所述校验条带块的数量 M确定所 述目标文件的实际条带块的数量 N' ;
根据所述实际条带块的数量 N'确定对应的条带块并进行处理; 所述发送模块 305用于将处理结果反馈给所述用户。
9、 根据权利要求 8所述的设备, 其特征在于, 所述设备 30位于与所述分 布式文件存储系统 10相连的应用服务器 20中或者所述分布式文件存储系统 中 10的存储服务器节点 101中。
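10. The device according to claim 8 or 9, wherein the determining the actual strip count N' of the target file according to the valid strip count DSC and the number M of parity strips is specifically:
if the valid strip count DSC is less than or equal to the number M of parity strips, the actual strip count N' of the target file is the number of parity strips plus one, that is, N' = M + 1;
if the valid strip count DSC is greater than the number M of parity strips, the actual strip count N' of the target file equals the valid strip count DSC, that is, N' = DSC.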
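11. The device according to any one of claims 8 to 10, wherein, when the data processing request is a data write request, the determining corresponding strips according to the actual strip count N' and processing them further comprises:
striping the target file to obtain N' actual strips, and generating M parity strips by using a redundancy algorithm;
adding consistency label information and valid strip count DSC information to the N' actual strips and the M parity strips, wherein the consistency label information may be a timestamp or a version number;
writing the N' actual strips and the M parity strips to the corresponding storage server nodes.
12. The device according to any one of claims 8 to 11, wherein, when the data processing request is a data read request, the determining corresponding strips according to the actual strip count N' and processing them further comprises:
generating a new data block read request, the data block read request being used to read the strips of the target file on the storage server nodes;
sending the data block read request, according to the obtained distribution information of the strips of the target file, to the storage server nodes that store the actual strips;
receiving response messages from the storage server nodes that store the actual strips, wherein each response message is a success response message indicating that the strip can be read or a failure response message indicating that it cannot, and the success response message carries the consistency label information of the actual strip and the valid strip count DSC information;
determining, according to the received response messages, whether the target file can be read.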
13、 根据权利要求 8-12任一所述的设备, 其特征在于:
所述接收模块还用于接收用户的数据删除请求,所述数据删除请求中携 带有目标文件的文件标识; 所述目标文件为需要删除的文件;
所述处理模块根据所述文件标识从所述存储服务器节点中获得冗余配 比信息,所述冗余配比信息包括所述分布式文件存储系统的数据条带块的数 量 N和所述分布式文件存储系统的校验条带块的数量 M;
根据所述分布式文件存储系统的数据条带块的数量 N确认存储所述目标 文件的条带块的存储服务器节点;
将所述数据删除请求发送给所述存储所述目标文件的条带块的存储服 务器节点;
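receiving response messages from the storage server nodes that store the strips of the target file, wherein each response message is one of a delete-success response message, an object-not-found response message, and a delete-failure response message;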
当接收到的删除成功的响应消息和删除对象不存在的响应消息的数量 超过所述数据条带块的数量 N时, 删除成功; 否则删除失败;
所述发送模块用于将删除成功或删除失败的结果反馈给所述用户。
PCT/CN2013/091143 2013-12-31 2013-12-31 一种分布式文件存储系统中的数据处理方法及设备 WO2015100627A1 (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
CA2897129A CA2897129C (en) 2013-12-31 2013-12-31 Data processing method and device in distributed file storage system
CN201380002274.8A CN104272274B (zh) 2013-12-31 2013-12-31 一种分布式文件存储系统中的数据处理方法及设备
EP13900799.1A EP2933733A4 (en) 2013-12-31 2013-12-31 DATA PROCESSING METHOD AND DEVICE IN A DISTRIBUTED FILE STORAGE SYSTEM
JP2015559412A JP6106901B2 (ja) 2013-12-31 2013-12-31 分散ファイルストレージシステムにおけるデータ処理の方法およびデバイス
PCT/CN2013/091143 WO2015100627A1 (zh) 2013-12-31 2013-12-31 一种分布式文件存储系统中的数据处理方法及设备
AU2013409624A AU2013409624B2 (en) 2013-12-31 2013-12-31 Data processing method and device in distributed file storage system
US14/806,064 US10127233B2 (en) 2013-12-31 2015-07-22 Data processing method and device in distributed file storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/091143 WO2015100627A1 (zh) 2013-12-31 2013-12-31 一种分布式文件存储系统中的数据处理方法及设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/806,064 Continuation US10127233B2 (en) 2013-12-31 2015-07-22 Data processing method and device in distributed file storage system

Publications (1)

Publication Number Publication Date
WO2015100627A1 true WO2015100627A1 (zh) 2015-07-09

Family

ID=52162404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/091143 WO2015100627A1 (zh) 2013-12-31 2013-12-31 一种分布式文件存储系统中的数据处理方法及设备

Country Status (7)

Country Link
US (1) US10127233B2 (zh)
EP (1) EP2933733A4 (zh)
JP (1) JP6106901B2 (zh)
CN (1) CN104272274B (zh)
AU (1) AU2013409624B2 (zh)
CA (1) CA2897129C (zh)
WO (1) WO2015100627A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274208A (zh) * 2018-12-05 2020-06-12 杭州海康威视系统技术有限公司 锁定文件的方法和装置

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6318878B2 (ja) * 2014-06-04 2018-05-09 富士通株式会社 通信装置、システム及び通信処理方法
CN104639661A (zh) * 2015-03-13 2015-05-20 华存数据信息技术有限公司 分布式存储系统及文件存储和读取方法
CN106161523B (zh) * 2015-04-02 2019-11-22 腾讯科技(深圳)有限公司 一种数据处理方法和设备
US10466913B2 (en) 2015-04-29 2019-11-05 EMC IP Holding Company LLC Method and system for replicating and using grid level metadata in a storage system
US11237727B2 (en) * 2015-10-05 2022-02-01 Weka.IO Ltd. Electronic storage system
CN105404469B (zh) * 2015-10-22 2018-11-13 浙江宇视科技有限公司 一种视频数据的存储方法和系统
CN105426483B (zh) * 2015-11-19 2019-01-11 华为技术有限公司 一种基于分布式系统的文件读取方法及装置
US11847095B1 (en) * 2015-12-30 2023-12-19 EMC IP Holding Company LLC Managing truncation of files of file systems
CN105824721B (zh) * 2016-03-14 2019-07-12 浙江宇视科技有限公司 一种数据存储系统及其存储纠删方法
CN107203559B (zh) * 2016-03-17 2021-01-01 华为技术有限公司 一种划分数据条带的方法和装置
US10545825B2 (en) * 2016-04-29 2020-01-28 Synamedia Limited Fault-tolerant enterprise object storage system for small objects
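CN107247714B (zh) * 2016-06-01 2018-02-27 国家电网公司 Access method for a small-file access system based on distributed storage technology
CN107819729B (zh) * 2016-09-13 2021-06-25 腾讯科技(深圳)有限公司 Data request method and system, access device, storage device, and storage medium
CN108021333B (zh) * 2016-11-03 2021-08-24 阿里巴巴集团控股有限公司 System, device, and method for random reading and writing of data
JP6526235B2 (ja) * 2016-11-25 2019-06-05 華為技術有限公司Huawei Technologies Co.,Ltd. Data check method and storage system
CN108241548A (zh) * 2016-12-23 2018-07-03 航天星图科技(北京)有限公司 File reading method based on a distributed system
JP6833990B2 (ja) * 2017-03-29 2021-02-24 華為技術有限公司Huawei Technologies Co.,Ltd. Method for accessing a distributed storage system, related apparatus, and related system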
US10057373B1 (en) * 2017-05-15 2018-08-21 Palantir Technologies Inc. Adaptive computation and faster computer operation
CN107273048B (zh) * 2017-06-08 2020-08-04 浙江大华技术股份有限公司 Data writing method and device
CN109947842B (zh) * 2017-07-27 2021-06-18 杭州海康威视数字技术股份有限公司 Data storage method, device, and system in a distributed storage system
CN109597566B (zh) * 2017-09-30 2022-03-04 杭州海康威视系统技术有限公司 Data reading and storage method and device
CN107679197A (zh) * 2017-10-10 2018-02-09 郑州云海信息技术有限公司 Optimization method and device for file truncation
EP3495939B1 (en) * 2017-10-13 2021-06-30 Huawei Technologies Co., Ltd. Method and device for storing data in distributed block storage system, and computer readable storage medium
CN109672544B (zh) 2017-10-13 2020-12-11 杭州海康威视系统技术有限公司 Data processing method, device, and distributed storage system
CN107918527B (zh) * 2017-11-01 2021-04-23 北京小米移动软件有限公司 Storage space allocation method and device, and file storage method and device
US11580068B2 (en) 2017-12-15 2023-02-14 Palantir Technologies Inc. Systems and methods for client-side data analysis
CN110109886B (zh) * 2018-02-01 2022-11-18 中兴通讯股份有限公司 File storage method for a distributed file system, and distributed file system
US10783214B1 (en) 2018-03-08 2020-09-22 Palantir Technologies Inc. Adaptive and dynamic user interface with linked tiles
CN110413202B (zh) * 2018-04-28 2024-03-08 伊姆西Ip控股有限责任公司 Data replication method, device, and computer program product
US10768844B2 (en) 2018-05-15 2020-09-08 International Business Machines Corporation Internal striping inside a single device
CN109165208B (zh) * 2018-07-26 2020-12-15 佛山市电子政务科技有限公司 Method and system for loading data into a database
US10606851B1 (en) 2018-09-10 2020-03-31 Palantir Technologies Inc. Intelligent compute request scoring and routing
CN109597903B (zh) * 2018-11-21 2021-12-28 北京市商汤科技开发有限公司 Image file processing device and method, file storage system, and storage medium
CN109726036B (zh) * 2018-11-21 2021-08-20 华为技术有限公司 Data reconstruction method and device in a storage system
US10409641B1 (en) 2018-11-26 2019-09-10 Palantir Technologies Inc. Module assignment management
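CN109634525B (zh) * 2018-12-10 2022-03-08 浪潮电子信息产业股份有限公司 Method, system, and related components for estimating the effective capacity of a storage system
WO2020124608A1 (zh) * 2018-12-22 2020-06-25 华为技术有限公司 Distributed storage system and computer program product
CN109814805B (zh) * 2018-12-25 2020-08-25 华为技术有限公司 Method for stripe reorganization in a storage system, and stripe server
WO2020181478A1 (zh) * 2019-03-12 2020-09-17 华为技术有限公司 Method and device for managing sub-healthy nodes
CN110308875B (zh) * 2019-06-27 2023-07-14 深信服科技股份有限公司 Data reading and writing method, apparatus, device, and computer-readable storage medium
CN112394876B (zh) * 2019-08-14 2024-02-23 深圳市特思威尔科技有限公司 Large-file storage/reading method, storage/reading device, and computer equipment
CN112579554A (zh) * 2019-09-29 2021-03-30 北京金山云网络技术有限公司 Batch comparison method and device for server configuration files, and electronic equipment
CN110795391A (zh) * 2019-10-28 2020-02-14 深圳市元征科技股份有限公司 Automobile repair data processing method and device, electronic equipment, and storage medium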
LU101763B1 (en) * 2020-05-04 2021-11-05 Microsoft Technology Licensing Llc Microsegment secure speech transcription
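CN111858540B (zh) * 2020-07-28 2024-10-15 昆明大棒客科技有限公司 Weighted distributed data storage method, system, and storage medium
CN112083888A (zh) * 2020-09-10 2020-12-15 北京金山云网络技术有限公司 File storage method and device, and electronic equipment
CN113010103B (zh) * 2021-01-15 2023-03-21 腾讯科技(深圳)有限公司 Data storage method and device, related equipment, and storage medium
CN114281267B (zh) * 2021-12-30 2024-04-26 西北工业大学 Method and device for data migration between distributed storage systems
CN115277735B (zh) * 2022-07-20 2023-11-28 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment, and storage medium
CN116048424B (zh) * 2023-03-07 2023-06-06 浪潮电子信息产业股份有限公司 IO data processing method, apparatus, device, and medium
CN116521091B (zh) * 2023-06-28 2023-09-15 苏州浪潮智能科技有限公司 Data reading method, apparatus, device, data transmission system, and storage medium
CN117057142B (zh) * 2023-08-15 2024-06-18 中交一公局集团有限公司 Vehicle test data processing method and system based on digital twin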

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101834898A (zh) * 2010-04-29 2010-09-15 中科院成都信息技术有限公司 Network distributed coded storage method
CN102364472A (zh) * 2011-10-25 2012-02-29 中兴通讯股份有限公司 Data storage method and system
CN102624866A (zh) * 2012-01-13 2012-08-01 北京大学深圳研究生院 Method and device for storing data, and distributed network storage system
CN102801784A (zh) * 2012-07-03 2012-11-28 华为技术有限公司 Distributed data storage method and device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4451916A (en) * 1980-05-12 1984-05-29 Harris Corporation Repeatered, multi-channel fiber optic communication network having fault isolation system
US5388108A (en) 1992-10-23 1995-02-07 Ncr Corporation Delayed initiation of read-modify-write parity operations in a raid level 5 disk array
US7685126B2 (en) * 2001-08-03 2010-03-23 Isilon Systems, Inc. System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system
JP3682256B2 (ja) * 2001-11-30 2005-08-10 株式会社東芝 Disk array device and parity processing method in the same device
US6985995B2 (en) * 2002-03-29 2006-01-10 Panasas, Inc. Data file migration from a mirrored RAID to a non-mirrored XOR-based RAID without rewriting the data
US7328305B2 (en) * 2003-11-03 2008-02-05 Network Appliance, Inc. Dynamic parity distribution technique
US7680758B2 (en) * 2004-09-30 2010-03-16 Citrix Systems, Inc. Method and apparatus for isolating execution of software applications
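KR100579133B1 (ko) * 2004-12-06 2006-05-12 한국전자통신연구원 Data placement method using distributed parity in a block-partitioned disk array, and large/small block read/write control method in a block-partitioned distributed-parity disk array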
US7788303B2 (en) * 2005-10-21 2010-08-31 Isilon Systems, Inc. Systems and methods for distributed system scanning
US8209587B1 (en) * 2007-04-12 2012-06-26 Netapp, Inc. System and method for eliminating zeroing of disk drives in RAID arrays
US8442751B2 (en) * 2007-11-27 2013-05-14 The Boeing Company Onboard electronic distribution system
US10133883B2 (en) * 2009-02-09 2018-11-20 International Business Machines Corporation Rapid safeguarding of NVS data during power loss event
US8209513B2 (en) 2009-11-12 2012-06-26 Autonomy, Inc. Data processing system with application-controlled allocation of file storage space
WO2012000997A1 (en) * 2010-07-02 2012-01-05 International Business Machines Corporation An apparatus for processing a batched unit of work
US9092864B2 (en) * 2010-11-30 2015-07-28 Pixart Imaging Inc Displacement detection apparatus and method
JP5405513B2 (ja) 2011-03-22 2014-02-05 株式会社東芝 Memory system, nonvolatile storage device, control method of nonvolatile storage device, and program
CN102855194B (zh) * 2012-08-08 2015-05-13 北京君正集成电路股份有限公司 Data storage method and memory
US8909860B2 (en) * 2012-08-23 2014-12-09 Cisco Technology, Inc. Executing parallel operations to increase data access performance
US9317363B2 (en) * 2013-11-06 2016-04-19 International Business Machines Corporation Management of a secure delete operation in a parity-based system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2933733A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274208A (zh) * 2018-12-05 2020-06-12 杭州海康威视系统技术有限公司 Method and device for locking files
CN111274208B (zh) * 2018-12-05 2023-06-30 杭州海康威视系统技术有限公司 Method and device for locking files

Also Published As

Publication number Publication date
CN104272274A (zh) 2015-01-07
US10127233B2 (en) 2018-11-13
AU2013409624A1 (en) 2015-07-30
EP2933733A4 (en) 2016-05-11
US20150324371A1 (en) 2015-11-12
CN104272274B (zh) 2017-06-09
JP2016510148A (ja) 2016-04-04
AU2013409624B2 (en) 2016-11-17
JP6106901B2 (ja) 2017-04-05
EP2933733A1 (en) 2015-10-21
CA2897129A1 (en) 2015-07-09
CA2897129C (en) 2022-03-15

Similar Documents

Publication Publication Date Title
WO2015100627A1 (zh) Data processing method and device in distributed file storage system
US11281531B2 (en) Serial storage node processing of data functions
US8972779B2 (en) Method of calculating parity in asymetric clustering file system
US7266716B2 (en) Method and recovery of data using erasure coded data from stripe blocks
US9021335B2 (en) Data recovery for failed memory device of memory device array
US11074129B2 (en) Erasure coded data shards containing multiple data objects
US9135115B2 (en) Storing data in multiple formats including a dispersed storage format
US7953771B2 (en) Virtualized data storage vaults on a dispersed data storage network
CN101488104B (zh) System and method for implementing efficient and secure storage
US9311184B2 (en) Storing raid data as encoded data slices in a dispersed storage network
US20180267856A1 (en) Distributed storage system, data storage method, and software program
US7284088B2 (en) Methods of reading and writing data
US7310703B2 (en) Methods of reading and writing data
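WO2014056381A1 (zh) Data redundancy implementation method and device
WO2013023516A1 (zh) Data redundancy processing method and device, and distributed storage system
WO2020034695A1 (zh) Data storage method, data recovery method, apparatus, device, and storage medium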
US11055018B2 (en) Parallel storage node processing of data functions
Liu et al. Reo: Enhancing reliability and efficiency of object-based flash caching

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 2897129; Country of ref document: CA)
WWE WIPO information: entry into national phase (Ref document number: 2013900799; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2013409624; Country of ref document: AU; Date of ref document: 20131231; Kind code of ref document: A)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13900799; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2015559412; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)