WO2022083267A1 - Data processing method, apparatus, computing node, and computer-readable storage medium - Google Patents

Data processing method, apparatus, computing node, and computer-readable storage medium

Info

Publication number
WO2022083267A1
WO2022083267A1 (PCT/CN2021/114136; CN2021114136W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
address
storage
storage device
index
Prior art date
Application number
PCT/CN2021/114136
Other languages
English (en)
French (fr)
Inventor
罗先强 (Luo Xianqiang)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022083267A1 publication Critical patent/WO2022083267A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/13: File access structures, e.g. distributed indices
    • G06F16/137: Hash-based
    • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164: File meta data generation
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G06F16/1824: Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061: Improving I/O performance
    • G06F3/0611: Improving I/O performance in relation to response time

Definitions

  • the present application relates to the field of communication technologies, and in particular, to a data processing method, apparatus, computing node, and computer-readable storage medium.
  • HPC: high-performance computing
  • NAS: network-attached storage
  • In the related art, the client reads the data in a shared file through the file system: the client sends a read request to the virtual file system (VFS) in the kernel, the VFS forwards the read request to the user-space file system framework (file system in userspace, FUSE) in the kernel, and FUSE then sends the read request to the file system in user space.
  • After the file system receives the read request, it reads the data in the shared file managed by the file system from the data node according to the read request, and returns the read data to the client along the transmission path of the read request.
  • In this process, the read request sent by the client to the file system passes through the kernel once, and the read data returned by the file system to the client passes through the kernel once. These two kernel crossings take a certain amount of time, resulting in a long total time for the client to read data and low data read efficiency.
  • Embodiments of the present application provide a data processing method, apparatus, computing node, and computer-readable storage medium, which can improve the efficiency of reading data.
  • In a first aspect, a data processing method is provided, which is applied to a computing node, where the computing node includes a storage device, and the storage device stores first data of a first object; the computing node further includes a data index, and the data index includes the first storage address of the first data in the storage device. The method includes:
  • receiving a first read request, where the first read request is used to read the first data; and reading the first data from the storage device based on the data index and the first read request.
  • In this method, the computing node reads the data in the object from the storage device without accessing the file system, avoiding interaction with the file system during the read. Because neither the read request nor the read data needs to pass through the kernel of the computing node, the time spent traversing the kernel is saved, which shortens the time for reading data and improves the efficiency of reading data.
  • In an optional manner, the first read request includes an object identifier of the first object, a start offset of the first data in the first object, and a data amount of the first data. Reading the first data from the storage device based on the data index and the first read request includes: determining the first storage address from the data index based on the object identifier of the first object, the start offset, and the data amount; and reading the first data from the first storage address.
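  • The lookup and read described above can be sketched as follows. This is an illustrative Python model, not the patent's implementation; the 4 KB block size, the dictionary-based index keyed by (object identifier, block start offset), and all names are assumptions for illustration.

```python
# Illustrative sketch only; structures and names are assumptions.
BLOCK = 4096  # assumed first space size (4 KB per data address)

def read_first_data(data_index, storage, object_id, start_offset, amount):
    """Resolve storage addresses from the data index, then read from storage.

    data_index maps (object_id, block_start_offset) -> storage address;
    storage maps storage address -> the bytes of that block.
    """
    out = bytearray()
    pos = start_offset
    end = start_offset + amount
    while pos < end:
        block_start = (pos // BLOCK) * BLOCK
        addr = data_index[(object_id, block_start)]  # the "first storage address"
        block = storage[addr]
        lo = pos - block_start                       # offset within this block
        hi = min(end - block_start, len(block))
        out += block[lo:hi]
        pos = block_start + hi
    return bytes(out)
```

  • A read that straddles two data addresses simply resolves two index entries and concatenates the byte ranges; no file-system call is involved.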
  • In an optional manner, the first object is an index node (inode) in a file system. Determining the first storage address from the data index based on the object identifier of the first object, the start offset, and the data amount includes:
  • In an optional manner, data verification information of the first storage address is stored in a second storage address of the storage device, and the data verification information is used to verify the data in the first storage address.
  • In an optional manner, the method further includes: verifying the read first data according to the first read request and the data verification information.
  • In an optional manner, before receiving the first read request, the method further includes:
  • In a second aspect, a data processing apparatus is provided for executing the above data processing method. The data processing apparatus includes a functional module for executing the data processing method provided in the first aspect or in any optional manner of the first aspect.
  • In a third aspect, a computing node is provided, which includes a processor and a memory, where the memory stores at least one piece of program code, and the program code is loaded and executed by the processor to implement the operations performed by the data processing method provided in the first aspect or in any optional manner of the first aspect.
  • In a fourth aspect, a computer-readable storage medium is provided, where at least one piece of program code is stored in the storage medium, and the program code is loaded and executed by a processor to implement the operations performed by the data processing method provided in the first aspect or in any optional manner of the first aspect.
  • In a fifth aspect, a computer program product or computer program is provided, which includes program code stored in a computer-readable storage medium. The processor of the computing node reads the program code from the computer-readable storage medium and executes it, so that the computing node executes the data processing method provided in the first aspect or in any optional manner of the first aspect.
  • FIG. 1 is a schematic diagram of a data processing system provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a computing node provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a data writing method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a data processing system provided by an embodiment of the present application.
  • the system 100 includes a server 101 and at least one computing node 102 .
  • the server 101 is used to manage at least one data node, and the at least one data node is used to provide data read and write services and data storage services for the computing node 102 .
  • A plurality of objects are stored in the at least one data node. An object is the basic unit of data storage, and each object is used to store data. The data in each object is written by the at least one computing node 102 or by other computing nodes outside the system 100, and can be read by the at least one computing node 102; that is, each object is shared by the at least one computing node 102.
  • Each computing node 102 is configured to write data to the at least one data node, so that the at least one data node generates a new object or writes new data into an object already stored in the at least one data node, or to delete data in the at least one data node.
  • the computing node 102 includes a file system 1021 , a storage device 1022 and a client 1023 .
  • the file system 1021 is used to manage at least one object stored in the at least one data node, and the at least one object is stored in the at least one data node through the server 101 .
  • The file system 1021 is further configured to store a copy of the at least one object in the storage device 1022, where the copy is managed by the storage device 1022, so that the client 1023 can read the data in the objects managed by the file system 1021 from the storage device 1022.
  • the file system 1021 is also used to provide data read and write services for the client 1023 .
  • the file system 1021 is a client for managing objects stored in the at least one data node, and the file system 1021 can be installed in each computing node 102 .
  • The storage device 1022 includes a solid-state drive, a mechanical hard disk, or a storage device with another type of storage medium; the type of the storage device 1022 is not specifically limited in this embodiment of the present application.
  • the storage device 1022 is used for storing a copy of at least one object managed by the file system 1021 , and provides data read and write services for the client 1023 .
  • the storage space provided by the storage device 1022 is identified by a plurality of storage addresses, and each storage address is used to identify a part of the storage space provided by the storage device 1022 .
  • the storage spaces identified by the multiple storage addresses have the same or different space sizes.
  • For example, a storage space with a size of 4 kilobytes (KB) is identified by storage address 0000, a storage space with a size of 64 bytes (B) is identified by storage address 0001, and another 8B storage space is identified by storage address 01000.
  • The storage device writing data in a storage address means that the storage device writes data in the storage space identified by the storage address; reading data from a storage address means reading data from the storage space identified by the storage address; and the data in a storage address is the data stored in the storage space identified by the storage address.
  • the storage device 1022 divides the multiple storage addresses into multiple data addresses, multiple verification addresses, multiple data index addresses, and multiple meta index addresses.
  • the space size of a storage address is the space size of the storage space identified by the storage address, that is, the maximum amount of data that the storage address can store.
  • The storage device records the space size of a data address as the first space size, the space size of a verification address as the second space size, the space size of a data index address as the third space size, and the space size of a meta index address as the fourth space size. It should be noted that the first, second, third, and fourth space sizes can be set according to the specific implementation; for example, the first space size is 4K, the second space size is 64B, the third space size is 8B, and the fourth space size is 32B. The embodiments of the present application do not specifically limit the first space size, the second space size, the third space size, or the fourth space size.
  • the storage space identified by a data address is used to store part or all of the data in the object, or in other words, the data in an object can be stored by the storage space identified by at least one data address.
  • a storage space identified by a verification address is used to store data verification information of a data address, and a storage space identified by a verification address is also a verification area of the data address.
  • the data verification information is used to verify the data stored in the data address.
  • the plurality of data addresses are in one-to-one correspondence with the plurality of verification addresses.
  • The storage spaces identified by the multiple data addresses and the storage spaces identified by the multiple verification addresses form a data shared area, and the data shared area is used to store the at least one object managed by the file system.
  • In some embodiments, the storage space identified by a data address is adjacent to the storage space identified by its corresponding verification address, as in the data sharing area in FIG. 1.
  • the storage spaces identified by the multiple data addresses are adjacent, and the storage spaces identified by the multiple verification addresses are adjacent.
  • a storage space identified by a data index address is used to store a data address.
  • In some embodiments, the storage device 1022 divides the storage space identified by the multiple data index addresses into two parts, a first storage space and a second storage space, each composed of the storage space identified by at least one data index address. The first storage space is used to store the data addresses of data written in a first time period, and the second storage space is used to store the data addresses of data written before the first time period. The first time period is the time period containing the current moment: its end time is the current moment and its start time is a target moment before the current moment, where the duration between the target moment and the current moment is a target duration, which is also the duration of the first time period. The data written in the first time period is the newly written data, and the data written before the first time period is the previously written old data.
  • The storage space identified by the multiple data index addresses is the data index area. The data index area is used to store the occupied data addresses in the data shared area, that is, the storage addresses where object data is stored, and is therefore equivalent to a data index in which data addresses are stored in the form of key-value pairs: the data verification information stored in a verification address serves as a key, and the value corresponding to the key is the data index address where the data address corresponding to that verification address is stored.
  • For example, if verification address 1 corresponds to data address 1, then the data verification information of data address 1 stored in verification address 1 is the key, and the data index address where data address 1 is stored is the value corresponding to that key.
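  • The key-value form described above can be sketched as follows. This is an illustrative Python model; the hash function, slot numbering, and class names are assumptions, not the patent's implementation.

```python
# Illustrative sketch; structure and names are assumptions.
import hashlib

def verification_key(object_id, start_offset):
    """Hash the data verification information (object id, start offset) into a key."""
    return hashlib.sha256(f"{object_id}:{start_offset}".encode()).hexdigest()

class DataIndexArea:
    def __init__(self):
        self.slots = {}        # data index address -> stored data address
        self.key_to_slot = {}  # key (hash of verification info) -> data index address
        self.next_slot = 0

    def store(self, object_id, start_offset, data_address):
        slot = self.next_slot            # pick an unoccupied data index address
        self.next_slot += 1
        self.slots[slot] = data_address
        self.key_to_slot[verification_key(object_id, start_offset)] = slot
        return slot

    def lookup(self, object_id, start_offset):
        """Resolve the verification-info key to the data index address, then
        read the data address stored there."""
        slot = self.key_to_slot[verification_key(object_id, start_offset)]
        return self.slots[slot]
```

  • The indirection mirrors the text: the key does not hold the data address itself but points at the data index address where that data address is stored.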
  • a storage space identified by a meta index address is used to store metadata of an object and metadata verification information of the metadata, where the metadata verification information is used to verify the metadata.
  • The storage space identified by a meta index address is divided into two adjacent parts: one part is used to store the metadata, and the other part is used to store the metadata verification information. The storage space where the metadata verification information is located is also a verification area.
  • the storage space identified by the plurality of meta index addresses is a metadata shared area, and the metadata shared area is used for storing metadata of objects stored in the data shared area.
  • In other embodiments, the storage device 1022 is not located in the computing node 102 but in a target data node, where the target data node is one of the at least one data node managed by the server 101, or a data node other than the at least one data node.
  • the client 1023 is used to provide users with data writing and data reading services.
  • During data writing, the client 1023 writes the user's data into the storage device 1022 and, when the writing is complete, sends the data written into the storage device 1022 to the file system 1021; the file system 1021 then writes the data into the data node through the server 101, so that other computing nodes can share, via the data node, the data written by the client 1023.
  • During data reading, the client 1023 sends a read request to the storage device 1022 to read the data in an object managed by the file system 1021 directly from the storage device 1022, rather than through the file system 1021. This avoids the client 1023 having to interact with the file system 1021 through the kernel, which shortens the time for data reading and improves the efficiency of data reading.
  • If the client 1023 fails to read data from the storage device 1022, the client 1023 can also send the read request to the file system 1021 through a message channel, so as to read the data through the file system. Such message passing is a form of inter-process communication (IPC).
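  • The fallback path above can be sketched as follows; the callable interfaces and the exception types used to signal a failed storage-device read are assumptions for illustration.

```python
# Illustrative sketch; interfaces and exception types are assumptions.
def client_read(storage_read, fs_read, request):
    """Try the storage device first; on failure, fall back to the file
    system over the message channel (IPC)."""
    try:
        return storage_read(request)
    except (KeyError, OSError):
        # Storage-device read failed: forward the same request to the
        # file system, which reads through its own path.
        return fs_read(request)
```

  • The fast path never enters the kernel-mediated file-system route; the slow path preserves correctness when the local copy is unavailable.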
  • An embodiment of the present application also provides a schematic structural diagram of a computing node.
  • The computing node 200 shown in FIG. 2 may vary greatly in configuration or performance. It includes one or more processors 201 and one or more memories 202, where the processors include central processing units (CPUs) or other types of processors. At least one piece of program code is stored in the memory 202, and it is loaded and executed by the processor 201 to implement any method provided by the following method embodiments.
  • In some embodiments, the computing node 200 also has multiple clients and file systems installed, and the memory 202 can also serve as the storage device in the following method embodiments. The computing node 200 further has a wired or wireless network interface, a keyboard, input/output interfaces, and other components for input and output, as well as other components for implementing device functions, which are not repeated here.
  • A computer-readable storage medium, such as a memory including program code, is also provided, where the program code can be executed by a processor in the terminal to implement any of the methods in the following embodiments. For example, the computer-readable storage medium is a non-transitory computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.
  • the file system writes a copy of at least one object managed by the file system to the storage device, so that the client in the computing node can read the data in the at least one object from the storage device.
  • For this method, refer to the flowchart of a data writing method provided by an embodiment of the present application shown in FIG. 3. The method is applied to a computing node including a client and a file system.
  • Step 301: The file system sends a resource allocation request to a storage device, where the resource allocation request is used to instruct the storage device to allocate a data shared area, a data index area, and a metadata shared area.
  • the storage device is any storage device in the computing node, or any storage device in the target data node.
  • the storage device is located in the computing node as an example for description.
  • the resource allocation request includes an allocation identifier, where the allocation identifier is used to instruct the storage device to allocate the data shared area, the data index area, and the metadata shared area.
  • After the computing node is powered on, if the storage device has not been divided into a data shared area, a data index area, and a metadata shared area, the file system executes step 301.
  • Step 302: Based on the resource allocation request, the storage device allocates the metadata shared area, the data index area, and the data shared area to the file system.
  • After receiving the resource allocation request, the storage device obtains the allocation identifier from the request and, based on it, divides all or part of its own storage space into three parts: the data shared area, the data index area, and the metadata shared area. The storage device further divides the data shared area into the storage spaces corresponding to multiple data addresses and the storage spaces corresponding to multiple verification addresses, divides the data index area into the storage spaces corresponding to multiple data index addresses, and divides the metadata shared area into the storage spaces corresponding to multiple meta index addresses.
  • Step 303: The storage device sends an allocation completion response to the file system, where the allocation completion response is used to indicate that the metadata shared area, the data index area, and the data shared area have been allocated.
  • In some embodiments, the allocation completion response includes an allocation completion flag, which is used to indicate that the metadata shared area, the data index area, and the data shared area have been allocated.
  • It should be noted that the storage device only needs to allocate the metadata shared area, the data index area, and the data shared area once; that is, the process shown in steps 301-303 can be executed once and does not need to be performed multiple times.
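  • The one-time allocation handshake of steps 301-303 can be modeled as follows. This is an illustrative Python sketch; the message fields (`alloc`, `alloc_complete`) and region names are assumptions, not the patent's wire format.

```python
# Illustrative sketch; message shapes and names are assumptions.
class StorageDevice:
    def __init__(self):
        self.allocated = False
        self.regions = {}

    def handle(self, request):
        if request.get("alloc"):            # allocation identifier present
            if not self.allocated:          # regions are allocated only once
                self.regions = {
                    "data_shared_area": [],
                    "data_index_area": [],
                    "metadata_shared_area": [],
                }
                self.allocated = True
            return {"alloc_complete": True}  # allocation completion response
        return {"alloc_complete": False}

def file_system_boot(device):
    """On power-up, send the resource allocation request only if the
    regions have not been divided yet (step 301)."""
    if not device.allocated:
        return device.handle({"alloc": True})
    return {"alloc_complete": True}
```

  • Booting twice still yields a completion response, but the second boot performs no new allocation, matching the once-only property stated above.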
  • Step 304: After receiving the allocation completion response, the file system sends a target write request to the storage device, where the target write request is used to instruct the storage device to write multiple objects.
  • The target write request includes the number of objects, and an object identifier and an object offset for each object.
  • An object is an index node (inode) in the file system, such as a file or a data block.
  • The multiple objects are objects managed by the file system and can be shared by at least one computing node, where the computing node is one of the at least one computing node.
  • An object is used to store data. The data stored in an object is the service data uploaded by the user through the client, and the service data includes any type of data, such as video data, audio data, or text data; the type of the service data is not specifically limited in this embodiment of the present application.
  • the object offset of an object is the total amount of data for that object at the current moment.
  • When the file system receives the allocation completion response, it indicates that the storage device has allocated the metadata shared area, the data index area, and the data shared area. At this time, the storage device has not yet stored any objects, and through step 304 the file system writes all objects it manages to the storage device.
  • In some embodiments, the file system sends an object acquisition request to the server, where the object acquisition request is used to acquire the objects managed by the file system. After receiving the object acquisition request, the server acquires the objects managed by the file system from the at least one data node and sends each object, the object identifier of each object, and the object offset to the file system. The file system receives each object, the object identifier of each object, and the object offset sent by the server, generates the target write request based on them, and sends the target write request to the storage device.
  • In some embodiments, the target write request further includes metadata of each object. The metadata of an object is used to describe the properties of the object, and the properties include at least one of the object identifier of the object, the object offset, a user identifier, a read-write permission identifier, and a time identifier.
  • The user identifier is used to indicate the user who uploaded the object. The read-write permission identifier is used to indicate whether the data in the object is allowed to be read and whether data is allowed to continue to be written into the object. The time identifier is used to indicate the time when the data in the object was updated, where a data update means that data is written into the object or that data in the object is deleted.
  • Step 305: The storage device writes the multiple objects based on the target write request.
  • this step 305 includes the following steps 3051-3053.
  • Step 3051: For any object in the multiple objects, the storage device writes the object into at least one data address.
  • The storage device may need one or more data addresses when writing any object: if the data amount of the object is less than or equal to the first space size, one data address is needed to write the object; otherwise, multiple data addresses are needed.
  • The storage spaces identified by the at least one data address may or may not be adjacent.
  • In some embodiments, the storage device divides the data in the object into at least one data block and writes the at least one data block into the storage space identified by the at least one data address, where the data amount of each data block is less than or equal to the first space size. If the data amount of the object is an integer multiple of the first space size, the size of every data block is equal to the first space size; otherwise, one of the data blocks is smaller than the first space size and the sizes of the other data blocks are equal to the first space size.
  • For example, if the data amount of an object is 12K, the storage device writes the data in the object into data addresses 0-2; specifically, it writes the first 4K of data, the middle 4K of data, and the last 4K of data of the object into data addresses 0-2, respectively.
  • For another example, if the data amount of an object is 7K, the storage device writes the data in the object into data addresses 4-5; specifically, it writes the first 4K of data and the last 3K of data of the object into data addresses 4 and 5, respectively.
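  • The block-splitting rule above can be sketched as follows; the 4 KB constant is the example first space size from the text, and the function name is an assumption.

```python
# Illustrative sketch; FIRST_SPACE_SIZE is the 4 KB example from the text.
FIRST_SPACE_SIZE = 4 * 1024

def split_into_blocks(data: bytes):
    """Split an object's data into data blocks no larger than the first
    space size; only the final block may be smaller."""
    return [data[i:i + FIRST_SPACE_SIZE]
            for i in range(0, len(data), FIRST_SPACE_SIZE)]
```

  • A 12K object yields three full 4K blocks; a 7K object yields one 4K block and one 3K block, matching the two examples above.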
  • Step 3052: For any data address in the at least one data address, the storage device generates data verification information for the data address based on the object identifier of the object and the data in the data address.
  • the data verification information of a data address is used to verify the data in the data address.
  • the data verification information includes the object identifier of the object to which the data in the data address belongs, and the starting offset of the data in the data address in the object to which it belongs.
  • In some embodiments, the storage device determines the data that is in the object and precedes the data in the data address as target data, determines the data amount of the target data as the starting offset, within the object, of the data in the data address, and determines the object identifier of the object together with that starting offset as the data verification information of the data address.
  • For example, if the object identifier of an object is A and the storage device writes the data in the object into data addresses 0-2, the data verification information of data addresses 0-2 is (A, 0K), (A, 4K), and (A, 8K), respectively.
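  • The (object identifier, starting offset) tuples in this example can be computed as follows; the function name is an assumption, and the constant is the example 4 KB first space size.

```python
# Illustrative sketch; tuple layout follows the (object id, start offset)
# example in the text. Names are assumptions.
FIRST_SPACE_SIZE = 4 * 1024

def verification_info(object_id, block_index):
    """Data verification information for the block written at the given
    position: the owning object's identifier and the block's starting
    offset within that object (the amount of data preceding it)."""
    return (object_id, block_index * FIRST_SPACE_SIZE)
```

  • For object A split across three data addresses, the three tuples are exactly (A, 0K), (A, 4K), and (A, 8K).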
  • Step 3053: The storage device stores the data verification information of the data address in association with the data address.
  • In some embodiments, the storage device determines the verification address corresponding to the data address and writes the data verification information into that verification address, so as to implement the associated storage. For example, if the data address is data address 0 and data address 0 corresponds to verification address 1, the storage device writes the data verification information into the storage space identified by verification address 1.
  • In some embodiments, the storage device stores the data addresses corresponding to the object, where the data addresses corresponding to the object are the at least one data address in which the data of the object is stored.
  • the storage device stores the any data address in the data index area based on the data verification information of the any data address.
  • In a possible implementation, the storage device storing the any data address in the data index area based on the data verification information of the any data address includes: the storage device determines a first data index address in the data index area, and writes the any data address into the first data index address; the storage device performs a hash calculation on the data verification information of the any data address to obtain a first hash value, and stores the first hash value in association with the first data index address, so that the first hash value corresponds to the first data index address for subsequent query.
  • the first data index address is an unoccupied data index address, and the unoccupied data index address is a data index address where no data address is stored.
  • In a possible implementation, the storage device determining the first data index address in the data index area includes: if the data index area is not divided into a first storage space and a second storage space, the storage device determines any unoccupied data index address in the data index area as the first data index address; if the data index area has been divided into a first storage space and a second storage space, the storage device determines any unoccupied data index address in the first storage space as the first data index address; if the data index area has been divided into a first storage space and a second storage space and there is no unoccupied data index address in the first storage space, the storage device transfers the first-stored data address in the first storage space to the second storage space, and determines the data index address that this data address occupied in the first storage space as the first data index address.
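The index behavior described above — storing a data address at an index address keyed by the hash of its verification information, and transferring the earliest-stored entry of the first storage space to the second storage space when no index address is free — might be sketched like this (a toy in-memory model; the class, the SHA-256 choice, and the list-based spaces are illustrative assumptions):

```python
import hashlib

class DataIndex:
    """Toy model of the data index area: a bounded 'first storage space'
    for newly written data addresses and a 'second storage space' that
    receives the earliest-stored entry when the first space is full."""

    def __init__(self, first_space_slots=2):
        self.first_space = []   # ordered (hash value, data address) pairs
        self.second_space = []  # entries transferred out of the first space
        self.first_space_slots = first_space_slots

    @staticmethod
    def _h(verification_info):
        # First hash value: hash of the data verification information.
        return hashlib.sha256(repr(verification_info).encode()).hexdigest()

    def store(self, verification_info, data_address):
        if len(self.first_space) >= self.first_space_slots:
            # No unoccupied data index address in the first storage space:
            # transfer the first-stored data address to the second space
            # and reuse the freed index address.
            self.second_space.append(self.first_space.pop(0))
        self.first_space.append((self._h(verification_info), data_address))

    def lookup(self, verification_info):
        # Query the first storage space before the second one.
        h = self._h(verification_info)
        for space in (self.first_space, self.second_space):
            for stored_hash, data_address in space:
                if stored_hash == h:
                    return data_address
        return None
```

With two slots in the first space, storing a third address pushes the oldest entry into the second space while lookups still find it there.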
  • Optionally, the storage device updates the correspondence of the hash value that originally corresponded to the first data index address, that is, the hash value originally corresponding to the first data index address is stored in association with the second data index address.
  • the storage device stores the metadata of any object based on the target write request.
  • the storage device acquires the metadata of the any object from the target write request, and writes the acquired metadata of the any object into the metadata shared area.
  • the storage device obtains any unoccupied meta index address from the metadata shared area, and writes the metadata of any object in the meta index address.
  • the unoccupied meta-index address is a meta-index address where no metadata is stored in the metadata shared area.
  • In a possible implementation, after the storage device writes the metadata into the metadata index area, the storage device performs a hash calculation on the metadata to obtain a second hash value, uses the second hash value as the metadata verification information of the metadata, and writes it into the metadata index area.
  • The metadata verification information of the metadata is used to verify the metadata, so as to ensure consistency between the any object stored in the storage device and the any object managed by the file system.
  • In a possible implementation, after the storage device writes the metadata into the metadata index area, the storage device performs a hash calculation on the object identifier of the any object to obtain a third hash value, and stores the third hash value in association with the meta index address for subsequent query.
  • In this embodiment of the present application, multiple objects managed by the file system are stored in the storage device, so that the client in the computing node can directly read the data in the objects from the storage device without reading the data through the file system. Interaction between the client and the file system through the kernel is therefore avoided, which shortens the data reading time and improves the data reading efficiency.
  • the client in the computing node can read the data in the object managed by the file system from the storage device.
  • The method is applied to a computing node; the computing node includes a storage device, and the storage device stores first data of a first object; the computing node further includes a data index, and the data index includes a first storage address of the first data in the storage device.
  • Step 401: The client sends a first read request to the storage device, where the first read request is used to instruct to read the first data.
  • the client is any client in the computing node, or any client in other nodes other than the computing node.
  • the client is located in the computing node as an example for description.
  • the first object is any object managed by the file system.
  • the first read request includes the object identifier of the first object, the starting offset of the first data in the first object, and the data amount of the first data.
  • the first read request further includes a read identifier, where the read identifier is used to indicate read data.
  • the first data is the data to be read, and the starting offset is the data amount of the data before the first data in the first object.
  • Step 402: The storage device receives the first read request.
  • After the storage device receives the first read request, the storage device first determines, according to the first read request, the data address where the first data is stored, and then reads the first data from that data address. For details, refer to the following steps 403-404.
  • Step 403: The storage device determines the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount.
  • The first storage address is the data address in which the first data is stored. Since the data stored in one data address is limited, the first data may be stored in at least one data address, that is, there may be at least one first storage address storing the first data. In a possible implementation manner, this step 403 includes the following steps A-B.
  • Step A: The storage device calculates a hash value based on the object identifier of the first object, the starting offset, and the data amount.
  • In a possible implementation, the storage device calculates at least one piece of data verification information of the at least one first storage address according to the object identifier, the starting offset and the data amount, and performs a hash calculation on the at least one piece of data verification information, where one first storage address corresponds to one piece of data verification information.
  • In a possible implementation, the storage device acquiring the at least one piece of data verification information of the at least one first storage address according to the object identifier, the starting offset and the data amount includes: if the starting offset is not an integer multiple of the first space size, the storage device selects a first starting offset from a first interval [0, starting offset], where the first starting offset is the largest integer multiple of the first space size in the first interval; the storage device selects at least one integer multiple of the first space size from a second interval [first starting offset, starting offset + data amount], and determines the at least one integer multiple as at least one second starting offset; for any starting offset among the first starting offset and the at least one second starting offset, the storage device determines the object identifier and that starting offset as the data verification information of one first storage address.
  • In another possible implementation, the storage device acquiring the at least one piece of data verification information of the at least one first storage address according to the object identifier, the starting offset and the data amount includes: if the starting offset is an integer multiple of the first space size, the storage device selects at least one integer multiple of the first space size from a third interval [starting offset, starting offset + data amount], and determines the at least one integer multiple as at least one third starting offset; for any third starting offset in the at least one third starting offset, the storage device determines the object identifier and that third starting offset as the data verification information of one first storage address.
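The offset selection in the two cases above reduces to enumerating the block-aligned ("first space size"-aligned) starting offsets covered by the requested range; a sketch, assuming a 4K first space size (the function name is illustrative):

```python
BLOCK = 4 * 1024  # assumed first space size

def read_offsets(start, amount, block=BLOCK):
    """Return the block-aligned starting offsets whose data addresses
    must be consulted to read `amount` bytes beginning at `start`.
    If `start` is not block-aligned, the first offset is the largest
    multiple of the block size in [0, start]."""
    first = (start // block) * block  # equals start when already aligned
    end = start + amount
    offsets = []
    off = first
    while off < end:
        offsets.append(off)
        off += block
    return offsets
```

For the later examples, a read at 5K of 2K touches only the block at 4K, while a read at 2K of 9K touches the blocks at 0, 4K, and 8K.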
  • For any piece of the data verification information, the storage device performs a hash calculation on that data verification information to obtain a fourth hash value.
  • Step B: Based on the hash value, the storage device determines, from the data index, the first storage address of the first data in the storage device.
  • the hash value is at least one fourth hash value obtained by the storage device performing hash calculation on the at least one piece of data verification information.
  • In a possible implementation, for each fourth hash value, the storage device queries, from the correspondence between data index addresses and hash values recorded in the data index, the data index address corresponding to that fourth hash value, determines the queried data index address as a third storage address, and reads the data address stored in the third storage address as a first storage address, where one third storage address is a data index address in the data index in which one first storage address is stored.
  • For any fourth hash value in the at least one fourth hash value, if the storage device has divided the data index area into a first storage space and a second storage space, the storage device first queries the data index address corresponding to that fourth hash value from the correspondence between data index addresses and hash values in the first storage space. If the data index address corresponding to the fourth hash value is queried there, it indicates that the first data is data written in the first time period, that is, newly written data, and the storage device determines the queried data index address as a third storage address.
  • Step 404: The storage device reads the first data from the first storage address.
  • For ease of description, the first storage address where the starting data of the first data is located is recorded as the head storage address, the first storage address where the ending data of the first data is located is recorded as the tail storage address, and the storage addresses other than the head storage address and the tail storage address in the at least one first storage address are recorded as intermediate storage addresses.
  • The head storage address, that is, the first storage address where the starting data of the first data is located, is stored at the third storage address corresponding to the fourth hash value calculated based on the first starting offset; the tail storage address is stored at the third storage address corresponding to the fourth hash value calculated based on the fourth starting offset, where the fourth starting offset is the maximum value of the at least one second starting offset or the maximum value of the at least one third starting offset.
  • If the starting offset is not an integer multiple of the first space size, it means that the starting data of the first data is at a non-starting position of the head storage address (the first storage address where the starting data is located), that is, part of the data in that storage address belongs to the first data and part does not. Similarly, the ending data of the first data may not be at the end position of the tail storage address, that is, part of the data in the tail storage address among the at least one first storage address belongs to the first data and part does not. In this case, within the head storage address, the storage device starts reading from the position of the starting offset and reads the data of the data amount, and the read data is the first data, where the data amount of the data located before the starting offset in the head storage address is the difference between the starting offset and the first starting offset.
  • For example, suppose the data volume of the first object is 12K, the first space size is 4K, the starting offset of the first data in the first object is 5K, the data amount of the first data is 2K, and the first object is stored at data addresses 0-2 (that is, the at least one first storage address), where data address 0 stores the data of [0, 4K] of the first object, data address 1 stores the data of (4K, 8K] of the first object, and data address 2 stores the data of (8K, 12K] of the first object. Since the starting offset of the first data is 5K, the starting data of the first data is in data address 1; since the data amount of the first data is 2K, the ending data of the first data is also in data address 1, so data address 1 is at once the head storage address and the tail storage address, and the storage device reads the data from 5K to 7K in data address 1 as the first data.
  • In a possible implementation, the at least one first storage address includes at least a head storage address (the first storage address where the starting data of the first data is located) and a tail storage address. If the starting offset is not an integer multiple of the first space size, the storage device starts reading from the starting offset within the storage space identified by the head storage address, and the data located after the starting offset is read as the starting part of the first data. If the starting offset is an integer multiple of the first space size, the storage device reads all the data in the storage space identified by the head storage address as the starting part of the first data. If the sum of the starting offset and the data amount (denoted as the target sum value) is not an integer multiple of the first space size, the storage device starts reading from the starting position of the storage space identified by the tail storage address and reads data of the target difference value, and the read data is the ending part of the first data, where the target difference value is the difference between the target sum value and the largest integer multiple of the first space size smaller than the target sum value.
  • For example, suppose the data volume of the first object is 12K, the first space size is 4K, the starting offset of the first data in the first object is 2K, the data amount of the first data is 9K, and the first object is stored at data addresses 0-2, where data address 0 stores the data of [0, 4K] of the first object, data address 1 stores the data of (4K, 8K] of the first object, and data address 2 stores the data of (8K, 12K] of the first object. Since the starting offset of the first data is 2K, the starting data of the first data is in data address 0, that is, data address 0 is the head storage address, so the storage device reads all the data after 2K in data address 0 as the starting part of the first data. Since the data amount of the first data is 9K, the ending data of the first data is in data address 2, that is, data address 2 is the tail storage address, and data address 1 is an intermediate storage address; the storage device reads all the data in the space identified by data address 1 as the intermediate part of the first data, and reads the first 3K of data in the space identified by data address 2 as the ending part of the first data. In this way, the data from 2K to 11K in the first object is read as the first data.
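The head/intermediate/tail reading logic of both examples can be sketched as a single routine (assuming a 4K first space size and an in-memory dict standing in for the data addresses; names are illustrative):

```python
BLOCK = 4 * 1024  # assumed first space size

def read_object_range(blocks, start, amount, block=BLOCK):
    """Assemble `amount` bytes of an object beginning at offset `start`.

    `blocks` maps a block-aligned starting offset to the bytes stored at
    the corresponding data address. The head address is read from `start`
    onward, intermediate addresses in full, and the tail address up to
    the requested end."""
    out = bytearray()
    end = start + amount
    off = (start // block) * block   # head storage address
    while off < end:
        data = blocks[off]
        lo = max(start - off, 0)     # skip data before `start` in the head
        hi = min(end - off, block)   # stop at `end` in the tail
        out += data[lo:hi]
        off += block
    return bytes(out)
```

For a 12K object, reading 9K from offset 2K takes the last 2K of block 0, all of block 1, and the first 3K of block 2, matching the example above.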
  • The process shown in the above steps 403-404 is also a process in which the computing node reads the first data from the storage device based on the data index of the first data and the first read request.
  • Step 405: The storage device verifies the first data according to the first read request and the data verification information of the first storage address stored in the second storage address, where the data verification information of one first storage address is used to verify the data in that first storage address.
  • a second storage address is a verification address corresponding to a first storage address.
  • In a possible implementation, the storage device reads the data verification information of the at least one first storage address from the at least one second storage address corresponding to the at least one first storage address, and verifies the read data verification information of the at least one first storage address according to the first read request. If the read data verification information of the at least one first storage address all passes the verification, the storage device determines that the first data passes the verification; otherwise, it determines that the first data fails the verification. For any first storage address, the storage device reads one piece of data verification information from the second storage address corresponding to that first storage address; if the read data verification information is consistent with the data verification information of that first storage address calculated by the storage device according to the first read request, the storage device determines that the read data verification information passes the verification; otherwise, the storage device determines that it fails the verification.
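The check described above — recomputing the expected verification tuples from the read request and comparing them with the tuples read from the second storage addresses — might look like this (a sketch assuming a 4K first space size; the tuple representation follows the earlier (object identifier, starting offset) description):

```python
BLOCK = 4 * 1024  # assumed first space size

def verify_read(stored_infos, object_id, start, amount, block=BLOCK):
    """Recompute the (object identifier, starting offset) tuples expected
    for the addresses a read touches and compare them with the
    verification information read from the second storage addresses."""
    first = (start // block) * block
    expected = [(object_id, off) for off in range(first, start + amount, block)]
    return expected == stored_infos
```

A read of 2K at offset 5K of object A passes only if the single stored tuple is (A, 4K); any mismatch fails the whole read.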
  • this step 405 is an optional step.
  • the storage device executes this step 405, and after the first data passes the verification, the storage device executes the following step 406. In other embodiments, the storage device does not perform this step 405, and directly performs the following step 406 after performing the above-mentioned step 404.
  • Step 406: The storage device sends the first data to the client.
  • the processes shown in the above steps 401 to 406 are described by taking the storage device being able to read the first data as an example.
  • In a possible implementation, if the storage device does not read the first data, or the read first data fails the verification, the storage device sends a read failure response to the client to indicate that the data read failed. For example, if the storage device cannot query at least one third storage address from the data index area, it means that the storage device does not store the first data, and the storage device cannot read the first data.
  • Optionally, the client can also read the first data based on the file system. For example, the client interacts with the file system through the kernel to read data, as in the data reading process in the background art; or the client can also send the first read request to the file system through IPC, and the file system reads the first data in the first object from the data node through the server and returns the read first data to the client through IPC.
  • In this embodiment of the present application, the data in the object is read from the storage device by the computing node without accessing the file system, which avoids interaction with the file system in the process of reading data; the read request and the read data do not need to pass through the kernel of the computing node, which saves the time of traversing the kernel, thereby shortening the time for reading data and improving the efficiency of reading data.
  • In a possible implementation, the client can also write data into the storage device. For example, before the client reads the first data from the storage device, the client writes the first data to the storage device.
  • To further describe the process of writing data to the storage device by the client, refer to the flowchart of a data processing method provided by an embodiment of the present application shown in FIG. 5. The method is applied to a computing node including a client and a file system.
  • Step 501: The client sends a first write request to the storage device, where the first write request is used to instruct to write the first data into the first object.
  • the storage device is any storage device in the computing node, or any storage device in the target data node. In the embodiment of the present application, the storage device is located in the computing node as an example for description.
  • In a possible implementation, the first write request includes the object identifier of the first object, the object offset of the first object, and the first data, where the object offset of the first object is the total amount of data of the first object at the current moment.
  • Step 502: The storage device receives the first write request.
  • Step 503: The storage device writes the first data to the first storage address of the storage device according to the first write request.
  • the first storage address is a data address in the data sharing area, and one data address stores limited data. In order to completely write the first data in the storage device, there may be at least one first storage address.
  • In a possible implementation, the storage device divides the first data into at least one data block, selects a first storage address for each data block from the unoccupied data addresses in the data sharing area, and writes the corresponding data blocks into the selected at least one first storage address respectively, where the data amount of each of the at least one data block is smaller than or equal to the first space size.
  • If the data amount of the first data is an integer multiple of the first space size, the data amount of each of the at least one data block is equal to the first space size; otherwise, the size of one data block in the at least one data block is smaller than the first space size, and the sizes of the other data blocks are equal to the first space size.
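The block-splitting rule above (every block equals the first space size except possibly the last) is a straightforward chunking step; a sketch assuming a 4K first space size:

```python
BLOCK = 4 * 1024  # assumed first space size

def split_into_blocks(data, block=BLOCK):
    """Split the data to be written into at most block-sized chunks;
    only the last chunk may be smaller than the first space size."""
    return [data[i:i + block] for i in range(0, len(data), block)]
```

For example, 9K of data becomes two full 4K blocks followed by one 1K block.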
  • In a possible implementation, the storage device determines, according to the object offset of the first object, whether the first object is an object already stored in the storage device. If the first object is an object already stored in the storage device, the storage device determines, according to the object offset, whether there is free space in the data addresses used to store the first object in the storage device at the current moment. If there is free space, the storage device writes the first data into the free space starting from its starting data. If first data remains after the free space is filled, the storage device divides the remaining first data into at least one data block, selects one first storage address for each data block, and writes the corresponding data blocks into the selected at least one first storage address respectively. If the first data is all written into the free space, the data address corresponding to the free space is also a first storage address.
  • In a possible implementation, if the object offset is an integer multiple of the first space size, the storage device determines that there is no free space in the data addresses used to store the first object in the storage device at the current moment; otherwise, the storage device determines that there is free space. The storage device performs a hash calculation on the object identifier of the first object and the largest integer multiple of the first space size that is smaller than the object offset, to obtain a fifth hash value; the storage device reads the target data address in the data index address corresponding to the fifth hash value, and the free space is a part of the storage space identified by the target data address.
  • In another possible implementation, if the first object is not an object already stored in the storage device, the storage device divides the first data into at least one data block, selects a first storage address for each data block from the unoccupied data addresses in the data sharing area, and writes the corresponding data blocks into the selected at least one first storage address respectively.
  • In a possible implementation, the storage device obtains the object data volume of the first object from the first write request. If the object data volume is 0, it means that the first object is a new object to be written, that is, the first object is not yet stored in the storage device; if the object data volume is not 0, it means that the first object has already been stored in the storage device.
  • Step 504: The storage device writes the data verification information of the first storage address into the second storage address according to the object identifier and the object offset, where the data verification information of one first storage address includes the object identifier and the starting offset, in the first object, of the data in that first storage address, and the second storage addresses correspond to the first storage addresses one-to-one.
  • a second storage address is a verification address corresponding to a first storage address in the data sharing area.
  • In a possible implementation, for any first storage address, the storage device determines a target data volume, where the target data volume is the data volume of the data that is in the first data and located before the data in that first storage address; the storage device determines the sum of the object data amount and the target data volume as the starting offset, in the first object, of the data in that first storage address; the storage device determines the object identifier of the first object and this starting offset as the data verification information of that first storage address; and the storage device writes the data verification information of that first storage address into the verification address corresponding to that first storage address.
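The starting offset of each written block is the object data amount plus the amount of first data written before that block, as described in step 504; a sketch (the function name and list representation are assumptions):

```python
def write_verification_infos(object_id, object_offset, chunk_sizes):
    """Data verification information for each written block: the object
    identifier plus (object offset + amount of first data written before
    the block), i.e. the block's starting offset in the first object."""
    infos, written = [], 0
    for size in chunk_sizes:
        infos.append((object_id, object_offset + written))
        written += size
    return infos
```

Appending a 4K block and a 1K block to an object already holding 8K yields the tuples (A, 8K) and (A, 12K).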
  • The process shown in steps 503-504 is a process in which the storage device, according to the first write request, writes the first data into the first object, that is, stores the first data in association with the first object.
  • Step 505: The storage device records the first storage address in the data index.
  • the storage device writes the at least one first storage address in at least one third storage address, and one first storage address corresponds to one third storage address.
  • a third storage address is a data index address in the data index area.
  • In a possible implementation, the storage device stores the any first storage address in the data index area based on the data verification information of the any first storage address. This process is the same as the process in step 306 in which the storage device stores the any data address in the data index area based on the data verification information of the any data address, and is therefore not repeated here in this embodiment of the present application.
  • Step 506: The storage device writes the metadata of the first object into a fourth storage address according to the first write request.
  • The fourth storage address is a meta index address in the metadata shared area. If, before the first data is written, the first object is an object already stored in the storage device and the metadata of the first object has been stored in the fourth storage address, the storage device determines the fourth storage address according to the object identifier of the first object, and updates the metadata of the first object in the fourth storage address according to the written first data.
  • In a possible implementation, the storage device determining the fourth storage address according to the object identifier of the first object includes: the storage device performs a hash calculation on the object identifier of the first object to obtain a sixth hash value, and queries, from the metadata shared area, the meta index address corresponding to the sixth hash value; the meta index address corresponding to the sixth hash value is the fourth storage address.
  • If the first object is not an object already stored in the storage device, the storage device determines any one of multiple unoccupied storage addresses as the fourth storage address, generates the metadata of the first object according to the first write request, and writes the metadata of the first object into the fourth storage address. When the metadata of the first object is written into the fourth storage address, the storage device performs a hash calculation on the object identifier of the first object to obtain a sixth hash value, and associates the sixth hash value with the fourth storage address for subsequent query. The multiple unoccupied storage addresses are multiple unoccupied metadata index addresses in the metadata shared area.
  • After performing this step 506, the storage device writes, according to the metadata in the fourth storage address, the metadata verification information of the metadata into the fourth storage address, where the metadata verification information is used to verify the metadata.
  • In a possible implementation, the storage device performs a hash calculation on the metadata in the fourth storage address to obtain a seventh hash value, and determines the seventh hash value as the metadata verification information of the metadata. If metadata verification information has already been stored in the fourth storage address, the storage device replaces the stored metadata verification information with the seventh hash value; if no metadata verification information is stored in the fourth storage address, the storage device directly writes the seventh hash value into the fourth storage address.
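The metadata verification information is simply a hash of the stored metadata (the "seventh hash value"); a sketch using SHA-256, which is an assumption — the embodiment does not fix a particular hash function:

```python
import hashlib

def metadata_checksum(metadata: bytes) -> str:
    """Metadata verification information: a hash of the stored metadata
    (the 'seventh hash value'); recomputing and comparing it later
    detects corruption of the stored metadata."""
    return hashlib.sha256(metadata).hexdigest()
```

The checksum stored alongside the metadata is replaced whenever the metadata is updated, so a later recomputation that disagrees with the stored value signals inconsistency.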
  • the storage device sends a write success response to the client, where the write success response is used to indicate that the first data has been written into the first object.
  • After the client receives the write success response, it indicates that the storage device has written the first data into the stored first object. Because the first object stored by the storage device is only a duplicate of the first object managed by the file system, and the first object managed by the file system has not yet been written with the first data, in order to enable the first data to be shared by other computing nodes, the client sends the first write request to the file system, and the file system writes the first data into the first object stored in the data node according to the first write request.
  • In a possible implementation, another computing node may not be able to access the storage device, and the first data is not stored in the target storage device that can be accessed by that computing node. In this case, when the file system in any computing node detects that the first data is written into the first object stored in the data node, the file system sends the first write request to the target storage device, so that the target storage device writes the first data into the target storage device by performing the above steps; any subsequent computing node can then read the first data from the target storage device, so as to avoid interaction with the file system.
  • In this embodiment of the present application, data is stored in association with the objects managed by the file system through the storage device, so that data can be written into the objects managed by the file system that are stored in the storage device. When the client subsequently needs to read the data, the client can directly read it from the storage device without interacting with the file system, which saves the time of traversing the kernel, shortens the time for the client to read the data, and improves the efficiency of reading data.
  • The apparatus 600 includes a storage device 601, and the storage device 601 stores first data of a first object; the apparatus 600 further includes a data index, and the data index includes a first storage address of the first data in the storage device 601. The apparatus 600 includes:
  • a receiving module 602 configured to receive a first read request, where the first read request is used to read the first data
  • the reading module 603 is configured to read the first data of the storage device 601 based on the data index of the first data and the first read request.
  • the first read request includes an object identifier of the first object, a start offset of the first data in the first object, and a data amount of the first data;
  • the reading module 603 includes:
  • a determining unit configured to determine the first storage address from the data index based on the object identifier of the first object, the starting offset and the data amount;
  • a reading unit configured to read the first data from the first storage address.
  • the first object is an index node inode in a file system; the determining unit is configured to: compute a hash value based on the object identifier of the first object, the starting offset, and the data amount; and
  • determine, based on the hash value, the first storage address of the first data in the storage device 601 contained in the data index.
  • the second storage address of the storage device 601 stores data verification information of the first storage address, where the data verification information is used to verify the data in the first storage address.
  • the apparatus 600 further includes:
  • a verification module configured to verify the read first data according to the first read request and the data verification information.
  • the apparatus 600 further includes a writing module and a recording module:
  • the receiving module 602 is further configured to receive a first write request, where the first write request is used to write the first data into the first object;
  • the writing module, configured to write the first data to the first storage address of the storage device according to the first write request;
  • the recording module is configured to record the first storage address in the data index.
  • Embodiments of the present application further provide a computer program product or a computer program, where the computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computing node reads the computer instructions from the computer-readable storage medium and executes them, so that the computing node performs the above data processing method.

Abstract

A data processing method and apparatus, a computing node, and a computer-readable storage medium, relating to the field of communication technologies. In the method, a computing node reads the data of an object from a storage device without accessing a file system, avoiding interaction with the file system during the read. Neither the read request nor the data read back needs to cross the kernel of the computing node, which saves the time of crossing the kernel, shortens the time needed to read data, and improves data-read efficiency.

Description

Data processing method and apparatus, computing node, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 202011148703.0, filed on October 23, 2020 and entitled "Data processing method and apparatus, computing node, and computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular, to a data processing method and apparatus, a computing node, and a computer-readable storage medium.
Background
In high-performance scenarios, a high performance computing (HPC) cluster cannot obtain adequate shared-file access performance through the standard network attached storage (NAS) protocol. To meet the needs of HPC and high-performance file-sharing scenarios, a client generally accesses shared files through a file system running in user mode.
Take a computing node that includes a client, a kernel, and a file system, where the client reads data in a shared file through the file system, as an example. The read proceeds as follows: the client and the file system both run in user mode, in the user space of the computing node. The client sends a read request to the virtual file system (VFS) in the kernel; because the file system is in user space, the VFS forwards the read request to the file system in userspace (FUSE) module in the kernel, and FUSE sends the read request to the file system in user space. After receiving the read request, the file system reads, according to the read request, the data of the shared file it manages from a data node, and returns the read data to the client along the same path the read request traveled.
In the above read process, the read request sent by the client to the file system must cross the kernel once, and the data returned by the file system to the client must cross the kernel again. Both crossings take time, so the total time for the client to read data is long and read efficiency is low.
Summary
Embodiments of this application provide a data processing method and apparatus, a computing node, and a computer-readable storage medium, which can improve data-read efficiency.
According to a first aspect, a data processing method is provided, applied to a computing node. The computing node includes a storage device, and the storage device holds first data of a first object; the computing node further includes a data index, and the data index contains a first storage address of the first data in the storage device. The method includes:
receiving a first read request; and reading the first data from the storage device based on the data index of the first data and the first read request, where the first read request is used to read the first data.
In this method, the computing node reads the data of the object from the storage device without accessing the file system, avoiding interaction with the file system during the read. Neither the read request nor the data read back needs to cross the kernel of the computing node, which saves the time of crossing the kernel, shortens the time needed to read data, and improves data-read efficiency.
In a possible implementation, the first read request includes an object identifier of the first object, a starting offset of the first data within the first object, and a data amount of the first data; and reading the first data from the storage device based on the data index of the first data and the first read request includes:
determining the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount; and reading the first data from the first storage address.
In a possible implementation, the first object is an index node inode in a file system, and determining the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount includes:
computing a hash value based on the object identifier of the first object, the starting offset, and the data amount; and determining, based on the hash value, the first storage address of the first data in the storage device contained in the data index.
In a possible implementation, a second storage address of the storage device stores data verification information of the first storage address, where the data verification information is used to verify the data in the first storage address.
In a possible implementation, after the first data is read from the storage device, the method further includes:
verifying the read first data according to the first read request and the data verification information.
In a possible implementation, before the first read request is received, the method further includes:
receiving a first write request; writing the first data to the first storage address of the storage device according to the first write request; and recording the first storage address in the data index, where the first write request is used to write the first data into the first object.
According to a second aspect, a data processing apparatus is provided, configured to perform the above data processing method. Specifically, the data processing apparatus includes functional modules configured to perform the data processing method provided in the first aspect or any optional manner of the first aspect.
According to a third aspect, a computing node is provided. The computing node includes a processor and a memory, the memory stores at least one piece of program code, and the program code is loaded and executed by the processor to implement the operations performed by the data processing method provided in the first aspect or any optional manner of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one piece of program code, and the program code is loaded and executed by a processor to implement the operations performed by the data processing method provided in the first aspect or any optional manner of the first aspect.
According to a fifth aspect, a computer program product or a computer program is provided. The computer program product or computer program includes program code stored in a computer-readable storage medium. A processor of a computing node reads the program code from the computer-readable storage medium and executes it, so that the computing node performs the data processing method provided in the first aspect or any optional manner of the first aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a computing node according to an embodiment of this application;
FIG. 3 is a flowchart of a data writing method according to an embodiment of this application;
FIG. 4 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 5 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are further described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of this application. Referring to FIG. 1, the system 100 includes a server 101 and at least one computing node 102.
The server 101 is configured to manage at least one data node, and the at least one data node provides data read/write services and data storage services for the computing nodes 102. The at least one data node stores a plurality of objects. An object is the basic unit of data storage; each object stores data, and the data in each object is written by the at least one computing node 102 or by computing nodes outside the system 100, and can be read by the at least one computing node 102; that is, each object is shared by the at least one computing node 102.
Each computing node 102 is configured to write data to the at least one data node, so that the at least one data node creates a new object, writes new data into an object it already stores, or deletes data from an object it already stores.
In a possible implementation, a computing node 102 includes a file system 1021, a storage device 1022, and a client 1023. The file system 1021 is configured to manage at least one object stored on the at least one data node, where the at least one object is stored on the at least one data node via the server 101. The file system 1021 is further configured to store replicas of the at least one object on the storage device 1022, to be managed by the storage device 1022, so that the client 1023 can read the data of the objects managed by the file system 1021 from the storage device 1022. The file system 1021 is further configured to provide data read/write services for the client 1023. Optionally, the file system 1021 is a client that manages the objects stored on the at least one data node, and the file system 1021 can be installed on every computing node 102.
The storage device 1022 includes a solid-state drive, a mechanical hard disk, or a storage device with another type of storage medium; the type of the storage device 1022 is not specifically limited in the embodiments of this application. The storage device 1022 is configured to store replicas of the at least one object managed by the file system 1021 and to provide data read/write services for the client 1023. The storage space provided by the storage device 1022 is identified by a plurality of storage addresses, and each storage address identifies part of that storage space. The storage spaces identified by the storage addresses may have equal or unequal sizes; for example, a 4 kilobyte (K) storage space is identified by storage address 0000, a 64 byte (B) storage space is identified by storage address 0001, and an 8 B storage space is identified by storage address 01000. Note that in the embodiments of this application, for a given storage address, the storage device writing data into the storage address means writing data into the storage space identified by that storage address; the storage device reading data from the storage address means reading from the storage space identified by that storage address; and the data in the storage address means the data stored in the storage space identified by that storage address.
In a possible implementation, the storage device 1022 divides the plurality of storage addresses into a plurality of data addresses, a plurality of verification addresses, a plurality of data index addresses, and a plurality of meta index addresses. The space size of a storage address is the size of the storage space it identifies, that is, the maximum amount of data the storage address can store. For ease of description, the space size of a data address is denoted the first space size, that of a verification address the second space size, that of a data index address the third space size, and that of a meta index address the fourth space size. These sizes can be set per implementation, for example a first space size of 4K, a second space size of 64B, a third space size of 8B, and a fourth space size of 32B; the embodiments of this application do not specifically limit the first, second, third, and fourth space sizes.
The storage space identified by a data address stores part or all of the data of an object; in other words, the data in one object can be stored in the storage spaces identified by at least one data address. The storage space identified by a verification address stores the data verification information of one data address, and the storage space identified by a verification address is the verification area of that data address. The data verification information is used to verify the data stored at that data address. The plurality of data addresses correspond one-to-one to the plurality of verification addresses. In a possible implementation, the storage spaces identified by the plurality of data addresses and the storage spaces identified by the plurality of verification addresses form a data sharing area, which stores the at least one object managed by the file system and the data verification information of the storage addresses occupied by the at least one object. Optionally, in the data sharing area, the storage space identified by a data address is adjacent to the storage space identified by its corresponding verification address, as in the data sharing area in FIG. 1. Optionally, in the data sharing area, the storage spaces identified by the plurality of data addresses are adjacent to one another, and the storage spaces identified by the plurality of verification addresses are adjacent to one another.
The storage space identified by a data index address stores one data address. In a possible implementation, the storage device 1022 divides the storage spaces identified by the plurality of data index addresses into two parts, a first storage space and a second storage space, each composed of storage spaces identified by at least one data index address. The first storage space stores data addresses into which data was written during a first time period, and the second storage space stores data addresses into which data was written before the first time period. The first time period is the period containing the current moment: it ends at the current moment and starts at a target moment before the current moment, the duration between the target moment and the current moment being a target duration, which is the length of the first time period. Data written during the first time period is newly written data; data written before the first time period is older, previously written data. Optionally, the storage spaces identified by the plurality of data index addresses form a data index area, which stores the occupied data addresses in the data sharing area, where an occupied data address is a storage address that holds data of an object. Note that the data index area amounts to a data index in which data addresses are stored as key-value pairs. Optionally, the data verification information stored at a verification address is the key, and the value corresponding to that key corresponds to the data index address that stores the data address corresponding to that verification address. For example, verification address 1 corresponds to data address 1; the data verification information of data address 1 stored at verification address 1 is the key, and the data index address storing data address 1 corresponds to the value of that key.
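The key-value data index described above can be sketched as a small in-memory model. This is an illustrative sketch, not the patent's implementation: the hash function choice (SHA-256 over "object id:offset") and the names `DataIndex`, `put`, and `lookup` are assumptions introduced here.

```python
import hashlib

def verify_key(object_id: str, start_offset: int) -> str:
    # Hash of the data verification information (object identifier + start
    # offset), used as the key of the data index described above.
    return hashlib.sha256(f"{object_id}:{start_offset}".encode()).hexdigest()

class DataIndex:
    """Minimal model of the data index area: key -> data index address (slot) -> data address."""

    def __init__(self):
        self._key_to_slot = {}  # hash value -> data index address
        self._slots = {}        # data index address -> data address
        self._next_slot = 0

    def put(self, object_id: str, start_offset: int, data_address: int) -> None:
        key = verify_key(object_id, start_offset)
        slot = self._next_slot  # an unoccupied data index address
        self._next_slot += 1
        self._slots[slot] = data_address
        self._key_to_slot[key] = slot

    def lookup(self, object_id: str, start_offset: int):
        # Returns the data address holding this piece of the object, or None.
        key = verify_key(object_id, start_offset)
        slot = self._key_to_slot.get(key)
        return None if slot is None else self._slots[slot]
```

A reader's lookup thus never touches the file system: it recomputes the key from the read request and follows the index to a data address.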
The storage space identified by a meta index address stores the metadata of one object and the metadata verification information of that metadata, where the metadata verification information is used to verify the metadata. Optionally, the storage space identified by the meta index address is divided into two adjacent parts: one part stores the metadata and the other stores the metadata verification information; the storage space holding the metadata verification information is the verification area. Optionally, the storage spaces identified by the plurality of meta index addresses form a metadata sharing area, which stores the metadata of the objects stored in the data sharing area.
In a possible implementation, the storage device 1022 is not located in the computing node 102 but in a target data node, where the target data node is one of the at least one data node managed by the server 101, or a data node outside the at least one data node.
The client 1023 is configured to provide users with data writing and data reading services. Optionally, the client 1023 writes user data into the storage device 1022; after the write completes, it sends the data written into the storage device 1022 to the file system 1021, and the file system 1021 writes that data into the data nodes via the server 101, so that other computing nodes can share the data written by the client 1023. Optionally, the client 1023 sends a read request to the storage device 1022 to read the data of an object managed by the file system 1021 from the storage device 1022, without reading the object's data through the file system 1021, thereby avoiding interaction between the client 1023 and the file system 1021 through the kernel, shortening data-read time and improving data-read efficiency. When the client 1023 fails to read data from the storage device 1022, the client 1023 can also send the read request to the file system 1021 over a message channel, to read the data through the file system; for example, the message channel is inter-process communication (IPC).
An embodiment of this application further provides a schematic structural diagram of a computing node. Referring to FIG. 2, the computing node 200 shown in FIG. 2 may vary considerably in configuration or performance, and includes one or more processors 201 and one or more memories 202, where the processor includes central processing units (CPU) or another type of processor, the memory 202 stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor 201 to implement any method provided by the method embodiments below. Optionally, the computing node 200 also has a plurality of clients and a file system installed; the memory 202 can also perform the functions of the storage device in the method embodiments below. The computing node 200 further has components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and includes other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is further provided, for example a memory including program code, where the program code can be executed by a processor in a terminal to complete any method in the embodiments below. For example, the computer-readable storage medium is a non-transitory computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.
During initialization of the computing node, the file system writes replicas of the at least one object it manages into the storage device, so that clients in the computing node can read the data of the at least one object from the storage device. To further explain this process, see the flowchart of a data writing method according to an embodiment of this application shown in FIG. 3; the method is applied to a computing node that includes a client and a file system.
301. The file system sends a resource allocation request to the storage device, where the resource allocation request instructs the storage device to allocate a data sharing area, a data index area, and a metadata sharing area.
The storage device is any storage device in the computing node, or any storage device in the target data node. In the embodiments of this application, the storage device is described as located in the computing node. The resource allocation request includes an allocation identifier that instructs the storage device to allocate the data sharing area, the data index area, and the metadata sharing area.
After the computing node is powered on, if the storage device has not yet carved out the data sharing area, the data index area, and the metadata sharing area, the file system performs this step 301.
302. Based on the resource allocation request, the storage device allocates a metadata index area, a data index area, and a data sharing area for the file system.
After receiving the resource allocation request, the storage device obtains the allocation identifier from the resource allocation request and, based on the allocation identifier, divides all or part of its storage space into three parts: a data sharing area, a data index area, and a metadata sharing area. The storage device divides the data sharing area into storage spaces corresponding to a plurality of data addresses and storage spaces corresponding to a plurality of verification addresses, divides the data index area into storage spaces corresponding to a plurality of data index addresses, and divides the metadata sharing area into storage spaces corresponding to a plurality of meta index addresses.
The data sharing area, the data index area, and the metadata sharing area are described above and are not repeated here.
303. The storage device sends an allocation completion response to the file system, where the allocation completion response indicates that the metadata index area, the data index area, and the data sharing area have been allocated.
The allocation completion response includes an allocation completion identifier indicating that the metadata index area, the data index area, and the data sharing area have been allocated. After the storage device finishes allocating the metadata index area, the data index area, and the data sharing area, the storage device performs this step 303.
Note that the storage device only needs to allocate the metadata index area, the data index area, and the data sharing area once; that is, the process shown in steps 301-303 need only be executed once.
304. After receiving the allocation completion response, the file system sends a target write request to the storage device, where the target write request instructs the storage device to write a plurality of objects.
The target write request includes the plurality of objects, and the object identifier and object offset of each object. Taking a file system as an example, an object is an index node (inode) in the file system, for example a file or a data block.
The plurality of objects are objects managed by the file system and can be shared by at least one computing node, the computing node being one of the at least one computing node. An object stores data; the data stored in an object is service data uploaded by a user through a client, and the service data includes any type of data such as video data, audio data, or text data, which is not specifically limited in the embodiments of this application. The object offset of an object is the total amount of data in the object at the current moment.
Receipt of the allocation completion response tells the file system that the storage device has allocated the metadata index area, the data index area, and the data sharing area but does not yet store any objects; the file system therefore writes all the objects it manages into the storage device through this step 304.
The file system sends an object acquisition request to the server, where the object acquisition request is used to acquire the objects managed by the file system. After receiving the object acquisition request, the server obtains the objects managed by the file system from the at least one data node and sends each obtained object, the object identifier of each object, and the object offset of each object to the file system. The file system receives each object, each object's object identifier, and each object's object offset sent by the server, generates the target write request based on the received plurality of objects, object identifiers, and object offsets, and sends the target write request to the storage device.
In a possible implementation, the target write request further includes the metadata of each object. The metadata of an object describes attributes of the object, and the attributes include at least one of the object's object identifier, object offset, user identifier, read/write permission identifier, and time identifier. The user identifier indicates the user who uploaded the object; the read/write permission identifier indicates whether reading the data in the object is allowed and whether continuing to write data into the object is allowed. The time identifier indicates the time at which the data in the object was updated, where an update means that data was written into the object or that data in the object was deleted.
305. The storage device writes the plurality of objects based on the target write request.
After receiving the target write request, the storage device obtains the plurality of objects from the target write request and writes the plurality of objects into the data sharing area. In a possible implementation, this step 305 includes the following steps 3051-3053.
Step 3051. For any object among the plurality of objects, the storage device writes the object into at least one data address.
Because the first space size of a single data address is limited, the storage device may need one or more data addresses when writing the object. In a possible implementation, if the data amount of the object is less than or equal to the first space size, one data address is needed to write the object; otherwise, a plurality of data addresses are needed. The storage spaces identified by the at least one data address may or may not be adjacent.
In a possible implementation, the storage device divides the data in the object into at least one data block and writes the at least one data block into the storage spaces identified by the at least one data address, respectively, where the data amount of each of the at least one data block is less than or equal to the first space size.
Optionally, if the data amount of the object is an integer multiple of the first space size, the data amount of each of the at least one data block equals the first space size; otherwise, one of the data blocks is smaller than the first space size and the other data blocks all equal the first space size.
For example, if the data amount of the object is 12K and the first space size is 4K, the storage device writes the data of the object into data addresses 0-2; specifically, the storage device writes the first 4K, the middle 4K, and the last 4K of the object's data into data addresses 0, 1, and 2, respectively.
As another example, if the data amount of the object is 7K and the first space size is 4K, the storage device writes the data of the object into data addresses 4-5; specifically, the storage device writes the first 4K and the remaining 3K of the object's data into data addresses 4 and 5, respectively.
Step 3052. For any data address among the at least one data address, the storage device generates the data verification information of the data address based on the object identifier of the object and the data in the data address.
The data verification information of a data address is used to verify the data in that data address. The data verification information includes the object identifier of the object to which the data in the data address belongs and the starting offset, within that object, of the data in the data address.
The storage device determines the data that belongs to the object and precedes the data in the data address as target data, determines the data amount of the target data as the starting offset of the data in the data address within the object, and determines the object identifier of the object together with that starting offset as the data verification information of the data address.
For example, if the object identifier of the object is A, the data amount of the object is 12K, and the first space size is 4K, the storage device writes the data of the object into data addresses 0-2, and the data verification information of data addresses 0-2 is (A, 0K), (A, 4K), and (A, 8K), respectively.
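Steps 3051-3052 can be sketched as two small helpers, assuming the 4K first space size used in the examples above; the function names `split_object` and `verification_info` are illustrative, not from the patent.

```python
BLOCK = 4 * 1024  # first space size (4K in the examples above)

def split_object(data: bytes):
    # Step 3051: split an object's data into blocks of at most BLOCK bytes,
    # one block per data address.
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def verification_info(object_id: str, data: bytes):
    # Step 3052: one (object identifier, starting offset) pair per data
    # address, matching the (A, 0K), (A, 4K), (A, 8K) example.
    return [(object_id, i) for i in range(0, len(data), BLOCK)]
```

For a 12K object "A" this yields three blocks and the verification tuples (A, 0), (A, 4096), (A, 8192); for a 7K object, a 4K block and a 3K block.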
Step 3053. The storage device stores the data address in association with the data verification information of the data address.
The storage device determines the verification address corresponding to the data address and writes the data verification information into the determined verification address, achieving associated storage. For example, if the data address is data address 0 and data address 0 corresponds to verification address 1, the storage device writes the data verification information into the storage space identified by verification address 1.
306. For any object among the plurality of objects, the storage device stores the data addresses corresponding to the object.
The data addresses corresponding to the object are the at least one data address storing the data of the object. For any data address among the at least one data address, the storage device stores the data address in the data index area based on the data verification information of the data address.
In a possible implementation, storing the data address in the data index area based on the data verification information of the data address includes: the storage device determines a first data index address in the data index area and writes the data address into the first data index address; the storage device performs a hash computation on the data verification information of the data address to obtain a first hash value, and stores the first hash value in association with the first data index address so that the first hash value corresponds to the first data index address, for later lookup. The first data index address is an unoccupied data index address, where an unoccupied data index address is a data index address that stores no data address.
In a possible implementation, determining a first data index address in the data index area includes: if the data index area has not been divided into the first storage space and the second storage space, the storage device determines any unoccupied data index address in the data index area as the first data index address; if the data index area has been divided into the first storage space and the second storage space, the storage device determines any unoccupied data index address in the first storage space as the first data index address; if the data index area has been divided into the first storage space and the second storage space and no unoccupied data index address exists in the first storage space, the storage device moves the earliest-stored data address in the first storage space to the second storage space and determines the data index address that the earliest-stored data address occupied in the first storage space as the first data index address.
Note that if the storage device moves the earliest-stored data address to a second data index address in the second storage space, the storage device changes the hash value originally associated with the first data index address to the first hash value, and stores the hash value originally associated with the first data index address in association with the second data index address.
307. The storage device stores the metadata of the object based on the target write request.
The storage device obtains the metadata of the object from the target write request and writes the obtained metadata of the object into the metadata sharing area.
In a possible implementation, the storage device obtains any unoccupied meta index address from the metadata sharing area and writes the metadata of the object into that meta index address, where an unoccupied meta index address is a meta index address in the metadata sharing area that stores no metadata.
In a possible implementation, after writing the metadata into the metadata index area, the storage device performs a hash computation on the metadata to obtain a second hash value and writes the second hash value into the metadata index area as the metadata verification information of the metadata, where the metadata verification information is used to verify the metadata, to ensure consistency between the object stored in the storage device and the object managed by the file system.
In a possible implementation, after writing the metadata into the metadata index area, the storage device performs a hash computation on the object identifier of the object to obtain a third hash value and stores the third hash value in association with the meta index address, for later lookup.
In the method provided by this embodiment, the plurality of objects managed by the file system are stored in the storage device, so that the client in the computing node can later read an object's data directly from the storage device instead of through the file system, avoiding interaction between the client and the file system through the kernel, shortening data-read time, and improving data-read efficiency.
The client in the computing node can read the data of an object managed by the file system from the storage device. To further explain this process, see the flowchart of a data processing method according to an embodiment of this application shown in FIG. 4. The method is applied to a computing node that includes a storage device holding first data of a first object; the computing node further includes a data index, and the data index contains the first storage address of the first data in the storage device.
401. The client sends a first read request to the storage device, where the first read request instructs reading of the first data.
The client is any client in the computing node, or any client in a node other than the computing node. In the embodiments of this application, the client is described as located in the computing node.
The first object is any object managed by the file system. The first read request includes the object identifier of the first object, the starting offset of the first data within the first object, and the data amount of the first data. Optionally, the first read request further includes a read identifier, where the read identifier indicates a data read. The first data is the data to be read, and the starting offset is the data amount of the data that precedes the first data within the first object.
402. The storage device receives the first read request.
After receiving the first read request, the storage device first determines, according to the first read request, the data addresses storing the first data, and then reads the first data from the data addresses storing the first data. For details, see steps 403-404 below.
403. The storage device determines the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount.
The first storage address is a data address storing the first data. Because the data stored in one data address is limited, the first data may be stored in at least one data address; that is, there may be at least one first storage address storing the first data. In a possible implementation, this step 403 includes the following steps A-B.
Step A. The storage device computes hash values based on the object identifier of the first object, the starting offset, and the data amount.
Because the data index address storing a data address corresponds to the hash value of that data address's data verification information, the storage device computes, from the object identifier, the starting offset, and the data amount, at least one piece of data verification information of the at least one first storage address, and performs a hash computation on each piece of data verification information, where one first storage address corresponds to one piece of data verification information.
In a possible implementation, obtaining the at least one piece of data verification information of the at least one first storage address from the object identifier, the starting offset, and the data amount includes: if the starting offset is not an integer multiple of the first space size, the storage device selects a first starting offset from the first interval [0, starting offset], where the first starting offset is the largest integer multiple of the first space size within the first interval; the storage device selects at least one integer multiple of the first space size from the second interval [first starting offset, (starting offset + data amount)] and determines the at least one integer multiple as at least one second starting offset; and for the first starting offset and any of the at least one second starting offset, the storage device determines the object identifier together with that starting offset as the data verification information of one first storage address.
In a possible implementation, obtaining the at least one piece of data verification information of the at least one first storage address from the object identifier, the starting offset, and the data amount includes: if the starting offset is an integer multiple of the first space size, the storage device selects at least one integer multiple of the first space size from the third interval [starting offset, (starting offset + data amount)] and determines the at least one integer multiple as at least one third starting offset; and for any of the at least one third starting offset, the storage device determines the object identifier together with that third starting offset as the data verification information of one first storage address.
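The two cases above (starting offset aligned or not aligned to the first space size) both amount to enumerating every multiple of the block size from the block containing the start of the read up to its end. A sketch under the stated 4K assumption, with illustrative function names:

```python
BLOCK = 4 * 1024  # first space size

def covering_offsets(start: int, amount: int):
    # Starting offsets (multiples of BLOCK) of every data address whose block
    # overlaps the byte range [start, start + amount). The first entry is the
    # "first starting offset" of the text when start is unaligned.
    first = (start // BLOCK) * BLOCK  # largest multiple of BLOCK <= start
    end = start + amount
    return list(range(first, end, BLOCK))

def verification_infos(object_id: str, start: int, amount: int):
    # One piece of data verification information per first storage address;
    # hashing each piece gives the keys to query the data index.
    return [(object_id, off) for off in covering_offsets(start, amount)]
```

For example, a read of 2K at offset 5K touches only the block starting at 4K, while a read of 9K at offset 2K touches the blocks starting at 0, 4K, and 8K, matching the examples in step 404 below.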
In a possible implementation, for any piece of the at least one piece of data verification information, the storage device performs a hash computation on that piece of data verification information to obtain a fourth hash value.
Step B. The storage device determines, based on the hash values, the first storage address of the first data in the storage device contained in the data index.
The hash values are the at least one fourth hash value obtained by the storage device performing hash computations on the at least one piece of data verification information.
For each of the at least one fourth hash value, the storage device queries, from the correspondence between data index addresses and hash values recorded in the data index, the data index address corresponding to the fourth hash value, determines the found data index address as a third storage address, and reads the data address stored at that third storage address as one first storage address, where a third storage address is a data index address in the data index that stores one first storage address.
In a possible implementation, for each of the at least one fourth hash value, if the storage device has divided the data index area into a first storage space and a second storage space, the storage device first queries the correspondence between data index addresses and hash values in the first storage space for the data index address corresponding to the fourth hash value. If the data index address corresponding to the fourth hash value can be found in the first storage space, the first data is data written during the first time period, that is, newly written data, and the storage device determines the found data index address as a third storage address. If the data index address corresponding to the fourth hash value cannot be found in the first storage space, the storage device then queries the correspondence between data index addresses and hash values in the second storage space; if the data index address corresponding to the fourth hash value can be found in the second storage space, the first data is data written before the first time period, and the storage device determines the found data index address as a third storage address.
404. The storage device reads the first data from the first storage address.
For the determined at least one first storage address storing the first data, for ease of description, the first storage address containing the starting data of the first data is called the head storage address, the first storage address containing the ending data of the first data is called the tail storage address, and the first storage addresses other than the head and tail storage addresses are called middle storage addresses. The head storage address is stored at the third storage address corresponding to the fourth hash value computed from the first starting offset, and the tail storage address is stored at the third storage address corresponding to the fourth hash value computed from a fourth starting offset, where the fourth starting offset is the maximum of the at least one second starting offset or the maximum of the at least one third starting offset.
If the starting offset is not an integer multiple of the first space size, the starting data of the first data is at a non-starting position within the head storage address; that is, part of the data in the head storage address belongs to the first data and part does not.
If the sum of the starting offset and the data amount is not an integer multiple of the first space size, the ending data of the first data is not at the end of the tail storage address; that is, part of the data in the tail storage address belongs to the first data and part does not.
In a possible implementation, if the starting offset is not an integer multiple of the first space size and the number of first storage addresses is 1, that is, there is only one first storage address, the first storage address is both the head storage address and the tail storage address. The storage device reads, within the storage space identified by the first storage address, data of the given data amount starting from the starting offset, and obtains the read data as the first data, where the data amount of the data located before the starting offset within the first storage address is the difference between the starting offset and the first starting offset.
For example, the data amount of the first object is 12K, the first space size is 4K, the starting offset of the first data within the first object is 5K, and the data amount of the first data is 2K; the first object is stored at data addresses 0-2 (that is, the at least one first storage address). Data address 0 stores the [0, 4K] data of the first object, data address 1 stores the (4K, 8K] data of the first object, and data address 2 stores the (8K, 12K] data of the first object. Because the starting offset of the first data is 5K, the starting data of the first data lies in data address 1, so data address 1 is the head storage address; and because the data amount of the first data is 2K, the offset of the ending data of the first data within the first object is 5+2=7K, so the ending data of the first data also lies in data address 1, and data address 1 is also the tail storage address. The storage device reads the data from 5K to 7K in data address 1 as the first data.
In a possible implementation, if the number of first storage addresses is greater than or equal to 2, the at least one first storage address includes at least a head storage address and a tail storage address. If the starting offset is not an integer multiple of the first space size, the storage device reads, within the storage space identified by the head storage address, starting from the starting offset, the data located after the starting offset as the starting part of the first data; if the starting offset is an integer multiple of the first space size, the storage device reads all the data in the storage space identified by the head storage address as the starting part of the first data. If the sum of the starting offset and the data amount (called the target sum) is not an integer multiple of the first space size, the storage device reads, starting from the beginning of the storage space identified by the tail storage address, data whose amount is a target difference, and obtains the read data as the ending part of the first data, where the target difference is the difference between the target sum and the fourth starting offset; if the sum of the starting offset and the data amount is an integer multiple of the first space size, the storage device reads all the data in the storage space identified by the tail storage address as the ending part of the first data. If the at least one first storage address further includes at least one middle storage address other than the head and tail storage addresses, the storage device reads all the data in the storage spaces identified by the at least one middle storage address as the middle part of the first data. The storage device determines the read starting part, middle part, and ending part as the first data.
For example, the data amount of the first object is 12K, the first space size is 4K, the starting offset of the first data within the first object is 2K, and the data amount of the first data is 9K; the first object is stored at data addresses 0-2, where data address 0 stores the [0, 4K] data of the first object, data address 1 stores the (4K, 8K] data, and data address 2 stores the (8K, 12K] data. Because the starting offset of the first data is 2K, the starting data of the first data lies in data address 0, so data address 0 is the head storage address, and the storage device reads all the data after 2K in data address 0 as the starting part of the first data. Because the data amount of the first data is 9K, the offset of the ending data of the first data within the first object is 2+9=11K, so the ending data lies in data address 2, data address 2 is the tail storage address, and the storage device reads the first 3K of data address 2 as the ending part of the first data. Data address 1 is then a middle storage address, and the storage device reads all the data in the storage space identified by data address 1 as the middle part of the first data. The storage device thus reads the data from 2K to 11K of the first object, spanning data addresses 0-2, as the first data.
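The head/middle/tail assembly in the examples above can be sketched as a single trimming loop. This is an illustrative model, assuming blocks are fetched by their starting offset within the object; the dictionary `blocks` stands in for reading the data addresses found through the data index.

```python
BLOCK = 4 * 1024  # first space size

def read_range(blocks: dict, start: int, amount: int) -> bytes:
    # blocks maps a block's starting offset within the object to that block's
    # bytes (the contents of one data address).
    end = start + amount
    first = (start // BLOCK) * BLOCK
    out = bytearray()
    for off in range(first, end, BLOCK):
        chunk = blocks[off]
        lo = max(start - off, 0)    # trims the head storage address
        hi = min(end - off, BLOCK)  # trims the tail storage address
        out += chunk[lo:hi]         # middle storage addresses are copied whole
    return bytes(out)
```

Reading 2K at offset 5K trims a single block on both sides; reading 9K at offset 2K keeps the tail of block 0, all of block 1, and the first 3K of block 2, as in the two examples.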
Note that the process shown in steps 403-404 is the process in which the computing node reads the first data from the storage device based on the data index of the first data and the first read request.
405. The storage device verifies the first data according to the first read request and the data verification information of the first storage address stored at the second storage address, where the data verification information of a first storage address is used to verify the data in that first storage address.
A second storage address is the verification address corresponding to a first storage address. The storage device reads the data verification information of the at least one first storage address from the at least one second storage address corresponding to the at least one first storage address, and verifies the read data verification information of the at least one first storage address according to the first read request. If all the read data verification information of the at least one first storage address passes verification, the storage device determines that the first data passes verification; otherwise, the storage device determines that the first data fails verification.
In a possible implementation, for any first storage address among the at least one first storage address, the storage device reads a piece of data verification information from the second storage address corresponding to that first storage address. If the read data verification information matches the data verification information of that first storage address computed by the storage device from the first read request, the storage device determines that the read data verification information passes verification; otherwise, the storage device determines that the read data verification information fails verification.
Note that this step 405 is optional. In some embodiments, the storage device performs this step 405 and, after the first data passes verification, performs step 406 below. In other embodiments, the storage device does not perform this step 405 and performs step 406 directly after step 404.
406. The storage device sends the first data to the client.
The process shown in steps 401-406 is described for the case where the storage device can read the first data. In a possible implementation, if the storage device fails to read the first data, or the read first data fails verification, the storage device sends a read failure response to the client to indicate that the data read failed. For example, if the storage device cannot find the at least one third storage address in the data index area, the first data is not stored in the storage device, and the storage device cannot read the first data.
If the client receives the read failure response, the client can still read the first data through the file system, for example by interacting with the file system through the kernel as in the read process described in the Background; alternatively, the client can send the first read request to the file system over IPC, and the file system reads the first data of the first object from the data node via the server and returns the read first data to the client over IPC.
In the method provided by this embodiment, the computing node reads the data of the object from the storage device without accessing the file system, avoiding interaction with the file system during the read; neither the read request nor the data read back needs to cross the kernel of the computing node, which saves the time of crossing the kernel, shortens the time needed to read data, and improves data-read efficiency.
Besides reading the data of objects managed by the file system from the storage device, the client can also write data into the storage device; for example, before the client reads the first data from the storage device, the client writes the first data into the storage device. To further explain the process of the client writing data to the storage device, see the flowchart of a data processing method according to an embodiment of this application shown in FIG. 5. The method is applied to a computing node, and the computing node includes a client and a file system.
501. The client sends a first write request to the storage device, where the first write request instructs writing the first data into the first object.
The storage device is any storage device in the computing node, or any storage device in the target data node; in the embodiments of this application, the storage device is described as located in the computing node. The first write request includes the object identifier of the first object, the object offset of the first object, and the first data, where the object offset of the first object is the total amount of data in the first object at the current moment.
502. The storage device receives the first write request.
503. The storage device writes the first data to the first storage address of the storage device according to the first write request.
The first storage address is a data address in the data sharing area; because one data address stores a limited amount of data, there may be at least one first storage address so that the first data can be written completely into the storage device.
In a possible implementation, the storage device divides the first data into at least one data block, selects one first storage address for each data block from the unoccupied data addresses in the data sharing area, and writes each data block into its selected first storage address, where the data amount of each of the at least one data block is less than or equal to the first space size. Optionally, if the data amount of the first data is an integer multiple of the first space size, the data amount of each of the at least one data block equals the first space size; otherwise, one of the data blocks is smaller than the first space size and the other data blocks all equal the first space size.
In a possible implementation, the storage device determines, based on the object offset of the first object, whether the first object is already stored in the storage device. If the first object is already stored in the storage device, the storage device determines, based on the object offset, whether free space currently exists in the data address used to store the first object. If free space exists, the storage device writes into the free space starting from the beginning of the first data; if the free space is filled and first data remains, the storage device divides the remaining first data into at least one data block, selects one first storage address for each data block, and writes each data block into its selected first storage address. If the first data is written entirely into the free space, the data address corresponding to the free space is a first storage address.
Optionally, if the object offset is an integer multiple of the first space size, the storage device determines that no free space currently exists in the data address used to store the first object; otherwise, free space exists. Optionally, if free space exists in the storage device, the storage device performs a hash computation on the object identifier of the first object and the largest integer multiple of the first space size that is less than the object offset, obtaining a fifth hash value; the storage device reads a target data address from the data index address corresponding to the fifth hash value, and the free space is part of the storage space identified by the target data address.
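The free-space check in this step reduces to simple modular arithmetic; a small sketch under the stated 4K assumption, with illustrative helper names:

```python
BLOCK = 4 * 1024  # first space size

def tail_free_space(object_offset: int) -> int:
    # If the object offset (the object's current total data amount) is a
    # multiple of BLOCK, the last data address is full and there is no free
    # space; otherwise the free space is what remains of the last block.
    used_in_last = object_offset % BLOCK
    return 0 if used_in_last == 0 else BLOCK - used_in_last

def tail_block_offset(object_offset: int) -> int:
    # Largest integer multiple of BLOCK below the object offset; hashed
    # together with the object identifier to find the target data address.
    return ((object_offset - 1) // BLOCK) * BLOCK
```

For instance, an object holding 5K of data has 3K free in the block that starts at offset 4K, so an appended write fills that 3K first before new data addresses are selected.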
In a possible implementation, if the first object is not stored in the storage device, the storage device divides the first data into at least one data block, selects one first storage address for each data block from the unoccupied data addresses in the data sharing area, and writes each data block into its selected first storage address.
Optionally, the storage device obtains the object data amount of the first object from the first write request. If the object data amount is 0, the first object is a new object to be written and the storage device does not yet store the first object; if the object data amount is not 0, the storage device already stores the first object.
504. The storage device writes, according to the object identifier and the object offset, the data verification information of the first storage address into the second storage address, where the data verification information of a first storage address includes the object identifier and the starting offset, within the first object, of the data in that first storage address, and the second storage addresses correspond one-to-one to the first storage addresses.
A second storage address is the verification address corresponding to a first storage address in the data sharing area.
If the at least one first storage address is not a data address previously used by the storage device to store the first object, then for any first storage address among the at least one first storage address, the storage device determines a target data amount based on the data in that first storage address, where the target data amount is the amount of data that lies within the first data and precedes the data in that first storage address. The storage device determines the sum of the object data amount and the target data amount as the starting offset, within the first object, of the data in that first storage address; determines the object identifier of the first object together with that starting offset as the data verification information of that first storage address; and writes the data verification information of that first storage address into the verification address corresponding to that first storage address.
Because the verification addresses corresponding to the at least one first storage address store the data verification information of the at least one first storage address, and each piece of that data verification information includes the object identifier of the first object, the first data stored at the at least one first storage address corresponds to the first object, achieving associated storage. The process shown in steps 503-504 is therefore the process in which the storage device stores the first data in association with the first object according to the first write request.
505. The storage device records the first storage address in the data index.
The storage device writes the at least one first storage address into at least one third storage address, where one first storage address corresponds to one third storage address, and a third storage address is a data index address in the data index area.
For any first storage address among the at least one first storage address, the storage device stores that first storage address in the data index area based on its data verification information. This process is the same as the process in step 306 in which the storage device stores a data address in the data index area based on the data verification information of that data address, and is not repeated here.
506. The storage device writes, according to the first write request, the metadata of the first object into a fourth storage address.
The fourth storage address is a meta index address in the metadata sharing area. If, before the first data is written, the first object is already stored in the storage device and the metadata of the first object is already stored at the fourth storage address, the storage device determines the fourth storage address based on the object identifier of the first object and updates the metadata of the first object at the fourth storage address according to the written first data.
Optionally, determining the fourth storage address based on the object identifier of the first object includes: the storage device performs a hash computation on the object identifier of the first object to obtain a sixth hash value, and queries the correspondence between meta index addresses and hash values in the metadata sharing area for the meta index address corresponding to the sixth hash value; the meta index address corresponding to the sixth hash value is the fourth storage address.
If the storage device did not store the first object before the first data is written, the storage device determines any of a plurality of unoccupied storage addresses as the fourth storage address, generates the metadata of the first object according to the first write request, and writes the metadata of the first object into the fourth storage address. After the metadata of the first object is written into the fourth storage address, the storage device performs a hash computation on the object identifier of the first object to obtain a sixth hash value, and stores the sixth hash value in association with the fourth storage address, for later lookup. The plurality of unoccupied storage addresses are unoccupied meta index addresses in the metadata sharing area.
After this step 506 is performed, the storage device writes, according to the metadata at the fourth storage address, the metadata verification information of the metadata into the fourth storage address, where the metadata verification information is used to verify the metadata. Optionally, the storage device performs a hash computation on the metadata at the fourth storage address to obtain a seventh hash value and determines the seventh hash value as the metadata verification information of the metadata. If a piece of metadata verification information is already stored at the fourth storage address, the storage device replaces the stored metadata verification information with the seventh hash value; if no metadata verification information is stored at the fourth storage address, the storage device writes the seventh hash value directly into the fourth storage address.
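The metadata verification written in this step can be sketched as hashing a serialized view of the metadata and comparing on re-read. The hash choice (SHA-256) and serialization are assumptions of this sketch, not specified by the patent.

```python
import hashlib

def metadata_verification(metadata: dict) -> str:
    # "Seventh hash value": a hash over the serialized metadata, stored next
    # to the metadata in the fourth storage address.
    serialized = repr(sorted(metadata.items())).encode()
    return hashlib.sha256(serialized).hexdigest()

def check_metadata(metadata: dict, stored_hash: str) -> bool:
    # Re-verify metadata read back from the fourth storage address.
    return metadata_verification(metadata) == stored_hash
```

Any change to the stored metadata (for example the object offset after an append) therefore requires the stored hash to be replaced, as the step above describes.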
507. The storage device sends a write success response to the client, where the write success response indicates that the first data has been written into the first object.
After the client receives the write success response, the storage device has written the first data into the stored first object. Because the first object stored by the storage device is only a replica of the first object managed by the file system, the first data has not yet been written into the first object managed by the file system. Therefore, to allow the first data to be shared by other computing nodes, the client sends the first write request to the file system, and the file system writes the first data into the first object stored on the data node according to the first write request.
For any computing node other than this computing node, that computing node may be unable to access the storage device, and the first data is not stored in the target storage device that it can access. When the file system in that computing node detects that the first data has been written into the first object stored on the data node, the file system sends the first write request to the target storage device, so that the target storage device writes the first data into the target storage device by performing the process shown in steps 502-506 above. That computing node can then read the first data from the target storage device, avoiding interaction with the file system.
In the method provided by this embodiment, the storage device stores data in association with the objects managed by the file system, so data can be written into the copies of those objects stored on the storage device. When a client later needs to read the data of an object managed by the file system, the client can read directly from the storage device without interacting with the file system, which saves the time of crossing the kernel, shortens the time the client takes to read data, and improves data-read efficiency.
FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of this application. The apparatus 600 includes a storage device 601, and the storage device 601 holds first data of a first object; the apparatus 600 further includes a data index, and the data index contains the first storage address of the first data in the storage device 601. The apparatus 600 includes:
a receiving module 602, configured to receive a first read request, where the first read request is used to read the first data; and
a reading module 603, configured to read the first data from the storage device 601 based on the data index of the first data and the first read request.
Optionally, the first read request includes the object identifier of the first object, the starting offset of the first data within the first object, and the data amount of the first data; and
the reading module 603 includes:
a determining unit, configured to determine the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount; and
a reading unit, configured to read the first data from the first storage address.
Optionally, the first object is an index node inode in a file system; the determining unit is configured to:
compute a hash value based on the object identifier of the first object, the starting offset, and the data amount; and
determine, based on the hash value, the first storage address of the first data in the storage device 601 contained in the data index.
Optionally, a second storage address of the storage device 601 stores data verification information of the first storage address, where the data verification information is used to verify the data in the first storage address.
Optionally, the apparatus 600 further includes:
a verification module, configured to verify the read first data according to the first read request and the data verification information.
Optionally, the apparatus 600 further includes a writing module and a recording module:
the receiving module 602 is further configured to receive a first write request, where the first write request is used to write the first data into the first object;
the writing module is configured to write the first data to the first storage address of the storage device according to the first write request; and
the recording module is configured to record the first storage address in the data index.
All the optional technical solutions above can be combined in any manner to form optional embodiments of this disclosure, and are not described one by one here.
Note that when the data processing apparatus provided in the above embodiments performs data processing, the division into the above functional modules is merely an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. In addition, the data processing apparatus provided in the above embodiments and the data processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
An embodiment of this application further provides a computer program product or a computer program. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computing node reads the computer instructions from the computer-readable storage medium and executes them, so that the computing node performs the above data processing method.
A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments can be implemented by hardware, or by a program instructing related hardware, where the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (14)

  1. A data processing method, applied to a computing node, wherein the computing node comprises a storage device, the storage device holds first data of a first object, the computing node further comprises a data index, and the data index contains a first storage address of the first data in the storage device; the method comprises:
    receiving a first read request, wherein the first read request is used to read the first data; and
    reading the first data from the storage device based on the data index of the first data and the first read request.
  2. The method according to claim 1, wherein the first read request comprises an object identifier of the first object, a starting offset of the first data within the first object, and a data amount of the first data; and
    the reading the first data from the storage device based on the data index of the first data and the first read request comprises:
    determining the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount; and
    reading the first data from the first storage address.
  3. The method according to claim 1, wherein the first object is an index node inode in a file system; and
    the determining the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount comprises:
    computing a hash value based on the object identifier of the first object, the starting offset, and the data amount; and
    determining, based on the hash value, the first storage address of the first data in the storage device contained in the data index.
  4. The method according to any one of claims 1 to 3, wherein a second storage address of the storage device stores data verification information of the first storage address, and the data verification information is used to verify the data in the first storage address.
  5. The method according to claim 4, wherein after the reading the first data from the storage device, the method further comprises:
    verifying the read first data according to the first read request and the data verification information.
  6. The method according to any one of claims 1 to 5, wherein before the receiving a first read request, the method further comprises:
    receiving a first write request, wherein the first write request is used to write the first data into the first object;
    writing the first data to the first storage address of the storage device according to the first write request; and
    recording the first storage address in the data index.
  7. A data processing apparatus, wherein the apparatus comprises a storage device, the storage device holds first data of a first object, the apparatus further comprises a data index, and the data index contains a first storage address of the first data in the storage device; the apparatus comprises:
    a receiving module, configured to receive a first read request, wherein the first read request is used to read the first data; and
    a reading module, configured to read the first data from the storage device based on the data index of the first data and the first read request.
  8. The apparatus according to claim 7, wherein the first read request comprises an object identifier of the first object, a starting offset of the first data within the first object, and a data amount of the first data; and
    the reading module comprises:
    a determining unit, configured to determine the first storage address from the data index based on the object identifier of the first object, the starting offset, and the data amount; and
    a reading unit, configured to read the first data from the first storage address.
  9. The apparatus according to claim 7, wherein the first object is an index node inode in a file system, and the determining unit is configured to:
    compute a hash value based on the object identifier of the first object, the starting offset, and the data amount; and
    determine, based on the hash value, the first storage address of the first data in the storage device contained in the data index.
  10. The apparatus according to any one of claims 7 to 9, wherein a second storage address of the storage device stores data verification information of the first storage address, and the data verification information is used to verify the data in the first storage address.
  11. The apparatus according to claim 10, wherein the apparatus further comprises:
    a verification module, configured to verify the read first data according to the first read request and the data verification information.
  12. The apparatus according to any one of claims 7 to 11, wherein the apparatus further comprises a writing module and a recording module:
    the receiving module is further configured to receive a first write request, wherein the first write request is used to write the first data into the first object;
    the writing module is configured to write the first data to the first storage address of the storage device according to the first write request; and
    the recording module is configured to record the first storage address in the data index.
  13. A computing node, wherein the computing node comprises a processor and a memory, the memory stores at least one piece of program code, and the program code is loaded and executed by the processor to implement the operations performed by the data processing method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, wherein the storage medium stores at least one piece of program code, and the program code is loaded and executed by a processor to implement the operations performed by the data processing method according to any one of claims 1 to 6.
PCT/CN2021/114136 2020-10-23 2021-08-23 Data processing method and apparatus, computing node, and computer-readable storage medium WO2022083267A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011148703.0 2020-10-23
CN202011148703.0A CN114490517A (zh) 2020-10-23 Data processing method and apparatus, computing node, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022083267A1 true WO2022083267A1 (zh) 2022-04-28

Family

ID=81291530

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114136 WO2022083267A1 (zh) 2020-10-23 2021-08-23 数据处理方法、装置、计算节点以及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN114490517A (zh)
WO (1) WO2022083267A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101656094A (zh) * 2009-09-25 2010-02-24 Hangzhou H3C Technologies Co., Ltd. Data storage method and storage device
US20130036289A1 (en) * 2010-09-30 2013-02-07 Nec Corporation Storage system
CN102968498A (zh) * 2012-12-05 2013-03-13 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN105653539A (zh) * 2014-11-13 2016-06-08 Tencent Digital (Shenzhen) Co., Ltd. Method and apparatus for implementing indexed distributed storage
CN106339270A (zh) * 2016-08-26 2017-01-18 Huawei Technologies Co., Ltd. Data verification method and apparatus
CN111061680A (zh) * 2018-10-15 2020-04-24 Beijing Jingdong Shangke Information Technology Co., Ltd. Data retrieval method and apparatus

Also Published As

Publication number Publication date
CN114490517A (zh) 2022-05-13

Similar Documents

Publication Publication Date Title
US9830101B2 (en) Managing data storage in a set of storage systems using usage counters
US10956601B2 (en) Fully managed account level blob data encryption in a distributed storage environment
US10764045B2 (en) Encrypting object index in a distributed storage environment
US10127233B2 (en) Data processing method and device in distributed file storage system
US8762353B2 (en) Elimination of duplicate objects in storage clusters
US20190007208A1 (en) Encrypting existing live unencrypted data using age-based garbage collection
US8135918B1 (en) Data de-duplication for iSCSI
WO2014180232A1 (zh) Request response method and apparatus, and distributed file system
US11262916B2 (en) Distributed storage system, data processing method, and storage node
WO2014101108A1 (zh) Caching method for a distributed storage system, node, and computer-readable medium
CN110908589B (zh) Data file processing method, apparatus, system, and storage medium
WO2015039569A1 (zh) Replica storage apparatus and replica storage method
KR20180117377A (ko) System and method for supporting user-level DMA I/O in a distributed file system environment
CN113032335A (zh) File access method, apparatus, device, and storage medium
US20170318093A1 (en) Method and System for Focused Storage Access Notifications from a Network Storage System
CN115277145A (zh) Distributed storage access authorization management method, system, device, and readable medium
CN114466083B (zh) Data storage system supporting protocol interworking
WO2014153931A1 (zh) File storage method and apparatus, access client, and metadata server system
CN111435286B (zh) Data storage method, apparatus, and system
WO2022083267A1 (zh) Data processing method and apparatus, computing node, and computer-readable storage medium
WO2023024656A1 (zh) Data access method, storage system, and storage node
WO2023273803A1 (zh) Authentication method and apparatus, and storage system
US10503409B2 (en) Low-latency lightweight distributed storage system
CN116594551A (zh) Data storage method and apparatus
CN111221857B (zh) Method and apparatus for reading data records from a distributed system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21881686

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21881686

Country of ref document: EP

Kind code of ref document: A1