CN112632029B - Data management method, device and equipment of distributed storage system - Google Patents

Publication number
CN112632029B
Authority
CN
China
Prior art keywords
storage
node
data object
data
virtual node
Legal status
Active
Application number
CN202011407835.0A
Other languages
Chinese (zh)
Other versions
CN112632029A (en)
Inventor
李丹旺
夏伟强
丁光凯
Current Assignee
Hangzhou Hikvision System Technology Co Ltd
Original Assignee
Hangzhou Hikvision System Technology Co Ltd
Application filed by Hangzhou Hikvision System Technology Co Ltd
Priority to CN202011407835.0A
Publication of CN112632029A
Application granted
Publication of CN112632029B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The application provides a data management method, device and equipment for a distributed storage system, and belongs to the technical field of distributed storage. The method comprises: receiving an object upload request from a terminal, wherein the object upload request carries a data object to be uploaded and requests that the data object be stored in a distributed storage system, the distributed storage system comprises a plurality of virtual nodes and a plurality of storage nodes, and one storage node is associated with at least one virtual node; determining a target virtual node corresponding to the data object; determining a first storage node associated with the target virtual node; and storing the data object to the first storage node and updating a version identification of the target virtual node and a storage timestamp of the data object into a storage index of the first storage node, wherein the version identification of the target virtual node and the storage timestamp are used for verifying the version of the data object. The method and the device can ensure that the data object downloaded by a user is the latest version of the data, and improve storage performance.

Description

Data management method, device and equipment of distributed storage system
Technical Field
The present application relates to the field of distributed data storage technologies, and in particular, to a data management method, apparatus and device for a distributed storage system.
Background
To reduce storage pressure on a terminal, data may be stored in a distributed storage system. The distributed storage system comprises a network device and a plurality of storage nodes; the network device receives data objects to be stored from the terminal and stores each data object in one of the storage nodes.
In the related art, the network device computes a hash from the key of the data object to be stored and the number of storage nodes in the distributed storage system to obtain the node identifier of the storage node that is to store the data object, and then stores the data object in that storage node.
However, the same data object may have several versions. When a storage node goes offline due to a failure, the number of storage nodes in the distributed storage system changes, and so does the hash value computed from the key of the data object and the number of storage nodes. Different versions of the same data object may therefore be stored on different storage nodes, so a user who later downloads the data object cannot be guaranteed to receive its latest version, which degrades the performance of the distributed storage system.
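The effect described in the background can be illustrated with a minimal Python sketch. It assumes that the related-art placement is "hash of the key modulo the number of storage nodes"; the source only says that the key is hashed with the node count, so the modulo step, the SHA-256 hash and the names `stable_hash` and `node_for` are illustrative assumptions.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash of a key (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(key: str, num_nodes: int) -> int:
    """Assumed related-art placement: hash of the key modulo the node count."""
    return stable_hash(key) % num_nodes


keys = [f"object-{i}" for i in range(1000)]
# One of four storage nodes goes offline, so the divisor changes from 4 to 3.
moved = [k for k in keys if node_for(k, 4) != node_for(k, 3)]
fraction_moved = len(moved) / len(keys)
print(f"{fraction_moved:.0%} of the keys now map to a different node index")
```

Under this assumption a key keeps its node index only when its hash gives the same remainder for 4 and for 3, which holds for about one quarter of uniformly distributed hashes, so a later version of most objects would land on a different node than the earlier version.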
Disclosure of Invention
The embodiments of the application provide a data management method, device and equipment for a distributed storage system, which can ensure that a data object downloaded by a user is the latest version of the data and improve the performance of the distributed storage system. The technical solution is as follows:
In one aspect, a data management method of a distributed storage system is provided, where the method includes:
receiving an object uploading request of a terminal, wherein the object uploading request carries a data object to be uploaded and is used for requesting to store the data object into a distributed storage system, the distributed storage system comprises a plurality of virtual nodes and a plurality of storage nodes, and one storage node is associated with at least one virtual node;
determining a target virtual node mapped by the data object based on the object identifier of the data object and the number of the virtual nodes;
determining a first storage node associated with the target virtual node based on an association relation between the storage node and the virtual node;
and storing the data object to the first storage node, and updating the version identification of the target virtual node and the storage timestamp of the data object into a storage index of the first storage node, wherein the version identification of the target virtual node and the storage timestamp are used for verifying the version of the data object when the terminal downloads the data object.
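The upload steps above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the hash-modulo mapping, the value `NUM_VIRTUAL_NODES = 16`, the round-robin initial association and the names `StorageNode`, `NetworkDevice` and `upload` are all assumptions made for the example.

```python
import hashlib
import time

NUM_VIRTUAL_NODES = 16  # assumed value; the source only says the number is fixed


def virtual_node_for(object_id: str, num_virtual_nodes: int = NUM_VIRTUAL_NODES) -> int:
    """First layer: object identifier -> virtual node (hash modulo is an assumption)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_virtual_nodes


class StorageNode:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.objects = {}  # object_id -> data
        self.index = {}    # object_id -> (virtual node version, storage timestamp)

    def store(self, object_id, data, vnode_version, timestamp):
        self.objects[object_id] = data
        self.index[object_id] = (vnode_version, timestamp)


class NetworkDevice:
    def __init__(self, storage_nodes):
        self.nodes = {n.node_id: n for n in storage_nodes}
        node_ids = sorted(self.nodes)
        # Second layer: virtual node -> (storage node id, virtual node version).
        self.association = {
            v: (node_ids[v % len(node_ids)], 1) for v in range(NUM_VIRTUAL_NODES)
        }

    def upload(self, object_id, data, timestamp=None):
        vnode = virtual_node_for(object_id)
        node_id, version = self.association[vnode]
        ts = time.time() if timestamp is None else timestamp
        self.nodes[node_id].store(object_id, data, version, ts)
        return vnode, node_id


device = NetworkDevice([StorageNode("node-a"), StorageNode("node-b")])
first = device.upload("video-001", b"version 1", timestamp=1.0)
second = device.upload("video-001", b"version 2", timestamp=2.0)
```

Because the number of virtual nodes does not change, both uploads of `video-001` resolve to the same virtual node and therefore to the same storage node, where the second upload overwrites the index entry.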
In one possible implementation, the storing the data object to the first storage node includes:
in response to the first storage node including a target disk data block, storing the data object into the target disk data block, wherein the target disk data block is a disk data block used for storing data objects mapped to the target virtual node;
and in response to the first storage node not including the target disk data block, allocating a target disk data block to the target virtual node, and storing the data object into the allocated target disk data block.
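A minimal sketch of this allocate-on-first-use behaviour, assuming one disk data block per virtual node and modelling a block as an in-memory dictionary (both are simplifications introduced here, as is the class name `StorageNode`):

```python
class StorageNode:
    def __init__(self):
        # vnode_id -> disk data block, modelled here as a dict of object_id -> data
        self.blocks = {}

    def store(self, vnode_id, object_id, data):
        """Store data in the block of its virtual node; return True if a block was allocated."""
        allocated = vnode_id not in self.blocks
        if allocated:
            self.blocks[vnode_id] = {}
        self.blocks[vnode_id][object_id] = data
        return allocated


node = StorageNode()
first_allocated = node.store(7, "obj-a", b"a")   # no block for virtual node 7 yet
second_allocated = node.store(7, "obj-b", b"b")  # reuses the existing block
```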
In another possible implementation manner, the updating the version identifier of the target virtual node and the storage timestamp of the data object to the storage index of the first storage node includes:
in response to the version identification of the target virtual node and the storage timestamp of the data object not existing in the storage index, storing the version identification of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node;
and in response to the version identification of the target virtual node and the storage timestamp of the data object existing in the storage index, replacing the stored version identification and the stored timestamp in the storage index with the version identification of the target virtual node and the storage timestamp of the data object, respectively.
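The two cases amount to an insert-or-replace on the storage index. The sketch below assumes the index is keyed by object identifier, which the source does not state; the function name `update_storage_index` is hypothetical.

```python
def update_storage_index(index, object_id, vnode_version, timestamp):
    """Insert or replace the (version, timestamp) pair; report which case applied."""
    action = "replaced" if object_id in index else "stored"
    index[object_id] = (vnode_version, timestamp)
    return action


index = {}
first_action = update_storage_index(index, "obj-a", 1, 100.0)
second_action = update_storage_index(index, "obj-a", 2, 250.0)
```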
In another possible implementation manner, the method further includes:
determining a second storage node in the distributed storage system, wherein the second storage node is a backup storage node of the first storage node;
and backing up the data object to the second storage node.
In another possible implementation manner, the method further includes:
in response to the second storage node being offline, determining a third storage node from the distributed storage system, and backing up the data object in the first storage node to the third storage node;
and in response to the first storage node being offline, taking the second storage node as the primary storage node of the target virtual node, determining a backup storage node for the second storage node from the distributed storage system, and backing up the data object in the second storage node to the backup storage node.
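The two failover cases can be sketched as below. The replacement policy (first spare node in sorted order), the `Placement` type and the function names are assumptions for illustration; the source does not say how the replacement node is chosen. The set `online` is assumed to already exclude the node that went offline.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    primary: str
    backup: str


def pick_replacement(online, exclude):
    """Pick any online node not in `exclude` (sorted order is an arbitrary policy)."""
    for node_id in sorted(online):
        if node_id not in exclude:
            return node_id
    raise RuntimeError("no spare storage node available")


def on_node_offline(placement, offline_node, online):
    """Return the new placement and the (source, destination) copy tasks."""
    if offline_node == placement.backup:
        new_backup = pick_replacement(online, {placement.primary})
        return Placement(placement.primary, new_backup), [(placement.primary, new_backup)]
    if offline_node == placement.primary:
        new_primary = placement.backup  # the backup is promoted to primary
        new_backup = pick_replacement(online, {new_primary})
        return Placement(new_primary, new_backup), [(new_primary, new_backup)]
    return placement, []


start = Placement(primary="n1", backup="n2")
after_backup_loss = on_node_offline(start, "n2", {"n1", "n3"})
after_primary_loss = on_node_offline(start, "n1", {"n2", "n3"})
```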
In another possible implementation manner, the method further includes:
in response to any storage node of the distributed storage system being offline, determining a virtual node associated with the offline storage node;
re-determining a storage node for the virtual node, and updating the version identification of the virtual node;
and updating the association relation between the storage node and the virtual node based on the updated version identification of the virtual node and the redetermined node identification of the storage node.
In another possible implementation manner, the method further includes:
adding the node identification of the redetermined storage node to an association track record of the virtual node, wherein the association track record comprises node identifications of a plurality of storage nodes which are historically associated with the virtual node and migration paths among the plurality of storage nodes;
and sending the updated association track record to each storage node of the distributed storage system.
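A minimal sketch of re-association after a node goes offline, covering the version update and the association track record. The selection rule for the new storage node, the integer version counter and the names `VirtualNode` and `reassign_offline` are assumptions; distributing the track record to the storage nodes is left out.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualNode:
    vnode_id: int
    node_id: str                                # currently associated storage node
    version: int = 1                            # version identifier (assumed to be a counter)
    track: list = field(default_factory=list)   # historically associated nodes, in order

    def __post_init__(self):
        if not self.track:
            self.track.append(self.node_id)


def reassign_offline(vnodes, offline_node, online_nodes):
    """Re-associate every virtual node of the offline storage node and bump its version."""
    targets = sorted(online_nodes)
    if not targets:
        raise RuntimeError("no online storage node available")
    changed = []
    for vnode in vnodes:
        if vnode.node_id != offline_node:
            continue
        vnode.node_id = targets[vnode.vnode_id % len(targets)]  # assumed selection rule
        vnode.version += 1
        vnode.track.append(vnode.node_id)
        changed.append(vnode.vnode_id)
    return changed


vnodes = [VirtualNode(0, "a"), VirtualNode(1, "b"), VirtualNode(2, "a")]
changed = reassign_offline(vnodes, "a", {"b", "c"})
```

Only the virtual nodes that were associated with the offline node change version, so a storage node holding data written under an older version can later detect that its copy may be stale.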
In another possible implementation manner, the method further includes:
receiving a first downloading request of the terminal, wherein the first downloading request carries an object identifier of a data object to be downloaded;
determining a target virtual node mapped by the data object based on the object identifier of the data object and the number of the virtual nodes;
determining a first storage node associated with the target virtual node based on the association relationship between the storage nodes and the virtual nodes;
and acquiring the data object from the first storage node, and sending the data object to the terminal.
In another possible implementation manner, the obtaining the data object from the first storage node includes:
sending a second download request to the first storage node, where the second download request carries a version identifier of the target virtual node and an object identifier of the data object, the version identifier is used for the first storage node to verify the version of the stored data object, and the object identifier of the data object is used for the first storage node to read the data object;
and receiving the data object sent by the first storage node.
In another aspect, the present application provides a data management method for a distributed storage system, where the method includes:
receiving a second downloading request sent by the network equipment, wherein the second downloading request carries the version identification of the target virtual node and the object identification of the data object to be downloaded;
acquiring the data of the latest version of the data object from a distributed storage system based on the version identification of the target virtual node, the object identification of the data object and a locally stored storage index, wherein the version identification of the target virtual node and the storage time stamp of the data object are stored in the storage index;
sending the latest version of the data object to the network device.
In one possible implementation, the obtaining the latest version of the data object from the distributed storage system based on the version identifier of the target virtual node, the object identifier of the data object, and the locally stored storage index includes:
in response to the version identifier of the target virtual node in the locally stored storage index being the same as the version identifier of the target virtual node carried by the second download request, acquiring a storage address of the data object from the storage index based on the object identifier of the data object, and acquiring the latest version of the data object locally based on the storage address;
and in response to the version identifier of the target virtual node in the locally stored storage index being different from the version identifier of the target virtual node carried by the second download request, determining, from the distributed storage system, a storage node storing the latest version of the data object, and acquiring the latest version of the data object from that storage node.
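The version check on the storage node can be sketched as follows. The per-object index entry, the list used as a stand-in for disk addresses and the `fetch_remote` callback are assumptions for illustration; treating a missing index entry like a version mismatch is also an assumption, since the source does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    vnode_version: int
    timestamp: float
    address: int  # position in local_store, standing in for a disk address


def handle_download(request_version, object_id, index, local_store, fetch_remote):
    """Serve locally when the versions match; otherwise fetch from another node."""
    entry = index.get(object_id)
    if entry is not None and entry.vnode_version == request_version:
        return local_store[entry.address]
    # Version mismatch (or no entry, by assumption): the local copy may be stale.
    return fetch_remote(object_id)


index = {"obj-a": IndexEntry(vnode_version=3, timestamp=10.0, address=0)}
local_store = [b"local copy"]
same_version = handle_download(3, "obj-a", index, local_store, lambda oid: b"remote copy")
newer_version = handle_download(4, "obj-a", index, local_store, lambda oid: b"remote copy")
```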
In another possible implementation manner, the determining, from the distributed storage system, a storage node storing the latest version data of the data object includes:
acquiring a locally stored association track record of the target virtual node, wherein the association track record comprises node identifications of a plurality of storage nodes historically associated with the target virtual node and migration paths among the plurality of storage nodes;
determining, from the association track record, the plurality of storage nodes historically associated with the target virtual node;
and determining, from the plurality of storage nodes, the storage node whose associated version identification of the target virtual node is the latest, based on the version identification of the target virtual node associated with each storage node.
In another possible implementation manner, the determining, from the plurality of storage nodes, the storage node whose associated version identification is the latest based on the version identification of the target virtual node associated with each storage node includes:
in response to at least two of the plurality of storage nodes being associated with the latest version identification of the target virtual node, selecting, from the at least two storage nodes, the storage node with the most recent timestamp based on the timestamp of the data object stored by each storage node.
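The selection rule, including the timestamp tie-break, can be sketched as below. It assumes that version identifiers are comparable integers and that each candidate node reports a `(version, timestamp)` pair; how those values are collected from the other nodes is not shown, and the function name `select_latest_node` is hypothetical.

```python
def select_latest_node(track_record, node_state):
    """Pick the node with the latest version; break ties by the most recent timestamp.

    track_record: node ids historically associated with the virtual node.
    node_state:   node_id -> (virtual node version, timestamp of the stored object).
    """
    candidates = [n for n in track_record if n in node_state]
    if not candidates:
        raise LookupError("no historically associated node holds the object")
    latest_version = max(node_state[n][0] for n in candidates)
    tied = [n for n in candidates if node_state[n][0] == latest_version]
    return max(tied, key=lambda n: node_state[n][1])


track = ["a", "b", "c"]
tie_case = select_latest_node(track, {"a": (1, 5.0), "b": (2, 7.0), "c": (2, 9.0)})
clear_case = select_latest_node(track, {"a": (3, 1.0), "b": (2, 7.0)})
```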
In another possible implementation manner, the method further includes:
periodically scanning the stored data objects;
in response to the virtual node mapped by any data object not being a virtual node associated with the storage node itself and the locally stored data being the latest version of the data object, pushing the latest version of the data object to the storage node currently associated with the virtual node;
and in response to the virtual node mapped by any data object not being a virtual node associated with the storage node itself and the locally stored data not being the latest version of the data object, deleting the locally stored data of the data object.
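One pass of the periodic scan can be sketched as a function that classifies locally stored objects into "push" and "delete" sets. The inputs (`stored`, `vnode_owner`, `is_latest`) and the function name are assumptions; scheduling the scan and performing the actual transfer or deletion are left out.

```python
def scan_local_objects(self_id, stored, vnode_owner, is_latest):
    """Classify objects whose virtual node is no longer associated with this node.

    stored:      object_id -> virtual node id of locally stored objects.
    vnode_owner: virtual node id -> id of the currently associated storage node.
    is_latest:   callable telling whether the local copy is the latest version.
    """
    to_push, to_delete = [], []
    for object_id, vnode_id in sorted(stored.items()):
        owner = vnode_owner[vnode_id]
        if owner == self_id:
            continue  # still associated with this node, nothing to do
        if is_latest(object_id):
            to_push.append((object_id, owner))
        else:
            to_delete.append(object_id)
    return to_push, to_delete


stored = {"o1": 0, "o2": 1, "o3": 1}
owners = {0: "me", 1: "other"}
to_push, to_delete = scan_local_objects("me", stored, owners, lambda oid: oid == "o2")
```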
In another possible implementation manner, the method further includes:
deleting, from the association track record, the virtual node mapped by the data object;
and sending the association track record after the deletion to each storage node of the distributed storage system.
In another aspect, a data management apparatus of a distributed storage system is provided, the apparatus including:
a first receiving module, configured to receive an object upload request from a terminal, where the object upload request carries a data object to be uploaded and requests that the data object be stored in a distributed storage system, the distributed storage system comprises a plurality of virtual nodes and a plurality of storage nodes, and one storage node is associated with at least one virtual node;
a first determining module, configured to determine a target virtual node to which the data object is mapped based on the object identifier of the data object and the number of virtual nodes;
a second determining module, configured to determine a first storage node associated with the target virtual node based on the association relationship between storage nodes and virtual nodes;
a storage module for storing the data object to the first storage node;
a first updating module, configured to update the version identifier of the target virtual node and the storage timestamp of the data object into a storage index of the first storage node, where the version identifier of the target virtual node and the storage timestamp are used to verify the version of the data object when the terminal downloads the data object.
In a possible implementation manner, the storage module is configured to: in response to the first storage node including a target disk data block, store the data object in the target disk data block, where the target disk data block is a disk data block used for storing data objects mapped to the target virtual node;
and in response to the first storage node not including the target disk data block, allocate a target disk data block to the target virtual node and store the data object in the allocated target disk data block.
In another possible implementation manner, the first updating module is configured to: in response to the version identifier of the target virtual node and the storage timestamp of the data object not existing in the storage index, store the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node; and in response to the version identifier of the target virtual node and the storage timestamp of the data object existing in the storage index, replace the stored version identifier and the stored timestamp in the storage index with the version identifier of the target virtual node and the storage timestamp of the data object, respectively.
In another possible implementation manner, the apparatus further includes:
a third determining module, configured to determine a second storage node in the distributed storage system, where the second storage node is a backup storage node of the first storage node;
a backup module, configured to back up the data object to the second storage node.
In another possible implementation manner, the backup module is further configured to: in response to the second storage node being offline, determine a third storage node from the distributed storage system and back up the data object in the first storage node to the third storage node;
and in response to the first storage node being offline, take the second storage node as the primary storage node of the target virtual node, determine a backup storage node for the second storage node from the distributed storage system, and back up the data object in the second storage node to the backup storage node.
In another possible implementation manner, the apparatus further includes:
a fourth determining module, configured to determine, in response to an offline of any storage node of the distributed storage system, a virtual node associated with the offline storage node;
a second updating module, configured to re-determine a storage node for the virtual node and update the version identifier of the virtual node; and to update the association relationship between the storage node and the virtual node based on the updated version identifier of the virtual node and the node identifier of the redetermined storage node.
In another possible implementation manner, the apparatus further includes:
an adding module, configured to add the node identifier of the redetermined storage node to an associated track record of the virtual node, where the associated track record includes node identifiers of a plurality of storage nodes historically associated with the virtual node and migration paths between the plurality of storage nodes;
a third updating module, configured to send the updated association track record to each storage node of the distributed storage system.
In another possible implementation manner, the apparatus further includes:
a second receiving module, configured to receive a first download request of the terminal, where the first download request carries an object identifier of a data object to be downloaded;
a fifth determining module, configured to determine a target virtual node to which the data object is mapped based on the object identifier of the data object and the number of virtual nodes, and determine a first storage node associated with the target virtual node based on an association relationship between storage nodes and virtual nodes;
a first obtaining module, configured to obtain the data object from the first storage node;
a first sending module, configured to send the data object to the terminal.
In another possible implementation manner, the first obtaining module is configured to send a second download request to the first storage node, where the second download request carries a version identifier of the target virtual node and an object identifier of the data object, the version identifier is used by the first storage node to verify the version of the stored data object, and the object identifier of the data object is used by the first storage node to read the data object; and to receive the data object sent by the first storage node.
In another aspect, a data management apparatus of a distributed storage system is provided, the apparatus including:
a third receiving module, configured to receive a second download request sent by a network device, where the second download request carries a version identifier of a target virtual node and an object identifier of a data object to be downloaded;
a second obtaining module, configured to obtain, from a distributed storage system, data of a latest version of the data object based on the version identifier of the target virtual node and the object identifier of the data object, and a locally stored storage index, where the version identifier of the target virtual node and a storage timestamp of the data object are stored in the storage index;
a second sending module, configured to send the data of the latest version of the data object to the network device.
In a possible implementation manner, the second obtaining module is configured to: in response to the version identifier of the target virtual node in the locally stored storage index being the same as the version identifier of the target virtual node carried in the second download request, obtain a storage address of the data object from the storage index based on the object identifier of the data object, and obtain the latest version of the data object locally based on the storage address;
and in response to the version identifier of the target virtual node in the locally stored storage index being different from the version identifier of the target virtual node carried in the second download request, determine, from the distributed storage system, a storage node storing the latest version of the data object, and obtain the latest version of the data object from that storage node.
In another possible implementation manner, the second obtaining module is configured to: obtain a locally stored association track record of the target virtual node, where the association track record includes node identifiers of a plurality of storage nodes historically associated with the target virtual node and migration paths between the plurality of storage nodes;
determine, from the association track record, the plurality of storage nodes historically associated with the target virtual node;
and determine, from the plurality of storage nodes, the storage node whose associated version identifier of the target virtual node is the latest, based on the version identifier of the target virtual node associated with each storage node.
In another possible implementation manner, the second obtaining module is configured to, in response to at least two of the plurality of storage nodes being associated with the latest version identifier of the target virtual node, select, from the at least two storage nodes, the storage node with the most recent timestamp based on the timestamp of the data object stored by each storage node.
In another possible implementation manner, the apparatus further includes:
a scanning module for periodically scanning the stored data objects;
a pushing module, configured to, in response to the virtual node mapped by any data object not being a virtual node associated with the storage node itself and the locally stored data being the latest version of the data object, push the latest version of the data object to the storage node currently associated with the virtual node;
a deleting module, configured to, in response to the virtual node mapped by any data object not being a virtual node associated with the storage node itself and the locally stored data not being the latest version of the data object, delete the locally stored data of the data object.
In another possible implementation manner, the deleting module is further configured to delete, from the association track record, the virtual node mapped by the data object, and the apparatus further includes:
a fourth updating module, configured to send the association track record after the deletion to each storage node of the distributed storage system.
In another aspect, a network device is provided, where the network device includes a processor and a memory, and the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the operations performed by the data management method of the distributed storage system.
In another aspect, a storage node is provided, where the storage node includes a processor and a memory, and the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the operations performed by the data management method of the distributed storage system.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the operations performed by the data management method of the distributed storage system.
In another aspect, a computer program product is provided, the computer program product comprising computer program code stored in a computer readable storage medium. The processor of the computer apparatus reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer apparatus performs the operations performed by the data management method of the distributed storage system described above.
In the embodiments of the application, on the one hand, the number of virtual nodes in the distributed storage system is fixed, so the same data object is always mapped to the same virtual node, and the virtual node is associated with a storage node; this two-layer address mapping ensures that the data of the same data object is stored on the same storage node. On the other hand, the storage node stores the version identifier of the virtual node and the storage timestamp of the data, which can be used to verify the version of the data. The scheme therefore stores the data of the same data object on the same storage node and verifies its version through the version identifier of the virtual node and the storage timestamp, which ensures that the terminal downloads the latest version of the data object and improves the performance of the distributed storage system.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a distributed system provided in an embodiment of the present application;
FIG. 3 is a flowchart of a data management method of a distributed storage system according to an embodiment of the present application;
FIG. 4 is a diagram of a distributed object storage structure provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an allocation relationship between a data object and a node according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating an allocation relationship between a data object and a disk data block according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an association relationship change after a node is offline according to an embodiment of the present application;
FIG. 8 is a flowchart of a data management method of a distributed storage system according to an embodiment of the present application;
FIG. 9 is a block diagram of a data management apparatus of a distributed storage system according to an embodiment of the present application;
FIG. 10 is a block diagram of a data management apparatus of a distributed storage system according to an embodiment of the present application;
FIG. 11 is a block diagram of a network device provided in an embodiment of the present application;
FIG. 12 is a block diagram of a storage node according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a distributed storage system 102; the terminal 101 is installed with a client associated with the distributed storage system 102, and performs data interaction with the distributed storage system 102 through the client, so as to store the data object of the terminal 101 to the distributed storage system 102, or read the stored data object from the distributed storage system 102.
The client can be a cloud storage client or any client capable of storing or reading data. In the embodiment of the present application, the client is described by taking a cloud storage client as an example. The terminal 101 is at least one of a mobile phone, a tablet computer, a wearable device, or a smart home device, and in this embodiment, the terminal 101 is described by taking a mobile phone as an example.
Fig. 2 is a schematic diagram of a distributed storage system 102 according to an embodiment of the present application. Referring to fig. 2, the distributed storage system 102 includes: at least one network device and a plurality of storage nodes. Each network device establishes a connection with each storage node through a wireless or wired network. The network device is used for performing data interaction with the terminal 101, so as to store the data object of the terminal 101 in any storage node or read the stored data object from the storage node.
It should be noted that the distributed storage system 102 further includes a plurality of virtual nodes, and the virtual nodes are associated with the storage nodes. To determine the storage node in which a data object of the terminal 101 is to be stored, or from which a stored data object is to be read, the network device computes a hash value from the object identifier of the data object and the number of virtual nodes, determines the virtual node to which the data object is mapped based on the hash value, and then stores the data object into, or reads the data object from, the storage node associated with that virtual node.
The number of the virtual nodes and the number of the storage nodes can be the same or different; in the embodiment of the present application, it is described by taking an example that the number of the virtual nodes is different from that of the storage nodes, and the number of the virtual nodes is greater than that of the storage nodes, that is, one storage node is associated with at least one virtual node. In the embodiment of the present application, the number of virtual nodes is 65536 for example.
It should be noted that the distributed storage system 102 further includes a management device, which establishes a network connection with each network device and each storage node. The management device is used for monitoring the current state of each storage node. When the management device detects that one or more storage nodes in the distributed storage system 102 are offline, it reallocates storage nodes to the virtual nodes associated with the offline storage nodes, updates the version identifiers of those virtual nodes, and updates the association relationship between the virtual nodes and the storage nodes based on the updated version identifiers. For example, if the version identifier is a version number, the version number of the virtual node is increased. The association relationship stores the node identifier of the storage node and the version identifier of the virtual node; correspondingly, when a storage node goes offline, the version identifier of each virtual node associated with that storage node is updated, and the association relationship is updated based on the updated version identifier of the virtual node and the node identifier of the newly determined storage node.
The network device may be any gateway such as an HGW (High Performance GateWay Service), the storage node may be a storage server, the management device may be any management service such as an MS (Monitor Service), and the virtual node may be a VG (Virtual Group) node.
Fig. 3 is a flowchart of a data management method of a distributed storage system according to an embodiment of the present application. In the embodiment of the present application, the case where a terminal uploads a data object is described as an example. Referring to fig. 3, the embodiment includes:
301. The terminal sends an object upload request to the network device, where the object upload request carries a data object to be uploaded and is used for requesting to store the data object into the distributed storage system.
The data object may be any type of file such as an image, a video, a document, or text, and in the embodiment of the present application, the data object is not particularly limited. In addition to the data object to be uploaded, the object upload request carries the object identifier of the data object, and the object identifier can be any identifier capable of identifying the data object; for example, the object identifier is KEY.
It should be noted that the terminal may send the object upload request to the network device through a RESTful interface (REST being a design style and development approach for network applications) or through an object upload SDK (Software Development Kit) interface.
It should be noted that the distributed storage system includes at least one network device. In a possible implementation manner, when the distributed storage system includes a plurality of network devices, the terminal may send the object upload request to any network device of the distributed storage system. In another possible implementation manner, different network devices manage different areas, that is, different network devices process the object upload requests uploaded by terminals in different areas; accordingly, the terminal selects, from the plurality of network devices, the network device that manages the area in which the terminal is currently located, and sends the object upload request to that network device.
In the embodiment of the application, a plurality of network devices are arranged in the distributed storage system, so that the plurality of network devices can simultaneously process the object uploading requests of a plurality of terminals, and the response speed and the fault tolerance are improved.
Another point to be noted is that the data object may be a whole file or a part of the content of a file. For example, in response to the file being large, that is, the data amount of the file being greater than a preset data amount, the terminal divides the file into a plurality of data fragments and takes each data fragment as a data object; the terminal sends a plurality of object upload requests to the network device, and each object upload request carries one data object. In response to the file being small, that is, the data amount of the file being not greater than the preset data amount, the terminal takes the file as one data object and sends one object upload request carrying the data object to the network device.
For example, referring to fig. 4, if the file is large, the terminal divides the file into a plurality of data objects, namely data object 1, data object 2, ..., data object n, whose object identifiers are KEY1, KEY2, ..., KEYn, respectively.
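As a minimal illustrative sketch only, the splitting of a file into data objects described above might look as follows. The threshold value, the function name and the way fragment identifiers are derived from the file identifier are all assumptions made for illustration; the embodiment only requires a preset data amount and one object identifier per data object.

```python
from typing import List, Tuple

# Hypothetical threshold; the embodiment only speaks of a "preset data amount".
PRESET_DATA_AMOUNT = 4 * 1024 * 1024


def split_into_data_objects(file_key: str, payload: bytes,
                            preset: int = PRESET_DATA_AMOUNT) -> List[Tuple[str, bytes]]:
    """Return (object identifier, data) pairs for one file."""
    if len(payload) <= preset:
        # Small file: the whole file is a single data object.
        return [(file_key, payload)]
    # Large file: each fragment of at most `preset` bytes becomes a data object.
    # The "/partN" identifier scheme is an assumption, not taken from the embodiment.
    return [
        (f"{file_key}/part{i // preset + 1}", payload[i:i + preset])
        for i in range(0, len(payload), preset)
    ]
```

Each returned pair would then be carried by its own object upload request.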
302. The network device receives the object upload request, and determines a target virtual node corresponding to the data object based on the object identifier of the data object and the number of virtual nodes.
The network device computes a hash value from the object identifier of the data object and the number of virtual nodes, and determines the virtual node whose node identifier equals the hash value as the target virtual node. For example, if the object identifier of the data object is KEY and the number of virtual nodes is 65536, in this step the network device hashes the KEY of the data object with 65536 to obtain the number of the target virtual node.
For example, with continued reference to FIG. 4, the KEY of data object 1 is hashed with 65536 to obtain the target virtual node VG1, and the KEY of data object n is hashed with 65536 to obtain the target virtual node VG8.
It should be noted that, because the number of virtual nodes in the distributed storage system is fixed, the target virtual nodes to which the same data object is mapped are the same, so that it can be ensured that the same data object can be mapped to the same virtual node.
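The first layer of mapping, from object identifier to virtual node, can be sketched as below. This is an illustration under stated assumptions: the embodiment does not name a hash function or say how the hash is combined with the number of virtual nodes, so the use of SHA-256 reduced modulo the node count, and the function name, are hypothetical.

```python
import hashlib

VIRTUAL_NODE_COUNT = 65536  # fixed number of virtual nodes, per the example in the text


def target_virtual_node(object_key: str, vg_count: int = VIRTUAL_NODE_COUNT) -> int:
    """Map an object identifier to a virtual node number in [0, vg_count)."""
    # Assumption: SHA-256 reduced modulo the node count; the embodiment does not
    # specify the hash function.
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % vg_count
```

Because the function depends only on the object identifier and the fixed node count, repeated calls with the same identifier always return the same virtual node number, which is the property the paragraph above relies on.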
303. The network device determines a first storage node associated with the target virtual node based on the association relationship between the storage nodes and the virtual nodes.
The network device stores an association relationship between the storage node and the virtual node, and the version identification of the virtual node and the node identification of the storage node are stored in the association relationship. The version identifier of the virtual node may be a version number of the virtual node, and the node identifier of the storage node may be a number of the storage node. In this step, the network device obtains a node identifier associated with the target virtual node from the association relationship according to the version identifier of the virtual node, and determines a first storage node corresponding to the node identifier.
It should be noted that, in the embodiment of the present application, the data is stored in a primary/standby manner, so that the security of the data object is improved. In a possible implementation manner, the network device stores an association relationship among the main storage node, the backup storage node, and the virtual node, and in this step, the network device determines a first storage node and a second storage node associated with the target virtual node based on the association relationship among the main storage node, the backup storage node, and the virtual node, where the first storage node is the main storage node associated with the target virtual node, and the second storage node is the backup storage node associated with the target virtual node.
For example, referring to fig. 5, a schematic diagram of the association relationship between the storage nodes and the virtual nodes is provided. In the diagram, the target virtual node corresponding to data object 1 is VG2. The storage nodes corresponding to VG1 are node1 and node2, where node1 and node2 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to VG2 are node2 and node3, where node2 and node3 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to VG3 are node1 and node3, where node1 and node3 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to VG4 are node2 and node3, where node2 and node3 are the main storage node and the backup storage node, respectively.
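The second layer of mapping, from virtual node to main and backup storage nodes, can be illustrated with a small lookup table. The class and function names are hypothetical, and the initial version number of 1 is an assumption; only the VG1 and VG2 entries described for fig. 5 are reproduced.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Association:
    version: int   # version identifier of the virtual node (initial value assumed)
    primary: str   # node identifier of the main storage node
    backup: str    # node identifier of the backup storage node


# Hypothetical table mirroring the VG1 and VG2 entries described for fig. 5.
ASSOCIATIONS: Dict[str, Association] = {
    "VG1": Association(version=1, primary="node1", backup="node2"),
    "VG2": Association(version=1, primary="node2", backup="node3"),
}


def lookup_storage_nodes(vg: str,
                         table: Dict[str, Association] = ASSOCIATIONS) -> Tuple[str, str]:
    """Return (first storage node, second storage node) for a virtual node."""
    entry = table[vg]
    return entry.primary, entry.backup
```

For a data object mapped to VG2, the lookup yields node2 as the first storage node and node3 as the second storage node.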
It should be noted that, when a certain storage node or certain storage nodes in the distributed storage system are offline, the network device or the management device updates the association relationship; in the embodiment of the present application, an example in which the management device updates the association relationship and synchronizes the updated association relationship to the network device is described. Wherein the step of updating the association relationship by the management device may be implemented by the following steps (1) to (3), including:
(1) The management device determines a virtual node associated with the offline storage node.
In the embodiment of the application, the management device maintains a latest node list and an association track record of each virtual node. The association track record of any virtual node stores the node identifiers of a plurality of storage nodes historically associated with the virtual node and the migration path between the plurality of storage nodes, and the node list includes the node identifiers of the online storage nodes in the distributed storage system. Each storage node in the distributed storage system needs to periodically synchronize the latest node list and the association track record of each virtual node from the management device. In response to the management device not receiving a synchronization request from a storage node in the current period, the synchronization request being used for requesting synchronization of the latest node list and the association track record of each virtual node, the management device determines that the storage node is offline.
The synchronization period may be set and changed as needed, and in the embodiment of the present application, the synchronization period is not specifically limited; for example, if the synchronization period is 30 seconds, in response to the management device not receiving a synchronization request sent by a storage node within 30 seconds before the current time, the management device determines that the storage node is offline.
It should be noted that, after determining that a certain storage node is offline, the management device updates the latest node list.
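A minimal sketch of the offline detection described in step (1) is given below. The class name and method names are hypothetical; the sketch assumes that the management device simply remembers the time of the last synchronization request from each storage node and treats any node silent for longer than the synchronization period as offline.

```python
import time
from typing import Dict, List, Optional

SYNC_PERIOD_SECONDS = 30.0  # example synchronization period from the text


class NodeMonitor:
    """Tracks the last synchronization request received from each storage node."""

    def __init__(self, period: float = SYNC_PERIOD_SECONDS) -> None:
        self.period = period
        self.last_sync: Dict[str, float] = {}

    def record_sync(self, node_id: str, now: Optional[float] = None) -> None:
        # Called whenever a storage node requests the latest node list.
        self.last_sync[node_id] = time.monotonic() if now is None else now

    def online_nodes(self, now: Optional[float] = None) -> List[str]:
        # The latest node list: nodes that synchronized within the last period.
        current = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_sync.items()
                      if current - t <= self.period)
```

The optional `now` parameter exists only so that the behavior can be exercised with fixed timestamps.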
(2) The management device re-determines the storage node for the virtual node and updates the version identification of the virtual node.
The management device selects one storage node from the storage nodes in the online state in the distributed storage system, and takes the storage node as a new storage node associated with the virtual node. For example, if the version identifier of the virtual node is a version number, the management device increases the version number of the virtual node by 1.
It should be noted that the management device may randomly select one storage node from the storage nodes in an online state in the distributed storage system; or the management device takes the backup storage node of the virtual node as the main storage node of the virtual node; or the management device selects the storage node with the lowest load from the distributed storage system according to the principle of load balancing.
(3) The management device updates the association relationship between the storage node and the virtual node based on the updated version identifier of the virtual node and the node identifier of the redetermined storage node.
The management device modifies the node identifier of the storage node associated with the virtual node in the association relationship into the node identifier of the redetermined storage node, and updates the version identifier of the virtual node in the association relationship into the updated version identifier of the virtual node.
It should be noted that the management device further updates the association track record of the virtual node, where the association track record includes node identifiers of a plurality of storage nodes historically associated with the virtual node and migration paths between the plurality of storage nodes, and synchronizes the updated association track record to each storage node of the distributed storage system.
For example, the virtual node is VG1 and the storage node originally associated with VG1 is node2; in response to node2 going offline, the storage node newly allocated to VG1 is node3, and the association track record includes node2-node3.
The management device may directly send the updated association track record of the virtual node to each storage node in the distributed storage system, or may send the updated association track record to the network device, which then sends it to each storage node in the distributed storage system. The management device may also send the association track record to each storage node when that storage node requests synchronization of the association track record, or may first synchronize the association track record to the network device, which sends it to each storage node when that storage node requests synchronization. In the embodiment of the present application, the manner and timing in which the management device sends the association track record to each storage node are not particularly limited.
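Steps (1) to (3) can be sketched together as follows. This is an illustration, not the claimed implementation: the names are hypothetical, only a single associated storage node per virtual node is modeled, the load-balancing option is chosen among the three selection options mentioned above, and load is approximated by a caller-supplied counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualNode:
    version: int                                     # version identifier
    storage_node: str                                # currently associated storage node
    track: List[str] = field(default_factory=list)   # association track record


def reassign_offline(vgs: Dict[str, VirtualNode], offline: str,
                     load: Dict[str, int]) -> List[str]:
    """Reassign every virtual node associated with `offline`; return their names."""
    # Candidate nodes are the online nodes, i.e. everything except the offline one.
    candidates = {n: l for n, l in load.items() if n != offline}
    changed = []
    for name, vg in vgs.items():
        if vg.storage_node != offline:
            continue
        # Load-balancing choice (assumed load metric supplied by the caller).
        new_node = min(candidates, key=lambda n: candidates[n])
        if not vg.track:
            vg.track.append(vg.storage_node)
        vg.track.append(new_node)        # extend the migration path
        vg.storage_node = new_node       # update the association relationship
        vg.version += 1                  # version number increased by 1
        candidates[new_node] += 1
        changed.append(name)
    return changed
```

With VG1 originally on node2 and node3 the least loaded online node, the call leaves VG1 associated with node3 at version 2 and a track record of node2 followed by node3, matching the example above.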
304. The network device stores the data object to the first storage node.
In one possible implementation, the network device may store the data object in any disk data block of the first storage node. In another possible implementation manner, the network device allocates a disk data block to the data object, and the allocation principle is that all object data in the same disk data block are sourced from the same virtual node. Correspondingly, the steps can be as follows:
In response to the first storage node including a target disk data block, where the target disk data block is a disk data block used for storing data objects mapped to the target virtual node, the network device stores the data object into the target disk data block; in response to the first storage node not including the target disk data block, the network device allocates a target disk data block for the target virtual node and stores the data object into the allocated target disk data block.
The size of the disk data block can be set and changed as required, and in the embodiment of the application, the disk data block is not specifically limited; for example, the size of the disk data block is 64 MB.
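The allocation principle, that all object data in one disk data block originates from the same virtual node, can be sketched as below. The class and field names are hypothetical, and the handling of a full block (opening a further block for the same virtual node) is an assumption, since the text does not describe that case.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the example size from the text


class StorageNode:
    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: List[Dict] = []           # each block: {"vg": str, "used": int}
        self.open_block: Dict[str, int] = {}   # virtual node -> index of its current block

    def store(self, vg: str, size: int) -> Tuple[int, int]:
        """Place `size` bytes for virtual node `vg`; return (block index, offset)."""
        idx = self.open_block.get(vg)
        if idx is None or self.blocks[idx]["used"] + size > self.block_size:
            # No target block exists for this virtual node, or (assumption) the
            # current one cannot hold the object: allocate a new block for it.
            self.blocks.append({"vg": vg, "used": 0})
            idx = len(self.blocks) - 1
            self.open_block[vg] = idx
        offset = self.blocks[idx]["used"]
        self.blocks[idx]["used"] += size
        return idx, offset
```

Because a block is only ever reused through the `open_block` entry of its own virtual node, objects of different virtual nodes never share a block.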
In the embodiment of the application, the data object is stored in a master-slave backup mode so as to further improve the safety of the data object; correspondingly, after the network device stores the data object to the first storage node, the method further includes: the network equipment determines a second storage node in the distributed storage system, wherein the second storage node is a backup storage node of the first storage node; and backing up the data object to the second storage node.
It should be noted that, in a possible implementation manner, when the network device determines the first storage node in step 303, it may determine the second storage node; in another possible implementation manner, the network device stores an association relationship between the main storage node and the backup storage node, and in this step, the network device obtains the backup storage node corresponding to the first storage node, that is, the second storage node, from the association relationship between the main storage node and the backup storage node according to the node identifier of the first storage node.
In a possible implementation manner, the network device may synchronize a data object to the second storage node every time the data object is stored in the first storage node, so that timeliness of data storage is improved, and stability of data is further improved. The network device may also store a plurality of data objects in the first storage node, and synchronize the plurality of data objects to the second storage node, thereby saving resources.
In this step, the case where the network device determines the second storage node and backs up the data object to the second storage node is described as an example. In another possible implementation manner, the first storage node may also determine the second storage node and back up the data object to the second storage node.
It should be noted that the storage resource of the storage node may be an SSD (Solid State Disk). For example, with continued reference to fig. 4, the storage node associated with VG1 and VG2 is storage node1, and the storage resource of storage node1 is SSD1; the storage node associated with VG3 and VG4 is storage node2, and the storage resource of storage node2 is SSD2; the storage node associated with VG5 and VG6 is storage node3, and the storage resource of storage node3 is SSD3; the storage node associated with VG7 and VG8 is storage node4, and the storage resource of storage node4 is SSD4. Storage node1 and storage node2 are backup storage nodes of each other, and storage node3 and storage node4 are backup storage nodes of each other.
It should be noted that the manner of backing up the data object in the second storage node by the network device is the same as the manner of storing the data object in the first storage node; that is, the second storage node stores the data object in the disk data block in the same manner of allocating the disk data block. For example, after the first storage node opens a new disk data block blockA for the first time, the second storage node also opens a new disk data block blockB, and when a data object is subsequently written into the blockA in the first storage node, the data object is backed up into the blockB.
For example, referring to FIG. 6, the object identifications for data object 1, data object 2, and data object 3 are KEY1, KEY2, and KEY3, respectively; the virtual nodes corresponding to data object 1, data object 2 and data object 3 are VG0, VG3 and VG2, respectively, and the disk data blocks in the virtual nodes VG0, VG3 and VG2 for storing data object 1, data object 2 and data object 3 are Block1, Block4 and Block3, respectively.
Another point to be explained is that when a main storage node or a backup storage node goes offline, the network device or the management device reselects a main storage node or a backup storage node to replace the offline node. If the management device performs the reselection, it synchronizes the result to the network device, so that the network device carries out data migration based on the redetermined main storage node or backup storage node. In the embodiment of the present application, the case where the network device reselects the main storage node or the backup storage node to replace the offline node is described as an example.
When a certain main storage node or backup storage node is offline, the processing procedure of the network device is as follows:
In response to the second storage node being offline (i.e., the backup storage node being offline), the network device determines a third storage node from the distributed storage system and backs up the data object in the first storage node to the third storage node. In response to the first storage node being offline (i.e., the main storage node being offline), the network device takes the second storage node as the main storage node of the target virtual node, determines a backup storage node for the second storage node from the distributed storage system, and backs up the data object in the second storage node to that backup storage node.
In the embodiment of the application, when the main storage node is offline, the network device may use the backup storage node as the main storage node and reselect a backup storage node; when the backup storage node is offline, the network device only needs to reselect one backup storage node. Data migration is thereby reduced as much as possible, and the efficiency of data migration is improved.
It should be noted that when a main storage node or a backup storage node goes offline, the association relationship among the main storage node, the backup storage node, and the virtual node needs to be updated. For example, before the node goes offline, the main storage node and the backup storage node associated with VG1 are [node1, node2]; when the main storage node node1 goes offline, the association relationship of VG1 is changed to VG1 [node2, node3], that is, the original backup storage node node2 is promoted to the main storage node, and the newly allocated backup storage node is node3. For another example, before the node goes offline, the main storage node and the backup storage node associated with VG1 are [node1, node2]; when the backup storage node node2 goes offline, the association relationship of VG1 is changed to VG1 [node1, node3], that is, the original main storage node node1 remains the main storage node, and a new backup storage node node3 is allocated to it.
For example, referring to fig. 7, the storage nodes corresponding to VG1 are node1 and node2, where node1 and node2 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to VG2 are node2 and node3, where node2 and node3 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to VG3 are node1 and node3, where node1 and node3 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to VG4 are node2 and node3, where node2 and node3 are the main storage node and the backup storage node, respectively. In response to node1 going offline, the storage nodes corresponding to the updated VG1 are node2 and node3, where node2 and node3 are the main storage node and the backup storage node, respectively; the storage nodes corresponding to the updated VG3 are node3 and node2, where node3 and node2 are the main storage node and the backup storage node, respectively.
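The promotion and reselection rule for one virtual node can be sketched as a small function. The function name is hypothetical, and picking the first eligible online node as the new backup is an assumption; the embodiment leaves the selection strategy open.

```python
from typing import List, Tuple


def fail_over(primary: str, backup: str, offline: str,
              online: List[str]) -> Tuple[str, str]:
    """Return the (main, backup) pair of a virtual node after `offline` leaves."""
    if offline not in (primary, backup):
        return primary, backup  # this virtual node is not affected
    # If the main node went offline, the old backup is promoted; otherwise the
    # main node stays in place and only the backup is replaced.
    survivor = backup if offline == primary else primary
    # Assumption: the first online node other than the survivor becomes the backup.
    new_backup = next(n for n in online if n not in (survivor, offline))
    return survivor, new_backup
```

Applied to VG1 [node1, node2], the function yields [node2, node3] when node1 goes offline and [node1, node3] when node2 goes offline, provided node3 is the available online node. In both cases only one copy of the data has to be transferred, which is the migration saving described above.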
305. The network device updates the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node.
In response to no version identifier of the target virtual node and no storage timestamp of the data object existing in the storage index, the network device adds the version identifier of the target virtual node and the storage timestamp of the data object to the storage index of the first storage node; in response to a version identifier of the target virtual node and a storage timestamp of the data object already existing in the storage index, the network device replaces the stored version identifier and the stored storage timestamp in the storage index with the current version identifier of the target virtual node and the current storage timestamp of the data object, respectively.
In this embodiment of the application, when a data object already exists in the first storage node or the second storage node, the new index directly overwrites the old index after the newly written data object is stored. The storage index of the first storage node therefore includes the latest version number of the currently associated virtual node and the latest write timestamp, so that the latest version of the data object can be read based on the storage index.
It should be noted that the network device also updates the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the second storage node, so that after the second storage node is promoted to be the main storage node, data is read based on the storage index, and the reliability of the data is further improved.
It should be noted that, in this step, the case where the network device updates the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node is described as an example. In another possible implementation manner, the first storage node itself may update the version identifier of the target virtual node and the storage timestamp of the data object into its storage index.
In the embodiment of the application, the storage index is stored in the first storage node rather than in the network device or the management device, so that centralized management is eliminated and the network device or the management device is prevented from becoming an object query bottleneck that cannot be scaled horizontally. The solution of the present application thus facilitates horizontal scaling of the distributed storage system, that is, adding storage nodes to the distributed storage system.
It should be noted that the storage index includes, in addition to the version identifier of the target virtual node and the storage timestamp of the data object, a storage address of a disk data block where the data object is located, and the data object is subsequently read from the disk data block based on the storage address.
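A storage index entry and its overwrite-on-write behavior can be sketched as follows. The field names, the string form of the storage address and the use of a dictionary keyed by object identifier are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IndexEntry:
    vg_version: int      # version identifier of the target virtual node
    stored_at: float     # storage timestamp of the data object
    block_address: str   # storage address of the disk data block (format assumed)


def update_index(index: Dict[str, IndexEntry], key: str, entry: IndexEntry) -> None:
    """Insert the entry, or overwrite the older entry for the same object identifier."""
    index[key] = entry
```

After two writes of the same object identifier, the index holds a single entry carrying the newer version identifier, timestamp and address, so a later read follows the address of the latest version.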
It should be noted that each disk data block stores a block task record, where the block task record includes an identifier of the disk data block, a node identifier of the storage node where the disk data block is located, a node identifier of the virtual node associated with the storage node, and a node identifier of the backup storage node of the storage node, so that the storage node can subsequently inspect the disk data block after obtaining the latest node list.
The storage node periodically inspects each disk data block and checks the virtual node to which the disk data block belongs. This process includes:
(1) For the data object corresponding to any disk data block, in response to the storage node being the main storage node of the virtual node to which the data object is mapped, the storage node further checks the block task record of the disk data block. If the backup storage node in the block task record is empty, or is not the backup storage node currently specified for the virtual node, a data replica is missing. The storage node then performs asynchronous replication, that is, backs up the data of the disk data block to the backup storage node, thereby ensuring that the current virtual node has a data replica. After the asynchronous replication is completed, the storage node updates the node identifier of the backup storage node in the block task record.
(2) In response to the storage node being a backup storage node of the virtual node to which the data object is mapped, no processing is performed.
(3) In response to the storage node not being a storage node associated with the virtual node, the storage node migrates and copies the disk data block to the main storage node currently associated with the virtual node. During data migration, when a data object of the virtual node conflicts (that is, copies of the data object are distributed across a plurality of storage nodes), only the latest object data is retained: the version identifiers of the virtual node in the index records of the object data are compared first, and the storage timestamps are compared if the version identifiers are the same. After the storage node completes migration of all data corresponding to the virtual node, it deletes the local disk data block of the virtual node and erases its own identifier from the association track record of the virtual node.
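The three inspection cases can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the names `Association`, `BlockTaskRecord`, and `inspect_block` are hypothetical, the function only returns which action applies rather than performing replication or migration, and it assumes a backup storage node has been designated for the virtual node.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Association:
    primary: str            # main storage node of the virtual node
    backup: Optional[str]   # backup storage node currently specified


@dataclass
class BlockTaskRecord:
    block_id: str
    virtual_node_id: int
    backup_node_id: Optional[str] = None


def inspect_block(self_id: str, record: BlockTaskRecord,
                  associations: Dict[int, Association]) -> str:
    """Return the action the inspecting storage node should take for one block."""
    assoc = associations[record.virtual_node_id]
    if self_id == assoc.primary:
        # Case (1): replica missing or recorded on a node that is no longer the backup.
        if record.backup_node_id is None or record.backup_node_id != assoc.backup:
            return "replicate"
        return "ok"
    if self_id == assoc.backup:
        # Case (2): backup nodes do nothing.
        return "skip"
    # Case (3): this node is no longer associated with the virtual node.
    return "migrate"


associations = {7: Association(primary="node-A", backup="node-B")}
stale = BlockTaskRecord("blk-1", 7, backup_node_id="node-C")
print(inspect_block("node-A", stale, associations))  # prints "replicate"
```

Returning an action label keeps the decision logic separate from the asynchronous copy itself, which the source leaves unspecified.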
306: the network device sends a notification message to the terminal, wherein the notification message is used for notifying the storage state of the data object.
For example, after the network device has stored the data object in the target disk data block of the first storage node, the notification message is "stored successfully".
In the embodiment of the application, on one hand, since the number of the virtual nodes in the distributed storage system is fixed, the same data object is mapped to the same virtual node, and the virtual node is associated with a storage node, it can be ensured that the data of the same data object is stored in the same storage node through two layers of address mapping. On the other hand, since the storage node stores therein the version identification of the virtual node and the time stamp of the storage data, the version identification of the virtual node and the time stamp of the storage data can be used to verify the version of the data. Therefore, according to the scheme, the data of the same data object can be stored in the same storage node, and the version of the data can be verified through the version identification and the storage timestamp of the virtual node, so that the terminal can be ensured to download the data of the latest version of the data object, and the performance of the distributed storage system is improved.
Fig. 8 is a flowchart of a data management method of a distributed storage system according to an embodiment of the present application. In this embodiment, the description takes a terminal downloading a data object as an example. Referring to fig. 8, the embodiment includes:
801. The terminal sends a first download request to the network device, where the first download request carries the object identifier of the data object to be downloaded.
It should be noted that the terminal may send the first download request to the network device through a RESTful protocol or the object upload SDK interface.
802. The network device receives the first downloading request, and determines a target virtual node mapped by the data object based on the object identification of the data object and the number of virtual nodes included in the distributed storage system.
In this step, the process by which the network device determines the target virtual node corresponding to the data object, based on the object identifier of the data object and the number of virtual nodes included in the distributed storage system, is the same as in step 302 and is not described herein again.
803. The network equipment determines a first storage node associated with the target virtual node based on the association relationship between the storage nodes and the virtual nodes.
This step is the same as step 303 and is not described herein again.
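Steps 802 and 803 form a two-layer address mapping: object identifier to virtual node, then virtual node to storage node. The sketch below illustrates one way this could look. The hash function (SHA-256 reduced modulo the number of virtual nodes) and the dictionary used for the association relationship are assumptions chosen for illustration; this section of the source only states that the mapping depends on the object identifier and the number of virtual nodes.

```python
import hashlib
from typing import Dict


def target_virtual_node(object_id: str, vnode_count: int) -> int:
    """First layer: map an object identifier to a virtual node index.

    Assumption: a stable hash modulo the fixed number of virtual nodes.
    """
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % vnode_count


def first_storage_node(vnode: int, associations: Dict[int, str]) -> str:
    """Second layer: look up the storage node currently associated with the virtual node."""
    return associations[vnode]


VNODE_COUNT = 1024  # fixed for the lifetime of the system
associations = {v: f"node-{v % 4}" for v in range(VNODE_COUNT)}  # hypothetical table

vnode = target_virtual_node("camera-17/frame-0001.jpg", VNODE_COUNT)
node = first_storage_node(vnode, associations)
```

Because the number of virtual nodes is fixed, the first layer never changes for a given object; only the second-layer table is updated when storage nodes join or go offline.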
804. The network device sends a second download request to the first storage node, where the second download request carries the version identifier of the target virtual node and the object identifier of the data object.
It should be noted that the latest version identifier of the target virtual node is stored in the network device, and the version identifier of the target virtual node is updated once each time the storage node associated with the target virtual node changes.
805. The first storage node receives the second download request, and acquires the data of the latest version of the data object from the distributed storage system based on the version identifier of the target virtual node, the object identifier of the data object, and the locally stored storage index.
The storage index stores the version identifier of the target virtual node and the storage timestamp of the data object. Correspondingly, the step of the first storage node acquiring the data of the latest version of the data object from the distributed storage system, based on the version identifier of the target virtual node, the object identifier of the data object, and the locally stored storage index, includes:
In response to the version identifier of the target virtual node in the locally stored storage index being the same as the version identifier of the target virtual node carried by the second download request, the first storage node acquires the storage address of the data object from the storage index and reads the data of the latest version of the data object locally based on the storage address. In response to the version identifier of the target virtual node in the locally stored storage index being different from the version identifier carried by the second download request, the first storage node determines, from the distributed storage system, the storage node that stores the latest version of the data object, and acquires the data of the latest version of the data object from that storage node.
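The branch described above can be sketched as follows. The function and parameter names are hypothetical, the storage index is modeled as a dictionary, and treating an object that is absent from the local index the same way as a version mismatch is an assumption, since the source does not cover that case.

```python
from typing import Callable, Dict, Optional


def handle_second_download(index: Dict[str, dict], object_id: str,
                           requested_version: int,
                           read_local: Callable[[str], bytes],
                           fetch_from_history: Callable[[str], bytes]) -> bytes:
    """Serve the object locally when the index version matches, else look elsewhere."""
    entry: Optional[dict] = index.get(object_id)
    if entry is not None and entry["vnode_version"] == requested_version:
        # Versions match: the local copy is the latest, read it by storage address.
        return read_local(entry["address"])
    # Versions differ (or, by assumption, no local entry): resolve via other nodes.
    return fetch_from_history(object_id)


index = {"obj-1": {"vnode_version": 3, "stored_at": 1700000000.0,
                   "address": "blk-7:0"}}
local_blocks = {"blk-7:0": b"fresh"}
```

Calling `handle_second_download(index, "obj-1", 3, local_blocks.__getitem__, ...)` reads the local block, while a request carrying version 4 is delegated to the fallback callable.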
The step of the first storage node determining, from the distributed storage system, the storage node that stores the latest version of the data object includes:
The first storage node acquires the locally stored association track record of the target virtual node, where the association track record includes the node identifiers of a plurality of storage nodes historically associated with the target virtual node and the migration paths among those storage nodes; determines, from the association track record, the plurality of storage nodes historically associated with the target virtual node; and determines, from the plurality of storage nodes, the storage node whose associated version identifier of the target virtual node is the latest, based on the version identifier of the target virtual node associated with each storage node.
The step of the first storage node determining, from the plurality of storage nodes, the storage node whose associated version identifier of the target virtual node is the latest includes:
In response to at least two of the plurality of storage nodes having the latest version identifier of the target virtual node, the first storage node selects the storage node with the latest timestamp, based on the timestamp of the data object stored by each storage node.
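The selection rule (latest version identifier first, latest storage timestamp as the tie-breaker) amounts to a lexicographic maximum. A minimal sketch, assuming that version identifiers are comparable integers and that each historical storage node reports its version and timestamp for the object; the `Candidate` type is hypothetical:

```python
from typing import Iterable, NamedTuple


class Candidate(NamedTuple):
    node_id: str
    vnode_version: int   # version identifier of the target virtual node on that node
    stored_at: float     # storage timestamp of the data object on that node


def pick_latest(candidates: Iterable[Candidate]) -> Candidate:
    """Prefer the highest version; break ties with the latest storage timestamp."""
    return max(candidates, key=lambda c: (c.vnode_version, c.stored_at))


history = [
    Candidate("node-A", 2, 1700000300.0),
    Candidate("node-B", 3, 1700000100.0),
    Candidate("node-C", 3, 1700000200.0),
]
print(pick_latest(history).node_id)  # prints "node-C"
```

Note that node-A has the newest timestamp overall but loses because its version identifier is older, which matches the order of comparison described in the text.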
In the embodiment of the application, object data is preferentially downloaded from the storage node currently associated with the virtual node. If the version number of the virtual node recorded in the object index on that storage node is lower than the current latest version number, the storage node attempts to acquire the object data from other storage nodes according to the association track record, until the latest object data is identified by comparison.
806. The first storage node sends the data object to the network device.
It should be noted that each storage node periodically scans its own object data and pushes object data that it no longer manages to the storage node currently associated with the corresponding virtual node. Before pushing, the old and new object data are compared, and if the local data is older, it is deleted directly. After the pushing and cleaning of the data of a given virtual node on the local node are completed, the local node's information is deleted from the historical track, which resolves the download performance degradation caused by changes in the association between virtual nodes and storage nodes. The process includes:
The first storage node periodically scans the stored data objects. In response to the virtual node to which any data object is mapped not being a virtual node associated with the first storage node, and the locally stored data being the latest version of the data object, the first storage node pushes the latest version of the data object to the storage node currently associated with the virtual node. In response to the virtual node to which any data object is mapped not being a virtual node associated with the first storage node, and the locally stored data not being the latest version of the data object, the first storage node deletes the locally stored data of the data object.
It should be noted that, when a virtual node mapped by any data object is not a virtual node associated with itself, the first storage node further deletes the virtual node mapped by the data object in the associated track record, and updates the deleted associated track record to each storage node of the distributed storage system, so that the latest associated track record can be stored in each storage node, and accurate data reading can be performed based on the associated track record.
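The periodic scan can be sketched as a planning pass over the locally stored objects. All names here are hypothetical, and the function only produces a list of actions; actually transferring data, deleting local copies, and updating the association track record are left to the caller, since the source does not describe those mechanics in detail.

```python
from typing import Callable, Dict, List, Tuple


def plan_scan(self_id: str, objects: Dict[str, int],
              owner_of: Callable[[int], str],
              local_is_latest: Callable[[str], bool]) -> List[Tuple[str, str, str]]:
    """Return (action, object_id, current_owner) for objects this node no longer manages."""
    plan = []
    for object_id, vnode in objects.items():
        owner = owner_of(vnode)
        if owner == self_id:
            continue  # still managed locally, nothing to do
        if local_is_latest(object_id):
            plan.append(("push", object_id, owner))    # latest copy: push to current owner
        else:
            plan.append(("delete", object_id, owner))  # stale copy: delete locally
    return plan


objects = {"obj-1": 1, "obj-2": 2, "obj-3": 2}   # object id -> mapped virtual node
owners = {1: "node-A", 2: "node-B"}              # virtual node -> current storage node
latest = {"obj-1": True, "obj-2": True, "obj-3": False}
plan = plan_scan("node-A", objects, owners.__getitem__, latest.__getitem__)
```

In this example node-A keeps `obj-1`, pushes `obj-2` to node-B, and deletes its stale copy of `obj-3`.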
807. The network equipment receives the data object and sends the data object to the terminal.
808. And the terminal receives the data object sent by the network equipment.
In the embodiment of the application, a version management mechanism is introduced for objects that are repeatedly uploaded and written: the version number of the virtual node and the write timestamp are used for verification, so that the latest data can be correctly identified when object data is downloaded. In the background, the object data migration mechanism de-duplicates multiple versions of an object, which improves the performance of the distributed storage system.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 9 is a block diagram of a data management apparatus of a distributed storage system according to an embodiment of the present application.
Referring to fig. 9, the apparatus includes:
a first receiving module 901, configured to receive an object uploading request of a terminal, where the object uploading request carries a data object to be uploaded and is used to request to store the data object in a distributed storage system, where the distributed storage system includes a plurality of virtual nodes and a plurality of storage nodes, and one storage node is associated with at least one virtual node;
a first determining module 902, configured to determine a target virtual node to which a data object is mapped based on an object identifier of the data object and the number of virtual nodes;
a second determining module 903, configured to determine, based on an association relationship between a storage node and a virtual node, a first storage node associated with a target virtual node;
a storage module 904 for storing the data object to the first storage node;
the first updating module 905 is configured to update the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node, where the version identifier and the storage timestamp of the target virtual node are used to check the version of the data object when the terminal downloads the data object.
In a possible implementation manner, the storage module 904 is configured to store the data object into a target disk data block in response to that the first storage node includes the target disk data block, where the target disk data block is a disk data block used for storing the data object mapped to the target virtual node;
and responding to the first storage node not including the target disk data block, allocating a target disk data block for the target virtual node, and storing the data object into the allocated target disk data block.
In another possible implementation manner, the first updating module 905 is configured to, in response to the version identifier of the target virtual node and the storage timestamp of the data object not existing in the storage index, store the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node; and, in response to a version identifier of the target virtual node and a storage timestamp of the data object already existing in the storage index, replace the stored version identifier and the stored storage timestamp in the storage index with the version identifier of the target virtual node and the storage timestamp of the data object, respectively.
In another possible implementation manner, the apparatus further includes:
the third determining module is used for determining a second storage node in the distributed storage system, wherein the second storage node is a backup storage node of the first storage node;
and the backup module is used for backing up the data object to the second storage node.
In another possible implementation manner, the backup module is further configured to determine a third storage node from the distributed storage system in response to the second storage node being offline, and backup the data object in the first storage node in the third storage node;
and responding to the offline of the first storage node, taking the second storage node as a main storage node of the target virtual node, determining a backup storage node for the second storage node from the distributed storage system, and backing up the data object in the second storage node to the backup storage node.
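The two failover cases handled by the backup module can be sketched as a role reassignment. The function name and the `pick_replacement` policy are hypothetical; the source only says that a new backup storage node is determined from the distributed storage system, not how it is chosen.

```python
from typing import Callable, Set, Tuple


def reassign_roles(primary: str, backup: str, offline: str,
                   pick_replacement: Callable[[Set[str]], str]) -> Tuple[str, str]:
    """Return the (primary, backup) pair after one storage node goes offline."""
    if offline == backup:
        # Backup lost: keep the primary and replicate to a newly chosen node.
        return primary, pick_replacement({primary, backup})
    if offline == primary:
        # Primary lost: promote the backup, then choose a new backup for it.
        return backup, pick_replacement({primary, backup})
    return primary, backup  # an unrelated node went offline


nodes = ["node-A", "node-B", "node-C"]


def pick(excluded: Set[str]) -> str:
    """Hypothetical policy: first node not already in use."""
    return next(n for n in nodes if n not in excluded)
```

In both cases the caller would then copy the data object from the surviving primary to the newly selected backup.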
In another possible implementation manner, the apparatus further includes:
the fourth determining module is used for responding to the offline of any storage node of the distributed storage system and determining the virtual node related to the offline storage node;
the second updating module is used for determining the storage node for the virtual node again and updating the version identifier of the virtual node; and updating the association relation between the storage node and the virtual node based on the updated version identification of the virtual node and the redetermined node identification of the storage node.
In another possible implementation manner, the apparatus further includes:
the adding module is used for adding the node identification of the redetermined storage node to an associated track record of the virtual node, wherein the associated track record comprises the node identifications of a plurality of storage nodes which are associated with the virtual node history and migration paths among the plurality of storage nodes;
and the third updating module is used for updating the associated track record to each storage node of the distributed storage system.
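The offline-handling modules above (re-determining the storage node, updating the version identifier, and extending the association track record) can be sketched together. This is an illustration under assumptions: the version identifier is modeled as an integer that increments on each re-association, the replacement node is chosen by a simple modulo policy, and the track record is an ordered list of node identifiers whose order stands in for the migration path.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualNode:
    vnode_id: int
    storage_node: str                                # currently associated storage node
    version: int = 1                                 # version identifier (assumed integer)
    track: List[str] = field(default_factory=list)   # association track record


def handle_offline(offline_id: str, vnodes: List[VirtualNode],
                   online_nodes: List[str]) -> List[VirtualNode]:
    """Re-associate every virtual node of the offline storage node and bump its version."""
    changed = []
    for v in vnodes:
        if v.storage_node != offline_id:
            continue
        # Hypothetical placement policy; the source does not specify one.
        v.storage_node = online_nodes[v.vnode_id % len(online_nodes)]
        v.version += 1                   # version changes whenever the association changes
        v.track.append(v.storage_node)   # record the new node in the track record
        changed.append(v)
    return changed


vnodes = [VirtualNode(0, "node-A", track=["node-A"]),
          VirtualNode(1, "node-B", track=["node-B"])]
changed = handle_offline("node-A", vnodes, ["node-B", "node-C"])
```

The returned list identifies the virtual nodes whose updated track records would then be distributed to each storage node.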
In another possible implementation manner, the apparatus further includes:
the second receiving module is used for receiving a first downloading request of the terminal, wherein the first downloading request carries an object identifier of a data object to be downloaded;
a fifth determining module, configured to determine a target virtual node mapped by the data object based on the object identifier of the data object and the number of virtual nodes, and determine a first storage node associated with the target virtual node based on an association relationship between the storage node and the virtual node;
the first acquisition module is used for acquiring the data object from the first storage node;
and the first sending module is used for sending the data object to the terminal.
In another possible implementation manner, the first acquisition module is configured to send a second download request to the first storage node, where the second download request carries the version identifier of the target virtual node and the object identifier of the data object, the version identifier is used by the first storage node to verify the version of the stored data object, and the object identifier of the data object is used by the first storage node to read the data object; and to receive the data object sent by the first storage node.
Fig. 10 is a block diagram of a data management apparatus of a distributed storage system according to an embodiment of the present application.
Referring to fig. 10, the apparatus includes:
a third receiving module 1001, configured to receive a second download request sent by a network device, where the second download request carries a version identifier of a target virtual node and an object identifier of a data object to be downloaded;
a second obtaining module 1002, configured to obtain, from the distributed storage system, data of a latest version of the data object based on the version identifier of the target virtual node and the object identifier of the data object, and a locally stored storage index, where the version identifier of the target virtual node and a storage timestamp of the data object are stored in the storage index;
a second sending module 1003, configured to send the latest version of the data object to the network device.
In a possible implementation manner, the second obtaining module 1002 is configured to, in response to that the version identifier of the target virtual node in the locally stored storage index is the same as the version identifier of the target virtual node carried in the second download request, obtain, based on the object identifier of the data object, a storage address of the data object from the storage index, and obtain, based on the storage address, data of the latest version of the data object locally;
and in response to that the version identification of the target virtual node in the locally stored storage index is different from the version identification of the target virtual node carried by the second download request, determining a storage node storing the latest version data of the data object from the distributed storage system, and acquiring the latest version data of the data object from the storage node.
In another possible implementation manner, the second obtaining module 1002 is configured to obtain an association track record of a target virtual node stored locally, where the association track record includes node identifiers of a plurality of storage nodes associated with the target virtual node in history and migration paths between the plurality of storage nodes;
determining a plurality of storage nodes which are historically associated with the target virtual node from the associated track record;
and determining the latest storage node of the version identification of the associated target virtual node from the plurality of storage nodes based on the version identification of the target virtual node associated with each storage node.
In another possible implementation manner, the second obtaining module 1002 is configured to, in response to at least two storage nodes having the latest version identifier of the associated target virtual node, select the storage node with the latest timestamp based on the timestamp of the data object stored by each storage node.
In another possible implementation manner, the apparatus further includes:
a scanning module for periodically scanning the stored data objects;
the pushing module is used for pushing the data of the latest version of any data object to the storage node currently associated with the virtual node to which the data object is mapped, in response to that virtual node not being a virtual node associated with the local storage node and the locally stored data being the latest version of the data object;
and the deleting module is used for deleting the locally stored data of any data object, in response to the virtual node to which the data object is mapped not being a virtual node associated with the local storage node and the locally stored data not being the latest version of the data object.
In another possible implementation manner, the apparatus further includes:
the deleting module is further used for deleting the virtual node mapped by any data object in the associated track record;
and the fourth updating module is used for updating the deleted associated track record to each storage node of the distributed storage system.
It should be noted that the data management apparatus of the distributed storage system according to the foregoing embodiment is illustrated only with the above division of functional modules. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the data management apparatus of the distributed storage system and the data management method embodiments of the distributed storage system provided above belong to the same concept; the specific implementation processes are detailed in the method embodiments and are not described herein again.
Fig. 11 is a block diagram of a network device according to an exemplary embodiment. The network device 1100 may vary considerably depending on configuration or performance, and may include one or more processors (CPUs) 1101 and one or more memories 1102, where the memory 1102 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1101 to implement the data management method of the distributed storage system provided by the above method embodiments. Of course, the network device 1100 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction is stored, and the at least one instruction is executable by a processor in the network device 1100 to complete the data management method of the distributed storage system in the above embodiment. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present disclosure also provides a computer program product, wherein when instructions in the computer program product are executed by a processor of a network device, the network device 1100 is enabled to execute the data management method of the distributed storage system provided by the above method embodiments.
Fig. 12 is a block diagram of a storage node 1200 according to an exemplary embodiment. The storage node 1200 may vary considerably depending on configuration or performance, and may include one or more processors (CPUs) 1201 and one or more memories 1202, where the memory 1202 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1201 to implement the data management method of the distributed storage system provided by the above method embodiments. Of course, the storage node 1200 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction is stored, and the at least one instruction is executable by a processor in the storage node 1200 to complete the data management method of the distributed storage system in the above embodiment. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present disclosure also provides a computer program product, wherein instructions of the computer program product, when executed by a processor of the storage node 1200, enable the storage node 1200 to execute the data management method of the distributed storage system provided by the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (20)

1. A method of data management for a distributed storage system, the method comprising:
receiving an object uploading request of a terminal, wherein the object uploading request carries a data object to be uploaded and is used for requesting to store the data object into a distributed storage system, the distributed storage system comprises a plurality of virtual nodes and a plurality of storage nodes, and one storage node is associated with at least one virtual node;
determining a target virtual node mapped by the data object based on the object identifier of the data object and the number of the virtual nodes;
determining a first storage node associated with the target virtual node based on the association relationship between the storage nodes and the virtual nodes;
storing the data object in the first storage node, and updating the version identifier of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node, wherein the version identifier of the target virtual node and the storage timestamp are used for checking the version of the data object when the terminal downloads the data object, and the version identifier of the target virtual node is updated when the storage node associated with the target virtual node changes.
2. The method of claim 1, wherein storing the data object to the first storage node comprises:
responding to the first storage node comprising a target disk data block, wherein the target disk data block is a disk data block used for storing a data object mapped to the target virtual node, and storing the data object into the target disk data block;
and responding to the first storage node not including the target disk data block, allocating a target disk data block to the target virtual node, and storing the data object into the allocated target disk data block.
3. The method of claim 1, wherein updating the version identification of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node comprises:
in response to the version identification of the target virtual node and the storage timestamp of the data object not existing in the storage index, storing the version identification of the target virtual node and the storage timestamp of the data object into the storage index of the first storage node;
and in response to the version identification of the target virtual node and the storage time stamp of the data object existing in the storage index, replacing the version identification of the target virtual node and the storage time stamp of the data object with the stored version identification and the stored time stamp in the storage index respectively.
4. The method of claim 1, further comprising:
determining a second storage node in the distributed storage system, wherein the second storage node is a backup storage node of the first storage node;
and backing up the data object to the second storage node.
5. The method of claim 4, further comprising:
determining a third storage node from the distributed storage system in response to the second storage node being offline, and backing up the data object in the first storage node in the third storage node;
and responding to the first storage node being offline, taking the second storage node as a main storage node of the target virtual node, determining a backup storage node for the second storage node from the distributed storage system, and backing up the data object in the second storage node to the backup storage node.
6. The method of claim 1, further comprising:
in response to any storage node of the distributed storage system being offline, determining a virtual node associated with the offline storage node;
re-determining a storage node for the virtual node, and updating the version identification of the virtual node;
and updating the association relation between the storage node and the virtual node based on the updated version identification of the virtual node and the redetermined node identification of the storage node.
7. The method of claim 6, further comprising:
adding the node identification of the redetermined storage node to an association track record of the virtual node, wherein the association track record comprises node identifications of a plurality of storage nodes which are historically associated with the virtual node and migration paths among the plurality of storage nodes;
and updating the associated track record to each storage node of the distributed storage system.
8. The method of claim 1, further comprising:
receiving a first downloading request of the terminal, wherein the first downloading request carries an object identifier of a data object to be downloaded;
determining a target virtual node mapped by the data object based on the object identifier of the data object and the number of the virtual nodes;
determining a first storage node associated with the target virtual node based on the association relationship between the storage nodes and the virtual nodes;
and acquiring the data object from the first storage node, and sending the data object to the terminal.
9. The method of claim 8, wherein said retrieving the data object from the first storage node comprises:
sending a second download request to the first storage node, where the second download request carries a version identifier of the target virtual node and an object identifier of the data object, the version identifier is used for the first storage node to verify the version of the stored data object, and the object identifier of the data object is used for the first storage node to read the data object;
and receiving the data object sent by the first storage node.
10. A method of data management for a distributed storage system, the method comprising:
receiving a second download request sent by a network device, wherein the second download request carries a version identifier of a target virtual node and an object identifier of a data object to be downloaded, and the version identifier of the target virtual node is updated when the storage node associated with the target virtual node changes;
acquiring data of the latest version of the data object from a distributed storage system based on the version identifier of the target virtual node, the object identifier of the data object and a locally stored storage index, wherein the version identifier of the target virtual node and a storage timestamp of the data object are stored in the storage index;
and sending the data of the latest version of the data object to the network device.
11. The method of claim 10, wherein acquiring the data of the latest version of the data object from the distributed storage system based on the version identifier of the target virtual node, the object identifier of the data object and the locally stored storage index comprises:
in response to the version identifier of the target virtual node in the locally stored storage index being the same as the version identifier of the target virtual node carried by the second download request, acquiring a storage address of the data object from the storage index based on the object identifier of the data object, and acquiring the data of the latest version of the data object locally based on the storage address;
and in response to the version identifier of the target virtual node in the locally stored storage index being different from the version identifier of the target virtual node carried by the second download request, determining, from the distributed storage system, a storage node storing the data of the latest version of the data object, and acquiring the data of the latest version of the data object from that storage node.
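The two branches of claim 11 can be sketched as a single read function. This is an illustration under assumptions: the layout of `StorageIndex`, the integer version identifiers, the dictionary used as a local store, and the `fetch_remote` callback standing in for the lookup elsewhere in the distributed storage system are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class StorageIndex:
    """Locally stored index (assumed layout)."""
    vnode_versions: Dict[int, int] = field(default_factory=dict)  # virtual node -> version id
    addresses: Dict[str, str] = field(default_factory=dict)       # object id -> local address


def read_latest(index: StorageIndex,
                local_store: Dict[str, bytes],
                vnode: int,
                request_version: int,
                object_id: str,
                fetch_remote: Callable[[int, str], bytes]) -> bytes:
    """Serve locally when the version identifiers match, otherwise fetch from another node."""
    if index.vnode_versions.get(vnode) == request_version:
        # Same version: the local copy is the latest, so read it by its stored address.
        return local_store[index.addresses[object_id]]
    # Different version: another storage node holds the latest data.
    return fetch_remote(vnode, object_id)


# Hypothetical state of one storage node.
index = StorageIndex({3: 7}, {"obj-1": "/disk0/obj-1"})
local_store = {"/disk0/obj-1": b"local copy"}
fetch_remote = lambda vnode, object_id: b"remote copy"

same = read_latest(index, local_store, 3, 7, "obj-1", fetch_remote)
stale = read_latest(index, local_store, 3, 8, "obj-1", fetch_remote)
```

Here `same` is served from the local store because both sides report version 7, while `stale` is delegated to `fetch_remote` because the request carries version 8.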
12. The method of claim 11, wherein determining, from the distributed storage system, the storage node storing the data of the latest version of the data object comprises:
acquiring a locally stored association track record of the target virtual node, wherein the association track record comprises node identifiers of a plurality of storage nodes historically associated with the target virtual node and migration paths among the plurality of storage nodes;
determining, from the association track record, the plurality of storage nodes historically associated with the target virtual node;
and determining, from the plurality of storage nodes, the storage node whose associated version identifier of the target virtual node is the latest, based on the version identifier of the target virtual node associated with each storage node.
13. The method of claim 12, wherein determining, from the plurality of storage nodes, the storage node whose associated version identifier of the target virtual node is the latest comprises:
in response to at least two of the plurality of storage nodes being associated with the latest version identifier of the target virtual node, selecting, from the at least two storage nodes, the storage node with the most recent timestamp based on the timestamp of the data object stored by each storage node.
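The selection in claims 12 and 13 amounts to a two-level comparison, sketched below. The sketch assumes that version identifiers are integers where a larger value is newer, and that the tie-breaking timestamp is the most recent one; neither detail is fixed by the claims. The `Candidate` type and `pick_latest_node` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """One storage node historically associated with the target virtual node."""
    node_id: str
    vnode_version: int        # version identifier of the virtual node held by this node
    object_timestamp: float   # storage timestamp of the data object on this node


def pick_latest_node(candidates: List[Candidate]) -> str:
    """Pick the node with the newest version identifier; break ties by newest timestamp."""
    if not candidates:
        raise ValueError("no storage node is recorded for this virtual node")
    newest_version = max(c.vnode_version for c in candidates)
    tied = [c for c in candidates if c.vnode_version == newest_version]
    return max(tied, key=lambda c: c.object_timestamp).node_id


# Hypothetical candidates taken from an association track record.
candidates = [
    Candidate("node-a", 4, 100.0),
    Candidate("node-b", 5, 120.0),
    Candidate("node-c", 5, 150.0),
]
chosen = pick_latest_node(candidates)
```

With these inputs, `node-b` and `node-c` share the newest version identifier, so the timestamp decides and `chosen` is `"node-c"`.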
14. The method according to any one of claims 10-13, further comprising:
periodically scanning the stored data objects;
in response to the virtual node mapped by any data object not being a virtual node associated with the storage node itself and the locally stored data being the data of the latest version of the data object, pushing the data of the latest version of the data object to the storage node currently associated with the virtual node;
and in response to the virtual node mapped by any data object not being a virtual node associated with the storage node itself and the locally stored data not being the data of the latest version of the data object, deleting the locally stored data of the data object.
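One pass of the periodic scan in claim 14 can be sketched as follows. The function `scan_once` and its callback parameters (`vnode_of`, `is_latest`, `current_owner`, `push`) are hypothetical stand-ins for facilities the claim leaves unspecified; the sketch also assumes that a pushed object is kept locally, since the claim does not say whether it is removed after the push.

```python
from typing import Callable, Dict, List, Set


def scan_once(local_objects: Dict[str, bytes],
              own_vnodes: Set[int],
              vnode_of: Callable[[str], int],
              is_latest: Callable[[str], bool],
              current_owner: Callable[[int], str],
              push: Callable[[str, str, bytes], None]) -> List[str]:
    """Scan locally stored objects once; return the identifiers deleted locally."""
    deleted: List[str] = []
    for object_id in list(local_objects):
        vnode = vnode_of(object_id)
        if vnode in own_vnodes:
            continue  # the object belongs here, nothing to do
        if is_latest(object_id):
            # Misplaced but current: hand it to the node that now owns the virtual node.
            push(current_owner(vnode), object_id, local_objects[object_id])
        else:
            # Misplaced and outdated: drop the local copy.
            del local_objects[object_id]
            deleted.append(object_id)
    return deleted


# Hypothetical node that owns virtual node 0 only.
local_objects = {"obj-1": b"v2", "obj-2": b"old", "obj-3": b"mine"}
vnode_map = {"obj-1": 1, "obj-2": 1, "obj-3": 0}
latest = {"obj-1"}
pushed = []

deleted = scan_once(
    local_objects,
    own_vnodes={0},
    vnode_of=vnode_map.__getitem__,
    is_latest=lambda oid: oid in latest,
    current_owner=lambda vnode: "node-b",
    push=lambda node, oid, data: pushed.append((node, oid, data)),
)
```

In this run `obj-1` is pushed to `node-b`, `obj-2` is deleted, and `obj-3` is left untouched because its virtual node is still associated with the scanning node.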
15. The method of claim 14, further comprising:
deleting, from the association track record, the virtual node mapped by the data object;
and updating the association track record, after the deletion, on each storage node of the distributed storage system.
16. A data management apparatus of a distributed storage system, the apparatus comprising:
a first receiving module, configured to receive an object upload request from a terminal, wherein the object upload request carries a data object to be uploaded and is used for requesting to store the data object into a distributed storage system, the distributed storage system comprises a plurality of virtual nodes and a plurality of storage nodes, and one storage node is associated with at least one virtual node;
a first determining module, configured to determine a target virtual node corresponding to the data object based on the object identifier of the data object and the number of virtual nodes;
a second determining module, configured to determine a first storage node associated with the target virtual node;
a storage module for storing the data object to the first storage node;
and a first updating module, configured to update the version identifier of the target virtual node and a storage timestamp of the data object into a storage index of the first storage node, wherein the version identifier of the target virtual node and the storage timestamp are used for checking the version of the data object, and the version identifier of the target virtual node is updated when the storage node associated with the target virtual node changes.
17. A data management apparatus of a distributed storage system, the apparatus comprising:
a third receiving module, configured to receive a second download request sent by a network device, where the second download request carries a version identifier of a target virtual node and an object identifier of a data object to be downloaded, and the version identifier of the target virtual node is updated when a storage node associated with the target virtual node changes;
a second obtaining module, configured to obtain the data object from a distributed storage system based on the version identifier of the target virtual node and the object identifier of the data object;
and a second sending module, configured to send the data object to the network device.
18. A network device comprising a processor and a memory, wherein the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the operations executed by the data management method of the distributed storage system according to any one of claims 1 to 9.
19. A storage node, comprising a processor and a memory, wherein the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the operations executed by the data management method of the distributed storage system according to any one of claims 10 to 15.
20. A computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to perform operations performed by the data management method of the distributed storage system of any one of claims 1 to 15.
CN202011407835.0A 2020-12-04 2020-12-04 Data management method, device and equipment of distributed storage system Active CN112632029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011407835.0A CN112632029B (en) 2020-12-04 2020-12-04 Data management method, device and equipment of distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011407835.0A CN112632029B (en) 2020-12-04 2020-12-04 Data management method, device and equipment of distributed storage system

Publications (2)

Publication Number Publication Date
CN112632029A (en) 2021-04-09
CN112632029B (en) 2022-08-05

Family

ID=75307991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011407835.0A Active CN112632029B (en) 2020-12-04 2020-12-04 Data management method, device and equipment of distributed storage system

Country Status (1)

Country Link
CN (1) CN112632029B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114968539A (en) * 2021-02-22 2022-08-30 华为技术有限公司 Data processing method, computer system and intermediate device
CN113703826A (en) * 2021-07-29 2021-11-26 北京三快在线科技有限公司 Method, apparatus, device and storage medium for responding to data processing request
CN116069788B (en) * 2023-03-24 2023-06-20 杭州趣链科技有限公司 Data processing method, database system, computer device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103036986A (en) * 2011-12-15 2013-04-10 微软公司 Update notification provided on distributed application object
CN110990483A (en) * 2019-11-26 2020-04-10 上海莉莉丝科技股份有限公司 Data access and control method and system for cache nodes in distributed cache
CN111382200A (en) * 2018-12-29 2020-07-07 北京中交兴路信息科技有限公司 Information loading method and device, server and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839093B2 (en) * 2018-04-27 2020-11-17 Nutanix, Inc. Low latency access to physical storage locations by implementing multiple levels of metadata

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
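A two-layer overlay network model supporting distributed data stream processing; Wang Jindong et al.; Journal of Applied Sciences (《应用科学学报》); 20060731; Vol. 24 (No. 4); full text *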

Also Published As

Publication number Publication date
CN112632029A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN112632029B (en) Data management method, device and equipment of distributed storage system
CN109683826B (en) Capacity expansion method and device for distributed storage system
CN107861686B (en) File storage method, server and computer readable storage medium
US11966307B2 (en) Re-aligning data replication configuration of primary and secondary data serving entities of a cross-site storage solution after a failover event
JP5727020B2 (en) Cloud computing system and data synchronization method thereof
EP2923272B1 (en) Distributed caching cluster management
US20150201036A1 (en) Gateway device, file server system, and file distribution method
CN108683668B (en) Resource checking method, device, storage medium and equipment in content distribution network
CN104202375A (en) Method and system for synchronous data
CN109976941B (en) Data recovery method and device
CN111049928B (en) Data synchronization method, system, electronic device and computer readable storage medium
CN111355791B (en) File transmission method and device, computer equipment and storage medium
CN113268472B (en) Distributed data storage system and method
CN111464603B (en) Server capacity expansion method and system
CN106850724B (en) Data pushing method and device
CN113190619B (en) Data read-write method, system, equipment and medium for distributed KV database
CN108509296B (en) Method and system for processing equipment fault
CN112579550B (en) Metadata information synchronization method and system of distributed file system
CN112866406B (en) Data storage method, system, device, equipment and storage medium
US10853892B2 (en) Social networking relationships processing method, system, and storage medium
CN111459416B (en) Distributed storage-based thermal migration system and migration method thereof
CN115087966A (en) Data recovery method, device, equipment, medium and program product
CN112000850A (en) Method, device, system and equipment for data processing
CN111147226B (en) Data storage method, device and storage medium
CN109992447B (en) Data copying method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant