CN113055495B - Data processing method and device and distributed storage system - Google Patents

Info

Publication number
CN113055495B
Authority
CN
China
Legal status
Active
Application number
CN202110346412.0A
Other languages
Chinese (zh)
Other versions
CN113055495A (en)
Inventor
刘�东
胡君怡
李丹旺
Current Assignee
Hangzhou Hikvision System Technology Co Ltd
Original Assignee
Hangzhou Hikvision System Technology Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a data processing method, a data processing device and a distributed storage system. The method is applied to a management node in the distributed storage system and comprises the following steps: acquiring load information of the storage nodes when a preset generation condition is met; generating corresponding virtual nodes for each storage node based on the load information and a preset number of virtual groups, to obtain a plurality of virtual groups; selecting two virtual nodes from each virtual group as the main node and the standby node of that virtual group; and, based on the identifier of each storage object and the virtual group information, storing each storage object in the storage node corresponding to the main node and storing a backup of the storage object in the storage node corresponding to the standby node. Because the virtual nodes from which the main node and the standby node are selected are generated according to the load information of the storage nodes, the main node and the standby node are selected more reasonably, and the load of the storage nodes in the distributed storage system is balanced.

Description

Data processing method and device and distributed storage system
Technical Field
The present invention relates to the field of distributed data storage technologies, and in particular, to a data processing method and apparatus, and a distributed storage system.
Background
A distributed storage system is a storage service designed for massive data volumes. It is stable, reliable, secure and low-cost, and is widely used to store objects such as images, documents and videos. A distributed storage system generally includes a plurality of storage nodes that store the individual storage objects in a distributed manner.
When determining the storage node for each storage object, a primary storage node is generally determined for the storage object by a hash modulo algorithm, a consistent hash algorithm, or the like. To avoid data loss when one or more storage nodes fail, a backup storage node also needs to be selected for each primary storage node, to store a backup of the storage objects held by that primary storage node.
These methods can only select a single primary storage node for a storage object. They provide no rule for choosing the backup storage node, which is therefore generally selected at random. Moreover, storage objects differ in size and storage nodes differ in capacity, so primary and backup storage nodes selected in this largely random way can leave the storage nodes unevenly loaded and may even cause the distributed storage system to crash.
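The conventional placement described above can be sketched as follows. This is a minimal illustration, not code from the patent: the helper names, the use of SHA-256 as the hash, and the node names are all assumptions. Note that neither the node capacities nor the object sizes enter the decision, which is the source of the imbalance.

```python
import hashlib
import random


def stable_hash(key: str) -> int:
    """Deterministic hash of a string (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def place_by_hash_modulo(object_id: str, nodes: list[str], rng: random.Random) -> tuple[str, str]:
    """Pick the primary node by hash modulo and the backup node at random."""
    primary = nodes[stable_hash(object_id) % len(nodes)]
    # The backup is any other node; load and capacity are ignored.
    backup = rng.choice([node for node in nodes if node != primary])
    return primary, backup


nodes = ["node-a", "node-b", "node-c"]
primary, backup = place_by_hash_modulo("video-0001", nodes, random.Random(0))
```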
Disclosure of Invention
The embodiment of the invention aims to provide a data processing method, a data processing device and a distributed storage system, so that a main node and a standby node are reasonably selected, and load balance of each storage node in the distributed storage system is realized. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a data processing method, which is applied to a management node in a distributed storage system, where the distributed storage system further includes multiple storage nodes, and the method includes:
acquiring load information of the storage nodes under the condition that a preset generation condition is met;
generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups, wherein each virtual group comprises a plurality of virtual nodes;
selecting two virtual nodes from each virtual group, and respectively using the two virtual nodes as a main node and a standby node of the virtual group;
and storing each storage object into a storage node corresponding to the main node and storing the backup of each storage object into a storage node corresponding to a corresponding standby node based on the identification of each storage object and the virtual group information, wherein the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group.
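One possible in-memory shape for the virtual group information is sketched below. The class name `GroupRoles`, the integer group identifiers, and the node names are hypothetical; the text only requires that each virtual group be associated with its main node and standby node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupRoles:
    master: str   # storage node behind the virtual group's main (master) node
    standby: str  # storage node behind the virtual group's standby node


# Virtual group identifier -> roles (illustrative values only).
virtual_group_info: dict[int, GroupRoles] = {
    0: GroupRoles("node-a", "node-b"),
    1: GroupRoles("node-b", "node-c"),
    2: GroupRoles("node-c", "node-a"),
}


def nodes_for_group(info: dict[int, GroupRoles], group_id: int) -> tuple[str, str]:
    """Return the (master, standby) storage nodes of one virtual group."""
    roles = info[group_id]
    return roles.master, roles.standby
```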
Optionally, the preset generating condition includes at least one of the following:
one or more storage nodes in the distributed storage system go offline;
one or more storage nodes in the distributed storage system come online;
the load information of the storage nodes in the distributed storage system meets a preset unbalance condition;
a virtual group update instruction is received.
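The four conditions listed above could be combined as in the sketch below. The text does not define the preset imbalance condition, so the rule used here (the spread between the largest and smallest available-capacity fraction exceeds a threshold) and every name in the code are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ClusterEvents:
    nodes_went_offline: bool = False
    nodes_came_online: bool = False
    update_instruction_received: bool = False


def is_imbalanced(available_fraction: dict[str, float], max_spread: float) -> bool:
    """Hypothetical imbalance rule: the spread of available fractions exceeds max_spread."""
    values = list(available_fraction.values())
    return bool(values) and (max(values) - min(values)) > max_spread


def should_regenerate(events: ClusterEvents,
                      available_fraction: dict[str, float],
                      max_spread: float = 0.3) -> bool:
    """True if at least one of the generation conditions holds."""
    return (events.nodes_went_offline
            or events.nodes_came_online
            or events.update_instruction_received
            or is_imbalanced(available_fraction, max_spread))
```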
Optionally, the load information includes an available storage capacity of each storage node;
the step of generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups includes:
determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes;
and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of the preset virtual groups to obtain a plurality of virtual groups.
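A minimal sketch of the weighting step follows. The weight is taken directly from the text (available capacity of a node divided by the total storage capacity of all nodes). How the weight is turned into a number of virtual nodes is not specified here, so the `slots_per_group` parameter, the rounding, and the minimum of one virtual node per storage node are assumptions made for illustration.

```python
def virtual_node_weights(available: dict[str, int], capacity: dict[str, int]) -> dict[str, float]:
    """Weight of each storage node: its available capacity over the total capacity of all nodes."""
    total_capacity = sum(capacity.values())
    return {node: free / total_capacity for node, free in available.items()}


def generate_virtual_groups(weights: dict[str, float],
                            num_groups: int,
                            slots_per_group: int) -> list[list[tuple[int, str, int]]]:
    """Build num_groups virtual groups; each virtual node is (group_id, storage_node, index)."""
    groups = []
    for group_id in range(num_groups):
        group = []
        for node, weight in weights.items():
            # Assumed rule: more available capacity yields more virtual nodes, at least one.
            count = max(1, round(weight * slots_per_group))
            group.extend((group_id, node, index) for index in range(count))
        groups.append(group)
    return groups


available = {"node-a": 600, "node-b": 300, "node-c": 100}
capacity = {"node-a": 1000, "node-b": 1000, "node-c": 1000}
weights = virtual_node_weights(available, capacity)
groups = generate_virtual_groups(weights, num_groups=2, slots_per_group=30)
```

With these illustrative numbers, the node with the most free space contributes the most virtual nodes to every group, so it is more likely to be chosen as a main or standby node.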
Optionally, the step of selecting two virtual nodes from each virtual group as a master node and a standby node of the virtual group includes:
performing hash operation on the identifier of each virtual node to obtain a hash value of each virtual node, wherein the identifier of each virtual node comprises the identifier of the virtual group to which the virtual node belongs, the identifier of the corresponding storage node and a random number;
sorting the virtual nodes included in each virtual group according to the corresponding hash value to obtain a sorting result;
for the virtual nodes included in each virtual group, performing deduplication processing on continuous virtual nodes with the same storage node based on the sorting result;
and selecting two continuous virtual nodes from each virtual group after the duplicate removal processing, and respectively taking the two continuous virtual nodes as a main node and a standby node of the virtual group.
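The four steps above (hash, sort, remove consecutive duplicates, pick two consecutive nodes) can be sketched as follows. The identifier layout `group:storage_node:random_number`, the choice of SHA-256, and taking the first two entries after deduplication are assumptions; the text only says that two consecutive virtual nodes are selected.

```python
import hashlib
import random
from itertools import groupby


def vnode_hash(group_id: int, storage_node: str, nonce: int) -> int:
    """Hash of a virtual node identifier built from group id, storage node id and a random number."""
    ident = f"{group_id}:{storage_node}:{nonce}"
    return int.from_bytes(hashlib.sha256(ident.encode("utf-8")).digest()[:8], "big")


def dedupe_consecutive(storage_nodes: list[str]) -> list[str]:
    """Collapse runs of adjacent entries that refer to the same storage node."""
    return [node for node, _ in groupby(storage_nodes)]


def select_master_and_standby(group_id: int, vnodes: list[tuple[str, int]]) -> tuple[str, str]:
    """vnodes holds (storage_node, random_number) pairs belonging to one virtual group."""
    ordered = sorted(vnodes, key=lambda v: vnode_hash(group_id, v[0], v[1]))
    deduped = dedupe_consecutive([node for node, _ in ordered])
    if len(deduped) < 2:
        raise ValueError("a virtual group needs virtual nodes from at least two storage nodes")
    # After deduplication, adjacent entries always refer to different storage nodes.
    return deduped[0], deduped[1]


rng = random.Random(7)
vnodes = [(node, rng.getrandbits(32))
          for node, count in (("node-a", 3), ("node-b", 2), ("node-c", 1))
          for _ in range(count)]
master, standby = select_master_and_standby(0, vnodes)
```

Deduplicating before the selection is what guarantees that the main node and the standby node never map to the same physical storage node.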
Optionally, the distributed storage system further includes a monitoring node;
the step of obtaining the load information of the storage node includes:
and acquiring the load information of the storage node reported by the monitoring node.
Optionally, the step of storing each storage object in the storage node corresponding to the primary node and storing the backup of each storage object in the storage node corresponding to the corresponding backup node based on the identifier of each storage object and the virtual group information includes:
and sending the virtual group information to the storage nodes, so that each storage node determines a target storage object which does not belong to the storage node based on the virtual group information, determines a first target virtual group based on the identification of the target storage object and the identification of each virtual group, migrates the target storage object to a storage node corresponding to a main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and stores the backup of the target storage object to a storage node corresponding to a standby node of the first target virtual group.
Optionally, the distributed storage system further includes a gateway node; the method further comprises the following steps:
and sending the virtual group information to the gateway node so that the gateway node stores the virtual group information.
In a second aspect, an embodiment of the present invention provides a data processing method, which is applied to a storage node in a distributed storage system, where the distributed storage system further includes a management node, and the method includes:
receiving virtual group information sent by the management node, wherein the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition;
determining a target storage object which does not belong to the storage node based on the virtual group information, and determining a first target virtual group based on the identification of the target storage object and the identification of each virtual group;
migrating the target storage object to a storage node corresponding to the main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and storing the backup of the target storage object to a storage node corresponding to the standby node of the first target virtual group.
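The storage-node side can be sketched as a planning step that lists which local objects must move. How an object identifier is mapped to a virtual group is not spelled out at this point, so the hash-modulo mapping in `group_for`, the function names, and the rule that an object "does not belong" when the local node is not the main node of its group are all assumptions.

```python
import hashlib


def group_for(object_id: str, num_groups: int) -> int:
    """Assumed mapping: hash of the object identifier modulo the number of virtual groups."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def plan_migrations(local_node: str,
                    object_ids: list[str],
                    group_info: dict[int, tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (object_id, master, standby) for each local object whose group master is another node.

    group_info maps group ids 0..n-1 to (master storage node, standby storage node).
    """
    moves = []
    for object_id in object_ids:
        master, standby = group_info[group_for(object_id, len(group_info))]
        if master != local_node:
            moves.append((object_id, master, standby))
    return moves


group_info = {0: ("node-a", "node-b"), 1: ("node-b", "node-c")}
objects = [f"obj-{i}" for i in range(20)]
moves = plan_migrations("node-a", objects, group_info)
```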
In a third aspect, an embodiment of the present invention provides a data processing method, which is applied to a gateway node in a distributed storage system, where the distributed storage system further includes a management node and multiple storage nodes, and the method includes:
acquiring an object to be uploaded;
determining a second target virtual group based on the identifier of the object to be uploaded and pre-stored virtual group information, wherein the virtual group information comprises a corresponding relation between each virtual group and a main node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of the preset virtual groups when the management node meets a preset generation condition;
and sending the object to be uploaded to a storage node corresponding to the main node of the second target virtual group so that the storage node stores the object to be uploaded, and backing up the object to be uploaded to the storage node corresponding to the standby node of the second target virtual group.
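The upload path can be illustrated with a small in-memory simulation. The `StorageNode` and `Gateway` classes, the hash-modulo group lookup, and the synchronous backup call are assumptions made to keep the sketch self-contained; a real system would use network calls.

```python
import hashlib


class StorageNode:
    """In-memory stand-in for a storage node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, bytes] = {}

    def store(self, object_id: str, data: bytes, standby: "StorageNode") -> None:
        # The main node keeps the object and forwards a backup to the standby node.
        self.objects[object_id] = data
        standby.objects[object_id] = data


class Gateway:
    def __init__(self, group_info: dict[int, tuple[str, str]], nodes: dict[str, StorageNode]) -> None:
        self.group_info = group_info  # pre-stored virtual group information
        self.nodes = nodes

    def target_group(self, object_id: str) -> int:
        # Assumed mapping: hash of the object identifier modulo the number of groups.
        digest = hashlib.sha256(object_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.group_info)

    def upload(self, object_id: str, data: bytes) -> tuple[str, str]:
        master, standby = self.group_info[self.target_group(object_id)]
        self.nodes[master].store(object_id, data, self.nodes[standby])
        return master, standby


nodes = {name: StorageNode(name) for name in ("node-a", "node-b", "node-c")}
gateway = Gateway({0: ("node-a", "node-b"), 1: ("node-b", "node-c")}, nodes)
master, standby = gateway.upload("image-42", b"payload")
```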
In a fourth aspect, an embodiment of the present invention provides a data processing method, which is applied to a gateway node in a distributed storage system, where the distributed storage system further includes a management node and a plurality of storage nodes, and the method includes:
acquiring an object downloading instruction, wherein the object downloading instruction comprises an identifier of an object to be downloaded;
determining a third target virtual group based on the identifier of the object to be downloaded and pre-stored virtual group information, wherein the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of the preset virtual groups when the management node meets a preset generation condition;
and reading the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the main node of the third target virtual group.
Optionally, in a case that reading an object corresponding to the identifier of the object to be downloaded from the master node of the third target virtual group fails, the method further includes:
and reading the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the standby node of the third target virtual group.
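The read path with its fallback to the standby node can be sketched as below. The group lookup is omitted, and the `read` callback, the exception types treated as a failed read, and the sample data are assumptions.

```python
from typing import Callable

# (storage_node, object_id) -> object bytes; raises on failure.
ReadFn = Callable[[str, str], bytes]


def download(object_id: str, master: str, standby: str, read: ReadFn) -> bytes:
    """Read from the main node's storage node first, then fall back to the standby's."""
    try:
        return read(master, object_id)
    except (KeyError, ConnectionError):
        return read(standby, object_id)


# Illustrative state: the main node has lost the object, the standby still holds the backup.
stores = {"node-a": {}, "node-b": {"image-42": b"payload"}}


def read_from_memory(storage_node: str, object_id: str) -> bytes:
    return stores[storage_node][object_id]


data = download("image-42", master="node-a", standby="node-b", read=read_from_memory)
```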
In a fifth aspect, an embodiment of the present invention provides a data processing apparatus, which is applied to a management node in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, and the apparatus includes:
the load information acquisition module is used for acquiring the load information of the storage node under the condition that a preset generation condition is met;
a virtual group generation module, configured to generate a corresponding virtual node for each storage node based on the load information and a preset number of virtual groups, so as to obtain multiple virtual groups, where each virtual group includes multiple virtual nodes;
a master/standby node determining module, configured to select two virtual nodes from each virtual group, where the two virtual nodes are respectively used as a master node and a standby node of the virtual group;
and the first object storage module is used for storing each storage object into a storage node corresponding to the main node and storing the backup of each storage object into a storage node corresponding to a corresponding standby node based on the identification of each storage object and virtual group information, wherein the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group.
In a sixth aspect, an embodiment of the present invention provides a data processing apparatus, which is applied to a storage node in a distributed storage system, where the distributed storage system further includes a management node, and the apparatus includes:
a virtual group information receiving module, configured to receive virtual group information sent by the management node, where the virtual group information includes a correspondence between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating, by the management node, a corresponding virtual node for each storage node based on load information of the storage node and a preset number of virtual groups when a preset generation condition is met;
a first virtual group determining module, configured to determine, based on the virtual group information, a target storage object that does not belong to the storage node, and determine, based on an identifier of the target storage object and an identifier of each virtual group, a first target virtual group;
and the second object storage module is used for migrating the target storage object to a storage node corresponding to the main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and stores the backup of the target storage object to the storage node corresponding to the standby node of the first target virtual group.
In a seventh aspect, an embodiment of the present invention provides a data processing apparatus, applied to a gateway node in a distributed storage system, where the distributed storage system further includes a management node and multiple storage nodes, and the apparatus includes:
the object acquisition module is used for acquiring an object to be uploaded;
a second virtual group determining module, configured to determine a second target virtual group based on the identifier of the object to be uploaded and pre-stored virtual group information, where the virtual group information includes a correspondence between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition;
and the third object storage module is used for sending the object to be uploaded to a storage node corresponding to the main node of the second target virtual group, so that the storage node stores the object to be uploaded, and backups the object to be uploaded to the storage node corresponding to the standby node of the second target virtual group.
In an eighth aspect, an embodiment of the present invention provides a data processing apparatus, which is applied to a gateway node in a distributed storage system, where the distributed storage system further includes a management node and a plurality of storage nodes, and the apparatus includes:
the download instruction acquisition module is used for acquiring an object download instruction, wherein the object download instruction comprises an identifier of an object to be downloaded;
a third virtual group determining module, configured to determine a third target virtual group based on an identifier of the object to be downloaded and pre-stored virtual group information, where the virtual group information includes a correspondence between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition;
and the object downloading module is used for reading the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the main node of the third target virtual group.
In a ninth aspect, an embodiment of the present invention provides a distributed storage system, where the distributed storage system includes a management node and a plurality of storage nodes, where:
the management node is used for acquiring the load information of the storage node under the condition that a preset generation condition is met; generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups; selecting two virtual nodes from each virtual group, and respectively using the two virtual nodes as a main node and a standby node of the virtual group; storing each storage object into a storage node corresponding to the main node and storing a backup of each storage object into a storage node corresponding to a corresponding standby node based on the identification of each storage object and virtual group information, wherein each virtual group comprises a plurality of virtual nodes, and the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group;
the storage node is used for storing the storage object and/or the backup of the storage object.
Optionally, the distributed storage system further includes a monitoring node;
the monitoring node is used for acquiring the load information of the storage node and reporting the load information to the management node;
the management node is specifically configured to obtain the load information reported by the monitoring node.
Optionally, the distributed storage system further includes a gateway node;
the management node is further configured to send the virtual group information to the gateway node;
the gateway node is configured to store the virtual group information.
Optionally, the gateway node is further configured to obtain an object to be uploaded; determining a second target virtual group based on the identification of the object to be uploaded and the virtual group information; sending the object to be uploaded to a storage node corresponding to the main node of the second target virtual group;
and the storage node corresponding to the main node of the second target virtual group is used for storing the object to be uploaded and backing up the object to be uploaded to the storage node corresponding to the standby node of the second target virtual group.
Optionally, the gateway node is further configured to obtain an object downloading instruction; determining a third target virtual group based on the identification of the object to be downloaded and the virtual group information; and reading an object corresponding to the identifier of the object to be downloaded from a storage node corresponding to the main node of the third target virtual group, wherein the object downloading instruction comprises the identifier of the object to be downloaded.
In a tenth aspect, an embodiment of the present invention provides a management node, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor configured to implement the method steps of any one of the first aspect when executing a program stored in the memory.
In an eleventh aspect, an embodiment of the present invention provides a storage node, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor, configured to implement the method steps of the second aspect when executing the program stored in the memory.
In a twelfth aspect, an embodiment of the present invention provides a gateway node, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor arranged to implement the method steps of any of the third or fourth aspects when executing a program stored in the memory.
In a thirteenth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
In the solution provided in the embodiment of the present invention, the management node may obtain load information of the storage nodes when a preset generation condition is satisfied, and generate corresponding virtual nodes for each storage node based on the load information and the preset number of virtual groups, thereby obtaining a plurality of virtual groups, each of which includes a plurality of virtual nodes. The management node then selects two virtual nodes from each virtual group to serve as the master node and the standby node of that virtual group. Based on the identifier of each storage object and the virtual group information, it stores each storage object in the storage node corresponding to the master node and stores a backup of the storage object in the storage node corresponding to the standby node, where the virtual group information includes the correspondence between each virtual group and its master node and standby node. Because the virtual nodes are generated according to the load information of the storage nodes, the selection of the master node and the standby node takes that load information into account. The master node and the standby node are therefore selected more reasonably, and load balancing across the storage nodes in the distributed storage system is achieved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first data processing method according to an embodiment of the present invention;
FIG. 3 is a specific flowchart of step S203 in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of a virtual group structure according to the embodiment shown in FIG. 3;
FIG. 5 is a schematic structural diagram of another distributed storage system according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating a second data processing method according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating a third data processing method according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating a fourth data processing method according to an embodiment of the present invention;
FIG. 10 is a block diagram of a first data processing apparatus according to an embodiment of the present invention;
FIG. 11 is a block diagram of a second data processing apparatus according to an embodiment of the present invention;
FIG. 12 is a block diagram of a third data processing apparatus according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating a fourth data processing apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a management node according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of a storage node according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of a gateway node according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In order to reasonably select a master node and a backup node and realize load balancing of each storage node in a distributed storage system, the embodiment of the invention provides a data processing method, a device, the distributed storage system, a management node, a storage node, a gateway node and a computer readable storage medium.
First, a first data processing method provided in an embodiment of the present invention is described below. The first data processing method provided by the embodiment of the invention can be applied to a management node in a distributed storage system. As shown in fig. 1, the distributed storage system according to the embodiment of the present invention may include a management node 110 and a plurality of storage nodes 120, where the management node 110 and the plurality of storage nodes 120 may communicate with each other to transmit data, and the storage nodes 120 may also communicate with each other to transmit data.
As shown in fig. 2, a data processing method is applied to a management node in a distributed storage system, and the method includes:
S201, acquiring load information of the storage nodes when a preset generation condition is met;
S202, generating corresponding virtual nodes for each storage node based on the load information and the preset number of virtual groups, to obtain a plurality of virtual groups;
wherein each virtual group comprises a plurality of virtual nodes;
S203, selecting two virtual nodes from each virtual group as the master node and the standby node of the virtual group, respectively;
S204, based on the identifier of each storage object and the virtual group information, storing each storage object in the storage node corresponding to the master node, and storing the backup of each storage object in the storage node corresponding to the corresponding standby node.
The virtual group information includes the corresponding relationship between each virtual group and the master node and the standby node of the virtual group.
It can be seen that in the scheme provided in the embodiment of the present invention, a management node may obtain load information of storage nodes when a preset generation condition is satisfied, generate a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups, to obtain a plurality of virtual groups, where each virtual group includes a plurality of virtual nodes, select two virtual nodes from each virtual group to serve as a master node and a backup node of the virtual group, store each storage object in the storage node corresponding to the master node based on an identifier of each storage object and virtual group information, and store a backup of each storage object in the storage node corresponding to the backup node, where the virtual group information includes a corresponding relationship between each virtual group and the master node and the backup node of the virtual group. Therefore, the management node can generate corresponding virtual nodes according to the load information of the storage nodes, two virtual nodes are selected from each virtual group and are respectively used as a main node and a standby node of the virtual group, and the selected main node and the standby node take the load information of the storage nodes into consideration, so that the main node and the standby node are more reasonably selected, and the load balance of each storage node in the distributed storage system is realized.
In the solution provided in the embodiment of the present invention, the storage node corresponding to a master node stores the storage object to be stored, and the storage node corresponding to the standby node that belongs to the same virtual group as the master node stores the backup of the storage object. That is, the more master nodes and standby nodes correspond to one storage node, the larger the amount of data that storage node needs to store. Therefore, to balance the storage amount across the storage nodes, the master nodes and the standby nodes must be selected reasonably so as to ensure stable performance of the distributed storage system. Accordingly, in step S201, the management node may obtain the load information of each storage node when a preset generation condition is met.
The preset generation condition may include various conditions, and the condition that the preset generation condition is met indicates that the master node and the standby node need to be re-determined currently, for example, a storage node fails, a new storage node is added into the distributed storage system, and the load of each storage node is obviously unbalanced. Under these circumstances, data migration is required to ensure load balancing of storage nodes in the distributed storage system, while ensuring that storage objects stored in the distributed storage system are not lost.
The load information of each storage node may be any information capable of representing the load condition of the storage node, such as an available storage capacity and a used storage capacity of each storage node, and is not limited specifically herein. After obtaining the load information of the storage nodes, the management node may execute step S202, that is, generate corresponding virtual nodes for each storage node based on the load information and the number of preset virtual groups, so as to obtain a plurality of virtual groups.
Each virtual group includes a plurality of virtual nodes, and the number of virtual groups may be set according to the number of storage nodes, the data amount of the storage objects to be stored, and the like, which is not specifically limited herein. For example, the number of virtual groups may be 16384, 15000, 2186, or the like.
A storage node with a larger available storage capacity can store a larger amount of data. Therefore, to implement load balancing, the larger the available storage capacity of a storage node indicated by the load information, the more virtual nodes may be generated for that storage node, so that it stores a larger share of the storage objects.
For example, if the distributed storage system includes 5 storage nodes whose available storage capacities are in the ratio 1:2:3:1:1 and the preset number of virtual groups is 10, the following virtual nodes may be generated for the 5 storage nodes:
[Table image omitted: virtual nodes generated for each of the 5 storage nodes]
After obtaining the plurality of virtual groups, the management node may select two virtual nodes from each virtual group as the master node and the standby node of the virtual group, that is, execute step S203. In one embodiment, the management node may randomly select two virtual nodes from each virtual group as the master node and the standby node of the virtual group. Because a storage node with a large available capacity has more virtual nodes, its virtual nodes are more likely to be selected, and therefore more master nodes and standby nodes correspond to that storage node.
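The random selection embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pick_master_and_standby`, the tuple representation of virtual nodes, and the seeded `random.Random` are all assumptions introduced here.

```python
import random

def pick_master_and_standby(virtual_group, rng):
    """Randomly pick two distinct virtual nodes as (master, standby)."""
    master, standby = rng.sample(virtual_group, 2)
    return master, standby

# Hypothetical virtual group built from the 1:2:3:1:1 capacity ratio:
# storage node 3 contributes three virtual nodes, so it is the most
# likely storage node to end up as master or standby.
group = [("node1", 0),
         ("node2", 0), ("node2", 1),
         ("node3", 0), ("node3", 1), ("node3", 2),
         ("node4", 0),
         ("node5", 0)]

rng = random.Random(0)  # seeded only to make the sketch repeatable
master, standby = pick_master_and_standby(group, rng)
```

Note that a purely random pick may return two virtual nodes of the same storage node; the hash, sort and deduplication embodiment of steps S301 to S304 is what rules that case out.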
After each virtual group is determined, in order to facilitate subsequent processing of data storage, migration, and the like, the management node may record virtual group information, where the virtual group information at least includes a correspondence between each virtual group and a master node and a standby node of the virtual group, and thus, which storage node each master node and the corresponding standby node correspond to may be determined according to the virtual group information.
For example, if the master node and the standby node of virtual group 1 are virtual node 2-1-1 and virtual node 3-1-3, respectively, the virtual group information of virtual group 1 may be recorded as: virtual node 2-1-1; virtual node 3-1-3.
Next, in step S204, the management node may store each storage object in the storage node corresponding to the primary node and store the backup of each storage object in the storage node corresponding to the corresponding backup node based on the identifier of each storage object and the virtual group information. The storage object may be a file such as a video, an image, a document, and the like, and is not limited in this respect.
In an embodiment, the management node may process the identifier of each storage object by using a hash modulo algorithm or a consistent hash algorithm, and determine a virtual group corresponding to each storage object, further store each storage object into a storage node corresponding to a primary node in the corresponding virtual group, and store a backup of the storage object into a storage node corresponding to a backup node corresponding to the primary node.
For example, the management node determines that the virtual group corresponding to the storage object a is virtual group 1, and the virtual group information of the virtual group 1 is: virtual node 2-1-1; virtual node 3-1-3, then the management node may store storage object a to storage node 2 since the storage node corresponding to virtual node 2-1-1 is storage node 2. The storage node corresponding to the virtual node 3-1-3 is the storage node 3, so the management node can store the backup of the storage object a to the storage node 3, and complete the storage of the storage object a.
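The hash modulo lookup described above can be sketched as follows. The choice of MD5, the zero-based group numbering, and the names `virtual_group_for`, `placement`, `group_info` and `owner` are assumptions made for illustration; the source does not specify a hash function or a data layout.

```python
import hashlib

def virtual_group_for(object_id, group_count):
    """Map a storage object identifier to a virtual group by hash modulo."""
    digest = hashlib.md5(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % group_count

def placement(object_id, group_info, owner):
    """Return (storage node for the object, storage node for its backup)."""
    group = virtual_group_for(object_id, len(group_info))
    master, standby = group_info[group]
    return owner[master], owner[standby]

# Hypothetical virtual group information: group -> (master, standby).
group_info = {
    0: ("2-1-1", "3-1-3"),
    1: ("1-2-4", "2-2-7"),
}
# Hypothetical mapping from virtual node to its storage node.
owner = {
    "2-1-1": "storage-2", "3-1-3": "storage-3",
    "1-2-4": "storage-1", "2-2-7": "storage-2",
}

primary_target, backup_target = placement("video-001.mp4", group_info, owner)
```

Because the mapping depends only on the object identifier and the virtual group information, any node holding the same virtual group information arrives at the same placement.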
As an implementation manner of the embodiment of the present invention, the preset generating condition may include at least one of the following:
one or more storage nodes in the distributed storage system go offline; one or more storage nodes in the distributed storage system come online; the load information of the storage nodes in the distributed storage system meets a preset imbalance condition; a virtual group update instruction is received.
The first condition: if a storage node in the distributed storage system fails, the failed storage node is in an offline state, that is, one or more storage nodes go offline. In this case, to ensure that the distributed storage system can still be used normally, the storage objects stored in the offline storage nodes need to be migrated to other storage nodes, and to ensure load balance of the storage nodes after migration, the virtual groups need to be regenerated.
The second condition: if a storage node newly comes online in the distributed storage system, stored storage objects need to be migrated to the new storage node to ensure load balance of the storage nodes, so the virtual groups need to be regenerated.
The third condition: the load information of the storage nodes in the distributed storage system meets a preset imbalance condition. This indicates that the storage amounts of the storage nodes are seriously unbalanced and a crash is likely to occur, so the storage objects stored by the storage nodes need to be readjusted, that is, the virtual groups need to be regenerated.
In one embodiment, when the difference between the available storage amounts of the storage nodes in the distributed storage system is greater than a preset threshold, it may be determined that the load information of the storage nodes meets the preset imbalance condition.
The fourth condition: a virtual group update instruction is received. A user may want to adjust the storage objects stored by the storage nodes in the distributed storage system for various reasons, in which case the user may issue a virtual group update instruction indicating that the virtual groups need to be updated. When the management node receives the virtual group update instruction, it may determine that the preset generation condition is met.
It can be seen that, in this embodiment, the management node may determine that the preset generation condition is met when at least one of the following occurs: one or more storage nodes in the distributed storage system go offline, one or more storage nodes come online, the load information of the storage nodes meets the preset imbalance condition, or a virtual group update instruction is received. The management node then executes the data processing method, so that the virtual groups are adjusted and the storage objects are reallocated to storage nodes in each of these cases, thereby ensuring load balance of the storage nodes.
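The four conditions can be combined into a single check, sketched below. The `ClusterEvent` record and the function names are hypothetical, and reading "the difference of the available storage amounts" as the largest value minus the smallest value is an assumption.

```python
from dataclasses import dataclass

@dataclass
class ClusterEvent:
    """Hypothetical summary of what the management node has observed."""
    nodes_went_offline: bool = False
    nodes_came_online: bool = False
    update_instruction_received: bool = False

def is_imbalanced(available, threshold):
    """Assumed imbalance rule: spread of available storage exceeds threshold."""
    return max(available) - min(available) > threshold

def generation_condition_met(event, available, threshold):
    """True if at least one of the four preset generation conditions holds."""
    return (event.nodes_went_offline
            or event.nodes_came_online
            or event.update_instruction_received
            or is_imbalanced(available, threshold))

# Available storage per storage node, in arbitrary units (illustrative values).
balanced = generation_condition_met(ClusterEvent(), [100, 90, 95], threshold=20)
skewed = generation_condition_met(ClusterEvent(), [100, 40, 95], threshold=20)
```

The `available` list must be non-empty; a cluster with no reporting storage nodes would need separate handling.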
As an implementation manner of the embodiment of the present invention, the load information may include an available storage capacity of each storage node.
In this case, the step of generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups may include:
determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes; and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of the preset virtual groups to obtain a plurality of virtual groups.
In order to determine the number of virtual nodes that each storage node needs to generate, the management node may determine a weight of the number of virtual nodes corresponding to each storage node according to a ratio between the available storage capacity of each storage node and a total storage capacity of all storage nodes.
In one embodiment, the higher the ratio of the available storage capacity of a storage node to the total storage capacity of all storage nodes, the larger the available storage capacity of that storage node, and the higher the quantity weight of the virtual nodes corresponding to it may be.
Furthermore, the management node may generate corresponding virtual nodes for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the preset number of virtual groups, so as to obtain a plurality of virtual groups. In one embodiment, each virtual group includes a plurality of virtual nodes, and the number of virtual nodes corresponding to each storage node in a virtual group matches the corresponding quantity weight.
For example, the distributed storage system includes 4 storage nodes, namely storage nodes 1 to 4, the preset number of virtual groups is 16384, and the identifier of a virtual node is Node-N-M-r, where N is the virtual group identifier, M is the actual (storage) node identifier, and r is a random number. If the ratios of the available storage capacities of storage nodes 1 to 4 to the total storage capacity of all storage nodes are 20%, 20%, 20%, and 40%, respectively, the quantity weights of the virtual nodes corresponding to storage nodes 1 to 4 may be determined to be 20%, 20%, 20%, and 40%. The management node may generate corresponding virtual nodes for each storage node, as follows:
[Table image omitted: virtual nodes of storage nodes 1 to 4 in each of the 16384 virtual groups]
Thus, 16384 virtual groups are obtained, where each virtual group includes 5 virtual nodes, and the numbers of virtual nodes corresponding to storage nodes 1 to 4 in each virtual group are 1, 1, 1, and 2, respectively.
As can be seen, in this embodiment, the management node may determine the quantity weight of the virtual nodes corresponding to each storage node according to the ratio of the available storage capacity of each storage node to the total storage capacity of all storage nodes, and then generate corresponding virtual nodes for each storage node based on the quantity weights and the preset number of virtual groups, so as to obtain a plurality of virtual groups. In this way, the number of virtual nodes that each storage node has in the resulting virtual groups is proportional to its available storage capacity, so that the master node and the standby node can be selected more reasonably.
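The capacity-weighted generation of virtual nodes can be sketched as follows. The source does not state how a weight is turned into a count, so the `slots_per_group` parameter, the rounding rule, the range of the random suffix, and taking the total as the sum of the available capacities are all assumptions.

```python
import random

def virtual_node_counts(available, slots_per_group):
    """Per-group virtual node count for each storage node, by capacity share."""
    total = sum(available.values())
    return {node: max(1, round(capacity * slots_per_group / total))
            for node, capacity in available.items()}

def build_virtual_groups(available, group_count, slots_per_group, rng):
    """Build group_count virtual groups of identifiers 'Node-N-M-r'."""
    counts = virtual_node_counts(available, slots_per_group)
    groups = []
    for n in range(group_count):
        members = []
        for node, count in counts.items():
            # Distinct random suffixes keep identifiers unique within a group.
            for r in rng.sample(range(100), count):
                members.append(f"Node-{n}-{node}-{r}")
        groups.append(members)
    return groups

# Storage nodes 1 to 4 with capacity shares of 20%, 20%, 20% and 40%.
available = {1: 100, 2: 100, 3: 100, 4: 200}
counts = virtual_node_counts(available, slots_per_group=5)
groups = build_virtual_groups(available, group_count=16,
                              slots_per_group=5, rng=random.Random(0))
```

With these inputs each group holds five virtual nodes, two of which belong to storage node 4. For capacities that do not divide evenly, the rounded counts may not add up to `slots_per_group` exactly.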
As an implementation manner of the embodiment of the present invention, as shown in fig. 3, the step of selecting two virtual nodes from each virtual group as a master node and a standby node of the virtual group may include:
S301, performing a hash operation on the identifier of each virtual node to obtain a hash value of each virtual node;
If one of the master node and the standby node fails, the storage object stored by the failed node must still be obtainable from the storage node corresponding to the other node, so that data is not lost. The master node and the standby node that store the same storage object therefore cannot correspond to the same storage node.
To ensure this when selecting the master node and the standby node of each virtual group, the management node may first perform a hash operation on the identifier of each virtual node to obtain a hash value of each virtual node. The identifier of each virtual node comprises the identifier of the virtual group to which the virtual node belongs, the identifier of the corresponding storage node, and a random number, so the hash operation can be performed on it, and the virtual group and the storage node of a virtual node can be determined from its identifier.
S302, sorting the virtual nodes included in each virtual group according to the corresponding hash values to obtain a sorting result;
After the hash value of each virtual node is obtained, the management node may sort the virtual nodes included in each virtual group according to their hash values to obtain a sorting result. In one embodiment, the virtual nodes may be sorted by hash value in either ascending or descending order.
S303, for the virtual nodes included in each virtual group, performing deduplication processing on consecutive virtual nodes having the same storage node based on the sorting result;
To ensure that the master node and the standby node of the same virtual group do not correspond to the same storage node, the management node may perform deduplication processing on consecutive virtual nodes having the same storage node based on the sorting result. Specifically, if two consecutive virtual nodes have the same storage node, one of them may be removed.
S304, selecting two consecutive virtual nodes from each deduplicated virtual group as the master node and the standby node of the virtual group, respectively.
In each deduplicated virtual group, any two consecutive virtual nodes correspond to different storage nodes. The management node can therefore select two consecutive virtual nodes from each deduplicated virtual group as the master node and the standby node of the virtual group, so that the master node and the standby node of each virtual group correspond to different storage nodes.
In one embodiment, the management node may select the first two virtual nodes from each virtual group after the deduplication processing, and respectively serve as the master node and the standby node of the virtual group. For example, the distributed storage system includes 4 storage nodes, which are storage nodes 1 to 4, respectively, and the number of preset virtual groups is 16384, specifically, the virtual groups shown in the above table. The management node may select a master node and a standby node for each virtual group by using the steps S301 to S304, which are as follows:
Virtual group    Master node        Backup node
VG-0             Node-0-1-32        Node-0-2-17
VG-1             Node-1-3-58        Node-1-1-34
VG-2             Node-2-2-93        Node-2-4-37
...
VG-16383         Node-16383-4-35    Node-16383-2-45
As shown in fig. 4, the structure of the virtual groups includes virtual group VG-0, virtual group VG-1, virtual group VG-2, ..., and virtual group VG-16383. Each virtual group has a master node and a standby node, the storage nodes corresponding to the master node and the standby node of each virtual group are different, and the IP (Internet Protocol) address in fig. 4 is the IP address of a storage node.
It can be seen that, in this embodiment, the management node may perform hash operation on the identifier of each virtual node to obtain a hash value of each virtual node, sort the virtual nodes included in each virtual group according to the corresponding hash values to obtain a sorting result, perform deduplication processing on consecutive virtual nodes having the same storage node based on the sorting result, and further select two consecutive virtual nodes from each deduplication processed virtual group to be respectively used as a master node and a backup node of the virtual group.
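Steps S301 to S304 can be sketched as follows. SHA-1 is used only as an example hash, and the function names are hypothetical; the sketch assumes identifiers of the form Node-N-M-r, ascending order, and the embodiment in which the first two deduplicated virtual nodes are chosen.

```python
import hashlib

def node_hash(virtual_node_id):
    """S301: hash the virtual node identifier (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(virtual_node_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def storage_node_of(virtual_node_id):
    """Extract M, the storage node identifier, from 'Node-N-M-r'."""
    return virtual_node_id.split("-")[2]

def select_master_and_standby(virtual_group):
    # S302: sort the virtual nodes of the group by hash value (ascending).
    ordered = sorted(virtual_group, key=node_hash)
    # S303: drop a virtual node whose predecessor is on the same storage node.
    deduped = []
    for vnode in ordered:
        if deduped and storage_node_of(deduped[-1]) == storage_node_of(vnode):
            continue
        deduped.append(vnode)
    if len(deduped) < 2:
        raise ValueError("virtual nodes from at least two storage nodes are required")
    # S304: the first two consecutive survivors become master and standby.
    return deduped[0], deduped[1]

# Hypothetical virtual group VG-0 with two virtual nodes on storage node 4.
vg0 = ["Node-0-1-32", "Node-0-2-17", "Node-0-3-58", "Node-0-4-35", "Node-0-4-71"]
master, standby = select_master_and_standby(vg0)
```

Because adjacent duplicates are removed before the pick, the master and standby returned here always belong to different storage nodes, which is the property the deduplication step is meant to guarantee.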
As an implementation manner of the embodiment of the present invention, as shown in fig. 5, the distributed storage system may further include a monitoring node 130. In one embodiment, there may be one monitoring node 130 for each storage node. The monitoring node 130 may monitor the state and load condition of the corresponding storage node 120, and then report the monitored information to the management node 110. The state of the storage node may include online, offline, failure, and the like.
Correspondingly, the step of acquiring the load information of the storage node may include:
acquiring the load information of the storage nodes reported by the monitoring node.
In this case, the management node may receive the load information of each storage node reported by the monitoring node. In an embodiment, the monitoring node may report load information of each storage node at regular time, and simultaneously monitor whether the storage node is offline or online in real time, and report the information to the management node.
Therefore, in this embodiment, the monitoring node may monitor the load information of the storage nodes and report the load information to the management node, and the management node may quickly obtain accurate load information, so that the master node and the standby node of each virtual group may be reasonably selected when the preset generation condition is met, and load balance of each storage node is ensured.
As an embodiment of the present invention, the step of storing each storage object in the storage node corresponding to the primary node and storing the backup of each storage object in the storage node corresponding to the corresponding backup node based on the identifier of each storage object and the virtual group information may include:
sending the virtual group information to the storage nodes, so that each storage node determines, based on the virtual group information, a target storage object that does not belong to it, determines a first target virtual group based on the identifier of the target storage object and the identifier of each virtual group, and migrates the target storage object to the storage node corresponding to the master node of the first target virtual group, so that the storage node corresponding to the master node of the first target virtual group stores the target storage object and stores the backup of the target storage object in the storage node corresponding to the standby node of the first target virtual group.
After the master node and the standby node of each virtual group are determined, the management node may send the virtual group information to each storage node. After receiving the virtual group information, each storage node may determine, based on the virtual group information, the virtual nodes that currently correspond to itself. The storage node may then perform hash modulo processing on the identifier of each storage object it currently stores to determine the virtual group corresponding to that storage object. If neither the master node nor the standby node of that virtual group is a virtual node corresponding to the storage node, the storage object does not need to be stored by the storage node, that is, it is a target storage object that does not belong to the storage node.
Next, the storage node may determine a first target virtual group based on the identifier (key) of the target storage object and the identifier of each virtual group, and then migrate the target storage object to a storage node corresponding to the primary node of the first target virtual group. The first target virtual group is a virtual group used for storing the target storage object and indicated by the virtual group information received by the storage node. In one embodiment, the storage node may perform a hash modulo process on the identifier of the target storage object to determine its corresponding first target virtual group.
And after receiving the target storage object, the storage node corresponding to the main node of the first target virtual group can store the target storage object, and store the backup of the target storage object to the storage node corresponding to the standby node of the first target virtual group, thereby completing the migration of the target storage object.
After the virtual groups are updated and each storage node in the distributed storage system has received the virtual group information, the storage nodes can migrate storage objects in the above manner, thereby realizing load balancing of the storage nodes.
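The migration decision made by each storage node can be sketched as follows. The list layout of `group_info`, the MD5 hash, and the function names are assumptions for illustration; the sketch only plans the moves and does not transfer any data.

```python
import hashlib

def virtual_group_for(object_id, group_count):
    """Hash modulo mapping from object identifier to virtual group index."""
    digest = hashlib.md5(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % group_count

def storage_node_of(virtual_node_id):
    """Extract M, the storage node identifier, from 'Node-N-M-r'."""
    return virtual_node_id.split("-")[2]

def plan_migration(local_node, stored_object_ids, group_info):
    """Return {object_id: (master storage node, standby storage node)} for
    every stored object that no longer belongs on local_node."""
    moves = {}
    for object_id in stored_object_ids:
        group = virtual_group_for(object_id, len(group_info))
        master, standby = group_info[group]
        owners = (storage_node_of(master), storage_node_of(standby))
        if local_node not in owners:
            # Target storage object: hand it to the master's storage node,
            # which then stores the backup on the standby's storage node.
            moves[object_id] = owners
    return moves

# Hypothetical virtual group information, indexed by virtual group number.
group_info = [
    ("Node-0-1-32", "Node-0-2-17"),
    ("Node-1-3-58", "Node-1-1-34"),
    ("Node-2-2-93", "Node-2-4-37"),
]
stored = [f"obj-{i}" for i in range(20)]
moves = plan_migration("2", stored, group_info)
```

Objects whose virtual group still lists the local storage node as master or standby are left in place, so only the objects in `moves` generate migration traffic.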
As an implementation manner of the embodiment of the present invention, as shown in fig. 6, the distributed storage system may further include a gateway node 140. The gateway node 140 may be communicatively coupled to the management node 110 and the respective storage nodes 120 for data transfer.
In this case, the method may further include: and sending the virtual group information to the gateway node so that the gateway node stores the virtual group information.
The management node may further send the virtual group information to the gateway node, and the gateway node may store the virtual group information after receiving it. When the gateway node subsequently receives an object to be uploaded, an object download instruction, or the like, it can upload or download the object based on the stored virtual group information.
Corresponding to the first data processing method, the embodiment of the invention also provides a second data processing method. The second data processing method provided by the embodiment of the invention can be applied to storage nodes in a distributed storage system, and the distributed storage system further comprises a management node.
As shown in fig. 7, a data processing method is applied to a storage node in a distributed storage system, and the method includes:
S701, receiving virtual group information sent by the management node;
wherein the virtual group information includes the correspondence between each virtual group and the master node and the standby node of the virtual group, and the virtual groups are obtained by the management node, when a preset generation condition is met, generating corresponding virtual nodes for each storage node based on the load information of the storage nodes and the preset number of virtual groups;
S702, determining, based on the virtual group information, a target storage object that does not belong to the storage node, and determining a first target virtual group based on the identifier of the target storage object and the identifier of each virtual group;
S703, migrating the target storage object to the storage node corresponding to the master node of the first target virtual group, so that the storage node corresponding to the master node of the first target virtual group stores the target storage object and stores the backup of the target storage object in the storage node corresponding to the standby node of the first target virtual group.
As can be seen, in the solution provided in the embodiment of the present invention, a storage node may receive virtual group information sent by a management node, where the virtual group information includes a correspondence between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating, for each storage node, a corresponding virtual node based on load information of the storage node and a number of preset virtual groups when the management node meets a preset generation condition, determining, based on the virtual group information, a target storage object that does not belong to the storage node, and determining, based on an identifier of the target storage object and an identifier of each virtual group, a first target virtual group. And then migrating the target storage object to a storage node corresponding to the main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and storing the backup of the target storage object to a storage node corresponding to the standby node of the first target virtual group. In this way, the storage nodes can perform migration of the storage objects based on the virtual group information sent by the management node, and load balance of each storage node is ensured.
After determining the master node and the standby node of each virtual group by using the first data processing method, the management node may send the virtual group information to each storage node. After receiving the virtual group information, each storage node may determine, based on the virtual group information, the virtual nodes that currently correspond to itself. The storage node may then perform hash modulo processing on the identifier of each storage object it currently stores to determine the virtual group corresponding to that storage object. If neither the master node nor the standby node of that virtual group is a virtual node corresponding to the storage node, the storage object does not need to be stored by the storage node, that is, it is a target storage object that does not belong to the storage node.
For example, after receiving the virtual group information, storage node 2 may determine from the virtual group information that its corresponding virtual nodes are Node-0-2-17, Node-2-2-93, ..., Node-16383-2-45. Storage node 2 can then perform hash modulo processing on the identifier of each storage object it currently stores to determine the virtual group corresponding to that storage object. If neither the master node nor the standby node of that virtual group is one of the virtual nodes Node-0-2-17, Node-2-2-93, ..., Node-16383-2-45, the storage object does not need to be stored by storage node 2, that is, it is a target storage object that does not belong to storage node 2.
Next, the storage node may determine a first target virtual group based on the identifier (key) of the target storage object and the identifier of each virtual group, and then migrate the target storage object to a storage node corresponding to the primary node of the first target virtual group. The first target virtual group is a virtual group used for storing the target storage object and indicated by the virtual group information received by the storage node. In one embodiment, the storage node may perform a hash modulo process on the identifier of the target storage object to determine its corresponding first target virtual group.
And after receiving the target storage object, the storage node corresponding to the main node of the first target virtual group can store the target storage object, and store the backup of the target storage object to the storage node corresponding to the standby node of the first target virtual group, thereby completing the migration of the target storage object.
After the virtual groups are updated, each storage node in the distributed storage system that receives the virtual group information can migrate its storage objects in the manner described above, thereby balancing the load across the storage nodes.
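The migration flow can be simulated in memory as a rough sketch. The assumptions here are loud ones: `stores` stands in for the disks of the storage nodes, `group_info` maps each group index directly to the storage nodes of its master and standby (skipping the virtual-node identifiers for brevity), and MD5 is an arbitrary choice of hash.

```python
import hashlib


def group_of(key, group_count):
    """Hash-modulo mapping (MD5 is an assumption, not from the text)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % group_count


def migrate(node_id, stores, group_info):
    """Move objects that no longer belong on node_id to their new owners.

    stores:     {storage node: {key: data}}, an in-memory stand-in for disks.
    group_info: {group index: (master storage node, standby storage node)}.
    Returns the list of migrated keys.
    """
    moved = []
    for key in list(stores[node_id]):
        master, standby = group_info[group_of(key, len(group_info))]
        if node_id in (master, standby):
            continue  # the object still belongs on this node
        data = stores[node_id].pop(key)
        stores[master][key] = data   # the master's storage node stores the object
        stores[standby][key] = data  # and a backup goes to the standby's storage node
        moved.append(key)
    return moved
```

In a real deployment the two writes would be network calls, and the source node would presumably delete its copy only after both are acknowledged; the text does not specify that ordering.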
Corresponding to the first data processing method, the embodiment of the present invention further provides a third data processing method. The third data processing method provided by the embodiment of the invention can be applied to a gateway node in a distributed storage system, and the distributed storage system further comprises a management node and a plurality of storage nodes.
As shown in fig. 8, a data processing method is applied to a gateway node in a distributed storage system, and the method includes:
S801, acquiring an object to be uploaded;
S802, determining a second target virtual group based on the identification of the object to be uploaded and the pre-stored virtual group information;
the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition.
And S803, sending the object to be uploaded to a storage node corresponding to the main node of the second target virtual group, so that the storage node stores the object to be uploaded, and backing up the object to be uploaded to the storage node corresponding to the standby node of the second target virtual group.
It can be seen that, in the scheme provided in the embodiment of the present invention, a gateway node may obtain an object to be uploaded and determine a second target virtual group based on an identifier of the object to be uploaded and pre-stored virtual group information, where the virtual group information includes a correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. The gateway node then sends the object to be uploaded to the storage node corresponding to the master node of the second target virtual group, so that the storage node stores the object to be uploaded and backs it up to the storage node corresponding to the standby node of the second target virtual group. The gateway node can thus quickly complete the storage of the object to be uploaded based on the virtual group information generated by the management node.
When a user needs to store a certain file, the file can be sent to the gateway node as an object to be uploaded, and then the gateway node can acquire the object to be uploaded. Further, the gateway node may determine a second target virtual group based on an identifier (key) of the object to be uploaded and virtual group information stored in advance.
The virtual group information includes the correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. That is, the virtual group information is the information about the virtual groups that the management node generates using the first data processing method.
Specifically, the gateway node may perform hash modulo processing on the identifier of the object to be uploaded, and determine the virtual group for storing the object to be uploaded, as the second target virtual group.
Then, the gateway node may send the object to be uploaded to the storage node corresponding to the master node of the second target virtual group. That storage node receives and stores the object to be uploaded, and backs it up to the storage node corresponding to the standby node of the second target virtual group. That is, the storage node corresponding to the master node sends the object to be uploaded to the storage node corresponding to the standby node, which then stores it as a backup.
For example, the gateway Node performs hash modulo processing on the identifier of the object to be uploaded, determines that the virtual group used for storing the object to be uploaded is VG-0, and then the gateway Node may send the object to be uploaded to the storage Node 1 corresponding to the master Node-0-1-32 of the virtual group VG-0, where the storage Node 1 stores the object to be uploaded, and backs up the object to be uploaded to the storage Node corresponding to the standby Node-0-2-17.
To make it easier to store the object to be uploaded, the virtual group information may further include the IP address of the storage node corresponding to each virtual node, so that after the gateway node determines the second target virtual group, it can locate the storage node corresponding to the master node of the second target virtual group and store the object to be uploaded there.
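As a hedged sketch of the gateway-side lookup, the function below resolves an object key to the IP addresses of the master and standby storage nodes. The names `upload_route`, `group_info`, and `node_ips`, the MD5 hash, and the sample addresses are all illustrative assumptions; only the identifier pattern `Node-<group>-<storage node>-<random>` comes from the examples in the text.

```python
import hashlib


def group_of(key, group_count):
    """Hash-modulo mapping (MD5 is an assumption, not from the text)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % group_count


def upload_route(key, group_info, node_ips):
    """Return (master IP, standby IP) for an object to be uploaded.

    group_info: {group index: (master virtual node ID, standby virtual node ID)}
    node_ips:   {storage node: IP address}, carried in the virtual group info.
    """
    master_id, standby_id = group_info[group_of(key, len(group_info))]
    master_node = master_id.split("-")[2]    # 'Node-0-1-32' -> '1'
    standby_node = standby_id.split("-")[2]  # 'Node-0-2-17' -> '2'
    return node_ips[master_node], node_ips[standby_node]


# Single-group example mirroring VG-0 from the text; the IPs are placeholders.
info = {0: ("Node-0-1-32", "Node-0-2-17")}
ips = {"1": "10.0.0.1", "2": "10.0.0.2"}
route = upload_route("video.mp4", info, ips)
```

The gateway would send the object only to the first address; per the text, the master's storage node is the one that forwards the backup to the standby.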
Corresponding to the first data processing method, the embodiment of the invention also provides a fourth data processing method. The fourth data processing method provided by the embodiment of the invention can be applied to a gateway node in a distributed storage system, and the distributed storage system further comprises a management node and a plurality of storage nodes.
As shown in fig. 9, a data processing method is applied to a gateway node in a distributed storage system, and the method includes:
S901, acquiring an object downloading instruction;
wherein the object download instruction comprises an identifier of an object to be downloaded.
S902, determining a third target virtual group based on the identification of the object to be downloaded and the pre-stored virtual group information;
the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition.
And S903, reading the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the main node of the third target virtual group.
It can be seen that, in the scheme provided in the embodiment of the present invention, the gateway node may obtain an object download instruction that includes an identifier of an object to be downloaded, and determine a third target virtual group based on the identifier of the object to be downloaded and pre-stored virtual group information, where the virtual group information includes a correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. The gateway node then reads the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the master node of the third target virtual group. The gateway node can thus quickly complete the object download based on the virtual group information generated by the management node.
When a user needs to download a certain file from the distributed storage system, an object downloading instruction can be sent to the gateway node, the object downloading instruction includes an identifier of an object to be downloaded, and the gateway node can determine a third target virtual group based on the identifier (key) of the object to be downloaded and the pre-stored virtual group information.
The virtual group information includes the correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. That is, the virtual group information is the information about the virtual groups that the management node generates using the first data processing method.
Specifically, the gateway node may perform hash modulo processing on the identifier of the object to be downloaded, and determine the virtual group in which the object to be downloaded is stored, as the third target virtual group. Then, the gateway node may read an object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the main node of the third target virtual group.
For example, the gateway node performs hash modulo processing on the identifier of the object to be downloaded and determines that the virtual group storing the object to be downloaded is VG-1. The gateway node may then read the object corresponding to the identifier of the object to be downloaded from storage node 3, which corresponds to the master node Node-1-3-58 of the virtual group VG-1.
To facilitate downloading of the object, the virtual group information may further include the IP (Internet Protocol) address of the storage node corresponding to each virtual node, so that after the gateway node determines the third target virtual group, it can locate the storage node corresponding to the master node of the third target virtual group and then read the object.
In one embodiment, after reading the object corresponding to the identifier of the object to be downloaded, the gateway node may return the object to the client for use by the user.
As an implementation manner of the embodiment of the present invention, in a case that reading an object corresponding to an identifier of an object to be downloaded from a master node of a third target virtual group fails, the method may further include:
and reading the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the standby node of the third target virtual group.
Because the storage node corresponding to the standby node of each virtual group holds a backup of the objects stored for the master node of that group, the gateway node can, when reading the object from the master node of the third target virtual group fails, read the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the standby node of the third target virtual group, so that the user's request can still be served.
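The master-first, standby-on-failure read path can be sketched as below. Everything here is an illustrative assumption: `ReadError`, `read_from`, and `download` are hypothetical names, `stores` is an in-memory stand-in for the storage nodes (a value of `None` models an offline node), and `group_info` maps a group index directly to the storage nodes of its master and standby.

```python
import hashlib


class ReadError(Exception):
    """Raised when a storage node is unreachable or lacks the object."""


def group_of(key, group_count):
    """Hash-modulo mapping (MD5 is an assumption, not from the text)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % group_count


def read_from(stores, node, key):
    """Simulated read from one storage node."""
    if stores.get(node) is None or key not in stores[node]:
        raise ReadError(f"cannot read {key!r} from {node}")
    return stores[node][key]


def download(key, group_info, stores):
    """Read from the master's storage node; fall back to the standby's."""
    master, standby = group_info[group_of(key, len(group_info))]
    try:
        return read_from(stores, master, key)
    except ReadError:
        # The standby's storage node holds a backup of the same object.
        return read_from(stores, standby, key)
```

If both reads fail, the sketch simply propagates the second `ReadError`; how the gateway reports that to the client is not covered by the text.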
Corresponding to the first data processing method, the embodiment of the invention also provides a data processing device. The first data processing apparatus provided in the embodiment of the present invention may be applied to a management node in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes.
As shown in fig. 10, a data processing apparatus applied to a management node in a distributed storage system includes:
a load information obtaining module 1010, configured to obtain load information of the storage node when a preset generation condition is met;
a virtual group generating module 1020, configured to generate a corresponding virtual node for each storage node based on the load information and a preset number of virtual groups, so as to obtain multiple virtual groups;
wherein each virtual group comprises a plurality of virtual nodes.
A master/standby node determining module 1030, configured to select two virtual nodes from each virtual group, where the two virtual nodes are respectively used as a master node and a standby node of the virtual group;
the first object storage module 1040 is configured to store each storage object into a storage node corresponding to the primary node, and store a backup of each storage object into a storage node corresponding to a corresponding backup node, based on the identifier of each storage object and the virtual group information.
The virtual group information includes the corresponding relationship between each virtual group and the master node and the standby node of the virtual group.
It can be seen that in the scheme provided in the embodiment of the present invention, a management node may obtain load information of storage nodes when a preset generation condition is satisfied, generate a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups, to obtain a plurality of virtual groups, where each virtual group includes a plurality of virtual nodes, select two virtual nodes from each virtual group to serve as a master node and a backup node of the virtual group, store each storage object in the storage node corresponding to the master node based on an identifier of each storage object and virtual group information, and store a backup of each storage object in the storage node corresponding to the backup node, where the virtual group information includes a corresponding relationship between each virtual group and the master node and the backup node of the virtual group. Therefore, the management node can generate corresponding virtual nodes according to the load information of the storage nodes, two virtual nodes are selected from each virtual group and are respectively used as a main node and a standby node of the virtual group, and the selected main node and the standby node take the load information of the storage nodes into consideration, so that the main node and the standby node are more reasonably selected, and the load balance of each storage node in the distributed storage system is realized.
As an implementation manner of the embodiment of the present invention, the preset generating condition includes at least one of:
one or more storage nodes in the distributed storage system go offline;
one or more storage nodes in the distributed storage system come online;
the load information of the storage nodes in the distributed storage system meets a preset unbalance condition;
a virtual group update instruction is received.
As an implementation manner of the embodiment of the present invention, the load information includes an available storage capacity of each storage node;
the virtual group generation module 1020 includes:
the quantity weight determining unit is used for determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion between the available storage capacity of each storage node and the total storage capacity of all the storage nodes;
and the virtual group generating unit is used for generating corresponding virtual nodes for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of preset virtual groups to obtain a plurality of virtual groups.
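A rough sketch of capacity-weighted virtual node generation follows. Several details are assumptions rather than statements from the text: the weight denominator is taken as the sum of the supplied available capacities, `slots_per_group` is a hypothetical knob that converts a weight into a per-group virtual node count, every storage node gets at least one virtual node per group, and the random suffix is drawn from a seeded generator so the example is repeatable.

```python
import random


def quantity_weights(available):
    """Weight of each storage node = its available capacity / summed capacity."""
    total = sum(available.values())
    return {node: cap / total for node, cap in available.items()}


def generate_virtual_groups(available, group_count, slots_per_group, seed=0):
    """Build group_count virtual groups of IDs 'Node-<group>-<node>-<random>'.

    slots_per_group is a hypothetical parameter: a storage node with weight w
    receives about w * slots_per_group virtual nodes in every group.
    """
    rng = random.Random(seed)
    weights = quantity_weights(available)
    groups = {}
    for g in range(group_count):
        members = []
        for node, w in weights.items():
            count = max(1, round(w * slots_per_group))
            for _ in range(count):
                members.append(f"Node-{g}-{node}-{rng.randrange(1_000_000)}")
        groups[f"VG-{g}"] = members
    return groups


# Storage node "2" has three times the free space of node "1",
# so it receives three times as many virtual nodes in each group.
groups = generate_virtual_groups({"1": 100, "2": 300}, group_count=4, slots_per_group=8)
```

Giving larger nodes more virtual nodes raises their chance of being picked as master or standby, which is how the capacity information ends up influencing placement.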
As an implementation manner of the embodiment of the present invention, the active/standby node determining module 1030 includes:
the hash calculation unit is used for performing hash operation on the identifier of each virtual node to obtain a hash value of each virtual node, wherein the identifier of each virtual node comprises the identifier of the virtual group to which the virtual node belongs, the identifier of the corresponding storage node and a random number;
the sorting unit is used for sorting the virtual nodes included in each virtual group according to the corresponding hash values to obtain a sorting result;
a duplicate removal processing unit, configured to perform duplicate removal processing on consecutive virtual nodes having the same storage node based on the sorting result for the virtual nodes included in each virtual group;
and the main/standby node determining unit is used for selecting two continuous virtual nodes from each virtual group after the duplicate removal processing, and the two continuous virtual nodes are respectively used as a main node and a standby node of the virtual group.
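The four units above (hash, sort, deduplicate, select) can be sketched for a single virtual group as follows. The choice of SHA-256, the function names, and the decision to take the first two entries after deduplication are assumptions; the text only requires that two consecutive virtual nodes be selected from the deduplicated ordering.

```python
import hashlib


def storage_node_of(virtual_node_id):
    """'Node-0-2-17' -> '2' (group ID, storage node ID, random number)."""
    return virtual_node_id.split("-")[2]


def pick_master_and_standby(virtual_nodes):
    """Select (master, standby) virtual nodes for one virtual group."""
    # 1. Hash every virtual node identifier and sort by the hash value.
    ordered = sorted(
        virtual_nodes,
        key=lambda v: hashlib.sha256(v.encode("utf-8")).hexdigest(),
    )
    # 2. Collapse runs of consecutive virtual nodes on the same storage node,
    #    so that neighbours always sit on different storage nodes.
    deduped = []
    for v in ordered:
        if not deduped or storage_node_of(deduped[-1]) != storage_node_of(v):
            deduped.append(v)
    if len(deduped) < 2:
        raise ValueError("need virtual nodes from at least two storage nodes")
    # 3. Take two consecutive entries (the first two, by assumption).
    return deduped[0], deduped[1]
```

Because adjacent entries in the deduplicated list never share a storage node, the master and its standby always land on different physical storage nodes, which is what makes the backup useful when one node fails.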
As an implementation manner of the embodiment of the present invention, the distributed storage system further includes a monitoring node;
the load information obtaining module 1010 is specifically configured to acquire the load information of the storage nodes reported by the monitoring node.
As an implementation manner of the embodiment of the present invention, the first object storage module 1040 includes:
the first object storage unit is configured to send the virtual group information to the storage nodes, so that each storage node determines a target storage object that does not belong to the storage node based on the virtual group information, determine a first target virtual group based on an identifier of the target storage object and an identifier of each virtual group, migrate the target storage object to a storage node corresponding to a primary node of the first target virtual group, so that the storage node corresponding to the primary node of the first target virtual group stores the target storage object, and store a backup of the target storage object to a storage node corresponding to a standby node of the first target virtual group.
As an implementation manner of the embodiment of the present invention, the distributed storage system further includes a gateway node; the device further comprises:
and the virtual group information sending module is used for sending the virtual group information to the gateway node so that the gateway node stores the virtual group information.
Corresponding to the second data processing method, the embodiment of the invention also provides a data processing device. The second data processing apparatus provided in the embodiment of the present invention may be applied to a storage node in a distributed storage system, where the distributed storage system further includes a management node.
As shown in fig. 11, a data processing apparatus applied to a storage node in a distributed storage system includes:
a virtual group information receiving module 1110, configured to receive virtual group information sent by the management node;
the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition.
A first virtual group determining module 1120, configured to determine a target storage object not belonging to the storage node based on the virtual group information, and determine a first target virtual group based on an identifier of the target storage object and an identifier of each virtual group;
the second object storage module 1130 is configured to migrate the target storage object to a storage node corresponding to the primary node of the first target virtual group, so that the storage node corresponding to the primary node of the first target virtual group stores the target storage object, and store the backup of the target storage object to the storage node corresponding to the backup node of the first target virtual group.
As can be seen, in the solution provided in the embodiment of the present invention, a storage node may receive virtual group information sent by a management node, where the virtual group information includes a correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. The storage node may then determine, based on the virtual group information, a target storage object that does not belong to it, and determine a first target virtual group based on the identifier of the target storage object and the identifier of each virtual group. It then migrates the target storage object to the storage node corresponding to the master node of the first target virtual group, so that this storage node stores the target storage object and stores a backup of the target storage object on the storage node corresponding to the standby node of the first target virtual group. In this way, the storage nodes can migrate storage objects based on the virtual group information sent by the management node, ensuring load balance across the storage nodes.
Corresponding to the third data processing method, an embodiment of the present invention further provides a data processing apparatus. The third data processing apparatus provided in the embodiment of the present invention may be applied to a gateway node in a distributed storage system, where the distributed storage system further includes a management node and a plurality of storage nodes.
As shown in fig. 12, a data processing apparatus applied to a gateway node in a distributed storage system includes:
an object obtaining module 1210, configured to obtain an object to be uploaded;
a second virtual group determining module 1220, configured to determine a second target virtual group based on the identifier of the object to be uploaded and pre-stored virtual group information;
the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition.
The third object storage module 1230 is configured to send the object to be uploaded to a storage node corresponding to the primary node of the second target virtual group, so that the storage node stores the object to be uploaded, and backs up the object to be uploaded to a storage node corresponding to the backup node of the second target virtual group.
It can be seen that, in the scheme provided in the embodiment of the present invention, a gateway node may obtain an object to be uploaded and determine a second target virtual group based on an identifier of the object to be uploaded and pre-stored virtual group information, where the virtual group information includes a correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. The gateway node then sends the object to be uploaded to the storage node corresponding to the master node of the second target virtual group, so that the storage node stores the object to be uploaded and backs it up to the storage node corresponding to the standby node of the second target virtual group. The gateway node can thus quickly complete the storage of the object to be uploaded based on the virtual group information generated by the management node.
Corresponding to the fourth data processing method, an embodiment of the present invention further provides a data processing apparatus. The fourth data processing apparatus provided in the embodiment of the present invention may be applied to a gateway node in a distributed storage system, where the distributed storage system further includes a management node and a plurality of storage nodes.
As shown in fig. 13, a data processing apparatus applied to a gateway node in a distributed storage system includes:
a download instruction obtaining module 1310, configured to obtain an object download instruction;
wherein the object download instruction comprises an identifier of an object to be downloaded.
A third virtual group determining module 1320, configured to determine a third target virtual group based on the identifier of the object to be downloaded and pre-stored virtual group information;
the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition.
An object downloading module 1330, configured to read an object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the main node of the third target virtual group.
It can be seen that, in the scheme provided in the embodiment of the present invention, the gateway node may obtain an object download instruction that includes an identifier of an object to be downloaded, and determine a third target virtual group based on the identifier of the object to be downloaded and pre-stored virtual group information, where the virtual group information includes a correspondence between each virtual group and the master node and standby node of that virtual group, and the virtual groups are obtained by the management node generating, when a preset generation condition is met, corresponding virtual nodes for each storage node based on load information of the storage nodes and a preset number of virtual groups. The gateway node then reads the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the master node of the third target virtual group. The gateway node can thus quickly complete the object download based on the virtual group information generated by the management node.
As an implementation manner of the embodiment of the present invention, the apparatus further includes:
and the object reading module is used for reading the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the standby node of the third target virtual group under the condition that reading of the object corresponding to the identifier of the object to be downloaded from the main node of the third target virtual group fails.
Corresponding to the data processing method, the embodiment of the invention also provides a distributed storage system. As shown in fig. 1, a distributed storage system includes a management node 110 and a plurality of storage nodes 120, wherein:
the management node 110 is configured to, when a preset generation condition is met, obtain load information of the storage node 120; generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups; selecting two virtual nodes from each virtual group as a main node and a standby node of the virtual group respectively; based on the identification and the virtual group information of each storage object, storing each storage object into the storage node 120 corresponding to the main node, and storing the backup of each storage object into the storage node 120 corresponding to the corresponding backup node;
each virtual group comprises a plurality of virtual nodes, and the virtual group information comprises the corresponding relation between each virtual group and the master node and the standby node of the virtual group.
The storage node 120 is configured to store a storage object and/or a backup of the storage object.
Therefore, in the scheme provided by the embodiment of the invention, the management node can acquire the load information of the storage node under the condition that the preset generation condition is met; generating a corresponding virtual node for each storage node based on the load information and the number of the preset virtual groups to obtain a plurality of virtual groups; selecting two virtual nodes from each virtual group, and respectively using the two virtual nodes as a main node and a standby node of the virtual group; storing each storage object into a storage node corresponding to the main node and storing the backup of each storage object into a storage node corresponding to the corresponding backup node based on the identification of each storage object and the virtual group information; each virtual group comprises a plurality of virtual nodes, and the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group. The storage nodes may store storage objects and/or backups of the storage objects. Therefore, the management node can generate corresponding virtual nodes according to the load information of the storage nodes, two virtual nodes are selected from each virtual group and are respectively used as a main node and a standby node of the virtual group, and the selected main node and the standby node take the load information of the storage nodes into consideration, so that the main node and the standby node are more reasonably selected, and the load balance of each storage node in the distributed storage system is realized.
As an implementation manner of the embodiment of the present invention, the preset generating condition includes at least one of:
one or more storage nodes in the distributed storage system go offline;
one or more storage nodes in the distributed storage system come online;
the load information of the storage nodes in the distributed storage system meets a preset unbalance condition;
a virtual group update instruction is received.
As an implementation manner of the embodiment of the present invention, the load information includes an available storage capacity of each storage node;
the management node 110 is specifically configured to determine, according to a ratio between the available storage capacity of each storage node and the total storage capacity of all the storage nodes, a quantity weight of the virtual nodes corresponding to each storage node; and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of the preset virtual groups to obtain a plurality of virtual groups.
As an implementation manner of the embodiment of the present invention, the management node 110 is specifically configured to perform a hash operation on an identifier of each virtual node to obtain a hash value of each virtual node; sorting the virtual nodes included in each virtual group according to the corresponding hash value to obtain a sorting result; for the virtual nodes included in each virtual group, performing deduplication processing on continuous virtual nodes with the same storage node based on the sorting result; and selecting two continuous virtual nodes from each virtual group after the duplicate removal processing, and respectively using the two continuous virtual nodes as a main node and a standby node of the virtual group.
The identifier of each virtual node includes an identifier of a virtual group to which the virtual node belongs, an identifier of a corresponding storage node, and a random number.
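The hash, sort, deduplicate, and select steps can be sketched as below. Two points are assumptions made for illustration: the text does not name a hash function (SHA-256 is used here), and it does not say which pair of consecutive virtual nodes is chosen (the first two are used here). Virtual nodes are the same hypothetical tuples of virtual group identifier, storage node identifier, and random number.

```python
import hashlib


def vnode_hash(vnode):
    """Hash the identifier built from group id, storage node id and random number."""
    group_id, storage_node, nonce = vnode
    ident = f"{group_id}:{storage_node}:{nonce}".encode()
    return int.from_bytes(hashlib.sha256(ident).digest()[:8], "big")


def select_main_and_standby(vnodes):
    """Return (main, standby) virtual nodes for one virtual group."""
    ordered = sorted(vnodes, key=vnode_hash)
    # Collapse runs of consecutive virtual nodes that map to the same storage node.
    deduped = []
    for vnode in ordered:
        if not deduped or deduped[-1][1] != vnode[1]:
            deduped.append(vnode)
    if len(deduped) < 2:
        raise ValueError("virtual group must span at least two storage nodes")
    # Assumed choice: the first two consecutive entries after deduplication.
    return deduped[0], deduped[1]
```

Because adjacent entries differ in storage node after deduplication, any two consecutive virtual nodes map to different storage nodes, so an object and its backup never land on the same machine.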
As an implementation manner of the embodiment of the present invention, the management node 110 is specifically configured to send the virtual group information to the storage nodes, so that each storage node: determines, based on the virtual group information, a target storage object that does not belong to that storage node; determines a first target virtual group based on the identifier of the target storage object and the identifier of each virtual group; migrates the target storage object to the storage node corresponding to the primary node of the first target virtual group, so that this storage node stores the target storage object; and stores a backup of the target storage object in the storage node corresponding to the standby node of the first target virtual group.
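The migration decision made by each storage node can be sketched as follows. The mapping from object identifier to virtual group (a hash of the identifier taken modulo the number of groups) is an assumption for illustration, as are the function names and the shape of `group_info`, which here maps each virtual group identifier to a pair of storage node identifiers for the primary and standby nodes.

```python
import hashlib


def target_group(object_id, group_ids):
    """Assumed mapping: hash of the object identifier modulo the number of groups."""
    ordered = sorted(group_ids)
    digest = hashlib.sha256(object_id.encode()).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


def plan_migrations(local_node, local_objects, group_info):
    """List (object id, new primary storage node, new standby storage node)
    for every locally held object that no longer belongs to local_node."""
    plan = []
    for object_id in local_objects:
        primary, standby = group_info[target_group(object_id, group_info)]
        if primary != local_node:
            plan.append((object_id, primary, standby))
    return plan
```

A storage node would then send each planned object to the listed primary storage node and its backup to the listed standby storage node; objects absent from the plan stay where they are.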
As an implementation manner of the embodiment of the present invention, as shown in fig. 5, the distributed storage system further includes a monitoring node 130;
the monitoring node 130 is configured to obtain load information of the storage node 120, and report the load information to the management node 110;
the management node 110 is specifically configured to obtain the load information reported by the monitoring node 130.
As an implementation manner of the embodiment of the present invention, as shown in fig. 6, the distributed storage system further includes a gateway node 140;
the management node 110 is further configured to send the virtual group information to the gateway node 140;
the gateway node 140 is configured to store the virtual group information.
As an implementation manner of the embodiment of the present invention, the gateway node 140 is further configured to obtain an object to be uploaded; determine a second target virtual group based on the identifier of the object to be uploaded and the virtual group information; and send the object to be uploaded to the storage node corresponding to the main node of the second target virtual group;
and the storage node corresponding to the main node of the second target virtual group is used for storing the object to be uploaded and backing up the object to be uploaded to the storage node corresponding to the standby node of the second target virtual group.
As an implementation manner of the embodiment of the present invention, the gateway node 140 is further configured to obtain an object downloading instruction; determine a third target virtual group based on the identifier of the object to be downloaded and the virtual group information; and read the object corresponding to the identifier of the object to be downloaded from the storage node corresponding to the main node of the third target virtual group;
wherein the object downloading instruction comprises an identifier of the object to be downloaded.
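The upload and download paths through the gateway node can be sketched with in-memory dictionaries standing in for the storage nodes. The `Gateway` class, its method names, and the hash-modulo mapping from object identifier to virtual group are assumptions for illustration. In the described system the backup is written by the storage node of the main node; the sketch performs that write inline to stay self-contained.

```python
import hashlib


class Gateway:
    """Hypothetical gateway that keeps a copy of the virtual group information."""

    def __init__(self, group_info, stores):
        # group_info: group id -> (main storage node id, standby storage node id)
        self.group_info = dict(group_info)
        # stores: storage node id -> dict, a stand-in for a remote storage node
        self.stores = stores

    def _group_for(self, object_id):
        # Assumed mapping: hash of the object identifier modulo the group count.
        ordered = sorted(self.group_info)
        digest = hashlib.sha256(object_id.encode()).digest()
        return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]

    def upload(self, object_id, data):
        main, standby = self.group_info[self._group_for(object_id)]
        self.stores[main][object_id] = data
        # Done by the main node's storage node in the described system.
        self.stores[standby][object_id] = data
        return main, standby

    def download(self, object_id):
        main, _ = self.group_info[self._group_for(object_id)]
        return self.stores[main][object_id]
```

Because upload and download derive the virtual group from the same identifier and the same stored virtual group information, a download reaches the storage node that received the upload without any lookup at the management node.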
An embodiment of the present invention further provides a management node. As shown in fig. 14, the management node may include a processor 1401, a communication interface 1402, a memory 1403 and a communication bus 1404, wherein the processor 1401, the communication interface 1402 and the memory 1403 communicate with each other via the communication bus 1404,
a memory 1403 for storing a computer program;
the processor 1401 is configured to implement the first data processing method steps according to any of the above embodiments when executing the program stored in the memory 1403.
An embodiment of the present invention further provides a storage node. As shown in fig. 15, the storage node may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, where the processor 1501, the communication interface 1502, and the memory 1503 communicate with each other through the communication bus 1504,
a memory 1503 for storing a computer program;
the processor 1501 is configured to implement the steps of the second data processing method according to any one of the embodiments described above when executing the program stored in the memory 1503.
An embodiment of the present invention further provides a gateway node. As shown in fig. 16, the gateway node may include a processor 1601, a communication interface 1602, a memory 1603, and a communication bus 1604, where the processor 1601, the communication interface 1602, and the memory 1603 communicate with each other via the communication bus 1604,
a memory 1603 for storing a computer program;
the processor 1601 is configured to implement the third and fourth data processing method steps described in any of the above embodiments when executing the program stored in the memory 1603.
The communication bus mentioned in the management node, the storage node, and the gateway node may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the management node, the storage node or the gateway node and other devices.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the steps of the data processing method described in any of the above embodiments are implemented.
It should be noted that the above apparatus, system, management node, storage node, gateway node and computer-readable storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A data processing method applied to a management node in a distributed storage system, the distributed storage system further including a plurality of storage nodes, the method comprising:
acquiring load information of the storage node under the condition that a preset generation condition is met;
generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups, wherein each virtual group comprises a plurality of virtual nodes;
selecting two virtual nodes from each virtual group, and respectively using the two virtual nodes as a main node and a standby node of the virtual group;
storing each storage object into a storage node corresponding to the main node and storing a backup of each storage object into a storage node corresponding to a corresponding standby node based on the identification of each storage object and virtual group information, wherein the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group;
wherein the load information comprises an available storage capacity of each storage node; the step of generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups includes:
determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes; and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of the preset virtual groups to obtain a plurality of virtual groups.
2. The method of claim 1, wherein the preset generation condition comprises at least one of:
one or more storage nodes in the distributed storage system go offline;
one or more storage nodes in the distributed storage system come online;
the load information of the storage nodes in the distributed storage system meets a preset unbalance condition;
a virtual group update instruction is received.
3. The method of claim 1, wherein said step of selecting two virtual nodes from each of said virtual groups as a master node and a backup node for the virtual group comprises:
performing hash operation on the identifier of each virtual node to obtain a hash value of each virtual node, wherein the identifier of each virtual node comprises the identifier of the virtual group to which the virtual node belongs, the identifier of the corresponding storage node and a random number;
sorting the virtual nodes included in each virtual group according to the corresponding hash value to obtain a sorting result;
for the virtual nodes included in each virtual group, performing deduplication processing on continuous virtual nodes with the same storage node based on the sorting result;
and selecting two continuous virtual nodes from each virtual group after the duplicate removal processing, and respectively using the two continuous virtual nodes as a main node and a standby node of the virtual group.
4. The method of claim 1, wherein the distributed storage system further comprises a monitoring node;
the step of obtaining the load information of the storage node includes:
and acquiring the load information of the storage node reported by the monitoring node.
5. The method according to any one of claims 1-4, wherein the step of storing each storage object in the storage node corresponding to the primary node and storing the backup of each storage object in the storage node corresponding to the corresponding backup node based on the identifier of each storage object and the virtual group information comprises:
and sending the virtual group information to the storage nodes, so that each storage node determines a target storage object which does not belong to the storage node based on the virtual group information, determines a first target virtual group based on the identification of the target storage object and the identification of each virtual group, migrates the target storage object to a storage node corresponding to a main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and stores the backup of the target storage object to a storage node corresponding to a standby node of the first target virtual group.
6. The method of any of claims 1-4, wherein the distributed storage system further comprises a gateway node; the method further comprises the following steps:
and sending the virtual group information to the gateway node so that the gateway node stores the virtual group information.
7. A data processing method applied to a storage node in a distributed storage system, the distributed storage system further including a management node, the method comprising:
receiving virtual group information sent by the management node, wherein the virtual group information includes a corresponding relationship between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating a corresponding virtual node for each storage node based on load information of the storage node and the number of preset virtual groups when the management node meets a preset generation condition;
determining a target storage object which does not belong to the storage node based on the virtual group information, and determining a first target virtual group based on the identification of the target storage object and the identification of each virtual group;
migrating the target storage object to a storage node corresponding to a main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and storing a backup of the target storage object to a storage node corresponding to a backup node of the first target virtual group;
wherein the load information comprises an available storage capacity of each storage node; the step of generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups includes:
determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes; and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of preset virtual groups to obtain a plurality of virtual groups.
8. A data processing apparatus, applied to a management node in a distributed storage system, the distributed storage system further including a plurality of storage nodes, the apparatus comprising:
the load information acquisition module is used for acquiring the load information of the storage node under the condition that a preset generation condition is met;
a virtual group generation module, configured to generate a corresponding virtual node for each storage node based on the load information and a preset number of virtual groups, so as to obtain multiple virtual groups, where each virtual group includes multiple virtual nodes;
a master/standby node determining module, configured to select two virtual nodes from each virtual group, where the two virtual nodes are respectively used as a master node and a standby node of the virtual group;
the first object storage module is used for storing each storage object into a storage node corresponding to the main node and storing the backup of each storage object into a storage node corresponding to a corresponding standby node based on the identification of each storage object and virtual group information, wherein the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group;
wherein the load information comprises an available storage capacity of each storage node; the virtual group generation module includes:
the quantity weight determining unit is used for determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes;
and the virtual group generating unit is used for generating corresponding virtual nodes for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of preset virtual groups to obtain a plurality of virtual groups.
9. A data processing apparatus, applied to a storage node in a distributed storage system, the distributed storage system further including a management node, the apparatus comprising:
a virtual group information receiving module, configured to receive virtual group information sent by the management node, where the virtual group information includes a correspondence between each virtual group and a master node and a standby node of the virtual group, and the virtual group is obtained by generating, by the management node, a corresponding virtual node for each storage node based on load information of the storage node and a number of preset virtual groups when a preset generation condition is satisfied;
a first virtual group determining module, configured to determine, based on the virtual group information, a target storage object that does not belong to the storage node, and determine a first target virtual group based on an identifier of the target storage object and an identifier of each virtual group;
the second object storage module is used for migrating the target storage object to a storage node corresponding to the main node of the first target virtual group, so that the storage node corresponding to the main node of the first target virtual group stores the target storage object, and stores the backup of the target storage object to the storage node corresponding to the standby node of the first target virtual group;
wherein the load information comprises an available storage capacity of each storage node; the step of generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups includes:
determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes; and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of the preset virtual groups to obtain a plurality of virtual groups.
10. A distributed storage system, comprising a management node and a plurality of storage nodes, wherein:
the management node is used for acquiring the load information of the storage node under the condition that a preset generation condition is met; generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups; selecting two virtual nodes from each virtual group, and respectively using the two virtual nodes as a main node and a standby node of the virtual group; storing each storage object into a storage node corresponding to the main node and storing a backup of each storage object into a storage node corresponding to a corresponding standby node based on the identification of each storage object and virtual group information, wherein each virtual group comprises a plurality of virtual nodes, and the virtual group information comprises the corresponding relation between each virtual group and the main node and the standby node of the virtual group;
the storage node is used for storing a storage object and/or a backup of the storage object;
wherein the load information comprises an available storage capacity of each storage node; the step of generating a corresponding virtual node for each storage node based on the load information and the number of preset virtual groups to obtain a plurality of virtual groups includes:
determining the quantity weight of the virtual nodes corresponding to each storage node according to the proportion of the available storage capacity of each storage node to the total storage capacity of all the storage nodes; and generating a corresponding virtual node for each storage node based on the quantity weight of the virtual nodes corresponding to each storage node and the quantity of the preset virtual groups to obtain a plurality of virtual groups.
11. The system of claim 10, wherein the distributed storage system further comprises a monitoring node;
the monitoring node is used for acquiring the load information of the storage node and reporting the load information to the management node;
the management node is specifically configured to obtain the load information reported by the monitoring node.
12. The system of claim 10 or 11, wherein the distributed storage system further comprises a gateway node;
the management node is further configured to send the virtual group information to the gateway node;
the gateway node is configured to store the virtual group information.
13. The system of claim 12,
the gateway node is also used for acquiring an object to be uploaded; determining a second target virtual group based on the identification of the object to be uploaded and the virtual group information; sending the object to be uploaded to a storage node corresponding to the main node of the second target virtual group;
and the storage node corresponding to the main node of the second target virtual group is used for storing the object to be uploaded and backing up the object to be uploaded to the storage node corresponding to the standby node of the second target virtual group.
14. The system of claim 12,
the gateway node is also used for acquiring an object downloading instruction; determining a third target virtual group based on the identification of the object to be downloaded and the virtual group information; and reading an object corresponding to the identifier of the object to be downloaded from a storage node corresponding to the main node of the third target virtual group, wherein the object downloading instruction comprises the identifier of the object to be downloaded.
CN202110346412.0A 2021-03-31 2021-03-31 Data processing method and device and distributed storage system Active CN113055495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110346412.0A CN113055495B (en) 2021-03-31 2021-03-31 Data processing method and device and distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110346412.0A CN113055495B (en) 2021-03-31 2021-03-31 Data processing method and device and distributed storage system

Publications (2)

Publication Number Publication Date
CN113055495A CN113055495A (en) 2021-06-29
CN113055495B true CN113055495B (en) 2022-11-04

Family

ID=76516620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110346412.0A Active CN113055495B (en) 2021-03-31 2021-03-31 Data processing method and device and distributed storage system

Country Status (1)

Country Link
CN (1) CN113055495B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597567A (en) * 2017-09-30 2019-04-09 网宿科技股份有限公司 A kind of data processing method and device
WO2020010503A1 (en) * 2018-07-10 2020-01-16 深圳花儿数据技术有限公司 Multi-layer consistent hashing-based distributed data storage method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101117402B1 (en) * 2009-09-02 2012-03-02 한양대학교 산학협력단 Virtualized service management system and method and virtualized service system and virtualized service providing method for providing high-performance cluster
CN105657064B (en) * 2016-03-24 2019-03-12 东南大学 Swift load-balancing method based on dummy node storage optimization

Also Published As

Publication number Publication date
CN113055495A (en) 2021-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant