CN110764690B - Distributed storage system and leader node election method and device thereof - Google Patents

Distributed storage system and leader node election method and device thereof

Info

Publication number
CN110764690B
Authority
CN
China
Prior art keywords
data storage
node
election
information
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810850568.0A
Other languages
Chinese (zh)
Other versions
CN110764690A (en
Inventor
陈希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority to CN201810850568.0A priority Critical patent/CN110764690B/en
Publication of CN110764690A publication Critical patent/CN110764690A/en
Application granted granted Critical
Publication of CN110764690B publication Critical patent/CN110764690B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a distributed storage system and a leader node election method and device thereof. The distributed storage system comprises a plurality of data storage devices, and the data storage devices are associated with corresponding master processes. Each master process is used for dividing its associated data storage device to form a plurality of data storage nodes and configuring a corresponding control thread for each data storage node. The control thread is used for: if the corresponding data storage node is not the leader node and the control thread does not receive, within the time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, determining that the data storage node is a candidate node; sending first election information to the other data storage nodes in the node group to instruct them to feed back corresponding first response information according to the first election information; and determining, according to the first response information, whether the data storage node is elected as the leader node of the node group.

Description

Distributed storage system and leader node election method and device thereof
Technical Field
The application relates to the technical field of computers, in particular to a distributed storage system and a leader node election method and device thereof.
Background
A distributed storage system is a popular data storage technology, and in the distributed storage system, a virtual storage space may be established by using storage spaces in a plurality of data storage devices located at different geographic locations, and data storage is implemented in the virtual storage space, so as to store data in each data storage device in a scattered manner.
As the amount of stored data grows, the distributed storage system needs to ensure high reliability. For example, when a large number of data storage devices in the distributed storage system are down, the distributed storage system needs to ensure, as far as possible, that the data storage performance of the other data storage devices is not affected, so that upstream services continue to run smoothly.
Based on this, it is necessary to provide a technical solution to improve the reliability of the distributed storage system.
Disclosure of Invention
The embodiment of the application aims to provide a distributed storage system and a leader node election method and device thereof, so as to improve the reliability of the distributed storage system.
In order to achieve the technical purpose, the embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a leader node election method for a distributed storage system, where the distributed storage system includes multiple data storage devices, the data storage devices are associated with corresponding master processes, and the master processes are configured to divide the associated data storage devices to form multiple data storage nodes and configure corresponding control threads for the data storage nodes, where the method includes:
if the data storage node is not a leader node, and the corresponding control thread does not receive, within the time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, determining that the data storage node is a candidate node in the node group to which the data storage node belongs, wherein each data storage node in the node group is used for storing the same data and the data storage nodes are respectively located in different data storage devices;
and sending first election information to control threads corresponding to other data storage nodes in the node group through the control threads corresponding to the data storage nodes so as to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage nodes are selected as leader nodes of the node group according to the fed back first response information.
In a second aspect, an embodiment of the present application provides a distributed storage system, including: the system comprises a plurality of data storage devices, a plurality of control devices and a plurality of control modules, wherein the data storage devices are associated with corresponding master control processes;
the main control process is used for dividing the associated data storage equipment to form a plurality of data storage nodes and configuring corresponding control threads for the data storage nodes;
the control thread is used for determining that the data storage node is a candidate node in the node group to which the data storage node belongs if the corresponding data storage node is not the leader node and does not receive the heartbeat message sent by the leader node in the node group to which the data storage node belongs within the time threshold, sending first election information to the control threads corresponding to other data storage nodes in the node group to indicate the control threads corresponding to other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage node is selected as the leader node of the node group according to the fed back first response information;
and the data storage nodes in the node group are used for storing the same data and are respectively positioned in different data storage devices.
In a third aspect, an embodiment of the present application provides a leader node election device for a distributed storage system, where the distributed storage system includes a plurality of data storage devices, the data storage devices are associated with corresponding master processes, and the master processes are configured to divide the associated data storage devices to form a plurality of data storage nodes and configure corresponding control threads for the data storage nodes, and the device includes:
a determining module, configured to determine that the data storage node is a candidate node in the node group to which the data storage node belongs if the data storage node is not a leader node and the corresponding control thread does not receive, within the time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, where each data storage node in the node group is used to store the same data and the data storage nodes are respectively located in different data storage devices;
and the node election module is used for sending first election information to the control threads corresponding to other data storage nodes in the node group through the control threads corresponding to the data storage nodes so as to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage nodes are selected as leader nodes of the node group according to the fed back first response information.
In a fourth aspect, an embodiment of the present application provides a leader node election device for a distributed storage system, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the leader node election method of the first aspect described above.
In a fifth aspect, an embodiment of the present application provides a storage medium for storing computer-executable instructions, where the computer-executable instructions, when executed, implement the leader node election method according to the first aspect.
According to this technical scheme, the election process of the leader node can be maintained by the control thread corresponding to each data storage node, so that the election work of the leader node is delegated down to each data storage device. After the data storage device where the leader node is located goes down, the other data storage devices can autonomously elect a leader node, so that the data storage process of the distributed storage system is not affected and the reliability of the distributed storage system is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
Fig. 2a is a schematic diagram of a master process dividing associated data storage devices to form a plurality of data storage nodes according to an embodiment of the present application;
Fig. 2b is another schematic diagram of a master process dividing associated data storage devices to form a plurality of data storage nodes according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a distributed storage system according to another embodiment of the present application;
Fig. 4 is a schematic flowchart of a leader node election method according to an embodiment of the present application;
Fig. 5a is a schematic view of an election process of a leader node according to an embodiment of the present application;
Fig. 5b is a schematic view of an election process of a leader node according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a distributed storage system according to another embodiment of the present application;
Fig. 6 is a schematic structural diagram of a distributed storage system according to another embodiment of the present application;
Fig. 7 is a schematic structural diagram of a leader node election device according to an embodiment of the present application;
Fig. 8 is a schematic diagram illustrating a result of a leader node election device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.
The embodiment of the application provides a distributed storage system and a leader node election method and device thereof, so that the reliability of the distributed storage system is improved.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application, and as shown in fig. 1, the distributed storage system includes a plurality of data storage devices, such as data storage devices 11, 12, 13, 14, 15 and the like shown in fig. 1, the data storage devices are associated with corresponding master processes, such as master process 100 in fig. 1, the master processes are configured to divide the associated data storage devices to form a plurality of data storage nodes, such as data storage nodes 1001, 1002, 1003, 1004 and the like in fig. 1, and the master processes further configure corresponding control threads, such as control threads 10001, 10002, 10003, 10004 and the like in fig. 1, for each data storage node. In a particular embodiment, each data storage device in the distributed storage system is associated with a corresponding master process.
In the system shown in fig. 1, the data storage devices have corresponding data storage spaces, and the master process divides the associated data storage devices to form a plurality of data storage nodes.
Fig. 2a is a schematic diagram of a master process dividing an associated data storage device to form a plurality of data storage nodes according to an embodiment of the present disclosure, and fig. 2b is another schematic diagram of a master process dividing an associated data storage device to form a plurality of data storage nodes according to an embodiment of the present disclosure, as shown in fig. 2a and fig. 2b, the master process acquires a size of a data storage space of the associated data storage device, for example, 10T, and the master process divides the associated data storage device to form 5 data storage nodes in a manner that each 2T storage space corresponds to one data storage node.
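The division described above can be sketched as follows. This is a minimal illustration only, assuming fixed-size nodes and capacities expressed in whole terabytes; the names DataStorageNode and partition_device are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataStorageNode:
    """Hypothetical record for one data storage node carved out of a device."""
    device_id: int
    index: int          # position of the node inside its device
    capacity_tb: int    # storage space assigned to the node, in terabytes


def partition_device(device_id: int, device_capacity_tb: int,
                     node_capacity_tb: int = 2) -> List[DataStorageNode]:
    """Divide a device into nodes of node_capacity_tb each (assumed fixed size)."""
    if node_capacity_tb <= 0:
        raise ValueError("node_capacity_tb must be positive")
    count = device_capacity_tb // node_capacity_tb
    return [DataStorageNode(device_id, i, node_capacity_tb) for i in range(count)]


# A 10T device divided into 2T nodes, as in the example above.
nodes = partition_device(device_id=11, device_capacity_tb=10)
```

With the 10T device and 2T per node from the text, the list holds five nodes. How a remainder that does not fill a whole node would be handled is not stated in the application; this sketch simply leaves it unassigned.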
In this embodiment, when the data storage device stores data, a data storage node corresponding to the data is determined, and the data is stored in a data storage space corresponding to the data storage node.
In this embodiment of the present application, data storage nodes located in different data storage devices are further selected to form a node group. One data storage node in the node group is the leader node (leader), the other data storage nodes are follower nodes (followers), and each data storage node in the node group is used to store the same data. Fig. 3 is a schematic structural diagram of a distributed storage system according to another embodiment of the present application. As shown in fig. 3, taking three data storage devices as an example, the first data storage nodes of the three data storage devices are selected to form a first node group, and one of these first data storage nodes is determined to be the leader node; the second data storage nodes of the three data storage devices are selected to form a second node group, and one of these second data storage nodes is determined to be the leader node; and the third data storage nodes of the three data storage devices are selected to form a third node group, and one of these third data storage nodes is determined to be the leader node. During distributed data storage, after the leader node acquires a data storage request, the leader node stores the data in its corresponding data storage space and notifies the follower nodes to store the same data in their corresponding data storage spaces, so that copies of the same data are stored in different data storage devices.
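The grouping in the three-device example can be sketched as below. It assumes, as in fig. 3, that the i-th node of every device joins the i-th node group; the function name build_node_groups and the (device, index) tuples are hypothetical, and other grouping rules are equally possible.

```python
from typing import Dict, List, Tuple

NodeId = Tuple[int, int]  # (device_id, node_index); an illustrative identifier


def build_node_groups(device_ids: List[int],
                      nodes_per_device: int) -> Dict[int, List[NodeId]]:
    """Group the i-th node of each device into node group i."""
    return {
        group: [(device, group) for device in device_ids]
        for group in range(nodes_per_device)
    }


groups = build_node_groups(device_ids=[11, 12, 13], nodes_per_device=3)
```

Each group built this way contains exactly one node per device, which matches the requirement that the members of a node group are located in different data storage devices.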
In the distributed storage system, each data storage node is configured with a corresponding control thread by the main control process, the main control process and the configured control thread share a memory, and the control thread configured by the main control process can be regarded as a thread object generated by the main control process. The control thread is used for maintaining leader node election work of the node group, and the control thread can determine the node group to which the corresponding data storage node belongs and maintain the leader node election work of the node group when the leader node does not exist in the node group.
Based on the above-described distributed data storage system, fig. 4 is a schematic flowchart of a leader node election method according to an embodiment of the present application, and as shown in fig. 4, the flow includes the following steps:
Step S402, if the data storage node is not a leader node and the corresponding control thread does not receive, within the time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, determining that the data storage node is a candidate node in the node group to which the data storage node belongs, wherein each data storage node in the node group is used for storing the same data and the data storage nodes are respectively located in different data storage devices;
Step S404, sending first election information to the control threads corresponding to the other data storage nodes in the node group through the control thread corresponding to the data storage node, so as to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage node is elected as the leader node of the node group according to the fed-back first response information.
In this embodiment, RPC (Remote Procedure Call Protocol) communication may be performed between the master processes in each data storage device, and when a leader node exists in the node group, the master process corresponding to the leader node may send a heartbeat message to the master processes corresponding to other data storage nodes in the node group periodically through RPC communication.
In this embodiment, if the data storage node is not a leader node and the control thread corresponding to the data storage node does not acquire, within the time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, the control thread corresponding to the data storage node determines that the data storage node is a candidate node in the node group to which the data storage node belongs.
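The timeout check in step S402 might look like the following sketch. The class name HeartbeatMonitor, the injectable clock, and the choice of a strict "greater than" comparison are assumptions made for illustration; the application only states that no heartbeat arrives within the time threshold.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical helper a control thread could use to detect a silent leader."""

    def __init__(self, timeout_s: float, is_leader: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.is_leader = is_leader
        self._clock = clock
        self._last_heartbeat = clock()

    def on_heartbeat(self) -> None:
        """Record the arrival of a heartbeat from the group's leader node."""
        self._last_heartbeat = self._clock()

    def should_become_candidate(self) -> bool:
        """True if this node is not the leader and the leader has been silent too long."""
        if self.is_leader:
            return False
        return self._clock() - self._last_heartbeat > self.timeout_s
```

Passing the clock as a parameter keeps the check deterministic in tests; a real control thread would rely on the default monotonic clock.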
The control thread corresponding to the data storage node may send first election information to the control threads corresponding to the other data storage nodes in the node group. The first election information includes the number of received data storage requests and the election weight corresponding to the data storage node, where the number of received data storage requests corresponding to the data storage node is the number of data storage requests received by its control thread. Through the first election information, the control thread corresponding to the data storage node also instructs the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information, which includes first consent sub-information or first rejection sub-information.
Instructing the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information specifically comprises: instructing each of those control threads to feed back first consent sub-information to the control thread corresponding to the data storage node when it determines that its own number of received data storage requests is smaller than the number of received data storage requests corresponding to the data storage node and that the election weight of its own data storage node is smaller than the election weight of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
Specifically, after receiving the first election information, the control thread corresponding to each of the other data storage nodes compares, according to the first election information, the number of received data storage requests of the control thread that sent the first election information with its own number of received data storage requests, and compares the election weight of the data storage node corresponding to the control thread that sent the first election information with the election weight of its own data storage node. If the number of received data storage requests of the sending control thread is greater than its own number of received data storage requests, and the election weight of the data storage node corresponding to the sending control thread is greater than the election weight of its own data storage node, the control thread returns first consent sub-information to the control thread that sent the first election information; otherwise, it returns first rejection sub-information. The first consent sub-information indicates agreement that the corresponding data storage node be elected as the leader node, and the first rejection sub-information indicates refusal.
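The comparison a receiving control thread performs can be sketched as below. The ElectionInfo record and the string values "consent" and "reject" are hypothetical stand-ins for the election information and the consent/rejection sub-information; the wire format is not specified in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionInfo:
    """Hypothetical payload of election information."""
    received_requests: int   # number of data storage requests received so far
    election_weight: int     # election weight of the corresponding node


def respond(own: ElectionInfo, candidate: ElectionInfo) -> str:
    """Consent only if the candidate is strictly ahead on both criteria."""
    if (candidate.received_requests > own.received_requests
            and candidate.election_weight > own.election_weight):
        return "consent"
    return "reject"
```

Both comparisons are strict, so a candidate that merely ties on either criterion is rejected, in line with the "otherwise" branch described above.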
In this embodiment, each time data is stored, the control thread corresponding to the leader node receives the data storage request first and, according to the received request, sends a data storage request to the control threads corresponding to the other data storage nodes in its node group, thereby implementing distributed storage of the data. The control threads corresponding to other data storage nodes in the node group may fail to receive the data storage request sent by the control thread corresponding to the leader node because of a communication failure, which reduces the reliability of data storage. Therefore, to ensure that a relatively reliable data storage node is elected as the leader node, the number of received data storage requests is used as one of the factors for electing the leader node, so that the elected leader node has a relatively stable communication condition and is relatively reliable.
In this embodiment, the election weight indicates the weight with which a data storage node participates in the election of the leader node. For a given control thread, whenever the control thread receives, through its corresponding master process, a notification message sent by the control thread of another data storage node in its node group indicating that the other data storage node has become a candidate node, the control thread increases the election weight of its corresponding data storage node by 1; and whenever the control thread determines that its corresponding data storage node has become a candidate node, the control thread likewise increases the election weight of its corresponding data storage node by 1.
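The two increment rules for the election weight can be sketched as follows. The class name ElectionWeight and the starting value of 0 are assumptions; the application does not state the initial weight.

```python
class ElectionWeight:
    """Hypothetical counter kept by a control thread for its data storage node."""

    def __init__(self, initial: int = 0) -> None:
        # The initial value is an assumption; the source does not specify it.
        self.value = initial

    def on_peer_became_candidate(self) -> None:
        """A peer in the same node group announced that it became a candidate."""
        self.value += 1

    def on_self_became_candidate(self) -> None:
        """This control thread's own node became a candidate."""
        self.value += 1


weight = ElectionWeight()
weight.on_peer_became_candidate()
weight.on_self_became_candidate()
```

After one peer notification and one self-promotion, the weight in this sketch has grown by 2 from its starting value.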
In this embodiment, the control thread receives the first response information returned, based on the first election information, by the control threads corresponding to the other data storage nodes in the node group. The control thread may determine, according to the fed-back first response information, whether the corresponding data storage node is elected as the leader node of the node group. Specifically, if the control thread determines from the fed-back first response information that the number of received first consent sub-information is not less than the number threshold, it determines that the corresponding data storage node is elected as the leader node of the node group; otherwise, it determines that the corresponding data storage node is not elected as the leader node of the node group. The number threshold may be half of the number of nodes in the node group to which the data storage node belongs.
Specifically, when the number of first consent sub-information received by the control thread is not less than the number threshold, a number of data storage nodes not less than the number threshold agree to elect the data storage node corresponding to the control thread as the leader node, and the control thread therefore determines that the corresponding data storage node is elected as the leader node of the node group. Otherwise, if the number of first consent sub-information is less than the number threshold, the control thread determines that the corresponding data storage node is not elected as the leader node of the node group.
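The tally can be sketched as below, assuming the number threshold is exactly half of the node group size and that only responses received from other nodes are counted. Whether a candidate also counts its own vote is not stated in the application, so this sketch does not count one; the function name is_elected and the "consent" string are hypothetical.

```python
from typing import Iterable


def is_elected(responses: Iterable[str], group_size: int) -> bool:
    """Elected if the consent count is not less than half of the group size."""
    threshold = group_size / 2          # assumed: threshold is half the group size
    consents = sum(1 for response in responses if response == "consent")
    return consents >= threshold
```

In a three-node group the threshold is 1.5, so both of the other two nodes must consent; in a four-node group two consents are enough under this reading of the threshold.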
Therefore, in the embodiment of the application, the election process of the leader node can be maintained by the control thread corresponding to each data storage node, so that the effect of putting the election work of the leader node down to each data storage device is achieved, and after the data storage device where the leader node is located is down, other data storage devices can autonomously elect the leader node, so that the data storage process of the distributed storage system is not affected, and the reliability of the distributed storage system is improved.
In this embodiment, RPC communication may be performed between the master processes in the data storage devices, and the control thread may send the first election information to the control threads corresponding to the other data storage nodes in the node group as follows: the control thread sends the first election information to the control threads corresponding to the other data storage nodes in the node group through the communication relationship between the master process corresponding to its data storage node and the master processes corresponding to the other data storage nodes in the node group. In this embodiment, the control thread corresponding to the data storage node may further receive, through the same communication relationship, the first response information returned by the control threads corresponding to the other data storage nodes in the node group.
Specifically, fig. 5a is a schematic diagram of a leader node election process provided in an embodiment of the present application, and as shown in fig. 5a, a master process corresponding to a control thread sends first election information to master processes corresponding to other data storage nodes in an affiliated node group through RPC communication, and receives first response information returned by the master processes corresponding to the other data storage nodes in the affiliated node group through RPC communication.
Because the master process and the control threads it configures share memory, the master process can acquire from the memory the first election information that the control thread needs to send, and the control thread can acquire from the memory the first response information received by the master process. Similarly, the master processes corresponding to the other data storage nodes can acquire the first response information that needs to be sent, and the control threads corresponding to the other data storage nodes can acquire the first election information received by their corresponding master processes.
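One way to picture the hand-off through shared memory is a pair of in-memory queues shared by the master process and a control thread. This is only a sketch: the Mailbox class, the function names, and the send_rpc callable standing in for the RPC layer are all hypothetical, and the application does not say which shared-memory structure is used.

```python
import queue
import threading
from typing import Callable, List


class Mailbox:
    """Hypothetical in-memory structure shared by a master process and a control thread."""

    def __init__(self) -> None:
        self.outbound: "queue.Queue[dict]" = queue.Queue()  # election info to send
        self.inbound: "queue.Queue[str]" = queue.Queue()    # responses received


def control_thread(mailbox: Mailbox, election_info: dict, result: List[str]) -> None:
    """Publish election info, then wait for the response the master process relays."""
    mailbox.outbound.put(election_info)
    result.append(mailbox.inbound.get(timeout=5.0))


def master_process_relay(mailbox: Mailbox, send_rpc: Callable[[dict], str]) -> None:
    """Pick up election info from shared memory, send it, and store the reply."""
    info = mailbox.outbound.get(timeout=5.0)
    mailbox.inbound.put(send_rpc(info))


def run_demo() -> List[str]:
    mailbox, result = Mailbox(), []
    info = {"received_requests": 12, "election_weight": 2}
    worker = threading.Thread(target=control_thread, args=(mailbox, info, result))
    worker.start()
    # The lambda stands in for an RPC to a peer master process that consents.
    master_process_relay(mailbox, send_rpc=lambda _info: "consent")
    worker.join()
    return result
```

The control thread never touches the network in this sketch; it only reads and writes the shared mailbox, which mirrors the division of labor described above.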
In this embodiment, after another data storage node in the node group becomes a candidate node, if the control thread corresponding to the data storage node receives second election information sent by the control thread corresponding to that other data storage node, the control thread corresponding to the data storage node returns second response information to the control thread that sent the second election information according to the second election information, so as to indicate whether it agrees that the data storage node that sent the second election information be elected as the leader node. The process by which the other data storage node becomes a candidate node is the same as the process, described earlier, by which the data storage node becomes a candidate node.
Specifically, the second election information includes the number of data storage requests received by the control thread that sends the second election information and the corresponding election weight, where that election weight is the election weight of the data storage node corresponding to the sending control thread. The second response information includes second consent sub-information or second rejection sub-information. The control thread corresponding to the data storage node returning the second response information to the control thread that sent the second election information, according to the second election information, includes:
if it is determined according to the second election information that the number of data storage requests received by the control thread sending the second election information is larger than the number of received data storage requests corresponding to the data storage node, and the election weight corresponding to the control thread sending the second election information is larger than the election weight of the data storage node, returning second consent sub-information to the control thread sending the second election information; otherwise, returning second rejection sub-information to the control thread sending the second election information.
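The consent rule above can be sketched as a small function. This is only an illustration under stated assumptions: the class and field names (`ElectionInfo`, `LocalNode`, `request_count`, `election_weight`) are hypothetical, and both comparisons are taken to be strict, as the wording "larger than" suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionInfo:
    """Election information carried in a request (field names are illustrative)."""
    sender_id: str
    request_count: int    # data storage requests received by the sender
    election_weight: int  # election weight of the sender's data storage node


@dataclass
class LocalNode:
    """State of the data storage node that receives the election information."""
    node_id: str
    request_count: int
    election_weight: int


def respond_to_election(local: LocalNode, info: ElectionInfo) -> bool:
    """Return True for consent sub-information, False for rejection sub-information.

    Consent requires BOTH the sender's request count and its election weight
    to be strictly larger than the local node's values.
    """
    return (info.request_count > local.request_count
            and info.election_weight > local.election_weight)
```

For example, a local node with 5 received requests and weight 1 would consent to a sender reporting 7 requests and weight 2, but would reject a sender whose weight is only equal to its own.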
Correspondingly, if the control thread sending out the second election information receives second consent sub-information returned by the control thread corresponding to the data storage nodes not smaller than the quantity threshold value in the node group, it is determined that the data storage node corresponding to the control thread sending out the second election information is elected as the leader node of the node group.
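The tally performed by the candidate can be sketched as follows. The function name `is_elected` is illustrative, and the quantity threshold is passed in as a parameter because its value is set elsewhere in the embodiment and is not assumed here.

```python
from typing import Iterable


def is_elected(responses: Iterable[bool], quantity_threshold: int) -> bool:
    """Decide the election from the fed-back response information.

    Each element of responses is True for consent sub-information and
    False for rejection sub-information. The candidate is elected when the
    number of consents is not less than quantity_threshold.
    """
    consents = sum(1 for consent in responses if consent)
    return consents >= quantity_threshold
```

With a threshold of 2, responses of consent, consent, rejection would elect the candidate, while consent, rejection, rejection would not.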
To avoid a plurality of candidate nodes existing in the node group at the same time, in this embodiment, if the control thread determines, according to the received second election information, that the election weight of the data storage node corresponding to the control thread that sent the second election information is greater than the election weight of its own data storage node, the control thread demotes its own data storage node from a candidate node to a follower node, and adjusts the election weight of its own data storage node to be the same as the election weight of the data storage node corresponding to the control thread that sent the second election information.
Similarly, if another control thread determines, according to the received first election information, that the election weight of its own data storage node is smaller than the election weight corresponding to the data storage node, that control thread determines that its own data storage node is demoted to a follower node in the node group, and adjusts the election weight of its own data storage node to be the same as the election weight of the data storage node corresponding to the control thread that sent the first election information.
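The demotion and weight adjustment described in the two preceding paragraphs can be sketched as a single state update. The names `Role`, `NodeState` and `apply_election_weight` are hypothetical, and the sketch assumes that a node whose weight is not smaller than the sender's keeps its role and weight unchanged.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class NodeState:
    role: Role
    election_weight: int


def apply_election_weight(state: NodeState, sender_weight: int) -> NodeState:
    """Demote the local node and adopt the sender's weight if the sender's is larger."""
    if sender_weight > state.election_weight:
        state.role = Role.FOLLOWER
        state.election_weight = sender_weight
    return state
```

After such an update the demoted node carries the same election weight as the sender, so it no longer competes with the sender as a candidate.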
In this embodiment, RPC communication may be performed between the master control processes in each data storage device, and the control thread may receive second election information sent by the control threads corresponding to other data storage nodes through RPC communication between the corresponding master control process and the master control processes corresponding to other data storage nodes, and return second response information to the control threads corresponding to other data storage nodes.
Specifically, fig. 5b is a schematic diagram of a leader node election process provided in an embodiment of the present application. As shown in fig. 5b, the master process corresponding to a control thread receives, through RPC communication, second election information sent by the master process corresponding to a candidate node, and sends second response information to the master process corresponding to the candidate node through RPC communication. Because the main control process and the control threads it configures share memory, the main control process can read from memory the second response information that the control thread needs to send, and the control thread can read from memory the second election information received by the main control process. Similarly, the main control process corresponding to the candidate node can obtain the second election information to be sent, and the control thread corresponding to the candidate node can obtain the second response information.
In this embodiment, if the data storage node is elected as the leader node of the node group, the master control process corresponding to the data storage node may also send a heartbeat message to master control processes corresponding to other data storage nodes except the leader node in the node group by periodically performing RPC communication, so that control threads corresponding to other data storage nodes receive the heartbeat message, and keep heartbeat connection between the leader node and each other data storage node.
In this embodiment, if the data storage node is not elected as the leader node of the node group, the master control process corresponding to the data storage node may also receive, periodically through RPC communication, a heartbeat message sent by the master control process corresponding to the leader node of the node group, thereby periodically obtaining the heartbeat message from the leader node.
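On the follower side, tracking these periodic heartbeat messages against the time threshold used for candidate determination can be sketched as below. The class name `HeartbeatMonitor` is illustrative, and timestamps are passed in explicitly (rather than read from a clock) so that the sketch stays deterministic.

```python
class HeartbeatMonitor:
    """Tracks when the last heartbeat message from the leader node arrived."""

    def __init__(self, time_threshold: float, now: float) -> None:
        self.time_threshold = time_threshold  # seconds allowed between heartbeats
        self.last_heartbeat = now

    def record_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat message from the leader node is received."""
        self.last_heartbeat = now

    def leader_timed_out(self, now: float) -> bool:
        """True if no heartbeat has arrived within the time threshold."""
        return now - self.last_heartbeat > self.time_threshold
```

A control thread whose node is not the leader could poll `leader_timed_out` and, when it returns true, treat its node as a candidate and start sending first election information.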
Fig. 6 is a schematic structural diagram of a distributed storage system according to another embodiment of the present application, and as shown in fig. 6, a data storage device is further associated with corresponding data read/write processes, such as the data read/write processes 10005 and 10006 shown in fig. 6, where one data read/write process corresponds to one or more data storage nodes in the data storage device.
The data reading and writing process is used for receiving a first data storage request aiming at the data storage node after the data storage node is selected as a leader node of the node group, storing data in a storage space corresponding to the data storage node according to the first data storage request, and sending a second data storage request to data reading and writing processes corresponding to other data storage nodes in the node group according to the first data storage request so as to indicate the other data storage nodes in the node group to store the data in the corresponding storage space.
Specifically, the data read-write process is used to implement data storage and data reading in the distributed storage system. As shown in fig. 6, a plurality of data storage nodes may jointly correspond to one data read-write process, or each data storage node may correspond to its own data read-write process. In this embodiment, a control thread can obtain, from a data read-write process, the number of received data storage requests and the election level information corresponding to a data storage node through HTTP (HyperText Transfer Protocol) communication, where the data read-write process and the control threads of the data storage nodes corresponding to that data read-write process share memory. When storing data, a node used for storing data in the distributed storage system first determines the node group corresponding to the data to be stored, and then sends a data storage request to the data read-write process corresponding to each data storage node in the node group. After receiving the data storage request, the data read-write process corresponding to the leader node in the node group responds to the request and writes the data into the storage space corresponding to the leader node. The data read-write process corresponding to the leader node also sends a data storage request to the data read-write processes corresponding to the other nodes in the node group, so that those data read-write processes store the same data in their corresponding storage spaces, thereby completing distributed storage of the data.
The sending of the data storage request from the data read-write process corresponding to the leader node to the data read-write processes corresponding to the other data storage nodes in the node group can be implemented through RPC communication between the master control processes in the data storage devices.
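The leader-side write path described above can be sketched as follows. Direct method calls stand in for the RPC communication, a dictionary stands in for the storage space, and the names `ReadWriteProcess`, `handle_first_request` and `handle_second_request` are hypothetical. The sketch also assumes that a non-leader simply ignores a first data storage request.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReadWriteProcess:
    """Data read-write process for one data storage node (illustrative only)."""
    node_id: str
    is_leader: bool = False
    storage: Dict[str, bytes] = field(default_factory=dict)
    peers: List["ReadWriteProcess"] = field(default_factory=list)

    def handle_first_request(self, key: str, value: bytes) -> bool:
        """Handle a first data storage request; only the leader responds to it."""
        if not self.is_leader:
            return False
        self.storage[key] = value
        # Forward a second data storage request to every other node in the group.
        for peer in self.peers:
            peer.handle_second_request(key, value)
        return True

    def handle_second_request(self, key: str, value: bytes) -> None:
        """Store the same data in this node's own storage space."""
        self.storage[key] = value
```

Once the leader returns, every node in the group holds the same value under the same key, which is the replication outcome the paragraph describes.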
In this embodiment, the data storage request may be used to store a plurality of pieces of data to be stored, and the data read-write process may store the pieces of data to be stored in parallel in the storage space corresponding to the data storage node when it determines that there is no dependency relationship among them. Whether a piece of data to be stored has a dependency relationship may be determined according to whether it carries a dependency identifier. For example, if a piece of data to be stored does not carry a dependency identifier, it is determined that the piece does not depend on any other data to be stored; if it carries a dependency identifier, the data to be stored pointed to by the dependency identifier is taken as the data on which the piece depends.
Therefore, according to the embodiment, when the data to be stored does not have a dependency relationship, the data to be stored can be stored in parallel in the storage space corresponding to the corresponding data storage node, so that the data can be written in parallel, and the data storage efficiency is improved.
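A minimal sketch of this dependency check is given below. The `depends_on` field stands in for the dependency identifier, a dictionary stands in for the storage space, and the sequential fallback (writing items in the order given) is an assumption, since the embodiment only specifies the parallel case.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    key: str
    value: bytes
    depends_on: Optional[str] = None  # dependency identifier; None means independent


def store_items(items: List[Item], storage: Dict[str, bytes]) -> str:
    """Store items in parallel when none carries a dependency identifier.

    Returns "parallel" or "sequential" to show which path was taken.
    """
    lock = threading.Lock()

    def write(item: Item) -> None:
        with lock:  # protect the shared dictionary used as storage space
            storage[item.key] = item.value

    if all(item.depends_on is None for item in items):
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(write, items))  # consume the iterator to surface errors
        return "parallel"

    # Assumption: items arrive in an order that already respects dependencies.
    for item in items:
        write(item)
    return "sequential"
```

The lock only serializes access to the in-memory dictionary used here; a real storage backend would decide its own concurrency control.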
In summary, the distributed storage system in the embodiment of the present application has at least the following advantages:
(1) Each data storage device is provided with a plurality of control threads and an independent master control process, and the election process of the leader node can be maintained by the control threads corresponding to the data storage nodes, so that the election work of the leader node is delegated to each data storage device. This decentralizes the distributed storage system and removes the bottleneck caused by managing the devices centrally.
(2) When a large number of data storage devices where the leader nodes are located in the distributed storage system are down, other data storage devices can autonomously select the leader nodes, so that the data storage process of the distributed storage system is not affected, the efficiency and reliability of the distributed storage system are improved, and the fault recovery efficiency of the distributed storage system is improved.
(3) The master control process and the control threads are responsible for heartbeat communication among the data storage nodes and for electing the leader node, while the data read-write process is responsible for storing and reading data, so that the control layer and the data layer of the distributed storage system are separated and the usability of the distributed storage system is improved.
(4) When the data to be stored do not have a dependency relationship, the data reading and writing process can store the data to be stored in parallel in the storage space corresponding to the corresponding data storage node, so that the data can be written in parallel, and the data storage efficiency is improved.
Based on the above description, an embodiment of the present application further provides a distributed storage system, including a plurality of data storage devices, each data storage device being associated with a corresponding master control process. The master control process is configured to divide the associated data storage device into a plurality of data storage nodes and to configure a corresponding control thread for each data storage node. The control thread is configured to: if the corresponding data storage node is not the leader node and no heartbeat message sent by the leader node of the node group to which the data storage node belongs is received within the time threshold, determine that the data storage node is a candidate node in that node group; send first election information to the control threads corresponding to the other data storage nodes in the node group, to instruct those control threads to feed back corresponding first response information according to the first election information; and determine, according to the fed-back first response information, whether the data storage node is elected as the leader node of the node group. The data storage nodes in the node group are used to store the same data and are located in different data storage devices.
Optionally, the first election information includes the number of received data storage requests and election weight corresponding to the data storage node, the first response information includes first consent sub-information or first rejection sub-information, and the control thread is specifically configured to: and indicating the control threads corresponding to other data storage nodes in the node group, feeding back first consent sub-information to the control thread corresponding to the data storage node when determining that the number of the received data storage requests per se is smaller than the number of the received data storage requests corresponding to the data storage node and the election weight of the data storage node per se is smaller than the election weight of the data storage node, and otherwise, feeding back first rejection sub-information to the control thread corresponding to the data storage node.
Optionally, the control thread is specifically configured to: and if the number of the received first consent sub-information is determined to be not less than the number threshold according to the fed back first response information, determining that the data storage node is selected as the leader node of the node group, otherwise, determining that the data storage node is not selected as the leader node of the node group.
Optionally, the control thread is further configured to: if second election information sent by the control thread corresponding to another data storage node in the node group is received, return second response information to the control thread that sent the second election information, according to the second election information, thereby indicating whether it agrees that the data storage node sending the second election information be elected as the leader node.
Optionally, the second election information includes the number of data storage requests received by the control thread that sends out the second election information and a corresponding election weight, and the second response information includes second consent sub-information or second rejection sub-information; the control thread is further specifically configured to: and if the number of the received data storage requests of the control thread sending the second election information is larger than the number of the received data storage requests corresponding to the data storage nodes and the election weight corresponding to the control thread sending the second election information is larger than the election weight of the data storage nodes, returning second consent sub-information to the control thread sending the second election information, and otherwise, returning second rejection sub-information to the control thread sending the second election information.
Optionally, the control thread is further configured to: and if the election weight corresponding to the control thread sending out the second election information is determined to be larger than the election weight of the data storage node according to the second election information, the data storage node is degraded from the candidate node to the following node.
Optionally, the control thread is specifically configured to: sending first election information to control threads corresponding to other data storage nodes in the node group through a communication relation between a main control process corresponding to the data storage node and main control processes corresponding to other data storage nodes in the node group; the control thread is further configured to: and receiving first response information returned by the control threads corresponding to other data storage nodes in the node group through the communication relation.
Optionally, the data storage device is further associated with a corresponding data read-write process, where in the data storage device, the data read-write process corresponds to multiple data storage nodes, and the data read-write process is configured to: after the data storage node is selected as a leader node of the node group, receiving a first data storage request aiming at the data storage node, and storing data in a storage space corresponding to the data storage node according to the first data storage request; and sending a second data storage request to the data reading and writing process corresponding to other data storage nodes in the node group according to the first data storage request so as to indicate the other data storage nodes in the node group to store data in the corresponding storage space.
Optionally, the first data storage request is used to store a plurality of data to be stored, and the data reading and writing process is specifically configured to: and when determining that the data to be stored do not have the dependency relationship, storing the data to be stored in parallel in the storage space corresponding to the corresponding data storage node.
In the embodiment of the application, the election process of the leader node can be maintained by the control thread corresponding to each data storage node, so that the election work of the leader node is delegated to each data storage device. After the data storage device where the leader node is located goes down, the other data storage devices can autonomously elect a leader node, so that the data storage process of the distributed storage system is not affected and the reliability of the distributed storage system is improved.
The specific process of the present embodiment can also refer to the description of the method part, and has the same beneficial effects, and will not be repeated here.
Further, an embodiment of the present application provides a leader node election device for a distributed storage system, where the distributed storage system includes a plurality of data storage devices, each data storage device is associated with a corresponding master process, and each master process is configured to divide the associated data storage device to form a plurality of data storage nodes, and configure corresponding control threads for each data storage node, where fig. 7 is a schematic structural diagram of the leader node election device provided in an embodiment of the present application, and as shown in fig. 7, the device includes: a node determination module 71 and a node election module 72.
The node determining module 71 is configured to determine that the data storage node is a candidate node in the node group to which the data storage node belongs if the data storage node is not a leader node and the corresponding control thread does not receive, within a time threshold, a heartbeat message sent by the leader node of that node group, where the data storage nodes in the node group are used to store the same data and are located in different data storage devices.
The node election module 72 is configured to send first election information to the control threads corresponding to the other data storage nodes in the node group through the control thread corresponding to the data storage node, so as to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determine whether the data storage node is selected as a leader node of the node group according to the fed back first response information.
In this embodiment, the first election information includes the number of received data storage requests and the election weight corresponding to the data storage node, the first response information includes first consent sub-information or first rejection sub-information, and the node election module 72 is specifically configured to: instruct the control threads corresponding to the other data storage nodes in the node group to feed back first consent sub-information to the control thread corresponding to the data storage node when determining that the number of data storage requests they have received is smaller than the number of received data storage requests corresponding to the data storage node and that the election weight of their own data storage node is smaller than the election weight of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
In this embodiment, the node election module 72 is specifically configured to: and if the number of the received first consent sub-information is determined to be not less than the number threshold according to the fed back first response information, determining that the data storage node is selected as the leader node of the node group, otherwise, determining that the data storage node is not selected as the leader node of the node group.
In this embodiment, the device further includes: an information sending module, configured to, if the control thread corresponding to the data storage node receives second election information sent by the control thread corresponding to another data storage node in the node group, return second response information to the control thread that sent the second election information, according to the second election information, thereby indicating whether it agrees that the data storage node sending the second election information be elected as the leader node.
In this embodiment, the second election information includes the number of data storage requests received by the control thread that sends the second election information and a corresponding election weight, the second response information includes second consent sub-information or second denial sub-information, and the information sending module is specifically configured to: and if the number of the received data storage requests of the control thread sending the second election information is larger than the number of the received data storage requests corresponding to the data storage node and the election weight corresponding to the control thread sending the second election information is larger than the election weight of the data storage node according to the second election information, returning second consent sub-information to the control thread sending the second election information, and otherwise, returning second rejection sub-information to the control thread sending the second election information.
In this embodiment, the device further includes: a demotion module, configured to demote the data storage node from a candidate node to a follower node if it is determined, according to the second election information, that the election weight corresponding to the control thread that sent the second election information is greater than the election weight of the data storage node.
In this embodiment, the node election module 72 is specifically configured to: through the communication relation between the main control process corresponding to the data storage node and the main control processes corresponding to other data storage nodes in the node group, first election information is sent to the control threads corresponding to other data storage nodes in the node group, and the device further comprises: and the information receiving module is used for receiving the first response information returned by the control threads corresponding to other data storage nodes in the node group through the communication relation.
In the embodiment of the application, the election process of the leader node can be maintained by the control thread corresponding to each data storage node, so that the election work of the leader node is delegated to each data storage device. After the data storage device where the leader node is located goes down, the other data storage devices can autonomously elect a leader node, so that the data storage process of the distributed storage system is not affected and the reliability of the distributed storage system is improved.
The specific processes of the present embodiment can also refer to the descriptions of the foregoing method parts, and have the same beneficial effects, and are not repeated here.
Further, an embodiment of the present application further provides a leader node election device for a distributed storage system. Fig. 8 is a schematic structural diagram of the leader node election device provided in an embodiment of the present application. As shown in fig. 8, the leader node election device may vary considerably depending on its configuration or performance, and may include one or more processors 901 and a memory 902, where one or more applications or data may be stored in the memory 902. The memory 902 may be transient storage or persistent storage. The application program stored in the memory 902 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the leader node election device. Still further, the processor 901 may be configured to communicate with the memory 902 and to execute, on the leader node election device, the series of computer-executable instructions in the memory 902. The leader node election device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input-output interfaces 905, one or more keyboards 906, and the like.
In a particular embodiment, the leader node election device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the leader node election device, and the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for:
if the data storage node is not a leader node and the corresponding control thread does not receive, within the time threshold, a heartbeat message sent by the leader node of the node group to which the data storage node belongs, determining that the data storage node is a candidate node in that node group, wherein the data storage nodes in the node group are used for storing the same data and are located in different data storage devices;
and sending first election information to control threads corresponding to other data storage nodes in the node group through the control thread corresponding to the data storage node to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage node is selected as a leader node of the node group according to the fed back first response information.
Optionally, when the computer-executable instructions are executed, the first election information includes the number of received data storage requests and the election weight corresponding to the data storage node, the first response information includes first consent sub-information or first rejection sub-information, and instructing the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information includes: instructing those control threads to feed back first consent sub-information to the control thread corresponding to the data storage node when determining that the number of data storage requests they have received is smaller than the number of received data storage requests corresponding to the data storage node and that the election weight of their own data storage node is smaller than the election weight of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
Optionally, when the computer-executable instructions are executed, determining whether the data storage node is selected as the leader node of the node group according to the fed-back first response information includes: if it is determined, according to the fed-back first response information, that the number of pieces of first consent sub-information received is not less than the quantity threshold, determining that the data storage node is selected as the leader node of the node group; otherwise, determining that the data storage node is not selected as the leader node of the node group.
Optionally, the computer-executable instructions, when executed, further include: if the control thread corresponding to the data storage node receives second election information sent by the control thread corresponding to another data storage node in the node group, returning second response information to the control thread that sent the second election information, according to the second election information, thereby indicating whether it agrees that the data storage node sending the second election information be elected as the leader node.
Optionally, when the computer-executable instructions are executed, the second election information includes the number of data storage requests received by the control thread which sends out the second election information and election weight, the second response information includes second consent sub-information or second rejection sub-information, and the second response information is returned to the control thread which sends out the second election information according to the second election information, where the method includes: and if the number of the received data storage requests of the control thread sending the second election information is determined to be larger than the number of the received data storage requests corresponding to the data storage nodes according to the second election information, and the election weight corresponding to the control thread sending the second election information is larger than the election weight of the data storage nodes, returning second consent sub-information to the control thread sending the second election information, and otherwise, returning second rejection sub-information to the control thread sending the second election information.
Optionally, the computer executable instructions, when executed, further comprise: and if the election weight corresponding to the control thread sending out the second election information is determined to be larger than the election weight of the data storage node according to the second election information, the data storage node is degraded from the candidate node to the following node.
Optionally, when the computer-executable instructions are executed, sending the first election information to the control threads corresponding to the other data storage nodes in the node group includes: sending the first election information to those control threads through the communication relation between the main control process corresponding to the data storage node and the main control processes corresponding to the other data storage nodes in the node group. The instructions further implement the following: the control thread corresponding to the data storage node receives, through the communication relation, the first response information returned by the control threads corresponding to the other data storage nodes in the node group.
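The routing of election information through the main control processes may be pictured with the following in-process Python sketch. It assumes, purely for illustration, that the communication relation can be modeled as a direct object reference; the class `MasterProcess`, its methods, and the dictionary-shaped messages are all hypothetical and stand in for whatever transport an actual system would use.

```python
from typing import Callable, Dict

# A handler models a control thread: it takes election information
# and returns response information.
Handler = Callable[[dict], dict]


class MasterProcess:
    """Hypothetical stand-in for the main control process of one device."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.handlers: Dict[str, Handler] = {}       # node id -> control thread
        self.peers: Dict[str, "MasterProcess"] = {}  # device id -> peer process

    def register_node(self, node_id: str, handler: Handler) -> None:
        """Attach the control-thread handler of a local data storage node."""
        self.handlers[node_id] = handler

    def connect(self, other: "MasterProcess") -> None:
        """Establish the communication relation in both directions."""
        self.peers[other.device_id] = other
        other.peers[self.device_id] = self

    def send_election_info(self, target_device: str, target_node: str,
                           info: dict) -> dict:
        """Forward election information to a peer and return its response."""
        return self.peers[target_device].deliver(target_node, info)

    def deliver(self, node_id: str, info: dict) -> dict:
        """Hand incoming election information to the local control thread."""
        return self.handlers[node_id](info)
```

In this sketch the control threads never address each other directly; every message passes through the two main control processes, which mirrors the communication relation described above.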
In this embodiment of the application, the election of the leader node can be maintained by the control thread corresponding to each data storage node, which delegates the leader election work to the individual data storage devices. After the data storage device hosting the leader node goes down, the other data storage devices can elect a new leader node autonomously, so the data storage process of the distributed storage system is not affected and the reliability of the distributed storage system is improved.
For the specific processes of this embodiment, reference may also be made to the description of the foregoing method, which has the same beneficial effects; they are not repeated here.
Further, embodiments of the present application also provide a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored on the storage medium, when executed by a processor, implement the following processes:
if the data storage node is not a leader node and the corresponding control thread does not receive, within a time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, determining that the data storage node is a candidate node in that node group, wherein the data storage nodes in the node group are used for storing the same data and are located in different data storage devices;
and sending first election information to control threads corresponding to other data storage nodes in the node group through the control thread corresponding to the data storage node to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage node is selected as a leader node of the node group according to the fed back first response information.
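The candidate determination in the first step above may be sketched as follows. This is a minimal Python illustration, assuming a monotonic clock and a threshold expressed in seconds; the class `HeartbeatMonitor` and its methods are hypothetical, and the optional `now` parameter exists only so that the example can be exercised deterministically.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Tracks the last leader heartbeat seen by one control thread (hypothetical)."""

    def __init__(self, time_threshold: float, now: Optional[float] = None) -> None:
        self.time_threshold = time_threshold  # seconds, an assumed unit
        self.last_heartbeat = time.monotonic() if now is None else now

    def on_heartbeat(self, now: Optional[float] = None) -> None:
        """Record that a heartbeat message from the leader has arrived."""
        self.last_heartbeat = time.monotonic() if now is None else now

    def should_become_candidate(self, is_leader: bool,
                                now: Optional[float] = None) -> bool:
        """True when this node is not the leader and no heartbeat has been
        received within the time threshold."""
        current = time.monotonic() if now is None else now
        return (not is_leader) and (current - self.last_heartbeat) > self.time_threshold
```

A node for which `should_become_candidate` returns `True` would then proceed to send its first election information to the other control threads in the node group.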
Optionally, when the computer-executable instructions are executed, the first election information includes the number of data storage requests received by, and the election weight of, the data storage node, and the first response information includes first consent sub-information or first rejection sub-information. Instructing the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information includes: instructing each such control thread to feed back first consent sub-information to the control thread corresponding to the data storage node when it determines that the number of data storage requests received by its own data storage node is smaller than the number received by the data storage node and that the election weight of its own data storage node is smaller than that of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
Optionally, when the computer-executable instructions are executed, determining whether the data storage node is elected as the leader node of the node group according to the fed-back first response information includes: if it is determined from the fed-back first response information that the number of first consent sub-information items received is not less than the number threshold, determining that the data storage node is elected as the leader node of the node group; otherwise, determining that the data storage node is not elected as the leader node of the node group.
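The tallying of first response information against the number threshold may be sketched as follows. In this minimal Python illustration each response is modeled as a boolean (`True` for consent, `False` for rejection); the function name `is_elected` and this encoding are assumptions made for the example.

```python
from typing import Iterable


def is_elected(responses: Iterable[bool], number_threshold: int) -> bool:
    """Return True when the number of consents is not less than the threshold."""
    consents = sum(1 for r in responses if r)
    return consents >= number_threshold
```

How the number threshold is chosen is not fixed by the passage above. A value corresponding to a simple majority of the node group is one plausible choice, stated here only as an assumption.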
Optionally, the computer-executable instructions, when executed, further implement the following: if the control thread corresponding to the data storage node receives second election information sent by a control thread corresponding to another data storage node in the node group, it returns second response information to the control thread that sent the second election information, according to the second election information, to indicate whether it agrees that the data storage node that sent the second election information be elected as the leader node.
Optionally, when the computer-executable instructions are executed, the second election information includes the number of data storage requests received by, and the election weight of, the control thread that sent the second election information, and the second response information includes second consent sub-information or second rejection sub-information. Returning the second response information to the control thread that sent the second election information according to the second election information includes: if it is determined from the second election information that the number of data storage requests received by the sending control thread is greater than the number of data storage requests received by the data storage node, and that the election weight corresponding to the sending control thread is greater than the election weight of the data storage node, returning second consent sub-information to the control thread that sent the second election information; otherwise, returning second rejection sub-information to that control thread.
Optionally, the computer-executable instructions, when executed, further implement the following: if it is determined from the second election information that the election weight corresponding to the control thread that sent the second election information is greater than the election weight of the data storage node, the data storage node is demoted from a candidate node to a follower node.
Optionally, when the computer-executable instructions are executed, sending the first election information to the control threads corresponding to the other data storage nodes in the node group includes: sending the first election information to those control threads through the communication relation between the main control process corresponding to the data storage node and the main control processes corresponding to the other data storage nodes in the node group. The instructions further implement the following: the control thread corresponding to the data storage node receives, through the communication relation, the first response information returned by the control threads corresponding to the other data storage nodes in the node group.
In this embodiment of the application, the election of the leader node can be maintained by the control thread corresponding to each data storage node, which delegates the leader election work to the individual data storage devices. After the data storage device hosting the leader node goes down, the other data storage devices can elect a new leader node autonomously, so the data storage process of the distributed storage system is not affected and the reliability of the distributed storage system is improved.
For the specific processes of this embodiment, reference may also be made to the description of the foregoing method, which has the same beneficial effects; they are not repeated here.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can be readily obtained with only a little logic programming of the method flow into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functions can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding parts of the description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (25)

1. A leader node election method for a distributed storage system, the distributed storage system comprising a plurality of data storage devices, the data storage devices being associated with corresponding master processes, the master processes being configured to divide the associated data storage devices to form a plurality of data storage nodes and configure respective control threads for each data storage node, the method comprising:
if the data storage node is not a leader node and the corresponding control thread does not receive, within a time threshold, a heartbeat message sent by the leader node in the node group to which the data storage node belongs, determining that the data storage node is a candidate node in that node group, wherein the data storage nodes in the node group are used for storing the same data and are located in different data storage devices;
and sending first election information to control threads corresponding to other data storage nodes in the node group through the control threads corresponding to the data storage nodes to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage node is selected as a leader node of the node group according to the fed back first response information.
2. The method of claim 1, wherein the first election information includes a number of received data storage requests and election weights corresponding to the data storage nodes, the first response information includes a first consent sub-information or a first rejection sub-information, and the instructing control threads corresponding to other data storage nodes in the node group to feed back corresponding first response information according to the first election information includes:
instructing the control threads corresponding to the other data storage nodes in the node group to feed back first consent sub-information to the control thread corresponding to the data storage node upon determining that the number of data storage requests received by their own data storage node is smaller than the number received by the data storage node and that the election weight of their own data storage node is smaller than that of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
3. The method of claim 2, wherein said determining whether the data storage node is selected as a leader node of the group of nodes based on the fed back first reply information comprises:
and if the number of the received first consent sub-information is determined to be not less than the number threshold according to the fed back first response information, determining that the data storage node is selected as the leader node of the node group, otherwise, determining that the data storage node is not selected as the leader node of the node group.
4. The method according to any one of claims 1-3, further comprising:
if the control thread corresponding to the data storage node receives second election information sent by a control thread corresponding to another data storage node in the node group, returning second response information to the control thread that sent the second election information according to the second election information, so as to indicate whether it is agreed that the data storage node that sent the second election information be elected as the leader node.
5. The method of claim 4, wherein the second election information includes the number of data storage requests received by the control thread which issues the second election information and corresponding election weight, the second response information includes a second consent sub-information or a second rejection sub-information, and the returning of the second response information to the control thread which issues the second election information according to the second election information includes:
and if the number of the received data storage requests of the control thread sending the second election information is larger than the number of the received data storage requests corresponding to the data storage nodes and the election weight corresponding to the control thread sending the second election information is larger than the election weight of the data storage nodes according to the second election information, returning second consent sub-information to the control thread sending the second election information, and otherwise, returning second rejection sub-information to the control thread sending the second election information.
6. The method of claim 5, further comprising:
if it is determined from the second election information that the election weight corresponding to the control thread that sent the second election information is greater than the election weight of the data storage node, demoting the data storage node from a candidate node to a follower node.
7. The method of claim 1, wherein the sending the first election information to the control threads corresponding to the other data storage nodes in the node group includes:
sending first election information to control threads corresponding to other data storage nodes in the node group through a communication relation between a main control process corresponding to the data storage node and main control processes corresponding to other data storage nodes in the node group;
the method further comprises the following steps:
and the control thread corresponding to the data storage node receives first response information returned by the control threads corresponding to other data storage nodes in the node group through the communication relation.
8. A distributed storage system, comprising: a plurality of data storage devices associated with corresponding master processes;
the main control process is used for dividing the associated data storage equipment to form a plurality of data storage nodes and configuring corresponding control threads for the data storage nodes;
the control thread is used for determining that the data storage node is a candidate node in the node group to which the data storage node belongs if the corresponding data storage node is not the leader node and does not receive the heartbeat message sent by the leader node in the node group to which the data storage node belongs within the time threshold, sending first election information to the control threads corresponding to other data storage nodes in the node group to indicate the control threads corresponding to other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and determining whether the data storage node is selected as the leader node of the node group according to the fed back first response information;
and each data storage node in the node group is used for storing the same data and is respectively positioned in different data storage devices.
9. The system of claim 8, wherein the first election information includes a number of received data storage requests and election weights corresponding to the data storage nodes, the first response information includes a first consent sub-information or a first rejection sub-information, and the control thread is specifically configured to:
instruct the control threads corresponding to the other data storage nodes in the node group to feed back first consent sub-information to the control thread corresponding to the data storage node upon determining that the number of data storage requests received by their own data storage node is smaller than the number received by the data storage node and that the election weight of their own data storage node is smaller than that of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
10. The system of claim 9, wherein the control thread is specifically configured to:
and if the number of the received first consent sub-information is determined to be not less than the number threshold according to the fed back first response information, determining that the data storage node is selected as the leader node of the node group, otherwise, determining that the data storage node is not selected as the leader node of the node group.
11. The system of any of claims 8-10, wherein the control thread is further to:
if second election information sent by a control thread corresponding to another data storage node in the node group is received, return second response information to the control thread that sent the second election information according to the second election information, so as to indicate whether it is agreed that the data storage node that sent the second election information be elected as the leader node.
12. The system of claim 11, wherein the second election information includes a number of data storage requests received by a control thread that issued the second election information and a corresponding election weight, the second answer information includes a second consent sub-information or a second denial sub-information;
the control thread is further specifically configured to:
and if the number of the received data storage requests of the control thread sending the second election information is determined to be larger than the number of the received data storage requests corresponding to the data storage nodes according to the second election information, and the election weight corresponding to the control thread sending the second election information is larger than the election weight of the data storage nodes, returning second consent sub-information to the control thread sending the second election information, and otherwise, returning second rejection sub-information to the control thread sending the second election information.
13. The system of claim 12, wherein the control thread is further to:
if it is determined from the second election information that the election weight corresponding to the control thread that sent the second election information is greater than the election weight of the data storage node, demote the data storage node from a candidate node to a follower node.
14. The system of claim 8, wherein the control thread is specifically configured to:
sending first election information to control threads corresponding to other data storage nodes in the node group through a communication relation between a main control process corresponding to the data storage node and main control processes corresponding to other data storage nodes in the node group;
the control thread is further to: and receiving first response information returned by the control threads corresponding to other data storage nodes in the node group through the communication relation.
15. The system of claim 8, wherein the data storage device is further associated with a corresponding data read-write process, wherein the data read-write process corresponds to a plurality of data storage nodes in the data storage device, and the data read-write process is configured to:
after the data storage node is selected as a leader node of the node group, receiving a first data storage request aiming at the data storage node, and storing data in a storage space corresponding to the data storage node according to the first data storage request; and
and sending a second data storage request to the data reading and writing processes corresponding to other data storage nodes in the node group according to the first data storage request so as to indicate the other data storage nodes in the node group to store data in corresponding storage spaces.
16. The system of claim 15, wherein the first data storage request is configured to store a plurality of data to be stored, and the data read/write process is specifically configured to:
and when determining that the data to be stored do not have a dependency relationship, storing the data to be stored in parallel in a storage space corresponding to the data storage node.
17. A leader node election apparatus for a distributed storage system, the distributed storage system including a plurality of data storage devices, each data storage device being associated with a corresponding main control process configured to partition the associated data storage device into a plurality of data storage nodes and to configure a respective control thread for each data storage node, the apparatus comprising:
a node determining module, configured to determine that the data storage node is a candidate node in the node group to which it belongs if the data storage node is not a leader node and its corresponding control thread does not receive, within a time threshold, a heartbeat message sent by the leader node of the node group, where the data storage nodes in the node group are used to store the same data and are located in different data storage devices; and
a node election module, configured to send first election information to the control threads corresponding to the other data storage nodes in the node group through the control thread corresponding to the data storage node, so as to instruct the control threads corresponding to the other data storage nodes in the node group to feed back corresponding first response information according to the first election information, and to determine whether the data storage node is elected as the leader node of the node group according to the fed back first response information.
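For illustration, the candidate determination performed by the node determining module can be sketched as a single check. The names `Role` and `next_role` and the use of plain numeric timestamps are assumptions made for this sketch.

```python
from enum import Enum


class Role(Enum):
    FOLLOWING = "following"
    CANDIDATE = "candidate"
    LEADER = "leader"


def next_role(role, last_heartbeat, now, time_threshold):
    """Return CANDIDATE when a non-leader node has seen no leader heartbeat
    within `time_threshold`; otherwise keep the current role."""
    if role is not Role.LEADER and now - last_heartbeat > time_threshold:
        return Role.CANDIDATE
    return role
```

A control thread would call `next_role` periodically with the time of the last heartbeat it received from the leader node of its node group.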
18. The apparatus according to claim 17, wherein the first election information includes a number of received data storage requests and an election weight corresponding to the data storage node, the first response information includes a first consent sub-information or a first rejection sub-information, and the node election module is specifically configured to:
instruct the control threads corresponding to the other data storage nodes in the node group to feed back first consent sub-information to the control thread corresponding to the data storage node when determining that their own number of received data storage requests is smaller than the number of received data storage requests corresponding to the data storage node and that their own election weight is smaller than the election weight of the data storage node, and otherwise to feed back first rejection sub-information to the control thread corresponding to the data storage node.
19. The apparatus of claim 18, wherein the node election module is specifically configured to:
if it is determined, according to the fed back first response information, that the number of pieces of first consent sub-information received is not less than a number threshold, determine that the data storage node is elected as the leader node of the node group; otherwise, determine that the data storage node is not elected as the leader node of the node group.
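The voting rule of claim 18 and the tally of claim 19 can be sketched together as follows. The type `ElectionInfo` and the functions `vote` and `is_elected` are hypothetical names; the value of the number threshold is not fixed by the claims and is passed in as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionInfo:
    request_count: int  # number of received data storage requests
    weight: int         # election weight of the node


def vote(own: ElectionInfo, candidate: ElectionInfo) -> bool:
    """Consent only if the voter is behind the candidate on both measures."""
    return (own.request_count < candidate.request_count
            and own.weight < candidate.weight)


def is_elected(candidate: ElectionInfo, peers, count_threshold: int) -> bool:
    """The candidate wins when the consents reach the number threshold."""
    consents = sum(vote(peer, candidate) for peer in peers)
    return consents >= count_threshold


candidate = ElectionInfo(request_count=10, weight=5)
peers = [ElectionInfo(8, 3), ElectionInfo(9, 4), ElectionInfo(12, 1)]
```

With these example values the first two peers consent and the third rejects, so the candidate is elected for a threshold of 2 but not for a threshold of 3.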
20. The apparatus of any of claims 17-19, further comprising:
an information sending module, configured to, if the control thread corresponding to the data storage node receives second election information sent by a control thread corresponding to another data storage node in the node group, return second response information to the control thread that sent the second election information according to the second election information, so as to indicate whether it agrees that the data storage node that sent the second election information be elected as the leader node.
21. The apparatus according to claim 20, wherein the second election information includes a number of data storage requests received by a control thread that issued the second election information and a corresponding election weight, the second response information includes a second consent sub-information or a second rejection sub-information, and the information sending module is specifically configured to:
if it is determined, according to the second election information, that the number of data storage requests received by the control thread that sent the second election information is larger than the number of received data storage requests corresponding to the data storage node, and that the election weight corresponding to the control thread that sent the second election information is larger than the election weight of the data storage node, return second consent sub-information to the control thread that sent the second election information; otherwise, return second rejection sub-information to the control thread that sent the second election information.
22. The apparatus of claim 21, further comprising:
a degradation module, configured to demote the data storage node from a candidate node to a following node if it is determined, according to the second election information, that the election weight corresponding to the control thread that sent the second election information is greater than the election weight of the data storage node.
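As a non-limiting sketch, the receiving side of claims 20 to 22 can be combined into one handler. The function name, the string labels for roles and responses, and the flat argument list are assumptions. Note that consent requires both comparisons to hold, while demotion depends on the election weight alone.

```python
def handle_second_election(own_count, own_weight,
                           sender_count, sender_weight, role):
    """Return (response, new_role) for received second election information."""
    # Claim 21: consent only if the sender is ahead on both measures.
    consent = sender_count > own_count and sender_weight > own_weight
    # Claim 22: a candidate steps down when the sender's weight is greater.
    if role == "candidate" and sender_weight > own_weight:
        role = "following"
    return ("consent" if consent else "reject"), role
```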
23. The apparatus of claim 17, wherein the node election module is specifically configured to:
sending first election information to control threads corresponding to other data storage nodes in the node group through a communication relation between a main control process corresponding to the data storage node and main control processes corresponding to other data storage nodes in the node group;
the apparatus further comprises:
an information receiving module, configured to receive first response information returned by the control threads corresponding to the other data storage nodes in the node group through the communication relation.
24. A leader node election appliance for a distributed storage system comprising:
a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the leader node election method according to any one of claims 1 to 7.
25. A storage medium storing computer-executable instructions that, when executed, implement the leader node election method according to any one of claims 1 to 7.
CN201810850568.0A 2018-07-28 2018-07-28 Distributed storage system and leader node election method and device thereof Active CN110764690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810850568.0A CN110764690B (en) 2018-07-28 2018-07-28 Distributed storage system and leader node election method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810850568.0A CN110764690B (en) 2018-07-28 2018-07-28 Distributed storage system and leader node election method and device thereof

Publications (2)

Publication Number Publication Date
CN110764690A CN110764690A (en) 2020-02-07
CN110764690B true CN110764690B (en) 2023-04-14

Family

ID=69328950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810850568.0A Active CN110764690B (en) 2018-07-28 2018-07-28 Distributed storage system and leader node election method and device thereof

Country Status (1)

Country Link
CN (1) CN110764690B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339059A (en) * 2020-03-25 2020-06-26 星辰天合(北京)数据科技有限公司 NAS storage system based on distributed storage system Ceph

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929696A (en) * 2012-09-28 2013-02-13 北京搜狐新媒体信息技术有限公司 Method and apparatus for constructing, submitting and monitoring center node of distributed system
CN108183971A (en) * 2015-03-13 2018-06-19 聚好看科技股份有限公司 Node election method in a distributed system
CN106161495A (en) * 2015-03-25 2016-11-23 中兴通讯股份有限公司 Master node election method, device and storage system
WO2017065209A1 (en) * 2015-10-16 2017-04-20 国立大学法人東北大学 Information processing system, information processing device, information processing method, and program
CN105512266A (en) * 2015-12-03 2016-04-20 曙光信息产业(北京)有限公司 Method and device for achieving operational consistency of distributed database
CN106533738A (en) * 2016-10-20 2017-03-22 中国民生银行股份有限公司 Distributed batch processing method, device and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shaik Sahil Babu; Arnab Raha; Mrinal Kanti Naskar; Omar Alfandi; Dieter Hogrefe. "Fuzzy Logic Election of Node for Routing in WSNs". 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications. 2012, full text. *
Du Lijuan (杜丽娟); Yu Zhenwei (余镇危). Distributed super-node election algorithm. Computer Engineering and Applications (计算机工程与应用). 2011, (14), full text. *

Also Published As

Publication number Publication date
CN110764690A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
KR102262713B1 (en) Methods, devices and systems for blockchain consensus
CN111756550B (en) Block chain consensus method and device
CN107577694B (en) Data processing method and device based on block chain
EP3547169B1 (en) Block chain-based data processing method and equipment
CN108628688B (en) Message processing method, device and equipment
WO2019033949A1 (en) Data migration method, apparatus and device
CN107196772B (en) Method and device for broadcasting message
TW202008763A (en) Data processing method and apparatus, and client
CN109766167B (en) Method, device, system and equipment for distributing timed tasks
CN110955720B (en) Data loading method, device and system
CN111459724B (en) Node switching method, device, equipment and computer readable storage medium
TWI690187B (en) Service updating method, device and system
CN105306507A (en) Disaster tolerance processing method and disaster tolerance processing device in distributed architecture
CN111552945B (en) Resource processing method, device and equipment
CN110764690B (en) Distributed storage system and leader node election method and device thereof
CN111930530A (en) Equipment message processing method, device and medium based on Internet of things
CN116737345A (en) Distributed task processing system, distributed task processing method, distributed task processing device, storage medium and storage device
CN113126884B (en) Data migration method, data migration device, electronic equipment and computer storage medium
CN114422422A (en) Data transmission method, device and system based on node information
CN110032433B (en) Task execution method, device, equipment and medium
TW202008153A (en) Data processing method and apparatus, and server
CN111797070A (en) Ticket data processing method and device
CN114610526B (en) Data disaster tolerance method, system, device and equipment
CN116932514A (en) Budget deduction method and device
CN115344410B (en) Method and device for judging event execution sequence, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210908

Address after: Room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: ALIBABA GROUP HOLDING Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20211231

Address after: 310000 No. 12, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province

Applicant after: Aliyun Computing Co.,Ltd.

Address before: 310000 room 508, 5th floor, building 4, No.699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Alibaba (China) Co.,Ltd.

GR01 Patent grant