CN110096227B - Data storage method, data processing device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN110096227B
CN110096227B (application CN201910245119.8A)
Authority
CN
China
Prior art keywords
disk
data
target
virtual node
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910245119.8A
Other languages
Chinese (zh)
Other versions
CN110096227A (en)
Inventor
陈钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910245119.8A priority Critical patent/CN110096227B/en
Publication of CN110096227A publication Critical patent/CN110096227A/en
Application granted granted Critical
Publication of CN110096227B publication Critical patent/CN110096227B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0667 Virtualisation aspects at data level, e.g. file, record or object virtualisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiments of the present application disclose a data storage method, a data processing method and apparatus, an electronic device, and a computer-readable medium. An embodiment of the method comprises: allocating a preset number of virtual nodes to each disk according to a consistent hashing algorithm, and mapping the allocated virtual nodes and the data to be stored onto a hash ring; determining at least two target virtual nodes in the hash ring based on the mapping position of the data in the hash ring; and storing the data in the disks corresponding to the target virtual nodes. This embodiment helps to increase the capacity of a logical volume.

Description

Data storage method, data processing device, electronic equipment and computer readable medium
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a data storage method, a data processing method and apparatus, an electronic device, and a computer-readable medium.
Background
Data storage is the process of recording data on a storage medium inside or outside a computer. When data is stored locally on an electronic device, it is typically written to a disk mounted on that device. Because the storage capacity of a single disk is limited, multiple independent disks often need to be organized together into a logical volume in order to improve storage performance and provide data backup.
In the conventional approach, RAID (Redundant Array of Independent Disks) technology is generally used to combine multiple disks into a logical volume; data is then divided into blocks and written to multiple disks in parallel. However, when one disk fails, this approach requires reading the full contents of the other disks in order to recompute the data on the damaged disk. Because of limits on read/write speed and similar factors, the larger the disk capacity, the worse each disk performs during a rebuild. Consequently, for logical volumes formed in this way, capacity is usually significantly limited if disk performance is to be guaranteed.
Disclosure of Invention
The embodiments of the present application provide a data storage method, a data processing method and apparatus, an electronic device, and a computer-readable medium, so as to solve the technical problem in the prior art that a logical volume of larger capacity cannot be created while disk performance is guaranteed.
In a first aspect, an embodiment of the present application provides a data storage method, the method comprising: allocating one or more virtual nodes to each disk according to a consistent hashing algorithm, and mapping the allocated virtual nodes and the data to be stored onto a hash ring, wherein virtual nodes of the same disk are adjacent in the hash ring; determining at least two target virtual nodes in the hash ring based on the mapping position of the data in the hash ring, wherein the target virtual nodes correspond to different disks; and storing the data in the disks corresponding to the target virtual nodes.
In some embodiments, determining at least two target virtual nodes in the hash ring based on the mapping position of the data in the hash ring comprises: taking the mapping position of the data in the hash ring as a target position and, starting from the target position, searching the hash ring in a preset search direction for the first virtual node closest to the target position; taking the mapping position of the first virtual node in the hash ring as the starting position of the search, and searching the hash ring in the search direction for at least one second virtual node, wherein the first virtual node and each second virtual node correspond to different disks; and determining the first virtual node and the second virtual nodes as the target virtual nodes.
In some embodiments, the number of virtual nodes allocated to each disk is equal to a preset number; and taking the mapping position of the first virtual node in the hash ring as the starting position of the search and searching the hash ring in the search direction for at least one second virtual node comprises: determining a target interval number based on the preset number; and, taking the mapping position of the first virtual node in the hash ring as the starting position of the search, searching the virtual nodes in the hash ring sequentially according to the search direction and the target interval number, and determining each found virtual node as a second virtual node, wherein the number of times a second virtual node is searched for is equal to a preset number of data backups.
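As an illustrative sketch only (not part of the patent; the function and variable names are hypothetical), the interval-based lookup described above can be expressed as follows, assuming the hash ring is represented as a list of (position, disk) pairs sorted clockwise and that each disk's virtual nodes occupy adjacent positions:

```python
def backups_by_interval(ring, first_index, interval, num_backups):
    """From the first virtual node, jump 'interval' positions per search.

    With 'interval' equal to the per-disk virtual-node count, and each
    disk's nodes adjacent on the ring, every jump lands on a new disk.
    """
    n = len(ring)
    return [ring[(first_index + interval * k) % n] for k in range(1, num_backups + 1)]

# Toy ring: two adjacent virtual nodes per disk, four disks.
ring = [(10, "A"), (20, "A"), (30, "B"), (40, "B"),
        (50, "C"), (60, "C"), (70, "D"), (80, "D")]
```

For instance, if the first virtual node is at index 2 (disk B) and two nodes are allocated per disk, one backup lookup with interval 2 lands on disk C, and a second lands on disk D, each a distinct disk as the embodiment requires.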
In some embodiments, after the data is stored in the disks corresponding to the target virtual nodes, the method further comprises: taking the disk corresponding to the first virtual node as the primary disk storing the data, and determining the disks corresponding to the second virtual nodes as backup disks corresponding to the primary disk.
In some embodiments, the method further comprises: in response to receiving a query request for target data, determining the target mapping position of the target data in the hash ring according to the consistent hashing algorithm; determining virtual nodes to be queried in the hash ring based on the target mapping position; and querying the target data from the disks corresponding to the virtual nodes to be queried.
In some embodiments, the virtual nodes to be queried include a first virtual node to be queried and at least one second virtual node to be queried; and querying the target data from the disks corresponding to the virtual nodes to be queried comprises: taking the disk corresponding to the first virtual node to be queried as the primary disk storing the target data, and querying the target data from that primary disk; and, in response to determining that the target data is not found, taking the disks corresponding to the second virtual nodes to be queried as backup disks storing the target data, and querying the target data from those backup disks.
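A minimal sketch of this primary-then-backup query flow (illustrative only; not part of the patent, and `query` and its in-memory disk representation are hypothetical):

```python
def query(disk_contents, targets, key):
    """Try the primary disk (first target) first; on a miss, fall back to
    the backup disks in order. disk_contents maps disk name -> stored dict."""
    for _, disk in targets:
        value = disk_contents.get(disk, {}).get(key)
        if value is not None:
            return value
    return None  # not found on the primary or any backup disk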
In a second aspect, an embodiment of the present application provides a data storage apparatus, comprising: a mapping unit configured to allocate one or more virtual nodes to each disk according to a consistent hashing algorithm, and to map the allocated virtual nodes and the data to be stored onto a hash ring, wherein virtual nodes of the same disk are adjacent in the hash ring; a first determining unit configured to determine at least two target virtual nodes in the hash ring based on the mapping position of the data in the hash ring, wherein the target virtual nodes correspond to different disks; and a storage unit configured to store the data in the disks corresponding to the target virtual nodes.
In some embodiments, the first determining unit comprises: a first search module configured to take the mapping position of the data in the hash ring as a target position and, starting from the target position, search the hash ring in a preset search direction for the first virtual node closest to the target position; a second search module configured to take the mapping position of the first virtual node in the hash ring as the starting position of the search and search the hash ring in the search direction for at least one second virtual node, wherein the first virtual node and each second virtual node correspond to different disks; and a determining module configured to determine the first virtual node and the second virtual nodes as the target virtual nodes.
In some embodiments, the number of virtual nodes allocated to each disk is equal to a preset number; and the second search module is further configured to: determine a target interval number based on the preset number; and, taking the mapping position of the first virtual node in the hash ring as the starting position of the search, search the virtual nodes in the hash ring sequentially according to the search direction and the target interval number, and determine each found virtual node as a second virtual node, wherein the number of times a second virtual node is searched for is equal to a preset number of data backups.
In some embodiments, the apparatus further comprises: a second determining unit configured to determine the disk corresponding to the first virtual node as the primary disk storing the data, and to determine the disks corresponding to the second virtual nodes as backup disks corresponding to the primary disk.
In some embodiments, the apparatus further comprises: a third determining unit configured to determine, in response to receiving a query request for target data, the target mapping position of the target data in the hash ring according to the consistent hashing algorithm; a fourth determining unit configured to determine virtual nodes to be queried in the hash ring based on the target mapping position; and a query unit configured to query the target data from the disks corresponding to the virtual nodes to be queried.
In some embodiments, the virtual nodes to be queried include a first virtual node to be queried and at least one second virtual node to be queried; and the query unit comprises: a first query module configured to take the disk corresponding to the first virtual node to be queried as the primary disk storing the target data and to query the target data from that primary disk; and a second query module configured to, in response to determining that the target data is not found, take the disks corresponding to the second virtual nodes to be queried as backup disks storing the target data and query the target data from those backup disks.
In a third aspect, an embodiment of the present application provides a data processing method, comprising: in response to detecting a disk replacement, taking the replaced disk as a target primary disk and as a target backup disk, determining the backup disk corresponding to the target primary disk as a first disk to be copied, and determining the primary disk corresponding to the target backup disk as a second disk to be copied, wherein each disk stores data using the method described in the embodiments of the first aspect; and copying the data in the first disk to be copied and in the second disk to be copied to the replacement disk.
In a fourth aspect, an embodiment of the present application provides a data processing apparatus, comprising: a determining unit configured to, in response to detecting a disk replacement, take the replaced disk as a target primary disk and as a target backup disk, determine the backup disk corresponding to the target primary disk as a first disk to be copied, and determine the primary disk corresponding to the target backup disk as a second disk to be copied, wherein each disk stores data using the method described in the embodiments of the first aspect; and a copying unit configured to copy the data in the first disk to be copied and in the second disk to be copied to the replacement disk.
In a fifth aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of the embodiments of the first and third aspects above.
In a sixth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method according to any one of the first and third aspects.
According to the data storage method, the data processing method and apparatus, the electronic device, and the computer-readable medium provided by the embodiments of the present application, one or more virtual nodes are first allocated to each disk according to a consistent hashing algorithm, and the allocated virtual nodes and the data to be stored are mapped onto a hash ring. Then, based on the mapping position of the data in the hash ring, at least two target virtual nodes corresponding to different disks are determined in the hash ring. Finally, the data is stored in the disks corresponding to the target virtual nodes. In this way, the data is stored on at least two different disks. Because a consistent hashing algorithm is used in the storage process, data stored on a given disk is simultaneously backed up to at least one disk associated with it. Consequently, when a disk is damaged, data only needs to be read from the disks that back up the damaged disk's data, rather than the full contents of every other disk. This supports the creation of logical volumes of larger capacity.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flow diagram of one embodiment of a data storage method according to the present application;
FIG. 2 is a schematic diagram of a hash ring after mapping virtual nodes;
FIG. 3 is a flow diagram of yet another embodiment of a data storage method according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a data storage device according to the present application;
FIG. 5 is a flow diagram of one embodiment of a data processing method according to the present application;
FIG. 6 is a schematic block diagram of one embodiment of a data processing apparatus according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to FIG. 1, a flow 100 of one embodiment of a data storage method according to the present application is shown. The data storage method comprises the following steps:
Step 101: allocate one or more virtual nodes to each disk according to a consistent hashing algorithm, and map the allocated virtual nodes and the data to be stored onto a hash ring.
In this embodiment, a plurality of disks may be installed in the execution body of the data storage method (e.g., a server for storing data). In general, disks include floppy disks and hard disks. It will be appreciated that the plurality of disks here may be hard disks, since hard disks have greater storage capacity and are better suited to data storage.
In this embodiment, the execution body may allocate a preset number of virtual nodes to each disk according to a consistent hashing algorithm, and map the allocated virtual nodes and the data to be stored onto a hash ring. In the hash ring, virtual nodes of the same disk are adjacent. Specifically, according to the consistent hashing algorithm, the following operations may be performed in sequence:
In the first step, a hash ring may be constructed based on the range of values of a preset hash function. The hash ring is a circular numerical space formed by arranging the values of the hash function (e.g., integers in the range 0 to 2^32 - 1) from small to large in the clockwise direction. Each value of the hash function characterizes a position on the hash ring.
In the second step, a preset number (e.g., 2) of virtual nodes may be allocated to each disk, and each virtual node may be mapped onto the hash ring. Here, each virtual node corresponds to one hash value, so allocating virtual nodes to a disk amounts to allocating hash values to it. The mapping position of each virtual node in the hash ring is the position of that virtual node's hash value in the ring. It should be noted that, when virtual nodes are allocated, the virtual nodes of the same disk are made adjacent in the hash ring. Further, the number of virtual nodes allocated to each disk may be the same.
As an example, FIG. 2 shows a schematic diagram of a hash ring after the virtual nodes have been mapped. In the hash ring shown in FIG. 2, starting from position 0 (i.e., the position whose hash value is 0), the virtual nodes mapped sequentially in the clockwise direction are virtual node 0, virtual node 1, virtual node 2, virtual node 3, virtual node 4, virtual node 5, virtual node 6, and virtual node 7. Virtual nodes 0 and 1 are allocated to disk A, virtual nodes 2 and 3 to disk B, virtual nodes 4 and 5 to disk C, and virtual nodes 6 and 7 to disk D. Thus, the virtual nodes of the same disk are adjacent in the hash ring. It should be noted that allocating two virtual nodes to each disk is merely an illustration and is not intended to limit the number of virtual nodes allocated per disk in the present invention.
In the third step, the hash value of the data to be stored may be calculated using the hash function, so as to map the data onto the hash ring. Here, the mapping position of the data is the position of the data's hash value in the hash ring.
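The three steps above can be sketched in Python. This is an illustrative sketch only, not the patent's implementation: `RING_SIZE`, `position_of`, and `build_ring` are hypothetical names, MD5 stands in for the unspecified hash function, and a disk's extra virtual nodes are placed at consecutive positions after its base position purely as one simple way of satisfying the adjacency requirement.

```python
import hashlib

RING_SIZE = 2 ** 32  # hash values range over 0 .. 2^32 - 1

def position_of(key: str) -> int:
    """Map an arbitrary key (virtual-node label or data key) onto the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE

def build_ring(disks, nodes_per_disk=2):
    """Return a list of (position, disk) pairs sorted clockwise.

    One base position is hashed per disk, and the disk's remaining virtual
    nodes are placed immediately after it so that the nodes of the same
    disk are adjacent on the ring (an assumption about how adjacency is
    achieved; the text only requires that they be adjacent).
    """
    ring = []
    for disk in disks:
        base = position_of(disk)
        for i in range(nodes_per_disk):
            ring.append(((base + i) % RING_SIZE, disk))
    ring.sort()
    return ring

ring = build_ring(["disk_a", "disk_b", "disk_c", "disk_d"])
data_pos = position_of("data_m")  # mapping position of the data to be stored
```

With four disks and two virtual nodes each, the ring holds eight nodes, matching the FIG. 2 example.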
Step 102: determine at least two target virtual nodes in the hash ring based on the mapping position of the data in the hash ring.
In this embodiment, the execution body may determine, in the hash ring, at least two target virtual nodes corresponding to different disks based on the mapping position of the data in the hash ring. It should be noted that the number of target virtual nodes may be preset according to actual needs. For example, when the data to be stored only needs to be backed up once, two target virtual nodes may be determined; when it needs to be backed up twice, three target virtual nodes may be determined; when it needs to be backed up three times, four target virtual nodes may be determined; and so on.
Here, the disks corresponding to the target virtual nodes are different. Specifically, the execution body may determine the target virtual nodes one by one through the following steps:
in the first step, the position of the first target virtual node may be determined based on the mapping position of the data to be stored in the hash ring.
As an example, starting from the mapping position of the data to be stored in the hash ring, the first virtual node found in the hash ring in a preset search direction (e.g., clockwise) may be taken as the first target virtual node.
It should be noted that the manner of determining the first target virtual node is not limited to the above example, and may also be determined according to other rules, which are not limited herein. For example, the second or third found virtual node may be used as the first target virtual node, or the virtual node may be searched according to another search direction (e.g., a counterclockwise direction) or a preset search order.
In the second step, the disk corresponding to the first target virtual node is determined, and the other one or more target virtual nodes are determined among the virtual nodes corresponding to the other disks.
As an example, the virtual nodes may be searched sequentially from the first target virtual node in the search direction (e.g., clockwise). After a virtual node is found, it can be checked whether it corresponds to the same disk as any already-determined target virtual node; if so, the virtual node is skipped and the search continues with the next one. If not, the virtual node is determined to be a target virtual node. When the number of target virtual nodes reaches the preset number, the search stops.
As yet another example, since the number of virtual nodes allocated to each disk may be the same, the execution body may also determine a target virtual node every preset number of virtual nodes, starting from the first target virtual node and proceeding in the search direction (e.g., clockwise). For instance, if each disk corresponds to two virtual nodes, a target virtual node may be determined at every other virtual node. In this case, the determined target virtual nodes necessarily correspond to different disks.
As another example, the disk corresponding to the first target virtual node may be regarded as the first disk, and at least one disk sequentially adjacent to it may be found in the search direction (e.g., clockwise). Then, one virtual node of each disk found in this way may be determined to be a target virtual node. Here, the number of disks found is the preset backup number.
For example, when one backup is needed, one adjacent disk is found; when two backups are needed, two sequentially adjacent disks are found. It should be noted that the adjacency relationship of the disks may be determined from the adjacency relationship of the virtual nodes in the hash ring. Taking FIG. 2 as an example, since virtual node 0 of disk A is adjacent to virtual node 7 of disk D, and virtual node 1 of disk A is adjacent to virtual node 2 of disk B, disk A can be considered adjacent to disk D and to disk B. Similarly, disk B is adjacent to disk A and disk C, and disk C is adjacent to disk B and disk D. Further, since disk A is adjacent to disk B and disk B is adjacent to disk C, disks A, B, and C are considered sequentially adjacent. Similarly, disks B, C, and D are sequentially adjacent; disks C, D, and A are sequentially adjacent; and disks D, A, and B are sequentially adjacent.
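The disk adjacency described above can be derived mechanically from the ring. A hypothetical sketch (not from the patent; `adjacent_disks` and the toy ring are illustrative), reusing a ring represented as a clockwise-sorted list of (position, disk) pairs:

```python
def adjacent_disks(ring, disk):
    """Disks whose virtual nodes neighbour the given disk's nodes on the ring."""
    n = len(ring)
    neighbours = set()
    for i, (_, d) in enumerate(ring):
        if d == disk:
            for j in (i - 1, i + 1):       # the two ring neighbours, with wraparound
                other = ring[j % n][1]
                if other != disk:
                    neighbours.add(other)
    return sorted(neighbours)

# Toy ring matching FIG. 2: two adjacent virtual nodes per disk.
ring = [(10, "A"), (20, "A"), (30, "B"), (40, "B"),
        (50, "C"), (60, "C"), (70, "D"), (80, "D")]
```

On this ring, disk A comes out adjacent to disks B and D, exactly as argued in the text.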
It should be noted that the manner of determining the other one or more target virtual nodes is not limited to the above-listed examples, and may also be determined by other manners, which are not limited herein.
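The first of the example strategies above (walk clockwise from the data's position and skip nodes whose disk is already chosen) can be sketched as follows. This is an illustrative sketch, not the patent's implementation; `find_targets` is a hypothetical name, and the ring is assumed to be a clockwise-sorted list of (position, disk) pairs:

```python
import bisect

def find_targets(ring, data_pos, num_targets=2):
    """Return num_targets (position, disk) pairs on distinct disks.

    Starts at the first virtual node at or after data_pos (clockwise, with
    wraparound) and skips any node whose disk was already selected. If
    num_targets exceeds the number of distinct disks, fewer are returned.
    """
    positions = [p for p, _ in ring]
    start = bisect.bisect_left(positions, data_pos)
    targets, seen = [], set()
    for step in range(len(ring)):
        if len(targets) == num_targets:
            break
        pos, disk = ring[(start + step) % len(ring)]
        if disk not in seen:  # target virtual nodes must lie on different disks
            targets.append((pos, disk))
            seen.add(disk)
    return targets
```

On the FIG. 2-style toy ring, data mapped between disk A's and disk B's nodes selects one node of disk B as the first target and one node of disk C as the second, skipping disk B's second node.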
Step 103: store the data in the disks corresponding to the target virtual nodes.
In this embodiment, after the target virtual nodes are determined, the execution body may store the data to be stored in the disk corresponding to each target virtual node. Taking FIG. 2 as an example, if the two determined target virtual nodes are virtual nodes 1 and 3, the data to be stored (call it data M) may be stored in disk A and disk B. Here, disk A may be the primary disk storing data M, and disk B the corresponding backup disk; data M is thus stored in both disk A and disk B. Likewise, for other data (call it data N), if the two target virtual nodes determined while storing data N are nodes 7 and 1, data N may be stored in disk D and disk A. In this case, disk D is the primary disk storing data N, and disk A its backup disk, so data N is stored in both disk D and disk A. It can be seen that disk A can simultaneously serve as the primary disk for some data (e.g., data M) and as a backup disk for other data (e.g., data N). In general, each disk can act as a primary disk for part of the data and as a backup disk of one or more other disks for another part of the data.
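The disk A example above can be sketched as follows (illustrative only; `store` and the in-memory disk representation are hypothetical, not the patent's implementation). The first target's disk acts as the primary for the key and every further target's disk as a backup:

```python
def store(disk_contents, targets, key, value):
    """Write value under key to every target disk; the first target is primary.

    disk_contents maps disk name -> dict of stored key/value pairs.
    Returns (primary, backups) so the placement can be recorded.
    """
    primary = targets[0][1]
    backups = [disk for _, disk in targets[1:]]
    for disk in [primary] + backups:
        disk_contents.setdefault(disk, {})[key] = value
    return primary, backups

disks = {}
store(disks, [(1, "A"), (3, "B")], "data_m", "M")  # disk A primary for data M
store(disks, [(7, "D"), (1, "A")], "data_n", "N")  # disk A backs up data N
```

After both writes, disk A holds data M as primary and data N as backup, mirroring the text's description.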
Because each target virtual node corresponds to a different disk, storing the data in each of these disks stores and backs up the data to be stored at the same time.
It is understood that whenever any data is written, it is stored on one disk and backed up, according to the corresponding rule, to one or more other disks (which may be called its backup disks). Therefore, when a disk fails, part of its data can be reconstructed on the failed disk's replacement by copying the data from the backup disks corresponding to the failed disk. Meanwhile, the failed disk may itself have served as a backup disk for other disks, in which case another part of its contents (the backup data of those other disks) still needs to be restored; this can be done at the same time by copying the corresponding data from those other disks. As an example, when the number of target virtual nodes is two and a disk (e.g., disk A) fails and needs to be rebuilt, part of the data on disk A is already backed up on the adjacent disk B, and another part of the data on disk A is backed up on the other adjacent disk D. Therefore, the data on disk B and disk D can be copied directly to the new disk A, and the data on disk C never needs to be read.
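A toy sketch of this rebuild path (hypothetical names; not the patent's implementation), assuming the placement of every key is recorded as (primary, backups): only disks holding a surviving replica of the failed disk's keys are read, never the full contents of every disk.

```python
def rebuild(disk_contents, placement, failed_disk):
    """Recover the failed disk's contents from surviving replicas.

    placement maps key -> (primary, backups). A key belongs on the failed
    disk if that disk was its primary or one of its backups; it is read
    back from any other disk in its replica set.
    """
    recovered = {}
    for key, (primary, backups) in placement.items():
        if primary == failed_disk or failed_disk in backups:
            source = next(d for d in [primary] + backups if d != failed_disk)
            recovered[key] = disk_contents[source][key]
    return recovered

# Disk A is primary for data M (backup on B) and backup for data N (primary on D).
disks = {"A": {"data_m": "M", "data_n": "N"}, "B": {"data_m": "M"}, "D": {"data_n": "N"}}
placement = {"data_m": ("A", ["B"]), "data_n": ("D", ["A"])}
new_a = rebuild(disks, placement, "A")
```

Rebuilding disk A reads only disks B and D; a disk that holds none of A's data contributes nothing.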
Thus, with this data storage approach, when a disk fails, the full contents of all disks do not need to be read, which speeds up disk reconstruction and preserves disk performance. Furthermore, because rebuild performance is no longer limited by disk capacity, logical volumes of larger capacity can be created, increasing the capacity of the logical volume.
In the method provided by the above embodiment of the present application, first, one or more virtual nodes are allocated to each disk according to a consistent hash algorithm, and each allocated virtual node and the data to be stored are mapped into a hash ring. Then, at least two target virtual nodes corresponding to different disks are determined in the hash ring based on the mapping position of the data in the hash ring. Finally, the data is stored into the disks corresponding to the target virtual nodes. In this way, the data is stored to at least two different disks. Because a consistent hash algorithm is used during storage, the data is backed up to at least one disk related to a given disk at the same time it is stored to that disk. Furthermore, when a certain disk is damaged, data only needs to be read from the disks on which the damaged disk's data is backed up, rather than reading the full data from all other disks. This supports the creation of logical volumes of larger capacity.
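The storage flow summarized above can be sketched in a few lines of Python. This is an illustrative sketch only: the evenly spaced virtual-node positions, the in-memory dictionaries standing in for disks, and all names are assumptions made for demonstration, not the exact implementation of the embodiment.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32

def build_ring(disks, vnodes_per_disk=2):
    # Evenly spaced positions keep each disk's virtual nodes adjacent on the
    # ring, as the embodiment requires; a real system might derive positions
    # from hashing instead.
    step = RING_SIZE // (len(disks) * vnodes_per_disk)
    expanded = [d for d in disks for _ in range(vnodes_per_disk)]
    return [(i * step, disk) for i, disk in enumerate(expanded)]

def store(key, value, ring, disk_store, vnodes_per_disk=2, copies=2):
    positions = [pos for pos, _ in ring]
    # Map the data into the ring and search clockwise for the first virtual node.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE
    first = bisect_left(positions, h) % len(ring)
    written = []
    for k in range(copies):
        # Step by vnodes_per_disk so consecutive targets land on different disks.
        _, disk = ring[(first + k * vnodes_per_disk) % len(ring)]
        disk_store.setdefault(disk, {})[key] = value
        written.append(disk)
    return written  # written[0] is the primary disk, the rest are backups

disk_store = {}
targets = store("data-M", b"payload", build_ring(["A", "B", "C", "D"]), disk_store)
```

Storing a second piece of data whose hash lands elsewhere on the ring picks a different primary, so every disk ends up holding both its own primary data and backups of its neighbors.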
With further reference to FIG. 3, a flow 300 of yet another embodiment of a data storage method is illustrated. The process 300 of the data storage method includes the following steps:
step 301, respectively allocating one or more virtual nodes to each disk according to a consistent hash algorithm, and respectively mapping each allocated virtual node and data to be stored into a hash ring.
In this embodiment, a plurality of disks may be installed in an execution body of the data storage method (e.g., a server for storing data). The execution body may allocate a preset number of virtual nodes to each disk according to a consistent hash algorithm, and map the allocated virtual nodes and the data to be stored into a hash ring. In the hash ring, the virtual nodes of the same disk are adjacent.
It should be noted that the operation of step 301 is substantially the same as the operation of step 101, and is not described herein again.
Step 302, taking the mapping position of the data in the hash ring as a target position, and searching, starting from the target position and in a preset search direction, for the first virtual node closest to the target position in the hash ring.
In this embodiment, the executing entity may use a mapping position of the data to be stored in the hash ring as a target position, and search, from the target position, a virtual node closest to the target position in the hash ring according to a preset search direction (e.g., clockwise direction) as a first virtual node.
Taking fig. 2 as an example, if the mapping position of the data to be stored in the hash ring lies between virtual node 0 and virtual node 1 (where this range includes the positions of virtual node 0 and virtual node 1 themselves), the search proceeds clockwise and the virtual node closest to the mapping position is virtual node 1, so virtual node 1 may be determined as the first virtual node.
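A minimal sketch of this clockwise lookup, assuming four disks with two adjacent, evenly spaced virtual nodes each (positions and names are illustrative; treating an exact hit on a node's position as that node is an assumption of the sketch):

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32
# Virtual nodes 0..7 for four disks, two adjacent nodes per disk
# (even spacing is an assumption made for the sketch).
POSITIONS = [i * (RING_SIZE // 8) for i in range(8)]

def map_to_ring(key):
    # Mapping position of the data in the hash ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

def first_vnode(position):
    # Clockwise search: the first virtual node at or after the target
    # position, wrapping back to virtual node 0 past the end of the ring.
    return bisect_left(POSITIONS, position) % len(POSITIONS)
```

For instance, a position just past virtual node 0 resolves to virtual node 1, matching the fig. 2 example.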
Step 303, using the mapping position of the first virtual node in the hash ring as the initial position for searching, and searching for at least one second virtual node in the hash ring according to the searching direction.
In this embodiment, the executing body may use a mapping position of the first virtual node in the hash ring as an initial position of the lookup, and lookup at least one second virtual node in the hash ring according to the lookup direction. The first virtual node and each second virtual node correspond to different disks respectively.
Here, the execution body may search for the second virtual nodes in various manners. As an example, the virtual nodes may be traversed one by one from the first virtual node in the search direction. Each time a virtual node is found, it can be judged whether that node corresponds to the same disk as the first virtual node or any already-determined second virtual node; if so, the node is skipped and the search continues with the next virtual node. If not, the node may be determined to be a second virtual node. When the number of second virtual nodes reaches the preset data backup number, the search stops.
In some optional implementations of this embodiment, the number of virtual nodes allocated to each disk is equal to a preset number (e.g., 2). The execution body may first determine a target interval number based on this preset number. In practice, the target interval number may be equal to the preset number minus 1. For example, if the preset number is 2, the target interval number is 1; that is, a target virtual node is selected at every other virtual node. In this case, each target virtual node necessarily corresponds to a different disk.
In this case, the execution body may take the mapping position of the first virtual node in the hash ring as the start position of the search, search the virtual nodes in the hash ring sequentially in the search direction at the target interval number, and determine each found virtual node as a second virtual node. That is, a second virtual node is selected after skipping the target interval number of virtual nodes each time, and the search is repeated a number of times equal to the preset data backup number.
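The interval-based selection can be sketched as follows (an illustrative sketch; `vnodes_per_disk` plays the role of the preset number, and disks are assumed to occupy adjacent virtual-node pairs as in fig. 2):

```python
def second_vnodes(first_idx, ring_len, vnodes_per_disk=2, num_backups=1):
    # Target interval number = preset number - 1, so the step between
    # consecutive target virtual nodes is the preset number itself.
    interval = vnodes_per_disk - 1
    step = interval + 1
    return [(first_idx + k * step) % ring_len for k in range(1, num_backups + 1)]
```

With two virtual nodes per disk laid out adjacently, stepping by two from any node is guaranteed to reach a different disk, so no same-disk check is needed.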
Step 304, storing the data into the disks corresponding to the target virtual nodes respectively.
In this embodiment, after determining each target virtual node, the execution body may store the data to be stored in the disk corresponding to each target virtual node. Taking fig. 2 as an example, if the two determined target virtual nodes are virtual nodes 1 and 3, the data to be stored may be stored in disk A and disk B, respectively.
Because each target virtual node corresponds to different disks, after the data is stored in each disk, the data to be stored can be stored and backed up at the same time.
Step 305, taking the disk corresponding to the first virtual node as the primary disk storing the data, and determining the disks corresponding to the second virtual nodes respectively as the backup disks corresponding to the primary disk.
In this embodiment, the execution body may determine the disk corresponding to the first virtual node as a primary disk storing the data, and determine the disks corresponding to the second virtual nodes as backup disks corresponding to the primary disk.
It can be understood that when any data is written, the data is stored to its primary disk and also to the one or more backup disks corresponding to that primary disk. Therefore, each disk can simultaneously serve as a primary disk and as a backup disk of one or more other disks.
It can be understood that, since each disk can serve both as a primary disk and as a backup disk of one or more other disks, the data in each disk can be regarded as consisting of two parts: one part is the data stored while the disk acts as a primary disk; the other part is the data stored while it acts as a backup disk. When a certain disk fails, the first part can be reconstructed by copying the data from the backup disk corresponding to the failed disk. Meanwhile, the failed disk may itself have served as a backup disk for other disks, so the second part (i.e., the backup data of those other disks) also resided on the failed disk; it can be reconstructed at the same time by copying the data from those other disks. As an example, when the number of target virtual nodes is two and a certain disk (e.g., disk A) fails and needs to be rebuilt, the primary data A of disk A (i.e., the data stored with disk A as the primary disk) is already backed up on the adjacent disk B, while disk A in turn holds the backup of the primary data D of another adjacent disk D (i.e., the data stored with disk D as the primary disk). Therefore, the data in disk B and disk D can be copied directly to the new disk A, and the data in disk C does not need to be read.
Step 306, in response to receiving the query request for the target data, determining a target mapping position of the target data in the hash ring according to a consistent hash algorithm.
In this embodiment, in response to receiving a query request for target data (i.e., the data currently to be queried), the execution body may determine the target mapping position of the target data in the hash ring according to the consistent hash algorithm. Here, for the specific operation of determining the target mapping position, reference may be made to the description of determining the mapping position of the data to be stored in step 201 or step 301, which is not repeated here.
Step 307, determining a virtual node to be queried in the hash ring based on the target mapping position.
In this embodiment, the executing agent may determine the virtual node to be queried in the hash ring based on the target mapping position. Here, the operation of determining the virtual node to be queried based on the target mapping position is the same as the operation of determining the target virtual node in the data storage process (see what is described in step 102 or steps 302 to 303), and is not described herein again.
Because the virtual node to be queried is determined in the same manner as the target virtual node was determined during storage, the disk that is queried is guaranteed to be the same disk in which the target data is stored. As a result, it is not necessary to access every disk to find the target data, which reduces the number of disk accesses.
Step 308, querying the target data from the disk corresponding to the virtual node to be queried.
In this embodiment, the execution body may query the target data from the disk corresponding to any virtual node to be queried determined in step 307. Alternatively, the target data may be queried from the disks corresponding to all of the determined virtual nodes to be queried.
In some optional implementations of this embodiment, the virtual nodes to be queried may include a first virtual node to be queried and at least one second virtual node to be queried. In this case, the execution body may take the disk corresponding to the first virtual node to be queried as the primary disk storing the target data, and first query the target data from that primary disk. Taking fig. 2 as an example, if the two determined virtual nodes to be queried are virtual nodes 1 and 3, it can be known that the target data is stored in disk A and disk B. Since virtual node 1 is the first virtual node to be queried and virtual node 3 is a second virtual node to be queried, disk A serves as the primary disk storing the target data and disk B as the backup disk corresponding to that primary disk. After the query request for the data is received, the data may be queried directly from primary disk A. It should be noted that the first virtual node to be queried may be determined in the same manner as the first virtual node in step 302, and the second virtual node to be queried in the same manner as the second virtual node in step 303.
In the above implementation, in response to determining that the target data is not found (for example, because the primary disk is damaged), the execution body may take the disk corresponding to a second virtual node to be queried as a backup disk storing the target data, and query the target data from that backup disk. Continuing with the above example, when primary disk A is damaged, the target data cannot be read from it; at this time, the target data can be read from its backup disk B.
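The primary-then-backup read path can be sketched as below; the in-memory dictionary standing in for the disks and the `failed` set are illustrative assumptions:

```python
def query(key, target_disks, disk_store, failed=()):
    # target_disks[0] is the primary disk; the rest are its backup disks.
    # Fall through to a backup when a disk has failed or the read misses.
    for disk in target_disks:
        if disk in failed:
            continue
        value = disk_store.get(disk, {}).get(key)
        if value is not None:
            return value
    return None  # not found on the primary or any backup

# Data M stored on primary disk A with backup on disk B, as in fig. 2.
disk_store = {"A": {"M": b"payload"}, "B": {"M": b"payload"}}
```

When disk A is marked failed, the same call transparently returns the copy held by backup disk B.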
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 1, the flow 300 of the data storage method in this embodiment relates to the determination steps of the primary disk and the backup disk. Therefore, according to the scheme described in this embodiment, when a disk fails, it is not necessary to extract the entire amount of data from all the disks, so that the disk reconstruction speed can be increased, and the performance of the disk can be ensured. Furthermore, because the performance of the disk reconstruction is not limited by the disk capacity, the creation of a logical volume with larger capacity can be realized, and the capacity of the logical volume is increased. In addition, the flow 300 of the data storage method in the embodiment also relates to the operation of data query. When data is queried, firstly, a target mapping position of target data in the hash ring is determined, and then a virtual node to be queried corresponding to the target mapping position is determined, so that the target data is queried from a disk corresponding to the virtual node to be queried, each disk does not need to be accessed, and disk access amount is reduced.
With further reference to fig. 4, as an implementation of the method shown in the above figures, the present application provides an embodiment of a data storage device, which corresponds to the embodiment of the method shown in fig. 1, and which can be applied to various electronic devices.
As shown in fig. 4, the data storage device 400 according to the present embodiment includes: a mapping unit 401, configured to allocate one or more virtual nodes to each disk according to a consistent hash algorithm, and map each allocated virtual node and data to be stored into a hash ring, where the virtual nodes of the same disk are adjacent to each other in the hash ring; a first determining unit 402, configured to determine at least two target virtual nodes in the hash ring based on mapping positions of the data in the hash ring, where each target virtual node corresponds to a different disk; a storage unit 403 configured to store the data in the disks corresponding to the target virtual nodes, respectively.
In some optional implementations of the present embodiment, the first determining unit may include a first searching module, a second searching module, and a determining module (not shown in the figure). The first searching module is configured to use a mapping position of the data in the hash ring as a target position, and search a first virtual node closest to the target position in the hash ring according to a preset searching direction from the target position. The second searching module is configured to search at least one second virtual node in the hash ring according to the searching direction by using a mapping position of the first virtual node in the hash ring as an initial position of searching, where the first virtual node and each second virtual node correspond to different disks respectively. The determining module is configured to determine the first virtual node and the second virtual node as target virtual nodes.
In some optional implementation manners of this embodiment, the number of virtual nodes allocated to each disk is equal to a preset number; and the second lookup module may be further configured to: determining a target interval number based on the preset number; and taking the mapping position of the first virtual node in the hash ring as a searching initial position, sequentially searching the virtual nodes in the hash ring according to the searching direction and the target interval number, and determining the found virtual nodes as second virtual nodes, wherein the number of times of searching the second virtual nodes is equal to the preset data backup number.
In some optional implementation manners of this embodiment, the apparatus may further include a third determining unit, a fourth determining unit, and an inquiring unit (not shown in the figure). Wherein the third determining unit is configured to determine a target mapping position of the target data in the hash ring according to a consistent hash algorithm in response to receiving a query request for the target data; a fourth determination unit configured to determine a virtual node to be queried in the hash ring based on the target mapping position; and the query unit is configured to query the target data from the disk corresponding to the virtual node to be queried.
In some optional implementations of this embodiment, the virtual nodes to be queried may include a first virtual node to be queried and at least one second virtual node to be queried; and, the query unit may include: the first query module is configured to take a disk corresponding to the first virtual node to be queried as a main disk for storing target data, and query the target data from the main disk for storing the target data; and the second query module is configured to respond to the determination that the target data is not queried, take the disk corresponding to the second virtual node to be queried as a backup disk for storing the target data, and query the target data from the backup disk for storing the target data.
In some optional implementations of this embodiment, the apparatus may further include a first querying unit and a second querying unit (not shown in the figure). The first querying unit is configured to query the data from the primary disk in response to receiving a query request for the data. The second querying unit is configured to, in response to determining that the data is not queried, query the data from the backup disk corresponding to the primary disk.
In the apparatus provided in the foregoing embodiment of the present application, first, the mapping unit 401 allocates one or more virtual nodes to each disk according to a consistent hash algorithm and maps each allocated virtual node and the data to be stored into a hash ring. Then, the first determining unit 402 determines at least two target virtual nodes corresponding to different disks in the hash ring based on the mapping position of the data in the hash ring. Finally, the storage unit 403 stores the data in the disks corresponding to the target virtual nodes. In this way, the disks used for data storage and backup are determined according to a fixed rule. Therefore, when a certain disk is damaged, data only needs to be read from the disks on which the damaged disk's data is backed up, rather than reading the full data from all other disks, which preserves disk performance and supports the creation of logical volumes of larger capacity.
Referring to fig. 5, a flow 500 of an embodiment of a data processing method provided by the present application is shown. The data processing method may include the steps of:
step 501, in response to detection of disk replacement, taking the replaced disks as a target main disk and a target backup disk respectively, determining a backup disk corresponding to the target main disk as a first disk to be copied, and determining a main disk corresponding to the target backup disk as a second disk to be copied.
In this embodiment, in response to detecting disk replacement, an execution body of the data processing method (e.g., a server for storing data) may take the replaced disk as the target primary disk and determine the backup disk corresponding to the target primary disk as the first disk to be copied. Meanwhile, the replaced disk may also be taken as the target backup disk, and the primary disk corresponding to the target backup disk may be determined as the second disk to be copied.
It should be noted that each disk may use the data storage method described in the embodiment of fig. 1 or the embodiment of fig. 3 for data storage. For a specific storage process, reference may be made to the description related to the embodiment in fig. 1 or the embodiment in fig. 3, and details are not described here again.
Step 502, copying the data in the first disk to be copied and the second disk to be copied into the replaced new disk.
In this embodiment, the execution body may copy the data in the first disk to be copied and in the second disk to be copied into the replaced new disk.
It can be understood that, since each disk can serve both as a primary disk and as a backup disk of one or more other disks, the data in each disk can be regarded as consisting of two parts: one part is the data stored while the disk acts as a primary disk; the other part is the data stored while it acts as a backup disk. When a certain disk fails, the first part can be reconstructed by copying the data from the backup disk corresponding to the failed disk. Meanwhile, the failed disk may itself have served as a backup disk for other disks, so the second part (i.e., the backup data of those other disks) also resided on the failed disk; it can be reconstructed at the same time by copying the data from those other disks. As an example, when the number of target virtual nodes is two and a certain disk (e.g., disk A) fails and needs to be rebuilt, the primary data A of disk A (i.e., the data stored with disk A as the primary disk) is already backed up on the adjacent disk B, while disk A in turn holds the backup of the primary data D of another adjacent disk D (i.e., the data stored with disk D as the primary disk). Therefore, the data in disk B and disk D can be copied directly to the new disk A, and the data in disk C does not need to be read.
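Under the two-virtual-nodes-per-disk layout of fig. 2, the two source disks for a rebuild are simply the ring neighbors of the failed disk. A sketch (disk names follow the example above; the adjacency assumption is illustrative):

```python
def rebuild_sources(disks, failed_disk):
    # With each disk's two virtual nodes adjacent on the ring, a disk's
    # primary data is backed up on the next disk clockwise, while the disk
    # itself backs up the previous disk's primary data.
    i = disks.index(failed_disk)
    holds_our_backup = disks[(i + 1) % len(disks)]   # copy of the failed disk's primary data
    we_held_backup_of = disks[(i - 1) % len(disks)]  # primary data the failed disk was backing up
    return holds_our_backup, we_held_backup_of
```

For example, `rebuild_sources(["A", "B", "C", "D"], "A")` yields `("B", "D")`: the new disk A is filled entirely from disks B and D, and disk C is never read, matching the example above.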
According to the method provided by the embodiment of the application, when the disk fails, the whole data does not need to be extracted from all the disks, so that the disk reconstruction speed can be increased, and the performance of the disk can be ensured. Furthermore, because the performance of the disk reconstruction is not limited by the disk capacity, the creation of a logical volume with larger capacity can be realized, and the capacity of the logical volume is increased.
With continuing reference to FIG. 6, the present application provides one embodiment of a data processing apparatus as an implementation of the method illustrated in FIG. 5 and described above. The embodiment of the device corresponds to the embodiment of the method shown in fig. 5, and the device can be applied to various electronic devices.
As shown in fig. 6, the data processing apparatus 600 according to the present embodiment includes: a determining unit 601 configured to, in response to detection of disk replacement, take the replaced disk as a target primary disk and as a target backup disk respectively, determine the backup disk corresponding to the target primary disk as a first disk to be copied, and determine the primary disk corresponding to the target backup disk as a second disk to be copied; and a copying unit 602 configured to copy the data in the first disk to be copied and the second disk to be copied into the replaced new disk.
It will be understood that the elements described in the apparatus 600 correspond to various steps in the method described with reference to fig. 5. Thus, the operations, features and advantages described above for the method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 708 including a magnetic disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a magnetic disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may for example be described as: a processor including a mapping unit, a first determining unit, and a storage unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: allocate a preset number of virtual nodes to each disk according to a consistent hash algorithm, and map the allocated virtual nodes and the data to be stored into a hash ring; determine at least two target virtual nodes in the hash ring based on the mapping position of the data in the hash ring; and store the data into the disks corresponding to the target virtual nodes respectively.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method of storing data, comprising:
according to a consistent hash algorithm, respectively allocating a preset number of virtual nodes to each disk, and respectively mapping the allocated virtual nodes and data to be stored into a hash ring, wherein, in the hash ring, the virtual nodes of the same disk are adjacent, and the number of the virtual nodes allocated to each disk is the same;
determining at least two target virtual nodes in the hash ring based on the mapping positions of the data in the hash ring, wherein the target virtual nodes comprise a first virtual node and at least one second virtual node, and the disks corresponding to the target virtual nodes are different;
respectively storing the data into the disk corresponding to each target virtual node; taking the disk corresponding to the first virtual node as a primary disk for storing the data, and determining the disks corresponding to the second virtual nodes respectively as backup disks corresponding to the primary disk; wherein the primary disk stores one part of data and also serves as a backup disk of one or more other disks to back up another part of data.
2. The data storage method of claim 1, wherein determining at least two target virtual nodes in the hash ring based on the mapping positions of the data in the hash ring comprises:
taking the mapping position of the data in the hash ring as a target position, and searching a first virtual node closest to the target position in the hash ring according to a preset searching direction from the target position;
taking the mapping position of the first virtual node in the hash ring as the initial position of searching, and searching at least one second virtual node in the hash ring according to the searching direction, wherein the first virtual node and each second virtual node correspond to different disks respectively;
determining the first virtual node and the second virtual node as target virtual nodes.
3. The data storage method of claim 2, wherein taking the mapping position of the first virtual node in the hash ring as the initial position of the search, and searching at least one second virtual node in the hash ring according to the search direction, comprises:
determining a target interval number based on the preset number;
and taking the mapping position of the first virtual node in the hash ring as the initial search position, sequentially searching the virtual nodes in the hash ring according to the search direction and the target interval number, and determining each found virtual node as a second virtual node, wherein the number of searches for second virtual nodes is equal to the preset number of data backups.
4. The data storage method of claim 1, further comprising:
in response to receiving a query request for target data, determining a target mapping position of the target data in the hash ring according to the consistent hash algorithm;
determining a virtual node to be queried in the hash ring based on the target mapping position;
and querying the target data from the disk corresponding to the virtual node to be queried.
5. The data storage method according to claim 4, wherein the virtual nodes to be queried comprise a first virtual node to be queried and at least one second virtual node to be queried; and
the querying the target data from the disk corresponding to the virtual node to be queried includes:
taking the disk corresponding to the first virtual node to be queried as a main disk for storing the target data, and querying the target data from the main disk for storing the target data;
and in response to determining that the target data is not found, taking the disk corresponding to the second virtual node to be queried as a backup disk storing the target data, and querying the target data from the backup disk storing the target data.
6. A data storage device, comprising:
the mapping unit is configured to allocate a preset number of virtual nodes to each disk according to a consistent hash algorithm, and to map the allocated virtual nodes and data to be stored into a hash ring, wherein in the hash ring, the virtual nodes of the same disk are adjacent, and the number of virtual nodes allocated to each disk is the same;
a first determining unit, configured to determine at least two target virtual nodes in the hash ring based on mapping positions of the data in the hash ring, where the target virtual nodes include a first virtual node and at least one second virtual node, and where disks corresponding to the target virtual nodes are different;
the storage unit is configured to store the data into the disks corresponding to the target virtual nodes respectively; and the second determining unit is configured to take the disk corresponding to the first virtual node as a main disk storing the data, and to determine the disks corresponding to the second virtual nodes respectively as backup disks corresponding to the main disk; each main disk stores one part of data and, as a backup disk of one or more other disks, backs up another part of data.
7. The data storage device of claim 6, wherein the first determination unit comprises:
a first searching module configured to use a mapping position of the data in the hash ring as a target position, and search a first virtual node closest to the target position in the hash ring according to a preset searching direction from the target position;
a second searching module configured to search at least one second virtual node in the hash ring according to the searching direction by using the mapping position of the first virtual node in the hash ring as an initial position of searching, wherein the first virtual node and each second virtual node correspond to different disks respectively;
a determination module configured to determine the first virtual node and the second virtual node as target virtual nodes.
8. The data storage device of claim 7, wherein the second lookup module is further configured to:
determining a target interval number based on the preset number;
and taking the mapping position of the first virtual node in the hash ring as the initial search position, sequentially searching the virtual nodes in the hash ring according to the search direction and the target interval number, and determining each found virtual node as a second virtual node, wherein the number of searches for second virtual nodes is equal to the preset number of data backups.
9. The data storage device of claim 8, wherein the device further comprises:
a third determination unit configured to determine, in response to receiving a query request for target data, a target mapping position of the target data in the hash ring according to the consistent hash algorithm;
a fourth determining unit configured to determine a virtual node to be queried in the hash ring based on the target mapping position;
and the query unit is configured to query the target data from the disk corresponding to the virtual node to be queried.
10. The data storage device of claim 9, wherein the virtual nodes to be queried comprise a first virtual node to be queried and at least one second virtual node to be queried; and
the query unit comprises:
a first query module configured to query the target data from the primary disk storing the target data by using the disk corresponding to the first virtual node to be queried as the primary disk storing the target data;
and the second query module is configured to, in response to determining that the target data is not found, take the disk corresponding to the second virtual node to be queried as the backup disk storing the target data, and query the target data from the backup disk storing the target data.
11. A method of data processing, the method comprising:
in response to detection of disk replacement, taking a replaced disk as a target main disk and a target backup disk respectively, determining a backup disk corresponding to the target main disk as a first disk to be copied, and determining a main disk corresponding to the target backup disk as a second disk to be copied, wherein each disk stores data by adopting the method of any one of claims 1 to 5;
and copying the data in the first disk to be copied and the second disk to be copied into the replaced new disk.
12. A data processing apparatus, characterized in that the apparatus comprises:
a determining unit, configured to, in response to detection of disk replacement, take a replaced disk as a target primary disk and a target backup disk, respectively, determine a backup disk corresponding to the target primary disk as a first disk to be copied, and determine a primary disk corresponding to the target backup disk as a second disk to be copied, wherein each disk stores data by using the method according to any one of claims 1 to 5;
a copying unit configured to copy the data in the first disk to be copied and the second disk to be copied into the replaced new disk.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5 and 11.
14. A computer-readable medium, on which a computer program is stored which, when executed by a processor, implements the method according to any one of claims 1-5 and 11.
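The recovery flow of claims 11 and 12 can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: it assumes the primary/backup relationships of the replaced disk are tracked externally, and all names (`rebuild`, `backup_map`, `primary_map`, the `"primary"`/`"backup"` dictionary layout) are illustrative stand-ins for real disks and metadata, not part of the patent disclosure.

```python
def rebuild(new_disk, failed_name, disks, backup_map, primary_map):
    """Rebuild a replaced disk from the surviving copies of its data.

    disks:       dict of disk name -> {"primary": {...}, "backup": {...}}
    backup_map:  disk name -> names of the disks that back up its primary data
    primary_map: disk name -> names of the disks whose data it backed up
    """
    # Treat the replaced disk as a target main disk: its lost primary data
    # survives on its backup disks (the "first disks to be copied").
    for b in backup_map[failed_name]:
        for key, value in disks[b]["backup"].items():
            new_disk["primary"][key] = value
    # Treat the replaced disk as a target backup disk: its lost backup data
    # survives on the corresponding main disks (the "second disks to be
    # copied").
    for p in primary_map[failed_name]:
        for key, value in disks[p]["primary"].items():
            new_disk["backup"][key] = value
    return new_disk
```

Because every disk plays both roles, a single replacement triggers copies from both directions: from the failed disk's backup disks and from the disks it was backing up.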
CN201910245119.8A 2019-03-28 2019-03-28 Data storage method, data processing device, electronic equipment and computer readable medium Active CN110096227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910245119.8A CN110096227B (en) 2019-03-28 2019-03-28 Data storage method, data processing device, electronic equipment and computer readable medium


Publications (2)

Publication Number Publication Date
CN110096227A (en) 2019-08-06
CN110096227B (en) 2023-04-18

Family

ID=67444100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910245119.8A Active CN110096227B (en) 2019-03-28 2019-03-28 Data storage method, data processing device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN110096227B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633053B (en) * 2019-09-16 2020-07-10 北京马赫谷科技有限公司 Storage capacity balancing method, object storage method and device
CN113112193B (en) * 2020-01-13 2024-05-24 北京京东振世信息技术有限公司 Method, apparatus, server and medium for determining package location
CN111522883B (en) * 2020-04-28 2023-04-28 杭州海康威视系统技术有限公司 Backup method, device, equipment and storage medium of object data
CN111756828B (en) * 2020-06-19 2023-07-14 广东浪潮大数据研究有限公司 Data storage method, device and equipment
CN111930316B (en) * 2020-09-09 2021-04-20 上海七牛信息技术有限公司 Cache read-write system and method for content distribution network
CN112230861B (en) * 2020-10-26 2022-09-13 金钱猫科技股份有限公司 Data storage method and terminal based on consistent hash algorithm
CN112306688A (en) * 2020-10-30 2021-02-02 天地伟业技术有限公司 Innovative hash consistency algorithm suitable for cloud storage
CN113672524A (en) * 2021-08-20 2021-11-19 上海哔哩哔哩科技有限公司 Data processing method and system based on multi-level cache
CN115878046B (en) * 2023-01-09 2023-05-12 苏州浪潮智能科技有限公司 Data processing method, system, device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636286A (en) * 2015-02-06 2015-05-20 华为技术有限公司 Data access method and equipment
CN106909557A (en) * 2015-12-23 2017-06-30 中国电信股份有限公司 The storage method and device of main memory cluster, the read method and device of main memory cluster

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102843403A (en) * 2011-06-23 2012-12-26 盛大计算机(上海)有限公司 File processing method based on distributed file system, system, and client
WO2016187452A1 (en) * 2015-05-19 2016-11-24 Morgan Stanley Topology aware distributed storage system
KR101790701B1 (en) * 2016-01-11 2017-11-21 충북대학교 산학협력단 Load Balancing System Using Data Replication and Migration in Distributed In-Memory Environment
CN108737375B (en) * 2018-04-13 2021-01-19 中山大学 Block chain consensus method and system



Similar Documents

Publication Publication Date Title
CN110096227B (en) Data storage method, data processing device, electronic equipment and computer readable medium
US10365983B1 (en) Repairing raid systems at per-stripe granularity
US8990529B2 (en) Method for optimizing cleaning of maps in flashcopy cascades containing incremental maps
US10725976B2 (en) Fast recovery using self-describing replica files in a distributed storage system
CN109086388B (en) Block chain data storage method, device, equipment and medium
US11003625B2 (en) Method and apparatus for operating on file
US9514139B2 (en) Space efficient cascading point in time copying
US10922276B2 (en) Online file system check
CN109902034B (en) Snapshot creating method and device, electronic equipment and machine-readable storage medium
CN107391033B (en) Data migration method and device, computing equipment and computer storage medium
CN111522502B (en) Data deduplication method and device, electronic equipment and computer-readable storage medium
US9436554B2 (en) Information processing apparatus and data repairing method
CN110187834B (en) Data processing method and device for duplicate copies and electronic equipment
US9898468B2 (en) Single pass file system repair with copy on write
US11157456B2 (en) Replication of data in a distributed file system using an arbiter
US20160004715A1 (en) Minimizing Metadata Representation In A Compressed Storage System
CN116974458A (en) Method, electronic device and computer program product for processing data
US9645897B2 (en) Using duplicated data to enhance data security in RAID environments
US10146466B1 (en) Merging mapping metadata to promote reference counting efficiency
US10140307B1 (en) Efficiently managing reference weights for write splits
US9977599B2 (en) Data deduplication with support for both thick and thin provisioning of storage objects
US9842028B1 (en) Performing storage object recovery
CN107102898B (en) Memory management and data structure construction method and device based on NUMA (non Uniform memory Access) architecture
US7949632B2 (en) Database-rearranging program, database-rearranging method, and database-rearranging apparatus
US10970259B1 (en) Selective application of block virtualization structures in a file system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant