CN111756828A

CN111756828A - Data storage method, device and equipment

Info

Publication number: CN111756828A
Application number: CN202010567658.6A
Authority: CN
Inventors: 樊云龙; 颜秉珩
Original assignee: Guangdong Inspur Big Data Research Co Ltd
Current assignee: Guangdong Inspur Smart Computing Technology Co Ltd
Priority date: 2020-06-19
Filing date: 2020-06-19
Publication date: 2020-10-09
Anticipated expiration: 2040-06-19
Also published as: CN111756828B; WO2021253853A1

Abstract

The application discloses a data storage method in which a disk resource domain policy divides the disk resources of a node into more than two disk resource domains and sets a correspondence between objects and disk resource domains. During disk mapping, the target disk resource domain corresponding to the object is determined first, and a hash algorithm then determines which disk within that domain the object is mapped to, yielding the mapping between the object and the disk. By setting the disk resource domain policy, the method prevents the object from being randomly mapped to any disk in the node and achieves directed mapping, so that the object can only be mapped to disks in its corresponding disk resource domain. This improves the flexibility of resource allocation and allows the storage performance of the distributed storage system to be fully exploited. The application also provides a data storage device, equipment, and a readable storage medium, whose technical effects correspond to those of the method.

Description

Data storage method, device and equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data storage method, an apparatus, a device, and a readable storage medium.

Background

Sheepdog is an emerging distributed storage system from the open-source community. It adopts a fully symmetric architecture with no central node such as a metadata service, and provides storage service to the outside as a single whole by interconnecting a large number of commodity PC servers over a network.

Unlike other distributed storage designs, Sheepdog keeps no metadata; that is, it does not record the node on which an object is stored. Instead, during data storage it computes the mapping from an object to its storage location with a hash algorithm.

Computing the storage location by hashing has a drawback when the object is mapped to a disk: the hash algorithm distributes objects randomly over all disks in the node, so objects cannot be organized by rule and mapped purposefully. For example, if each node has 4 disks, an object may land on any one of the 4 disks, and its storage range cannot be restricted to, say, only disk 1 and disk 2 of the node.

In short, current distributed storage systems determine the object-to-disk mapping with a hash algorithm, so objects are distributed randomly over any disk of the node. This resource allocation mode is too rigid, which limits the storage performance of the distributed storage system.

Disclosure of Invention

The application aims to provide a data storage method, apparatus, device, and readable storage medium that solve the following problem: current distributed storage systems determine the mapping of an object to a disk through a hash algorithm, so the resource allocation mode is too rigid and the storage performance of the distributed storage system suffers. The specific scheme is as follows:

in a first aspect, the present application provides a data storage method, including:

determining a data object to be stored;

determining a target node to which the data object is mapped, and acquiring a disk resource domain strategy of the target node, wherein the disk resource of the target node is divided into more than two disk resource domains, and the disk resource domain strategy comprises a corresponding relation between the data object and the disk resource domains and also comprises a corresponding relation between the disks and the disk resource domains;

determining the mapping relation between the data object and the disk by utilizing a consistent hash algorithm on a target disk resource domain corresponding to the data object;

and storing the data object according to the mapping relation between the data object and the disk.

Preferably, the determining, by using a consistent hash algorithm, a mapping relationship between the data object and the disk in the target disk resource domain corresponding to the data object includes:

constructing a hash ring according to the target disk resource domain corresponding to the data object;

calculating a hash value of the name of the data object by using a consistent hash algorithm;

determining the position of the data object on the hash ring according to the size of the hash value;

and determining the mapping relation between the data object and the disk according to the position of the data object in the hash ring.

Preferably, before the obtaining the disk resource domain policy of the target node, the method further includes:

and setting a disk resource domain strategy of the target node, and dividing the high-performance disk and the low-performance disk into different disk resource domains.

Preferably, the storing the data object according to the mapping relationship between the data object and the disk includes:

and determining the storage position information of the data object according to the mapping relation between the data object and the disk, and storing the data object according to the storage position information, wherein the storage position information comprises a disk resource domain number, a disk number and a virtual node number.

Preferably, the determining the target node to which the data object is mapped includes:

acquiring a node resource domain strategy of a current cluster, wherein the node resource of the current cluster is divided into more than two node resource domains, and the node resource domain strategy comprises a corresponding relation between a data object and a node resource domain and also comprises a corresponding relation between a node and the node resource domain;

and determining the mapping relation between the data object and the nodes by utilizing a consistent hash algorithm on the target node resource domain corresponding to the data object to obtain the target node mapped by the data object.

Preferably, before the obtaining the node resource domain policy of the current cluster, the method further includes:

and setting a node resource domain strategy of the current cluster, and dividing nodes in different fault domains into the same node resource domain.

In a second aspect, the present application provides a data storage device comprising:

an object determination module: for determining a data object to be stored;

a policy acquisition module: for determining a target node to which the data object is mapped and acquiring a disk resource domain policy of the target node, wherein the disk resource of the target node is divided into more than two disk resource domains, and the disk resource domain policy comprises a corresponding relation between the data object and the disk resource domains and also comprises a corresponding relation between the disks and the disk resource domains;

a mapping relation determination module: for determining the mapping relation between the data object and the disk by utilizing a consistent hash algorithm on a target disk resource domain corresponding to the data object;

a storage module: for storing the data object according to the mapping relation between the data object and the disk.

In a third aspect, the present application provides a data storage device, comprising:

a memory: for storing a computer program;

a processor: for executing said computer program for implementing the steps of the data storage method as described above.

In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program for implementing the steps of the data storage method as described above when executed by a processor.

The data storage method provided by the application comprises the following steps: determining a data object to be stored; determining a target node mapped by a data object, and acquiring a disk resource domain strategy of the target node, wherein the disk resource of the target node is divided into more than two disk resource domains, and the disk resource domain strategy comprises a corresponding relation between the data object and the disk resource domains and also comprises a corresponding relation between a disk and the disk resource domains; determining the mapping relation between the data object and the disk by utilizing a consistent hash algorithm on a target disk resource domain corresponding to the data object; and storing the data object according to the mapping relation between the data object and the disk.

In this method, the disk resource domain policy divides the disk resources of the node into more than two disk resource domains and sets the correspondence between data objects and disk resource domains. When a data object is mapped to a disk, the target disk resource domain corresponding to the data object is determined first, and the hash algorithm then determines which disk in that domain the data object is mapped to, yielding the mapping between the object and the disk. By setting the disk resource domain policy, the method prevents the data object from being randomly mapped to any disk in the node and achieves directed mapping, so that the data object can only be mapped to disks in its corresponding disk resource domain. This improves the flexibility of resource allocation and allows the storage performance of the distributed storage system to be fully exploited.

In addition, the present application also provides a data storage device, a device and a readable storage medium, the technical effects of which correspond to the technical effects of the above method, and are not described herein again.

Drawings

To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram illustrating an object distribution of a conventional hash ring based on a consistent hash algorithm provided in the present application;

FIG. 2 is a flowchart illustrating an implementation of an embodiment of a data storage method provided in the present application;

FIG. 3 is a schematic diagram illustrating a partitioning of disk resources according to the present application;

FIG. 4 is a schematic diagram of a hash ring applying a disk resource domain policy provided in the present application;

FIG. 5 is a schematic diagram illustrating an object distribution of a hash ring applying a disk resource domain policy according to the present application;

FIG. 6 is a flowchart of a refinement of S203 in a first embodiment of a data storage method provided in the present application;

FIG. 7 is a flowchart of a second implementation of a data storage method according to the present application;

FIG. 8 is a functional block diagram of an embodiment of a data storage device provided herein.

Detailed Description

In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

As previously described, unlike other distributed storage designs, Sheepdog keeps no metadata, i.e., it does not record where an object is stored. Sheepdog computes the mapping between an object and its storage location through a consistent hashing algorithm, and the mapping can be defined as: object storage location = hash(object name). The hash value computed from the object name is then looked up on the hash ring to determine the storage location of the object, as shown in FIG. 1.

FIG. 1 is a schematic diagram of a hash ring, in which triangles represent objects and circles represent virtual nodes. In the consistent hash algorithm, both the virtual nodes corresponding to physical nodes and the virtual disks corresponding to physical disks are referred to as virtual nodes on the hash ring. The disk hashing process is taken as the example below.

As can be seen from FIG. 1, the range [0, 2^n) forms a hash ring. Assume that a certain physical node has 3 physical disks and that, according to the consistent hash rule, each physical disk corresponds to 4 virtual nodes. Virtual nodes are named "physical disk number + virtual node number"; that is, the virtual nodes of physical disk 1 are denoted vnode1.1, vnode1.2, vnode1.3, and vnode1.4. The virtual nodes are distributed randomly and uniformly at different positions on the hash ring. Assume there are 8 objects to be stored, denoted object 1, object 2, …, object 8. A hash value is computed from each object name, and the position of the object on the hash ring is determined by the size of the hash value. Under the consistent hash algorithm, the objects are thus distributed randomly over the different physical disks. In the allocation result of FIG. 1, object 1 is allocated to physical disk 1 and object 5 to physical disk 3; the remaining allocations are not listed one by one.
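The single-ring placement described for FIG. 1 can be illustrated with a minimal sketch. It is not taken from the application: the SHA-1 based `ring_hash`, the ring width `RING_BITS`, and the rule that an object belongs to the first virtual node at or after its position (wrapping around) are all assumptions chosen for illustration, so the resulting placement will not match the figure.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed value of n; the ring covers [0, 2**RING_BITS)


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (assumed hash: SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** RING_BITS)


def build_ring(disks, vnodes_per_disk):
    """Return a sorted list of (position, vnode_name, disk) tuples."""
    ring = []
    for disk in disks:
        for v in range(1, vnodes_per_disk + 1):
            name = f"vnode{disk}.{v}"  # "physical disk number + virtual node number"
            ring.append((ring_hash(name), name, disk))
    ring.sort()
    return ring


def locate(ring, object_name):
    """Return (vnode_name, disk) of the first vnode at or after the object's position."""
    positions = [pos for pos, _, _ in ring]
    idx = bisect.bisect_left(positions, ring_hash(object_name)) % len(ring)
    _, vnode, disk = ring[idx]
    return vnode, disk


# 3 physical disks with 4 virtual nodes each, and 8 objects, as in the example.
ring = build_ring(disks=[1, 2, 3], vnodes_per_disk=4)
placement = {f"object{i}": locate(ring, f"object{i}") for i in range(1, 9)}
```

Every object may end up on any of the three disks, which is exactly the limitation the application sets out to remove.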

Because Sheepdog contains multiple physical nodes, and each physical node contains multiple physical disks, two layers of hashing are needed in practice to compute the position of an object. In the first layer, all physical nodes in the cluster form a hash ring, and this layer determines the physical node to which the object is assigned. After the physical node is determined, all physical disks of that node form a second hash ring, and the physical disk to which the object is assigned is computed from the hash value of the object. That is, the first-layer hash yields the node position of the object and the second-layer hash yields its disk position; together, the two layers determine the position of the object.
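A minimal sketch of the two-layer lookup follows. The cluster layout `CLUSTER`, the helper `pick`, the virtual node naming inside `pick`, and the SHA-1 hash are hypothetical choices for illustration and do not come from the application.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Assumed hash function: first 8 bytes of SHA-1 as an integer."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def pick(members, object_name, vnodes=4):
    """Choose one member for object_name on a ring built from the members."""
    ring = sorted((ring_hash(f"{m}#{v}"), m) for m in members for v in range(vnodes))
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_hash(object_name)) % len(ring)
    return ring[idx][1]


# Hypothetical cluster layout: node name -> disks on that node.
CLUSTER = {
    "node1": ["disk1", "disk2", "disk3"],
    "node2": ["disk1", "disk2", "disk3"],
    "node3": ["disk1", "disk2", "disk3"],
}


def locate(object_name):
    node = pick(sorted(CLUSTER), object_name)  # first layer: ring of all nodes
    disk = pick(CLUSTER[node], object_name)    # second layer: ring of that node's disks
    return node, disk
```

Both layers draw from the full set of nodes or disks, so neither the node nor the disk can be constrained by rule.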

However, the two-layer hash mapping described above has a disadvantage: object mappings cannot be organized according to rules. For example, how can object 1 be restricted to physical node 1 and physical node 2 while object 2 is restricted to physical node 2 and physical node 3? Further, how can object 1 be restricted to physical disks 1 and 2 of physical node 1 and physical disks 3 and 4 of physical node 2? The two-layer hash mapping of conventional distributed storage systems cannot solve this problem.

In view of the above problems, the present application provides a data storage method, apparatus, device, and readable storage medium. By setting a disk resource domain policy, they prevent a data object from being randomly mapped to any disk inside a node and achieve directed mapping, so that the data object can only be mapped to disks in its corresponding disk resource domain. This improves the flexibility of resource allocation and allows the storage performance of the distributed storage system to be fully exploited.

Referring to fig. 2, a first embodiment of a data storage method provided in the present application is described below, where the first embodiment includes:

S201, determining a data object to be stored;

S202, determining a target node to which the data object is mapped, and acquiring a disk resource domain strategy of the target node, wherein the disk resource of the target node is divided into more than two disk resource domains, and the disk resource domain strategy comprises a corresponding relation between the data object and the disk resource domains and also comprises a corresponding relation between the disks and the disk resource domains;

S203, determining the mapping relation between the data object and the disk by utilizing a consistent hash algorithm on a target disk resource domain corresponding to the data object;

S204, storing the data object according to the mapping relation between the data object and the disk.

This embodiment introduces the concept of a resource domain (domain) without changing the hash mapping. Taking the disk resource domain as an example, a disk resource domain defines a set of disk resources. The disks inside a node are divided into more than two disk resource domains, and a disk resource domain policy describes the division, that is, which disk resource domain each disk belongs to (the correspondence between disks and disk resource domains). The policy also describes a custom mapping rule, that is, a directed mapping between objects and disk resource domains (the correspondence between data objects and disk resource domains).

In summary, a disk resource domain is a collection of disk resources, and a disk resource domain policy describes a partition condition of the disk resource domain and an object-to-disk resource domain mapping policy. In order to better explain the concept of the disk resource domain and the disk resource domain policy, the following description takes specific applications as an example:

Assume the disk resources shown in FIG. 1 are divided into two disk resource domains, denoted domain-1 and domain-2, with the division result shown in FIG. 3: the virtual nodes drawn as white circles in FIG. 3, namely vnode1.1, vnode2.2, vnode3.2, and vnode3.3, belong to domain-1, and the virtual nodes drawn as black circles belong to domain-2. Assume further that virtual nodes are named "physical disk number + disk resource domain number + number of the virtual node within the disk resource domain". The resulting names are shown in FIG. 3; for example, vnode3.3 in FIG. 1 is named vnode3.1.2 in FIG. 3.
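The renaming rule can be sketched as follows. The text confirms only one renamed example (vnode3.3 becomes vnode 3.1.2); numbering the virtual nodes of a disk within a domain in ascending order of their old number is an assumption, as are the helper name `rename_vnodes` and the choice to write names without spaces.

```python
from collections import defaultdict

# Partition from the example: these vnodes belong to domain-1, all others to domain-2.
DOMAIN_1 = {"vnode1.1", "vnode2.2", "vnode3.2", "vnode3.3"}


def rename_vnodes(disks=(1, 2, 3), vnodes_per_disk=4):
    """Map old names 'vnode<disk>.<n>' to 'vnode<disk>.<domain>.<index>'."""
    counters = defaultdict(int)  # (disk, domain) -> vnodes numbered so far
    renamed = {}
    for disk in disks:
        for n in range(1, vnodes_per_disk + 1):
            old = f"vnode{disk}.{n}"
            domain = 1 if old in DOMAIN_1 else 2
            counters[(disk, domain)] += 1  # assumed: ascending order of old number
            renamed[old] = f"vnode{disk}.{domain}.{counters[(disk, domain)]}"
    return renamed


names = rename_vnodes()
```

Under these assumptions `names["vnode3.3"]` is `"vnode3.1.2"`, in line with the example given for FIG. 3.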

A hash ring is then constructed for each of the two disk resource domains. In effect, the hash ring in FIG. 1 is split into two hash rings according to the partition of the disk resource domains, as shown in FIG. 4: the virtual nodes of domain-1 form hash ring 1, and the virtual nodes of domain-2 form hash ring 2.

The disk resource domain policy sets the correspondence between objects and disk resource domains. For the 8 objects shown in FIG. 1, as shown in FIG. 3, object 1, object 3, object 4, and object 5 correspond to domain-1, and the other objects correspond to domain-2. Then, according to the consistent hash algorithm, each object is placed on its corresponding hash ring according to the hash value of its name, as shown in FIG. 5.
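One possible in-memory form of such a policy is sketched below. The application does not specify a data format; the dictionary keys, the `default_domain` fallback, and the helper names are hypothetical and serve only to show the two correspondences the policy holds.

```python
# Hypothetical in-memory form of a disk resource domain policy for one node.
POLICY = {
    # correspondence between disks (virtual nodes) and disk resource domains
    "vnode_domain": {
        "vnode1.1": "domain-1", "vnode2.2": "domain-1",
        "vnode3.2": "domain-1", "vnode3.3": "domain-1",
    },
    # correspondence between data objects and disk resource domains
    "object_domain": {
        "object1": "domain-1", "object3": "domain-1",
        "object4": "domain-1", "object5": "domain-1",
    },
    # assumed fallback for every vnode or object not listed above
    "default_domain": "domain-2",
}


def vnode_domain(policy, vnode):
    """Disk resource domain that a virtual node belongs to."""
    return policy["vnode_domain"].get(vnode, policy["default_domain"])


def target_domain(policy, object_name):
    """Target disk resource domain of a data object."""
    return policy["object_domain"].get(object_name, policy["default_domain"])


def domain_members(policy, all_vnodes, domain):
    """Virtual nodes that form the hash ring of one disk resource domain."""
    return [v for v in all_vnodes if vnode_domain(policy, v) == domain]


ALL_VNODES = [f"vnode{d}.{n}" for d in (1, 2, 3) for n in (1, 2, 3, 4)]
ring1_members = domain_members(POLICY, ALL_VNODES, "domain-1")
ring2_members = domain_members(POLICY, ALL_VNODES, "domain-2")
```

With the example data, hash ring 1 is built from 4 virtual nodes and hash ring 2 from the remaining 8.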

Comparing FIG. 1 with FIG. 5 shows that the mapping rule of the objects and the positions of the mapped virtual nodes are unchanged; only the names of the virtual nodes change. Therefore, merely by defining disk resource domains, the hash ring shown in FIG. 1 can be split into two or more hash rings, and the consistent hash distribution of objects on each resulting ring is unchanged.

Thus a disk resource domain partitions the disk resources, and the virtual nodes of each resulting disk resource domain form a complete hash ring. From this perspective, a disk resource domain is a collection of virtual nodes, and different combinations of virtual nodes can be realized by defining different disk resource domain policies.

Specifically, S203 above, that is, determining the mapping relationship between the data object and the disk by using the consistent hash algorithm on the target disk resource domain corresponding to the data object, includes the following steps, as shown in FIG. 6:

S601, constructing a hash ring according to the target disk resource domain corresponding to the data object;

S602, calculating a hash value of the name of the data object by using a consistent hash algorithm;

S603, determining the position of the data object on the hash ring according to the size of the hash value;

S604, determining the mapping relation between the data object and the disk according to the position of the data object on the hash ring.
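The four steps can be sketched in a single function. This is an illustration under stated assumptions rather than the application's implementation: the SHA-1 hash, the "first virtual node at or after the object's position" rule, and the name `map_object_to_disk` are all hypothetical, and the domain membership is passed in directly instead of being read from a policy.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Assumed hash function: first 8 bytes of SHA-1 as an integer."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def map_object_to_disk(object_name, domain_vnodes):
    """domain_vnodes: vnode name -> physical disk, for the target domain only."""
    # S601: construct the hash ring from the target disk resource domain.
    ring = sorted((ring_hash(vnode), vnode) for vnode in domain_vnodes)
    # S602: calculate the hash value of the data object's name.
    object_hash = ring_hash(object_name)
    # S603: determine the object's position on the ring from the hash value.
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, object_hash) % len(ring)
    # S604: the virtual node at that position gives the object-to-disk mapping.
    vnode = ring[idx][1]
    return vnode, domain_vnodes[vnode]


# Virtual nodes of domain-1 from the example, using the FIG. 3 naming convention.
DOMAIN_1 = {"vnode1.1.1": 1, "vnode2.1.1": 2, "vnode3.1.1": 3, "vnode3.1.2": 3}
vnode, disk = map_object_to_disk("object1", DOMAIN_1)
```

Because the ring contains only the virtual nodes of the target domain, the result can never fall outside that domain.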

In the data storage method provided by this embodiment, the disk resource domain policy divides the disk resources of a node into more than two disk resource domains and sets the correspondence between data objects and disk resource domains. When a data object is mapped to a disk, the target disk resource domain corresponding to the data object is determined first, and the hash algorithm then determines which disk in that domain the data object is mapped to, yielding the mapping between the object and the disk. By setting the disk resource domain policy, the method prevents the data object from being randomly mapped to any disk in the node and achieves directed mapping, so that the data object can only be mapped to specific disks, namely the disks in the disk resource domain corresponding to the data object. This improves the flexibility of resource allocation and allows the storage performance of the distributed storage system to be fully exploited.

The second embodiment of the data storage method provided by the present application is described in detail below. It builds on the first embodiment and extends it to a certain extent.

Specifically, the first embodiment applies a disk resource domain policy only in the disk mapping process; the present embodiment additionally applies a node resource domain policy in the node mapping process. Referring to FIG. 7, the second embodiment specifically includes:

S701, determining a data object to be stored;

S702, obtaining a node resource domain strategy of a current cluster, wherein the node resource of the current cluster is divided into more than two node resource domains, and the node resource domain strategy comprises a corresponding relation between a data object and the node resource domains and also comprises a corresponding relation between nodes and the node resource domains;

S703, determining the mapping relation between the data object and the node by using a consistent hash algorithm on a target node resource domain corresponding to the data object, and obtaining a target node mapped by the data object;

S704, obtaining a disk resource domain strategy of the target node, wherein the disk resource of the target node is divided into more than two disk resource domains, and the disk resource domain strategy comprises a corresponding relation between a data object and the disk resource domains and also comprises a corresponding relation between a disk and the disk resource domains;

S705, determining the mapping relation between the data object and the disk by utilizing a consistent hash algorithm on a target disk resource domain corresponding to the data object;

S706, storing the data object according to the mapping relation between the data object and the disk.

In this embodiment, a node resource domain policy and a disk resource domain policy are defined for the two layers of hash mapping of the distributed storage system, respectively. The node resource domain policy contains the partition information of the node resources, and the disk resource domain policy contains the partition information of the disk resources. The first-layer hash mapping determines the node position of the object within its node resource domain, and the second-layer hash mapping determines the disk position of the object within the disk resource domain of that node. A directed mapping policy from node to disk can therefore be implemented by defining a node resource domain policy and a disk resource domain policy in a configuration file, similar to the rule policy in Ceph.
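The sketch below combines both policies in a two-layer lookup. It reuses the scenario raised earlier in the description, where object 1 should be confined to disks 1 and 2 of node1 and disks 3 and 4 of node2. The policy tables, the default domains for unlisted objects, the helper `pick`, and the SHA-1 hash are hypothetical choices for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Assumed hash function: first 8 bytes of SHA-1 as an integer."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def pick(members, object_name, vnodes=4):
    """Choose one member for object_name on a ring built from the members."""
    ring = sorted((ring_hash(f"{m}#{v}"), m) for m in members for v in range(vnodes))
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_hash(object_name)) % len(ring)
    return ring[idx][1]


# Hypothetical node resource domain policy.
NODE_DOMAINS = {"node-domain-1": ["node1", "node2"], "node-domain-2": ["node3"]}
OBJECT_NODE_DOMAIN = {"object1": "node-domain-1"}

# Hypothetical disk resource domain policy, one partition per node.
DISK_DOMAINS = {
    "node1": {"domain-1": ["disk1", "disk2"], "domain-2": ["disk3", "disk4"]},
    "node2": {"domain-1": ["disk3", "disk4"], "domain-2": ["disk1", "disk2"]},
    "node3": {"domain-1": ["disk1", "disk2"], "domain-2": ["disk3", "disk4"]},
}
OBJECT_DISK_DOMAIN = {"object1": "domain-1"}


def locate(object_name):
    # First layer: hash only within the object's node resource domain.
    node_domain = OBJECT_NODE_DOMAIN.get(object_name, "node-domain-2")  # assumed default
    node = pick(NODE_DOMAINS[node_domain], object_name)
    # Second layer: hash only within the object's disk resource domain on that node.
    disk_domain = OBJECT_DISK_DOMAIN.get(object_name, "domain-2")  # assumed default
    disk = pick(DISK_DOMAINS[node][disk_domain], object_name)
    return node, disk
```

Whatever the hash values turn out to be, `locate("object1")` can only return one of the four permitted node and disk combinations.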

As a preferred embodiment, disk resource domains may be divided according to the characteristics of the disks. Specifically, before the disk resource domain policy of the target node is obtained, the method further includes: setting the disk resource domain policy of the target node, dividing high-performance disks and low-performance disks into different disk resource domains, and storing the disk resource domain policy in a configuration file. For example, high-performance storage media are placed in one disk resource domain and low-performance storage media in another, so that hierarchical storage can be realized by means of the policy.
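A sketch of deriving such a policy from disk characteristics and serializing it for a configuration file is given below. The application does not define the configuration format; the JSON layout, the media type labels, the domain names, and the helper `build_tier_domains` are assumptions for illustration.

```python
import json

# Hypothetical disk inventory for one node: disk name -> media type.
DISKS = {"disk1": "ssd", "disk2": "ssd", "disk3": "hdd", "disk4": "hdd"}


def build_tier_domains(disks, fast_media=("ssd", "nvme")):
    """Place high-performance and low-performance disks in separate domains."""
    domains = {"high-performance": [], "low-performance": []}
    for name, media in sorted(disks.items()):
        tier = "high-performance" if media in fast_media else "low-performance"
        domains[tier].append(name)
    return domains


# Assumed policy layout; it would be written to the node's configuration file.
policy = {"disk_domains": build_tier_domains(DISKS)}
config_text = json.dumps(policy, indent=2)
```

Objects that need fast access would then be associated with the high-performance domain in the object-to-domain part of the policy, which is omitted here.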

As a preferred embodiment, before the obtaining the node resource domain policy of the current cluster, the method further includes: setting a node resource domain strategy of the current cluster, dividing nodes in different fault domains into the same node resource domain, and storing the node resource domain strategy in a configuration file.

For example, assume node1, node2, and node3 are in the same rack. Because a rack outage would take node1, node2, and node3 down at the same time and make every data copy stored on them unavailable together, the three nodes are usually grouped into the same fault domain. In this embodiment, a custom rule places node1, node2, and node3 in different node resource domains; equivalently, each node resource domain is composed of nodes located in different fault domains. Each node resource domain has its own hash ring, so the copies of an object never reside on these 3 nodes at the same time. Defining the node resource domain policy thus implements the function of a fault domain.
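One way to derive node resource domains from fault domains is sketched below: the i-th node of every rack is placed in node resource domain i, so that no domain contains two nodes from the same rack. The rack layout `RACK_OF`, the second rack, and this particular grouping rule are assumptions; the application only requires that nodes in different fault domains share a node resource domain.

```python
from collections import defaultdict

# Hypothetical rack layout: node -> rack (fault domain).
RACK_OF = {
    "node1": "rackA", "node2": "rackA", "node3": "rackA",
    "node4": "rackB", "node5": "rackB", "node6": "rackB",
}


def build_node_domains(rack_of):
    """Put the i-th node of each rack into node resource domain i."""
    by_rack = defaultdict(list)
    for node in sorted(rack_of):
        by_rack[rack_of[node]].append(node)
    domains = defaultdict(list)
    for rack in sorted(by_rack):
        for i, node in enumerate(by_rack[rack], start=1):
            domains[f"node-domain-{i}"].append(node)
    return dict(domains)


node_domains = build_node_domains(RACK_OF)
```

With this layout node1, node2, and node3 end up in three different node resource domains, each paired with a node from the other rack.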

It can be seen that the data storage method provided by this embodiment, without changing the two-layer hash mapping, partitions and integrates the nodes and the disks on the nodes according to rules to form node resource domains and disk resource domains. The method allows the user to define node resource domain policies and disk resource domain policies and maps objects to specific nodes and specific disks according to those policies, so that the physical resources in the distributed storage system can be used more flexibly and functions such as fault domains and hierarchical storage can be realized.

A data storage device provided by an embodiment of the present application is described below. The data storage device described below and the data storage method described above may be referred to in correspondence with each other.

As shown in fig. 8, the data storage device of the present embodiment includes:

object determination module 801: for determining a data object to be stored;

the policy acquisition module 802: for determining a target node to which the data object is mapped and acquiring a disk resource domain policy of the target node, wherein the disk resource of the target node is divided into more than two disk resource domains, and the disk resource domain policy comprises a corresponding relation between the data object and the disk resource domains and also comprises a corresponding relation between the disks and the disk resource domains;

the mapping relation determination module 803: for determining the mapping relation between the data object and the disk by utilizing a consistent hash algorithm on a target disk resource domain corresponding to the data object;

the storage module 804: for storing the data object according to the mapping relation between the data object and the disk.

The data storage device of this embodiment is used to implement the foregoing data storage method, so its specific implementation can be found in the foregoing method embodiments. For example, the object determination module 801, the policy acquisition module 802, the mapping relation determination module 803, and the storage module 804 implement steps S201, S202, S203, and S204 of the foregoing data storage method, respectively. Their specific implementations are described in the corresponding method embodiments and are not repeated here.

In addition, since the data storage device of this embodiment implements the foregoing data storage method, its technical effect corresponds to that of the method and is not repeated here.

In addition, the present application also provides a data storage device, including:

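a memory: for storing a computer program;

a processor: for executing said computer program for implementing the steps of the data storage method as described above.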

Finally, the present application provides a readable storage medium having stored thereon a computer program for implementing the steps of the data storage method as described above when executed by a processor.

The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The solutions provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A method of storing data, comprising:

determining a data object to be stored;

2. The method of claim 1, wherein determining the mapping relationship between the data object and the disk by using a consistent hashing algorithm on the target disk resource domain corresponding to the data object comprises:

3. The method of claim 2, wherein prior to said obtaining the disk resource domain policy of the target node, further comprising:

4. The method of claim 3, wherein storing the data object according to the mapping relationship between the data object and the disk comprises:

5. The method of any of claims 1-4, wherein the determining the target node to which the data object maps comprises:

6. The method of claim 5, wherein prior to said obtaining the node resource domain policy for the current cluster, further comprising:

7. A data storage device, comprising:

an object determination module: for determining a data object to be stored;

8. A data storage device, comprising:

a memory: for storing a computer program;

a processor: for executing said computer program for implementing the steps of the data storage method according to any one of claims 1 to 6.

9. A readable storage medium, having stored thereon a computer program for implementing the steps of the data storage method according to any one of claims 1-6 when executed by a processor.