WO2021003935A1

WO2021003935A1 - Data cluster storage method and apparatus, and computer device

Info

Publication number: WO2021003935A1
Application number: PCT/CN2019/118232
Authority: WO
Inventors: 兰东平
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-07-11
Filing date: 2019-11-13
Publication date: 2021-01-14
Also published as: CN110489059A; CN110489059B

Abstract

A data cluster storage method and apparatus, and a computer device, which relate to the field of data processing, and can solve the problem of poor storage performance caused by incapability of guaranteeing the regular storage of data in a cluster and rapidly locating the position where the data is stored when the cluster is used for data storage. The method comprises: obtaining all physical clusters for storing data (101); uniformly mapping the physical clusters onto a physical node of a consistent hash ring (102); determining, according to a hash value of a file to be stored, the optimal target physical cluster for storage (103); and storing said file into the target physical cluster (104). The method is applicable to data cluster storage.

Description

Method, device and computer equipment for data cluster storage

Technical field

This application claims priority to Chinese patent application No. 201910625543.5, filed with the Chinese Patent Office on July 11, 2019 and entitled "Data cluster storage methods, devices and computer equipment", the entire content of which is incorporated herein by reference.

Background technique

With the accumulation of physical resources such as servers and cabinets used by data storage services, expanding storage capacity through clusters has become a popular approach. Cluster storage aggregates the storage space of multiple storage devices into a single storage pool that provides application servers with a unified access interface and management interface. Applications can transparently access and use the disks on all storage devices through the access interface, making full use of the performance and disk utilization of the storage devices.

At present, when data is stored in clusters, it is commonly placed at random in idle physical clusters. When data is stored across clusters, an additional index library must be established to record the correspondence between data and clusters, and the cluster in which a historical file is stored must be looked up in that index library.

However, the aforementioned storage method cannot guarantee the regular storage of data in the cluster, and cannot quickly locate the location of the data storage, resulting in poor storage performance.

Summary of the invention

In view of this, this application discloses a method, device and computer equipment for data cluster storage. The main purpose is to solve the problem that, when data is stored in clusters, regular storage of data in the cluster cannot be guaranteed and the location where data is stored cannot be quickly found, resulting in poor storage performance.

According to one aspect of the present application, there is provided a data cluster storage method, the method including:

obtaining all physical clusters used to store data;

uniformly mapping the physical clusters onto the physical nodes of a consistent hash ring;

wherein uniformly mapping the physical clusters onto the physical nodes of the consistent hash ring specifically includes: obtaining the storage space of each physical cluster; dividing each first physical cluster, whose storage space is greater than or equal to a preset threshold, into multiple sub-physical clusters of equal space according to a preset ratio; configuring, according to a naming rule, an identification code for each second physical cluster, whose storage space is less than the preset threshold, and for each sub-physical cluster; determining the hash value of the second physical cluster and of each sub-physical cluster according to the identification code; and using the hash value to calculate the physical node positions of the second physical cluster and the sub-physical clusters on the consistent hash ring;

determining the optimal target physical cluster for storage according to the hash value of the file to be stored;

storing the file to be stored in the target physical cluster.

According to another aspect of the present application, there is provided a data cluster storage device, which includes:

The acquisition module is used to acquire all physical clusters used to store data;

A mapping module, which is used to map the physical cluster to a physical node of a consistent hash ring;

The mapping module is specifically configured to: obtain the storage space of each physical cluster; divide each first physical cluster, whose storage space is greater than or equal to a preset threshold, into multiple sub-physical clusters of equal space according to a preset ratio; configure, according to a naming rule, an identification code for each second physical cluster, whose storage space is less than the preset threshold, and for each sub-physical cluster; determine the hash value of the second physical cluster and of each sub-physical cluster according to the identification code; and use the hash value to calculate the physical node positions of the second physical cluster and the sub-physical clusters on the consistent hash ring;

The determining module is used to determine the optimal storage target physical cluster according to the hash value of the file to be stored;

The storage module is used to store the file to be stored in the target physical cluster.

According to another aspect of the present application, there is provided a non-volatile readable storage medium having computer readable instructions stored thereon, and the computer readable instructions are executed by a processor to implement the above-mentioned data cluster storage method.

According to another aspect of the present application, there is provided a computer device, including a non-volatile readable storage medium, a processor, and computer-readable instructions that are stored on the non-volatile readable storage medium and can run on the processor. When the processor executes the computer-readable instructions, the above-mentioned data cluster storage method is implemented.

With the above technical solutions, compared with the current approach of randomly storing data in idle physical clusters, the method, device and computer equipment for data cluster storage provided by this application can evenly map the physical clusters onto the physical nodes of a consistent hash ring, determine the logical node position of the file to be stored on the consistent hash ring according to the hash value of the file, select the optimal target physical cluster based on that logical node position, and then store the file in the target physical cluster. This application can quickly locate, by calculation, the cluster in which a data file should be stored. Because the hash value of a data file is fixed, regular storage of data in the cluster is ensured. Moreover, each physical cluster is evenly mapped onto the physical nodes of the consistent hash ring, so that every physical cluster can store data, which avoids the increased storage pressure and data avalanche caused by concentrating data in a single physical cluster. In addition, integrating the consistent hash ring into data cluster storage effectively reduces the complexity of data storage and thereby its cost, and enables efficient location of the physical cluster to meet the needs of massive storage expansion.

Description of the drawings

The drawings described here are used to provide a further understanding of the application and constitute a part of the application. The exemplary embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:

FIG. 1 shows a schematic flowchart of a data cluster storage method provided by an embodiment of the present application;

FIG. 2 shows a schematic flowchart of another data cluster storage method provided by an embodiment of the present application;

FIG. 3 shows an example schematic diagram of a data cluster storage method provided by an embodiment of the present application;

FIG. 4 shows an example schematic diagram of another data cluster storage method provided by an embodiment of the present application;

FIG. 5 shows an example schematic diagram of yet another data cluster storage method provided by an embodiment of the present application;

FIG. 6 shows a schematic structural diagram of a data cluster storage device provided by an embodiment of the present application;

FIG. 7 shows a schematic structural diagram of another data cluster storage device provided by an embodiment of the present application.

Detailed description

Hereinafter, the application will be described in detail with reference to the drawings and in conjunction with embodiments. It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict.

To address the problem that current data cluster storage cannot guarantee regular storage of data in the cluster and cannot quickly locate where data is stored, resulting in poor storage performance, an embodiment of the present application provides a method for data cluster storage. As shown in FIG. 1, the method includes:

101. Obtain all physical clusters used to store data.

For this embodiment, the purpose of obtaining all physical clusters is to configure all physical clusters equally in a consistent hash ring, so as to realize uniform configuration distribution of physical clusters.

102. Evenly map the physical clusters to the physical nodes of the consistent hash ring.

For this embodiment, a unified namespace can be used to name each physical cluster, thereby mapping each physical cluster onto the consistent hash ring. Specifically, the identification code or host name of the physical cluster can be selected as the key from which the hash value is calculated, so that each machine can determine its position on the hash ring and data files can be stored in a targeted manner based on the consistent hash algorithm.

The consistent hash ring can be pictured as a ring of 2^32 points. The point at the top of the ring represents 0, the first point to the right of 0 represents 1, and so on through 2, 3, 4, 5, 6, ... up to 2^32-1, so that the first point to the left of 0 represents 2^32-1. The consistent hash ring has two layers of nodes: the first layer consists of logical nodes, of which there are 2^32; the second layer consists of physical nodes, which are the actual storage clusters.

103. Determine an optimal storage target physical cluster according to the hash value of the file to be stored.

The target physical cluster is the physical cluster that the consistent hash algorithm determines to be most suitable for storing the file to be stored. The consistent hash algorithm determines the target physical cluster as follows: starting from the logical node position of the file to be stored, the first physical cluster with a normal storage status encountered in the clockwise direction is determined as the target physical cluster.
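The clockwise search in step 103 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function name pick_target, the tuple layout, the ring positions, and the rule that a file located exactly on a node position belongs to that node are all assumptions made for the example.

```python
from bisect import bisect_left

RING_SIZE = 2 ** 32  # number of logical nodes on the ring


def pick_target(file_position, clusters):
    """Return the first healthy cluster met clockwise from file_position.

    clusters is a list of (ring_position, name, healthy) tuples (assumed layout).
    """
    ordered = sorted(clusters)
    positions = [pos for pos, _, _ in ordered]
    # Index of the first node at or after the file's position.
    start = bisect_left(positions, file_position % RING_SIZE)
    for step in range(len(ordered)):
        # The modulo makes the walk wrap from 2^32-1 back to 0.
        _, name, healthy = ordered[(start + step) % len(ordered)]
        if healthy:
            return name
    raise LookupError("no healthy cluster on the ring")


# Hypothetical ring positions, chosen only for illustration.
ring = [
    (100, "cluster1-1", True),
    (2_000, "cluster2-1", False),  # abnormal storage status, so it is skipped
    (3_000_000_000, "cluster3-1", True),
]
print(pick_target(150, ring))            # cluster3-1
print(pick_target(3_500_000_000, ring))  # cluster1-1 (wraps past 2^32-1)
```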

104. Store the file to be stored in the target physical cluster.

In a specific application scenario, after determining the optimal storage target physical cluster, the files to be stored can be stored in the target physical cluster, and queries and data acquisitions of the data to be stored can be received.

Through the method of data cluster storage in this embodiment, the physical clusters can be evenly mapped onto the physical nodes of the consistent hash ring, the logical node position of the file to be stored on the consistent hash ring is determined according to the hash value of the file, the optimal target physical cluster is selected based on that logical node position, and the file is then stored in the target physical cluster. This application can quickly locate, by calculation, the cluster in which a data file should be stored. Because the hash value of a data file is fixed, regular storage of data in the cluster is ensured. Moreover, each physical cluster is evenly mapped onto the physical nodes of the consistent hash ring, so that every physical cluster can store data, which avoids the increased storage pressure and data avalanche caused by concentrating data in a single physical cluster. In addition, integrating the consistent hash ring into data cluster storage effectively reduces the complexity of data storage and thereby its cost, and enables efficient location of the physical cluster to meet the needs of massive storage expansion.

Further, as a refinement and extension of the specific implementation of the foregoing embodiment, in order to fully illustrate the specific implementation process in this embodiment, another method for data cluster storage is provided. As shown in FIG. 2, the method includes:

201. Obtain all physical clusters used to store data.

In specific application scenarios, all physical clusters used to store data can be obtained from the data storage system. For example, if the data storage system contains four physical clusters A, B, C, and D, the basic information of the four clusters A, B, C, and D needs to be extracted.

202. Obtain the storage space of the physical cluster, and divide the first physical cluster with the storage space greater than or equal to a preset threshold into multiple sub-physical clusters with equal space according to a preset ratio.

In specific application scenarios, if there are few physical clusters on the consistent hash ring, uneven node distribution can easily cause data storage skew. Therefore, in this application a physical cluster with a larger storage space is split into multiple sub-physical clusters, and the sub-physical clusters are distributed over different physical nodes. This ensures a uniform distribution of data among the physical clusters and avoids data storage skew. In addition, when a single sub-physical cluster fails, the other normal sub-physical clusters are not affected, which ensures the security of data storage.

The preset threshold is the minimum storage space at which a physical cluster is divided into multiple sub-physical clusters. The preset ratio determines the number of sub-physical clusters into which a physical cluster is divided, and its value can be preset according to actual needs.

For example, set the preset ratio of the capacity of a physical cluster to that of each of its sub-physical clusters to 10:1, and the preset threshold to 30TB. Suppose the storage space of physical cluster A is 200TB, that of physical cluster B is 100TB, and that of physical cluster C is 20TB. Because the storage space of physical cluster A and of physical cluster B is greater than the preset threshold, both are defined as first physical clusters to be divided: according to the preset ratio, physical cluster A is divided into 10 sub-physical clusters of 20TB each, and physical cluster B into 10 sub-physical clusters of 10TB each. Because the storage space of physical cluster C is less than the preset threshold, it is small enough not to need dividing and is defined as a second physical cluster.
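The arithmetic of this example can be reproduced with a short sketch. The function name split_cluster and its parameters are hypothetical, and the 10:1 preset ratio is interpreted here as "divide into 10 equal parts", which is an assumption consistent with the figures above.

```python
def split_cluster(name, capacity_tb, threshold_tb=30, parts=10):
    """Split a cluster into equal sub-clusters when it reaches the threshold.

    Returns a list of (parent name, sub-cluster index, capacity in TB).
    A cluster below the threshold is returned as a single entry.
    """
    if capacity_tb < threshold_tb:
        return [(name, 1, capacity_tb)]  # second physical cluster, kept whole
    share = capacity_tb / parts          # first physical cluster, divided equally
    return [(name, index, share) for index in range(1, parts + 1)]


for name, capacity in [("A", 200), ("B", 100), ("C", 20)]:
    pieces = split_cluster(name, capacity)
    # Prints: A 10 20.0, then B 10 10.0, then C 1 20
    print(name, len(pieces), pieces[0][2])
```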

203. According to the naming rule, configure an identification code for the second physical cluster, whose storage space is less than the preset threshold, and for each sub-physical cluster.

After the first physical cluster with larger storage space has been divided into multiple sub-physical clusters in step 202, a unified namespace is used to configure identification codes that comply with the naming rule for each sub-physical cluster and for the second physical cluster, which facilitates unified management of the physical storage space.

The naming rule can be uniformly set as cluster[cluster number]-[physical node number]. Before naming a physical cluster, it is necessary to obtain its cluster number and determine whether the physical cluster corresponding to that cluster number is a first physical cluster or a second physical cluster. If it is a first physical cluster, the sequence number of each sub-physical cluster within the first physical cluster must also be obtained; this is the physical node number in the naming rule. If it is a second physical cluster, the physical node number can be set directly to 1. For example, if storage cluster 1 has two sub-physical nodes, they can be named cluster1-1 and cluster1-2 in sequence; if storage cluster 2 has four sub-physical nodes, they can be named cluster2-1, cluster2-2, cluster2-3 and cluster2-4 in sequence; if storage cluster 3 is a second physical cluster, it can be named cluster3-1.
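The naming rule cluster[cluster number]-[physical node number] is simple enough to express in one function. The name cluster_ids below is hypothetical; the outputs match the three examples in the preceding paragraph.

```python
def cluster_ids(cluster_number, sub_cluster_count=1):
    """Build identification codes of the form cluster[number]-[node number].

    A second physical cluster (not divided) uses the default count of 1,
    so its single physical node number is 1.
    """
    return [
        f"cluster{cluster_number}-{node}"
        for node in range(1, sub_cluster_count + 1)
    ]


print(cluster_ids(1, 2))  # ['cluster1-1', 'cluster1-2']
print(cluster_ids(2, 4))  # ['cluster2-1', 'cluster2-2', 'cluster2-3', 'cluster2-4']
print(cluster_ids(3))     # ['cluster3-1']
```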

204. Determine the hash value of the second physical cluster and each sub-physical cluster according to the identification code.

For this embodiment, in a specific application scenario, the hash values used to map the second physical cluster and the sub-physical clusters onto the consistent hash ring may be determined as follows: the MD5 message-digest algorithm is used to generate a 128-bit (16-byte) hash value, which ensures that information is transmitted completely and consistently. Specifically, MD5 processes the input identification code in 512-bit blocks, each block being divided into sixteen 32-bit sub-blocks. After a series of processing steps, the output of the algorithm consists of four 32-bit words, and concatenating these four 32-bit words yields the 128-bit hash value.
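The shape of the MD5 output described above can be checked with Python's standard hashlib and struct modules. This is only an illustrative sketch: the helper name md5_words and the UTF-8 encoding of the identification code are assumptions, and the digest is unpacked as four little-endian 32-bit words, which is how MD5 serializes its internal state.

```python
import hashlib
import struct


def md5_words(identifier):
    """Return the 16-byte MD5 digest and its four 32-bit words."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    words = struct.unpack("<4I", digest)  # four unsigned 32-bit integers
    return digest, words


digest, words = md5_words("cluster1-1")
print(len(digest) * 8)                       # 128
print(len(words))                            # 4
print(struct.pack("<4I", *words) == digest)  # True: concatenation restores the digest
```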

205. Calculate physical node positions of the second physical cluster and sub-physical clusters on the consistent hash ring by using the hash value.

For this embodiment, in a specific application scenario, the hash value may be used to determine the physical node positions of the second physical cluster and the sub-physical clusters on the consistent hash ring as follows: using a hash function, take the obtained hash value of the second physical cluster or sub-physical cluster modulo 2^32; the result is the corresponding physical node on the consistent hash ring, namely: Hash(cluster1)=hash(cluster1)%2^32, where hash(cluster1) is the hash value obtained from the identification code and Hash(cluster1) is the physical node corresponding to the second physical cluster or sub-physical cluster on the consistent hash ring. The result of this formula must be an integer between 0 and 2^32-1, and this integer represents the physical cluster. Because the integer lies between 0 and 2^32-1, it necessarily corresponds to a point on the consistent hash ring; in this way each physical cluster is mapped onto the consistent hash ring.
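A minimal sketch of Hash(cluster1)=hash(cluster1)%2^32 follows, using MD5 as the underlying hash as in step 204. The function name ring_position is hypothetical, and reading the 16-byte digest as a big-endian integer is an assumption, since the text does not state a byte order.

```python
import hashlib

RING_SIZE = 2 ** 32


def ring_position(identifier):
    """Map an identification code to a point in [0, 2^32 - 1] on the ring."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    # Interpret the 128-bit digest as an integer, then reduce modulo 2^32.
    return int.from_bytes(digest, "big") % RING_SIZE


# Identification codes following the naming rule of step 203.
for code in ["cluster1-1", "cluster1-2", "cluster3-1"]:
    print(code, ring_position(code))
```

The same identification code always yields the same position, which is what allows a cluster to be found again by calculation alone.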

206. Calculate the hash value of the file to be stored according to the identification code of the file to be stored.

In this embodiment, the calculation method is the same as in step 204: after the identification code of the file to be stored is obtained, it is converted into a hash value.

207. Use the hash value to determine the logical node position of the file to be stored on the consistent hash ring.

For this embodiment, correspondingly, the hash value may be used to determine the logical node position of the file to be stored on the consistent hash ring as follows: using a hash function, take the obtained hash value of the file to be stored modulo 2^32; the result is the corresponding logical node on the consistent hash ring, namely: Hash(obj1)=hash(obj1)%2^32, where hash(obj1) is the hash value obtained from the identification code of the file to be stored and Hash(obj1) is the logical node corresponding to the file to be stored. The result of this formula must be an integer between 0 and 2^32-1, and this integer represents the file to be stored. Because the integer lies between 0 and 2^32-1, the logical node corresponding to the file to be stored necessarily has a position on the consistent hash ring.

208. Determine the first second physical cluster or the first sub-physical cluster found in the clockwise direction from the logical node position on the consistent hash ring as the target physical cluster.

In a specific application scenario, a consistent hash algorithm can be used to determine the target physical cluster corresponding to the file to be stored. The principle of the consistent hash algorithm is as follows: after the physical clusters and the file to be stored have been mapped onto the hash ring, the first physical cluster encountered in the clockwise direction, starting from the position of the file to be stored, is the physical cluster on which the file will be cached. Because the hash values of the file to be stored and of the physical clusters are fixed, the file is always cached on the same physical cluster as long as the physical clusters remain unchanged. The next time the file needs to be accessed, the same algorithm is applied again to calculate where the file is cached, and the file is fetched directly from the corresponding physical cluster.

For example, as shown in Figure 3, if the preceding steps determine that the logical node position of the file to be stored on the consistent hash ring is key1, the ring is searched clockwise for a physical cluster starting from key1. The physical node key2 on the ring is the first second physical cluster or first sub-physical cluster found, so the physical cluster corresponding to key2 is determined as the target physical cluster of the file to be stored at key1.
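Steps 206 to 208 can be combined into one short sketch. The helper names (ring_position, build_ring, locate), the file identifier "report.pdf", and the big-endian reading of the MD5 digest are assumptions made for illustration; a file whose position coincides exactly with a node is assigned to that node here.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32


def ring_position(identifier):
    """Hash an identification code to a point in [0, 2^32 - 1]."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(cluster_codes):
    """Return the physical nodes as a list of (position, code), sorted by position."""
    return sorted((ring_position(code), code) for code in cluster_codes)


def locate(file_id, ring):
    """Return the first cluster found clockwise from the file's logical node."""
    positions = [pos for pos, _ in ring]
    # bisect_left finds the first node at or after the file; the modulo wraps to node 0.
    index = bisect_left(positions, ring_position(file_id)) % len(ring)
    return ring[index][1]


ring = build_ring(["cluster1-1", "cluster1-2", "cluster2-1", "cluster3-1"])
target = locate("report.pdf", ring)
print(target)
# Repeating the calculation always gives the same cluster while the ring is unchanged.
print(locate("report.pdf", ring) == target)  # True
```

No index library is consulted at any point: the storage location is a pure function of the file identifier and the current set of clusters.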

209. Receive an addition or deletion instruction to the physical cluster.

For this embodiment, in a specific application scenario, the physical cluster has fault tolerance and scalability. When it is determined that there is a physical cluster failure, in order not to affect the data storage, the failed physical cluster needs to be removed. In addition, in order to increase the storage space of the cluster storage, physical clusters can be added to the consistent hash ring according to the actual situation.

210. Update the second physical cluster and/or sub-physical cluster on the physical node according to the addition and deletion instructions.

For this embodiment, after an instruction to delete a physical cluster is received, the physical node where that physical cluster is located can be cleared. After it is determined that a new physical cluster needs to be added, an identification code is configured for the physical cluster according to the naming rule, and the physical node position of the physical cluster on the consistent hash ring is determined based on the identification code.

211. Adjust the storage location of the files to be stored that meet the preset conditions.

For this embodiment, in a specific application scenario, if an instruction to add a physical cluster is received in step 209, step 211 specifically includes: obtaining the newly added second physical cluster or each newly added sub-physical cluster; configuring, according to the naming rule, an identification code for the newly added second physical cluster or each sub-physical cluster; determining, based on the identification code, the position of the newly added physical node of the newly added second physical cluster or each sub-physical cluster on the consistent hash ring; extracting the data to be migrated that lies between the position of the newly added physical node and the position of the previous cluster physical node in the ring space, where the position of the previous cluster physical node is the physical node position of the first second physical cluster or first sub-physical cluster found counterclockwise starting from the position of the newly added physical node; and migrating the data to be migrated to the sub-physical cluster or second physical cluster corresponding to the position of the newly added physical node.

For example, as shown in Figure 4, when a new physical node key5 is added on the consistent hash ring ahead of physical node key2 in the clockwise direction, it is necessary to determine all the data to be migrated between the position of the new physical node key5 and the position of the previous cluster physical node key4 in the ring space. If the data key1 is found in that range, the data file key1 originally stored on physical node key2 is migrated to the physical cluster corresponding to physical node key5, and the remaining stored data files need no change. Therefore, if a new physical cluster is added, the affected data is only the data between the new physical cluster and the previous physical cluster in the ring space (that is, the first physical cluster encountered when walking counterclockwise). Other data is not affected.

Correspondingly, if a deletion instruction for a physical cluster is received in step 209, step 211 specifically includes: determining the second physical cluster or sub-physical cluster to be deleted; migrating all files stored in the second physical cluster or sub-physical cluster to be deleted to the first second physical cluster or first sub-physical cluster in the clockwise direction; and, after the stored files have been migrated, deleting the second physical cluster or sub-physical cluster to be deleted.

For example, as shown in Figure 5, if the physical cluster corresponding to the physical node key2 is determined to be faulty, key2 only needs to be removed from the hash ring. Before key2 is removed, the data file key1 previously stored on key2 must be migrated to the physical cluster corresponding to key3, because after the removal the first physical cluster encountered clockwise from the position of key1 is the physical cluster corresponding to key3. In this embodiment, if a physical cluster is deleted, the affected data is only the data stored in the physical cluster to be deleted, and other data is not affected. In other words, if a physical cluster becomes unavailable, the affected data is only the data between this physical cluster and the previous physical cluster in the ring space (that is, the first physical cluster encountered when walking counterclockwise); other data is not affected.
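The migration range for a newly added node can be sketched as follows. The names previous_node and files_to_migrate and all ring positions are hypothetical; the labels key1, key2, key4 and key5 are reused from the Figure 4 example, and key3, fileA and fileB are added only to show data that stays in place. Node positions are assumed to be distinct.

```python
RING_SIZE = 2 ** 32


def previous_node(node_positions, new_position):
    """First existing node met when walking counterclockwise from new_position."""
    return min(node_positions, key=lambda pos: (new_position - pos) % RING_SIZE)


def files_to_migrate(files, node_positions, new_position):
    """Names of files located in the clockwise arc (previous node, new node]."""
    start = previous_node(node_positions, new_position)
    arc = (new_position - start) % RING_SIZE
    return sorted(
        name
        for name, pos in files.items()
        if 0 < (pos - start) % RING_SIZE <= arc
    )


# Assumed positions: key4 < key1 < key5 (new) < key2 < key3 on the ring.
nodes = {"key2": 3000, "key3": 5000, "key4": 1000}
files = {"key1": 1500, "fileA": 2500, "fileB": 4000}
print(files_to_migrate(files, nodes.values(), new_position=2000))  # ['key1']
```

Only key1 moves to the new node; fileA and fileB keep their clusters, which matches the statement that data outside the arc is unaffected.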
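Removal can be sketched in the same style. The function name remove_cluster, the dictionary layout and the ring positions are assumptions; the labels key1, key2 and key3 follow the Figure 5 example, and key4 and fileB are added for illustration.

```python
RING_SIZE = 2 ** 32


def remove_cluster(nodes, stored, removed):
    """Migrate the removed cluster's files to its clockwise successor, then drop it.

    nodes maps cluster name to ring position; stored maps cluster name to a
    list of file names. The inputs are not modified; updated copies are returned.
    """
    origin = nodes[removed]
    remaining = {name: pos for name, pos in nodes.items() if name != removed}
    if not remaining:
        raise ValueError("cannot remove the only cluster on the ring")
    # Successor: the remaining node with the smallest clockwise distance.
    successor = min(remaining, key=lambda name: (remaining[name] - origin) % RING_SIZE)
    updated = {name: list(items) for name, items in stored.items() if name != removed}
    updated.setdefault(successor, []).extend(stored.get(removed, []))
    return remaining, updated, successor


nodes = {"key2": 3000, "key3": 5000, "key4": 1000}
stored = {"key2": ["key1"], "key3": ["fileB"], "key4": []}
nodes_after, stored_after, successor = remove_cluster(nodes, stored, "key2")
print(successor)     # key3
print(stored_after)  # {'key3': ['fileB', 'key1'], 'key4': []}
```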

In specific application scenarios, to facilitate retrieval of stored data, as a preferred approach, after the file to be stored has been stored in the target physical cluster, the method may further include: receiving a query request for the file; determining the target physical cluster according to the hash value of the file; and retrieving the data file from the target physical cluster.

For this embodiment, as long as the physical clusters remain unchanged, the file to be stored is always cached on the same physical cluster. The next time the file needs to be accessed, the same algorithm is applied again to calculate where the file is cached, and the file is fetched directly from the corresponding physical cluster.

Through the above method of data cluster storage, a physical cluster with a larger storage space can be divided into multiple sub-physical clusters of equal space to achieve a uniform distribution of physical clusters, so that different data is stored evenly at the corresponding locations. This balances the storage pressure on each physical cluster and avoids the avalanche of stored data caused by centralized storage. Then, according to the naming rule, an identification code is configured for the second physical cluster, whose storage space is less than the preset threshold, and for each sub-physical cluster; the hash value is calculated from the identification code; the second physical cluster and each sub-physical cluster are evenly mapped onto the consistent hash ring; the consistent hash algorithm is used to determine the optimal target storage cluster for the file to be stored; and the file is stored in that target storage cluster. In addition, instructions to add or delete physical clusters can be received to meet the expansion needs of mass storage. The second physical cluster and/or sub-physical clusters on the physical nodes are then updated according to the addition or deletion instruction, and the storage locations of files that meet the preset condition are adjusted in time. This guarantees that stored data remains queryable: when a file is queried, its original storage location does not need to be considered, and the physical cluster in which it is currently stored can be located accurately from the hash value of the file, which achieves efficient location of stored data.

Further, as a specific implementation of the methods shown in FIG. 1 and FIG. 2, an embodiment of the present application provides a data cluster storage device. As shown in FIG. 3, the device includes: an acquisition module 31, a mapping module 32, a determination module 33, and a storage module 34.

The obtaining module 31 can be used to obtain all physical clusters used to store data;

The mapping module 32 can be used to map the physical cluster to the physical nodes of the consistent hash ring;

The determining module 33 can be used to determine the optimal storage target physical cluster according to the hash value of the file to be stored;

The storage module 34 can be used to store the files to be stored in the target physical cluster.

In a specific application scenario, in order to map the physical clusters onto the physical nodes of the consistent hash ring, the mapping module 32 can be specifically used to: obtain the storage space of each physical cluster; divide each first physical cluster, whose storage space is greater than or equal to the preset threshold, into multiple sub-physical clusters of equal space according to the preset ratio; configure, according to the naming rule, an identification code for the second physical cluster, whose storage space is less than the preset threshold, and for each sub-physical cluster; determine the hash value of the second physical cluster and of each sub-physical cluster according to the identification code; and use the hash value to calculate the physical node positions of the second physical cluster and the sub-physical clusters on the consistent hash ring.

Correspondingly, in order to determine the optimal target physical cluster for the file to be stored, the determining module 33 can be specifically used to: calculate the hash value of the file to be stored according to its identification code; use the hash value to determine the logical node position of the file on the consistent hash ring; and determine the first second physical cluster or first sub-physical cluster found clockwise from the logical node position on the consistent hash ring as the target physical cluster.

In a specific application scenario, in order to remove a faulty physical cluster or to expand the physical clusters, as shown in FIG. 6, the device further includes: a receiving module 35, an update module 36, and an adjustment module 37.

The receiving module 35 can be used to receive addition and deletion instructions to the physical cluster;

The update module 36 can be used to update the second physical cluster and/or sub-physical cluster on the physical node according to the addition and deletion instructions;

The adjustment module 37 can be used to adjust the storage location of the files to be stored that meet the preset conditions.

In a specific application scenario, if the receiving module 35 receives an instruction to add a physical cluster, the adjustment module 37 can be specifically used to: obtain the newly added second physical cluster or the newly added sub-physical clusters; configure, according to the naming rule, an identification code for the newly added second physical cluster or for each newly added sub-physical cluster; determine, based on the identification code, the position of the newly added physical node of the newly added second physical cluster or of each newly added sub-physical cluster on the consistent hash ring; extract the data to be migrated that lies in the ring space between the position of the newly added physical node and the position of the previous cluster physical node, where the position of the previous cluster physical node is the position of the physical node corresponding to the first second physical cluster or first sub-physical cluster found when searching counterclockwise from the position of the newly added physical node; and migrate the data to be migrated to, and store it in, the sub-physical cluster or second physical cluster corresponding to the position of the newly added physical node.
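The migration performed when a cluster is added can be sketched as follows. This is a simplified illustration under stated assumptions: ring positions are hand-picked integers rather than hash values, stored files are tracked in an in-memory dictionary, the new position is assumed not to collide with an existing node, and the names `in_arc` and `add_node` are hypothetical.

```python
from bisect import bisect_left, insort


def in_arc(position, start, end):
    """True if position lies in the clockwise arc (start, end], with wraparound."""
    if start < end:
        return start < position <= end
    return position > start or position <= end


def add_node(ring, stored, new_pos, new_id):
    """Insert a node and migrate the files between its predecessor and itself.

    ring:   sorted list of (position, node id) pairs
    stored: dict mapping node id -> {file id: file position}
    Returns the files that were migrated to the new node.
    """
    positions = [pos for pos, _ in ring]
    index = bisect_left(positions, new_pos)
    prev_pos = ring[index - 1][0]           # first node counterclockwise (index -1 wraps)
    old_owner = ring[index % len(ring)][1]  # node that served this arc until now
    insort(ring, (new_pos, new_id))
    moved = {f: p for f, p in stored[old_owner].items()
             if in_arc(p, prev_pos, new_pos)}
    for file_id in moved:
        del stored[old_owner][file_id]
    stored[new_id] = moved
    return moved


# Illustrative ring with two nodes; f4 at 250 wraps around to node A.
ring = [(100, "A"), (200, "B")]
stored = {"A": {"f1": 50, "f4": 250}, "B": {"f2": 120, "f3": 180}}
moved = add_node(ring, stored, 150, "C")
```

Only the files whose positions fall between the previous node (100) and the new node (150) are moved; here that is f2, while f3 stays on node B and the files on node A are untouched.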

Correspondingly, if the receiving module 35 receives a deletion instruction for a physical cluster, the adjustment module 37 can be specifically used to: determine the second physical cluster or sub-physical cluster to be deleted; migrate all files stored in the second physical cluster or sub-physical cluster to be deleted to the first second physical cluster or first sub-physical cluster in the clockwise direction, and store them there; and, after the stored files have been migrated and stored, delete the second physical cluster or sub-physical cluster to be deleted.
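The deletion case can be sketched in the same simplified setting. Again this is only an illustration: positions are hand-picked integers, stored files live in an in-memory dictionary, and the name `remove_node` is hypothetical.

```python
def remove_node(ring, stored, node_id):
    """Migrate all files of node_id to its clockwise successor, then delete the node.

    ring:   sorted list of (position, node id) pairs
    stored: dict mapping node id -> {file id: file position}
    Returns the id of the node that received the files.
    """
    if len(ring) == 1:
        raise ValueError("cannot remove the last node on the ring")
    index = next(i for i, (_, n) in enumerate(ring) if n == node_id)
    receiver = ring[(index + 1) % len(ring)][1]  # first node clockwise, with wraparound
    # Migrate first, and remove the node from the ring only afterwards.
    stored.setdefault(receiver, {}).update(stored.pop(node_id, {}))
    del ring[index]
    return receiver


ring = [(100, "A"), (150, "C"), (200, "B")]
stored = {"A": {"f1": 50}, "C": {"f2": 120}, "B": {"f3": 180}}
receiver = remove_node(ring, stored, "C")
```

After the call, the files of node C are held by node B, which is exactly the node a clockwise search would now return for them.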

In a specific application scenario, in order to support queries on the storage location of the file to be stored, as shown in FIG. 7, the device further includes: a retrieval module 38.

The receiving module 35 can also be used to receive query requests for files to be stored;

The determining module 33 may also be used to determine the target physical cluster according to the hash value of the file to be stored;

The retrieval module 38 can be used to retrieve data files in the target physical cluster.
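The interplay of the determining, storage, and retrieval steps can be illustrated with a small in-memory sketch. The class `ClusterStore`, the MD5 hash, and the ring size are assumptions for illustration and are not taken from this application; real clusters would of course not be Python dictionaries.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32  # assumed ring size


def ring_position(identifier):
    """Hash an identification code to a ring position (MD5 is an assumption)."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class ClusterStore:
    """Toy store: each node id maps to a dict standing in for a cluster."""

    def __init__(self, node_ids):
        self.ring = sorted((ring_position(n), n) for n in node_ids)
        self.data = {n: {} for n in node_ids}

    def locate(self, file_id):
        """Determine the target cluster from the file's hash value."""
        positions = [pos for pos, _ in self.ring]
        index = bisect_left(positions, ring_position(file_id)) % len(self.ring)
        return self.ring[index][1]

    def put(self, file_id, content):
        self.data[self.locate(file_id)][file_id] = content

    def get(self, file_id):
        """Retrieve the file from the cluster it hashes to, or None if absent."""
        return self.data[self.locate(file_id)].get(file_id)


store = ClusterStore(["cluster-a#0", "cluster-a#1", "cluster-b"])
store.put("report-2019.pdf", b"example bytes")
found = store.get("report-2019.pdf")
```

A query recomputes the target cluster from the identification code alone, so no separate index of file locations is needed in this sketch.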

It should be noted that, for other corresponding descriptions of the functional units involved in the device for data cluster storage provided in this embodiment, reference may be made to the corresponding descriptions in FIGS. 1 to 2, and details are not repeated here.

Based on the methods shown in FIG. 1 and FIG. 2, correspondingly, an embodiment of the present application also provides a storage medium on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the data cluster storage method shown in FIG. 1 and FIG. 2 is implemented.

Based on this understanding, the technical solution of this application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods in each implementation scenario of the present application.

Based on the methods shown in FIG. 1 and FIG. 2 and the virtual device embodiments shown in FIG. 6 and FIG. 7, in order to achieve the above objectives, an embodiment of the present application also provides a computer device, which may be a personal computer, a server, a network device, etc. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program; and the processor is used to execute the computer program to implement the data cluster storage method shown in FIG. 1 and FIG. 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface or a Wi-Fi interface), etc.

Those skilled in the art can understand that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.

The non-volatile readable storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device used for data cluster storage, and supports the operation of the information processing program and of other software and/or programs. The network communication module is used to implement communication among the components within the non-volatile readable storage medium, as well as communication with other hardware and software in the physical device.

Through the description of the foregoing implementation manners, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware. Compared with the prior art, the technical solution of the present application divides a physical cluster with a larger storage space into multiple sub-physical clusters of equal space, so that the physical clusters are uniformly distributed and different data is stored evenly at the corresponding locations, thereby balancing the storage pressure on each physical cluster and avoiding the storage avalanche caused by concentrated data storage. Then, according to the naming rule, an identification code is configured for each second physical cluster whose storage space is less than the preset threshold and for each sub-physical cluster, and a hash value is calculated from the identification code, so that the second physical clusters and the sub-physical clusters are evenly mapped onto the consistent hash ring. The consistent hash algorithm is then used to determine the optimal target physical cluster for the file to be stored, and the file is stored in that target physical cluster. In addition, addition and deletion instructions for physical clusters can be received to meet the expansion needs of mass storage; the second physical clusters and/or sub-physical clusters on the physical nodes are then updated according to these instructions, and the storage locations of files that meet the preset condition are adjusted in time. This guarantees that stored data remains queryable: when a stored file is queried, its original storage location need not be considered, because the physical cluster that currently stores it can be located accurately from the hash value of the file, which in turn makes locating stored data efficient.

Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of preferred implementation scenarios, and that the modules or processes in the accompanying drawings are not necessarily required for implementing this application. Those skilled in the art can also understand that the modules of the device in an implementation scenario can be distributed in the device as described for that scenario, or can be changed accordingly and located in one or more devices different from those of that scenario. The modules of the above implementation scenarios can be combined into one module or further divided into multiple sub-modules.

The serial numbers used above are for description only and do not indicate the relative merits of the implementation scenarios. The above disclosures are only a few specific implementation scenarios of the application, but the application is not limited to these, and any changes that can be conceived by those skilled in the art should fall within the protection scope of the application.

Claims

A method for data cluster storage, characterized in that it comprises:

Obtaining all physical clusters used to store data;

Evenly mapping the physical clusters onto physical nodes of a consistent hash ring;

Wherein evenly mapping the physical clusters onto the physical nodes of the consistent hash ring specifically comprises: obtaining the storage space of each physical cluster; dividing each first physical cluster, whose storage space is greater than or equal to a preset threshold, into multiple sub-physical clusters of equal space according to a preset ratio; configuring, according to a naming rule, an identification code for each second physical cluster, whose storage space is less than the preset threshold, and for each of the sub-physical clusters; determining the hash value of each second physical cluster and each of the sub-physical clusters according to the identification code; and using the hash value to calculate the physical node positions of the second physical clusters and the sub-physical clusters on the consistent hash ring;

Determining the optimal target physical cluster for storage according to the hash value of the file to be stored;

Storing the file to be stored in the target physical cluster.
The method according to claim 1, wherein the determining the optimal storage target physical cluster according to the hash value of the file to be stored specifically comprises:

Calculating the hash value of the file to be stored according to the identification code of the file to be stored;

Using the hash value to determine the logical node position of the file to be stored on the consistent hash ring;

Determining, as the target physical cluster, the first second physical cluster or first sub-physical cluster found when searching clockwise on the consistent hash ring from the logical node position.
The method according to claim 2, wherein after storing the file to be stored in the target physical cluster, the method further comprises:

Receiving addition and deletion instructions for the physical clusters;

Updating the second physical cluster and/or the sub-physical cluster on the physical node according to the addition and deletion instructions;

Adjusting the storage location of the file to be stored that meets a preset condition.
The method according to claim 3, wherein if an instruction to add the physical cluster is received, the adjusting the storage location of the file to be stored that meets a preset condition specifically includes:

Obtaining the newly added second physical cluster or each newly added sub-physical cluster;

Configuring an identification code for the newly added second physical cluster or each newly added sub-physical cluster according to the naming rule;

Determining, based on the identification code, the position of the newly added physical node of the newly added second physical cluster or of each newly added sub-physical cluster on the consistent hash ring;

Extracting the data to be migrated that lies in the ring space between the position of the newly added physical node and the position of the previous cluster physical node, wherein the position of the previous cluster physical node is the position of the physical node corresponding to the first second physical cluster or first sub-physical cluster found when searching counterclockwise from the position of the newly added physical node;

Migrating the data to be migrated to, and storing it in, the sub-physical cluster or second physical cluster corresponding to the position of the newly added physical node.
The method according to claim 3, wherein if a deletion instruction for the physical cluster is received, the adjusting the storage location of the file to be stored that meets a preset condition specifically includes:

Determining the second physical cluster or sub-physical cluster to be deleted;

Migrating all files stored in the second physical cluster or sub-physical cluster to be deleted to the first second physical cluster or first sub-physical cluster in the clockwise direction, and storing them there;

After the stored files have been migrated and stored, deleting the second physical cluster or sub-physical cluster to be deleted.
The method according to claim 1, wherein after storing the file to be stored in the target physical cluster, the method further comprises:

Receiving a query request for the file to be stored;

Determining the target physical cluster according to the hash value of the file to be stored;

Retrieving the data file from the target physical cluster.
A data cluster storage device, characterized in that it comprises:

The acquisition module is used to acquire all physical clusters used to store data;

The mapping module is used to map the physical clusters onto physical nodes of a consistent hash ring;

The mapping module is specifically configured to: obtain the storage space of each physical cluster; divide each first physical cluster, whose storage space is greater than or equal to a preset threshold, into multiple sub-physical clusters of equal space according to a preset ratio; configure, according to a naming rule, an identification code for each second physical cluster, whose storage space is less than the preset threshold, and for each of the sub-physical clusters; determine the hash value of each second physical cluster and each of the sub-physical clusters according to the identification code; and use the hash value to calculate the physical node positions of the second physical clusters and the sub-physical clusters on the consistent hash ring;

The determining module is used to determine the optimal storage target physical cluster according to the hash value of the file to be stored;

The storage module is used to store the file to be stored in the target physical cluster.
The device according to claim 7, wherein the determining module is specifically configured to: calculate the hash value of the file to be stored according to the identification code of the file to be stored; use the hash value to determine the logical node position of the file to be stored on the consistent hash ring; and determine, as the target physical cluster, the first second physical cluster or first sub-physical cluster found when searching clockwise on the consistent hash ring from the logical node position.
The device according to claim 8, wherein the device further comprises: a receiving module, an update module, and an adjustment module;

The receiving module is used to receive addition and deletion instructions to the physical cluster;

The update module is configured to update the second physical cluster and/or the sub-physical cluster on the physical node according to the addition and deletion instruction;

The adjustment module is configured to adjust the storage location of the file to be stored that meets a preset condition.
The device according to claim 9, wherein if an instruction to add the physical cluster is received, the adjustment module is specifically configured to: obtain the newly added second physical cluster or each newly added sub-physical cluster; configure an identification code for the newly added second physical cluster or each newly added sub-physical cluster according to the naming rule; determine, based on the identification code, the position of the newly added physical node of the newly added second physical cluster or of each newly added sub-physical cluster on the consistent hash ring; extract the data to be migrated that lies in the ring space between the position of the newly added physical node and the position of the previous cluster physical node, wherein the position of the previous cluster physical node is the position of the physical node corresponding to the first second physical cluster or first sub-physical cluster found when searching counterclockwise from the position of the newly added physical node; and migrate the data to be migrated to, and store it in, the sub-physical cluster or second physical cluster corresponding to the position of the newly added physical node.
The device according to claim 9, wherein if a deletion instruction for the physical cluster is received, the adjustment module is specifically configured to: determine the second physical cluster or sub-physical cluster to be deleted; migrate all files stored in the second physical cluster or sub-physical cluster to be deleted to the first second physical cluster or first sub-physical cluster in the clockwise direction, and store them there; and, after the stored files have been migrated and stored, delete the second physical cluster or sub-physical cluster to be deleted.
The device according to claim 7, further comprising: a retrieval module;

The receiving module is further configured to receive a query request for the file to be stored;

The determining module is specifically configured to determine the target physical cluster according to the hash value of the file to be stored;

The retrieval module is used to retrieve the data file in the target physical cluster.
A non-volatile readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement a data cluster storage method comprising:

Obtaining all physical clusters used to store data;

Evenly mapping the physical clusters onto physical nodes of a consistent hash ring;

Wherein evenly mapping the physical clusters onto the physical nodes of the consistent hash ring specifically comprises: obtaining the storage space of each physical cluster; dividing each first physical cluster, whose storage space is greater than or equal to a preset threshold, into multiple sub-physical clusters of equal space according to a preset ratio; configuring, according to a naming rule, an identification code for each second physical cluster, whose storage space is less than the preset threshold, and for each of the sub-physical clusters; determining the hash value of each second physical cluster and each of the sub-physical clusters according to the identification code; and using the hash value to calculate the physical node positions of the second physical clusters and the sub-physical clusters on the consistent hash ring;

Determining the optimal target physical cluster for storage according to the hash value of the file to be stored;

Storing the file to be stored in the target physical cluster.
The computer-readable storage medium according to claim 13, wherein when the computer-readable instruction is executed by a processor, the determination of the optimal storage target physical cluster according to the hash value of the file to be stored comprises:

Calculating the hash value of the file to be stored according to the identification code of the file to be stored;

Using the hash value to determine the logical node position of the file to be stored on the consistent hash ring;

Determining, as the target physical cluster, the first second physical cluster or first sub-physical cluster found when searching clockwise on the consistent hash ring from the logical node position.
The computer-readable storage medium according to claim 14, wherein after the computer-readable instructions are executed by the processor to store the file to be stored in the target physical cluster, the method further comprises:

Receiving addition and deletion instructions for the physical clusters;

Updating the second physical cluster and/or the sub-physical cluster on the physical node according to the addition and deletion instructions;

Adjusting the storage location of the file to be stored that meets a preset condition.
The computer-readable storage medium according to claim 15, wherein after the computer-readable instructions are executed by the processor to store the file to be stored in the target physical cluster, the method further comprises:

Receiving a query request for the file to be stored;

Determining the target physical cluster according to the hash value of the file to be stored;

Retrieving the data file from the target physical cluster.
A computer device, including a non-volatile readable storage medium, a processor, and computer-readable instructions stored on the non-volatile readable storage medium and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements a data cluster storage method comprising:

Obtaining all physical clusters used to store data;

Evenly mapping the physical clusters onto physical nodes of a consistent hash ring;

Wherein evenly mapping the physical clusters onto the physical nodes of the consistent hash ring specifically comprises: obtaining the storage space of each physical cluster; dividing each first physical cluster, whose storage space is greater than or equal to a preset threshold, into multiple sub-physical clusters of equal space according to a preset ratio; configuring, according to a naming rule, an identification code for each second physical cluster, whose storage space is less than the preset threshold, and for each of the sub-physical clusters; determining the hash value of each second physical cluster and each of the sub-physical clusters according to the identification code; and using the hash value to calculate the physical node positions of the second physical clusters and the sub-physical clusters on the consistent hash ring;

Determining the optimal target physical cluster for storage according to the hash value of the file to be stored;

Storing the file to be stored in the target physical cluster.
The computer device according to claim 17, wherein, when the computer-readable instructions are executed by a processor, the determination of the optimal storage target physical cluster according to the hash value of the file to be stored comprises:

Calculating the hash value of the file to be stored according to the identification code of the file to be stored;

Using the hash value to determine the logical node position of the file to be stored on the consistent hash ring;

Determining, as the target physical cluster, the first second physical cluster or first sub-physical cluster found when searching clockwise on the consistent hash ring from the logical node position.
The computer device according to claim 18, wherein after the computer-readable instructions are executed by the processor to store the file to be stored in the target physical cluster, the method further comprises:

Receiving addition and deletion instructions for the physical clusters;

Updating the second physical cluster and/or the sub-physical cluster on the physical node according to the addition and deletion instructions;

Adjusting the storage location of the file to be stored that meets a preset condition.
The computer device according to claim 19, wherein after the computer-readable instructions are executed by the processor to store the file to be stored in the target physical cluster, the method further comprises:

Receiving a query request for the file to be stored;

Determining the target physical cluster according to the hash value of the file to be stored;

Retrieving the data file from the target physical cluster.