CN111177154A - Distributed database caching method and hash ring optimization thereof - Google Patents


Info

Publication number
CN111177154A
Authority
CN
China
Prior art keywords
cache
node
nodes
hash
virtual
Prior art date
Legal status
Granted
Application number
CN201911390078.8A
Other languages
Chinese (zh)
Other versions
CN111177154B (en
Inventor
Inventor not disclosed
Current Assignee
Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.
Original Assignee
Zhangxun Yitong Beijing Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhangxun Yitong Beijing Information Technology Co Ltd filed Critical Zhangxun Yitong Beijing Information Technology Co Ltd
Priority to CN201911390078.8A priority Critical patent/CN111177154B/en
Publication of CN111177154A publication Critical patent/CN111177154A/en
Application granted granted Critical
Publication of CN111177154B publication Critical patent/CN111177154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 - Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 - Indexing structures
    • G06F 16/2255 - Hash tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G06F 16/24552 - Database cache management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a distributed database cache and a hash ring optimization method thereof. First, a fixed-length hash value of the URL is obtained with a hash algorithm, and this hash value is then used for the subsequent operations of looking up the cache object. Hashing yields a 128-bit value for every input string: all URLs are hashed into the range [0, 2^128 - 1] and placed in one-to-one correspondence with the URLs. The method adopts a self-decision virtual-node migration mutual-aid strategy: the operating parameters of each cache node in the distributed proxy cache are monitored to judge whether local overheating or other abnormalities occur, and, according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected to take over, on the hash ring, the virtual nodes corresponding to cache nodes with higher load and degraded performance, thereby achieving hot-spot migration and avoiding single-point failures.

Description

Distributed database caching method and hash ring optimization thereof
Technical Field
The invention relates to the technical field of computer application, in particular to a distributed database caching method and a hash ring optimization method thereof.
Background
In a database cluster, addition or deletion of physical nodes is the most basic function of a cluster management system. If conventional hashing schemes such as hash-modulo or random assignment are adopted, a large number of existing caches must be rebuilt whenever a physical node is added or deleted, which imposes serious performance costs on the whole database cluster system, may even disrupt normal operation of the business system, and severely violates the monotonicity principle.
The consistent hash algorithm was proposed to ensure monotonicity, that is, when a physical node is removed or added, the influence on the existing cache mapping relationships is very small. Moreover, the more physical nodes the cluster contains, the better the consistent hash algorithm preserves monotonicity. The principle of the consistent hash algorithm is as follows:
(1) determining a hash ring and physical nodes on the ring.
A range of hash values is first determined, for example (-2^16, 2^16). All hash values in this range are regarded as a ring that increases clockwise and joins end to end, called a hash ring.
A hash ring is a virtual data structure that does not physically exist. Suppose six nodes A-F are distributed on the hash ring after hash computation. A schematic diagram of the hash ring is shown in FIG. 1.
(2) Setting access mode of physical node
If there is an SQL query request, the SQL statement string is used as the KEY of the hash algorithm, and the calculated hash value maps to a point on the hash ring. If that point does not correspond to a physical node, the ring is searched clockwise (i.e., for nodes whose hash value is larger) until the first mapped physical node is found; that node is the target node (i.e., the smallest node whose hash value is larger than the key's). If no node is found before the hash value exceeds 2^16, the first physical node is matched, because the nodes are joined end to end and the clockwise search simply wraps around. For example, if the calculated hash value lies between B and C, the matched physical node is C; if the hash value is greater than F's, node A is hit.
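The lookup described above can be sketched in Python. The 16-bit ring range, the MD5-based hash, and the `HashRing`/`ring_hash` names are illustrative assumptions for this sketch, not details taken from the patent:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Illustrative 16-bit ring position, matching the (-2^16, 2^16)-style
    # range used in the background section (here folded into [0, 2^16)).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 16)

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs: the physical nodes placed on the ring.
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def locate(self, key: str) -> str:
        # First node clockwise whose position is >= the key's hash;
        # wrap around to the first node when no larger position exists.
        h = ring_hash(key)
        idx = bisect.bisect_left(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0
        return self.ring[idx][1]
```

A given key always resolves to the same node, and only keys in the arc between two nodes are affected when the ring membership changes.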
(3) Adding processing of nodes
Suppose a G physical node is to be added, as shown by the gray circular box in FIG. 2.
The hash value of the new physical node is calculated first and mapped to a point on the hash ring; the access mode of the ring itself does not change. Only the hash values that fall between node D and node G change their mapping: after G is added, that segment maps to node G instead of the original node E. The mapping of the hash values between G and E does not change; they are still mapped to node E. The result is that after adding one physical node, only a small number of caches miss and need to be rebuilt. Although the problem of mapping changes caused by node addition still exists after applying the consistent hash algorithm, the extent of the change is reduced to the lowest possible level compared with the traditional hash-modulo approach.
(4) Processing to delete a node
Assuming node G in FIG. 2 needs to be deleted, the ring reverts to its original state. The caches whose hash values mapped to node G will inevitably miss; those hash values are remapped clockwise to node E, and the only caches that need to be rebuilt are those for the hash values in the segment that previously mapped to node G. Therefore, compared with the traditional hash-modulo method, deleting a node on the hash ring requires far fewer caches to be rebuilt.
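The property described in steps (3) and (4), namely that adding a node only redirects keys onto the new node while everything else keeps its mapping, can be checked with a small sketch. The node names, key set, and hash choice are assumptions for illustration:

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Illustrative 16-bit ring position.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 16)

def locate(key: str, nodes) -> str:
    # First node clockwise with position >= the key's hash, wrapping around.
    ring = sorted((h(n), n) for n in nodes)
    i = bisect.bisect_left(ring, (h(key), ""))
    return ring[i % len(ring)][1]

keys = [f"key-{i}" for i in range(10000)]
before = {k: locate(k, ["A", "B", "C", "D", "E", "F"]) for k in keys}
after = {k: locate(k, ["A", "B", "C", "D", "E", "F", "G"]) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
```

Every key whose owner changed must now belong to the new node G; deleting G again would send exactly those keys back to G's clockwise neighbor.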
The consistent hash algorithm is a widely used load-balancing algorithm. In a dynamically changing cache environment it satisfies the two criteria by which the quality of a hash algorithm is judged: balance and monotonicity.
(1) Balance: balance means that all of the cache space should be fully utilized, so a good hash algorithm should distribute the hash results as evenly as possible.
(2) Monotonicity: monotonicity means that after the original system cache has been established and stabilized, if a new cache node is added to the system, already-established caches may be mapped into the newly added cache space but must not be remapped within the original cache space.
However, from the above background analysis and research, the present invention finds that once consistent hashing is chosen, selecting the underlying hash function is a critical step: it determines how uniformly the nodes are distributed on the ring, as well as the algorithm's efficiency and other factors.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a distributed database caching method and a hash ring optimization thereof. The whole process of the method partitions the space of cache objects so that multiple background cache nodes can work cooperatively, improving efficiency. When a new cache node joins the hash ring, the range each original node is responsible for does not change greatly: the new node only splits the range of one existing node, so adding a cache node does not cause redistribution of the cache space, avoids fluctuation of the system load, and ensures the stability of the mechanism.
In order to solve the technical problem, the invention provides a caching method for a distributed database, which comprises first computing a fixed-length hash value of the URL with a hash algorithm, and then using the hash value for the subsequent operations of looking up the cached object.
Preferably, the method further comprises: hashing is performed using the MD5 algorithm to obtain a 128-bit value for all input strings: hashing all URLs to one [0, 2 ]128-1]And performing one-to-one mapping with the URL; all of [0, 2 ]128-1]The hash range is seen as a circular structure, [0, 2 ]128-1]All hash values in the range are arranged in the order from big to small along the clockwise direction, and are uniformly distributed on the hash ring as a whole.
Preferably, the method further comprises: each cache node is responsible for all corresponding URLs in a section of range on the hash ring, and the IP value of each cache node is subjected to hash calculation to obtain a hash value.
Preferably, the method further comprises: after a user request arrives, the front-end proxy obtains the URL contained in the request, first calculates the hash value of the URL, which corresponds to a key on the hash ring, then searches clockwise along the ring for the first node larger than that key, and directs the HTTP request to the background cache server found; when a new cache node joins the hash ring, the range each original node is responsible for does not change greatly, the new node only splits the range of one existing node, and adding a cache node does not cause redistribution of the cache space.
Preferably, the method further comprises: adopting a self-decision virtual-node migration mutual-aid strategy, monitoring the operating parameters of each cache node in the distributed proxy Cache, judging whether local overheating or other abnormalities occur, selecting, according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load to assist, and letting them take over the virtual nodes on the Cache hash ring that correspond to cache nodes with higher load and degraded performance.
Preferably, the self-decision virtual node migration policy further includes:
A. evaluating the state and the service capacity of the cache server;
B. selecting cache nodes that are overheated and whose service capability has dropped, migrating their virtual nodes, selecting cache nodes that are in a normal state with stronger service capability, merging them into the migration node list, and letting them bear part of the request load of the overloaded cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
Preferably, the method further comprises: adjusting the number of layers of the hash ring; adjusting the virtual nodes for rebalancing; and performing data migration.
Preferably, the step of adjusting the number of layers of the hash ring further includes: if the number of virtual nodes per unit weight falls to the threshold, increasing the number of virtual-node layers by 1; if the number of virtual nodes per unit weight is higher than the threshold, decreasing the number of virtual-node layers;
the step of adjusting the virtual nodes for rebalancing further comprises: performing rebalancing during the addition of a new node or the deletion/failure of an existing node in the cluster;
the step of data migration further comprises: migrating the data in a deleted virtual node to its neighbor nodes.
In order to solve the technical problem, the invention further provides a hash ring optimization method for the distributed database caching method, which adopts a self-decision virtual-node migration mutual-aid strategy: the operating parameters of each cache node in the distributed proxy Cache are monitored to judge whether local overheating or other abnormalities occur, and, according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected to take over the virtual nodes on the Cache hash ring that correspond to cache nodes with higher load and degraded performance.
preferably, the self-decision virtual node migration policy further includes:
A. evaluating the state and the service capacity of the cache server;
B. selecting cache nodes that are overheated and whose service capability has dropped, migrating their virtual nodes, selecting cache nodes that are in a normal state with stronger service capability, merging them into the migration node list, and letting them bear part of the request load of the overloaded cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
Preferably, the step of evaluating the state and the service capability of the cache server further comprises: evaluating how busy each of the current cache nodes is. Assume there are n cache nodes in the background whose current load states are S_1, S_2, ..., S_n; the average of the current states of all nodes is calculated as

S_avg = (1/n) * (S_1 + S_2 + ... + S_n)

Two states of a cache node are then defined: (1) cache node i is in a hot state if S_i > S_avg; (2) cache node i is in a normal state if S_i <= S_avg. When the cache cluster needs self-decision adjustment, it is determined that at least one cache node is in the hot state, the hot cache nodes that most urgently need adjustment are selected, and a decision is made as to which normal-state cache nodes the virtual nodes on those hot cache nodes should be migrated.
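A minimal Python sketch of this evaluation step, assuming the load state of each node is a single scalar and the hot/normal threshold is the plain average of all node states; the function name and the load metric are illustrative assumptions:

```python
def classify_nodes(loads: dict[str, float]) -> dict[str, str]:
    # loads maps cache node name -> current load state S_i.
    # A node is "hot" when its state exceeds the cluster average, else "normal".
    avg = sum(loads.values()) / len(loads)
    return {node: ("hot" if s > avg else "normal") for node, s in loads.items()}
```

The self-decision module would then pick virtual nodes on the "hot" nodes and migrate them to "normal" nodes.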
The beneficial effects of the invention include: the method adopts a self-decision virtual-node migration mutual-aid strategy, monitors the operating parameters of each cache node in the distributed proxy Cache, judges whether local overheating or other abnormalities occur, and, according to a given multi-copy hierarchical management strategy, selects a set of cache nodes with lower current load to take over the virtual nodes on the Cache hash ring that correspond to cache nodes with higher load and degraded performance, thereby achieving hot-spot migration and avoiding single-point failures. Through dynamic adaptation, availability is improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below represent only some of the embodiments or prior art; those skilled in the art can obtain other similar or related drawings from them without creative effort.
FIG. 1 is a schematic diagram of a hash ring according to the background of the invention;
fig. 2 is a diagram of a node added to a hash ring according to the background art of the present invention;
fig. 3 is a corresponding relationship between a virtual node and a physical node in a hash ring according to the embodiment of the present invention;
fig. 4 is a node data migration diagram in the hash ring according to the embodiment of the present invention;
FIG. 5 is a diagram illustrating a multi-layered consistent hash ring according to an embodiment of the present invention;
fig. 6 is a schematic diagram of creating a virtual node according to the embodiment of the present invention;
FIG. 7 is a diagram illustrating a distributed proxy cache architecture according to an embodiment of the present invention;
FIG. 8 is a diagram of a hash ring based URL space allocation according to an embodiment of the present invention;
fig. 9 is a migration mutual aid diagram of a virtual node according to the embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to examples, to make its objects, aspects, and advantages clearer; the invention is not, however, limited to these examples.
When only the physical nodes themselves are placed on the hash ring, they are likely to be distributed unevenly, which in turn strongly affects whether subsequent caches are distributed evenly and leads to load imbalance.
In one embodiment of the present invention, to solve the balancing problem, the present invention adds a concept of "virtual node" to the hash ring. The virtual nodes are simply understood to be some replicas of the physical nodes on the hash ring, but not actually existing nodes, and the number of the virtual nodes corresponding to one physical node is generally determined according to the actual situation. The virtual nodes and the physical nodes are arranged on the hash ring in the same way and have the same access way. After the virtual nodes are introduced, every time a physical node is added, a corresponding number of virtual nodes need to be added on the hash ring, and if the physical node is deleted, all corresponding virtual nodes need to be deleted on the hash ring.
In FIG. 3, nodes A-L represent 12 virtual nodes, nodes 1-4 represent 4 physical nodes, and every three virtual nodes correspond to one physical node. The hash value calculated from the SQL string maps to a point on the ring; the corresponding virtual node is found according to the ring's access mode, and that virtual node is then mapped to its physical node. The more nodes a cluster has, the better the balancing effect of the consistent hash algorithm; so after virtual nodes are introduced, the number of nodes on the hash ring increases greatly and the balance of the consistent hash algorithm improves.
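The virtual-node scheme can be sketched by placing each physical node on the ring several times under derived names. The replica count, naming scheme (`"node#r"`), and hash choice are assumptions for illustration:

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Illustrative 16-bit ring position.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 16)

class VirtualRing:
    def __init__(self, physical, replicas=3):
        # Each physical node appears on the ring `replicas` times under
        # derived names such as "node1#0", "node1#1", ... The ring entry
        # keeps the physical node so lookups resolve straight to it.
        self.ring = sorted(
            (h(f"{p}#{r}"), p) for p in physical for r in range(replicas)
        )

    def locate(self, key: str) -> str:
        i = bisect.bisect_left(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]
```

With 4 physical nodes and 3 replicas each, 12 points sit on the ring, mirroring the A-L layout of FIG. 3.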
The HASH algorithm has three major advantages: one-wayness, collision resistance, and uniformity of distribution. The consistent HASH algorithm is currently the most commonly adopted algorithm for distributed load balancing and storage balancing. In one implementation, the value of a machine node is obtained through a HASH function and the node is mapped into the space [0, 2^32 - 1]. When data is stored, the data block is first hashed to obtain a HASH value and then placed at the corresponding position on the ring according to that value. In FIG. 4, the hash value key1 is calculated for data object1; a machine node Node1 is then found clockwise, and object1 is stored on Node1.
If the Node2 node goes down, the data on Node2 migrates to the Node3 node. The advantage is that even if a node fails, data on the other nodes can still be quickly located using the HASH algorithm; only the neighboring node is affected.
Consistent hashing is widely used in distributed systems, including distributed databases. It provides a mode of mapping keys to specific nodes around the ring. However, the existing consistent hashing method cannot guarantee accurate balance of data. For example: when expanded, the newly added node will only share the workload of the neighboring nodes. The same is true after the node is deleted from the system. This will result in an unbalanced state after a node is deleted or added.
In addition, given that the capacity of the nodes in a distributed system is not always balanced in the beginning, it is not easy to shift the entire cluster to a balanced state. In the previous embodiment of the present invention, the method designed in fig. 3 presents virtual nodes to map one physical node to multiple virtual nodes to maintain balance. However, the method of FIG. 3 does not consider the relationship of virtual nodes belonging to the same physical node. Once a physical node is turned off, a data avalanche situation may occur.
Thus, as shown in FIG. 5, in another embodiment of the present invention, our idea is to create and use a multi-layered consistent hash ring for a distributed database and includes:
weights for the physical nodes are calculated according to the capacities and rules.
The first hash ring is constructed and different weights are assigned to the physical nodes.
A second hash ring is constructed for certain physical nodes, where the second hash ring has more capacity and the nodes in the second hash ring are virtual nodes.
If some virtual nodes are unbalanced, more hash rings are constructed.
Finally, a multi-layered consistent hash ring is constructed.
Multiple layers of hashes are used to locate and access data in the ring.
Upon failure of a node, non-first tier nodes are bypassed directly to accelerate rebalancing.
(I) Hash ring startup
In the hash ring starting process of the embodiment, the weight of each physical node, such as storage size, storage type, storage time, storage risk, etc., is calculated according to the rule and the capacity.
In the embodiment, we use only memory size as one major factor for simplicity of explanation. However, rules and other factors may be specified as desired.
According to the weights of different physical nodes, different virtual nodes are defined, and a multilayer hash ring is generated. In real-world data processing, there may also be only a first tier of some physical nodes, and no second and more tiers.
(II) creating virtual nodes
The capacity of each virtual node is guaranteed to be the same during the creation of the virtual node. The number of virtual nodes per physical disk depends on the weight. The mapping relationship between the virtual node and the physical node will be recorded as metadata in the mapping table.
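The weight-to-virtual-node derivation and the metadata mapping table can be sketched as follows. The capacity-per-v-node constant, the `"node#vN"` naming, and the use of disk capacity as the weight are assumptions for illustration:

```python
# Assumed constant: every virtual node represents the same capacity.
CAPACITY_PER_VNODE_GB = 100

def build_mapping(disks: dict[str, int]) -> dict[str, str]:
    # disks maps physical node name -> capacity in GB (its weight here).
    # Returns the metadata mapping table: virtual node -> physical node.
    table = {}
    for node, capacity in disks.items():
        for r in range(capacity // CAPACITY_PER_VNODE_GB):
            table[f"{node}#v{r}"] = node
    return table
```

A 300 GB disk thus gets three v-nodes while a 100 GB disk gets one, so every v-node covers the same capacity.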
If the number of virtual nodes is not large enough, the balance of consistent hashing is broken. Therefore, a lowest threshold on the number of virtual nodes should be maintained, and the nodes in each layer of the hash ring should be recorded with multi-level indexes.
(III) adjusting the number of layers of the hash ring
The adjustment rule is: if the number of virtual nodes of the unit weight is reduced to a threshold value, the number of layers of the virtual nodes is increased by 1.
On the other hand, if the number of virtual nodes of the unit weight is higher than the threshold, the layer of the virtual nodes is decreased.
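The two adjustment rules above can be sketched as a single function; the threshold value and function name are illustrative assumptions, and the minimum of one layer is an added safeguard:

```python
# Assumed threshold on virtual nodes per unit weight.
THRESHOLD = 8

def adjust_layers(layers: int, vnodes_per_unit_weight: int) -> int:
    # Grow the ring depth when v-nodes per unit weight drop to the
    # threshold; shrink it (never below one layer) when they exceed it.
    if vnodes_per_unit_weight <= THRESHOLD:
        return layers + 1
    return max(1, layers - 1)
```
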
(IV) adjusting the virtual nodes for rebalancing
Rebalancing may occur during the addition of a new node or the deletion/failure of an existing node in the cluster.
During the addition of a node:
-re-computing the weights of all nodes in the first hash ring;
-constructing a second or more level hash ring for the newly added node;
-adding new data to the newly added node and moving some existing data to the newly added node.
During the deletion of a node:
if a node failure is detected, directly bypassing all virtual nodes of the second or higher layer, thus preventing a large number of unnecessary node redirections;
all data in the failed node will move to the adjacent hash ring.
(V) Data migration:
For rebalancing, data in a deleted virtual node should be migrated to the neighboring nodes. A newly added node likewise takes over data at its position on the consistent HASH circle, in the same way deletion releases it. Migration occurs only between the affected nodes.
Finally, how to detect the use of this method: by examining the data structure and data flow to see whether multiple layers of consistent hash rings exist, and by examining the user manual and system behavior.
In another embodiment of the present invention, a hash ring storage mechanism is disclosed: a distributed proxy cache architecture. As shown in FIG. 7, a front-end proxy server manages numerous background cache server nodes. When it receives a user request, the front-end proxy fetches the requested data from a background cache server. The cache server first searches its local cache space for the corresponding Web content; if found, it returns the content to the front-end proxy. If not, it obtains the content from the original Web server of the request, sends it to the front-end proxy, and stores a copy of the cache object locally to speed up responses to subsequent requests.
In the distributed proxy cache, all HTTP requests from network users are distributed by the proxy to the individual caches. Because there are multiple cache nodes in the background, the cache space and cached content are much larger than with a single cache, and the foreground proxy's response time drops, improving overall performance. Obviously, if several background cache nodes store the same content, this advantage is greatly reduced; the performance gain is ensured only by partitioning the whole cache space among the cache nodes.
For a user's HTTP request, the most useful information is the URL, which is also the basis for content lookup by the cache nodes, so it is natural to partition the cache space by URL. URLs, however, vary in length, and using them directly is very difficult. To solve this problem, a fixed-length Hash value of the URL can be obtained with a specific Hash algorithm, and that value then used for the subsequent operations of looking up the cache object. The invention provides a Hash-based mechanism for distributing the whole URL space, i.e., the whole cache space, among multiple cache nodes. Specifically, the invention hashes with the MD5 algorithm, which yields a 128-bit value for every input string: all URLs are hashed into the space [0, 2^128 - 1]. Considering the actual number of URLs, this range is large enough to contain all the content and allows a one-to-one mapping with the URLs.
To facilitate the URL routing mechanism, as shown in FIG. 8, the whole [0, 2^128 - 1] hash range is regarded as a ring structure, with all hash values in the range arranged in increasing order clockwise and distributed uniformly over the hash ring as a whole. Each cache node in FIG. 7 is responsible for a segment of the ring, i.e., for all URLs falling within that range. The IP value of each cache node is hashed; node1 to node4 in the figure represent four cache nodes, the range between node1 and node2 is the URL range cache node2 is responsible for, and so on. Notably, the range of node3 spans the maximum value and continues from the minimum, which is exactly the advantage of a hash ring: every point on the ring is covered. After a user request arrives, the front-end proxy obtains the URL contained in the request and first calculates its hash value, which corresponds to a key on the hash ring as shown in FIG. 8. It then searches clockwise along the ring for the first node larger than that key; this node represents a cache server in the actual environment, so the front-end proxy directs the HTTP request to the background cache server just found. The whole process partitions the space of cache objects so that the multiple background cache nodes work cooperatively, improving efficiency. Moreover, when a new cache node node5 joins the hash ring, the range each original node is responsible for does not change greatly: node5 only splits the range of one existing node, so adding a cache node does not redistribute the cache space, avoids system load fluctuation, and ensures the stability of the mechanism.
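The MD5-based URL routing above can be sketched directly: MD5 already yields a 128-bit integer, so no truncation is needed for the [0, 2^128 - 1] ring. The function names and the rebuild-per-call ring are simplifications for illustration:

```python
import bisect
import hashlib

def md5_ring_pos(s: str) -> int:
    # MD5 digests are 128 bits, so this is already a position in [0, 2^128 - 1].
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def route(url: str, cache_ips: list) -> str:
    # Hash each cache node's IP onto the ring, then send the URL to the first
    # node clockwise whose position is >= the URL's hash, wrapping at the top.
    ring = sorted((md5_ring_pos(ip), ip) for ip in cache_ips)
    i = bisect.bisect_left(ring, (md5_ring_pos(url), ""))
    return ring[i % len(ring)][1]
```

A production front-end proxy would build the sorted ring once and only re-sort when cache nodes join or leave.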
Although the hash ring storage mechanism has many advantages for cache management, it has a problem: because of the inherent properties of the MD5 hash algorithm, there is no guarantee that the IPs of the real cache nodes are uniformly distributed over the hash ring after hashing. As shown in FIG. 8, node1 and node5 are responsible for ranges of very different sizes, so one clearly has a much higher probability of receiving HTTP requests than the other. Under a large number of requests this makes the cache load of the nodes uneven, and it is not easy to adjust the load of each cache node without rehashing them, which is clearly infeasible and too costly. An improved MD5 algorithm helps little here, because randomness is a fundamental feature of hash algorithms and cannot make the hashed IPs evenly spaced.
The root cause of the above problem is that each cache node corresponds to only one point on the hash ring, so the idea of virtual nodes is introduced on the hash ring to solve the uneven distribution of cache space. That is, a multi-level hash-ring mechanism is adopted in which one physical node corresponds to several virtual nodes. The core of the idea is that each cache server corresponds not to a single node on the ring but to several virtual nodes (v-nodes); the union of the ranges covered by a server's node and its v-nodes is the cache URL space allocated to that real cache server, and the route-query mechanism is unchanged. Suppose three real cache nodes correspond to node1, node2 and node3 on the cache hash ring; each cache node then has multiple replicas on the ring, and thanks to the randomness of hashing, these replicas divide the whole URL space more uniformly. The number of v-nodes each cache node generates on the cache hash ring is computed from its service capability: a cache with stronger service capability gets more virtual nodes, covers a wider URL range, and is therefore assigned more user requests by the front-end proxy; and vice versa. For example, node1 might map to v-node1a and v-node1b, node2 to v-node2a and v-node2b, and node3 to v-node3a, v-node3b and v-node3c.
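A possible sketch of weighted v-node placement in Python (the per-replica "ip#k" naming scheme and the capacity numbers are assumptions made here for illustration; the text only specifies that stronger nodes receive more v-nodes):

```python
import hashlib

def md5_hash(s):
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def build_virtual_ring(capacity):
    # capacity maps a physical node's IP to its v-node count,
    # chosen in proportion to the node's service capability
    ring = []
    for ip, count in capacity.items():
        for k in range(count):
            # Derive each v-node position by hashing a per-replica label,
            # but remember which physical node owns it
            ring.append((md5_hash(f"{ip}#{k}"), ip))
    ring.sort()
    return ring

# node3 ("10.0.0.3") is assumed the strongest, so it receives three v-nodes
ring = build_virtual_ring({"10.0.0.1": 2, "10.0.0.2": 2, "10.0.0.3": 3})
```

Route lookup is unchanged: the proxy still walks clockwise to the first position past the key, and the owning physical node is the IP stored with that v-node.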
The number of virtual nodes per cache node is closely tied to its current service capability. To let the distributed proxy cache system autonomously sense its own operating condition, the autonomous decision module predicts the future operating state of each cache node, comprehensively evaluates its service capability, and dynamically decides how many virtual nodes it is allocated on the hash ring, reducing the virtual-node count of overloaded cache nodes and reallocating those virtual nodes to cache nodes with stronger current service performance. The whole process is shown in FIG. 9.
FIG. 9 adopts a self-deciding virtual-node migration mutual-aid strategy: the operating parameters of each cache node in the distributed proxy cache are sensed and monitored to judge whether local overheating or another abnormality has occurred; according to the given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected as helpers to take over, on the cache hash ring, the virtual nodes of cache nodes whose load is high and whose performance has degraded, thereby achieving hot-spot migration and avoiding single-point failure. Self-deciding virtual-node migration mutual aid essentially solves three problems: first, evaluating the state and service capability of each cache server; second, selecting the overheated cache nodes with degraded service capability whose virtual nodes are to be migrated, and selecting normal cache nodes with stronger service capability to take over the migrated node lists and bear part of the request load of those cache nodes; third, determining, for the different cache nodes, the number of virtual nodes to migrate.
For the evaluation of the state and service capability of a cache server, the outputs of the sensing and monitoring part are aggregated to obtain a state value S_i, which represents the current state and service capability of cache node i. The larger S_i is, the busier the cache node and the smaller its available service capacity; the smaller S_i is, the more idle the cache node and the greater its available service capacity.
Since the background of the distributed proxy cache contains many cache nodes, the autonomous decision part receives the state values of many cache nodes. To evaluate how busy each cache node is relative to all current cache nodes, assume there are n cache nodes in the background and compute the mean S̄ of the current state values of all nodes, as shown in Equation 1:

S̄ = (1/n) · Σ_{i=1}^{n} S_i    (Equation 1)
Two states of a cache node are defined: (1) cache node i is in the hot state if S_i > S̄; (2) cache node i is in the normal state if S_i ≤ S̄. When the cache cluster needs a self-deciding adjustment, it has already been determined that at least one cache node is in the hot state; the cluster must then select which hot-state cache nodes most urgently need adjustment, and decide to which normal-state cache nodes the virtual nodes of those hot-state cache nodes should be migrated.
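Equation 1 and the two-state rule can be written down directly (the state values below are illustrative; how S_i is aggregated from the monitored parameters is left open by the text):

```python
def classify_nodes(states):
    # states maps cache node id -> current state value S_i (larger = busier)
    avg = sum(states.values()) / len(states)             # Equation 1: the mean
    hot = {i for i, s in states.items() if s > avg}      # S_i > mean: hot state
    normal = {i for i, s in states.items() if s <= avg}  # S_i <= mean: normal state
    return avg, hot, normal

avg, hot, normal = classify_nodes({1: 8.0, 2: 2.0, 3: 2.0})
```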
First, define H as the set of all hot-state cache nodes, with n1 = |H|, and N as the set of all normal-state cache nodes, with n2 = |N|. They satisfy Equation 2:

n1 + n2 = |H| + |N| = n    (Equation 2)
Secondly, arrange all elements of H in descending order of state value to obtain the sequence S_H, which orders all hot-state cache nodes by how busy they are: the earlier an element appears, the busier the cache node and the smaller its available service capacity. Equation 3 is as follows:

S_H = {S_H,1, S_H,2, ..., S_H,i, ...}, where S_H,i ≥ S_H,(i+1) and i = 1, 2, ..., n1    (Equation 3)
Next, arrange all elements of N in ascending order of state value to obtain the sequence S_N; the earlier a cache node appears in this sequence, the higher its service capability. Equation 4 is as follows:

S_N = {S_N,1, S_N,2, ..., S_N,i, ...}, where S_N,i ≤ S_N,(i+1) and i = 1, 2, ..., n2    (Equation 4)
Obviously, if only the first element of S_H, i.e., the cache node with the largest state value, were adjusted, each adjustment would affect only one cache node, ignoring the case where the largest and the second-largest state values differ only slightly; if the virtual nodes of every element of S_H were adjusted, the overhead would be high, and many of those nodes are only mildly hot and need no adjustment. On balance, a prefix subsequence is selected starting from the first element of S_H; it contains the hottest cache nodes, the ones that need adjustment, because their state values are relatively high, they are already busy, and their service capacity has begun to decline.
Define S_sub-H as the prefix subsequence finally selected from S_H; it is initially the empty set ∅. It is computed as follows:

(1) if |S_H| ≤ 2, then S_sub-H = S_H, i.e., all elements are selected, and the process ends;

(2) if |S_H| > 2, first add the two elements S_H,1 and S_H,2 to S_sub-H;

(3) assuming all elements before S_H,i have already been added to S_sub-H, add S_H,i to S_sub-H if it satisfies the condition of Equation 5, namely that the gap between S_H,i and its predecessor does not exceed the average gap among the elements already selected:

S_H,(i-1) - S_H,i ≤ (S_H,1 - S_H,(i-1)) / (i - 2)    (Equation 5)

(4) if S_H,i satisfies the condition of Equation 5 in step (3), repeat the judgment of step (3) for the next element S_H,(i+1) of S_H; if Equation 5 is not satisfied, the whole selection process ends.
Equation 5 judges the interval differences between elements of the sequence, so the finally selected prefix subsequence S_sub-H of S_H consists of the elements of S_H with the smallest intervals between them; together they constitute the hottest class of cache nodes, i.e., the set of cache nodes that needs adjustment this time. After S_sub-H is selected, the corresponding set of mutual-aid cache nodes must be determined from S_N; the cache nodes in that set take over the virtual nodes of the hot-state cache nodes in S_sub-H. With n2 = |S_N| as above, the hot-state cache node S_sub-H,i is assisted by the idle cache node S_N,j, where j = i % n2.
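The whole selection-and-pairing procedure can be sketched as follows. Note that the interval-difference test of Equation 5 is rendered here as one reading of the text (the gap to the next element must not exceed the average gap among the elements already chosen), and the state values are illustrative:

```python
def select_and_pair(states):
    # states maps cache node id -> state value S_i (larger = busier)
    avg = sum(states.values()) / len(states)
    # S_H: hot nodes in descending state order; S_N: normal nodes ascending
    H = sorted(((s, i) for i, s in states.items() if s > avg), reverse=True)
    N = sorted((s, i) for i, s in states.items() if s <= avg)
    if len(H) <= 2:
        sub = H[:]                      # step (1): take everything
    else:
        sub = H[:2]                     # step (2): always take the first two
        k = 2
        while k < len(H):               # steps (3)-(4): extend while Eq. 5 holds
            avg_gap = (sub[0][0] - sub[-1][0]) / (len(sub) - 1)
            if sub[-1][0] - H[k][0] <= avg_gap:
                sub.append(H[k])
                k += 1
            else:
                break
    n2 = len(N)
    # Pair the i-th hottest node with the idle node at index i mod n2
    return [(sub[i][1], N[i % n2][1]) for i in range(len(sub))]

pairs = select_and_pair({1: 10.0, 2: 9.0, 3: 5.0, 4: 1.0, 5: 1.0})
```

With the sample values, nodes 1 and 2 are hot (both above the mean 5.2) and are paired with the two idlest normal nodes, 4 and 5.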
Although the present invention has been described with reference to a few embodiments, it should be understood that the invention is not limited to the embodiments above; those skilled in the art can make various changes and modifications without departing from the scope of the invention.

Claims (10)

1. A caching method for a distributed database, characterized in that a fixed-length hash value of a URL is obtained by a hash algorithm, and the subsequent series of cache-object lookup operations are then performed using the hash value.
2. The method for caching a distributed database according to claim 1, further comprising: hashing every input string with the MD5 algorithm to obtain a 128-bit value, so that all URLs are hashed into the range [0, 2^128 - 1] in one-to-one correspondence with the URLs; the whole [0, 2^128 - 1] hash range is viewed as a ring, with all hash values in the range arranged in ascending order along the clockwise direction and distributed uniformly over the hash ring as a whole.
3. The method for caching a distributed database according to claim 2, further comprising: each cache node is responsible for all corresponding URLs in a section of range on the hash ring, and the IP value of each cache node is subjected to hash calculation to obtain a hash value.
4. The method for caching a distributed database according to claim 1, further comprising: after a user's request arrives, the front-end proxy obtains the URL contained in the request, first computes the hash value of the URL, which corresponds to a key on the hash ring, then searches clockwise along the hash ring for the first node larger than the key, and routes the HTTP request to the background cache server thus found; when a new cache node joins the hash ring, the range each original node is responsible for does not change greatly, the new cache node only splits the original range of one node, and adding the new cache node does not cause a redistribution of cache space.
5. The method for caching a distributed database according to claim 1, further comprising: adopting a self-deciding virtual-node migration mutual-aid strategy: sensing and monitoring the operating parameters of each cache node in the distributed proxy cache, judging whether local overheating or another abnormality has occurred, and, according to a given multi-copy hierarchical management strategy, selecting a set of cache nodes with lower current load as helpers to take over, on the cache hash ring, the virtual nodes of cache nodes whose load is high and whose performance has degraded.
6. The method for caching a distributed database according to claim 5, wherein the self-deciding virtual-node migration mutual-aid strategy further comprises:
A. evaluating the state and the service capacity of the cache server;
B. selecting the overheated cache nodes with degraded service capability and migrating their virtual nodes, and selecting normal cache nodes with stronger service capability to take over the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
7. The method for caching a distributed database according to claim 2, further comprising: adjusting the number of layers of the hash ring; adjusting the virtual nodes for rebalancing; and data migration.
8. The method for caching a distributed database according to claim 7, wherein:
the step of adjusting the number of layers of the hash ring further comprises: if the number of virtual nodes per unit weight falls to a threshold, increasing the number of layers of virtual nodes by 1; if the number of virtual nodes per unit weight is above a threshold, decreasing the number of layers of virtual nodes;
the step of adjusting the virtual nodes for rebalancing further comprises: rebalancing during the addition of a new node to the cluster or the deletion or failure of an existing node;
the step of data migration further comprises: migrating the data in a deleted virtual node to its neighbor node.
9. A hash ring optimization method for the distributed database caching method according to any one of claims 1 to 8, characterized by adopting a self-deciding virtual-node migration mutual-aid strategy: sensing and monitoring the operating parameters of each cache node in the distributed proxy cache, judging whether local overheating or another abnormality has occurred, and, according to a given multi-copy hierarchical management strategy, selecting a set of cache nodes with lower current load as helpers to take over, on the cache hash ring, the virtual nodes of cache nodes whose load is high and whose performance has degraded;
the self-deciding virtual-node migration mutual-aid strategy further comprises:
A. evaluating the state and the service capacity of the cache server;
B. selecting the overheated cache nodes with degraded service capability and migrating their virtual nodes, and selecting normal cache nodes with stronger service capability to take over the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
10. The hash ring optimization method of claim 9, wherein the step of evaluating the state and service capability of the cache server further comprises: evaluating how busy each cache node is among all current cache nodes; assuming there are n cache nodes in the background, computing the mean S̄ of the current state values of all nodes as

S̄ = (1/n) · Σ_{i=1}^{n} S_i

and defining two states of a cache node: (1) cache node i is in the hot state if S_i > S̄; (2) cache node i is in the normal state if S_i ≤ S̄; when the cache cluster needs a self-deciding adjustment, it has already been determined that at least one cache node is in the hot state, the hot-state cache nodes that urgently need adjustment are selected, and it is decided to which normal-state cache nodes the virtual nodes of those hot-state cache nodes should be migrated.
CN201911390078.8A 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof Active CN111177154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911390078.8A CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911390078.8A CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Publications (2)

Publication Number Publication Date
CN111177154A true CN111177154A (en) 2020-05-19
CN111177154B CN111177154B (en) 2023-07-25

Family

ID=70650459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911390078.8A Active CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Country Status (1)

Country Link
CN (1) CN111177154B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917853A (en) * 2020-07-24 2020-11-10 山东云缦智能科技有限公司 Optimization method for distributed cache scaling of content distribution network
CN112380288A (en) * 2020-11-16 2021-02-19 林亮 Decentralized distributed data processing system
CN113507522A (en) * 2021-07-08 2021-10-15 上海七牛信息技术有限公司 Method and system for improving hit rate of PCDN (Primary Contourlet distribution) network requests
CN113689103A (en) * 2021-08-18 2021-11-23 国电南瑞南京控制系统有限公司 Adaptive load balancing employing flow distribution intelligent scheduling management method, device and system
CN114629908A (en) * 2022-03-28 2022-06-14 浙江邦盛科技股份有限公司 Data fragmentation method based on server node hardware resource density
CN115297131A (en) * 2022-08-01 2022-11-04 东北大学 Sensitive data distributed storage method based on consistent hash
CN115981848A (en) * 2022-12-17 2023-04-18 郑州斋杆网络科技有限公司 Memory database fragmentation adjustment method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
US20140359043A1 (en) * 2012-11-21 2014-12-04 International Business Machines Corporation High performance, distributed, shared, data grid for distributed java virtual machine runtime artifacts
US20170149660A1 (en) * 2014-07-30 2017-05-25 Huawei Technologies Co., Ltd. Packet transmission method, apparatus, and system
CN107197035A (en) * 2017-06-21 2017-09-22 中国民航大学 A kind of compatibility dynamic load balancing method based on uniformity hash algorithm
CN108810041A (en) * 2017-04-27 2018-11-13 华为技术有限公司 A kind of data write-in of distributed cache system and expansion method, device
CN109218438A (en) * 2018-10-12 2019-01-15 山东科技大学 A kind of performance optimization method of distributed cache server cluster


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
巴子言: "基于虚节点的一致性哈希算法的优化" *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917853A (en) * 2020-07-24 2020-11-10 山东云缦智能科技有限公司 Optimization method for distributed cache scaling of content distribution network
CN112380288A (en) * 2020-11-16 2021-02-19 林亮 Decentralized distributed data processing system
CN113507522A (en) * 2021-07-08 2021-10-15 上海七牛信息技术有限公司 Method and system for improving hit rate of PCDN (Primary Contourlet distribution) network requests
CN113689103A (en) * 2021-08-18 2021-11-23 国电南瑞南京控制系统有限公司 Adaptive load balancing employing flow distribution intelligent scheduling management method, device and system
CN113689103B (en) * 2021-08-18 2023-11-24 国电南瑞南京控制系统有限公司 Mining and shunting intelligent scheduling management method, device and system for self-adaptive load balancing
CN114629908A (en) * 2022-03-28 2022-06-14 浙江邦盛科技股份有限公司 Data fragmentation method based on server node hardware resource density
CN114629908B (en) * 2022-03-28 2023-10-13 浙江邦盛科技股份有限公司 Data slicing method based on hardware resource density of server node
CN115297131A (en) * 2022-08-01 2022-11-04 东北大学 Sensitive data distributed storage method based on consistent hash
CN115297131B (en) * 2022-08-01 2023-05-26 东北大学 Sensitive data distributed storage method based on consistent hash
CN115981848A (en) * 2022-12-17 2023-04-18 郑州斋杆网络科技有限公司 Memory database fragmentation adjustment method and device
CN115981848B (en) * 2022-12-17 2024-05-28 上海律保科技有限公司 Memory database fragment adjustment method and equipment

Also Published As

Publication number Publication date
CN111177154B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111177154B (en) Distributed database caching method and hash ring optimization thereof
CN111159193B (en) Multi-layer consistent hash ring and application thereof in creating distributed database
CN106790324B (en) Content distribution method, virtual server management method, cloud platform and system
KR101928529B1 (en) Code Distributed Hash Table based MapReduce System and Method
CN110830562B (en) Limited load consistency Hash load balancing strategy based on virtual nodes
Xu et al. Drop: Facilitating distributed metadata management in eb-scale storage systems
CN101645919B (en) Popularity-based duplicate rating calculation method and duplicate placement method
US11140220B1 (en) Consistent hashing using the power of k choices in server placement
Xu et al. Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems
CN109617989B (en) Method, apparatus, system, and computer readable medium for load distribution
Kangasharju et al. Adaptive content management in structured P2P communities
JP4533923B2 (en) Super-peer with load balancing function in hierarchical peer-to-peer system and method of operating the super-peer
Inoue et al. Efficient content replication strategy for data sharing considering storage capacity restriction in hybrid Peer-to-Peer networks
CN111917853A (en) Optimization method for distributed cache scaling of content distribution network
Rahmani et al. A comparative study of replication schemes for structured P2P networks
Soltani et al. A dynamic popularity-aware load balancing algorithm for structured p2p systems
US11310309B1 (en) Arc jump: per-key selection of an alternative server when implemented bounded loads
March et al. Multi-attribute range queries on read-only DHT
CN108965387B (en) Balancing method and system for improving survivability of P2P data storage
CN111435345A (en) Tile data service system and method thereof
KR101690944B1 (en) Method and apparatus for managing distributed cache in consideration of load distribution in heterogeneous computing environment
Soltani et al. A LOAD BALANCING ALGORITHM BASED ON REPLICATION AND MOVEMENT OF DATA ITEMS FOR DYNAMIC STRUCTURED P2P SYSTEMS
CN117194439B (en) Method for creating resource storage system, electronic equipment and storage medium
Xu et al. C 2: adaptive load balancing for metadata server cluster in cloud-scale storage systems
Liu et al. Load balancing strategy for cloud computing based on dynamic replica technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Request for anonymity

Inventor before: Request for anonymity

TR01 Transfer of patent right

Effective date of registration: 20240423

Address after: Building 7, No. 7 Taiping East Road (South), Mafang Town, Pinggu District, Beijing, 101200

Patentee after: Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.

Country or region after: China

Address before: 3009-315, 3rd Floor, Building B, Building 1, Yard 2, Yongcheng North Road, Haidian District, Beijing, 100089

Patentee before: Zhangxun Yitong (Beijing) Information Technology Co.,Ltd.

Country or region before: China
