CN111177154A - Distributed database caching method and hash ring optimization thereof - Google Patents


Info

Publication number
CN111177154A
Authority
CN
China
Prior art keywords
cache
node
nodes
hash
virtual
Prior art date
Legal status
Granted
Application number
CN201911390078.8A
Other languages
Chinese (zh)
Other versions
CN111177154B (en
Inventor
Inventor not disclosed
Current Assignee
Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.
Original Assignee
Zhangxun Yitong Beijing Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhangxun Yitong Beijing Information Technology Co Ltd filed Critical Zhangxun Yitong Beijing Information Technology Co Ltd
Priority to CN201911390078.8A priority Critical patent/CN111177154B/en
Publication of CN111177154A publication Critical patent/CN111177154A/en
Application granted granted Critical
Publication of CN111177154B publication Critical patent/CN111177154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 - Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 - Indexing structures
    • G06F 16/2255 - Hash tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G06F 16/24552 - Database cache management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a distributed database cache and a hash ring optimization method thereof. First, a fixed-length hash value of the URL is obtained with a hash algorithm, and this hash value is then used for the subsequent operations of looking up the cache object. Hashing yields a 128-bit value for every input string: all URLs are hashed into the range [0, 2^128 - 1] and placed in one-to-one correspondence with the URLs. The method adopts a self-decision virtual-node migration mutual-aid strategy: the operating parameters of each cache node in the distributed proxy cache are monitored to judge whether local overheating or other abnormalities occur, and, according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected to take over, on the hash ring, the virtual nodes corresponding to cache nodes with higher load and degraded performance, thereby achieving hot-spot migration and avoiding single-point failures.

Description

Distributed database caching method and hash ring optimization thereof
Technical Field
The invention relates to the technical field of computer application, in particular to a distributed database caching method and a hash ring optimization method thereof.
Background
In a database cluster, addition or deletion of physical nodes is the most basic function of a cluster management system. If conventional hashing schemes such as hash-modulo or random assignment are adopted, a large number of existing caches must be rebuilt whenever a physical node is added or deleted, which imposes serious performance costs on the whole database cluster system, may even disrupt normal operation of the business system, and severely violates the monotonicity principle.
The consistent hash algorithm was proposed to ensure monotonicity, that is, when a physical node is removed or added, the influence on the existing cache mapping relationships is very small. Moreover, the more physical nodes the cluster contains, the better the consistent hash algorithm preserves monotonicity. The principle of the consistent hash algorithm is as follows:
(1) determining a hash ring and physical nodes on the ring.
A range of hash values is first determined, for example (-2^16, 2^16). All hash values in this range are regarded as a ring that increases clockwise and joins end to end, called a hash ring.
A hash ring is a virtual data structure that does not physically exist. Suppose six nodes A-F are distributed on the hash ring after hash computation. A schematic diagram of the hash ring is shown in FIG. 1.
(2) Setting access mode of physical node
If there is an SQL query request, the SQL statement string is used as the KEY of the hash algorithm, and the calculated hash value maps to a point on the hash ring. If that point does not correspond to a physical node, the ring is searched clockwise (i.e., for nodes whose hash value is larger) until the first mapped physical node is found; that node is the target node (i.e., the smallest node whose hash value is larger than the key's). If no node is found before the hash value exceeds 2^16, the first physical node is matched, because the nodes are joined end to end and the clockwise search simply wraps around. For example, if the calculated hash value lies between B and C, the matched physical node is C; if the hash value is greater than F's, node A is hit.
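The lookup described above can be sketched in Python. The 16-bit ring range, the MD5-based hash, and the `HashRing`/`ring_hash` names are illustrative assumptions for this sketch, not details taken from the patent:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Illustrative 16-bit ring position, matching the (-2^16, 2^16)-style
    # range used in the background section (here folded into [0, 2^16)).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 16)

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs: the physical nodes placed on the ring.
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def locate(self, key: str) -> str:
        # First node clockwise whose position is >= the key's hash;
        # wrap around to the first node when no larger position exists.
        h = ring_hash(key)
        idx = bisect.bisect_left(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0
        return self.ring[idx][1]
```

A given key always resolves to the same node, and only keys in the arc between two nodes are affected when the ring membership changes.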
(3) Adding processing of nodes
Suppose a G physical node is to be added, as shown by the gray circular box in FIG. 2.
The hash value of the new physical node is calculated first and mapped to a point on the hash ring; the access mode of the ring itself does not change. Only the hash values that fall between node D and node G change their mapping: after G is added, that segment maps to node G instead of the original node E. The mapping of the hash values between G and E does not change; they are still mapped to node E. The result is that after adding one physical node, only a small number of caches miss and need to be rebuilt. Although the problem of mapping changes caused by node addition still exists after applying the consistent hash algorithm, the extent of the change is reduced to the lowest possible level compared with the traditional hash-modulo approach.
(4) Processing to delete a node
Assuming node G in FIG. 2 needs to be deleted, the ring reverts to its original state. The caches whose hash values mapped to node G will inevitably miss; those hash values are remapped clockwise to node E, and the only caches that need to be rebuilt are those for the hash values in the segment that previously mapped to node G. Therefore, compared with the traditional hash-modulo method, deleting a node on the hash ring requires far fewer caches to be rebuilt.
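The property described in steps (3) and (4), namely that adding a node only redirects keys onto the new node while everything else keeps its mapping, can be checked with a small sketch. The node names, key set, and hash choice are assumptions for illustration:

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Illustrative 16-bit ring position.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 16)

def locate(key: str, nodes) -> str:
    # First node clockwise with position >= the key's hash, wrapping around.
    ring = sorted((h(n), n) for n in nodes)
    i = bisect.bisect_left(ring, (h(key), ""))
    return ring[i % len(ring)][1]

keys = [f"key-{i}" for i in range(10000)]
before = {k: locate(k, ["A", "B", "C", "D", "E", "F"]) for k in keys}
after = {k: locate(k, ["A", "B", "C", "D", "E", "F", "G"]) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
```

Every key whose owner changed must now belong to the new node G; deleting G again would send exactly those keys back to G's clockwise neighbor.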
The consistent hash algorithm is a widely used load-balancing algorithm. In a dynamically changing cache environment it satisfies the two criteria by which the quality of a hash algorithm is judged: balance and monotonicity.
(1) Balance: balance means that all of the cache space should be fully utilized, so a good hash algorithm should distribute the hash results as evenly as possible.
(2) Monotonicity: monotonicity means that after the original system cache has been established and stabilized, if a new cache node is added to the system, already-established caches may be mapped into the newly added cache space but must not be remapped within the original cache space.
However, from the above background analysis and research, the present invention finds that once consistent hashing is chosen, selecting the underlying hash function is a critical step: it determines how uniformly the nodes are distributed on the ring, as well as the algorithm's efficiency and other factors.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a distributed database caching method and a hash ring optimization thereof. The whole process of the method partitions the space of cache objects so that multiple background cache nodes can work cooperatively, improving efficiency. When a new cache node joins the hash ring, the range each original node is responsible for does not change greatly: the new node only splits the range of one existing node, so adding a cache node does not cause redistribution of the cache space, avoids fluctuation of the system load, and ensures the stability of the mechanism.
In order to solve the technical problem, the invention provides a caching method for a distributed database, which comprises first computing a fixed-length hash value of the URL with a hash algorithm, and then using the hash value for the subsequent operations of looking up the cached object.
Preferably, the method further comprises: hashing is performed using the MD5 algorithm to obtain a 128-bit value for all input strings: hashing all URLs to one [0, 2 ]128-1]And performing one-to-one mapping with the URL; all of [0, 2 ]128-1]The hash range is seen as a circular structure, [0, 2 ]128-1]All hash values in the range are arranged in the order from big to small along the clockwise direction, and are uniformly distributed on the hash ring as a whole.
Preferably, the method further comprises: each cache node is responsible for all corresponding URLs in a section of range on the hash ring, and the IP value of each cache node is subjected to hash calculation to obtain a hash value.
Preferably, the method further comprises: after a user request arrives, the front-end proxy obtains the URL contained in the request, first calculates the hash value of the URL, which corresponds to a key on the hash ring, then searches clockwise along the ring for the first node larger than that key, and directs the HTTP request to the background cache server found; when a new cache node joins the hash ring, the range each original node is responsible for does not change greatly, the new node only splits the range of one existing node, and adding a cache node does not cause redistribution of the cache space.
Preferably, the method further comprises: adopting a self-decision virtual-node migration mutual-aid strategy, monitoring the operating parameters of each cache node in the distributed proxy Cache, judging whether local overheating or other abnormalities occur, selecting, according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load to assist, and letting them take over the virtual nodes on the Cache hash ring that correspond to cache nodes with higher load and degraded performance.
Preferably, the self-decision virtual node migration policy further includes:
A. evaluating the state and the service capacity of the cache server;
B. selecting cache nodes that are overheated and whose service capability has dropped, migrating their virtual nodes, selecting cache nodes that are in a normal state with stronger service capability, merging them into the migration node list, and letting them bear part of the request load of the overloaded cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
Preferably, the method further comprises: adjusting the number of layers of the hash ring; adjusting the virtual nodes for rebalancing; and performing data migration.
Preferably, the step of adjusting the number of layers of the hash ring further includes: if the number of virtual nodes per unit weight falls to the threshold, increasing the number of virtual-node layers by 1; if the number of virtual nodes per unit weight is higher than the threshold, decreasing the number of virtual-node layers;
the step of adjusting the virtual nodes for rebalancing further comprises: performing rebalancing during the addition of a new node or the deletion/failure of an existing node in the cluster;
the step of data migration further comprises: migrating the data in a deleted virtual node to its neighbor nodes.
In order to solve the technical problem, the invention further provides a hash ring optimization method for the distributed database caching method, which adopts a self-decision virtual-node migration mutual-aid strategy: the operating parameters of each cache node in the distributed proxy Cache are monitored to judge whether local overheating or other abnormalities occur, and, according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected to take over the virtual nodes on the Cache hash ring that correspond to cache nodes with higher load and degraded performance.
preferably, the self-decision virtual node migration policy further includes:
A. evaluating the state and the service capacity of the cache server;
B. selecting cache nodes that are overheated and whose service capability has dropped, migrating their virtual nodes, selecting cache nodes that are in a normal state with stronger service capability, merging them into the migration node list, and letting them bear part of the request load of the overloaded cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
Preferably, the step of evaluating the state and the service capability of the cache server further comprises: evaluating how busy each of the current cache nodes is. Assume there are n cache nodes in the background whose current load states are S_1, S_2, ..., S_n; the average of the current states of all nodes is calculated as

S_avg = (1/n) * (S_1 + S_2 + ... + S_n)

Two states of a cache node are then defined: (1) cache node i is in a hot state if S_i > S_avg; (2) cache node i is in a normal state if S_i <= S_avg. When the cache cluster needs self-decision adjustment, it is determined that at least one cache node is in the hot state, the hot cache nodes that most urgently need adjustment are selected, and a decision is made as to which normal-state cache nodes the virtual nodes on those hot cache nodes should be migrated.
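A minimal Python sketch of this evaluation step, assuming the load state of each node is a single scalar and the hot/normal threshold is the plain average of all node states; the function name and the load metric are illustrative assumptions:

```python
def classify_nodes(loads: dict[str, float]) -> dict[str, str]:
    # loads maps cache node name -> current load state S_i.
    # A node is "hot" when its state exceeds the cluster average, else "normal".
    avg = sum(loads.values()) / len(loads)
    return {node: ("hot" if s > avg else "normal") for node, s in loads.items()}
```

The self-decision module would then pick virtual nodes on the "hot" nodes and migrate them to "normal" nodes.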
The beneficial effects of the invention include: the method adopts a self-decision virtual-node migration mutual-aid strategy, monitors the operating parameters of each cache node in the distributed proxy Cache, judges whether local overheating or other abnormalities occur, and, according to a given multi-copy hierarchical management strategy, selects a set of cache nodes with lower current load to take over the virtual nodes on the Cache hash ring that correspond to cache nodes with higher load and degraded performance, thereby achieving hot-spot migration and avoiding single-point failures. Through dynamic adaptation, availability is improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below represent only some of the embodiments or prior art; those skilled in the art can obtain other similar or related drawings from them without creative effort.
FIG. 1 is a schematic diagram of a hash ring according to the background of the invention;
fig. 2 is a diagram of a node added to a hash ring according to the background art of the present invention;
fig. 3 is a corresponding relationship between a virtual node and a physical node in a hash ring according to the embodiment of the present invention;
fig. 4 is a node data migration diagram in the hash ring according to the embodiment of the present invention;
FIG. 5 is a diagram illustrating a multi-layered consistent hash ring according to an embodiment of the present invention;
fig. 6 is a schematic diagram of creating a virtual node according to the embodiment of the present invention;
FIG. 7 is a diagram illustrating a distributed proxy cache architecture according to an embodiment of the present invention;
FIG. 8 is a diagram of a hash ring based URL space allocation according to an embodiment of the present invention;
fig. 9 is a migration mutual aid diagram of a virtual node according to the embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to examples, to make its objects, aspects, and advantages clearer; the invention is not, however, limited to these examples.
When only the physical nodes themselves are placed on the hash ring, they are likely to be distributed unevenly, which in turn strongly affects whether subsequent caches are distributed evenly and leads to load imbalance.
In one embodiment of the present invention, to solve the balancing problem, the present invention adds a concept of "virtual node" to the hash ring. The virtual nodes are simply understood to be some replicas of the physical nodes on the hash ring, but not actually existing nodes, and the number of the virtual nodes corresponding to one physical node is generally determined according to the actual situation. The virtual nodes and the physical nodes are arranged on the hash ring in the same way and have the same access way. After the virtual nodes are introduced, every time a physical node is added, a corresponding number of virtual nodes need to be added on the hash ring, and if the physical node is deleted, all corresponding virtual nodes need to be deleted on the hash ring.
In FIG. 3, nodes A-L represent 12 virtual nodes, nodes 1-4 represent 4 physical nodes, and every three virtual nodes correspond to one physical node. The hash value calculated from the SQL string maps to a point on the ring; the corresponding virtual node is found according to the ring's access mode, and that virtual node is then mapped to its physical node. The more nodes a cluster has, the better the balancing effect of the consistent hash algorithm; so after virtual nodes are introduced, the number of nodes on the hash ring increases greatly and the balance of the consistent hash algorithm improves.
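The virtual-node scheme can be sketched by placing each physical node on the ring several times under derived names. The replica count, naming scheme (`"node#r"`), and hash choice are assumptions for illustration:

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Illustrative 16-bit ring position.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 16)

class VirtualRing:
    def __init__(self, physical, replicas=3):
        # Each physical node appears on the ring `replicas` times under
        # derived names such as "node1#0", "node1#1", ... The ring entry
        # keeps the physical node so lookups resolve straight to it.
        self.ring = sorted(
            (h(f"{p}#{r}"), p) for p in physical for r in range(replicas)
        )

    def locate(self, key: str) -> str:
        i = bisect.bisect_left(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]
```

With 4 physical nodes and 3 replicas each, 12 points sit on the ring, mirroring the A-L layout of FIG. 3.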
The HASH algorithm has three major advantages: one-wayness, collision resistance, and uniformity of distribution. The consistent HASH algorithm is currently the most commonly adopted algorithm for distributed load balancing and storage balancing. In one implementation, the value of a machine node is obtained through a HASH function and the node is mapped into the space [0, 2^32 - 1]. When data is stored, the data block is first hashed to obtain a HASH value and then placed at the corresponding position on the ring according to that value. In FIG. 4, the hash value key1 is calculated for data object1; a machine node Node1 is then found clockwise, and object1 is stored on Node1.
If the Node2 node goes down, the data on Node2 migrates to the Node3 node. The advantage is that even if a node fails, data on the other nodes can still be quickly located using the HASH algorithm; only the neighboring node is affected.
Consistent hashing is widely used in distributed systems, including distributed databases. It provides a mode of mapping keys to specific nodes around the ring. However, the existing consistent hashing method cannot guarantee accurate balance of data. For example: when expanded, the newly added node will only share the workload of the neighboring nodes. The same is true after the node is deleted from the system. This will result in an unbalanced state after a node is deleted or added.
In addition, given that the capacity of the nodes in a distributed system is not always balanced in the beginning, it is not easy to shift the entire cluster to a balanced state. In the previous embodiment of the present invention, the method designed in fig. 3 presents virtual nodes to map one physical node to multiple virtual nodes to maintain balance. However, the method of FIG. 3 does not consider the relationship of virtual nodes belonging to the same physical node. Once a physical node is turned off, a data avalanche situation may occur.
Thus, as shown in FIG. 5, in another embodiment of the present invention, our idea is to create and use a multi-layered consistent hash ring for a distributed database and includes:
weights for the physical nodes are calculated according to the capacities and rules.
The first hash ring is constructed and different weights are assigned to the physical nodes.
A second hash ring is constructed for certain physical nodes, where the second hash ring has more capacity and the nodes in the second hash ring are virtual nodes.
If some virtual nodes are unbalanced, more hash rings are constructed.
Finally, a multi-layered consistent hash ring is constructed.
Multiple layers of hashes are used to locate and access data in the ring.
Upon failure of a node, non-first tier nodes are bypassed directly to accelerate rebalancing.
(I) Hash ring startup
In the hash ring starting process of the embodiment, the weight of each physical node, such as storage size, storage type, storage time, storage risk, etc., is calculated according to the rule and the capacity.
In the embodiment, we use only memory size as one major factor for simplicity of explanation. However, rules and other factors may be specified as desired.
According to the weights of different physical nodes, different virtual nodes are defined, and a multilayer hash ring is generated. In real-world data processing, there may also be only a first tier of some physical nodes, and no second and more tiers.
(II) creating virtual nodes
The capacity of each virtual node is guaranteed to be the same during the creation of the virtual node. The number of virtual nodes per physical disk depends on the weight. The mapping relationship between the virtual node and the physical node will be recorded as metadata in the mapping table.
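The weight-to-virtual-node derivation and the metadata mapping table can be sketched as follows. The capacity-per-v-node constant, the `"node#vN"` naming, and the use of disk capacity as the weight are assumptions for illustration:

```python
# Assumed constant: every virtual node represents the same capacity.
CAPACITY_PER_VNODE_GB = 100

def build_mapping(disks: dict[str, int]) -> dict[str, str]:
    # disks maps physical node name -> capacity in GB (its weight here).
    # Returns the metadata mapping table: virtual node -> physical node.
    table = {}
    for node, capacity in disks.items():
        for r in range(capacity // CAPACITY_PER_VNODE_GB):
            table[f"{node}#v{r}"] = node
    return table
```

A 300 GB disk thus gets three v-nodes while a 100 GB disk gets one, so every v-node covers the same capacity.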
If the number of virtual nodes is not large enough, the balance of consistent hashing is broken. Therefore, a lowest threshold on the number of virtual nodes should be maintained, and the nodes in each layer of the hash ring should be recorded with multi-level indexes.
(III) adjusting the number of layers of the hash ring
The adjustment rule is: if the number of virtual nodes of the unit weight is reduced to a threshold value, the number of layers of the virtual nodes is increased by 1.
On the other hand, if the number of virtual nodes of the unit weight is higher than the threshold, the layer of the virtual nodes is decreased.
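The two adjustment rules above can be sketched as a single function; the threshold value and function name are illustrative assumptions, and the minimum of one layer is an added safeguard:

```python
# Assumed threshold on virtual nodes per unit weight.
THRESHOLD = 8

def adjust_layers(layers: int, vnodes_per_unit_weight: int) -> int:
    # Grow the ring depth when v-nodes per unit weight drop to the
    # threshold; shrink it (never below one layer) when they exceed it.
    if vnodes_per_unit_weight <= THRESHOLD:
        return layers + 1
    return max(1, layers - 1)
```
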
(IV) adjusting the virtual nodes for rebalancing
Rebalancing may occur during the addition of a new node or the deletion/failure of an existing node in the cluster.
During the addition of a node:
-re-computing the weights of all nodes in the first hash ring;
-constructing a second or more level hash ring for the newly added node;
-adding new data to the newly added node and moving some existing data to the newly added node.
During the deletion of a node:
if a node failure is detected, directly bypassing all virtual nodes of the second or higher layer, thus preventing a large number of unnecessary node redirections;
all data in the failed node will move to the adjacent hash ring.
(V) Data migration:
For rebalancing, data in a deleted virtual node should be migrated to the neighboring nodes. A newly added node likewise takes over data at its position on the consistent HASH circle, in the same way deletion releases it. Migration occurs only between the affected nodes.
Finally, how to detect the use of this method: by examining the data structure and data flow to see whether multiple layers of consistent hash rings exist, and by examining the user manual and system behavior.
In another embodiment of the present invention, a hash ring storage mechanism is disclosed: a distributed proxy cache architecture. As shown in FIG. 7, a front-end proxy server manages numerous background cache server nodes. When it receives a user request, the front-end proxy fetches the requested data from a background cache server. The cache server first searches its local cache space for the corresponding Web content; if found, it returns the content to the front-end proxy. If not, it obtains the content from the original Web server of the request, sends it to the front-end proxy, and stores a copy of the cache object locally to speed up responses to subsequent requests.
In the distributed proxy cache, all HTTP requests from network users are distributed by the proxy to the individual caches. Because there are multiple cache nodes in the background, the cache space and cached content are much larger than with a single cache, and the foreground proxy's response time drops, improving overall performance. Obviously, if several background cache nodes store the same content, this advantage is greatly reduced; the performance gain is ensured only by partitioning the whole cache space among the cache nodes.
For a user's HTTP request, the most useful information is the URL, which is also the basis for content lookup by the cache nodes, so it is natural to partition the cache space by URL. URLs, however, vary in length, and using them directly is very difficult. To solve this problem, a fixed-length Hash value of the URL can be obtained with a specific Hash algorithm, and that value then used for the subsequent operations of looking up the cache object. The invention provides a Hash-based mechanism for distributing the whole URL space, i.e., the whole cache space, among multiple cache nodes. Specifically, the invention hashes with the MD5 algorithm, which yields a 128-bit value for every input string: all URLs are hashed into the space [0, 2^128 - 1]. Considering the actual number of URLs, this range is large enough to contain all the content and allows a one-to-one mapping with the URLs.
To facilitate the URL routing mechanism, as shown in FIG. 8, the whole [0, 2^128 - 1] hash range is regarded as a ring structure, with all hash values in the range arranged in increasing order clockwise and distributed uniformly over the hash ring as a whole. Each cache node in FIG. 7 is responsible for a segment of the ring, i.e., for all URLs falling within that range. The IP value of each cache node is hashed; node1 to node4 in the figure represent four cache nodes, the range between node1 and node2 is the URL range cache node2 is responsible for, and so on. Notably, the range of node3 spans the maximum value and continues from the minimum, which is exactly the advantage of a hash ring: every point on the ring is covered. After a user request arrives, the front-end proxy obtains the URL contained in the request and first calculates its hash value, which corresponds to a key on the hash ring as shown in FIG. 8. It then searches clockwise along the ring for the first node larger than that key; this node represents a cache server in the actual environment, so the front-end proxy directs the HTTP request to the background cache server just found. The whole process partitions the space of cache objects so that the multiple background cache nodes work cooperatively, improving efficiency. Moreover, when a new cache node node5 joins the hash ring, the range each original node is responsible for does not change greatly: node5 only splits the range of one existing node, so adding a cache node does not redistribute the cache space, avoids system load fluctuation, and ensures the stability of the mechanism.
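The MD5-based URL routing above can be sketched directly: MD5 already yields a 128-bit integer, so no truncation is needed for the [0, 2^128 - 1] ring. The function names and the rebuild-per-call ring are simplifications for illustration:

```python
import bisect
import hashlib

def md5_ring_pos(s: str) -> int:
    # MD5 digests are 128 bits, so this is already a position in [0, 2^128 - 1].
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def route(url: str, cache_ips: list) -> str:
    # Hash each cache node's IP onto the ring, then send the URL to the first
    # node clockwise whose position is >= the URL's hash, wrapping at the top.
    ring = sorted((md5_ring_pos(ip), ip) for ip in cache_ips)
    i = bisect.bisect_left(ring, (md5_ring_pos(url), ""))
    return ring[i % len(ring)][1]
```

A production front-end proxy would build the sorted ring once and only re-sort when cache nodes join or leave.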
Although the hash ring storage mechanism has many advantages for cache management, it has a problem: because of the inherent properties of the MD5 hash algorithm, there is no guarantee that the IPs of the real cache nodes are uniformly distributed over the hash ring after hashing. As shown in FIG. 8, node1 and node5 are responsible for ranges of very different sizes, so one clearly has a much higher probability of receiving HTTP requests than the other. Under a large number of requests this makes the cache load of the nodes uneven, and it is not easy to adjust the load of each cache node without rehashing them, which is clearly infeasible and too costly. An improved MD5 algorithm helps little here, because randomness is a fundamental feature of hash algorithms and cannot make the hashed IPs evenly spaced.
The root cause of the above problem is that each cache node corresponds to only one point on the hash ring, so the idea of virtual nodes is introduced on the hash ring to solve the uneven distribution of cache space. That is, a multi-level hash-ring mechanism is adopted in which one physical node corresponds to several virtual nodes. The core of the idea is that each cache server corresponds not to a single node on the ring but to several virtual nodes (v-nodes); the union of the ranges covered by a server's node and its v-nodes is the cache URL space allocated to that real cache server, and the route-query mechanism is unchanged. Suppose three real cache nodes correspond to node1, node2 and node3 on the cache hash ring; each cache node then has multiple replicas on the ring, and thanks to the randomness of hashing, these replicas divide the whole URL space more uniformly. The number of v-nodes each cache node generates on the cache hash ring is computed from its service capability: a cache with stronger service capability gets more virtual nodes, covers a wider URL range, and is therefore assigned more user requests by the front-end proxy; and vice versa. For example, node1 might map to v-node1a and v-node1b, node2 to v-node2a and v-node2b, and node3 to v-node3a, v-node3b and v-node3c.
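A possible sketch of weighted v-node placement in Python (the per-replica "ip#k" naming scheme and the capacity numbers are assumptions made here for illustration; the text only specifies that stronger nodes receive more v-nodes):

```python
import hashlib

def md5_hash(s):
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def build_virtual_ring(capacity):
    # capacity maps a physical node's IP to its v-node count,
    # chosen in proportion to the node's service capability
    ring = []
    for ip, count in capacity.items():
        for k in range(count):
            # Derive each v-node position by hashing a per-replica label,
            # but remember which physical node owns it
            ring.append((md5_hash(f"{ip}#{k}"), ip))
    ring.sort()
    return ring

# node3 ("10.0.0.3") is assumed the strongest, so it receives three v-nodes
ring = build_virtual_ring({"10.0.0.1": 2, "10.0.0.2": 2, "10.0.0.3": 3})
```

Route lookup is unchanged: the proxy still walks clockwise to the first position past the key, and the owning physical node is the IP stored with that v-node.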
The number of virtual nodes per cache node is closely tied to its current service capability. To let the distributed proxy cache system autonomously sense its own operating condition, the autonomous decision module predicts the future operating state of each cache node, comprehensively evaluates its service capability, and dynamically decides how many virtual nodes it is allocated on the hash ring, reducing the virtual-node count of overloaded cache nodes and reallocating those virtual nodes to cache nodes with stronger current service performance. The whole process is shown in FIG. 9.
FIG. 9 adopts a self-deciding virtual-node migration mutual-aid strategy: the operating parameters of each cache node in the distributed proxy cache are sensed and monitored to judge whether local overheating or another abnormality has occurred; according to the given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected as helpers to take over, on the cache hash ring, the virtual nodes of cache nodes whose load is high and whose performance has degraded, thereby achieving hot-spot migration and avoiding single-point failure. Self-deciding virtual-node migration mutual aid essentially solves three problems: first, evaluating the state and service capability of each cache server; second, selecting the overheated cache nodes with degraded service capability whose virtual nodes are to be migrated, and selecting normal cache nodes with stronger service capability to take over the migrated node lists and bear part of the request load of those cache nodes; third, determining, for the different cache nodes, the number of virtual nodes to migrate.
For the evaluation of the state and service capability of a cache server, the outputs of the sensing and monitoring part are aggregated to obtain a state value S_i, which represents the current state and service capability of cache node i. The larger S_i is, the busier the cache node and the smaller its available service capacity; the smaller S_i is, the more idle the cache node and the greater its available service capacity.
Since the background of the distributed proxy cache contains many cache nodes, the autonomous decision part receives the state values of many cache nodes. To evaluate how busy each cache node is relative to all current cache nodes, assume there are n cache nodes in the background and compute the mean S̄ of the current state values of all nodes, as shown in Equation 1:

S̄ = (1/n) · Σ_{i=1}^{n} S_i    (Equation 1)
Two states of a cache node are defined: (1) cache node i is in the hot state if S_i > S̄; (2) cache node i is in the normal state if S_i ≤ S̄. When the cache cluster needs a self-deciding adjustment, it has already been determined that at least one cache node is in the hot state; the cluster must then select which hot-state cache nodes most urgently need adjustment, and decide to which normal-state cache nodes the virtual nodes of those hot-state cache nodes should be migrated.
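Equation 1 and the two-state rule can be written down directly (the state values below are illustrative; how S_i is aggregated from the monitored parameters is left open by the text):

```python
def classify_nodes(states):
    # states maps cache node id -> current state value S_i (larger = busier)
    avg = sum(states.values()) / len(states)             # Equation 1: the mean
    hot = {i for i, s in states.items() if s > avg}      # S_i > mean: hot state
    normal = {i for i, s in states.items() if s <= avg}  # S_i <= mean: normal state
    return avg, hot, normal

avg, hot, normal = classify_nodes({1: 8.0, 2: 2.0, 3: 2.0})
```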
First, define H as the set of all hot-state cache nodes, with n1 = |H|, and N as the set of all normal-state cache nodes, with n2 = |N|. They satisfy Equation 2:

n1 + n2 = |H| + |N| = n    (Equation 2)
Secondly, arrange all elements of H in descending order of state value to obtain the sequence S_H, which orders all hot-state cache nodes by how busy they are: the earlier an element appears, the busier the cache node and the smaller its available service capacity. Equation 3 is as follows:

S_H = {S_H,1, S_H,2, ..., S_H,i, ...}, where S_H,i ≥ S_H,(i+1) and i = 1, 2, ..., n1    (Equation 3)
Next, arrange all elements of N in ascending order of state value to obtain the sequence S_N; the earlier a cache node appears in this sequence, the higher its service capability. Equation 4 is as follows:

S_N = {S_N,1, S_N,2, ..., S_N,i, ...}, where S_N,i ≤ S_N,(i+1) and i = 1, 2, ..., n2    (Equation 4)
Obviously, if only the first element of S_H, i.e., the cache node with the largest state value, were adjusted, each adjustment would affect only one cache node, ignoring the case where the largest and the second-largest state values differ only slightly; if the virtual nodes of every element of S_H were adjusted, the overhead would be high, and many of those nodes are only mildly hot and need no adjustment. On balance, a prefix subsequence is selected starting from the first element of S_H; it contains the hottest cache nodes, the ones that need adjustment, because their state values are relatively high, they are already busy, and their service capacity has begun to decline.
Define S_sub-H as the prefix subsequence finally selected from S_H; it is initially the empty set ∅. It is computed as follows:

(1) if |S_H| ≤ 2, then S_sub-H = S_H, i.e., all elements are selected, and the process ends;

(2) if |S_H| > 2, first add the two elements S_H,1 and S_H,2 to S_sub-H;

(3) assuming all elements before S_H,i have already been added to S_sub-H, add S_H,i to S_sub-H if it satisfies the condition of Equation 5, namely that the gap between S_H,i and its predecessor does not exceed the average gap among the elements already selected:

S_H,(i-1) - S_H,i ≤ (S_H,1 - S_H,(i-1)) / (i - 2)    (Equation 5)

(4) if S_H,i satisfies the condition of Equation 5 in step (3), repeat the judgment of step (3) for the next element S_H,(i+1) of S_H; if Equation 5 is not satisfied, the whole selection process ends.
Equation 5 judges the interval differences between elements of the sequence, so the finally selected prefix subsequence S_sub-H of S_H consists of the elements of S_H with the smallest intervals between them; together they constitute the hottest class of cache nodes, i.e., the set of cache nodes that needs adjustment this time. After S_sub-H is selected, the corresponding set of mutual-aid cache nodes must be determined from S_N; the cache nodes in that set take over the virtual nodes of the hot-state cache nodes in S_sub-H. With n2 = |S_N| as above, the hot-state cache node S_sub-H,i is assisted by the idle cache node S_N,j, where j = i % n2.
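The whole selection-and-pairing procedure can be sketched as follows. Note that the interval-difference test of Equation 5 is rendered here as one reading of the text (the gap to the next element must not exceed the average gap among the elements already chosen), and the state values are illustrative:

```python
def select_and_pair(states):
    # states maps cache node id -> state value S_i (larger = busier)
    avg = sum(states.values()) / len(states)
    # S_H: hot nodes in descending state order; S_N: normal nodes ascending
    H = sorted(((s, i) for i, s in states.items() if s > avg), reverse=True)
    N = sorted((s, i) for i, s in states.items() if s <= avg)
    if len(H) <= 2:
        sub = H[:]                      # step (1): take everything
    else:
        sub = H[:2]                     # step (2): always take the first two
        k = 2
        while k < len(H):               # steps (3)-(4): extend while Eq. 5 holds
            avg_gap = (sub[0][0] - sub[-1][0]) / (len(sub) - 1)
            if sub[-1][0] - H[k][0] <= avg_gap:
                sub.append(H[k])
                k += 1
            else:
                break
    n2 = len(N)
    # Pair the i-th hottest node with the idle node at index i mod n2
    return [(sub[i][1], N[i % n2][1]) for i in range(len(sub))]

pairs = select_and_pair({1: 10.0, 2: 9.0, 3: 5.0, 4: 1.0, 5: 1.0})
```

With the sample values, nodes 1 and 2 are hot (both above the mean 5.2) and are paired with the two idlest normal nodes, 4 and 5.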
Although the present invention has been described with reference to a few embodiments, it should be understood that the invention is not limited to the embodiments above; those skilled in the art can make various changes and modifications without departing from the scope of the invention.

Claims (10)

1. A caching method for a distributed database, characterized in that a fixed-length hash value of a URL is obtained by a hash algorithm, and the subsequent series of cache-object lookup operations are then performed using the hash value.
2. The method for caching a distributed database according to claim 1, further comprising: hashing every input string with the MD5 algorithm to obtain a 128-bit value, so that all URLs are hashed into the range [0, 2^128 - 1] in one-to-one correspondence with the URLs; the whole [0, 2^128 - 1] hash range is viewed as a ring, with all hash values in the range arranged in ascending order along the clockwise direction and distributed uniformly over the hash ring as a whole.
3. The method for caching a distributed database according to claim 2, further comprising: each cache node is responsible for all corresponding URLs in a section of range on the hash ring, and the IP value of each cache node is subjected to hash calculation to obtain a hash value.
4. The method for caching a distributed database according to claim 1, further comprising: after a user's request arrives, the front-end proxy obtains the URL contained in the request, first computes the hash value of the URL, which corresponds to a key on the hash ring, then searches clockwise along the hash ring for the first node larger than the key, and routes the HTTP request to the background cache server thus found; when a new cache node joins the hash ring, the range each original node is responsible for does not change greatly, the new cache node only splits the original range of one node, and adding the new cache node does not cause a redistribution of cache space.
5. The method for caching a distributed database according to claim 1, further comprising: adopting a self-deciding virtual-node migration mutual-aid strategy: sensing and monitoring the operating parameters of each cache node in the distributed proxy cache, judging whether local overheating or another abnormality has occurred, and, according to a given multi-copy hierarchical management strategy, selecting a set of cache nodes with lower current load as helpers to take over, on the cache hash ring, the virtual nodes of cache nodes whose load is high and whose performance has degraded.
6. The method for caching a distributed database according to claim 5, wherein the self-deciding virtual-node migration mutual-aid strategy further comprises:
A. evaluating the state and the service capacity of the cache server;
B. selecting the overheated cache nodes with degraded service capability and migrating their virtual nodes, and selecting normal cache nodes with stronger service capability to take over the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
7. The method for caching a distributed database according to claim 2, further comprising: adjusting the number of layers of the hash ring; adjusting the virtual nodes for rebalancing; and data migration.
8. The method for caching a distributed database according to claim 7, wherein:
the step of adjusting the number of layers of the hash ring further comprises: if the number of virtual nodes per unit weight falls to a threshold, increasing the number of layers of virtual nodes by 1; if the number of virtual nodes per unit weight is above a threshold, decreasing the number of layers of virtual nodes;
the step of adjusting the virtual nodes for rebalancing further comprises: rebalancing during the addition of a new node to the cluster or the deletion or failure of an existing node;
the step of data migration further comprises: migrating the data in a deleted virtual node to its neighbor node.
9. A hash ring optimization method for the distributed database caching method according to any one of claims 1 to 8, characterized by adopting a self-deciding virtual-node migration mutual-aid strategy: sensing and monitoring the operating parameters of each cache node in the distributed proxy cache, judging whether local overheating or another abnormality has occurred, and, according to a given multi-copy hierarchical management strategy, selecting a set of cache nodes with lower current load as helpers to take over, on the cache hash ring, the virtual nodes of cache nodes whose load is high and whose performance has degraded;
the self-deciding virtual-node migration mutual-aid strategy further comprises:
A. evaluating the state and the service capacity of the cache server;
B. selecting the overheated cache nodes with degraded service capability and migrating their virtual nodes, and selecting normal cache nodes with stronger service capability to take over the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to be migrated is determined.
10. The hash ring optimization method of claim 9, wherein the step of evaluating the state and service capability of the cache server further comprises: evaluating how busy each cache node is among all current cache nodes; assuming there are n cache nodes in the background, computing the mean S̄ of the current state values of all nodes as

S̄ = (1/n) · Σ_{i=1}^{n} S_i

and defining two states of a cache node: (1) cache node i is in the hot state if S_i > S̄; (2) cache node i is in the normal state if S_i ≤ S̄; when the cache cluster needs a self-deciding adjustment, it has already been determined that at least one cache node is in the hot state, the hot-state cache nodes that urgently need adjustment are selected, and it is decided to which normal-state cache nodes the virtual nodes of those hot-state cache nodes should be migrated.
CN201911390078.8A 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof Active CN111177154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911390078.8A CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911390078.8A CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Publications (2)

Publication Number Publication Date
CN111177154A true CN111177154A (en) 2020-05-19
CN111177154B CN111177154B (en) 2023-07-25

Family

ID=70650459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911390078.8A Active CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Country Status (1)

Country Link
CN (1) CN111177154B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917853A (en) * 2020-07-24 2020-11-10 山东云缦智能科技有限公司 Optimization method for distributed cache scaling of content distribution network
CN112380288A (en) * 2020-11-16 2021-02-19 林亮 Decentralized distributed data processing system
CN113507522A (en) * 2021-07-08 2021-10-15 上海七牛信息技术有限公司 Method and system for improving hit rate of PCDN (Primary Contourlet distribution) network requests
CN113689103A (en) * 2021-08-18 2021-11-23 国电南瑞南京控制系统有限公司 Adaptive load balancing employing flow distribution intelligent scheduling management method, device and system
CN114629908A (en) * 2022-03-28 2022-06-14 浙江邦盛科技股份有限公司 Data fragmentation method based on server node hardware resource density
CN115297131A (en) * 2022-08-01 2022-11-04 东北大学 Sensitive data distributed storage method based on consistent hash
CN115981848A (en) * 2022-12-17 2023-04-18 郑州斋杆网络科技有限公司 Memory database fragmentation adjustment method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
US20140359043A1 (en) * 2012-11-21 2014-12-04 International Business Machines Corporation High performance, distributed, shared, data grid for distributed java virtual machine runtime artifacts
US20170149660A1 (en) * 2014-07-30 2017-05-25 Huawei Technologies Co., Ltd. Packet transmission method, apparatus, and system
CN107197035A (en) * 2017-06-21 2017-09-22 中国民航大学 A kind of compatibility dynamic load balancing method based on uniformity hash algorithm
CN108810041A (en) * 2017-04-27 2018-11-13 华为技术有限公司 A kind of data write-in of distributed cache system and expansion method, device
CN109218438A (en) * 2018-10-12 2019-01-15 山东科技大学 A kind of performance optimization method of distributed cache server cluster


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
巴子言: "基于虚节点的一致性哈希算法的优化" *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917853A (en) * 2020-07-24 2020-11-10 山东云缦智能科技有限公司 Optimization method for distributed cache scaling of content distribution network
CN112380288A (en) * 2020-11-16 2021-02-19 林亮 Decentralized distributed data processing system
CN113507522A (en) * 2021-07-08 2021-10-15 上海七牛信息技术有限公司 Method and system for improving hit rate of PCDN (Primary Contourlet distribution) network requests
CN113689103A (en) * 2021-08-18 2021-11-23 国电南瑞南京控制系统有限公司 Adaptive load balancing employing flow distribution intelligent scheduling management method, device and system
CN113689103B (en) * 2021-08-18 2023-11-24 国电南瑞南京控制系统有限公司 Mining and shunting intelligent scheduling management method, device and system for self-adaptive load balancing
CN114629908A (en) * 2022-03-28 2022-06-14 浙江邦盛科技股份有限公司 Data fragmentation method based on server node hardware resource density
CN114629908B (en) * 2022-03-28 2023-10-13 浙江邦盛科技股份有限公司 Data slicing method based on hardware resource density of server node
CN115297131A (en) * 2022-08-01 2022-11-04 东北大学 Sensitive data distributed storage method based on consistent hash
CN115297131B (en) * 2022-08-01 2023-05-26 东北大学 Sensitive data distributed storage method based on consistent hash
CN115981848A (en) * 2022-12-17 2023-04-18 郑州斋杆网络科技有限公司 Memory database fragmentation adjustment method and device
CN115981848B (en) * 2022-12-17 2024-05-28 上海律保科技有限公司 Memory database fragment adjustment method and equipment

Also Published As

Publication number Publication date
CN111177154B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111177154B (en) Distributed database caching method and hash ring optimization thereof
CN111159193B (en) Multi-layer consistent hash ring and application thereof in creating distributed database
CN106790324B (en) Content distribution method, virtual server management method, cloud platform and system
KR101928529B1 (en) Code Distributed Hash Table based MapReduce System and Method
CN110830562B (en) Limited load consistency Hash load balancing strategy based on virtual nodes
Xu et al. Drop: Facilitating distributed metadata management in eb-scale storage systems
CN101645919B (en) Popularity-based duplicate rating calculation method and duplicate placement method
US11140220B1 (en) Consistent hashing using the power of k choices in server placement
Xu et al. Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems
CN109617989B (en) Method, apparatus, system, and computer readable medium for load distribution
Kangasharju et al. Adaptive content management in structured P2P communities
JP4533923B2 (en) Super-peer with load balancing function in hierarchical peer-to-peer system and method of operating the super-peer
Inoue et al. Efficient content replication strategy for data sharing considering storage capacity restriction in hybrid Peer-to-Peer networks
CN111917853A (en) Optimization method for distributed cache scaling of content distribution network
Rahmani et al. A comparative study of replication schemes for structured P2P networks
Soltani et al. A dynamic popularity-aware load balancing algorithm for structured p2p systems
US11310309B1 (en) Arc jump: per-key selection of an alternative server when implemented bounded loads
March et al. Multi-attribute range queries on read-only DHT
CN108965387B (en) Balancing method and system for improving survivability of P2P data storage
CN111435345A (en) Tile data service system and method thereof
KR101690944B1 (en) Method and apparatus for managing distributed cache in consideration of load distribution in heterogeneous computing environment
Soltani et al. A LOAD BALANCING ALGORITHM BASED ON REPLICATION AND MOVEMENT OF DATA ITEMS FOR DYNAMIC STRUCTURED P2P SYSTEMS
CN117194439B (en) Method for creating resource storage system, electronic equipment and storage medium
Xu et al. C 2: adaptive load balancing for metadata server cluster in cloud-scale storage systems
Liu et al. Load balancing strategy for cloud computing based on dynamic replica technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Request for anonymity

Inventor before: Request for anonymity

TR01 Transfer of patent right

Effective date of registration: 20240423

Address after: Building 7, No. 7 Taiping East Road (South), Mafang Town, Pinggu District, Beijing, 101200

Patentee after: Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.

Country or region after: China

Address before: 3009-315, 3rd Floor, Building B, Building 1, Yard 2, Yongcheng North Road, Haidian District, Beijing, 100089

Patentee before: Zhangxun Yitong (Beijing) Information Technology Co.,Ltd.

Country or region before: China
