WO2016070750A1

WO2016070750A1 - Distributed buffering range querying method, device, and system

Info

Publication number: WO2016070750A1
Application number: PCT/CN2015/093310
Authority: WO
Inventors: 湛滨瑜; 于君泽
Original assignee: 阿里巴巴集团控股有限公司; 湛滨瑜; 于君泽
Priority date: 2014-11-06
Filing date: 2015-10-30
Publication date: 2016-05-12
Also published as: CN105610881B9; CN105610881B; CN105610881A

Abstract

Disclosed is a distributed buffering range querying method. The method comprises: prestoring, in a storage area of a memory, an identifier value, corresponding to a field value that can be used for range querying, in keywords used for mapping buffered data; in response to a received query request aimed at a keyword within a specified range, finding, from the storage area, an identifier value corresponding to an end value of the specified range; and determining, according to the identifier value corresponding to the end value of the specified range, a keyword set corresponding to the specified range, to implement range querying decoupled from a database. Also disclosed are a distributed buffering range querying device and system.

Description

Distributed cache range query method, device and system

Technical field

The present invention relates to distributed cache, and in particular, to a distributed cache range query method, device and system.

Background technique

Distributed cache is a data cache method in which cached data is stored in a memory hash table in the form of key-value (keyword-cache data) through a distributed cache server cluster. Distributed caching reduces the number of accesses to the database by caching data and objects in memory, increasing data access speed.

Currently, range queries over a distributed cache are supported by building an index that supports range queries in a relational database. When the server receives a range query request, it uses the relational database index to look up the associated keys according to the range query condition, and then obtains the corresponding values from the distributed cache by direct lookup on those keys.

However, because the range query of the distributed cache has to be implemented through the database index, it remains strongly dependent on the database, and query performance is relatively poor.

Summary of the invention

In view of this, the purpose of the present application is to provide a distributed cache range query method, apparatus and system for achieving the purpose of range query in the case of complete decoupling from the database.

In a first aspect of the embodiments of the present application, a distributed cache range query method is provided. For example, the method may include: pre-storing, in a storage area of the memory, the identifier values corresponding to the field values that can be used for range queries in the keywords used to map the cached data; in response to receiving a query request for keywords in a specified range, finding, from the storage area, the identifier values corresponding to the endpoint values of the specified range; and determining, according to the identifier values corresponding to the endpoint values of the specified range, the keyword set corresponding to the specified range.

In a second aspect of the embodiments of the present application, a distributed cache range query device is provided. For example, the device may include: a pre-processing unit, configured to pre-store, in a storage area of the memory, the identifier values corresponding to the field values that can be used for range queries in the keywords used to map the cached data; a query response unit, configured to find, from the storage area, the identifier values corresponding to the endpoint values of a specified range in response to receiving a query request for keywords in the specified range; and a keyword obtaining unit, configured to determine the keyword set corresponding to the specified range according to the identifier values corresponding to the endpoint values of the specified range.

In a third aspect of the embodiments of the present application, a distributed cache range query system is provided. For example, the system may include: a cache server, configured to store cached data having a mapping relationship with keywords, receive a query request sent by the query server for the cached data corresponding to a keyword set, and feed back to the query server the cached data corresponding to the keyword set; a query server, configured to pre-store, in a storage area of the memory, the identifier values corresponding to the field values that can be used for range queries in the keywords used to map the cached data, and, in response to receiving from the client a query request for keywords in a specified range, find from the storage area the identifier values corresponding to the endpoint values of the specified range, determine the keyword set corresponding to the specified range according to those identifier values, obtain the cached data corresponding to the keyword set from the cache server, and feed back the obtained cached data to the client that sent the query request; and a client, configured to send to the query server a query request for the cached data corresponding to keywords in a specified range, and receive the cached data fed back by the query server.

It can be seen that the application has the following beneficial effects:

The identifier values corresponding to the field values that can be used for range queries are pre-stored in a storage area of the memory. After a query request for keywords in a specified range is received, the lookup of the identifier values corresponding to the endpoint values of the specified range from the storage area can be completed entirely in memory, so a range query decoupled from the database is implemented without accessing the database.

DRAWINGS

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some of the embodiments described in this application; those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic flowchart of a distributed cache range query method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a distributed cache range query method according to another embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a node ring according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a node ring according to another embodiment of the present application;

FIG. 5 is a schematic structural diagram of a distributed cache range query apparatus according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a distributed cache range query apparatus according to another embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a distributed cache range query system according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below. The described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort should fall within the scope of the present invention.

To make the embodiments of the present application easier to understand, a possible application scenario is illustrated first. For example, the embodiments of the present application can be applied to a separate query server that is distinct from the one or more cache servers storing the cached data. The cache servers store mappings between keywords, each composed of a hotel ID and a date value, and the cached data. The query server applying the method provided by the embodiments of the present application may receive a query request for hotel IDs in a specified range and obtain the keyword set corresponding to the query request. For example, the set of keywords whose hotel ID lies in the range 7 to 32 can be queried.

Based on the foregoing analysis, the embodiment of the present application provides the following distributed cache range query method and apparatus.

For example, refer to FIG. 1, which is a schematic flowchart of a distributed cache range query method according to an embodiment of the present application. As shown in FIG. 1, the method can include:

S110. Pre-store, in a storage area of the memory, the identifier values corresponding to the field values that can be used for range queries in the keywords used to map the cached data.

For example, in order to reduce the number of nodes in the node ring and improve query efficiency, the field values of the same field across all keywords can be deduplicated, so that only the distinct field values of that field are pre-stored in a storage area of the memory.

The storage structure of the storage area located in the memory is not limited, and may be, for example, a singly linked list, an array, a circular linked list, or the like. For example, in some possible implementations, the identifier values corresponding to the field values that can be used for range queries in the keywords may be pre-stored in a node ring (that is, a circular linked list) located in the memory, where each identifier value is stored in one node, and a corresponding routing table is established for each node in the node ring. The routing table records the identifier values of one or more other nodes determined according to a preset index algorithm.

In some possible implementations, the preset index algorithm may be: the routing table records the identifier values in the node ring whose distance from the identifier value of the node corresponding to the routing table is a power of 2. Correspondingly, step S110 may specifically be: storing the identifier values corresponding to the field values of the same field that can be used for range queries in the keywords, in order of identifier value, in the node ring located in the memory, where the routing table records the identifier values in the node ring whose distance from the identifier value of the corresponding node is a power of 2. The identifier values corresponding to the field values that can be used for range queries in the cached data usually do not differ greatly from one another, so recording in the routing table the nodes whose identifier values are a power of 2 away from the corresponding node facilitates subsequent lookups and improves search efficiency.

Of course, the preset index algorithm is not limited to the one in the above embodiment and may be chosen according to the query efficiency actually required; this application does not limit it. For example, the preset index algorithm may also be: the routing table records the identifier values in the node ring whose distance from the identifier value of the corresponding node is an integer multiple of a specified constant, and so on.

It can be understood that a keyword can be composed of several different fields, among which the fields that can be used for range queries can include numbers, dates, and the like. For example, if a range query is needed on the date, the date value may be extracted from each keyword, and the distinct extracted date values are converted into identifier values that can be used in the calculations of the preset index algorithm. In combination with the node ring implementation above, the identifier values may be stored as a linked list that is connected end to end and sorted by date value, forming the node ring. For example, if the original keyword is "date-August 07, 2014-hotelId-18873", the keyword reflects the correspondence between the date and the hotel id, and the date "August 07, 2014" can be converted to the identifier value 140807. Sorting by identifier value means sorting the field values either in ascending or in descending order.
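The conversion described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the second and third sample keywords, and the choice of a yymmdd integer are assumptions made so that the example reproduces the identifier value 140807 from the text.

```python
from datetime import datetime

def identifier_from_key(key: str) -> int:
    """Extract the date field from a keyword such as
    'date-August 07, 2014-hotelId-18873' and convert it to a yymmdd integer."""
    fields = key.split("-")  # ['date', 'August 07, 2014', 'hotelId', '18873']
    date_text = fields[fields.index("date") + 1]
    parsed = datetime.strptime(date_text, "%B %d, %Y")
    return int(parsed.strftime("%y%m%d"))

def build_sorted_identifiers(keys):
    """Deduplicate the identifier values and sort them in ascending order."""
    return sorted({identifier_from_key(k) for k in keys})

# Hypothetical sample keywords; only the first one appears in the text.
keys = [
    "date-August 07, 2014-hotelId-18873",
    "date-August 07, 2014-hotelId-20011",  # same date, different hotel
    "date-August 05, 2014-hotelId-18873",
]
ring_order = build_sorted_identifiers(keys)
print(ring_order)  # [140805, 140807]
```

The two keywords that share a date contribute a single identifier value, which is the deduplication that keeps the node ring small.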

S120. In response to receiving a query request for keywords in a specified range, find, from the storage area, the identifier value corresponding to each endpoint value of the specified range.

For example, in an embodiment where the identifier values are stored in a node ring, in response to receiving a query request for keywords in a specified range, any node in the node ring is taken as the current node, and the identifier value closest to the endpoint value of the specified range is looked up in the routing table of the current node. If it is determined that the found identifier value is the identifier value in the node ring closest to the endpoint value, the found identifier value is used as the identifier value corresponding to the endpoint value of the specified range. If the found identifier value is not the identifier value in the node ring closest to the endpoint value, the node holding the found identifier value is taken as the current node, and the process returns to the step of looking up, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range.

It can be understood that the specified range may include one or more ranges, and an endpoint value is a value that delimits one of these range intervals. For example, the specified date range may include: January 1, 2001 to May 1, 2001, and January 1, 2002 to May 1, 2002. The endpoint values may then include: 010101 and 010501, 020101 and 020501.

It should be noted that the identifier value corresponding to an endpoint value may be an identifier value equal to the endpoint value; when no identifier value equals the endpoint value, it may be the identifier value in the node ring that lies within the specified range and is closest to the endpoint value.
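The rule for mapping an endpoint value to a stored identifier value can be illustrated with a sorted array, which the text lists as one permitted storage structure. The sketch below swaps in binary search from Python's standard `bisect` module in place of the routing-table lookup; the function name and the sample identifier values are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

def resolve_endpoints(sorted_ids, low, high):
    """Return (first, last) identifier values inside [low, high],
    or None if no stored identifier value falls in the range."""
    i = bisect_left(sorted_ids, low)     # index of the first id >= low
    j = bisect_right(sorted_ids, high)   # one past the last id <= high
    if i >= j:
        return None
    return sorted_ids[i], sorted_ids[j - 1]

ids = [2, 8, 10, 16, 21, 28, 30, 33]     # illustrative identifier values
print(resolve_endpoints(ids, 7, 32))     # (8, 30): neither 7 nor 32 is stored
print(resolve_endpoints(ids, 8, 30))     # (8, 30): both endpoints are stored
```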

S130. Determine, according to the identifier value corresponding to the endpoint value of the specified range, a keyword set corresponding to the specified range.

For example, the keywords within the specified range may be constructed according to the identifier values corresponding to the endpoint values of the specified range, yielding the keyword set corresponding to the specified range. Specifically, a keyword construction rule may be set in advance for each type of keyword; the rule matching the type of keyword to be queried is then applied, with the field value corresponding to each identifier value as the input variable, to construct the corresponding keywords and obtain the keyword set corresponding to the query request. Suppose a keyword reflecting the relationship between a date and a hotel id needs to be constructed: the date field value corresponding to each found identifier value can be spliced with each of the hotel ids preset by the keyword construction rule to form the keywords. Of course, there are other ways to determine keywords from identifier values; those skilled in the art can choose one according to actual implementation requirements, and details are not described here.
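A keyword construction rule of the kind described above might look like the following sketch. The keyword layout follows the "date-...-hotelId-..." example given earlier in the description; the function name and the assumption that identifier values are yymmdd integers are illustrative only.

```python
from datetime import datetime

def build_keys(identifier_values, hotel_ids):
    """Splice each date (recovered from its yymmdd identifier value)
    with each preset hotel id to form the keyword set."""
    keys = []
    for ident in identifier_values:
        date = datetime.strptime(f"{ident:06d}", "%y%m%d")
        date_text = date.strftime("%B %d, %Y")
        for hotel_id in hotel_ids:
            keys.append(f"date-{date_text}-hotelId-{hotel_id}")
    return keys

print(build_keys([140807], [18873]))
# ['date-August 07, 2014-hotelId-18873']
```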

For example, in some possible implementations, the method provided by the embodiments of the present application may be applied to a separate query server that is distinct from the one or more cache servers storing the cached data. After the keyword set corresponding to the specified range is obtained, the cached data corresponding to the keyword set may be fetched from the one or more cache servers concurrently using multiple threads, and the obtained cached data is returned to the client that issued the query request.
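The concurrent fetch can be sketched with a thread pool from Python's standard library. The cache servers are simulated here by plain dictionaries and the cached values are made up; a real deployment would use a cache client and route each keyword to its server, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote cache servers: each one is a dict.
CACHE_SERVERS = [
    {"date-August 07, 2014-hotelId-18873": "room data A"},
    {"date-August 08, 2014-hotelId-18873": "room data B"},
]

def fetch(key):
    """Look the keyword up on each simulated server in turn."""
    for server in CACHE_SERVERS:
        if key in server:
            return key, server[key]
    return key, None  # keyword not cached anywhere

def fetch_all(keys, max_workers=4):
    """Fetch the cached data for every keyword using multiple threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, keys))

result = fetch_all([
    "date-August 07, 2014-hotelId-18873",
    "date-August 08, 2014-hotelId-18873",
])
print(result)
```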

It can be seen that, with the method provided by the embodiments of the present application, after a query request for keywords in a specified range is received, the lookup of the identifier values corresponding to the endpoint values of the specified range from the storage area can be completed entirely in memory, so a range query decoupled from the database is implemented without accessing the database.

Next, an embodiment in which the routing table records the identifier values in the node ring whose distance from the identifier value of the corresponding node is a power of 2 is described in detail. For example, the embodiment can include:

S210. Pre-store the identifier values corresponding to the field values of the same field that can be used for range queries in the keywords, in order of identifier value, in the node ring located in the memory, where each identifier value is stored in one node, and establish a corresponding routing table for each node in the node ring, where the routing table records the identifier values in the node ring whose distance from the identifier value of the corresponding node is a power of 2.

Here, an identifier value whose distance is a power of 2 may refer to the identifier value of the node whose distance equals $2^{i-1}$ and, when no identifier value has a distance equal to $2^{i-1}$, the identifier value of the node whose distance is closest to $2^{i-1}$. For example, in some possible implementations, in order to maximize the distance and thus improve query efficiency, the identifier value closest to $2^{i-1}$ may be, among the identifier values whose distance is greater than $2^{i-1}$, the one whose distance is closest to $2^{i-1}$, where $i$ is an integer with $1 \le i \le \lceil \log_2 L \rceil$ and $L$ is the largest identifier value of a node in the node ring.

Consider, for example, the node ring shown in FIG. 3, in which the numbers 2, 8, 10, 16 and so on marked next to the nodes are the identifier values that identify the nodes. Each node in the node ring shown in FIG. 3 maintains a routing table of $m$ entries. If the identifier values are expressed in binary, $m$ is the number of bits of the largest identifier value in the node ring. If $L$ is the largest identifier value in the ring, then $m$ is the base-2 logarithm of $L$ rounded up, that is:

$m = \lceil \log_2 L \rceil$

In FIG. 3, for all nodes to be distributed on the ring, the value of $m$ should be 6. In the $m$-entry routing table maintained by each node, the identifier value recorded in the $i$-th entry is equal to:

$\mathrm{successor}\big((n + 2^{i-1}) \bmod 2^{m}\big), \quad 1 \le i \le m$, where $n$ is the identifier value of the node.
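The routing-table rule can be sketched as follows. The ring contents are an assumption: only the identifier values that the description mentions by name (2, 8, 10, 16, 21, 28, 30, 33) are used, and FIG. 3 may contain others. `successor` is taken here to mean the first stored identifier value greater than or equal to its argument, wrapping around to the smallest one.

```python
from bisect import bisect_left

def successor(sorted_ids, value):
    """First identifier value >= value, wrapping to the smallest one."""
    i = bisect_left(sorted_ids, value)
    return sorted_ids[i % len(sorted_ids)]

def build_routing_tables(ids):
    """Return (m, tables) where tables[n][i-1] is the i-th routing entry of n."""
    sorted_ids = sorted(set(ids))
    m = sorted_ids[-1].bit_length()   # bits of the largest identifier value
    size = 1 << m                     # 2^m positions on the ring
    tables = {}
    for n in sorted_ids:
        tables[n] = [
            successor(sorted_ids, (n + (1 << (i - 1))) % size)
            for i in range(1, m + 1)
        ]
    return m, tables

m, tables = build_routing_tables([2, 8, 10, 16, 21, 28, 30, 33])
print(m)           # 6
print(tables[8])   # [10, 10, 16, 16, 28, 2]
print(tables[28])  # [30, 30, 33, 2, 2, 2]
```

For node 8 the lookup targets are 9, 10, 12, 16, 24 and 40; the last one has no stored identifier value at or after it, so it wraps to node 2.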

Because the routing table records the identifier values whose distance from the identifier value of the corresponding node is a power of 2, the direct successor node of each node is the first entry of its routing table. To facilitate the search for the identifier value corresponding to an endpoint value of the specified range, each node in the node ring also maintains its own direct predecessor node. In the embodiments of the present application, because the spacing of the identifier values recorded in the routing table grows exponentially, the routing table records nodes near the corresponding node more densely than nodes far from it. When the routing table is used as an index to find the identifier value corresponding to the specified range, if the endpoint value is far from the identifier value of the current node, the sparse far entries of the routing table allow a quick jump to a distant node; if the endpoint value is close to the identifier value of the current node, the dense near entries allow a more precise jump to a node closer to the endpoint value. The routing tables established for the nodes in the embodiments of the present application therefore support efficient range queries.

S220. In response to receiving a query request for keywords in a specified range, take any node in the node ring as the current node.

S230. Find, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range.

S240. If it is determined that the found identifier value is the identifier value in the node ring closest to the endpoint value of the specified range, use the found identifier value as the identifier value corresponding to the endpoint value of the specified range.

S250. If it is determined that the found identifier value is not the identifier value in the node ring closest to the endpoint value of the specified range, update the current node to the node holding the found identifier value, and return to S230 to find, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range.

In the following, possible implementations of steps S220-S250 are described in detail for the case where the nodes are sorted in the node ring in ascending order of identifier value. For example, in this embodiment, the specified range may be the range between a first endpoint value and a second endpoint value, where the first endpoint value is smaller than the second endpoint value, and the query steps of S220-S250 may include:

When receiving a query request, any node in the node ring can be used as the current node.

Determining whether an identifier value equal to the first endpoint value exists in the identifier value of the routing table record of the current node.

If so, an identity value equal to the first endpoint value is used as the identity value corresponding to the first endpoint value.

If not, it is determined whether the first endpoint value is between the identifier value of the current node and the identifier value of its direct predecessor node or direct successor node. It can be understood that if the endpoint value is between the identifier value of the current node and that of its direct predecessor node or direct successor node, no identifier value equal to the endpoint value exists in the node ring, and the identifier value corresponding to the endpoint value can only be chosen from the current node, its direct predecessor node, or its direct successor node; which one is chosen depends on whether the endpoint value is the starting or the ending endpoint of the range. If the endpoint value is not between the identifier value of the current node and that of its direct predecessor node or direct successor node, an identifier value equal to the endpoint value may exist elsewhere in the node ring, so the search jumps to the node identified by the identifier value, recorded in the routing table of the current node, that is closest to the endpoint value, and the judgment continues there.

If the first endpoint value is between the identifier value of the current node and the identifier value of its direct predecessor node, the identifier value of the current node is used as the identifier value corresponding to the first endpoint value.

If the first endpoint value is between the identifier value of the current node and the identifier value of its direct successor node, the identifier value of the direct successor node of the current node is used as the identifier value corresponding to the first endpoint value.

If the first endpoint value is neither between the identifier value of the current node and the identifier value of its direct successor node nor between the identifier value of the current node and the identifier value of its direct predecessor node, the current node is updated to the node holding the identifier value, recorded in the routing table of the current node, that is closest to the first endpoint value, and the process returns to the step of determining whether an identifier value equal to the first endpoint value exists among the identifier values recorded in the routing table of the current node.

Determining whether an identifier value equal to the second endpoint value exists in the identifier value of the routing table record of the current node.

If yes, the identity value equal to the second endpoint value is used as the identity value corresponding to the second endpoint value.

If not, it is determined whether the second endpoint value is between the identity value of the current node and the identity value of its immediate precursor node or direct successor node.

If the second endpoint value is between the identifier value of the current node and the identifier value of its direct predecessor node, the identifier value of the direct predecessor node of the current node is used as the identifier value corresponding to the second endpoint value.

If the second endpoint value is between the identifier value of the current node and the identifier value of its direct successor node, the identifier value of the current node is used as the identifier value corresponding to the second endpoint value.

If the second endpoint value is neither between the identifier value of the current node and the identifier value of its direct successor node nor between the identifier value of the current node and the identifier value of its direct predecessor node, the current node is updated to the node holding the identifier value, recorded in the routing table of the current node, that is closest to the second endpoint value, and the process returns to the step of determining whether an identifier value equal to the second endpoint value exists among the identifier values recorded in the routing table of the current node.

It should be noted that the query steps for the identifier value corresponding to the first endpoint value and for the identifier value corresponding to the second endpoint value may be performed concurrently or one after the other; the embodiments of the present application place no restriction on the order of the query steps for different endpoint values.

In the following, the above query steps are illustrated using the node ring shown in FIG. 3 and a query request for cached information whose values lie in the range 7 to 32. It can be understood that the numeric example is only for ease of understanding; if the field value available for range queries is non-numeric, it can be converted into a numeric identifier value. For example, starting from node 2, the endpoint value 7 lies between node 2 and its direct successor node 8, so it is determined that no node with identifier value 7 exists in the node ring. Therefore, the identifier value corresponding to the endpoint value 7 is 8. Then, starting from node 8, the node in the routing table of node 8 that is closest to 32 is node 28, so the query jumps to the routing table of node 28; the node in the routing table of node 28 that is closest to 32 is node 30, so the routing table of node 30 is queried next. The value 32 lies between node 30 and its direct successor node 33, so it is determined that no node with identifier value 32 exists in the node ring. Therefore, the identifier value corresponding to 32 is 30. Thus, according to the node ring shown in FIG. 3, the identifier values in the specified range 7 to 32 are: 8, 10, 16, 21, 28, 30. This completes the query, through the routing tables of the ring nodes, for the identifier values of the specified range 7 to 32.
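The walk-through above can be traced with the sketch below. Several points are assumptions rather than statements from the description: the ring holds only the identifier values named in the text, "closest" is read as the routing entry with the smallest clockwise distance still to travel to the endpoint value, the current node itself is also checked for equality, and both endpoint values are assumed to lie between the smallest and largest identifier values with at least one identifier value in the range.

```python
from bisect import bisect_left

IDS = [2, 8, 10, 16, 21, 28, 30, 33]  # identifier values named in the text
M = IDS[-1].bit_length()               # 6
SIZE = 1 << M

def successor(value):
    return IDS[bisect_left(IDS, value) % len(IDS)]

# TABLE[n][i] holds successor((n + 2^i) mod 2^M) for i = 0 .. M-1.
TABLE = {n: [successor((n + (1 << i)) % SIZE) for i in range(M)] for n in IDS}
PRED = {n: IDS[k - 1] for k, n in enumerate(IDS)}  # direct predecessors

def between(a, x, b):
    """True if x lies strictly between a and b going clockwise."""
    return a < x < b if a < b else (x > a or x < b)

def resolve(endpoint, start, is_lower):
    """Return (identifier value for the endpoint, list of nodes visited)."""
    cur, path = start, [start]
    while True:
        if endpoint == cur or endpoint in TABLE[cur]:
            return endpoint, path
        pred, succ = PRED[cur], TABLE[cur][0]
        if between(pred, endpoint, cur):
            return (cur if is_lower else pred), path
        if between(cur, endpoint, succ):
            return (succ if is_lower else cur), path
        # Hop to the entry with the least clockwise distance left to go.
        cur = min(TABLE[cur], key=lambda f: (endpoint - f) % SIZE)
        path.append(cur)

def ids_in_range(low, high, start=IDS[0]):
    first, _ = resolve(low, start, True)
    last, _ = resolve(high, first, False)
    out, cur = [first], first
    while cur != last:                 # follow direct successors
        cur = TABLE[cur][0]
        out.append(cur)
    return out

print(resolve(7, 2, True))    # (8, [2])
print(resolve(32, 8, False))  # (30, [8, 28, 30])
print(ids_in_range(7, 32))    # [8, 10, 16, 21, 28, 30]
```

The second call visits nodes 8, 28 and 30 in that order, matching the jumps described in the text.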

S260. Determine, according to the identifier value corresponding to the endpoint value of the specified range, the keyword set corresponding to the specified range.

It can be seen that, by applying this embodiment, the node ring can be read directly from memory and the routing tables of the nodes in the node ring are used as an index for the range query, which removes the dependency on the database and gives a fast read speed. Moreover, because each routing table records the identifier values in the node ring whose distance from the identifier value of the corresponding node is a power of 2, the search for the identifier value corresponding to an endpoint value always jumps to the routing table of the node whose identifier value is closest to the endpoint value, until the keyword set corresponding to the specified range is finally found. The query thus becomes a binary-search-like process, which achieves efficient range queries. For example, with 100 million nodes, $L = 100{,}000{,}000$, the number of routing entries to be maintained per node is $\lceil \log_2 L \rceil = 27$, and the number of hops needed to query any node is at most $\log_{10} L = 8$, so query performance is very high.
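The two figures quoted above can be checked with a few lines of arithmetic. Reading the second logarithm as base 10 is an assumption; it is the reading under which the stated value of 8 holds exactly.

```python
import math

L = 100_000_000

# Routing entries per node: base-2 logarithm of L, rounded up.
print(math.ceil(math.log2(L)))   # 27
print(L.bit_length())            # 27, the same count via the number of bits

# The hop bound quoted in the text, read as a base-10 logarithm.
print(round(math.log10(L)))      # 8
```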

In addition, in a distributed cache system, cached keywords may be added or removed at any time. To keep the node ring and the routing tables consistent with the keywords in the cache, the node ring and the routing tables need to be updated whenever a keyword is added or deleted. Specifically, the embodiments of the present application may further include:

For a keyword newly added to the cache, determining whether the identifier value corresponding to the field value that can be used for range queries in the newly added keyword already exists in the node ring; if not, storing the identifier value in a new node, searching the node ring for a node N that can serve as the direct predecessor node of the new node, updating the direct predecessor node of the direct successor node of node N to be the new node, setting node N as the direct predecessor node of the new node, and establishing a corresponding routing table for the new node;

For a keyword deleted from the cache, if the field value that can be used for range queries in the deleted keyword no longer exists in any other keyword, taking the node that stores the identifier value corresponding to that field value as the node to be deleted, updating the direct predecessor node of the direct successor node of the node to be deleted to be the direct predecessor node of the node to be deleted, and deleting the node to be deleted from the node ring;

And, according to the preset index algorithm, updating the routing tables that are affected by the joining of the new node or by the deletion of the node to be deleted. For example, given that the routing table of each node should record the identifier values whose distance from the identifier value of the corresponding node is a power of 2, the routing tables affected by the joining of the new node or by the deletion of the node to be deleted are updated accordingly.
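The pointer updates for joining and removing a node can be sketched with a circular doubly linked list. The class and function names are hypothetical; the ring is assumed to be kept in ascending order with the smallest identifier value as its head, and the caller is assumed to have already checked that no other keyword still uses the field value before calling `delete`. Routing tables are left out of this sketch.

```python
class Node:
    def __init__(self, ident):
        self.ident = ident
        self.pred = self   # direct predecessor
        self.succ = self   # direct successor

def insert(head, ident):
    """Insert ident into the ring and return the (possibly new) head."""
    n = head
    while True:                        # skip identifier values already stored
        if n.ident == ident:
            return head
        n = n.succ
        if n is head:
            break
    prev = head.pred                   # node N when ident is the new minimum
    if ident > head.ident:
        prev = head
        while prev.succ is not head and prev.succ.ident < ident:
            prev = prev.succ           # node N: last node smaller than ident
    new = Node(ident)
    new.pred, new.succ = prev, prev.succ
    prev.succ.pred = new               # successor of N now points back to new
    prev.succ = new
    return new if ident < head.ident else head

def delete(head, ident):
    """Remove ident from the ring and return the new head (None if empty)."""
    n = head
    while n.ident != ident:
        n = n.succ
        if n is head:
            return head                # identifier value not present
    if n.succ is n:
        return None
    n.pred.succ = n.succ
    n.succ.pred = n.pred               # successor takes over n's predecessor
    return n.succ if n is head else head

def to_list(head):
    out, n = [], head
    while True:
        out.append(n.ident)
        n = n.succ
        if n is head:
            return out

head = Node(8)
for ident in (2, 16, 10):
    head = insert(head, ident)
print(to_list(head))                   # [2, 8, 10, 16]
head = delete(head, 10)
print(to_list(head))                   # [2, 8, 16]
```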

The following is an example of how to update the routing table. Suppose that the i-th item of the above routing table equals successor((n + 2^(i-1)) mod 2^m), where n is the identification value of the node and 1 ≤ i ≤ m. If a node P newly joins the node ring, the routing tables affected by P can be updated as follows:

Based on the rule that the i-th item equals successor((n + 2^(i-1)) mod 2^m), the routing tables of the predecessor nodes of node P are updated recursively, and the recursion terminates when a predecessor node reached by the recursion no longer satisfies both of the following update conditions. Condition 1: the distance between the identification values of the predecessor node S and node P is greater than or equal to 2^(i-1). If the distance between the identification values of node S and node P is less than 2^(i-1), the i-th item of the routing table of node S necessarily lies after node P, so that item does not need to be updated. Condition 2: given that condition 1 is satisfied, the current i-th item of the routing table of node S lies after node P. If the current i-th item of the routing table of node S lies before node P, node P is a node after the current i-th item, and that item does not need to be updated.

Based on the above two conditions, the routing tables of the predecessor nodes of the newly joined node can be updated recursively in the direction opposite to the preset order of the node ring. The routing table update for nodes affected by a deletion is analogous to the update for a newly joined node and is not described again here. Inserting or deleting a node does not affect the routing tables of the successor nodes of the current node; it affects only the predecessor nodes of the current node. This requires each node to maintain its direct predecessor node in addition to its routing table. In the recursive update process, if node S needs to update the i-th item of its routing table, the direct predecessor node of node S may also need to update its routing table. Conversely, if node S does not need to update its routing table, the predecessor nodes of S do not need to update theirs either, and the recursive update of the routing tables ends.

For example, as shown in FIG. 4, the newly joined node is node 30, and the routing tables of the predecessor nodes of node 30 are updated recursively counterclockwise along the node ring, from the predecessor node 28 to node 16. As shown in FIG. 4, the first and second items of the routing table of node 28 are updated from 33 to 30. Node 16 does not satisfy both conditions for updating the routing table, so its routing table does not change and the recursive update ends.
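As an illustration only, the routing table rule and the update on joining can be sketched in Python as follows. The ring contents, the value m = 6, and the function names `successor`, `build_table`, `on_arc` and `join` are assumptions made for this sketch; FIG. 4 is not reproduced here, so only nodes 16, 28, 30 and 33 are taken from the text. The sketch walks counterclockwise once per routing table item, skips nodes that are still closer to P than 2^(i-1) (condition 1), and stops at the first node whose current i-th item does not lie after P (condition 2).

```python
import bisect

M = 6  # assumed identifier space 0 .. 2**M - 1


def successor(ids, x):
    """First node identifier at or after x, going clockwise (ids is sorted)."""
    return ids[bisect.bisect_left(ids, x) % len(ids)]


def build_table(ids, n, m=M):
    """Item i (1 <= i <= m) is successor((n + 2**(i-1)) mod 2**m)."""
    return [successor(ids, (n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]


def on_arc(x, a, b, m=M):
    """True if x lies on the clockwise arc [a, b) of the ring."""
    size = 2 ** m
    return (x - a) % size < (b - a) % size


def join(ids, tables, p, m=M):
    """Add node p and refresh the affected items of its predecessors in place."""
    size = 2 ** m
    new_ids = sorted(ids + [p])
    tables[p] = build_table(new_ids, p, m)
    for i in range(1, m + 1):
        s = new_ids[new_ids.index(p) - 1]          # direct predecessor of p
        while s != p:                              # at most one full turn
            if (p - s) % size >= 2 ** (i - 1):     # condition 1
                start = (s + 2 ** (i - 1)) % size
                if not on_arc(p, start, tables[s][i - 1], m):
                    break                          # condition 2 fails: stop for item i
                tables[s][i - 1] = p
            s = new_ids[new_ids.index(s) - 1]      # step counterclockwise
    return new_ids


ids = [1, 8, 14, 16, 21, 28, 33, 38, 42, 48, 51, 56]   # hypothetical ring
tables = {n: build_table(ids, n) for n in ids}
ids = join(ids, tables, 30)
print(tables[28][:2])   # [30, 30]: items 1 and 2 of node 28 changed from 33 to 30
```

With this assumed ring the routing table of node 16 is left unchanged by the join, and the incrementally updated tables equal the tables rebuilt from scratch for the enlarged ring.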

Corresponding to the distributed cache range query method provided by the embodiment of the present application, the embodiment of the present application also provides a distributed cache range query device.

For example, refer to FIG. 5, which is a schematic structural diagram of a distributed cache range query apparatus according to an embodiment of the present application. As shown in FIG. 5, the apparatus may include:

The pre-processing unit 510 may be configured to pre-store, in a storage area of the memory, the identifier value corresponding to the field value usable for range query in the keywords used for mapping the cached data. The query response unit 520 may be configured to, in response to receiving a query request for a specified range of keywords, search the storage area for the identifier value corresponding to an endpoint value of the specified range. The keyword obtaining unit 530 may be configured to determine, according to the identifier value corresponding to the endpoint value of the specified range, the keyword set corresponding to the specified range.

In some possible implementations, the pre-processing unit 510 may be configured to pre-store the identifier values corresponding to the field values usable for range query in the keywords in a node ring located in the memory, where each identifier value is stored in one node, a corresponding routing table is established for each node in the node ring, and the routing table records the identifier values of one or more other nodes determined according to the preset index algorithm. Correspondingly, referring to FIG. 6, the query response unit 520 may include: a lookup subunit 521, which may be configured to, in response to a query request for a specified range of keywords, take any node in the node ring as the current node and find, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range; a first determining subunit 522, which may be configured to, if it is determined that the found identifier value is the identifier value closest to the endpoint value of the specified range in the node ring, use the found identifier value as the identifier value corresponding to the endpoint value of the specified range; and a second determining subunit 523, which may be configured to, if it is determined that the found identifier value is not the identifier value closest to the endpoint value of the specified range in the node ring, take the node where the found identifier value is located as the current node and trigger the lookup subunit 521 to find, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range.

In combination with the foregoing implementation, the pre-processing unit 510 may be configured to pre-store the identifier values corresponding to the field values of the same field usable for range query in the keywords in the node ring located in the memory, in order of identifier value. The routing table records the field values in the node ring that are separated from the field value of the corresponding node by a power of 2.

Next, an embodiment in which the routing table records the field values in the node ring that are separated from the field value of the corresponding node by a power of 2 is described in detail. Assume that the identifier values of the nodes in the node ring are sorted in ascending order, and that the specified range is the range between a first endpoint value and a second endpoint value, where the first endpoint value is less than the second endpoint value. For the query of the identifier value corresponding to the first endpoint value, the lookup subunit 521 in the embodiment of the present application, as shown in FIG. 6, may include:

The starting subunit 5210 may be configured to, in response to the query request for the specified range of keywords, take any node in the node ring as the current node.

The first endpoint judging subunit 5211 may be configured to determine whether an identifier value equal to the first endpoint value exists among the identifier values recorded in the routing table of the current node.

The first endpoint determining subunit 5212 may be configured to: if the first endpoint judging subunit 5211 determines that such an identifier value exists, use the identifier value equal to the first endpoint value as the identifier value closest to the first endpoint value in the node ring.

The first endpoint continuation judging subunit 5213 may be configured to: if the first endpoint judging subunit 5211 determines that no such identifier value exists, determine whether the first endpoint value lies between the identifier value of the current node and the identifier value of its direct predecessor node or of its direct successor node.

The first endpoint continuation determining subunit 5214 may be configured to: if the first endpoint continuation judging subunit 5213 determines that the first endpoint value lies between the identifier value of the current node and the identifier value of its direct predecessor node, use the identifier value of the current node as the identifier value closest to the first endpoint value in the node ring; if the first endpoint continuation judging subunit 5213 determines that the first endpoint value lies between the identifier value of the current node and the identifier value of its direct successor node, use the identifier value of the direct successor node of the current node as the identifier value closest to the first endpoint value in the node ring.

The second determining subunit 523 may be configured to: if the first endpoint continuation judging subunit 5213 determines that the first endpoint value lies neither between the identifier value of the current node and the identifier value of its direct successor node nor between the identifier value of the current node and the identifier value of its direct predecessor node, update the current node to the node holding the identifier value closest to the first endpoint value recorded in the routing table of the current node, and re-trigger the first endpoint judging subunit 5211.

For the query of the identifier value corresponding to the second endpoint value, the lookup subunit 521 in the embodiment of the present application, as shown in FIG. 6, may further include:

The second endpoint judging subunit 5215 may be configured to determine whether an identifier value equal to the second endpoint value exists among the identifier values recorded in the routing table of the current node.

The second endpoint determining subunit 5216 may be configured to: if the second endpoint judging subunit 5215 determines that such an identifier value exists, use the identifier value equal to the second endpoint value as the identifier value closest to the second endpoint value in the node ring.

The second endpoint continuation judging subunit 5217 may be configured to: if the second endpoint judging subunit 5215 determines that no such identifier value exists, determine whether the second endpoint value lies between the identifier value of the current node and the identifier value of its direct predecessor node or of its direct successor node.

The second endpoint continuation determining subunit 5218 may be configured to: if the second endpoint continuation judging subunit 5217 determines that the second endpoint value lies between the identifier value of the current node and the identifier value of its direct predecessor node, use the identifier value of the direct predecessor node of the current node as the identifier value closest to the second endpoint value in the node ring; if the second endpoint continuation judging subunit 5217 determines that the second endpoint value lies between the identifier value of the current node and the identifier value of its direct successor node, use the identifier value of the current node as the identifier value closest to the second endpoint value in the node ring.

The second determining subunit 523 may be configured to: if the second endpoint continuation judging subunit 5217 determines that the second endpoint value lies neither between the identifier value of the current node and the identifier value of its direct successor node nor between the identifier value of the current node and the identifier value of its direct predecessor node, update the current node to the node holding the identifier value closest to the second endpoint value recorded in the routing table of the current node, and re-trigger the second endpoint judging subunit 5215.
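As an illustration only, the lookup of the two endpoint values described above can be sketched in Python as follows. The class `Ring`, the functions `between`, `locate` and `range_ids`, the ring contents, and the value m = 6 are assumptions made for this sketch. It further assumes a ring with at least two nodes, endpoint values inside the identifier space, and that "closest" means the routing table entry with the smallest clockwise distance to the endpoint value.

```python
import bisect

M = 6
SIZE = 2 ** M   # assumed identifier space 0 .. SIZE - 1


class Ring:
    """Hypothetical node ring with power-of-2 routing tables."""
    def __init__(self, ids):
        self.ids = sorted(ids)
        self.table = {n: [self._succ((n + 2 ** i) % SIZE) for i in range(M)]
                      for n in self.ids}

    def _succ(self, x):
        return self.ids[bisect.bisect_left(self.ids, x) % len(self.ids)]

    def pred(self, n):   # direct predecessor node
        return self.ids[self.ids.index(n) - 1]

    def succ(self, n):   # direct successor node
        return self.ids[(self.ids.index(n) + 1) % len(self.ids)]


def between(x, a, b):
    """True if x lies strictly inside the clockwise arc from a to b."""
    return 0 < (x - a) % SIZE < (b - a) % SIZE


def locate(ring, start, x, first):
    """Identifier value for endpoint x: first=True for the first (lower)
    endpoint, first=False for the second (upper) endpoint."""
    cur = start
    while True:
        if x == cur or x in ring.table[cur]:        # exact match recorded
            return x
        if between(x, ring.pred(cur), cur):         # between predecessor and current
            return cur if first else ring.pred(cur)
        if between(x, cur, ring.succ(cur)):         # between current and successor
            return ring.succ(cur) if first else cur
        # Otherwise jump to the recorded node closest to x and repeat.
        cur = min(ring.table[cur], key=lambda e: (x - e) % SIZE)


def range_ids(ring, start, lo, hi):
    """Identifier values in [lo, hi], found from an arbitrary start node."""
    a = locate(ring, start, lo, first=True)
    if not lo <= a <= hi:                           # no identifier in the range
        return []
    b = locate(ring, start, hi, first=False)
    out = [a]
    while out[-1] != b:
        out.append(ring.succ(out[-1]))              # walk along the ring
    return out


ring = Ring([1, 8, 14, 16, 21, 28, 30, 33, 38, 42, 48, 51, 56])
print(range_ids(ring, 1, 15, 31))   # [16, 21, 28, 30]
```

Each jump moves strictly closer to the endpoint value, so the loop ends after a small number of hops; the keyword set is then the union of the keywords stored under the returned identifier values.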

The following describes the specific implementation manner of adding or deleting a node in a node ring in the embodiment of the present application. For example, referring to FIG. 6, the apparatus provided in this embodiment of the present application may further include:

The node joining unit 540 may be configured to: for a keyword newly added to the cache, determine whether the identifier value corresponding to the field value usable for range query in the newly added keyword already exists in the node ring; if not, store the identifier value in a new node, search the node ring for a node N that can serve as the direct predecessor node of the new node, set the direct predecessor node of the direct successor node of node N to the new node, set node N as the direct predecessor node of the new node, and establish a corresponding routing table for the new node;

The node deleting unit 550 may be configured to: for a keyword deleted from the cache, if the field value usable for range query in the deleted keyword does not exist in any other keyword, take the node that stores the identifier value corresponding to that field value as the node to be deleted, set the direct predecessor node of the direct successor node of the node to be deleted to the direct predecessor node of the node to be deleted, and delete the node to be deleted from the node ring;

And the routing update unit 560 may be configured to update, according to the preset index algorithm, the routing tables that are affected by the joining of the new node or by the deletion of the node to be deleted.

The following is a schematic description of some possible application scenarios in the embodiments of the present invention.

For example, in some possible implementations, the key-value information of each node may be stored in multiple cache servers of the distributed cache system according to a consistent hashing rule. To improve query performance, the device provided in this embodiment of the present application may be deployed in a separate query server that is different from the multiple cache servers used for storing cached data. Correspondingly, the device may further include: a data feedback unit 570, configured to, after the keyword obtaining unit 530 obtains the keyword set corresponding to the specified range, obtain the cached data corresponding to the keyword set from the multiple cache servers through a single concurrent multi-threaded download, and return the obtained cached data to the client that issued the request.
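As an illustration only, the concurrent retrieval of cached data for a keyword set can be sketched in Python as follows. The class `CacheCluster`, the function `fetch_all`, the server names, and the use of MD5 with 50 virtual points per server are assumptions made for this sketch; plain dictionaries stand in for the cache servers, so no real network access takes place.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor


def _hash(text):
    """Map a string to a point on the consistent hashing circle."""
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


class CacheCluster:
    """Hypothetical stand-in for several cache servers behind consistent hashing."""
    def __init__(self, servers, replicas=50):
        self.servers = servers                       # name -> dict acting as a cache
        self.points = sorted((_hash(f"{name}#{r}"), name)
                             for name in servers for r in range(replicas))
        self.hashes = [h for h, _ in self.points]

    def server_for(self, key):
        # First virtual point clockwise from the key's hash, with wrap-around.
        k = bisect.bisect(self.hashes, _hash(key)) % len(self.points)
        return self.points[k][1]

    def put(self, key, value):
        self.servers[self.server_for(key)][key] = value

    def get(self, key):
        return self.servers[self.server_for(key)].get(key)


def fetch_all(cluster, keys, workers=4):
    """Fetch the values for all keys concurrently; missing keys map to None."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(cluster.get, keys)))


cluster = CacheCluster({"cache-a": {}, "cache-b": {}, "cache-c": {}})
for key in ("order:16", "order:21", "order:28", "order:30"):
    cluster.put(key, {"id": key})
result = fetch_all(cluster, ["order:16", "order:21", "order:28", "order:30"])
```

In a real deployment `cluster.get` would be a network call to the cache server that owns the key, which is where issuing the requests from several threads at once saves time.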

It can be seen that, with the device provided by the embodiment of the present application, the query response unit 520 can read the node ring directly from the memory and use the routing tables of the nodes in the node ring as an index for range query, which removes the dependence on the database and makes reads fast. Moreover, when searching for the identifier value corresponding to an endpoint value of the specified range according to the routing table, the search always jumps to the routing table of the node identified by the field value closest to the endpoint value, until the keyword obtaining unit 530 can determine the keyword set corresponding to the specified range. The query process thus becomes a halving (binary) search, which achieves an efficient range query.

It should be noted that the lookup subunit 521, the first determining subunit 522, the second determining subunit 523, the starting subunit 5210, the first endpoint judging subunit 5211, the first endpoint determining subunit 5212, the first endpoint continuation judging subunit 5213, the first endpoint continuation determining subunit 5214, the second endpoint judging subunit 5215, the second endpoint determining subunit 5216, the second endpoint continuation judging subunit 5217, the second endpoint continuation determining subunit 5218, the node joining unit 540, the node deleting unit 550, the routing update unit 560, and the data feedback unit 570 are drawn in dashed lines in FIG. 6 to indicate that these units or subunits are not necessary units of the apparatus provided by the embodiments of the present application.

Corresponding to the above-mentioned distributed cache range query method, the embodiment of the present application further provides a distributed cache range query system.

For example, refer to FIG. 7, which is a schematic structural diagram of a distributed cache range query system according to an embodiment of the present application. As shown in Figure 7, the system can include:

The cache server 710 may be configured to store the cached data that has a mapping relationship with the keyword, receive the query request sent by the query server 720 for the cached data corresponding to the keyword set, and feed back the cached data corresponding to the keyword set to the query server 720;

The query server 720 may be configured to: pre-store, in a storage area of the memory, the identifier value corresponding to the field value usable for range query in the keywords used for mapping the cached data; in response to receiving, from the client, a query request for the cached data corresponding to a specified range of keywords, search the storage area for the identifier value corresponding to an endpoint value of the specified range; determine, according to the identifier value corresponding to the endpoint value of the specified range, the keyword set corresponding to the specified range; obtain the cached data corresponding to the keyword set from the cache server; and feed back the obtained cached data to the client that sent the query request;

The client 730 may be configured to send, to the query server, a query request for cached data corresponding to a specified range of keywords; and receive cached data fed back by the query server.

For example, there may be one or more cache servers 710. The different node rings and routing tables established by the embodiments of the present application for range query can be saved in the separate query server 720. For example, the query server 720 can determine, according to the query request, the node ring that needs to be read, and take any node in that node ring as the current node to perform the subsequent query steps. After obtaining the keyword set corresponding to the query request, the query server 720 may obtain the cached data corresponding to the keyword set from the one or more cache servers through a single concurrent multi-threaded download, and return the obtained cached data to the client that sent the request.

For the convenience of description, the above devices are described separately by function into various units. Of course, the functions of the various units may be implemented in one or more software and/or hardware in the practice of the invention.

It will be apparent to those skilled in the art from the above description of the embodiments that the present invention can be implemented by means of software plus a necessary general hardware platform. Based on such understanding, the part of the technical solution of the present invention that is essential or that contributes to the prior art may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention or in certain portions of the embodiments.

The various embodiments in the specification are described in a progressive manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The invention is applicable to a wide variety of general purpose or special purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions executed by a computer, such as a program module. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media including storage devices.

It should be noted that, in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between such entities or operations. Furthermore, the terms "comprises", "includes", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a plurality of elements includes not only those elements but also other elements not expressly listed, or elements that are inherent to such a process, method, article, or device. An element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modifications, equivalents, improvements, etc. made within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims

1. A distributed cache range query method, comprising:

pre-storing, in a storage area of a memory, the identifier value corresponding to the field value usable for range query in the keywords used for mapping cached data;

in response to receiving a query request for a specified range of keywords, searching the storage area for the identifier value corresponding to an endpoint value of the specified range;

and determining, according to the identifier value corresponding to the endpoint value of the specified range, the keyword set corresponding to the specified range.
2. The method according to claim 1, wherein pre-storing, in the storage area of the memory, the identifier value corresponding to the field value usable for range query in the keywords used for mapping cached data comprises:

pre-storing the identifier values corresponding to the field values usable for range query in the keywords in a node ring located in the memory, wherein each identifier value is stored in one node, a corresponding routing table is established for each node in the node ring, and the routing table records the identifier values of one or more other nodes determined according to a preset index algorithm;

and wherein, in response to receiving the query request for the specified range of keywords, searching the storage area for the identifier value corresponding to the endpoint value of the specified range comprises:

in response to receiving the query request for the specified range of keywords, taking any node in the node ring as the current node, and finding, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range;

if it is determined that the found identifier value is the identifier value closest to the endpoint value of the specified range in the node ring, using the found identifier value as the identifier value corresponding to the endpoint value of the specified range;

if it is determined that the found identifier value is not the identifier value closest to the endpoint value of the specified range in the node ring, updating the current node to the node where the found identifier value is located, and returning to the step of finding, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range.
3. The method according to claim 2, wherein pre-storing the identifier values corresponding to the field values usable for range query in the keywords in the node ring located in the memory comprises:

pre-storing the identifier values corresponding to the field values of the same field usable for range query in the keywords in the node ring located in the memory in order of identifier value, wherein the routing table records the identifier values in the node ring that are separated from the identifier value of the corresponding node by a power of 2.
4. The method according to claim 2, further comprising:

for a keyword newly added to the cache, determining whether the identifier value corresponding to the field value usable for range query in the newly added keyword already exists in the node ring; if not, storing the identifier value in a new node, searching the node ring for a node N that can serve as the direct predecessor node of the new node, setting the direct predecessor node of the direct successor node of node N to the new node, setting node N as the direct predecessor node of the new node, and establishing a corresponding routing table for the new node;

for a keyword deleted from the cache, if the field value usable for range query in the deleted keyword does not exist in any other keyword, taking the node that stores the identifier value corresponding to that field value as the node to be deleted, setting the direct predecessor node of the direct successor node of the node to be deleted to the direct predecessor node of the node to be deleted, and deleting the node to be deleted from the node ring;

and updating, according to the preset index algorithm, the routing tables that are affected by the joining of the new node or by the deletion of the node to be deleted.
5. The method according to claim 1, wherein the method is applied to a query server different from one or more cache servers used for storing cached data;

after the keyword set corresponding to the specified range is obtained, the method further comprises: obtaining the cached data corresponding to the keyword set from the one or more cache servers through a single concurrent multi-threaded download, and returning the obtained cached data to the client that issued the query request.
6. A distributed cache range query device, comprising:

a pre-processing unit, configured to pre-store, in a storage area of a memory, the identifier value corresponding to the field value usable for range query in the keywords used for mapping cached data;

a query response unit, configured to, in response to receiving a query request for a specified range of keywords, search the storage area for the identifier value corresponding to an endpoint value of the specified range;

a keyword obtaining unit, configured to determine, according to the identifier value corresponding to the endpoint value of the specified range, the keyword set corresponding to the specified range.
7. The device according to claim 6, wherein the pre-processing unit is configured to pre-store the identifier values corresponding to the field values usable for range query in the keywords in a node ring located in the memory, wherein each identifier value is stored in one node, a corresponding routing table is established for each node in the node ring, and the routing table records the identifier values of one or more other nodes determined according to a preset index algorithm;

the query response unit includes:

a lookup subunit, configured to, in response to receiving a query request for a specified range of keywords, take any node in the node ring as the current node, and find, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range;

a first determining subunit, configured to, if it is determined that the found identifier value is the identifier value closest to the endpoint value of the specified range in the node ring, use the found identifier value as the identifier value corresponding to the endpoint value of the specified range;

a second determining subunit, configured to, if it is determined that the found identifier value is not the identifier value closest to the endpoint value of the specified range in the node ring, update the current node to the node where the found identifier value is located, and trigger the lookup subunit to find, in the routing table of the current node, the identifier value closest to the endpoint value of the specified range.
8. The device according to claim 7, wherein the pre-processing unit is configured to pre-store the identifier values corresponding to the field values of the same field usable for range query in the keywords in the node ring located in the memory in order of identifier value, wherein the routing table records the field values in the node ring that are separated from the field value of the corresponding node by a power of 2.
9. The device according to claim 7, further comprising:

a node joining unit, configured to: for a keyword newly added to the cache, determine whether the identifier value corresponding to the field value usable for range query in the newly added keyword already exists in the node ring; if not, store the identifier value in a new node, search the node ring for a node N that can serve as the direct predecessor node of the new node, set the direct predecessor node of the direct successor node of node N to the new node, set node N as the direct predecessor node of the new node, and establish a corresponding routing table for the new node;

a node deleting unit, configured to: for a keyword deleted from the cache, if the field value usable for range query in the deleted keyword does not exist in any other keyword, take the node that stores the identifier value corresponding to that field value as the node to be deleted, set the direct predecessor node of the direct successor node of the node to be deleted to the direct predecessor node of the node to be deleted, and delete the node to be deleted from the node ring;

and a routing update unit, configured to update, according to the preset index algorithm, the routing tables that are affected by the joining of the new node or by the deletion of the node to be deleted.
10. The device according to claim 6, wherein the device is deployed in a query server different from one or more cache servers used for storing cached data;

the device further includes: a data feedback unit, configured to, after the keyword obtaining unit obtains the keyword set corresponding to the specified range, obtain the cached data corresponding to the keyword set from the one or more cache servers through a single concurrent multi-threaded download, and return the obtained cached data to the client that issued the query request.
11. A distributed cache range query system, comprising:

a cache server, configured to store cached data that has a mapping relationship with keywords, receive a query request sent by the query server for the cached data corresponding to a keyword set, and feed back the cached data corresponding to the keyword set to the query server;

a query server, configured to: pre-store, in a storage area of a memory, the identifier value corresponding to the field value usable for range query in the keywords used for mapping the cached data; in response to receiving, from the client, a query request for the cached data corresponding to a specified range of keywords, search the storage area for the identifier value corresponding to an endpoint value of the specified range; determine, according to the identifier value corresponding to the endpoint value of the specified range, the keyword set corresponding to the specified range; obtain the cached data corresponding to the keyword set from the cache server; and feed back the obtained cached data to the client that sent the query request;

a client, configured to send, to the query server, a query request for the cached data corresponding to a specified range of keywords, and receive the cached data fed back by the query server.