US20130283287A1 - Generating monotone hash preferences - Google Patents
Generating monotone hash preferences Download PDFInfo
- Publication number
- US20130283287A1 US20130283287A1 US13/857,024 US201313857024A US2013283287A1 US 20130283287 A1 US20130283287 A1 US 20130283287A1 US 201313857024 A US201313857024 A US 201313857024A US 2013283287 A1 US2013283287 A1 US 2013283287A1
- Authority
- US
- United States
- Prior art keywords
- resource
- bias
- specific affinity
- data object
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
- 238000000034 method Methods 0.000 claims description 45
- 238000004891 communication Methods 0.000 claims description 7
- 238000004590 computer program Methods 0.000 claims description 4
- 230000006870 function Effects 0.000 description 28
- 230000008569 process Effects 0.000 description 21
- 238000010586 diagram Methods 0.000 description 16
- 238000004422 calculation algorithm Methods 0.000 description 8
- 238000004364 calculation method Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 238000009826 distribution Methods 0.000 description 3
- 230000001174 ascending effect Effects 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- 238000005315 distribution function Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 238000007619 statistical method Methods 0.000 description 2
- 238000002940 Newton-Raphson method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 230000003090 exacerbative effect Effects 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 238000013468 resource allocation Methods 0.000 description 1
- 230000007480 spreading Effects 0.000 description 1
- 239000004557 technical material Substances 0.000 description 1
- 238000012876 topography Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5033—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering data affinity
Definitions
- Hash algorithms speed up searching and sorting by mapping an object key to a bucket where the associated object value or object value locator is stored. They also find use in load balancing applications where a key associated with a request (such as a username, or client ID, or network address) is used to determine what resources should be used to satisfy the request. In recent years, there has been extensive use of consistent hash functions in load balancing and peer to peer applications.
- consistent hashing algorithms provide hash table semantics where addition or removal of buckets do not cause significant numbers of objects to be remapped between buckets that have not been added or removed.
- the method of consistent hashing allows the hash table to map to physical objects, like disks or computers or network nodes, which may either be removed due to failure or added in response to load.
- the hashed objects which, among other things, may be data objects, network sessions, or users
- FIG. 1 is a block diagram illustrating an embodiment of a network environment in which data objects are stored in a distributed manner.
- FIG. 2 is a flow diagram illustrating an embodiment of a process for storing data objects in a distributed storage environment.
- FIG. 3 is a flow diagram illustrating an embodiment of a process for generating an ordered list of candidate storage nodes.
- FIG. 4 is a flow diagram illustrating an embodiment of a process for generating a node-specific hash for a candidate node with respect to an object.
- FIG. 5 is a flow diagram illustrating an embodiment of a process for relocating objects when a storage node is removed.
- FIG. 6 is a flow diagram illustrating an embodiment of a process for populating an added storage node.
- FIG. 7 is a flow diagram illustrating an embodiment of a process for determining biases for candidate nodes.
- FIG. 8 is a block diagram illustrating an embodiment of a node selection system.
- the invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.
- these implementations, or any other form that the invention may take, may be referred to as techniques.
- the order of the steps of disclosed processes may be altered within the scope of the invention.
- a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.
- the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- a hash or other resource-specific value is computed with respect to a resource requirement to be fulfilled.
- a bias is applied to one or more of the resource-specific values.
- the resource-specific values are sorted and a prescribed number of resources in a prescribe location within the ordered list, e.g., a first m resources in the list, are used to satisfy the resource requirement.
- the resource requirement includes storing a data object on m storage nodes included in a plurality of storage nodes
- a hash is computed with respect to the data object, for example based on data comprising or associated with the data object and data associated with the node; biases are applied to one or more of the storage node-specific hash values; the biased (as applicable) node-specific hash values are sorted in an order (e.g., highest to lowest); and the data object is stored on the first m storage nodes in the sorted list.
- Buckets In various embodiments described herein the terms “buckets”, “nodes”, and other terms are used to refer to a set of possible paths, destinations, and/or other resources one or more of which must be selected to satisfy each of a plurality of resource requirements. Broadly, techniques disclosed herein may be used to match an item included in a first set with one or more elements of a second set.
- FIG. 1 is a block diagram illustrating an embodiment of a network environment in which data objects are stored in a distributed manner.
- one or more clients represented in FIG. 1 by client 102
- a node selector 112 is configured to process requests from clients such as client 102 to store data objects, such as files or pieces thereof, and for each object to determine a prescribed number m of the available n nodes on which the object will be stored.
- techniques disclosed herein are used to determine for each object one or more storage nodes on which the object will be stored.
- individual machines may act as clients, storage nodes, or both.
- consistent hashing algorithms provide hash table semantics where addition or removal of buckets do not cause significant numbers of objects to be remapped between buckets that have not been added or removed.
- a consistent hash algorithm would ensure that removal of node 108 would not result in objects not previously stored on node 108 being moved between remaining nodes such as nodes 106 and 110 . Instead, only objects previously stored on node 108 would be remapped to other, remaining nodes. Likewise, when a node is added the node should be populated without remapping objects between nodes that already existed previously.
- a hash function In order to avoid unnecessary relocation of objects when there are changes in the bucket population a hash function must have the property of monotonicity. As described by Karger, et al., in Consistent Hashing and Random Trees , Proceedings of the 19 th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1066-1075 (2008), “This property says that if items are initially assigned to a set of buckets V 1 and then some new buckets are added to form V 2 then an item may move from an old bucket to a new bucket, but not from one old bucket to another.
- the resources corresponding to hash buckets often are not homogeneous. Because the devices in a distributed environment may differ in terms of their resource availability and the cost of servicing various types of requests, it is desirable to have the hash buckets distributed unevenly. This is contrary to the historical trend of searching for balanced hash functions that uniformly distribute items between the buckets.
- the selection may be biased, e.g., based on such information as the available capacity and/or capabilities of each destination nodes and/or a relationship between the object and/or an entity with which the object is associated and one or more of the candidate nodes.
- a node with a relative large amount of remaining storage space available may in some embodiments be more likely to be selected than one that is nearly full, all else being equal.
- the candidate nodes are distributed, e.g., geographically and/or in terms of network or other topography
- a candidate node that is more “near” in some relevant sense to a client or other host that is likely to access the object determined in some embodiments, for example, based on stored statistics and/or other data tracking past use of the object, access statistics regarding related objects, a location of a client or other host being used to store the object, etc., is more likely to be selected than one that is relatively further away, all else being equal.
- resources may be associated with any arbitrary set of identifying attributes and biased at least in part on that basis.
- resources may specialize in large files, classified information, or objects with unusually high vowel counts in their name. Resource requirements that share these attributes are then directed, e.g., by employing biasing as described herein, to the associated resource(s).
- biases can be viewed as a vector of desired probabilities, object share, or relative affinity.
- a hash method can be constructed using multiplicative biases as coefficients to be multiplied by a family of independent hash functions.
- the preference list is then defined as the ordered list of the biased hashes.
- the method constructed here provides the desirable properties discussed above. Specifically, this method can bias decisions in any arbitrary way without causing unnecessary relocations and without unfairly spreading the necessary relocations among a small subset of available buckets. Not only multiplication, but any function taking a bias and a hash value as an input and providing a biased hash as an output can provide this property as long as the function is monotonically increasing with regard to either input parameter.
- a computer and/or or computer network has associated with it storage that resides on several different physical storage devices. When deciding where to place an object this could use total disk space or other metrics of storage and/or retrieval cost as an input to determine biases.
- the hash function would then take the object identifier of the object to be stored as its input. If some storage device has half the storage capacity it may receive about half of the write requests. Because of the repeatability of the hash implementation, any other computer in the network can determine where to retrieve the same file by computing the hash on the object identifier using the same set of biases.
- a load balancing server may receive requests from users to connect to a server.
- the load balancing server could use server capacity as an input to determine biases for the hash function.
- the client identifier or network address can be used as the input to the hash function to determine a server to service the request.
- This scheme maximizes the probability that the load balancing server will select a server which already has an initialized session for the given user without requiring the load balancing server to keep a table mapping client identifiers to servers.
- This scheme also allows multiple load balancing servers to work in parallel without sharing state per connection. As long as the set of load balancing servers all share the same biases they will reach the same decisions about which servers should handle a given client's traffic.
- Call centers could similarly use the algorithm to route incoming calls to a specific worker. Biases would then be based on the relative efficiency of the various workers.
- the input to the hash function could be the caller ID information of the incoming call. This would give each individual caller a high chance of reaching the same worker on subsequent calls while simultaneously providing appropriate workloads to each worker. Again, no record of association between past calls and the workers who serviced them would need to be kept or consulted.
- FIG. 2 is a flow diagram illustrating an embodiment of a process for storing data objects in a distributed storage environment.
- a distributed storage environment include a network environment, such as the example shown in FIG. 1 , in which a plurality of storage nodes (e.g., servers, etc.) are connected with one or more other nodes via network connections, and a single logical and/or physical node comprising a plurality of storage disks, drives, and/or other storage devices.
- a network environment such as the example shown in FIG. 1
- a plurality of storage nodes e.g., servers, etc.
- a single logical and/or physical node comprising a plurality of storage disks, drives, and/or other storage devices.
- an ordered list of candidate storage nodes is generated ( 202 ).
- the object is stored in the first m nodes in the ordered list ( 204 ).
- exceptions may be made to storing the object in the first m nodes in the sorted preference list. For example, exceptions may be made based on availability and capacity constraints as determined at run time. In some embodiments, if a node is determined to not be available, that node will not be selected even if it is within the first m nodes on the list. Instead, in some embodiments, the system will choose the first m available resources on the list.
- techniques described herein to select one or more storage nodes on which to store a data object are used to find the data object quickly after they have been stored. By knowing the preference list of where the object would be stored the probability is very high in various embodiments that the first location checked will be the correct location.
- FIG. 3 is a flow diagram illustrating an embodiment of a process for generating an ordered list of candidate storage nodes.
- step 202 of FIG. 2 includes the process of FIG. 3 .
- a biased node-specific hash is computed with respect to that object for each of a plurality of candidate nodes ( 302 ). For example, for a received object 1, a biased node-specific hash would be computed for each of nodes A, B, C, and D in the example above.
- a set of unbiased node-specific hashes are computed for the object, e.g., A 1 , B 1 , C 1 , and D 1 , and a node-specific bias is applied to each to generate a set of biased node-specific hashes, e.g., A 1 ′, B 1 ′, C 1 ′, and D 1 ′.
- a biased node-specific hash has been computed for the received object for each of the candidate nodes ( 302 )
- an ordered list of candidate nodes is generated by sorting the candidate nodes by their respective biased hash values for the object ( 304 ).
- a biased hash list of resources is generated by hashing data associated with a resource requirement to a point in space.
- the resources themselves are represented by points in space and the biased hash list is formed by taking the distances from the resource requirement's point to all of the resource points and ordering these distances from shortest to longest. Adjusting biases of the various resources is done in various embodiments by moving the points in space, either through iterative methods, control systems, or statistical methods such as Correspondence Analysis, Principal Component Analysis, and other techniques.
- FIG. 4 is a flow diagram illustrating an embodiment of a process for generating a node-specific hash for a candidate node with respect to an object.
- step 302 of FIG. 3 includes the process of FIG. 4 .
- a node-specific hash is computed for the object to be stored ( 402 ).
- a node-specific bias is applied to the computed node-specific hash ( 404 ) to generate the biased node-specific hash for the object.
- the node-specific hash values are biased by one or more techniques, for example by multiplying a hash value computed based on data associated uniquely (or likely so) with the object and data associated uniquely with the node, on the one hand, by a multiplicative bias.
- a multiplicative bias An example involving the use of multiplicative biases is described below.
- the bias may in addition or instead involve addition or one or more other operations.
- the biasing may be based on configured and/or configurable factors, such as differing capacity or remaining capacity, processing power and/or speed, communication speed, geography, network topology, etc.
- a hash value for a storage node known or determined to be near a location from which a data object has most often and/or most recently been accessed may be biased in a way that makes it more likely that node will be in the first m nodes on the ordered list.
- statistics reflecting the location(s) from which a stored object has been accessed are kept.
- object access statistics are evaluated and if an object is accessed solely or primarily and/or most recently from a location or set of locations associated with a particular storage node or subset of nodes, then for that object the node-specific hash value for that node or each of those nodes may be biased in an amount that creates a bias in favor of the object being stored, e.g., in a subsequent “save” operation, at that node (or those nodes).
- access information e.g., statistics
- an object's association with a particular storage node may be reflected in an object's name and/or an addition or other annotation thereto, and/or in (other) associated metadata.
- this portion of the name is used at storage time to compute a bias for the node(s) indicated by such annotation as being associated particularly with the object.
- other factors may also be reflected in biasing, including configuration information, storage node statistics (e.g., space remaining, etc.), etc.
- techniques described herein facilitate the efficient use of newly added nodes and/or avoid inefficiency (e.g., excessive churn) when nodes are removed (e.g., due to failure, for maintenance, etc.).
- FIG. 5 is a flow diagram illustrating an embodiment of a process for relocating objects when a storage node is removed.
- a node is removed ( 502 )
- a new ordered list of candidate storage nodes is computed ( 504 ).
- the process of FIGS. 2-4 is used to compute new ordered lists.
- Objects formerly stored on the removed node are copied, for example from other nodes on which they were stored redundantly prior to removal of the removed node, as needed to ensure that each is stored on the first m nodes in the object's new ordered list of nodes ( 506 ). Because the hash values and biases are independent between storage nodes, in some embodiments the new ordered list may be formed by simply removing the appropriate storage node(s) from the prior ordered list.
- FIG. 6 is a flow diagram illustrating an embodiment of a process for populating an added storage node.
- a node is added ( 602 )
- a new ordered list of candidate storage nodes is computed ( 604 ).
- the process of FIGS. 2-4 is used to compute new ordered lists.
- Objects for which the added node is included in the first m nodes on the object's new ordered list are copied to the added node, for example from other nodes on which they were stored prior to addition of the added node ( 606 ). Because the hash values and biases are independent between storage nodes, in some embodiments the new ordered list may be formed by only calculating the values for the newly added node(s) and inserting them in the appropriate location.
- FIG. 7 is a flow diagram illustrating an embodiment of a process for determining biases for candidate nodes.
- configuration data regarding candidate nodes is received ( 702 ).
- Examples of configuration data include a storage capacity of each node; a location preference value for one or more nodes; an identification of a type, size, source, etc. of data object preferred to be stored on one or more nodes; and an identification of statistics to be gathered and/or used to determine biases.
- node-bias related information for the object is determined ( 704 ). For example, access statistics for the object are evaluated in some embodiments.
- the node configuration information ( 702 ) and object information ( 704 ) are used to compute for each candidate node a node-specific bias to be applied to a node-specific hash for that object with respect to that node ( 706 ).
- FIG. 8 is a block diagram illustrating an embodiment of a node selection system.
- node selector 112 of FIG. 1 includes a node selection engine 802 connected via a communication interface 804 , such as a network interface card or other interface, to a network (not shown in FIG. 8 ), such as Internet 104 of FIG. 1 .
- the node selection engine 802 in some embodiments is implemented by one or more processors configured to execute software or other computer instructions.
- the node selection engine 802 is configured to receive via communication interface 804 a request to store a data object.
- the node selection engine 802 consults a node configuration datastore 806 and object access statistics 808 to determine for each candidate node, with respect to the object, a biased node-specific hash value (e.g., using the process of FIG. 7 ).
- the node selection engine 802 sorts the nodes into an order based on the biased hash values (e.g., highest to lowest) and causes the object to be stored on the first m nodes on the list.
- the location(s) in which the object is stored is stored in a location index 810 .
- the node configuration data 806 , access statistics 808 , and location index 810 are stored in one or more of a memory, a disk or other drive, and/or another storage device comprising node selector 112 .
- An access server 812 is configured to receive via communication interface 804 requests to access previously stored objects, and to use location index 810 to provide in response an indication of one or more nodes on which a requested object is stored.
- the location index 810 may be omitted because the information is already available through calculation of the biased hash function.
- temporary resource availability or capacity limitations may cause objects to skip earlier nodes in the hash list and instead reside on later storage nodes.
- location index 810 is used order to quickly find all redundant copies as soon as any one copy is found.
- the set of buckets is denoted by K. Individual buckets are denoted with a subscript for bucket number such as K 1 .
- the set of multiplicative biases is denoted as B. Individual biases from the set are denoted with a subscript bucket number such as B 1 .
- H 1 (x) is the hash of item x using the first function in the set. Constructing a large set of distinct functions may be difficult. For a sufficiently wide hash function such as SHA2 the hashes can be generated from different bit ranges of the output. Other approaches are to pass different initial values to the same hash function, or to prepend or append data to generate the alternate functions.
- W(B; n) is the probability of K n having the highest preference given bias set B and randomly selected hash values.
- D n (B; t) ⁇ 0 t
- P n (B; t)dt is the cumulative distribution function associated with P n (B; t).
- G is the set of unique biases in ascending order.
- G 1 is the minimum element of set B.
- is the number of elements in set G.
- C n is the count of elements of set B that are equal to G n .
- Equation (11) can be viewed as the product of all biases.
- Equation (12) can be viewed as the cumulative distribution function of all buckets.
- the preference list of buckets is ordered such that the higher preference goes to the bucket K n where B n H n (x) ⁇ B m H m (x) for all m.
- B n H n (x) ⁇ B m H m (x) for all m.
- two or more preferences are exactly equal
- several different tie breaker schemes can be used. One method would be to take the population that tied for first place and repeat the same hash algorithm with a different set of hash functions from the same family and possibly adjusted biases. Other methods could be to make a selection from the subset using a more traditional hash function or to always allow the lowest numbered bucket to be selected.
- the preferred bucket is the one with the largest resulting product when its bias is multiplied by its hash value for a given object.
- the biases can be determined by using the above equations via the Newton-Raphson method or any other iterative approximation method. Unfortunately these iterative processes can be slow for modem implementations when biases or hash bucket populations change frequently.
- Exact values for probabilities are not ideal for a cost-based bias system that is used for choosing an assignment of data objects to nodes for storage.
- the desired probabilities should be scaled depending on the relative number of nodes and the percentage desired by each node. While affinities may not sum to 1 their ratios represent relative desired probabilities. Nodes that have a very small affinity for some object (and thus a high cost for serving it) should get even less, since they are not likely to be asked to service requests for that object. Nodes that have a moderately strong affinity for an object should receive slightly more than desired. These are the most likely places for object requests to be serviced. Finally, nodes that have an extremely strong affinity for an object (such as a 90% requested selection rate) should be given much but not all of the data they request in order to preserve properties of robustness in the event of failure and to more evenly distribute load on the network.
- a method of approximating a set of biases from a set of desired selection probabilities while including the desired deviation is needed. Assuming X n is the requested affinity from an inverse cost metric and X is the mean of set X, one such method of calculation with desirable properties is
- the values Y and Z are both constants. This function was chosen because the first term closely approximates the inverse function of the dominant term of the polynomial W(B; n) in the continuous case. This approximation serves to mostly linearize the relation between inset X n and W n . The relationship is not quite linear because the root taken is slightly larger than the ideal linearizing root over the full range of the polynomial. This approximation also remains close for the discrete case.
- the second term provides the correction necessary to adjust the slope such that the properties discussed above are provided. Larger values of Y tend to give more weight to the first term in all cases while larger values of Z tend to give more weight to the first term as the bucket population increases.
- the resource requirement is storage of a data object and the resources (“buckets”) are candidate storage nodes
- the techniques described herein may be used in other contexts as well, including without limitation call center call routing and network resource or other load balancing, as noted above.
- biasing calculations used in some embodiments have been described in detail, in other embodiments other methods of determining biases and applying same to node- (or other bucket-) specific hash values (or other values computed to achieve a desired distribution) may be used.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Selecting a resource to fulfill a resource requirement is disclosed. For each resource requirement, a resource-specific affinity value is computed with respect to each of a plurality of resources. A bias is applied to each of at least a subset of the resource-specific affinity values. The biased, as applicable, resource-specific affinity values are sorted into a resource preference list. The sorted preference list is used to select a resource to fulfill the resource requirement.
Description
- This application is a continuation of co-pending U.S. patent application Ser. No. 12/806,236, entitled GENERATING MONOTONE HASH PREFERENCES, filed Aug. 5, 2010, which is incorporated herein by reference for all purposes, which claims priority to U.S. Provisional Application No. 61/274,295, entitled GENERATING MONOTONE HASH PREFERENCES filed Aug. 14, 2009, which is incorporated herein by reference for all purposes.
- Hash algorithms speed up searching and sorting by mapping an object key to a bucket where the associated object value or object value locator is stored. They also find use in load balancing applications where a key associated with a request (such as a username, or client ID, or network address) is used to determine what resources should be used to satisfy the request. In recent years, there has been extensive use of consistent hash functions in load balancing and peer to peer applications.
- Typically, in both load balancing and peer to peer applications “churn” (changes in the number of buckets) occurs. Conventional hash algorithms cause nearly all of the keys to be remapped between buckets when the number of buckets changes. By contrast, consistent hashing algorithms provide hash table semantics where addition or removal of buckets do not cause significant numbers of objects to be remapped between buckets that have not been added or removed. The method of consistent hashing allows the hash table to map to physical objects, like disks or computers or network nodes, which may either be removed due to failure or added in response to load. When consistent hashing is used, the hashed objects (which, among other things, may be data objects, network sessions, or users) do not suffer excessive churn when the set of physical resources is changed.
- Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
-
FIG. 1 is a block diagram illustrating an embodiment of a network environment in which data objects are stored in a distributed manner. -
FIG. 2 is a flow diagram illustrating an embodiment of a process for storing data objects in a distributed storage environment. -
FIG. 3 is a flow diagram illustrating an embodiment of a process for generating an ordered list of candidate storage nodes. -
FIG. 4 is a flow diagram illustrating an embodiment of a process for generating a node-specific hash for a candidate node with respect to an object. -
FIG. 5 is a flow diagram illustrating an embodiment of a process for relocating objects when a storage node is removed. -
FIG. 6 is a flow diagram illustrating an embodiment of a process for populating an added storage node. -
FIG. 7 is a flow diagram illustrating an embodiment of a process for determining biases for candidate nodes. -
FIG. 8 is a block diagram illustrating an embodiment of a node selection system. - The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
- Generating monotone hash preferences is disclosed. In various embodiments, for each of a plurality of resources a hash or other resource-specific value is computed with respect to a resource requirement to be fulfilled. A bias is applied to one or more of the resource-specific values. The resource-specific values are sorted and a prescribed number of resources in a prescribe location within the ordered list, e.g., a first m resources in the list, are used to satisfy the resource requirement. For example, if the resource requirement includes storing a data object on m storage nodes included in a plurality of storage nodes, for each storage node in the plurality of nodes a hash is computed with respect to the data object, for example based on data comprising or associated with the data object and data associated with the node; biases are applied to one or more of the storage node-specific hash values; the biased (as applicable) node-specific hash values are sorted in an order (e.g., highest to lowest); and the data object is stored on the first m storage nodes in the sorted list.
- In various embodiments described herein the terms “buckets”, “nodes”, and other terms are used to refer to a set of possible paths, destinations, and/or other resources one or more of which must be selected to satisfy each of a plurality of resource requirements. Broadly, techniques disclosed herein may be used to match an item included in a first set with one or more elements of a second set.
-
FIG. 1 is a block diagram illustrating an embodiment of a network environment in which data objects are stored in a distributed manner. In the example shown, one or more clients, represented inFIG. 1 byclient 102, communicates via the Internet 104 to store data objects (potentially) distributed across a plurality ofstorage nodes 1 to n, represented inFIG. 1 bynodes node selector 112 is configured to process requests from clients such asclient 102 to store data objects, such as files or pieces thereof, and for each object to determine a prescribed number m of the available n nodes on which the object will be stored. In various embodiments, techniques disclosed herein are used to determine for each object one or more storage nodes on which the object will be stored. In various embodiments, individual machines may act as clients, storage nodes, or both. - As noted above, consistent hashing algorithms provide hash table semantics where addition or removal of buckets do not cause significant numbers of objects to be remapped between buckets that have not been added or removed. For example, in the example shown in
FIG. 1 , a consistent hash algorithm would ensure that removal ofnode 108 would not result in objects not previously stored onnode 108 being moved between remaining nodes such asnodes node 108 would be remapped to other, remaining nodes. Likewise, when a node is added the node should be populated without remapping objects between nodes that already existed previously. - In order to avoid unnecessary relocation of objects when there are changes in the bucket population a hash function must have the property of monotonicity. As described by Karger, et al., in Consistent Hashing and Random Trees, Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1066-1075 (2008), “This property says that if items are initially assigned to a set of buckets V1 and then some new buckets are added to form V2 then an item may move from an old bucket to a new bucket, but not from one old bucket to another. This reflects one intuition about consistency: when the set of usable buckets changes, items should only move if necessary to preserve an even distribution.” Because a monotone hash function generates a preference list of all possible buckets for each object we are guaranteed that the removal of a bucket will only affect the objects that preferred that bucket. Changes in any one bucket will not affect the preference ordering of the remaining buckets.
- In a typical distributed environment populations of computers, hard drives, and other devices are constantly changing. New devices are introduced and old devices fail or are removed from the network. From the perspective of the hash function this means that the items to be hashed may have a constantly changing population of buckets. When the population of buckets changes it is important that no unnecessary load is generated on the network. A hash function's monotonicity is necessary and sufficient to guarantee that when a given node is introduced, removed, or has its bias changed that no two uninvolved nodes will exchange files in any direction. Any existing items which have changed their assigned bucket will only have moved to or from the added, removed, or changed bucket.
- In various embodiments, the resources corresponding to hash buckets often are not homogeneous. Because the devices in a distributed environment may differ in terms of their resource availability and the cost of servicing various types of requests, it is desirable to have the hash buckets distributed unevenly. This is contrary to the historical trend of searching for balanced hash functions that uniformly distribute items between the buckets. For example, in the example shown in
FIG. 1 , in some embodiments the selection may be biased, e.g., based on such information as the available capacity and/or capabilities of each destination nodes and/or a relationship between the object and/or an entity with which the object is associated and one or more of the candidate nodes. For example, a node with a relative large amount of remaining storage space available may in some embodiments be more likely to be selected than one that is nearly full, all else being equal. Likewise, in embodiments in which the candidate nodes are distributed, e.g., geographically and/or in terms of network or other topography, a candidate node that is more “near” in some relevant sense to a client or other host that is likely to access the object, determined in some embodiments, for example, based on stored statistics and/or other data tracking past use of the object, access statistics regarding related objects, a location of a client or other host being used to store the object, etc., is more likely to be selected than one that is relatively further away, all else being equal. In various embodiments, resources may be associated with any arbitrary set of identifying attributes and biased at least in part on that basis. For example, resources may specialize in large files, classified information, or objects with unusually high vowel counts in their name. Resource requirements that share these attributes are then directed, e.g., by employing biasing as described herein, to the associated resource(s). - Resource allocation can be made significantly more efficient by providing arbitrary categorical distributions that reflect varying costs of object placement. Biases may also vary per object as well as globally. Any arbitrary external method can be used to generate the bias list on a per-object basis using statistical methods, policy inputs, or any other available data. These biases can be viewed as a vector of desired probabilities, object share, or relative affinity.
- Storage locations in a distributed network are not the only critical resource. Network bandwidth can be contended as well. Existing hash function schemes such as the monotone ranged hash function unfairly distribute work load in the event of a node failure or bias change. Ranged hashing relies on picking the closest bucket to an object's location in an n dimensional vector space. When the number of buckets is significantly larger than the dimensionality of the vector space the result is extremely unbalanced network loading when relocating objects. This kind of unfair loading can place stress on a small number of nodes. The newly stressed nodes can cause failures to cascade, further exacerbating the problem.
- A hash method can be constructed using multiplicative biases as coefficients to be multiplied by a family of independent hash functions. The preference list is then defined as the ordered list of the biased hashes. The method constructed here provides the desirable properties discussed above. Specifically, this method can bias decisions in any arbitrary way without causing unnecessary relocations and without unfairly spreading the necessary relocations among a small subset of available buckets. Not only multiplication, but any function taking a bias and a hash value as an input and providing a biased hash as an output can provide this property as long as the function is monotonically increasing with regard to either input parameter.
- In various embodiments, a computer and/or or computer network has associated with it storage that resides on several different physical storage devices. When deciding where to place an object this could use total disk space or other metrics of storage and/or retrieval cost as an input to determine biases. The hash function would then take the object identifier of the object to be stored as its input. If some storage device has half the storage capacity it may receive about half of the write requests. Because of the repeatability of the hash implementation, any other computer in the network can determine where to retrieve the same file by computing the hash on the object identifier using the same set of biases.
- Another possible implementation is for load balancing client sessions. A load balancing server may receive requests from users to connect to a server. In this instance the load balancing server could use server capacity as an input to determine biases for the hash function. The client identifier or network address can be used as the input to the hash function to determine a server to service the request. This scheme maximizes the probability that the load balancing server will select a server which already has an initialized session for the given user without requiring the load balancing server to keep a table mapping client identifiers to servers. This scheme also allows multiple load balancing servers to work in parallel without sharing state per connection. As long as the set of load balancing servers all share the same biases they will reach the same decisions about which servers should handle a given client's traffic.
- Call centers could similarly use the algorithm to route incoming calls to a specific worker. Biases would then be based on the relative efficiency of the various workers. The input to the hash function could be the caller ID information of the incoming call. This would give each individual caller a high chance of reaching the same worker on subsequent calls while simultaneously providing appropriate workloads to each worker. Again, no record of association between past calls and the workers who serviced them would need to be kept or consulted.
- In any of the above examples if there is no difference in probability desired between the various buckets then equivalent bias values can be used for all buckets.
-
FIG. 2 is a flow diagram illustrating an embodiment of a process for storing data objects in a distributed storage environment. Examples of a distributed storage environment include a network environment, such as the example shown inFIG. 1 , in which a plurality of storage nodes (e.g., servers, etc.) are connected with one or more other nodes via network connections, and a single logical and/or physical node comprising a plurality of storage disks, drives, and/or other storage devices. In the example shown inFIG. 2 , for each object to be stored, an ordered list of candidate storage nodes is generated (202). The object is stored in the first m nodes in the ordered list (204). For example, if candidate nodes A, B, C, and D were available, an ordered list in which the nodes were listed in an order of preference for having the object stored on that node would be generated, e.g., B-D-C-A. If the prescribed number of nodes m on which the object was to be stored was m=2, for example to achieve single redundancy, then in this example nodes B and D would be selected and the object would be stored on those two nodes. - In some embodiments, exceptions may be made to storing the object in the first m nodes in the sorted preference list. For example, exceptions may be made based on availability and capacity constraints as determined at run time. In some embodiments, if a node is determined to not be available, that node will not be selected even if it is within the first m nodes on the list. Instead, in some embodiments, the system will choose the first m available resources on the list.
- In various embodiments, techniques described herein to select one or more storage nodes on which to store a data object are used to find the data object quickly after they have been stored. By knowing the preference list of where the object would be stored the probability is very high in various embodiments that the first location checked will be the correct location.
-
FIG. 3 is a flow diagram illustrating an embodiment of a process for generating an ordered list of candidate storage nodes. In various embodiments, step 202 ofFIG. 2 includes the process ofFIG. 3 . In the example shown, for each object received a biased node-specific hash is computed with respect to that object for each of a plurality of candidate nodes (302). For example, for a receivedobject 1, a biased node-specific hash would be computed for each of nodes A, B, C, and D in the example above. In some embodiments, a set of unbiased node-specific hashes are computed for the object, e.g., A1, B1, C1, and D1, and a node-specific bias is applied to each to generate a set of biased node-specific hashes, e.g., A1′, B1′, C1′, and D1′. Referring further toFIG. 3 , once a biased node-specific hash has been computed for the received object for each of the candidate nodes (302), an ordered list of candidate nodes is generated by sorting the candidate nodes by their respective biased hash values for the object (304). - In some embodiments, a biased hash list of resources is generated by hashing data associated with a resource requirement to a point in space. The resources themselves are represented by points in space and the biased hash list is formed by taking the distances from the resource requirement's point to all of the resource points and ordering these distances from shortest to longest. Adjusting biases of the various resources is done in various embodiments by moving the points in space, either through iterative methods, control systems, or statistical methods such as Correspondence Analysis, Principal Component Analysis, and other techniques.
-
FIG. 4 is a flow diagram illustrating an embodiment of a process for generating a node-specific hash for a candidate node with respect to an object. In various embodiments, step 302 ofFIG. 3 includes the process ofFIG. 4 . In the example shown, a node-specific hash is computed for the object to be stored (402). A node-specific bias is applied to the computed node-specific hash (404) to generate the biased node-specific hash for the object. In various embodiments, the node-specific hash values are biased by one or more techniques, for example by multiplying a hash value computed based on data associated uniquely (or likely so) with the object and data associated uniquely with the node, on the one hand, by a multiplicative bias. An example involving the use of multiplicative biases is described below. In other embodiments, the bias may in addition or instead involve addition or one or more other operations. In various embodiments, the biasing may be based on configured and/or configurable factors, such as differing capacity or remaining capacity, processing power and/or speed, communication speed, geography, network topology, etc. For example, a hash value for a storage node known or determined to be near a location from which a data object has most often and/or most recently been accessed may be biased in a way that makes it more likely that node will be in the first m nodes on the ordered list. - In some embodiments, statistics reflecting the location(s) from which a stored object has been accessed are kept. In some embodiments, object access statistics are evaluated and if an object is accessed solely or primarily and/or most recently from a location or set of locations associated with a particular storage node or subset of nodes, then for that object the node-specific hash value for that node or each of those nodes may be biased in an amount that creates a bias in favor of the object being stored, e.g., in a subsequent “save” operation, at that node (or those nodes). In some embodiments, access information (e.g., statistics) may be observed and an object's association with a particular storage node may be reflected in an object's name and/or an addition or other annotation thereto, and/or in (other) associated metadata. In some embodiments, this portion of the name is used at storage time to compute a bias for the node(s) indicated by such annotation as being associated particularly with the object. In various embodiments, other factors may also be reflected in biasing, including configuration information, storage node statistics (e.g., space remaining, etc.), etc.
- As noted above, techniques described herein facilitate the efficient use of newly added nodes and/or avoid inefficiency (e.g., excessive churn) when nodes are removed (e.g., due to failure, for maintenance, etc.).
-
FIG. 5 is a flow diagram illustrating an embodiment of a process for relocating objects when a storage node is removed. In the example shown, when a node is removed (502), for each object that was stored on the affected node a new ordered list of candidate storage nodes is computed (504). In some embodiments, the process ofFIGS. 2-4 is used to compute new ordered lists. Objects formerly stored on the removed node are copied, for example from other nodes on which they were stored redundantly prior to removal of the removed node, as needed to ensure that each is stored on the first m nodes in the object's new ordered list of nodes (506). Because the hash values and biases are independent between storage nodes, in some embodiments the new ordered list may be formed by simply removing the appropriate storage node(s) from the prior ordered list. -
FIG. 6 is a flow diagram illustrating an embodiment of a process for populating an added storage node. In the example shown, when a node is added (602), for each object eligible to be stored on the added node a new ordered list of candidate storage nodes is computed (604). In some embodiments, the process ofFIGS. 2-4 is used to compute new ordered lists. Objects for which the added node is included in the first m nodes on the object's new ordered list are copied to the added node, for example from other nodes on which they were stored prior to addition of the added node (606). Because the hash values and biases are independent between storage nodes, in some embodiments the new ordered list may be formed by only calculating the values for the newly added node(s) and inserting them in the appropriate location. -
FIG. 7 is a flow diagram illustrating an embodiment of a process for determining biases for candidate nodes. In the example shown, configuration data regarding candidate nodes is received (702). Examples of configuration data include a storage capacity of each node; a location preference value for one or more nodes; an identification of a type, size, source, etc. of data object preferred to be stored on one or more nodes; and an identification of statistics to be gathered and/or used to determine biases. When an object is to be stored, node-bias related information for the object is determined (704). For example, access statistics for the object are evaluated in some embodiments. The node configuration information (702) and object information (704) are used to compute for each candidate node a node-specific bias to be applied to a node-specific hash for that object with respect to that node (706). -
FIG. 8 is a block diagram illustrating an embodiment of a node selection system. In the example shown,node selector 112 ofFIG. 1 includes anode selection engine 802 connected via acommunication interface 804, such as a network interface card or other interface, to a network (not shown inFIG. 8 ), such asInternet 104 ofFIG. 1 . Thenode selection engine 802 in some embodiments is implemented by one or more processors configured to execute software or other computer instructions. Thenode selection engine 802 is configured to receive via communication interface 804 a request to store a data object. Thenode selection engine 802 consults anode configuration datastore 806 and objectaccess statistics 808 to determine for each candidate node, with respect to the object, a biased node-specific hash value (e.g., using the process ofFIG. 7 ). Thenode selection engine 802 sorts the nodes into an order based on the biased hash values (e.g., highest to lowest) and causes the object to be stored on the first m nodes on the list. The location(s) in which the object is stored is stored in alocation index 810. In various embodiments, thenode configuration data 806,access statistics 808, andlocation index 810 are stored in one or more of a memory, a disk or other drive, and/or another storage device comprisingnode selector 112. Anaccess server 812 is configured to receive viacommunication interface 804 requests to access previously stored objects, and to uselocation index 810 to provide in response an indication of one or more nodes on which a requested object is stored. In some embodiments, thelocation index 810 may be omitted because the information is already available through calculation of the biased hash function. In some embodiments, temporary resource availability or capacity limitations may cause objects to skip earlier nodes in the hash list and instead reside on later storage nodes. In some embodiments,location index 810 is used order to quickly find all redundant copies as soon as any one copy is found. - In the equations discussed below, which are used in various embodiments, the following terms and definitions (1) through (12) are used:
- (1) The set of buckets is denoted by K. Individual buckets are denoted with a subscript for bucket number such as K1.
- (2) |K| is the total number of buckets.
- (3) The set of multiplicative biases is denoted as B. Individual biases from the set are denoted with a subscript bucket number such as B1.
- (4) |B| is the maximum value in the B.
- (5) The set of hash functions H are all independent and uniformly distributed over the range of 0 to |H|. Thus, H1(x) is the hash of item x using the first function in the set. Constructing a large set of distinct functions may be difficult. For a sufficiently wide hash function such as SHA2 the hashes can be generated from different bit ranges of the output. Other approaches are to pass different initial values to the same hash function, or to prepend or append data to generate the alternate functions.
- (6) W(B; n) is the probability of Kn having the highest preference given bias set B and randomly selected hash values.
- (7) Pn(B; t) is the probability that BnHn(x)=t given bias set B and a randomly selected item identifier x.
- (8) Dn(B; t)=∫0 t Pn(B; t)dt is the cumulative distribution function associated with Pn(B; t).
- (9) G is the set of unique biases in ascending order. Thus G1 is the minimum element of set B. |G| is the number of elements in set G. Thus G|G| is the maximum element of set B. For convenience, take G0=0.
- (10) Cn is the count of elements of set B that are equal to Gn.
- (11) R(n)=πk=n |G|Gn C
n - (12) Q(x)=πk=1 |K|Gk(x)
- Equation (11) can be viewed as the product of all biases. Equation (12) can be viewed as the cumulative distribution function of all buckets.
- Determining Selection Probabilities from Bias Sets. In various embodiments, the preference list of buckets is ordered such that the higher preference goes to the bucket Kn where BnHn(x)≧BmHm(x) for all m. For the rare case that two or more preferences are exactly equal several different tie breaker schemes can be used. One method would be to take the population that tied for first place and repeat the same hash algorithm with a different set of hash functions from the same family and possibly adjusted biases. Other methods could be to make a selection from the subset using a more traditional hash function or to always allow the lowest numbered bucket to be selected. In other words, the preferred bucket is the one with the largest resulting product when its bias is multiplied by its hash value for a given object.
- For discrete hash functions the probability of a node being tied for selection or being selected is found by:
-
- For continuous hash functions the probability of a node being tied for selection or being selected is found by:
-
- where m is selected such that Gm=B.
- If the desired selection probabilities are known the biases can be determined by using the above equations via the Newton-Raphson method or any other iterative approximation method. Unfortunately these iterative processes can be slow for modem implementations when biases or hash bucket populations change frequently.
- A faster convergence to the desired probability set can be obtained by observing that simultaneously changing several Bn where Bn=G|G| will not change the relative proportions of buckets with lesser biases as long as the changed value still satisfies Bn=G|G|. Thus it is possible to begin with an arbitrary selection for G1, solve for the appropriate G2 to obtain the desired ratio of selection rates, and then repeat in ascending order of bias until all buckets have been assigned an appropriate bias.
- Exact values for probabilities are not ideal for a cost-based bias system that is used for choosing an assignment of data objects to nodes for storage. The desired probabilities should be scaled depending on the relative number of nodes and the percentage desired by each node. While affinities may not sum to 1 their ratios represent relative desired probabilities. Nodes that have a very small affinity for some object (and thus a high cost for serving it) should get even less, since they are not likely to be asked to service requests for that object. Nodes that have a moderately strong affinity for an object should receive slightly more than desired. These are the most likely places for object requests to be serviced. Finally, nodes that have an extremely strong affinity for an object (such as a 90% requested selection rate) should be given much but not all of the data they request in order to preserve properties of robustness in the event of failure and to more evenly distribute load on the network.
- A method of approximating a set of biases from a set of desired selection probabilities while including the desired deviation is needed. Assuming Xn is the requested affinity from an inverse cost metric and
X is the mean of set X, one such method of calculation with desirable properties is -
- The values Y and Z are both constants. This function was chosen because the first term closely approximates the inverse function of the dominant term of the polynomial W(B; n) in the continuous case. This approximation serves to mostly linearize the relation between inset Xn and Wn. The relationship is not quite linear because the root taken is slightly larger than the ideal linearizing root over the full range of the polynomial. This approximation also remains close for the discrete case. The second term provides the correction necessary to adjust the slope such that the properties discussed above are provided. Larger values of Y tend to give more weight to the first term in all cases while larger values of Z tend to give more weight to the first term as the bucket population increases.
- While in a number of examples described in detail herein the resource requirement is storage of a data object and the resources (“buckets”) are candidate storage nodes, the techniques described herein may be used in other contexts as well, including without limitation call center call routing and network resource or other load balancing, as noted above. Likewise, while biasing calculations used in some embodiments have been described in detail, in other embodiments other methods of determining biases and applying same to node- (or other bucket-) specific hash values (or other values computed to achieve a desired distribution) may be used.
- Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Claims (20)
1. A resource selection system, comprising:
a communication interface configured to receive a communication comprising an operation associated with a data object; and
a processor coupled to the communication interface and configured to:
compute for each of a plurality of resources, with respect to the data object, a resource-specific affinity value;
apply a bias to each of at least a subset of the resource-specific affinity values, wherein the bias comprises a function that is monotonically increasing; and
use the resource-specific affinity values to select a resource to fulfill the operation associated with the data object.
2. The system of claim 1 , wherein the function is monotonically increasing in terms of one or more of an input hash and a desired bias.
3. The system of claim 1 , wherein the bias comprises a resource-specific bias.
4. The system of claim 1 , wherein the bias comprises a resource-specific bias determined for a particular resource based at least in part on a data associated with that particular resource.
5. The system of claim 1 , wherein the bias is determined based at least in part on a data associated with the operation.
6. The system of claim 1 , wherein the bias is determined for a particular resource based at least in part on a data reflecting a relationship between the data object and the particular resource.
7. The system of claim 6 , wherein the bias is determined based at least in part on a stored data reflecting a past relationship between the resource and the data object.
8. The system of claim 6 , wherein the bias is determined based at least in part on a stored data reflecting a past relationship between the resource and a user with which the operation is associated.
9. The system of claim 1 , wherein the bias comprises a multiplicative bias.
10. The system of claim 1 , wherein the operation is associated with storing the data object.
11. The system of claim 1 , wherein the operation is associated with finding the data object in storage.
12. The system of claim 1 , wherein the plurality of resources comprises a plurality of storage nodes.
13. The system of claim 1 , wherein a consistent hash function is used to compute the resource-specific affinity values.
14. The system of claim 1 , wherein the processor is further configured to:
detect that a removed resource has been removed from the plurality of resources; and
remove a resource-specific affinity value associated with the removed resource from the resource-specific affinity values.
15. The system of claim 1 , wherein the processor is further configured to:
detect that an added resource has been added to the plurality of resources; and
compute a resource-specific affinity value for the added resource with respect to the data object.
16. A method of selecting a resource, comprising:
computing, using a processor, for each of a plurality of resources, with respect to the data object, a resource-specific affinity value;
applying a bias to each of at least a subset of the resource-specific affinity values, wherein the bias comprises a function that is monotonically increasing; and
using the resource-specific affinity values to select a resource to fulfill the operation associated with the data object.
17. The method of claim 16 , wherein a consistent hash function is used to compute the resource-specific affinity values.
18. The method of claim 16 , further comprising:
detecting that a removed resource has been removed from the plurality of resources; and
removing a resource-specific affinity value associated with the removed resource from the resource-specific affinity values.
19. The method of claim 16 , further comprising:
detecting that an added resource has been added to the plurality of resources; and
computing a resource-specific affinity value for the added resource with respect to the data object.
20. A computer program product for selecting a resource, the computer program product being embodied in a computer readable storage medium and comprising computer instructions for:
computing for each of a plurality of resources, with respect to the data object, a resource-specific affinity value;
applying a bias to each of at least a subset of the resource-specific affinity values, wherein the bias comprises a function that is monotonically increasing; and
using the resource-specific affinity values to select a resource to fulfill the operation associated with the data object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/857,024 US20130283287A1 (en) | 2009-08-14 | 2013-04-04 | Generating monotone hash preferences |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US27429509P | 2009-08-14 | 2009-08-14 | |
US12/806,236 US8438574B1 (en) | 2009-08-14 | 2010-08-05 | Generating monotone hash preferences |
US13/857,024 US20130283287A1 (en) | 2009-08-14 | 2013-04-04 | Generating monotone hash preferences |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/806,236 Continuation US8438574B1 (en) | 2009-08-14 | 2010-08-05 | Generating monotone hash preferences |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130283287A1 true US20130283287A1 (en) | 2013-10-24 |
Family
ID=48183383
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/806,236 Expired - Fee Related US8438574B1 (en) | 2009-08-14 | 2010-08-05 | Generating monotone hash preferences |
US13/857,024 Abandoned US20130283287A1 (en) | 2009-08-14 | 2013-04-04 | Generating monotone hash preferences |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/806,236 Expired - Fee Related US8438574B1 (en) | 2009-08-14 | 2010-08-05 | Generating monotone hash preferences |
Country Status (1)
Country | Link |
---|---|
US (2) | US8438574B1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140156330A1 (en) * | 2012-12-04 | 2014-06-05 | Genesys Telecommunications Laboratories, Inc. | Distributed agent reservation in sip cluster |
US10346217B1 (en) * | 2016-11-21 | 2019-07-09 | Amazon Technologies, Inc. | Best-effort key affinity workload sharding |
WO2020049334A1 (en) * | 2018-09-04 | 2020-03-12 | Telefonaktiebolaget Lm Ericsson (Publ) | A decentralized load-balancing method for resource/traffic distribution |
WO2020210165A1 (en) * | 2019-04-09 | 2020-10-15 | Cisco Techonology, Inc. | Distributed object placement, replication, and retrieval for cloud-scale storage and data delivery |
US11375013B2 (en) | 2020-09-25 | 2022-06-28 | Microsoft Technology Licensing, Llc | Deterministic mapping and uniform routing of items to physical resources using hash values |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9311359B2 (en) | 2013-01-30 | 2016-04-12 | International Business Machines Corporation | Join operation partitioning |
US9317548B2 (en) | 2013-01-30 | 2016-04-19 | International Business Machines Corporation | Reducing collisions within a hash table |
US9367556B2 (en) | 2013-06-14 | 2016-06-14 | International Business Machines Corporation | Hashing scheme using compact array tables |
US9471710B2 (en) | 2013-06-14 | 2016-10-18 | International Business Machines Corporation | On-the-fly encoding method for efficient grouping and aggregation |
US10097628B2 (en) * | 2014-01-29 | 2018-10-09 | Microsoft Technology Licensing, Llc | Resource affinity in a dynamic resource pool |
US9672248B2 (en) | 2014-10-08 | 2017-06-06 | International Business Machines Corporation | Embracing and exploiting data skew during a join or groupby |
US10303791B2 (en) | 2015-03-20 | 2019-05-28 | International Business Machines Corporation | Efficient join on dynamically compressed inner for improved fit into cache hierarchy |
US9922064B2 (en) | 2015-03-20 | 2018-03-20 | International Business Machines Corporation | Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables |
US10650011B2 (en) | 2015-03-20 | 2020-05-12 | International Business Machines Corporation | Efficient performance of insert and point query operations in a column store |
US10108653B2 (en) | 2015-03-27 | 2018-10-23 | International Business Machines Corporation | Concurrent reads and inserts into a data structure without latching or waiting by readers |
US10831736B2 (en) | 2015-03-27 | 2020-11-10 | International Business Machines Corporation | Fast multi-tier indexing supporting dynamic update |
US10540207B1 (en) * | 2018-07-18 | 2020-01-21 | International Business Machines Corporation | Fast, low memory, consistent hash using an initial distribution |
US11429452B2 (en) * | 2020-04-16 | 2022-08-30 | Paypal, Inc. | Method for distributing keys using two auxiliary hashing functions |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6173313B1 (en) * | 1998-06-24 | 2001-01-09 | Oracle Corporation | Methodology for hosting distributed objects at a predetermined node in a distributed system |
US6243788B1 (en) * | 1998-06-17 | 2001-06-05 | International Business Machines Corporation | Cache architecture to enable accurate cache sensitivity |
US20020143965A1 (en) * | 2001-04-03 | 2002-10-03 | International Business Machines Corporation | Server application initiated affinity within networks performing workload balancing |
US20030101164A1 (en) * | 2001-10-12 | 2003-05-29 | Marc Pic | Method of indexing and comparing multimedia documents |
US20080040474A1 (en) * | 2006-08-11 | 2008-02-14 | Mark Zuckerberg | Systems and methods for providing dynamically selected media content to a user of an electronic device in a social network environment |
US20080065618A1 (en) * | 2006-07-25 | 2008-03-13 | Maluf David A | Indexing for rapid database searching |
US20090034537A1 (en) * | 2007-07-31 | 2009-02-05 | Oracle International Corporation | Temporal affinity-based routing of workloads |
US7555544B1 (en) * | 2004-03-17 | 2009-06-30 | Sun Microsystems, Inc. | Implementation of affinities in high availability computer system clusters |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060036747A1 (en) * | 2004-07-28 | 2006-02-16 | Galvin James P Jr | System and method for resource handling of SIP messaging |
US8037169B2 (en) * | 2005-05-18 | 2011-10-11 | Oracle International Corporation | Determining affinity in a cluster |
US7941805B2 (en) * | 2006-08-15 | 2011-05-10 | International Business Machines Corporation | Affinity dispatching load balancer with precise CPU consumption data |
US7689682B1 (en) * | 2006-08-16 | 2010-03-30 | Resource Consortium Limited | Obtaining lists of nodes of a multi-dimensional network |
-
2010
- 2010-08-05 US US12/806,236 patent/US8438574B1/en not_active Expired - Fee Related
-
2013
- 2013-04-04 US US13/857,024 patent/US20130283287A1/en not_active Abandoned
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6243788B1 (en) * | 1998-06-17 | 2001-06-05 | International Business Machines Corporation | Cache architecture to enable accurate cache sensitivity |
US6173313B1 (en) * | 1998-06-24 | 2001-01-09 | Oracle Corporation | Methodology for hosting distributed objects at a predetermined node in a distributed system |
US20020143965A1 (en) * | 2001-04-03 | 2002-10-03 | International Business Machines Corporation | Server application initiated affinity within networks performing workload balancing |
US20030101164A1 (en) * | 2001-10-12 | 2003-05-29 | Marc Pic | Method of indexing and comparing multimedia documents |
US7555544B1 (en) * | 2004-03-17 | 2009-06-30 | Sun Microsystems, Inc. | Implementation of affinities in high availability computer system clusters |
US20080065618A1 (en) * | 2006-07-25 | 2008-03-13 | Maluf David A | Indexing for rapid database searching |
US20080040474A1 (en) * | 2006-08-11 | 2008-02-14 | Mark Zuckerberg | Systems and methods for providing dynamically selected media content to a user of an electronic device in a social network environment |
US20090034537A1 (en) * | 2007-07-31 | 2009-02-05 | Oracle International Corporation | Temporal affinity-based routing of workloads |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10181974B2 (en) * | 2012-12-04 | 2019-01-15 | Genesys Telecommunications Laboratories, Inc. | Distributed agent reservation in SIP cluster |
US20140156330A1 (en) * | 2012-12-04 | 2014-06-05 | Genesys Telecommunications Laboratories, Inc. | Distributed agent reservation in sip cluster |
US9344569B2 (en) | 2012-12-04 | 2016-05-17 | Genesys Telecommunications Laboratories, Inc. | System and method for addition and removal of servers in server cluster |
US9357072B2 (en) | 2012-12-04 | 2016-05-31 | Genesys Telecommunications Laboratories, Inc. | Dialed string matching and call cost minimization in dial plan |
US9590840B2 (en) | 2012-12-04 | 2017-03-07 | Genesys Telecommunications Laboratories, Inc. | Distributed event delivery |
US10129073B2 (en) | 2012-12-04 | 2018-11-13 | Genesys Telecommunications Laboratories, Inc. | System and method for addition and removal of servers in server cluster |
US9185220B2 (en) | 2012-12-04 | 2015-11-10 | Genesys Telecommunications Laboratories, Inc. | Distributed event delivery |
US10382249B2 (en) | 2012-12-04 | 2019-08-13 | Genesys Telecomminucations Laboratories, Inc. | Logging in multithreaded application |
US10346217B1 (en) * | 2016-11-21 | 2019-07-09 | Amazon Technologies, Inc. | Best-effort key affinity workload sharding |
WO2020049334A1 (en) * | 2018-09-04 | 2020-03-12 | Telefonaktiebolaget Lm Ericsson (Publ) | A decentralized load-balancing method for resource/traffic distribution |
WO2020210165A1 (en) * | 2019-04-09 | 2020-10-15 | Cisco Techonology, Inc. | Distributed object placement, replication, and retrieval for cloud-scale storage and data delivery |
US11113114B2 (en) | 2019-04-09 | 2021-09-07 | Cisco Technology, Inc. | Distributed object placement, replication, and retrieval for cloud-scale storage and data delivery |
CN113711186A (en) * | 2019-04-09 | 2021-11-26 | 思科技术公司 | Distributed object placement, replication, and retrieval for cloud-scale storage and data transfer |
US11375013B2 (en) | 2020-09-25 | 2022-06-28 | Microsoft Technology Licensing, Llc | Deterministic mapping and uniform routing of items to physical resources using hash values |
Also Published As
Publication number | Publication date |
---|---|
US8438574B1 (en) | 2013-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8438574B1 (en) | Generating monotone hash preferences | |
AU2015221548B2 (en) | A computer implemented method for dynamic sharding | |
Wolf et al. | DASD dancing: A disk load balancing optimization scheme for video-on-demand computer systems | |
US7447839B2 (en) | System for a distributed column chunk data store | |
US7599941B2 (en) | Transparent redirection and load-balancing in a storage network | |
JP5798248B2 (en) | System and method for implementing a scalable data storage service | |
US8214388B2 (en) | System and method for adding a storage server in a distributed column chunk data store | |
US6832248B1 (en) | System and method for managing usage quotas | |
US7299250B2 (en) | File availability in distributed file storage systems | |
US20100057935A1 (en) | Record medium with a load distribution program recorded thereon, load distribution method, and load distribution apparatus | |
EP1564638B1 (en) | A method of reassigning objects to processing units | |
US7457935B2 (en) | Method for a distributed column chunk data store | |
US20130318123A1 (en) | Mechanism for co-located data placement in a parallel elastic database management system | |
US20080109450A1 (en) | System and program for maintaining a namespace of filesets accessible to clients over a network | |
JP5203733B2 (en) | Coordinator server, data allocation method and program | |
US20070220028A1 (en) | Method and system for managing load balancing in data-processing system | |
CN107465729B (en) | Virtual node copy adjusting system and method based on access time ranking | |
Silberstein et al. | Efficient bulk insertion into a distributed ordered table | |
CN116982035A (en) | Measurement and improvement of index quality in distributed data systems | |
US7647592B2 (en) | Methods and systems for assigning objects to processing units | |
US8161076B1 (en) | Generation and use of a data structure for distributing responsibilities among multiple resources in a network storage system | |
Nwe et al. | A consistent replica selection approach for distributed key-value storage system | |
Chen et al. | Revisiting consistent hashing with bounded loads | |
US20080091740A1 (en) | Method for managing a partitioned database in a communication network | |
Yue | Distributing Data with Zero Migration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: TRANSLATTICE, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LYLE, MICHAEL P.;ROSS, ROBERT F.;REEL/FRAME:034988/0467 Effective date: 20100928 |
|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TRANSLATTICE, INC.;REEL/FRAME:035190/0742 Effective date: 20150311 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |