WO2015179092A1 - System and method for supporting a distributed data structure in a distributed data grid - Google Patents

System and method for supporting a distributed data structure in a distributed data grid

Info

Publication number
WO2015179092A1
WO2015179092A1 (PCT/US2015/028335)
Authority
WO
WIPO (PCT)
Prior art keywords
queue
bucket
distributed data
distributed
data structure
Prior art date
Application number
PCT/US2015/028335
Other languages
French (fr)
Inventor
Brian Keith OLIVER
Jonathan Knight
Original Assignee
Oracle International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 14/676,030 (US10243869B2)
Application filed by Oracle International Corporation
Priority to EP15728228.6A (EP3146430B1)
Priority to CN201580026741.XA (CN106462475B)
Priority to JP2016568587A (JP6833516B2)
Publication of WO2015179092A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication

Definitions

  • FIG. 3 shows an illustration of supporting a bucket in a distributed data structure, in accordance with an embodiment of the invention.
  • a bucket 301 in the distributed data grid 300 is configured to contain a number of elements and may contain one or more elements, e.g. elements 311-315 in an internal data structure 310, for a distributed data structure. Additionally, the bucket 301 can maintain various bucket states 303, which can be used for accessing the elements 311-315 stored in the bucket 301.
  • the bucket 301 is configured with a capacity, which is the maximum number of elements that it can contain.
  • the bucket 301 is thus configured to contain a number of elements of a distributed data structure.
  • the bucket 301 can hold zero or more elements (up to the capacity) while in use.
  • the capacity of the bucket 301 can be tuned for supporting different distributed data structures (in order to improve the performance for specific data access patterns).
  • the bucket 301 can be replicated to other nodes in the distributed data grid 300.
  • a back-up bucket 302, which contains one or more elements (e.g. elements 321-325 in an internal data structure 320) and the associated bucket state 304, can take over when the bucket 301 is lost (see the sketch below).
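A minimal sketch of such a bucket in plain Java; the class and field names are illustrative assumptions, since the text does not prescribe an API, and replication to the back-up owner is handled by the grid and is not shown:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class Bucket<E> {
        private final int capacity;                            // maximum number of elements (tunable)
        private final Deque<E> elements = new ArrayDeque<>();  // internal data structure

        public Bucket(int capacity) {
            this.capacity = capacity;
        }

        public boolean isFull()  { return elements.size() >= capacity; }
        public boolean isEmpty() { return elements.isEmpty(); }

        // Adds an element if the bucket has remaining capacity.
        public boolean offer(E element) {
            if (isFull()) {
                return false;          // caller must move on to another bucket
            }
            elements.addLast(element);
            return true;
        }

        // Removes and returns the oldest element, or null if the bucket is empty.
        public E poll() {
            return elements.pollFirst();
        }
    }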
  • FIG. 4 shows an illustration of performing an operation on a distributed data structure using a client process, in accordance with an embodiment of the invention.
  • a distributed data structure 401 in a distributed data grid 400 can include a plurality of buckets, e.g. buckets 411-416.
  • a state owner process 402 can be responsible for holding the state information 403 for the distributed data structure 401.
  • a client process 404 can use (or include) a named data structure 410, which holds a local copy of the state information 405 for the distributed data structure 401.
  • the system can initialize the state information 405, when the client process 404 first connects to the distributed data structure 401.
  • multiple client processes can access the distributed data structure 401 simultaneously.
  • the local copy of the state information 405 on the client process 404 may become stale when another client process has changed the state 403 of the distributed data structure 401.
  • the logic in the named data structure 410 (i.e. used by the client process 404) can take into account that the state 405 may be stale. Thus, there is no need for refreshing the state information 405 before every operation.
  • the named data structure 410 (on the client process 404) can send a message to the state owner process 402 for refreshing the state 405.
  • the state owner process 402 can send the new state 403 back to the client process 404, which in turn can perform the operation 420 on the bucket 416 via the process 421.
  • the bucket may not be removed from the distributed data structure 401, even when the bucket becomes empty (i.e. when all of the elements have been polled or removed from the bucket). This can be beneficial for supporting the operation of multiple client processes.
  • FIG. 5 illustrates an exemplary flow chart for supporting a distributed data structure in a distributed data grid, in accordance with an embodiment of the invention.
  • the system can provide a plurality of buckets in the distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure.
  • the buckets may contain zero or more elements (up to the capacity) of a distributed data structure at different points during use.
  • a state owner process can hold state information for the distributed data structure.
  • the state owner process can provide the state information for the distributed data structure to a client process.
  • a distributed data grid can support a distributed queue.
  • the distributed queue is a queue of queues (or buckets) with each bucket being a sub-queue in a parent queue.
  • FIG. 6 shows an illustration of supporting a distributed queue in a distributed data grid, in accordance with an embodiment of the invention.
  • a distributed queue 601 can include a queue of buckets, e.g. buckets 611-616, in a distributed data grid 600.
  • Each of the plurality of buckets (e.g. buckets 611-616) is configured to contain a number of elements. In use the buckets may contain zero, one, or more elements at different times.
  • the elements can be data elements comprising, for example, units of data, calls, requests, responses, messages, transactions, and/or events, depending upon the application of the distributed data structure.
  • a state owner process 602 can be responsible for holding the state 603 for the distributed queue 601.
  • the state 603 can include a head pointer to the current head bucket 611 in the distributed queue 601 and a tail pointer to the tail bucket 616 in the distributed queue 601.
  • a client process 604 can obtain the current state 603 of the distributed queue 601 from the queue state owner 602, and perform various operations (such as the offer and poll operations) on the distributed queue 601.
  • the client process 604, which can be accessed by the user processes 606-607, can use (or include) a named queue 610.
  • the named queue 610 can maintain a local copy of the queue state 605, which includes a head pointer and a tail pointer in its own view.
  • the head pointer in the queue state 605 may point to the head bucket 611 and the tail pointer in the queue state 605 may point to the tail bucket 616.
  • the queue state 605 structure can be initialized when the client process 604 first connects to the distributed queue 601. Furthermore, the client process 604 does not need to refresh its queue state 605 before every operation, since the logic in named queue 610 (for performing the offer and poll operations) can take into account that the queue state 605 maintained may be stale.
  • the client process 604 can perform an offer operation on the distributed queue 601, by offering (or adding) one or more elements to the tail bucket 616 (i.e. the end) of the distributed queue 601. Also, the client process 604 can perform a poll operation on the distributed queue 601, by polling (or removing) one or more elements from the head bucket 611 (i.e. the front) of the distributed queue 601. Additionally, the client process 604 can perform a peek operation on the distributed queue 601 to obtain the value of one or more elements (without removing the elements from the distributed queue 601).
  • Implicit transactions may be formed around the head bucket 611 and the tail bucket 616 of the distributed queue 601 for handling the multiple attempts by different client processes.
  • the system can apply a lock mechanism on the head bucket 611 and/or the tail bucket 616, which are under contention.
  • the system can take advantage of a request queue, which allows the different processes to wait for processing in order.
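The queue state described above might be modeled as follows; this is a sketch with assumed names, holding only the head and tail bucket pointers plus a version number (used later to detect stale client views when bucket IDs are reused):

    public class QueueState {
        private long headBucketId;   // bucket at the front of the distributed queue
        private long tailBucketId;   // bucket at the end of the distributed queue
        private long version;        // bumped when bucket IDs wrap around and are reused

        public QueueState(long headBucketId, long tailBucketId) {
            this.headBucketId = headBucketId;
            this.tailBucketId = tailBucketId;
        }

        public synchronized long headBucketId() { return headBucketId; }
        public synchronized long tailBucketId() { return tailBucketId; }
        public synchronized long version()      { return version; }

        synchronized void bumpVersion() { version++; }
    }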
  • FIG. 7 shows an illustration of a bucket in a distributed queue, in accordance with an embodiment of the invention.
  • a bucket 701 in a distributed queue 700 contains a queue of elements, e.g. elements 711-715 in a queue 710.
  • the bucket 701 can maintain various bucket states 702, which can be used for accessing the different elements 711-715 stored in the bucket 701.
  • the bucket states 702 can include a pointer to a head element 711 in the bucket 701 and a pointer to a tail element 715 in the bucket 701.
  • the elements 711-715 in the bucket 701 can be stored using the same process (which stores the bucket 701).
  • the pointers to the head and tail elements can contain a special value indicating that the bucket 701 is empty.
  • the bucket 701 can contain a flag to indicate whether the bucket 701 is the last bucket in the queue.
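The per-bucket state just described might look like the following sketch; the names and the sentinel value are illustrative assumptions:

    public class BucketState {
        public static final long EMPTY = -1;   // special value: the bucket holds no elements

        private long headElementId = EMPTY;    // pointer to the head element in the bucket
        private long tailElementId = EMPTY;    // pointer to the tail element in the bucket
        private boolean lastBucket;            // true if this is the last bucket in the queue

        public boolean isEmpty()      { return headElementId == EMPTY; }
        public boolean isLastBucket() { return lastBucket; }

        public void setLastBucket(boolean last) { this.lastBucket = last; }
    }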
  • FIG. 8 shows an illustration of offering an element to a distributed queue, in accordance with an embodiment of the invention.
  • a distributed queue 801 in a distributed data grid 800 can include a queue of buckets, e.g. buckets 811-816, each of which may be configured with a capacity to hold a number of elements and may hold zero or more elements during use.
  • a state owner process 802 can be responsible for holding the state information in the queue state 803 for the distributed queue 801.
  • a client process 804 may attempt to offer (or add) one or more elements to the tail bucket as indicated by the queue state 805 in the named queue 810. For example, the client process 804 may assume the bucket 815 to be the tail bucket, since the tail pointer in the queue state 805 points to the bucket 815. Accordingly, the client process 804 can send an offer message (containing a set of elements) to the process 821 , which maintains the bucket 815.
  • the client process 804 can receive a response to its message.
  • the contents of the response can signify to the client process 804 whether the offer operation was successful. For example, if the response is empty then all of the offered elements were added to the tail bucket. On the other hand, if the response contains any elements, then these elements were not offered (or added) to the bucket.
  • the perceived tail bucket may no longer be current (i.e. the queue state 805 is stale), in which case all of the elements are returned in the response.
  • the current tail bucket may not have enough remaining capacity for holding all of the offered elements, in which case the remaining elements are returned in the response message.
  • the client process 804 may need to know what the next tail bucket ID is.
  • the client process 804 can send a message to the state owner process 802, which owns the queue state 803, to inform the state owner process 802 that the queue state 803 should point to the next tail bucket.
  • the client process 804 may not know whether another client process has already moved the tail ID.
  • the system can perform the operation to move the tail ID as an atomic compare and set operation.
  • when the client process 804 asks the state owner process 802 to increment the tail ID, it can also send the tail ID in the queue state 805 to the state owner process 802.
  • the system can change the queue state 803 by moving the tail to the next bucket ID. Then, the system can send the tail ID in the updated queue state 803 to the client process 804 as a response.
  • the queue state 803 may have either been incremented by this message, or had previously been incremented. Then, the system can send the tail ID in the current queue state 803 to the client process 804 as a response.
  • If the tail ID on the client side is 10 and the tail ID on the owner side is 10, then the tail ID on the owner side can be moved to 11, and that value may be returned to the client. If the tail ID on the client side is 10 and the tail ID on the owner side is already 11, then the queue state owner may return the value 11 to the client. A sketch of this compare-and-set follows.
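On the owner side, this compare-and-set might be implemented along the following lines; the class and field names, and the QUEUE_FULL sentinel, are assumptions for illustration (the full check reflects the rule, noted below, that the tail may not advance onto the head):

    public class QueueStateOwner {
        public static final long QUEUE_FULL = -1;

        private long headBucketId;
        private long tailBucketId;
        private final long bucketCount;   // bucket IDs wrap around after this many buckets

        public QueueStateOwner(long bucketCount) {
            this.bucketCount = bucketCount;
        }

        public synchronized long moveTail(long clientTailId) {
            if (clientTailId == tailBucketId) {
                long next = (tailBucketId + 1) % bucketCount;  // may wrap to a reused ID
                if (next == headBucketId) {
                    return QUEUE_FULL;    // advancing would collide with the head
                }
                tailBucketId = next;      // the client's view was current: advance the tail
            }
            // Either freshly advanced or previously advanced by another client:
            // return the current tail ID so the caller can retry its offer there.
            return tailBucketId;
        }
    }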
  • the queue state 803 points to the new tail bucket 816.
  • the client process 804 can send an offer message, containing any remaining elements to be offered, to the process that owns the new tail bucket 816.
  • the new tail bucket 816 may not be able to hold all of the offered elements. For example, another client process may have already filled the new tail bucket 816, or the new tail bucket 816 does not have sufficient capacity to hold all of the elements. Then, the system can repeat the above process of incrementing the tail bucket ID and offering the remaining elements, until all of the elements have been successfully offered (or added) to the distributed queue 801.
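Putting these steps together, the client-side offer loop might look like the following sketch. The BucketOwner and StateOwner interfaces are assumptions standing in for the offer and refresh messages described above:

    import java.util.ArrayDeque;
    import java.util.Collection;
    import java.util.Deque;

    public class QueueOfferClient<E> {
        interface BucketOwner<T> { Collection<T> offer(long bucketId, Collection<T> elements); }
        interface StateOwner    { long moveTail(long clientTailId); }  // new tail ID, or QUEUE_FULL

        static final long QUEUE_FULL = -1;

        private long tailBucketId;              // local, possibly stale, view of the tail
        private final BucketOwner<E> buckets;   // routes the offer to the process owning a bucket
        private final StateOwner stateOwner;

        QueueOfferClient(BucketOwner<E> buckets, StateOwner stateOwner) {
            this.buckets = buckets;
            this.stateOwner = stateOwner;
        }

        // Returns true once every element has been added, false if the queue is full.
        public boolean offerAll(Collection<E> elements) {
            Deque<E> remaining = new ArrayDeque<>(elements);
            while (!remaining.isEmpty()) {
                // An empty response means everything was accepted; otherwise the
                // response holds the un-offered elements (stale tail or full bucket).
                Collection<E> rejected = buckets.offer(tailBucketId, remaining);
                if (rejected.isEmpty()) {
                    return true;
                }
                long newTail = stateOwner.moveTail(tailBucketId);
                if (newTail == QUEUE_FULL) {
                    return false;               // the distributed queue is full
                }
                tailBucketId = newTail;
                remaining = new ArrayDeque<>(rejected);
            }
            return true;
        }
    }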
  • FIG. 9 shows an illustration of an interaction diagram for offering an element to a distributed queue, in accordance with an embodiment of the invention.
  • a client process 901 can use a named queue 902 for offering (or adding) one or more elements to a distributed queue in a distributed data grid.
  • the distributed data grid includes a cluster member 905, the queue state owner, which owns the queue state 906.
  • a named queue 902 can receive a message (e.g. from a client process 901) to offer a new value to the tail of the distributed queue.
  • the named queue 902 which is associated with the client process 901 , can maintain a queue state 903, which is a local version of the queue state 906 for the distributed queue.
  • the named queue 902 may assume that the information in the local queue state 903 is correct. At steps 912-913, the named queue 902 can request and receive the state information for the queue, such as the tail ID and the queue version number, from the local queue state 903.
  • the named queue 902, which is associated with the client process 901, can send an offer message to the cluster member that owns the tail bucket, according to its local view.
  • the named queue 902 can offer the received value to the cluster member 904, which owns the tail bucket based on the tail ID received from the queue state 906.
  • the cluster member 904 can return an empty set, and at step 916, the named queue 902 can return an OFFER_ACCEPTED message to the client process 901.
  • the cluster member 904 can return a set of un-offered elements back to the named queue 902. Then the named queue 902 may attempt the offer operation again, e.g. by sending a message to the next tail bucket. The named queue 902 may try to add the received value to the tail of the distributed queue repeatedly (e.g. in a loop 910), until either the distributed queue is full or the received value is successfully offered (or added).
  • the named queue 902 may need to refresh or update its view of the tail ID and the queue version, if a previous offer fails due to a full bucket or a version mismatch.
  • the named queue 902 can send a request to the local queue state 903 to refresh the tail ID.
  • the local queue state 903 may try to obtain the next tail ID from the cluster member 905, which is the owner of the queue state 906.
  • the message may contain the tail ID and queue version information based on such information in the local queue state 903.
  • the cluster member 905 can request and obtain the state information in the owner version of the queue state 906. Thus, the cluster member 905 can determine whether the queue state should be refreshed, e.g. by comparing the information received from the local queue state 903 and queue state 906.
  • the local queue state 903 is stale when the information in the local queue state 903 does not match the information in the owner queue state 906.
  • the state information in the owner queue state 906 may be returned to the local queue state 903 (i.e. the local queue state 903 is refreshed).
  • the cluster member 905 can update the queue state 906, e.g. by moving the tail to the next bucket ID. Then, at step 922, the queue state 906 can provide the updated state information of the queue to the cluster member 905. For example, the queue version number may be incremented, if the tail ID moves to the next bucket ID by wrapping back around to the first bucket ID or any other reused buckets.
  • a check can be performed to determine whether the queue is full. For example, a queue is full if incrementing the tail ID would make the tail ID equivalent to the current head ID.
  • the cluster member 905 can provide the state information of the queue (refreshed or updated) to the queue state 903, which in turn can provide the state information of the queue to the named queue 902.
  • state information may include either a QUEUE_FULL message or an updated tail ID with an updated queue version number.
  • the named queue 902 can send a QUEUE_FULL message to the client process 901. Otherwise, at steps 926-927, the named queue 902 can request and obtain the state information of the queue from the queue state 903. Furthermore, at steps 928-929, the named queue 902 can offer (or add) the value to the cluster member 904 (or any other cluster member that owns the bucket with the updated tail ID).
  • the named queue 902 may try to offer (or add) the received element to the tail of the distributed queue repeatedly, e.g. in the loop 910, until the received value is successfully added.
  • the named queue 902 can send a QUEUE_SUCCESS message to the client process 901 after the received value is successfully added.
  • FIG. 10 shows an illustration of an interaction diagram for offering an element to a bucket in a distributed queue, in accordance with an embodiment of the invention.
  • A caller, e.g. a named queue 1001, can offer a value to the cluster member 1002 based on the state information, such as the bucket ID and the queue version information.
  • a new bucket may be created if the bucket 1003 with the specific bucket ID does not exist.
  • the cluster member 1002 can check whether the bucket 1003 is full. When the bucket is full, at step 1016, the cluster member 1002 can send a BUCKET_FULL message to the named queue 1001. Otherwise, at steps 1017-1018, the cluster member 1002 can obtain the state information from the bucket 1003.
  • if the queue version does not match, the cluster member 1002 can send an OFFER_FAILED message to the named queue 1001.
  • otherwise, the cluster member 1002 can add the value into the bucket 1003. Then, at step 1022, the cluster member 1002 can send an OFFER_SUCCESS message to the named queue 1001. A sketch of this bucket-side handling follows.
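The bucket-owner side of this exchange might look like the following sketch; the result names match the messages above, while the surrounding types are illustrative assumptions:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class BucketOfferHandler<E> {
        public enum OfferResult { OFFER_SUCCESS, OFFER_FAILED, BUCKET_FULL }

        private final Deque<E> elements = new ArrayDeque<>();
        private final int capacity;
        private long queueVersion;

        public BucketOfferHandler(int capacity, long queueVersion) {
            this.capacity = capacity;
            this.queueVersion = queueVersion;
        }

        public synchronized OfferResult offer(E value, long clientVersion) {
            if (elements.size() >= capacity) {
                return OfferResult.BUCKET_FULL;    // the client must move to the next bucket
            }
            if (clientVersion != queueVersion) {
                return OfferResult.OFFER_FAILED;   // the client's queue version is stale
            }
            elements.addLast(value);
            return OfferResult.OFFER_SUCCESS;
        }
    }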
  • FIG. 11 shows an illustration of polling an element from a distributed queue, in accordance with an embodiment of the invention.
  • a distributed queue 1101 in a distributed data grid 1100 can include a queue of buckets, e.g. buckets 1111-1116, each of which may include one or more elements.
  • a state owner process 1102 can be responsible for holding the state information of the queue state 1103 for the distributed queue 1101.
  • a client process 1104 can perform a poll or peek action on the head bucket of the distributed queue 1101.
  • a poll action may return and remove an element (such as the head element) from the distributed queue 1101.
  • a peek action may return the value of the head element of the distributed queue 1101 without removing the head element.
  • the client process 1104 assumes the bucket 1111 to be the head bucket in the distributed queue 1101.
  • the client process 1104 sends a message to the process 1121, which owns the head bucket 1111. Then, the client process 1104 can receive a response containing either the element from the head of the distributed queue 1101 or an indicator that the bucket 1111 is empty.
  • the queue state 1105, which is maintained by the named queue 1110, is stale. Thus, the queue state 1105 may need to be refreshed so that the head bucket ID is updated to the next head ID.
  • the client process 1104 can send a message to the state owner process 1102, which owns the queue state 1103, to instruct the queue state 1103 to increment the head ID.
  • the operation of moving to the next head ID (i.e. refreshing the queue state 1105) for the poll or peek operation can be an atomic compare and set operation, where the queue state 1103 is changed only if the current head ID matches the client's head ID.
  • the client process 1104 can then receive a response containing the head ID from the queue state 1103, which has either been incremented by this message, or had previously been incremented.
  • the client process 1104 can resend the poll/peek message to the process owning the new head bucket, as sketched below.
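The client-side poll loop described above might look like the following sketch; as in the offer sketch, the interfaces stand in for the messages between processes and all names are assumptions:

    import java.util.Optional;

    public class QueuePollClient<E> {
        interface BucketOwner<T> { Optional<T> poll(long bucketId); }
        interface StateOwner    { long moveHead(long clientHeadId); }  // new head ID, or QUEUE_EMPTY

        static final long QUEUE_EMPTY = -1;

        private long headBucketId;              // local, possibly stale, view of the head
        private final BucketOwner<E> buckets;
        private final StateOwner stateOwner;

        QueuePollClient(BucketOwner<E> buckets, StateOwner stateOwner) {
            this.buckets = buckets;
            this.stateOwner = stateOwner;
        }

        // Returns the head element, or null if the distributed queue is empty.
        public E poll() {
            while (true) {
                Optional<E> element = buckets.poll(headBucketId);
                if (element.isPresent()) {
                    return element.get();
                }
                // The head bucket was empty: our head ID is stale, or the queue is empty.
                long newHead = stateOwner.moveHead(headBucketId);
                if (newHead == QUEUE_EMPTY) {
                    return null;
                }
                headBucketId = newHead;
            }
        }
    }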
  • FIG. 12 shows an illustration of an interaction diagram for polling or peeking an element from a distributed queue, in accordance with an embodiment of the invention.
  • a client process 1201 can use a named queue 1202 for performing an action, such as a poll or peek action, on one or more elements in a distributed queue 1200 in a distributed data grid.
  • the distributed queue 1200 includes a cluster member 1205, the queue state owner, which owns the queue state 1206.
  • a named queue 1202 can receive a message (e.g. from a client process 1201) to perform a poll or a peek action on the head element of the distributed queue 1200.
  • the named queue 1202, which is associated with the client process 1201, can maintain a local version of the queue state 1203 for the distributed queue 1200.
  • the named queue 1202 can check the queue state 1203 to obtain the information on whether the distributed queue 1200 is empty. When the distributed queue 1200 is empty, at step 1214, the named queue 1202 can send a QUEUE_EMPTY message to the client process 1201.
  • the client process 1201 assumes that it has the current head ID and the queue version number.
  • the named queue 1202 can request and receive the state information, such as the head ID and the queue version number, from the local queue state 1203. The named queue 1202 can use such information for performing the poll or peek action on the head element of the distributed queue 1200.
  • the named queue 1202 can perform a poll or a peek action on the cluster member 1204, which owns the head bucket based on the received head ID.
  • the cluster member 1204 can return one or more elements to the named queue 1202, and at step 1219, the named queue 1202 can return the elements to the client process 1201.
  • the cluster member 1204 can return an empty set to the named queue 1202. In such a case, the client process 1201 may need to update its view of the head ID and the queue version information.
  • the named queue 1202 may try to perform a poll or peek action on the head of the distributed queue 1200 repeatedly (e.g. in a loop 1210), until either the distributed queue 1200 is empty or the action is successfully performed.
  • the named queue 1202 can request the local queue state 1203 to refresh the head ID and the queue version number, and at step 1221 the local queue state 1203 may try to obtain the next head ID from the cluster member 1205, which is the owner of the queue state 1206.
  • the cluster member 1205 can request and obtain the owner version of the state information in the queue state 1206.
  • the message may contain the head ID and the queue version information based on the information in the local queue state 1203.
  • the cluster member 1205 can determine whether the queue state is stale, e.g. by comparing the information received from the local queue state 1203 and the queue state 1206.
  • the queue state 1203 is stale.
  • the state information in the queue state 1206 may be returned to the queue state 1203 (i.e. the queue state 1203 is refreshed).
  • the cluster member 1205 can update the queue state 1206, e.g. by moving the head to the next bucket ID. Then, at step 1225, the queue state 1206 can provide the updated state information of the queue state 1206 to the cluster member 1205. Additionally, a check can be performed to determine whether the queue is empty. For example, the distributed queue 1200 is an empty queue when the head ID is the same as the tail ID, in which case the head ID may not be incremented over the tail ID.
  • the cluster member 1205 can provide the state information of the queue (refreshed or updated) to the queue state 1203, which in turn can provide the queue state information to the named queue 1202.
  • state information may include either a QUEUE_EMPTY message or an updated head ID with an updated queue version number.
  • the named queue 1202 can check the queue state 1203 to determine whether the distributed queue 1200 is empty (since multiple operations may be performed on the distributed queue 1200).
  • the named queue 1202 can send the QUEUE_EMPTY message to the client process 1201.
  • the named queue 1202 can request and obtain the updated state information of the queue state 1203, from the queue state 1203.
  • the named queue 1202 can perform a poll or a peek action on the cluster member 1204 (or any other cluster member that owns the bucket with the updated head ID).
  • the named queue 1202 may try to perform the poll or a peek action on the head of the distributed queue 1200 repeatedly (e.g. in a loop 1210), until the action is successfully performed.
  • the named queue 1202 can return the elements to the client process 1201.
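The owner-side head advance used at steps 1221-1225 might look like the following sketch; as with the tail case, the names are assumptions, and the empty check reflects the rule that the head ID may not be incremented over the tail ID:

    public class HeadStateOwner {
        public static final long QUEUE_EMPTY = -1;

        private long headBucketId;
        private long tailBucketId;
        private final long bucketCount;   // bucket IDs wrap around after this many buckets

        public HeadStateOwner(long bucketCount) {
            this.bucketCount = bucketCount;
        }

        public synchronized long moveHead(long clientHeadId) {
            if (headBucketId == tailBucketId) {
                return QUEUE_EMPTY;       // head ID equals tail ID: the queue is empty
            }
            if (clientHeadId == headBucketId) {
                headBucketId = (headBucketId + 1) % bucketCount;  // the client's view was current
            }
            return headBucketId;          // freshly advanced, or previously advanced
        }
    }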
  • FIG. 13 shows an illustration of an interaction diagram for polling or peeking an element from a bucket in a distributed queue, in accordance with an embodiment of the invention.
  • A caller, e.g. a named queue 1301, can perform an action, such as a poll or peek operation, on a bucket 1303, which is owned by a cluster member 1302 in a distributed data grid.
  • the named queue 1301 can perform a poll or peek action on the cluster member 1302 based on the state information, such as the bucket ID and the queue version number.
  • the cluster member 1302 can check whether the bucket 1303 is empty.
  • the cluster member 1302 can send a BUCKET_EMPTY message to the named queue 1301, if the bucket 1303 is empty.
  • the cluster member 1302 can obtain the state information of the bucket from the bucket 1303.
  • the cluster member 1302 can send a POLL_FAILED message to the named queue 1301.
  • the cluster member 1302 can perform a poll or peek action on the bucket 1303. For example, if the action is a poll action, the bucket 1303 can return the value of the bucket element 1304 before removing or deleting the bucket element 1304. Additionally, the bucket can update the queue version if the bucket becomes empty after the bucket element 1304 is removed or deleted. Then, the cluster member 1302 can provide the element to the named queue 1301. On the other hand, if the action is a peek action, the bucket 1303 can return the value of the bucket element 1304 without removing or deleting the bucket element 1304. This handling is sketched below.
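A sketch of this bucket-side handling: both actions return the head element's value, only a poll removes it, and the queue version is updated when the bucket becomes empty as a result. The types and names are illustrative assumptions:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class BucketPollHandler<E> {
        private final Deque<E> elements = new ArrayDeque<>();
        private long queueVersion;

        // remove == true models a poll action; remove == false models a peek action.
        public synchronized E pollOrPeek(boolean remove) {
            E value = elements.peekFirst();    // null if the bucket is empty
            if (value != null && remove) {
                elements.pollFirst();          // poll: remove the head element
                if (elements.isEmpty()) {
                    queueVersion++;            // bucket emptied: update the queue version
                }
            }
            return value;
        }
    }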
  • Figure 14 illustrates an exemplary flow chart for supporting a distributed queue in a distributed data grid, in accordance with an embodiment of the invention.
  • the system can provide a plurality of buckets in the distributed data grid.
  • the system can store one or more elements of a distributed queue in the plurality of buckets.
  • the system can use a named queue to hold a local version of the state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
  • A system for supporting a distributed queue in a distributed data grid comprising: one or more microprocessors; a plurality of buckets in the distributed data grid running on the one or more microprocessors, wherein the plurality of buckets, which are maintained in a queue, are configured to store one or more elements of a distributed queue; and a named queue that holds a local version of the state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
  • the system further comprising a means wherein the named queue is associated with a client process, which is used by one or more user processes for accessing the one or more elements in the distributed queue.
  • the system further comprising a means wherein said client process is configured to offer one or more elements to a tail bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
  • the system further comprising a means wherein said client process is configured to determine the queue state is stale when a message received from a process owning the tail bucket includes one or more un-offered elements.
  • the system further comprising a means wherein said client process is configured to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new tail bucket.
  • the system further comprising a means wherein said client process is configured to poll or peek an element from a head bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
  • the system further comprising a means wherein said client process is configured to determine that the queue state is stale when a message received from a process owning the head bucket is empty when the distributed queue is not empty.
  • the system further comprising a means wherein said client process is configured to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new head bucket.
  • each bucket maintains a bucket state, which includes a pointer to a current head element and a pointer to a current tail element, and a bucket is locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
  • a method for supporting a distributed queue in a distributed data grid comprising: providing a plurality of buckets in the distributed data grid, wherein each bucket is configured to store one or more elements of a distributed queue; storing one or more elements of the distributed queue in the plurality of buckets; and using a named queue to hold a local version of the state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
  • the method further comprising a means for associating the named queue with a client process, which is used by one or more user processes for accessing the one or more elements in the distributed queue.
  • the method further comprising a means for using said client process to offer one or more elements to a tail bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
  • the method further comprising a means for using said client process to determine the queue state is stale when a message received from a process owning the tail bucket includes one or more un-offered elements.
  • the method further comprising a means for using said client process to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new tail bucket.
  • the method further comprising a means for using said client process to poll or peek an element from a head bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
  • the method further comprising a means for using said client process to determine the queue state is stale when a message received from a process owning the head bucket is empty when the distributed queue is not empty.
  • the method further comprising a means for using said client process to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new head bucket.
  • the method further comprising a means for maintaining a bucket state with each bucket, which bucket state includes a pointer to a current head element and a pointer to a current tail element, and wherein a bucket is locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
  • a non-transitory machine readable storage medium having instructions stored thereon for supporting a distributed queue in a distributed data grid, which instructions, when executed by a computer system, cause the computer system to perform steps comprising: providing a plurality of buckets in a distributed data grid, wherein each bucket is configured to store one or more elements of a distributed queue; storing one or more elements of the distributed queue in the plurality of buckets; and using a named queue to hold a local version of the state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
  • An apparatus for supporting a distributed data structure in a distributed data grid comprising: means for providing a plurality of buckets in the distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure; means for holding, via a state owner process, state information for the distributed data structure; and means for providing, via the state owner process, the state information for the distributed data structure to a client process.
  • the apparatus further comprising means for distributing said plurality of buckets evenly across a plurality of processes in the distributed data grid.
  • the apparatus further comprising means for performing multiple operations simultaneously on different buckets associated with the distributed data structure.
  • the apparatus further comprising means for updating, via said state owner process, the state information for the distributed data structure when the distributed data structure changes.
  • the apparatus further comprising means for allowing a bucket to be locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
  • the apparatus further comprising means for allowing the distributed data structure to be a distributed queue, a distributed set, a distributed list, or a distributed stack.
  • the apparatus further comprising means for allowing each bucket to maintain a bucket state, wherein the bucket state is used to access one or more elements in said bucket.
  • the apparatus further comprising means for associating at least one bucket with a back-up bucket, which contains one or more elements contained in said at least one bucket.
  • the apparatus further comprising means for allowing said client process to store a local version of the state information for the distributed data structure; refresh the local version of the state information stored in said client process based on the state information held by the state owner process, only when the local version of the state information becomes stale; and perform an operation on one or more elements in a bucket in the distributed data structure.
  • the present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computers, computing devices, machines, or microprocessors, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure.
  • Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
  • the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention.
  • the storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
  • the storage medium or computer readable medium (media) may be non-transitory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system and method supports a distributed data structure in a distributed data grid. The grid includes a plurality of buckets, wherein each bucket is configured with a capacity to contain a number of elements of the distributed data structure. The distributed data grid includes a state owner process configured to hold state information for the distributed data structure and to provide the state information for the distributed data structure to a client process. The distributed queue can include a queue of buckets stored on a plurality of processes, wherein each bucket is configured to contain a number of elements of the distributed queue. The distributed queue can include a named queue that holds a local version of the state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.

Description

SYSTEM AND METHOD FOR SUPPORTING A DISTRIBUTED DATA STRUCTURE IN A
DISTRIBUTED DATA GRID
Copyright Notice:
[0001] A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Field of Invention:
[0002] The present invention is generally related to computer systems, and is particularly related to supporting a distributed data grid.
Background:
[0003] Modern computing systems, particularly those employed by larger organizations and enterprises, continue to increase in size and complexity. Particularly, in areas such as Internet applications, there is an expectation that millions of users should be able to simultaneously access that application, which effectively leads to an exponential increase in the amount of content generated and consumed by users, and transactions involving that content. Such activity also results in a corresponding increase in the number of transaction calls to databases and metadata stores, which may have a limited capacity to accommodate that demand. This is the general area that embodiments of the invention are intended to address.
Summary:
[0004] Described herein are systems and methods which support a distributed data structure in a distributed data grid. The distributed data grid includes a plurality of buckets, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure. Furthermore, the distributed data grid includes a state owner process, which is configured to hold state information for the distributed data structure and provides the state information for the distributed data structure to a client process.
[0005] Described herein are systems and methods which support a distributed queue in a distributed data grid. The distributed data grid includes a plurality of buckets, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data queue. Furthermore, the distributed queue can include a named queue that holds a local version of the state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
Brief Description of the Figures:
[0006] Figure 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention.
[0007] Figure 2 shows an illustration of supporting a distributed data structure in a distributed data grid, in accordance with an embodiment of the invention.
[0008] Figure 3 shows an illustration of supporting a bucket in a distributed data structure, in accordance with an embodiment of the invention.
[0009] Figure 4 shows an illustration of performing an operation on a distributed data structure using a client process, in accordance with an embodiment of the invention.
[0010] Figure 5 illustrates an exemplary flow chart for supporting a distributed data structure in a distributed data grid, in accordance with an embodiment of the invention.
[0011] Figure 6 shows an illustration of supporting a distributed queue in a distributed data grid, in accordance with an embodiment of the invention.
[0012] Figure 7 shows an illustration of a bucket in a distributed queue, in accordance with an embodiment of the invention.
[0013] Figure 8 shows an illustration of offering an element to a distributed queue, in accordance with an embodiment of the invention.
[0014] Figure 9 shows an illustration of an interaction diagram for offering an element to a distributed queue, in accordance with an embodiment of the invention.
[0015] Figure 10 shows an illustration of an interaction diagram for offering an element to a bucket in a distributed queue, in accordance with an embodiment of the invention.
[0016] Figure 11 shows an illustration of polling an element from a distributed queue, in accordance with an embodiment of the invention.
[0017] Figure 12 shows an illustration of an interaction diagram for polling or peeking an element from a distributed queue, in accordance with an embodiment of the invention.
[0018] Figure 13 shows an illustration of an interaction diagram for polling or peeking an element from a bucket in a distributed queue, in accordance with an embodiment of the invention.
[0019] Figure 14 illustrates an exemplary flow chart for supporting a distributed queue in a distributed data grid, in accordance with an embodiment of the invention.
Detailed Description:
[0020] Described herein are systems and methods that can support a distributed data structure in a distributed data grid. The distributed data grid includes a plurality of buckets, wherein each bucket is configured with a capacity to contain a number of elements of a distributed data structure. Furthermore, the distributed data grid includes a state owner process, which is configured to hold state information for the distributed data structure and provides the state information for the distributed data structure to a client process.
Distributed Data Grid
[0021] In accordance with an embodiment, as referred to herein a "data grid cluster", or "data grid", is a system comprising a plurality of computer servers which work together to manage information and related operations, such as computations, within a distributed or clustered environment. The data grid cluster can be used to manage application objects and data that are shared across the servers. Preferably, a data grid cluster should have low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, data grid clusters are well suited for use in computational intensive, stateful middle-tier applications. Some examples of data grid clusters, e.g., the Oracle Coherence data grid cluster, can store the information in-memory to achieve higher performance, and can employ redundancy in keeping copies of that information synchronized across multiple servers, thus ensuring resiliency of the system and the availability of the data in the event of server failure. For example, Coherence provides replicated and distributed (partitioned) data management and caching services on top of a reliable, highly scalable peer-to-peer clustering protocol.
[0022] An in-memory data grid can provide the data storage and management capabilities by distributing data over a number of servers working together. The data grid can be middleware that runs in the same tier as an application server or within an application server. It can provide management and processing of data and can also push the processing to where the data is located in the grid. In addition, the in-memory data grid can eliminate single points of failure by automatically and transparently failing over and redistributing its clustered data management services when a server becomes inoperative or is disconnected from the network. When a new server is added, or when a failed server is restarted, it can automatically join the cluster and services can be failed back over to it, transparently redistributing the cluster load. The data grid can also include network-level fault tolerance features and transparent soft re-start capability.
[0023] In accordance with an embodiment, the functionality of a data grid cluster is based on using different cluster services. The cluster services can include root cluster services, partitioned cache services, and proxy services. Within the data grid cluster, each cluster node can participate in a number of cluster services, both in terms of providing and consuming the cluster services. Each cluster service has a service name that uniquely identifies the service within the data grid cluster, and a service type, which defines what the cluster service can do. Other than the root cluster service running on each cluster node in the data grid cluster, there may be multiple named instances of each service type. The services can be either configured by the user, or provided by the data grid cluster as a default set of services.
[0024] Figure 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention. As shown in Figure 1, a data grid cluster 100, e.g. an Oracle Coherence data grid, includes a plurality of cluster members (or server nodes) such as cluster nodes 101-106, having various cluster services 111-116 running thereon. Additionally, a cache configuration file 110 can be used to configure the data grid cluster 100. In an embodiment, the data grid cluster 100 is an in-memory data grid cluster which provides the data storage and management capabilities by distributing data in the memory of cluster nodes working together.

Distributed Data Structure
[0025] In accordance with an embodiment of the invention, a distributed data grid can support a distributed data structure. For example, the distributed data structure can be a distributed queue, a distributed set, a distributed list, and/or a distributed stack.
[0026] Figure 2 shows an illustration of supporting a distributed data structure in a distributed data grid, in accordance with an embodiment of the invention. As shown in Figure 2, a distributed data structure 201 can include a plurality of buckets (e.g. buckets 211-216), each of which is configured to contain a number of elements. In use, the buckets may contain zero, one, or more elements at different times. The elements can be data elements comprising, for example, units of data, calls, requests, responses, messages, transactions, and/or events, depending upon the application of the distributed data structure.
[0027] Furthermore, a state owner process 202 in the cluster 200 can be responsible for holding the state 203 for the distributed data structure 201. The distributed data structure 201 can perform different operations on the different buckets 211-216 separately, instead of working on the individual elements directly.
[0028] As shown in Figure 2, a cluster of processes, such as the processes 221-223 in a distributed data grid 200, can maintain (or own) the plurality of buckets 211-216. For example, the buckets 211-216 can be evenly distributed across the cluster of processes 221-223, with each process being responsible for maintaining (or owning) one or more buckets. Thus, the total size of the distributed data structure 201 can exceed the maximum storage capacity of a single process, since each process 221-223 is responsible for storing only a part of the contents of the distributed data structure 201.
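By way of illustration only, the following Java sketch shows one simple way such an even distribution of buckets over processes might be computed; the class and method names (BucketPlacement, ownerOf) are hypothetical and do not refer to any actual product API:

    // Illustrative sketch only: maps each bucket ID to an owning process so
    // that the buckets spread evenly across the cluster of processes.
    public final class BucketPlacement {
        private final int processCount; // number of processes in the cluster

        public BucketPlacement(int processCount) {
            this.processCount = processCount;
        }

        // Simple modulo placement: bucket 0 -> process 0, bucket 1 -> process 1, ...
        public int ownerOf(int bucketId) {
            return bucketId % processCount;
        }
    }

Under this assumed scheme, six buckets and three processes would give each process exactly two buckets, matching the even distribution described above.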
[0029] In accordance with an embodiment of the invention, the distributed data structure 201 allows multiple user processes to simultaneously perform different operations on the different buckets 211-216 in the distributed data structure 201. As shown in Figure 2, an operation 204 can be performed on the bucket 211, while another operation 206 is performed on the bucket 216. Thus, the system can reduce contention on the distributed data structure 201, since the demand for accessing the distributed data structure 201 can be spread over multiple buckets.
[0030] Furthermore, the distributed data structure 201 can ensure that only one process is able to access the elements in a single bucket at any time. For example, the system can apply a lock mechanism on each individual bucket that is under contention. Alternatively, the system can take advantage of a request queue, which allows the different processes to wait for processing in order.
[0031] As shown in Figure 2, the bucket 211 is locked while an operation 204 is being performed on it. Other operations on the same bucket 211, such as the operation 205, may need to wait until the locking operation 204 is completed. Similarly, the bucket 216 is locked while an operation 206 is being performed on it, and other operations on the same bucket 216, such as the operation 207, may need to wait until the locking operation 206 is completed.
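A minimal Java sketch of such per-bucket locking follows, assuming a hypothetical helper class (BucketLocks) rather than any specific locking mechanism of the data grid:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative sketch only: one lock per bucket, so operations on
    // different buckets proceed in parallel while operations on the same
    // bucket wait for one another, as described above.
    public final class BucketLocks {
        private final ConcurrentHashMap<Integer, ReentrantLock> locks =
                new ConcurrentHashMap<>();

        public void withBucketLock(int bucketId, Runnable operation) {
            ReentrantLock lock = locks.computeIfAbsent(bucketId, id -> new ReentrantLock());
            lock.lock(); // other operations on this bucket wait here
            try {
                operation.run();
            } finally {
                lock.unlock();
            }
        }
    }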
[0032] Figure 3 shows an illustration of supporting a bucket in a distributed data structure, in accordance with an embodiment of the invention. As shown in Figure 3, a bucket 301 in the distributed data grid 300 is configured to contain a number of elements and may contain one or more elements, e.g. elements 311-315 in an internal data structure 310, for a distributed data structure. Additionally, the bucket 301 can maintain various bucket states 303, which can be used for accessing the elements 311-315 stored in the bucket 301.
[0033] In accordance with an embodiment of the invention, the bucket 301 is configured with a capacity, which is the maximum number of elements that it can contain. The bucket 301 is thus configured to contain a number of elements of a distributed data structure, and can hold zero or more elements (up to the capacity) while in use. Furthermore, the capacity of the bucket 301 can be tuned for supporting different distributed data structures (in order to improve the performance for specific data access patterns).
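For illustration, a capacity-bounded bucket might be sketched in Java as follows; this is a simplification under assumed names, and a real bucket would also carry the bucket state and replication described elsewhere herein:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative sketch only: a bucket configured with a capacity, holding
    // zero or more elements (up to that capacity) while in use.
    public final class Bucket<E> {
        private final int capacity; // the tunable maximum number of elements
        private final Deque<E> elements = new ArrayDeque<>();

        public Bucket(int capacity) {
            this.capacity = capacity;
        }

        public boolean isFull()  { return elements.size() >= capacity; }
        public boolean isEmpty() { return elements.isEmpty(); }

        // Rejects the element (rather than growing) once the capacity is reached.
        public boolean offer(E element) {
            return !isFull() && elements.offerLast(element);
        }
    }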
[0034] Additionally, the bucket 301 can be replicated to other nodes in the distributed data grid 300. As shown in Figure 3, a back-up bucket 302, which contains one or more elements (e.g. elements 321-325 in an internal data structure 320) and the associated bucket state 304, can take over when the bucket 301 is lost.
[0035] Figure 4 shows an illustration of performing an operation on a distributed data structure using a client process, in accordance with an embodiment of the invention. As shown in Figure 4, a distributed data structure 401 in a distributed data grid 400 can include a plurality of buckets, e.g. buckets 411-416. Additionally, a state owner process 402 can be responsible for holding the state information 403 for the distributed data structure 401.
[0036] Furthermore, a client process 404 can use (or include) a named data structure 410, which holds a local copy of the state information 405 for the distributed data structure 401. For example, the system can initialize the state information 405, when the client process 404 first connects to the distributed data structure 401.
[0037] In accordance with an embodiment of the invention, multiple client processes can access the distributed data structure 401 simultaneously. The local copy of the state information 405 on the client process 404 may become stale when another client process has changed the state 403 of the distributed data structure 401.
[0038] When using the distributed data structure 401, the logic in the named data structure 410 (i.e. used by the client process 404) can take into account that the state 405 may be stale. Thus, there is no need to refresh the state information 405 before every operation.
[0039] As shown in Figure 4, when the state 405 becomes stale, the named data structure 410 (on the client process 404) can send a message to the state owner process 402 for refreshing the state 405. After receiving the message, the state owner process 402 can send the new state 403 back to the client process 404, which in turn can perform the operation 420 on the bucket 416 via the process 421.
[0040] In accordance with an embodiment of the invention, once a bucket is created, the bucket may not be removed from the distributed data structure 401, even when the bucket becomes empty (i.e. when all of the elements have been polled or removed from the bucket). This can be beneficial for supporting the operation of multiple client processes.
[0041] Figure 5 illustrates an exemplary flow chart for supporting a distributed data structure in a distributed data grid, in accordance with an embodiment of the invention. As shown in Figure 5, at step 501, the system can provide a plurality of buckets in the distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure. The buckets may contain zero or more elements (up to the capacity) of a distributed data structure at different points during use. Furthermore, at step 502, a state owner process can hold state information for the distributed data structure. Then, at step 503, the state owner process can provide the state information for the distributed data structure to a client process.
Distributed Queue
[0042] In accordance with an embodiment of the invention, a distributed data grid can support a distributed queue. For example, the distributed queue is a queue of queues (or buckets) with each bucket being a sub-queue in a parent queue.
[0043] Figure 6 shows an illustration of supporting a distributed queue in a distributed data grid, in accordance with an embodiment of the invention. As shown in Figure 6, a distributed queue 601 can include a queue of buckets, e.g. buckets 611-616, in a distributed data grid 600. Each of the plurality of buckets (e.g. buckets 611-616) is configured to contain a number of elements. In use, the buckets may contain zero, one, or more elements at different times. The elements can be data elements comprising, for example, units of data, calls, requests, responses, messages, transactions, and/or events, depending upon the application of the distributed data structure.
[0044] Additionally, a state owner process 602 can be responsible for holding the state 603 for the distributed queue 601. For example, the state 603 can include a head pointer to the current head bucket 611 in the distributed queue 601 and a tail pointer to the tail bucket 616 in the distributed queue 601.
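For illustration, the held queue state might be sketched in Java as follows; the field names (headBucketId, tailBucketId, version) are assumptions made for this sketch, not the actual representation:

    // Illustrative sketch only: the per-queue state a state owner process
    // might hold -- pointers (bucket IDs) to the head and tail buckets, plus
    // a version number that changes when bucket IDs are reused.
    public class QueueState {
        protected int headBucketId; // bucket that elements are polled from
        protected int tailBucketId; // bucket that elements are offered to
        protected long version;     // bumped when bucket IDs wrap around

        public QueueState(int headBucketId, int tailBucketId, long version) {
            this.headBucketId = headBucketId;
            this.tailBucketId = tailBucketId;
            this.version = version;
        }

        public int headBucketId() { return headBucketId; }
        public int tailBucketId() { return tailBucketId; }
        public long version()     { return version; }
    }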
[0045] In accordance with an embodiment of the invention, a client process 604 can obtain the current state 603 of the distributed queue 601 from the queue state owner 602, and perform various operations (such as the offer and poll operations) on the distributed queue 601.
[0046] As shown in Figure 6, the client process 604, which can be accessed by the user processes 606-607, can use (or include) a named queue 610. The named queue 610 can maintain a local copy of the queue state 605, which includes a head pointer and a tail pointer in its own view. For example, the head pointer in the queue state 605 may point to the head bucket 611 and the tail pointer in the queue state 605 may point to the tail bucket 616.
[0047] The queue state 605 structure can be initialized when the client process 604 first connects to the distributed queue 601. Furthermore, the client process 604 does not need to refresh its queue state 605 before every operation, since the logic in the named queue 610 (for performing the offer and poll operations) can take into account that the queue state 605 it maintains may be stale.
[0048] In accordance with an embodiment of the invention, the client process 604 can perform an offer operation on the distributed queue 601, by offering (or adding) one or more elements to the tail bucket 616 (i.e. the end) of the distributed queue 601. Also, the client process 604 can perform a poll operation on the distributed queue 601, by polling (or removing) one or more elements from the head bucket 611 (i.e. the front) of the distributed queue 601. Additionally, the client process 604 can perform a peek operation on the distributed queue 601 to obtain the value of one or more elements (without removing the elements from the distributed queue 601).
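Expressed as a hypothetical Java interface (for illustration only), the three client-side operations described above might take the following shape, mirroring java.util.Queue but over the distributed queue:

    // Illustrative sketch only: the client-facing operations on the
    // distributed queue.
    public interface DistributedQueue<E> {
        boolean offer(E element); // add to the tail bucket; false if the queue is full
        E poll();                 // remove and return the head element; null if empty
        E peek();                 // return the head element without removing it; null if empty
    }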
[0049] Additionally, different client processes may attempt to perform different operations simultaneously. Implicit transactions may be formed around the head bucket 611 and the tail bucket 616 of the distributed queue 601 for handling the multiple attempts by different client processes. The system can apply a lock mechanism on the head bucket 611 and/or the tail bucket 616, which are under contention. Alternatively, the system can take advantage of a request queue, which allows the different processes to wait for processing in order.
[0050] Figure 7 shows an illustration of a bucket in a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 7, a bucket 701 in a distributed queue 700 contains a queue of elements, e.g. elements 711-715 in a queue 710.
[0051] Additionally, the bucket 701 can maintain various bucket states 702, which can be used for accessing the different elements 711-715 stored in the bucket 701. The bucket states 702 can include a pointer to a head element 711 in the bucket 701 and a pointer to a tail element 715 in the bucket 701.
[0052] Furthermore, the elements 711-715 in the bucket 701 can be stored using the same process (which stores the bucket 701). When the bucket 701 is first created, the pointers to the head and tail elements can contain a special value indicating that the bucket 701 is empty. Also, the bucket 701 can contain a flag to indicate whether the bucket 701 is the last bucket in the queue.
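A Java sketch of this bucket state follows, for illustration; the choice of -1 as the special "empty" value and all of the names are assumptions of the sketch:

    // Illustrative sketch only: the per-bucket state described above, with a
    // special value marking an empty bucket and a flag marking the last
    // bucket in the queue of buckets.
    public final class BucketState {
        public static final int EMPTY = -1; // hypothetical "no element" marker

        private int headElementId = EMPTY;  // pointer to the head element
        private int tailElementId = EMPTY;  // pointer to the tail element
        private boolean lastBucket;         // is this the last bucket in the queue?

        public int headElementId() { return headElementId; }
        public int tailElementId() { return tailElementId; }
        public boolean isEmpty()   { return headElementId == EMPTY; }
        public boolean isLast()    { return lastBucket; }
        public void markLast(boolean last) { this.lastBucket = last; }
    }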
[0053] Figure 8 shows an illustration of offering an element to a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 8, a distributed queue 801 in a distributed data grid 800 can include a queue of buckets, e.g. buckets 811-816, each of which may be configured with a capacity to hold a number of elements and may hold zero or more elements during use. Additionally, a state owner process 802 can be responsible for holding the state information in the queue state 803 for the distributed queue 801.
[0054] In accordance with an embodiment of the invention, a client process 804 may attempt to offer (or add) one or more elements to the tail bucket as indicated by the queue state 805 in the named queue 810. For example, the client process 804 may assume the bucket 815 to be the tail bucket, since the tail pointer in the queue state 805 points to the bucket 815. Accordingly, the client process 804 can send an offer message (containing a set of elements) to the process 821, which maintains the bucket 815.
[0055] Then, the client process 804 can receive a response to its message. The contents of the response can signify to the client process 804 whether the offer operation was successful. For example, if the response is empty then all of the offered elements were added to the tail bucket. On the other hand, if the response contains any elements, then these elements were not offered (or added) to the bucket.
[0056] There can be different reasons that the system fails to offer (or add) one or more elements to the bucket. For example, the perceived tail bucket may no longer be current (i.e. the queue state 805 is stale), in which case all of the elements are returned in the response. Alternatively, the current tail bucket may not have enough remaining capacity for holding all of the offered elements, in which case the remaining elements are returned in the response message.
[0057] In either case, when the response to an offer contains un-offered elements, the client process 804 may need to know what the next tail bucket ID is. The client process 804 can send a message to the state owner process 802, which owns the queue state 803, to inform the state owner process 802 that the queue state 803 should point to the next tail bucket.
[0058] When there are multiple client processes, the client process 804 may not know whether another client process has already moved the tail ID. The system can perform the operation to move the tail ID as an atomic compare and set operation.
[0059] As shown in Figure 8, when the client process 804 asks the state owner process 802 to increment the tail ID, the client process 804 can also send the tail ID in the queue state 805 to the state owner process 802.
[0060] When the tail ID in the queue state 805 matches the current tail ID in the queue state 803, the system can change the queue state 803 by moving the tail to the next bucket ID. Then, the system can send the tail ID in the updated queue state 803 to the client process 804 as a response.
[0061] When the tail ID in the queue state 805 does not match the current tail ID in the queue state 803, the queue state 803 may either have been incremented by this message or have previously been incremented. Then, the system can send the tail ID in the current queue state 803 to the client process 804 as a response.
[0062] For example, if the tail ID on the client side is 10 and the tail ID on the owner side is 10, then the tail ID on the owner side can be moved to 11, which value may be returned to the client. If the tail ID on the client side is 10 and the tail ID on the owner side is already 11, then the queue state owner may return the value 11 to the client.
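Continuing the hypothetical QueueState sketch from above, the owner-side compare-and-set might look like the following; synchronized stands in here for whatever atomicity the grid actually provides:

    // Illustrative sketch only: a method that could be added to the
    // hypothetical QueueState above. The tail moves only when the client's
    // tail ID matches the owner's; either way the current tail is returned.
    public synchronized int compareAndIncrementTail(int clientTailId) {
        if (clientTailId == tailBucketId) {
            tailBucketId++; // e.g. 10 -> 11, as in the example above
        }
        return tailBucketId; // 11 in both branches of the example
    }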
[0063] As shown in Figure 8, the queue state 803 points to the new tail bucket 816. After the client process 804 receives the correct tail ID, the client process 804 can send an offer message, containing any remaining elements to be offered, to the process that owns the new tail bucket 816.
[0064] Furthermore, the new tail bucket 816 may not be able to hold all of the offered elements. For example, another client process may have already filled the new tail bucket 816, or the new tail bucket 816 does not have sufficient capacity to hold all of the elements. Then, the system can repeat the above process of incrementing the tail bucket ID and offering the remaining elements, until all of the elements have been successfully offered (or added) to the distributed queue 801.
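The overall client-side retry loop might be sketched as follows; sendOffer and refreshTailId are hypothetical stand-ins for the offer message and the tail compare-and-set message described above:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: keep offering the leftover elements to the
    // (possibly moved) tail bucket until all are accepted or the queue is full.
    public final class OfferLoop<E> {
        public boolean offerAll(List<E> elements, int tailId) {
            List<E> remaining = new ArrayList<>(elements);
            while (!remaining.isEmpty()) {
                remaining = sendOffer(tailId, remaining); // returns the un-offered elements
                if (!remaining.isEmpty()) {
                    int next = refreshTailId(tailId); // compare-and-set at the owner
                    if (next < 0) {
                        return false; // hypothetical "queue full" signal
                    }
                    tailId = next;
                }
            }
            return true;
        }

        // Stubs for the two messages; a real client would send these across the grid.
        private List<E> sendOffer(int bucketId, List<E> elements) { return new ArrayList<>(); }
        private int refreshTailId(int clientTailId) { return clientTailId + 1; }
    }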
[0065] Figure 9 shows an illustration of an interaction diagram for offering an element to a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 9, a client process 901 can use a named queue 902 for offering (or adding) one or more elements to a distributed queue in a distributed data grid. The distributed data grid includes a cluster member 905, which is the queue state owner that owns the queue state 906.
[0066] At step 911, a named queue 902 can receive a message (e.g. from a client process 901) to offer a new value to the tail of the distributed queue. The named queue 902, which is associated with the client process 901, can maintain a queue state 903, which is a local version of the queue state 906 for the distributed queue.
[0067] The named queue 902 may assume that the information in the local queue state 903 is correct. At steps 912-913, the named queue 902 can request and receive the state information for the queue, such as the tail ID and the queue version number, from the local queue state 903.
[0068] Then, the named queue 902, which is associated with the client process 901, can send an offer message to the cluster member that owns the tail bucket, according to its local view. At step 914, the named queue 902 can offer the received value to the cluster member 904, which owns the tail bucket according to the tail ID in the local queue state 903.
[0069] If the value is offered (or added) successfully to the tail bucket, at step 915, the cluster member 904 can return an empty set, and at step 916, the named queue 902 can return an OFFER_ACCEPTED message to the client process 901.
[0070] Otherwise, when the value is not successfully offered (or added) to the tail bucket, e.g. due to either the bucket being full or a mismatch in the queue state version, at step 915, the cluster member 904 can return a set of un-offered elements back to the named queue 902. Then the named queue 902 may attempt the offer operation again, e.g. by sending a message to the next tail bucket. The named queue 902 may try to add the received value to the tail of the distributed queue repeatedly (e.g. in a loop 910), until either the distributed queue is full or the received value is successfully offered (or added).
[0071] Within the loop 910, the named queue 902 may need to refresh or update its view of the tail ID and the queue version, if a previous offer fails due to a full bucket or a version mismatch.
[0072] At step 917, the named queue 902 can send a request to the local queue state 903 to refresh the tail ID. At step 918, the local queue state 903 may try to obtain the next tail ID from the cluster member 905, which is the owner of the queue state 906. For example, the message may contain the tail ID and queue version information based on such information in the local queue state 903.
[0073] At steps 919-920, the cluster member 905 can request and obtain the state information in the owner version of the queue state 906. Thus, the cluster member 905 can determine whether the queue state should be refreshed, e.g. by comparing the information received from the local queue state 903 and queue state 906.
[0074] For example, the local queue state 903 is stale when the information in the local queue state 903 does not match the information in the owner queue state 906. In such a case, the state information in the owner queue state 906 may be returned to the local queue state 903 (i.e. the local queue state 903 is refreshed).
[0075] When the information in the local queue state 903 matches the information in the queue state 906, at step 921, the cluster member 905 can update the queue state 906, e.g. by moving the tail to the next bucket ID. Then, at step 922, the queue state 906 can provide the updated state information of the queue to the cluster member 905. For example, the queue version number may be incremented, if the tail ID moves to the next bucket ID by wrapping back around to the first bucket ID or any other reused buckets.
[0076] Additionally, a check can be performed to determine whether the queue is full. For example, a queue is full if incrementing the tail ID would make the tail ID equivalent to the current head ID.
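Assuming, for illustration, that bucket IDs are reused in a fixed-size ring (as the wrap-around in paragraph [0075] suggests), this check might be sketched as:

    // Illustrative sketch only: with bucket IDs reused modulo a fixed ring
    // size, the queue is full when advancing the tail would land on the head.
    public final class FullCheck {
        public static boolean queueFull(int headId, int tailId, int ringSize) {
            return (tailId + 1) % ringSize == headId;
        }
    }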
[0077] As shown in Figure 9, at steps 923-924, the cluster member 905 can provide the state information of the queue (refreshed or updated) to the queue state 903, which in turn can provide the state information of the queue to the named queue 902. For example, such state information may include either a QUEUE_FULL message or an updated tail ID with an updated queue version number.
[0078] At step 925, when the distributed queue is full, the named queue 902 can send a QUEUE_FULL message to the client process 901. Otherwise, at steps 926-927, the named queue 902 can request and obtain the state information of the queue from the queue state 903. Furthermore, at steps 928-929, the named queue 902 can offer (or add) the value to the cluster member 904 (or any other cluster member that owns the bucket with the updated tail ID).
[0079] As shown in Figure 9, the named queue 902 may try to offer (or add) the received element to the tail of the distributed queue repeatedly, e.g. in the loop 910, until the received value is successfully added. At step 930, the named queue 902 can send a QUEUE_SUCCESS message to the client process 901 after the received value is successfully added.
[0080] Figure 10 shows an illustration of an interaction diagram for offering an element to a bucket in a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 10, a caller (e.g. a named queue 1001) can be used for offering (or adding) one or more elements to a particular bucket 1003, which is owned by a cluster member 1002 in a distributed data grid.
[0081] At step 1011, the named queue 1001 can offer a value to the cluster member 1002 based on the state information, such as the bucket ID and the queue version information. At steps 1012-1013, a new bucket may be created if the bucket 1003 with the specific bucket ID does not exist.
[0082] At steps 1014-1015, the cluster member 1002 can check whether the bucket 1003 is full. When the bucket is full, at step 1016, the cluster member 1002 can send a BUCKET_FULL message to the named queue 1001. Otherwise, at steps 1017-1018, the cluster member 1002 can obtain the state information from the bucket 1003.
[0083] At step 1019, if the queue version information in the view of the named queue 1001 does not match the queue version information associated with the bucket 1003, the cluster member 1002 can send an OFFER_FAILED message to the named queue 1001.
[0084] At steps 1020-1021, the cluster member 1002 can add the value into the bucket 1003. Then, at step 1022, the cluster member 1002 can send an OFFER_SUCCESS message to the named queue 1001.
[0085] Figure 11 shows an illustration of polling an element from a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 11, a distributed queue 1101 in a distributed data grid 1100 can include a queue of buckets, e.g. buckets 1111-1116, each of which may include one or more elements. Additionally, a state owner process 1102 can be responsible for holding the state information of the queue state 1103 for the distributed queue 1101.
[0086] In accordance with an embodiment of the invention, a client process 1104 can perform a poll or peek action on the head bucket of the distributed queue 1101. For example, a poll action may return and remove an element (such as the head element) from the distributed queue 1101. On the other hand, a peek action may return the value of the head element of the distributed queue 1101 without removing the head element.
[0087] As shown in Figure 11, the client process 1104 assumes the bucket 1111 to be the head bucket in the distributed queue 1101. In order to poll from the distributed queue 1101, the client process 1104 sends a message to the process 1121, which owns the head bucket 1111. Then, the client process 1104 can receive a response containing either the element from the head of the distributed queue 1101 or an indicator that the bucket 1111 is empty.
[0088] If the response indicates that the bucket 1111 is empty (provided that the distributed queue 1101 itself is not empty), then the queue state 1105, which is maintained by the named queue 1110, is stale. Thus, the queue state 1105 may need to be refreshed so that the head bucket ID is updated to the next head ID.
[0089] As shown in Figure 11, the client process 1104 can send a message to the state owner process 1102, which owns the queue state 1103, to instruct the queue state 1103 to increment the head ID. The operation of moving to the next head ID (i.e. refreshing the queue state 1105) for the poll or peek operation can be an atomic compare and set operation, where the queue state 1103 is changed only if the current head ID matches the client's head ID. The client process 1104 can then receive a response containing the head ID from the queue state 1103, which has either been incremented by this message, or had previously been incremented.
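Mirroring the tail-side sketch given earlier, the head-side compare-and-set on the hypothetical QueueState might look like the following; note that the head is never advanced past the tail, since equal head and tail IDs denote an empty queue (see paragraph [00102] below):

    // Illustrative sketch only: the head moves only when the client's head ID
    // matches the owner's, and never past the tail (head == tail means empty).
    public synchronized int compareAndIncrementHead(int clientHeadId) {
        if (clientHeadId == headBucketId && headBucketId != tailBucketId) {
            headBucketId++;
        }
        return headBucketId;
    }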
[0090] Finally, the client process 1104 can resend the poll/peek message to the process owning the new head bucket.
[0091] Figure 12 shows an illustration of an interaction diagram for polling or peeking an element from a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 12, a client process 1201 can use a named queue 1202 for performing an action, such as a poll or peek action, on one or more elements in a distributed queue 1200 in a distributed data grid. The distributed data grid includes a cluster member 1205, which is the queue state owner that owns the queue state 1206.
[0092] At step 1211, a named queue 1202 can receive a message (e.g. from a client process 1201) to perform a poll or a peek action on the head element of the distributed queue 1200. The named queue 1202, which is associated with the client process 1201, can maintain a local version of the queue state 1203 for the distributed queue 1200.
[0093] At steps 1212-1213, the named queue 1202 can check the queue state 1203 to obtain the information on whether the distributed queue 1200 is empty. When the distributed queue 1200 is empty, at step 1214, the named queue 1202 can send a QUEUE_EMPTY message to the client process 1201.
[0094] Furthermore, if the queue is not empty, the client process 1201 assumes that it has the current head ID and the queue version number. At steps 1215-1216, the named queue 1202 can request and receive the state information, such as the head ID and the queue version number, from the local queue state 1203. The named queue 1202 can use such information for performing the poll or peek action on the head element of the distributed queue 1200.
[0095] Then, at step 1217, the named queue 1202 can perform a poll or a peek action on the cluster member 1204, which owns the head bucket based on the received head ID.
[0096] If the action is performed successfully on the head bucket, at step 1218, the cluster member 1204 can return one or more elements to the named queue 1202, and at step 1219, the named queue 1202 can return the elements to the client process 1201.
[0097] Otherwise, if the action is not performed successfully on the head bucket, at step 1218, the cluster member 1204 can return an empty set to the named queue 1202. In such a case, the client process 1201 may need to update its view of the head ID and the queue version information.
[0098] As shown in Figure 12, the named queue 1202 may try to perform a poll or peek action on the head of the distributed queue 1200 repeatedly (e.g. in a loop 1210), until either when the distributed queue 1200 is empty or when the action is successfully performed.
[0099] Within the loop 1210, at step 1220, the named queue 1202 can request the local queue state 1203 to refresh the head ID and the queue version number, and at step 1221, the local queue state 1203 may try to obtain the next head ID from the cluster member 1205, which is the owner of the queue state 1206.
[00100] At steps 1222-1223, the cluster member 1205 can request and obtain the owner version of the state information in the queue state 1206. For example, the message may contain the head ID and the queue version information based on the information in the local queue state 1203. Thus, the cluster member 1205 can determine whether the queue state is stale, e.g. by comparing the information received from the local queue state 1203 and the queue state 1206.
[00101] For example, if the information in the local queue state 1203 does not match the information in the queue state 1206, the queue state 1203 is stale. In such a case, the state information in the queue state 1206 may be returned to the queue state 1203 (i.e. the queue state 1203 is refreshed).
[00102] When the information in the local queue state 1203 matches the information in the queue state 1206, at step 1224, the cluster member 1205 can update the queue state 1206, e.g. by moving the head to the next bucket ID. Then, at step 1225, the queue state 1206 can provide the updated state information of the queue state 1206 to the cluster member 1205. Additionally, a check can be performed to determine whether the queue is empty. For example, the distributed queue 1200 is an empty queue when the head ID is the same as the tail ID, in which case the head ID may not be incremented over the tail ID.
[00103] At steps 1226-1227, the cluster member 1205 can provide the state information of the queue (refreshed or updated) to the queue state 1203, which in turn can provide the queue state information to the named queue 1202. For example, such state information may include either a QUEUE_EMPTY message or an updated head ID with an updated queue version number.
[00104] At steps 1228-1229, the named queue 1202 can check the queue state 1203 to determine whether the distributed queue 1200 is empty (since multiple operations may be performed on the distributed queue 1200). At step 1230, when the distributed queue 1200 is empty, the named queue 1202 can send the QUEUE_EMPTY message to the client process 1201.
[00105] Otherwise, at steps 1231-1232, the named queue 1202 can request and obtain the updated state information of the queue state 1203, from the queue state 1203. At steps 1233-1234, the named queue 1202 can perform a poll or a peek action on the cluster member 1204 (or any other cluster member that owns the bucket with the updated head ID).
[00106] As shown in Figure 12, the named queue 1202 may try to perform the poll or peek action on the head of the distributed queue 1200 repeatedly (e.g. in the loop 1210), until the action is successfully performed. At step 1235, the named queue 1202 can return the elements to the client process 1201.
[00107] Figure 13 shows an illustration of an interaction diagram for polling or peeking an element from a bucket in a distributed queue, in accordance with an embodiment of the invention. As shown in Figure 13, a caller (e.g. a named queue 1301 ) can be used for performing an action, such as a poll or peek operation, on a bucket 1303, which is owned by a cluster member 1302 in a distributed data grid.
[00108] At step 1311, the named queue 1301 can perform a poll or peek action on the cluster member 1302 based on the state information, such as the bucket ID and the queue version number.
[00109] At steps 1312-1313, the cluster member 1302 can check whether the bucket 1303 is empty. At step 1314, the cluster member 1302 can send a BUCKET_EMPTY message to the named queue 1301, if the bucket 1303 is empty.
[00110] At steps 1315-1316, the cluster member 1302 can obtain the state information of the bucket from the bucket 1303.
[00111] At step 1317, if the queue version information in the view of the named queue 1301 does not match the queue version information associated with the bucket 1303, the cluster member 1302 can send a POLL_FAILED message to the named queue 1301.
[00112] Otherwise, at steps 1318-1324, the cluster member 1302 can perform a poll or peek action on the bucket 1303. For example, if the action is a poll action, the bucket 1303 can return the value of the bucket element 1304 before removing or deleting the bucket element 1304. Additionally, the bucket can update the queue version if the bucket becomes empty after the bucket element 1304 is removed or deleted. Then, the cluster member 1302 can provide the element to the named queue 1301. On the other hand, if the action is a peek action, the bucket 1303 can return the value of the bucket element 1304 without removing or deleting the bucket element 1304.
[00113] Figure 14 illustrates an exemplary flow chart for supporting a distributed queue in a distributed data grid, in accordance with an embodiment of the invention. As shown in Figure 14, at step 1401, the system can provide a plurality of buckets in the distributed data grid. Then, at step 1402, the system can store one or more elements of a distributed queue in the plurality of buckets. Furthermore, at step 1403, the system can use a named queue to hold a local version of a state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
[00114] A system for supporting a distributed queue in a distributed data grid, comprising: one or more microprocessors; a plurality of buckets in the distributed data grid running on the one or more microprocessors, wherein the plurality of buckets, which are maintained in a queue, are configured to store one or more elements of a distributed queue; and a named queue that holds a local version of a state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
[00115] The system further comprising a means wherein the named queue is associated with a client process, which is used by one or more user processes for accessing the one or more elements in the distributed queue.
[00116] The system further comprising a means wherein said client process is configured to offer one or more elements to a tail bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
[00117] The system further comprising a means wherein said client process is configured to determine that the queue state is stale when a message received from a process owning the tail bucket includes one or more un-offered elements.
[00118] The system further comprising a means wherein said client process is configured to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new tail bucket.
[00119] The system further comprising a means wherein said client process is configured to poll or peek an element from a head bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
[00120] The system further comprising a means wherein said client process is configured to determine that the queue state is stale when a message received from a process owning the head bucket is empty when the distributed queue is not empty.
[00121] The system further comprising a means wherein said client process is configured to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new head bucket.
[00122] The system further comprising a means wherein each bucket maintains a bucket state, which includes a pointer to a current head element and a pointer to a current tail element, and a bucket is locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
[00123] A method for supporting a distributed queue in a distributed data grid, comprising: providing a plurality of buckets in the distributed data grid wherein each bucket is configured to store one or more elements of a distributed queue; storing one or more elements of the distributed queue in the plurality of buckets; and using a named queue to hold a local version of a state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
[00124] The method further comprising a means for associating the named queue with a client process, which is used by one or more user processes for accessing the one or more elements in the distributed queue.
[00125] The method further comprising a means for using said client process to offer one or more elements to a tail bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
[00126] The method further comprising a means for using said client process to determine that the queue state is stale when a message received from a process owning the tail bucket includes one or more un-offered elements.
[00127] The method further comprising a means for using said client process to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new tail bucket.
[00128] The method further comprising a means for using said client process to poll or peek an element from a head bucket in the queue of buckets, based on the local version of the state information for the distributed queue.
[00129] The method further comprising a means for using said client process to determine the queue state is stale when a message received from a process owning the head bucket is empty when the distributed queue is not empty.
[00130] The method further comprising a means for using said client process to obtain a refreshed queue state from a state owner process, wherein the refreshed queue state indicates a new head bucket.
[00131] The method further comprising a means for maintaining a bucket state with each bucket, which bucket state includes a pointer to a current head element and a pointer to a current tail element, and wherein a bucket is locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
[00132] A non-transitory machine readable storage medium having instructions stored thereon for supporting a distributed queue in a distributed data grid, which instructions, when executed by a computer system, cause the computer system to perform steps comprising providing a plurality of buckets in a distributed data grid wherein each bucket is configured to store one or more elements of a distributed queue; storing one or more elements of the distributed queue in the plurality of buckets; and using a named queue to hold a local version of a state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
[00133] An apparatus for supporting a distributed data structure in a distributed data grid, comprising: means for providing a plurality of buckets in the distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure; means for holding, via a state owner process, state information for the distributed data structure, and means for providing, via the state owner process, the state information for the distributed data structure to a client process.
[00134] The apparatus further comprising means for distributing said plurality of buckets evenly across a plurality of processes in the distributed data grid.
[00135] The apparatus further comprising means for performing multiple operations simultaneously on different buckets associated with the distributed data structure.
[00136] The apparatus further comprising means for updating, by said state owner process, the state information for the distributed data structure when the distributed data structure changes.
[00137] The apparatus further comprising means for allowing a bucket to be locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
[00138] The apparatus further comprising means for allowing the distributed data structure to be a distributed queue, a distributed set, a distributed list, or a distributed stack.
[00139] The apparatus further comprising means for allowing each bucket to maintain a bucket state, wherein the bucket state is used to access one or more elements in said bucket.
[00140] The apparatus further comprising means for associating at least one bucket with a back-up bucket, which contains one or more elements contained in said at least one bucket.
[00141] The apparatus further comprising means for allowing said client process to store a local version of the state information for the distributed data structure; refresh the local version of the state information stored in said client process based on the state information held by the state owner process, only when the local version of the state information becomes stale; and perform an operation on one or more elements in a bucket in the distributed data structure.
[00142] The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computer, computing device, machine, or microprocessor, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
[00143] In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data. The storage medium or computer readable medium (media) may be non-transitory.
[00144] The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The modification and variation include any relevant combination of the described features. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims

What is claimed is:
1. A system for supporting a distributed data structure in a distributed data grid, comprising: one or more microprocessors;
a plurality of buckets in the distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure;
a state owner process, running on the one or more microprocessors, wherein said state owner process is configured to:
hold state information for the distributed data structure, and
provide the state information for the distributed data structure to a client process.
2. The system according to Claim 1, wherein:
said plurality of buckets are evenly distributed across a plurality of processes in the distributed data grid.
3. The system according to Claim 1 or 2, wherein:
multiple operations are simultaneously performed on different buckets associated with the distributed data structure.
4. The system according to any preceding Claim, wherein:
said state owner process is configured to update the state information for the distributed data structure when the distributed data structure changes.
5. The system according to any preceding Claim, wherein:
a bucket is locked while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
6. The system according to any preceding Claim, wherein:
the distributed data structure is a distributed queue, a distributed set, a distributed list, or a distributed stack.
7. The system according to any preceding Claim, wherein:
each bucket maintains a bucket state, wherein the bucket state is used to access one or more elements in said bucket.
8. The system according to any preceding Claim, wherein:
at least one bucket is associated with a back-up bucket which contains one or more elements contained in said at least one bucket.
9. The system according to any preceding Claim, wherein said client process is configured to:
store a local version of the state information for the distributed data structure; and perform an operation on one or more elements in a bucket in the distributed data structure.
10. The system according to Claim 9, wherein:
said client process is configured to refresh the local version of the state information stored in said client process based on the state information held by the state owner process, only when the local version of the state information becomes stale.
11. A method for supporting a distributed data structure in a distributed data grid, comprising: providing a plurality of buckets in the distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure, holding, via a state owner process, state information for the distributed data structure, and providing, via the state owner process, the state information for the distributed data structure to a client process.
12. The method of Claim 11, further comprising:
distributing said plurality of buckets evenly across a plurality of processes in the distributed data grid.
13. The method according to Claim 11 or 12, further comprising:
performing multiple operations simultaneously on different buckets associated with the distributed data structure.
14. The method according to any of Claims 11 to 13, further comprising:
updating, by said state owner process, the state information for the distributed data structure when the distributed data structure changes.
15. The method according to any of Claims 11 to 14, further comprising: locking a bucket while an operation is performed, wherein other operations on said bucket are configured to wait until the operation is completed.
16. The method according to any of Claims 11 to 15,
wherein the distributed data structure is a distributed queue, a distributed set, a distributed list, or a distributed stack.
17. The method according to any of Claims 11 to 16, further comprising:
each bucket maintaining a bucket state, wherein the bucket state is used to access one or more elements in said bucket.
18. The method according to any of Claims 11 to 17, further comprising:
associating at least one bucket with a back-up bucket which contains one or more elements contained in said at least one bucket.
19. The method according to any of Claims 11 to 18, further comprising said client process: storing a local version of the state information for the distributed data structure;
refreshing the local version of the state information stored in said client process based on the state information held by the state owner process, only when the local version of the state information becomes stale; and
performing an operation on one or more elements in a bucket in the distributed data structure.
20. The method according to any of Claims 11 to 19, wherein the distributed data structure is a distributed queue, wherein the plurality of buckets, which are maintained in a queue, are configured to store one or more elements of the distributed queue, and wherein the state owner process is a named queue that holds a local version of state information for the distributed queue, wherein said local version of the state information contains a head pointer and a tail pointer to the queue of buckets in the distributed data grid.
21. A computer program comprising program instructions in machine-readable format that when executed by a computer system cause the computer system to perform the method of any of Claims 11 to 20.
22. A computer program product comprising the computer program of Claim 21 stored in a non-transitory machine-readable data storage medium.
23. A non-transitory machine readable storage medium having instructions stored thereon for supporting a distributed data structure in a distributed data grid, which instructions, when executed by a computer system, cause the computer system to perform steps comprising: providing a plurality of buckets in a distributed data grid, wherein each said bucket is configured with a capacity to contain a number of elements of a distributed data structure; holding, via a state owner process, state information for the distributed data structure; and providing the state information for the distributed data structure to a client process.
PCT/US2015/028335 2014-05-21 2015-04-29 System and method for supporting a distributed data structure in a distributed data grid WO2015179092A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP15728228.6A EP3146430B1 (en) 2014-05-21 2015-04-29 System and method for supporting a distributed data structure in a distributed data grid
CN201580026741.XA CN106462475B (en) 2014-05-21 2015-04-29 System and method for supporting the distributed data structure in distributed data grid
JP2016568587A JP6833516B2 (en) 2014-05-21 2015-04-29 Systems and methods to support distributed data structures in a distributed data grid

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201462001470P 2014-05-21 2014-05-21
US62/001,470 2014-05-21
US14/676,030 US10243869B2 (en) 2014-05-21 2015-04-01 System and method for providing a distributed queue in a distributed data grid
US14/676,013 US10250519B2 (en) 2014-05-21 2015-04-01 System and method for supporting a distributed data structure in a distributed data grid
US14/676,030 2015-04-01
US14/676,013 2015-04-01

Publications (1)

Publication Number Publication Date
WO2015179092A1 true WO2015179092A1 (en) 2015-11-26

Family

ID=53373538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/028335 WO2015179092A1 (en) 2014-05-21 2015-04-29 System and method for supporting a distributed data structure in a distributed data grid

Country Status (1)

Country Link
WO (1) WO2015179092A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489220B2 (en) 2017-01-26 2019-11-26 Microsoft Technology Licensing, Llc Priority based scheduling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010117294A1 (en) * 2009-04-08 2010-10-14 Intel Corporation Performing concurrent rehashing of a hash table for multithreaded applications
US20130074083A1 (en) * 2011-09-15 2013-03-21 Oracle International Corporation System and method for handling storage events in a distributed data grid

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PETER R PIETZUCH: "Hermes: A scalable event-based middleware", TECHNICAL REPORT NUMBER 590 - UCAM-CL-TR-590, 1 June 2004 (2004-06-01), University of Cambridge, pages 1 - 180, XP055098484, Retrieved from the Internet <URL:http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-590.pdf> [retrieved on 20140127] *

Similar Documents

Publication Publication Date Title
US10050857B2 (en) System and method for supporting a selection service in a server environment
JP6920513B2 (en) Systems and methods to support distributed data structures in distributed data grids
US10817478B2 (en) System and method for supporting persistent store versioning and integrity in a distributed data grid
JP4637842B2 (en) Fast application notification in clustered computing systems
US9417977B2 (en) Distributed transactional recovery system and method
US9164806B2 (en) Processing pattern framework for dispatching and executing tasks in a distributed computing grid
US10423643B2 (en) System and method for supporting resettable acknowledgements for synchronizing data in a distributed data grid
US11550820B2 (en) System and method for partition-scoped snapshot creation in a distributed data computing environment
US10133489B2 (en) System and method for supporting a low contention queue in a distributed data grid
US9672038B2 (en) System and method for supporting a scalable concurrent queue in a distributed data grid
US9990239B2 (en) System and method for supporting cooperative notification offloading in a distributed data grid
WO2015179092A1 (en) System and method for supporting a distributed data structure in a distributed data grid
US20150169236A1 (en) System and method for supporting memory allocation control with push-back in a distributed data grid
WO2007028249A1 (en) Method and apparatus for sequencing transactions globally in a distributed database cluster with collision monitoring

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15728228

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016568587

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2015728228

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2015728228

Country of ref document: EP