WO2017114176A1 - Method and apparatus for coordinated queue consumption in a distributed environment - Google Patents

Method and apparatus for coordinated queue consumption in a distributed environment

Info

Publication number
WO2017114176A1
Authority
WO
WIPO (PCT)
Prior art keywords
queue
fragment
client
lease
fragments
Prior art date
Application number
PCT/CN2016/110230
Other languages
English (en)
French (fr)
Inventor
孙廷韬
Original Assignee
阿里巴巴集团控股有限公司
孙廷韬
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 孙廷韬
Publication of WO2017114176A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Definitions

  • The present application relates to the field of distributed technologies, and in particular to a method for coordinating queue consumption in a distributed environment and a device for coordinating queue consumption in a distributed environment.
  • To reduce the coupling between data production and data consumption, a queue system is usually used to cache and collect the data generated by multiple producers, and a plurality of clients acting as computing nodes then consume the data from the queue.
  • A general queue system uses multiple queue fragments (such as shards or partitions) to support horizontal scaling for massive data.
  • When a client acting as a compute node needs to consume data from multiple shards (or partitions), the same program has to be run on multiple clients so that they consume the data cooperatively.
  • When a client consumes data, it also needs to specify which shard (or partition) to fetch data from, and then complete the consumption.
  • For a queue with N shards (or partitions), M clients (M ranging from 1 to N) are started to consume the shards (or partitions), and each client consumes N/M shards (or partitions). To consume the data in all shards (or partitions) correctly, these M clients need to work together.
  • Kinesis is a real-time data queue service provided by AWS.
  • The Kinesis Client Library is a collaboration library for consuming Kinesis data.
  • The Kinesis Client Library relies on DynamoDB to accomplish collaboration between clients.
  • Although the Kinesis Client Library supports adding clients to consume data, during load balancing and similar operations a client preempts queue fragments directly, so some data is consumed repeatedly and the consumption results are inaccurate.
  • In view of the above problems, embodiments of the present application are proposed to provide a method for coordinating queue consumption in a distributed environment, and a corresponding device, that overcome the above problems or at least partially solve them.
  • The present application discloses a method for coordinating queue consumption in a distributed environment, including:
  • for a queue fragment to be consumed, obtaining status data of the queue fragment based on a lease agreement;
  • determining, according to the status data, whether another client is consuming the queue fragment;
  • if it is determined that no other client is consuming the queue fragment, updating the status data of the queue fragment and obtaining the current consumption progress of the queue fragment;
  • continuing to consume the queue fragment from the current consumption progress, and recording the new consumption progress of the queue fragment.
  • The application also discloses a device for coordinating queue consumption in a distributed environment, comprising:
  • a status data obtaining module, configured to obtain, for a queue fragment to be consumed, status data of the queue fragment based on a lease agreement;
  • a consumption judging module, configured to determine, according to the status data, whether another client is consuming the queue fragment;
  • a progress obtaining module, configured to update the status data of the queue fragment and obtain the current consumption progress of the queue fragment if it is determined that no other client is consuming the queue fragment;
  • a consumption module, configured to continue to consume the queue fragment from the current consumption progress and to record the new consumption progress of the queue fragment.
  • In the embodiments of the present application, for a queue fragment to be consumed by a client, the client first obtains the status data of the queue fragment based on a lease agreement, determines from the status data whether another client is consuming the queue fragment, and, once it has determined that no other client is consuming it, obtains the current consumption progress and continues consumption from there.
  • In this way the consumption progress of the queue fragment is handed over seamlessly to the current client. When queue fragments are load balanced and a shard of one client is preempted by the current client, the current client continues consuming the queue fragment from its recorded consumption progress, which avoids repeated consumption of data and makes the consumption result more accurate.
  • FIG. 1 is a flow chart of steps of an embodiment of a distributed environment coordinated consumption queue method according to the present application
  • FIG. 2 is a structural block diagram of an embodiment of a distributed environment coordinated consumption queue device of the present application
  • FIG. 2A is an example of an architecture of a distributed environment of the present application.
  • One of the core concepts of the embodiments of the present application is as follows. In a distributed environment, when producers write data, load balancing can be achieved by distributing the data across different queue fragments (shards or partitions) through hashing or round-robin scheduling.
  • On the consuming side, M clients (M in the range of 1 to N) consume the N queue fragments. In the following description a queue fragment is referred to as a shard.
  • These M clients need to cooperate with each other, and the following aspects must be considered. First, how each client can correctly choose the shards it consumes, so that every shard has one and only one consuming client. Second, how to handle the downtime of one or more clients so that, when the load is rebalanced, the data on all shards is still processed correctly and not consumed repeatedly. Third, when processing pressure increases and new clients have to be added, how to rebalance the load automatically while ensuring that no data is consumed repeatedly.
  • In the embodiments of the present application, when a queue fragment is a queue fragment to be consumed by client A, client A first obtains the status data of the queue fragment based on the lease agreement and determines from the status data whether other clients are consuming the queue fragment. After determining that no other client is consuming the queue fragment, client A obtains the current consumption progress of the queue fragment and updates its status data. Client A then continues to consume the queue fragment from the current consumption progress and records the new consumption progress.
  • In this way the consumption progress of the queue fragment is handed over seamlessly to the current client. When queue fragments are load balanced and a shard of one client is preempted by the current client, the current client continues consuming the queue fragment from its recorded consumption progress, which avoids repeated consumption of data and makes the consumption result more accurate.
  • Shard: a queue fragment, the container that actually stores the data in a data queue.
  • A data queue consists of multiple shards, and each worker needs to select a specific shard when consuming data. A queue fragment may also be called a partition, which is not limited in this application.
  • Worker: can be understood as a client or a worker process and corresponds to a compute node. A worker can consume data in one or more shards.
  • Worker_name: the name of each worker, used to distinguish between different clients.
  • Lease: each shard has a lock. Only the worker that preempts the lease can consume the data in the shard.
  • Check Point: the consumption progress, which records the position up to which a shard has been consumed, indicating which data has already been consumed.
  • Referring to FIG. 1, a flow chart of the steps of an embodiment of a method for coordinating queue consumption in a distributed environment of the present application is shown, which may specifically include the following steps:
  • Step 110: for a queue fragment to be consumed, obtain status data of the queue fragment based on a lease agreement;
  • The embodiment of the present application provides a lease agreement for shards for the case where multiple consumers cooperatively consume a queue in a distributed environment, and implements a process in which multiple computing nodes cooperatively consume the data in multiple shards.
  • The status data of a shard is updated under the lease agreement, and it is used to determine whether the shard is being consumed by a client.
  • When a shard is a shard to be consumed by the current client, worker A, worker A needs to obtain the status data of the queue fragment based on the lease agreement and then proceeds to step 120.
  • Before step 110, the method further includes:
  • Step 100: determine the queue fragments to be consumed by the current client based on the lease agreement.
  • The current client first determines, according to the lease agreement, how many shards it can consume, and then decides from where to seize those shards, which become the shards to be consumed by the current client, worker A.
  • Step 100 includes sub-step M1:
  • Sub-step M1: determine, every first time period, the queue fragments to be consumed by the current client based on the lease agreement.
  • The lease agreement specifies that, every second time period, the current client worker A determines how many shards it can consume and then from where to seize them as the shards to be consumed by the current client. In this way, the shards in a distributed environment can be load balanced in real time.
  • Step 100 includes sub-steps 101-102:
  • Sub-step 101: obtain the total number of active clients U, the total number of queue fragments P, and the total number of queue fragments Q that the current client has already consumed, and calculate from them the number of queue fragments N that the current client needs to preempt.
  • An active client is a client that is running normally and can share in processing shards.
  • The number N must ensure load balancing across the distributed environment. Therefore the total number of active clients U, the total number of queue fragments P, and the total number of queue fragments Q that the current client has already consumed are obtained first, and N is then calculated from them.
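The text does not state how N is derived from U, P, and Q. The following is a minimal sketch under one assumption that is not taken from the source: each active client aims for an even share of the shards, rounded up, and N is whatever is still missing from that share. The function name `shards_to_preempt` is hypothetical.

```python
def shards_to_preempt(active_clients: int, total_shards: int, already_held: int) -> int:
    """Return N, the number of shards the current client still needs to preempt.

    Assumption (not stated in the source): the target share is
    ceil(P / U), and N = max(0, share - Q).
    """
    if active_clients <= 0:
        raise ValueError("there must be at least one active client")
    # Ceiling division without importing math.
    fair_share = -(-total_shards // active_clients)
    return max(0, fair_share - already_held)


# A new worker joining five existing workers on a 54-shard queue.
n_for_new_worker = shards_to_preempt(active_clients=6, total_shards=54, already_held=0)
```

Under this assumption a new worker joining five others on a queue of 54 shards would need 9 shards, which agrees with the later worked example in which worker A takes 2 timed-out shards and 7 shards from other workers.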
  • Sub-step 101 comprises sub-steps A11-A14:
  • Sub-step A11: obtain the total number of active clients U from the client instance table and the queue lease table stored in the persistent storage space.
  • The client instance table and the queue lease table may be constructed in advance in the persistent storage space.
  • The persistent storage space may be a database or a distributed cache, and may also be another type of persistent storage space, which is not limited in the embodiment of the present application.
  • The persistent storage space preferred in the embodiment of the present application is a database. The database may be a MySQL database or another type of database, which is not limited in the embodiments of the present application.
  • The queue lease table client_shard_lease can take the form of Table 1:
  • consume_group: the consumption group field. The data type is Char(64). It is one of the primary keys of the table and indicates the name of the consumption group for a given queue.
  • Shards are divided by consumption group, and clients are also created per consumption group, so the application can use consume_group to distinguish different consumption groups.
  • shard_id: the queue fragment ID field. The data type is Char(64). It is one of the primary keys of the table and identifies each shard in the queue.
  • lease_id: the lease ID field. The data type is int(20). It is the id used by a worker to lease the shard.
  • An atomic test-and-set operation is used to ensure that, at any time, only one owner can modify the value of the lease, that is, only one owner can grab the shard.
  • The "test" and the "set" are performed in a single atomic operation that cannot be split, so only one client can preempt the shard at a time, which ensures the correctness of data operations.
  • lease_owner: the lease owner field. The data type is Char(64). It holds the owner of the lease that preempts the shard, that is, a worker_name.
  • consumer_owner: the consumer field. The data type is Char(64). It holds the owner currently consuming the shard, that is, a worker_name.
  • update_time: the update time field. The data type is DateTime. It records the update time for monitoring.
  • The client_shard_lease table in the embodiment of the present application may include shard_id, lease_owner, consumer_owner, and check_point, so that the consumption progress of a preempted queue fragment can be passed from the client that lost the fragment to the client that preempted it.
  • The other fields are optional, preferred fields.
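The lease table and its atomic test-and-set can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it uses an in-memory SQLite database in place of the MySQL database mentioned in the text, stores check_point and update_time as text, and assumes that a successful preemption increments lease_id. The helper names `create_lease_table` and `try_preempt` are hypothetical.

```python
import sqlite3


def create_lease_table(conn: sqlite3.Connection) -> None:
    # Column names follow Table 1; the SQLite types are an assumption.
    conn.execute(
        """CREATE TABLE client_shard_lease (
               consume_group  TEXT NOT NULL,
               shard_id       TEXT NOT NULL,
               lease_id       INTEGER NOT NULL DEFAULT 0,
               lease_owner    TEXT,
               consumer_owner TEXT,
               check_point    TEXT,
               update_time    TEXT,
               PRIMARY KEY (consume_group, shard_id))"""
    )


def try_preempt(conn, group, shard_id, expected_lease_id, worker_name) -> bool:
    """Test-and-set in one statement: the row changes only if lease_id
    still has the value this worker last observed."""
    cur = conn.execute(
        "UPDATE client_shard_lease "
        "SET lease_id = lease_id + 1, lease_owner = ? "
        "WHERE consume_group = ? AND shard_id = ? AND lease_id = ?",
        (worker_name, group, shard_id, expected_lease_id),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
create_lease_table(conn)
conn.execute(
    "INSERT INTO client_shard_lease (consume_group, shard_id) VALUES (?, ?)",
    ("group_1", "shard_1"),
)
# Both workers observed lease_id 0; only the first update can succeed.
first = try_preempt(conn, "group_1", "shard_1", 0, "worker_A")
second = try_preempt(conn, "group_1", "shard_1", 0, "worker_B")
```

Because the comparison and the write happen in a single UPDATE statement, the second worker's attempt with the stale lease_id matches no row and leaves the lease with worker_A.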
  • The client instance table client_worker_instance can take the form of Table 2:
  • consume_group: the consumption group field. The data type is Char(128). It is one of the primary keys of the table and indicates the name of the consumption group for a given queue.
  • worker_name: the client name. The data type is Char(64). It is one of the primary keys of the table.
  • A field of data type DateTime records the time when the client starts.
  • Table 2 can record each client created for a consumption group.
  • When a client starts, its worker_name is written into the client instance table and its start time is recorded.
  • The client includes a client lib and data processing logic.
  • The client lib performs logic such as preempting and renewing leases, and the data processing logic is the logic for normal consumption of the data in a shard.
  • The consumption logic is, for example, user behavior analysis, monitoring and alarming, program intrusion detection, or other specific analysis and processing operations.
  • The client lib can be understood as a thread of the client. Its startup process is roughly as follows:
  • The client lib gets the names of all shards in the queue, namely the shard_ids.
  • The client lib obtains the information for all shard_ids from client_shard_lease, including the lease_id and lease_owner corresponding to each shard.
  • The client lib determines whether each shard_id obtained from the queue appears in the client_shard_lease table. If it does not, a record for that shard_id is added to the client_shard_lease table with lease_id set to 0 and lease_owner, consumer_owner, check_point, and update_time set to null.
  • The client then creates a preemption thread and a lease renewal thread.
  • The preemption thread is used to preempt shards, and the lease renewal thread is used to renew the leases of the shards being consumed.
  • From Tables 1 and 2, the embodiment of the present application can obtain the active clients and thereby calculate the total number of active clients U.
  • Sub-step A11 includes sub-steps A111-A113:
  • Sub-step A111: obtain the number live1 of clients recorded in the client instance table as having started within the most recent time period;
  • A newly started client may not yet have preempted any shard, in which case the lease owner field in Table 1 has no record for that client.
  • The embodiment of the present application therefore subtracts the client's startup time from the system time to obtain t and determines whether t is less than the first time period T. If it is, the client is considered active, and counting these active clients yields live1.
  • Sub-step A112: obtain from the queue lease table the number live2 of clients corresponding to queue fragments that have not timed out; such a client is the client recorded under the lease owner field of the queue fragment;
  • The embodiment of the present application also obtains the records of all shards from the client_shard_lease table, keeps the shard records that have not timed out, and extracts the worker_name under lease_owner from each. The corresponding clients are active, and counting them yields live2.
  • Sub-step A113: add live1 and live2 to obtain the total number of active clients U.
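Sub-steps A111-A113 can be sketched as below. The in-memory dictionaries stand in for the two database tables, and the function name and parameters are hypothetical. The sketch adds live1 and live2 exactly as the text describes; the source does not say whether a client that appears in both counts is deduplicated, so none is attempted here.

```python
from datetime import datetime, timedelta


def count_active_clients(start_times, lease_owners, timed_out_shards, now, first_period):
    """Return U = live1 + live2.

    start_times:      worker_name -> start time (client instance table)
    lease_owners:     shard_id -> lease_owner or None (queue lease table)
    timed_out_shards: set of shard_ids already judged to have timed out
    """
    # live1: clients started within the most recent first time period T.
    live1 = sum(1 for started in start_times.values() if now - started < first_period)
    # live2: distinct lease owners of shards that have not timed out.
    owners = {
        owner
        for shard_id, owner in lease_owners.items()
        if owner is not None and shard_id not in timed_out_shards
    }
    live2 = len(owners)
    return live1 + live2


now = datetime(2016, 1, 1, 12, 0, 0)  # arbitrary illustrative timestamp
u = count_active_clients(
    start_times={"worker_A": now - timedelta(seconds=1), "worker_B": now - timedelta(hours=1)},
    lease_owners={"shard_1": "worker_B", "shard_2": "worker_B", "shard_3": "worker_C"},
    timed_out_shards={"shard_3"},
    now=now,
    first_period=timedelta(seconds=10),
)
```

In this example worker_A counts through its recent start time, worker_B counts through its live leases, and worker_C is ignored because its only shard has timed out.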
  • Sub-step A12: obtain the total number P of queue fragments from the queue system;
  • The total number P of shards in the consumption group is obtained from the queue system.
  • Sub-step A13: obtain the total number Q of queue fragments that the current client has already consumed from the queue lease table stored in the persistent storage space.
  • If the lease_owner and the consumer_owner of a shard hold the same worker_name, that worker occupies the shard. If the lease_owner and the consumer_owner differ, the worker named in consumer_owner is no longer allowed to occupy the shard, because the shard has been preempted by another worker.
  • In the client_shard_lease table, the number of shards whose lease_owner and consumer_owner are both a given worker_name is the number Q of shards that the worker has occupied.
  • For example, for the current client worker A, in the client_shard_lease table the lease_owner and consumer_owner of shard_1 and shard_2 are both worker A, while the lease_owner and consumer_owner of shard_3 are both worker B. Worker A therefore occupies shard_1 and shard_2 but not shard_3, so the number Q of shards occupied by worker A is 2.
  • The order of sub-steps A11-A13 is not limited by the embodiment of the present application.
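The rule for Q can be written as a one-line count. The sketch below uses plain dictionaries in place of rows of the queue lease table, and the function name `count_occupied` is hypothetical. The data reproduces the worker A example above.

```python
def count_occupied(rows, worker_name):
    """Count shards whose lease_owner and consumer_owner are both worker_name."""
    return sum(
        1
        for row in rows
        if row["lease_owner"] == worker_name and row["consumer_owner"] == worker_name
    )


lease_rows = [
    {"shard_id": "shard_1", "lease_owner": "worker A", "consumer_owner": "worker A"},
    {"shard_id": "shard_2", "lease_owner": "worker A", "consumer_owner": "worker A"},
    {"shard_id": "shard_3", "lease_owner": "worker B", "consumer_owner": "worker B"},
]
q = count_occupied(lease_rows, "worker A")  # 2, as in the example
```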
  • worker F occupies 11 shards
  • worker B occupies 10 shards
  • worker C occupies 10 shards
  • worker D occupies 10 shards
  • worker E occupies 11 shards.
  • the worker A is a new worker and does not occupy a shard.
  • Sub-step 102: preempt N queue fragments from the timed-out queue fragments and/or the queue fragments being consumed by other clients, as the queue fragments to be consumed by the current client.
  • Case 1: there are timed-out queue fragments, and the number of timed-out shards L is greater than or equal to N.
  • Case 2: there are timed-out queue fragments, and the number of timed-out shards L is less than N.
  • In the first case, worker A can directly take N queue fragments from the timed-out shards.
  • In the second case, worker A can preempt up to L queue fragments from the timed-out queue fragments, and the remaining N-L shards are preempted from other workers.
  • When there are no timed-out queue fragments, the N shards are seized directly from other workers.
  • Sub-step 102 comprises sub-steps A21-A24:
  • Sub-step A21: determine whether N is greater than 0;
  • Sub-step A22: if N is greater than 0, determine whether the number L of timed-out queue fragments is less than N;
  • Sub-step A23: if the number L of timed-out queue fragments is greater than or equal to N, preempt N queue fragments from the timed-out queue fragments;
  • In this case, N queue fragments are preempted at random from the L timed-out queue fragments.
  • Sub-step A24: if the number L of timed-out queue fragments is less than N, preempt L queue fragments from the timed-out queue fragments and N-L queue fragments from the queue fragments that other clients are consuming.
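Sub-steps A21-A24 amount to a small decision procedure, sketched below. The function name `plan_preemption` and its return shape are hypothetical; it only decides which timed-out shards to take and how many more must come from other clients, and leaves the actual preemption to the lease table.

```python
import random


def plan_preemption(n, timed_out, rng=random):
    """Return (timed-out shards to take, number still to take from other clients)."""
    if n <= 0:                       # A21: nothing to preempt
        return [], 0
    if len(timed_out) >= n:          # A22/A23: L >= N, pick N at random
        return rng.sample(timed_out, n), 0
    # A24: L < N, take all L timed-out shards and N - L from other clients
    return list(timed_out), n - len(timed_out)


# Worker A needs 9 shards and only 2 have timed out.
taken, from_others = plan_preemption(9, ["shard_53", "shard_54"])
```

With N = 9 and L = 2 the plan takes both timed-out shards and leaves 7 to be preempted from other clients.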
  • The step of preempting N-L queue fragments from the queue fragments being consumed by other clients in sub-step A24 includes sub-step A241:
  • Sub-step A241: the current client preempts N-L queue fragments from the queue fragments occupied by other clients such that the difference between the number of queue fragments occupied by any client and the smallest number of queue fragments occupied by a client does not exceed a specified number.
  • The preemption thread of each worker ensures that the numbers of shards occupied by the workers meet the following condition: the difference between the largest number of occupied queue fragments and the smallest number of occupied queue fragments does not exceed the specified number.
  • The specified number is, for example, 1.
  • In other words, no client holds more than the specified number of queue fragments beyond the client holding the fewest.
  • worker B occupies 11 shards
  • worker C occupies 10 shards
  • worker D occupies 10 shards
  • worker E occupies 10 shards
  • worker F occupies 11 shards.
  • The total number of shards is 54, of which 2 are timed-out queue fragments.
  • Worker A is the newly started worker.
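The balance condition can be checked directly against the occupancy figures above. This is only a sketch of the condition itself, with a hypothetical function name; it does not implement the preemption order.

```python
def is_balanced(shard_counts, specified_number=1):
    """True if the largest and smallest occupancy differ by at most specified_number."""
    counts = list(shard_counts.values())
    return max(counts) - min(counts) <= specified_number


existing = {"worker B": 11, "worker C": 10, "worker D": 10, "worker E": 10, "worker F": 11}
balanced_before = is_balanced(existing)                      # True: 11 - 10 <= 1
balanced_with_new = is_balanced({**existing, "worker A": 0})  # False: 11 - 0 > 1
```

The five existing workers satisfy the condition among themselves, but once the newly started worker A is counted with zero shards the condition fails, which is why worker A has to preempt shards.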
  • Sub-step A241 comprises sub-steps A2411-A2412:
  • Sub-step A2411: based on the queue lease table, sort the clients by the number of queue fragments they occupy;
  • From the client_shard_lease table, the number of shard_ids occupied by each worker_name is counted.
  • Sub-step A2412: preempt one or more queue fragments from the first K clients such that, after J queue fragments have been preempted from the first K clients, the average number of queue fragments remaining on the first K clients is greater than the number of queue fragments currently occupied by the (K+1)-th client, and the number of queue fragments remaining on each of the first K clients differs from that average by no more than the specified number; continue until N-L queue fragments have been successfully preempted.
  • Take Table 4 above as an example. Because worker A can preempt the 2 timed-out queue fragments, it needs to grab 7 shards from other workers.
  • Two conditions apply: the number of remaining shards must be greater than the number of queue fragments currently occupied by the next worker (11 for the second worker), and the remaining numbers must stay within the specified number of the average. Under these two conditions, no more than 2 shards can be preempted from the first 2 workers. One shard can therefore be grabbed from the first worker, and none from the second. At this point worker A still needs to preempt 6 shards and must continue preempting.
  • The two conditions are satisfied after 2 shards are seized from worker C, and shards are then grabbed from worker B, worker D, worker E, and worker F.
  • The other two preemption modes do not satisfy the foregoing two conditions.
  • Sub-step 101 includes:
  • Sub-step 1011: when a queue fragment is preempted, modify the consumer field of the preempted queue fragment in the queue lease table to the current client.
  • After the client has seized a queue fragment, it also updates the status data of the shard according to the lease agreement.
  • Once client worker A has preempted N shards, those N shards are the queue fragments to be consumed by worker A.
  • Sub-steps A31-A34 are further included:
  • Sub-step A31: obtain the queue lease table every first time period, and determine whether the lease ID of each queue fragment in the queue lease table is the same as the lease ID of the corresponding queue fragment last recorded in the locally stored map table;
  • The client lib of each worker creates a map table that includes a queue fragment identifier field shard_id, a lease identifier field lease_id, and a last monitoring time field last_update_time.
  • The map table resides in the memory of the computing server on which the worker runs.
  • After the map table is created, information about all shards is first obtained from the client_shard_lease table and written to the map table.
  • The information is keyed by shard_id; for each shard, the shard_id, the lease_id, and the current system time are recorded one by one. The process then proceeds to step B11 for timeout monitoring.
  • The client lib of a worker obtains each shard_id and its lease_id from the client_shard_lease table of the database every first time period T and matches them against the map table.
  • If the shard_id is not matched, the shard is newly appearing and has no corresponding record in the map table; the shard_id and lease_id are added to the map table, and the current system time is recorded in the corresponding last_update_time field.
  • If the shard_id matches but the lease_id does not, go to sub-step A32.
  • If both shard_id and lease_id match, go to sub-step A33.
  • Sub-step A32: if the lease ID of a queue fragment in the queue lease table differs from the lease ID last recorded for that queue fragment in the map table, update the lease ID of the queue fragment in the map table to the lease ID in the queue lease table, and update the last monitoring time field of the queue fragment in the map table to the current system time;
  • The lease renewal thread of each worker renews, every second time period, the leases of the shards whose lease_owner and consumer_owner are both that worker.
  • The second time period is shorter than the first time period; the second time period is, for example, T1/2.
  • For each shard, if the current client worker A finds that the shard_id in the map table matches the shard_id in the queue lease table but the lease_id does not match, it determines that the corresponding shard has not timed out and updates the system time recorded in the last_update_time field for that shard_id in the map table to the current system time.
  • Sub-step A33: if the lease ID of a queue fragment in the queue lease table is the same as the lease ID last recorded for that queue fragment in the map table, keep the system time in the last monitoring time field unchanged, and determine whether the current system time minus the last monitored system time is greater than the first time period;
  • If the lease_id has not changed, this suggests that the worker holding the shard may not be consuming it normally.
  • Worker A keeps the system time under the last_update_time field in its map table unchanged and then determines whether the current system time sys_time minus last_update_time is greater than the first time period T.
  • Sub-step A34: if the current system time minus the last monitored system time is greater than the first time period, determine that the corresponding queue fragment has timed out.
  • If the current system time sys_time minus last_update_time is greater than the first time period T, the shard has timed out and becomes a candidate for preemption. The process can then proceed to step A11.
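One monitoring pass over sub-steps A31-A34 can be sketched as follows. The dictionaries stand in for the queue lease table and the in-memory map table, times are plain seconds for brevity, and the function name `monitor_once` is hypothetical.

```python
def monitor_once(lease_table, local_map, now, first_period):
    """Run one monitoring pass and return the shard_ids judged to have timed out.

    lease_table: shard_id -> lease_id as currently stored in the database
    local_map:   shard_id -> {"lease_id": ..., "last_update_time": ...}
    """
    timed_out = []
    for shard_id, lease_id in lease_table.items():
        entry = local_map.get(shard_id)
        if entry is None:
            # Newly appearing shard: record it with the current time.
            local_map[shard_id] = {"lease_id": lease_id, "last_update_time": now}
        elif entry["lease_id"] != lease_id:
            # A32: the lease was renewed or changed hands, so refresh the record.
            entry["lease_id"] = lease_id
            entry["last_update_time"] = now
        elif now - entry["last_update_time"] > first_period:
            # A33/A34: lease unchanged for longer than T, so the shard timed out.
            timed_out.append(shard_id)
    return timed_out


local_map = {}
first_pass = monitor_once({"shard_1": 1, "shard_2": 5}, local_map, now=0.0, first_period=10.0)
# shard_1 was renewed (lease_id 1 -> 2); shard_2 was not touched for 11 seconds.
second_pass = monitor_once({"shard_1": 2, "shard_2": 5}, local_map, now=11.0, first_period=10.0)
```

In the second pass only shard_2 is reported, because its lease_id stayed the same for longer than the first time period, while shard_1 had its record refreshed.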
  • Step 110 includes sub-step 111:
  • Sub-step 111: for a queue fragment to be consumed, obtain the status data of the queue fragment from the queue lease table stored in the persistent storage space.
  • After the current client worker A has preempted the N queue fragments, the status data of each queue fragment can be obtained from the queue lease table. The process then goes to step 120.
  • Step 120: determine, according to the status data, whether another client is consuming the queue fragment;
  • For each of the N shards preempted by the current client worker A, since the status data of the shard has been obtained in the foregoing step, the status data can be analysed to determine whether another worker is still using the shard. If no other worker is using the shard, the process proceeds to step 130. If the shard is still in use, the new status data of the shard continues to be acquired and the judgment is repeated.
  • The status data includes the value of the consumer field of the queue fragment, the value of the lease owner field, and the consumption time in the update time field of the queue lease table. The value of the update time field is modified after the current client preempts the queue fragment.
  • The shard status data, such as the values of lease_owner, consumer_owner, and update_time, is obtained from the queue lease table according to the shard_id.
  • The value of update_time is obtained as follows: when worker A preempts the shard, it generates a time update request carrying the consumption time value for the shard_id and sends it to the database, and the database updates the update_time value of that shard_id in the queue lease table.
  • The consumption group can also be used when obtaining the status data of the shard and when sending an update request to the database to update the status data in the queue lease table.
  • Step 120 includes sub-steps 121-123:
  • Sub-step 121: determine whether the value of the consumer field of the queue fragment is the current client;
  • For a shard to be consumed, the current client worker A sends a status data acquisition request to the database according to the corresponding shard_id to obtain the corresponding status data. The value of lease_owner can be extracted from the status data; if the value is worker A, the client has successfully preempted the shard, and the process goes to sub-step 122.
  • Sub-step 122: if the value of the consumer field of the queue fragment is the current client, determine whether the preemption time of the current client is greater than the consumption time;
  • Sub-step 123: if the preemption time of the current client is greater than the consumption time, determine that no other client is consuming the queue fragment.
  • The update_time value of the corresponding shard_id is the consumption time.
  • The preemption time can be recorded locally in the current client worker A.
  • It is then determined whether the preemption time of the current client is greater than the consumption time. If it is, no other client is consuming the queue fragment and no other client is updating the consumption progress of the queue fragment. If it is not, other clients may still be consuming the queue fragment, and their consumption progress may not yet have been written to the queue lease table.
  • the preemption time is a system time of the current client
  • the consumption time is a sum of a system time and a first cycle time when the client preempts the queue fragment.
  • When worker A preempts a shard, the lease_owner in the queue lease table is updated to worker A, and the first time period is added to the current system time of worker A to obtain the consumption time.
  • The consumption time is written to the update_time under the shard_id in the queue lease table.
  • Sub-step 122 extracts the consumption time from the update_time and compares the current system time of worker A directly with it. If the current system time exceeds the consumption time, it is determined that no other client is consuming the queue fragment.
  • The system time of each worker may differ. The present application therefore has each worker record its own system time, and uses the same first time period T for all workers. Taking worker A as an example, the consumption time is the system time that worker A's own clock should reach once T has elapsed after worker A's preemption. When the system time of worker A exceeds the consumption time of shard_3, it is determined that no other client is consuming the queue fragment. For example, if the system time of worker A is 12:00:00:000 when it preempts shard_3 and the first time period T is 50 ms, the consumption time is 12:00:00:050.
  • The state data of shard_3, which includes update_time, is obtained from the queue lease table in the database. If the system time has reached 12:00:00:051, it is greater than the consumption time, which means that the other clients have written their latest consumption progress into the check_point of shard_3 within the first time period T, and step 130 can be entered. If the system time is not greater than the consumption time, other clients may still be consuming the queue fragment. In this way, differences between the system times of the clients do not affect the current client's judgment of whether another client is consuming the queue fragment.
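The check in sub-steps 121-123 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names and the calendar date are hypothetical, and only the 12:00:00:000 preemption time and T = 50 ms come from the example above.

```python
from datetime import datetime, timedelta

# First time period T; 50 ms is the value used in the worked example.
T = timedelta(milliseconds=50)


def consumption_time(preempted_at: datetime, period: timedelta = T) -> datetime:
    """Value written to update_time when a worker preempts a shard."""
    return preempted_at + period


def no_other_consumer(lease_owner: str, me: str,
                      now: datetime, update_time: datetime) -> bool:
    """Sub-steps 121-123: the owner must be this worker, and this worker's
    own clock must have passed the consumption time it wrote earlier."""
    return lease_owner == me and now > update_time


# Hypothetical date; only the time of day matters for the example.
preempted_at = datetime(2016, 1, 1, 12, 0, 0, 0)
deadline = consumption_time(preempted_at)
```

Because both timestamps are taken from worker A's own clock, the comparison does not depend on the clocks of the other workers being synchronized with it.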
  • Step 130 If it is determined that no other client is consuming the queue fragment, update the status data of the queue fragment, and obtain a current consumption progress of the queue fragment.
  • If the preemption time is greater than the consumption time, it is determined that no other client is consuming the queue fragment, and an update request may be sent to the database to update the state data of the queue fragment in the queue lease table. The consumption progress under the progress field of the queue fragment is then obtained from the queue lease table of the database.
  • If the preemption time is not greater than the consumption time, other clients may still be consuming the shard, and step 110 may be performed again to obtain the state data for judgment.
  • Step 130 includes sub-steps 131-132:
  • Sub-step 131: lease the queue fragment, and update the consumer field under the queue fragment in the queue lease table stored in the persistent storage space to the current client;
  • Sub-step 132 Obtain the consumption progress under the progress field of the queue fragment from the queue lease table stored in the persistent storage space.
  • At this point the consumption progress of the shard recorded in the queue lease table is the latest, and no other worker is consuming the shard.
  • The lease thread of worker A can lease the shard and then send a request to the database to update the consumer_owner of the corresponding shard_id in the queue lease table to worker A.
  • The lease_id of the shard_id in the queue lease table is also changed to lease_id+1, indicating that worker A occupies the queue fragment.
  • If the preemption time is greater than the consumption time, it is possible to update only the lease_id of the shard_id in the queue lease table to lease_id+1, so that other clients do not preempt the shard.
  • After sub-step 132, sub-steps 133-134 are also included:
  • Sub-step 133: determine whether the preemptor field and the consumer field under the queue fragment in the queue lease table are both the current client;
  • Sub-step 134: if the preemptor field and the consumer field under the queue fragment are both the current client, the current client renews the lease on the queue fragment.
  • For each shard that a worker occupies, the worker determines whether the consumer_owner and the lease_owner under the shard_id in the client_shard_lease table are the same. If they are the same, the worker renews the lease on the shard; if they differ, the worker does not renew it.
  • Step A31: for each work process, while a queue fragment occupied by the work process is being consumed, generate a new lease ID from the last lease ID held in the memory of the work process every first time period;
  • That is, for each worker and each shard it is consuming, this is done once every time period T.
  • Step A32: determine whether the last lease ID recorded in the memory of the work process is the same as the lease ID of the corresponding queue fragment in the queue lease table;
  • Step A33: if they are the same, update the lease identifier field in the queue lease table to the new lease ID;
  • Step A34: if they differ, reject the update of the lease identifier field in the queue lease table to the new lease ID.
  • the clientlib corresponding to each worker updates the lease_id of the shard in the database by using the lease_id of each shard in the current memory.
  • Every time period T, the client lib computes lease_id+1 for the shard in memory and compares the previous lease_id with the lease_id of the shard in the client_shard_lease table. If they are the same, the lease_id in the client_shard_lease table is allowed to be updated to lease_id+1; otherwise the update is not allowed. Because the operation is atomic, only one worker can update successfully at any time, that is, only one worker can obtain the lease of the shard.
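Steps A31-A34 amount to a compare-and-set on lease_id. The sketch below is one way to express this, assuming a relational database and using Python's built-in sqlite3 module with an in-memory table. The table and column names follow the text; the function names and the reduced two-column schema are hypothetical.

```python
import sqlite3


def make_lease_table() -> sqlite3.Connection:
    """Create a reduced client_shard_lease table holding a single shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client_shard_lease ("
        "shard_id TEXT PRIMARY KEY, lease_id INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO client_shard_lease VALUES ('shard_3', 0)")
    conn.commit()
    return conn


def try_bump_lease(conn: sqlite3.Connection, shard_id: str, last_lease_id: int) -> bool:
    """Write lease_id+1 only if the table still holds the lease_id this
    worker remembers (steps A32-A34). The single UPDATE is atomic."""
    cur = conn.execute(
        "UPDATE client_shard_lease SET lease_id = ? "
        "WHERE shard_id = ? AND lease_id = ?",
        (last_lease_id + 1, shard_id, last_lease_id),
    )
    conn.commit()
    return cur.rowcount == 1  # one row changed means this worker won


conn = make_lease_table()
first = try_bump_lease(conn, "shard_3", 0)   # worker A, remembers lease_id 0
second = try_bump_lease(conn, "shard_3", 0)  # worker B, same stale lease_id
```

Here the first call succeeds and the second is rejected, since the stored lease_id no longer matches the value the second worker remembers.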
  • Step 140 Continue to consume the queue fragment according to the current consumption schedule, and record the new consumption progress of the queue fragment.
  • the worker A can continue to consume the shard from the consumption progress of the shard.
  • The new consumption progress is updated to the queue lease table of the database.
  • The step of recording the new consumption progress of the queue fragment in step 140 includes sub-steps A41-A42:
  • Sub-step A41: at each second time period, determine whether the renter field of the queue fragment in the queue lease table is the current client;
  • Sub-step A42: if the renter field of the queue fragment is the current client, update the current client's consumption progress for the queue fragment to the progress field of the queue fragment in the queue lease table.
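Sub-steps A41-A42 can be sketched as a guarded write. This is an illustration only: the queue lease table is modelled as an in-memory dictionary rather than a database, and the function name is hypothetical. The field names lease_owner and check_point are taken from the text.

```python
from typing import Dict, Optional

# One row of the queue lease table, keyed by field name.
Row = Dict[str, Optional[str]]


def record_progress(lease_table: Dict[str, Row], shard_id: str,
                    me: str, progress: str) -> bool:
    """Write the progress only while this client is still the renter."""
    row = lease_table[shard_id]
    if row["lease_owner"] != me:   # sub-step A41: renter is someone else
        return False
    row["check_point"] = progress  # sub-step A42: update the progress field
    return True


table: Dict[str, Row] = {
    "shard_3": {"lease_owner": "worker_A", "check_point": "2/5"},
}
ok_for_a = record_progress(table, "shard_3", "worker_A", "3/5")
ok_for_b = record_progress(table, "shard_3", "worker_B", "1/5")
```

The guard keeps a client that has lost the shard from overwriting progress recorded by the client that now rents it.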
  • For example, the consumption progress read from the check_point field under shard_3 in Table 6 is 2/5; data is then read from the 2/5 position of shard_3 and consumed.
  • The data is read from the beginning or the end of the shard according to the pre-configuration.
  • The configuration starts reading data from the beginning to avoid omitting data.
  • The new consumption progress is updated to the check_point field of shard_3 in the queue lease table of the database.
  • The client lib in the embodiment of the present application provides a check point interface, which is used to save the check point information, so as to ensure that the data in a shard is consumed correctly when a worker fails over or the shard is preempted by a different worker.
  • Check Point consists of the following two parts:
  • When a shard is determined to be consumable, the client automatically loads the check point information from the database; that is, the method described for step 120 is performed.
  • the data is read from the begin or end of the shard according to the pre-configuration.
  • Clientlib provides an interface for the checkpoint operation: saveCheckPoint(Bool persistent).
  • The parameter persistent controls whether the check point needs to be persisted to the external database immediately. If persistent is true, it is persisted to the database immediately; otherwise, it is persisted at regular intervals.
  • the value of persistent is controlled by client lib.
  • When persistent is not true, the consumption progress is persisted to the corresponding shard in the database periodically, at an interval that does not exceed the specified duration.
  • The regular interval is, for example, 2T.
  • clientlib calls the shutdown interface to notify the upper application and persists the consumption progress to the check_point field of the corresponding shard in the database.
  • the preempted worker can correctly persist the check point information of each shard consumed by the worker.
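The behaviour described for saveCheckPoint and the shutdown interface can be sketched as below. This is a simplified model under stated assumptions: the database is a dictionary, the class name and the injectable clock are hypothetical, and the periodic write happens lazily on the next call instead of on a background timer.

```python
from typing import Callable, Dict, Optional


class CheckPointer:
    """Buffers check points and persists them immediately or periodically."""

    def __init__(self, db: Dict[str, str], shard_id: str,
                 flush_interval: float, clock: Callable[[], float]) -> None:
        self.db = db                          # stands in for the check_point column
        self.shard_id = shard_id
        self.flush_interval = flush_interval  # e.g. 2T
        self.clock = clock
        self.pending: Optional[str] = None
        self.last_flush = clock()

    def save_check_point(self, progress: str, persistent: bool = False) -> None:
        """persistent=True writes at once; otherwise write when the interval is due."""
        self.pending = progress
        if persistent or self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        if self.pending is not None:
            self.db[self.shard_id] = self.pending
            self.pending = None
        self.last_flush = self.clock()

    def shutdown(self) -> None:
        """Called when the shard is lost or the worker stops: persist what is buffered."""
        self.flush()


now = [0.0]                                # fake clock, in seconds
db: Dict[str, str] = {}
cp = CheckPointer(db, "shard_3", flush_interval=0.1, clock=lambda: now[0])
cp.save_check_point("1/5")                 # buffered, interval not yet due
buffered = dict(db)                        # snapshot: nothing persisted yet
cp.save_check_point("2/5", persistent=True)
```

Passing the clock in as a parameter is a testing convenience; a real client library would more likely use a timer thread for the periodic case.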
  • the queue fragment is a queue fragment to be consumed by the client, and the client needs to be based on a lease agreement first.
  • In this way, the consumption progress of the queue fragment can be seamlessly transferred to the current client when queue fragments are load-balanced, or when a client that is consuming the queue fragment goes down and its queue fragment is preempted by the current client. The current client can then continue to consume the queue fragment from the progress already reached, which avoids repeated consumption of part of the data and makes the consumption result more accurate.
  • Kafka provides a set of high-level application APIs (Application Programming Interfaces) to synchronize between multiple work processes.
  • The API relies on the ZooKeeper system at the Kafka back end. Because of this dependence, after work process A updates the ZooKeeper key for a queue fragment A, ZooKeeper may not reflect the update in time; if load balancing occurs at that moment, other work processes obtain the data of the key for queue fragment A as it was before the update. Therefore, in the Kafka system, different work processes may see different versions of the ZooKeeper data, and the load balancing operation fails when a failover occurs.
  • The work processes of the present application operate directly on the queue fragments, which also avoids the problem of different work processes seeing different data versions when the Kafka system performs load balancing.
  • The Kinesis Client Library can rebalance only one queue fragment at a time. For example, work process A is consuming 100 queue fragments, and for load balancing work process B needs to preempt queue fragments from work process A. Each load balancing operation can preempt only one queue fragment from work process A. If B needs to preempt 30 queue fragments from A, 30 load balancing operations must be triggered, so load balancing takes a long time and the steady state cannot be reached quickly.
  • The N queue fragments that a work process should preempt can be preempted at one time.
  • The shards to preempt can be selected quickly in batches, so that each worker quickly reaches a stable state and the number of shards consumed by each worker is equalized.
  • The embodiment of the present application records the preemptor, the occupant, and the consumption progress parameter of each queue fragment through the consumer field consumer_owner, the renter field lease_owner, and the progress field check_point in the queue lease table. When a work process preempts a queue fragment occupied by another work process, the consumption progress parameter is passed to the preempting work process through the consumer_owner, lease_owner, and check_point fields of the queue fragment in the queue lease table.
  • This applies when a work process preempts queue fragments occupied by other work processes during load re-balancing after a new work process is added, after the compute server of a work process goes down, or after a work process fails over.
  • The queue fragments can then continue to be consumed from the position already reached, which avoids repeated consumption of some data and makes the consumption result more accurate.
  • this embodiment describes the case where a client worker A is newly started based on the queue lease table and the client instance table:
  • Step 201 Create a client instance table and a queue lease table in the database in advance.
  • the client_worker_instance table and the client_shard_lease table are created similarly to the principle of the first embodiment.
  • Steps 202-227 are then performed for each newly launched client.
  • Step 202: start the client, and write the client name into the client instance table.
  • the client lib thread of the client will write the worker_name and the current system time to the client_worker_instance table.
  • Step 203 Obtain a queue fragment identifier of all queue fragments from the queue system.
  • the shard_id of all shards in the queue system is obtained by the Client lib thread.
  • Step 204: obtain the lease ID field and the renter field of all queue fragments from the queue lease table of the database.
  • The client lib thread obtains the lease_id and lease_owner of all shards from the client_shard_lease table according to the shard_id.
  • Step 205: determine whether any queue fragment in the queue system does not appear in the queue lease table; if a queue fragment does not appear in the queue lease table, write the information about that queue fragment to the queue lease table of the database.
  • If the Client lib thread determines that a new shard_id does not appear in the client_shard_lease table, it writes the record corresponding to that shard_id to the client_shard_lease table, sets the corresponding lease_id to 0, and sets the other fields to null.
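Step 205 can be sketched as follows, with the queue lease table modelled as a dictionary instead of a database table. The function name is hypothetical; the field names are those used elsewhere in the text, and the initial values (lease_id 0, everything else null) follow the description above.

```python
from typing import Dict, Iterable, List, Optional, Union

Row = Dict[str, Optional[Union[int, str]]]


def register_missing_shards(queue_shard_ids: Iterable[str],
                            lease_table: Dict[str, Row]) -> List[str]:
    """Add a row for every shard the queue system reports but the table lacks."""
    added = []
    for shard_id in queue_shard_ids:
        if shard_id not in lease_table:
            lease_table[shard_id] = {
                "lease_id": 0,           # starts at 0
                "lease_owner": None,     # all other fields start as null
                "consumer_owner": None,
                "check_point": None,
                "update_time": None,
            }
            added.append(shard_id)
    return added


lease_table: Dict[str, Row] = {
    "shard_1": {"lease_id": 7, "lease_owner": "worker_B",
                "consumer_owner": "worker_B", "check_point": "1/5",
                "update_time": None},
}
new_shards = register_missing_shards(["shard_1", "shard_2"], lease_table)
```

Existing rows are left untouched, so running the step again on every start-up is harmless.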
  • the Client lib thread then creates the preemption thread and the renewal thread.
  • Step 206 Obtain the number of clients started in the last time period recorded in the client instance table, live1.
  • Step 207: obtain, from the queue lease table, the number of clients live2 corresponding to queue fragments that have not timed out; a client that has not timed out is the client recorded under the subscriber field of such a queue fragment.
  • Step 208: add live1 and live2 to obtain the total number of active clients U.
  • Step 209 obtaining a total number P of queue fragments from the queue system
  • Step 210 Obtain the total number of queue fragments Q that the current client has consumed from the queue lease table.
  • Step 212: obtain the queue lease table every first time period, and determine whether the lease ID of each queue fragment in the queue lease table is the same as the lease ID of the corresponding queue fragment last recorded in the locally stored map table;
  • the client's preemption thread obtains the lease_id of each shard in the client_shard_lease table in the database every first time period, and compares it with the lease_id of the corresponding shard recorded in the map table to determine whether it is the same.
  • Step 213: if the lease ID of a queue fragment in the queue lease table differs from the lease ID of the corresponding queue fragment last recorded in the map table, update the lease ID of the queue fragment in the map table to the lease ID in the queue lease table, and update the last monitoring time field of the queue fragment in the map table to the current system time;
  • If the lease_id of a shard in the client_shard_lease table differs from the lease_id of the corresponding shard recorded in the map table, the lease_id of the shard in the map table is updated to the lease_id in the client_shard_lease table, and the last monitoring time field of the shard in the map table is updated to the current system time.
  • Step 214: if the lease ID of a queue fragment in the queue lease table is the same as the lease ID of the corresponding queue fragment last recorded in the map table, keep the system time under the last monitoring time field unchanged, and determine whether the current system time minus the last monitored system time is greater than the first time period;
  • If the lease_id of a shard in the client_shard_lease table is the same as the lease_id of the corresponding shard recorded in the map table, the system time under the last monitoring time field is kept unchanged, and it is determined whether the current system time minus the last monitored system time is greater than the first time period T.
  • Step 215 If the current system time minus the last monitored system time is greater than the first time period, determine that the corresponding queue fragment times out.
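Steps 212-215 can be sketched as one pass over the lease table. This is a minimal sketch: the tables are dictionaries, the function name is hypothetical, and recording a shard on first sighting without judging it is an assumption, since the text does not say how a shard missing from the map table is handled.

```python
from typing import Dict, List, Tuple

# Local map table: shard_id -> (last seen lease_id, last monitoring time).
LocalMap = Dict[str, Tuple[int, float]]


def find_timed_out(lease_table: Dict[str, int], local_map: LocalMap,
                   now: float, period: float) -> List[str]:
    """Return the shards whose lease_id has not changed for longer than period."""
    timed_out = []
    for shard_id, lease_id in lease_table.items():
        seen = local_map.get(shard_id)
        if seen is None or seen[0] != lease_id:
            # Step 213 (and first sighting): record lease_id and restart the clock.
            local_map[shard_id] = (lease_id, now)
        elif now - seen[1] > period:
            # Steps 214-215: lease_id unchanged for longer than T.
            timed_out.append(shard_id)
    return timed_out


local_map: LocalMap = {}
r0 = find_timed_out({"shard_1": 4, "shard_2": 7}, local_map, now=0.0, period=0.05)
r1 = find_timed_out({"shard_1": 5, "shard_2": 7}, local_map, now=0.06, period=0.05)
```

In the second pass shard_1 has been renewed (lease_id 4 to 5), so its clock restarts, while shard_2 is unchanged for more than the period and is reported as timed out. Only the local clock is used for the elapsed time.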
  • Step 216: determine whether N is greater than 0;
  • Step 217: if N is greater than 0, determine whether the number L of timed-out queue fragments is less than N;
  • Step 218: if the number L of timed-out queue fragments is greater than or equal to N, preempt N queue fragments from the timed-out queue fragments.
  • Step 219: if the number L of timed-out queue fragments is less than N, preempt the L queue fragments from the timed-out queue fragments, and preempt N-L queue fragments from the queue fragments that other clients are consuming.
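The selection in steps 216-219 can be sketched as below. The function name is hypothetical, and which of the other clients' shards are taken is left to the order of the input list in this sketch.

```python
from typing import List


def choose_shards(n: int, timed_out: List[str], held_by_others: List[str]) -> List[str]:
    """Pick the shards to preempt: timed-out shards first, then shards of other clients."""
    if n <= 0:                      # step 216: nothing to preempt
        return []
    if len(timed_out) >= n:         # step 218: L >= N
        return timed_out[:n]
    # Step 219: all L timed-out shards plus N-L shards held by other clients.
    return timed_out + held_by_others[: n - len(timed_out)]


none_needed = choose_shards(0, ["a"], ["x"])
only_timed_out = choose_shards(2, ["a", "b", "c"], ["x"])
mixed = choose_shards(3, ["a"], ["x", "y", "z"])
```

Taking timed-out shards first means a running client loses a shard only when idle shards alone cannot satisfy N.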
  • When a queue fragment is preempted, the consumer field of the preempted queue fragment in the queue lease table of the database is modified to the current client, and the update time field in the queue lease table is modified to the consumption time; the consumption time is the sum of the client's system time at the moment it preempts the queue fragment and the first time period.
  • The lease_owner of the preempted shard is modified to worker A in the client_shard_lease table, the consumption time is recorded under update_time, and the preempted shard is handed to the renewal thread.
  • When the preempted client finds that the lease ID in its local map differs from that in the queue lease table while the consumer field is still the preempted client, it can update the consumption progress in time. Since the renewal period is T/2, the preempted client can no longer renew the shard and cannot continue processing it. This ensures that the consumption progress of queue fragments is transferred seamlessly between clients.
  • Steps 212-219 are performed by the foregoing preemption thread, which executes once every 2*T.
  • When client worker A joins, all of its queue fragments are obtained through preemption by the above steps.
  • the preemption process is performed.
  • Step 220: for a preempted queue fragment, obtain the value of the consumer field of the queue fragment from the queue lease table, and determine whether that value is the current client.
  • Step 221: if the value of the consumer field of the queue fragment is the current client, determine whether the preemption time of the current client is greater than the consumption time.
  • This step determines whether the system time of the client that performed the preemption is greater than the consumption time recorded in update_time.
  • Step 222: if the preemption time of the current client is greater than the consumption time, lease the queue fragment, and update the consumer field under the queue fragment in the queue lease table stored in the persistent storage space to the current client.
  • Step 223: for a queue fragment occupied by the client, obtain the preemptor field value and the renter field value from the queue lease table of the database, and determine whether the preemptor field and the consumer field under the queue fragment are both the current client.
  • Step 224: if the preemptor field and the consumer field under the queue fragment are both the current client, the current client renews the lease on the queue fragment.
  • The above steps 220-224 are performed by the lease renewal thread.
  • The lease renewal thread executes once every T/2.
  • the lease is renewed.
  • Step 225: for a queue fragment that the client has preempted successfully, obtain the consumption progress from the progress field under the queue fragment in the queue lease table of the database.
  • The client extracts the consumption progress from the check_point field of the client_shard_lease table. When the consumption progress is empty, consumption of the shard starts from the beginning.
  • Step 226 The client continues to consume the queue fragment from the consumption progress position of the queue fragment.
  • Step 227: while consuming the queue fragment, the client updates the consumption progress to the progress field under the queue fragment in the database.
  • The Client lib thread calls the checkpoint interface and updates the consumption progress to the check_point of the client_shard_lease table every time period T; or, after the lease_owner of the shard has been preempted, when the lease renewal time arrives, the shutdown interface of the checkpoint is called to notify the upper-layer application, and the check point information of the data successfully consumed so far is persisted to the check_point of the client_shard_lease table; or, when worker A goes offline, the shutdown interface of the checkpoint is called to notify the upper-layer application, and the check point information of the data successfully consumed so far is persisted to the check_point of the client_shard_lease table.
  • After the client performs the first preemption of queue fragments, it can perform step 203 and the subsequent steps.
  • Referring to FIG. 2, a structural block diagram of a system embodiment for coordinating queue consumption in a distributed environment of the present application is shown, which may specifically include the following modules:
  • the state data obtaining module 310 is configured to obtain, for a queue fragment to be consumed, state data of the queue fragment based on a lease agreement;
  • the consumption judging module 320 is configured to determine, according to the status data, whether another client is consuming the queue fragment;
  • the progress obtaining module 330 is configured to: if it is determined that no other client is consuming the queue fragment, update status data of the queue fragment, and obtain a current consumption progress of the queue fragment;
  • the consumption module 340 is configured to continue to consume the queue fragment according to the current consumption schedule, and record the new consumption progress of the queue fragment.
  • Before the state data obtaining module 310, the system further includes:
  • the queue fragment determining module is configured to determine a queue fragment to be consumed by the current client request based on the lease agreement.
  • the queue fragment determining module includes:
  • the required quantity determining sub-module is configured to obtain the total number of active clients U, the total number of queue fragments P, and the total number of queue fragments Q that the current client has consumed, to calculate the number of queue fragments N that the current client needs to preempt;
  • the preemption sub-module is configured to preempt N queue fragments from timed-out queue fragments and/or queue fragments that other clients are consuming, as the queue fragments to be consumed by the current client.
  • the requirement quantity determining submodule includes:
  • the client total obtaining sub-module is configured to obtain the total number of active clients U from the client instance table and the queue lease table stored in the persistent storage space;
  • the queue fragment total obtaining sub-module is configured to obtain the total number P of queue fragments from the queue system;
  • the consumed fragment total obtaining sub-module is configured to obtain the total number Q of queue fragments that the current client has consumed from the queue lease table stored in the persistent storage space;
  • the client total acquisition submodule includes:
  • the new client quantity obtaining submodule is used to obtain the number of clients started in the last time period recorded in the client instance table, live1;
  • the non-timed-out client quantity obtaining sub-module is configured to obtain, from the queue lease table, the number of clients live2 corresponding to queue fragments that have not timed out; a client that has not timed out is the client recorded under the subscriber field of such a queue fragment;
  • the client quantity calculating sub-module is configured to add live1 and live2 to obtain the total number of active clients U.
  • the preemption submodule includes:
  • a first determining submodule configured to determine whether the N is greater than 0;
  • a second determining submodule configured to determine, if N is greater than 0, whether the number L of timed-out queue fragments is less than N;
  • the full timeout preemption submodule is configured to preempt N queue fragments from the timed-out queue fragments if the number L of timed-out queue fragments is greater than or equal to N;
  • the hybrid preemption sub-module is configured to, if the number L of timed-out queue fragments is less than N, preempt the L queue fragments from the timed-out queue fragments and preempt N-L queue fragments from the queue fragments that other clients are consuming.
  • the hybrid preemption module includes:
  • the first hybrid preemption sub-module is configured to preempt the N-L queue fragments from the queue fragments occupied by other clients such that the difference between the maximum and the minimum number of queue fragments occupied by the clients does not exceed a specified amount.
  • the first hybrid preemption submodule includes:
  • a sorting sub-module configured to sort the clients by the number of queue fragments each occupies, according to the queue lease table;
  • the second hybrid preemption sub-module is configured to preempt one or more queue fragments from the first K clients such that, after the preemption, the average number of queue fragments remaining on the first K clients is larger than the number of queue fragments currently occupied by the (K+1)-th client, and the number of queue fragments remaining on each of the first K clients differs from that average by a specified number, until N-L queue fragments have been preempted successfully.
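The balancing goal of the hybrid preemption (keep the gap between the most and least loaded clients small) can be illustrated with a simple greedy variant. This is not the K-client procedure described above but a simplified stand-in: each of the preemptions takes one shard from whichever client currently holds the most. The function name and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List


def pick_donors(counts: Dict[str, int], needed: int) -> List[str]:
    """For each shard to preempt, name the client it should be taken from."""
    remaining = dict(counts)  # work on a copy; the caller's counts are untouched
    donors: List[str] = []
    for _ in range(needed):
        # Most loaded client first; ties broken by name for determinism.
        donor = max(remaining, key=lambda c: (remaining[c], c))
        if remaining[donor] == 0:
            break             # nobody has anything left to give
        remaining[donor] -= 1
        donors.append(donor)
    return donors


donors = pick_donors({"A": 5, "B": 3, "C": 1}, needed=3)
```

With counts A=5, B=3, C=1 and three shards needed, two are taken from A and one from B, leaving A=3, B=2, C=1.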
  • the preemption submodule includes:
  • the consumer field modification sub-module is configured to modify the consumer field of the preempted queue fragment in the queue lease table to the current client when preempting a queue fragment.
  • Before the preemption sub-module, the system further includes:
  • the map table determining sub-module is configured to obtain the queue lease table every first time period, and determine whether the lease ID of each queue fragment in the queue lease table is the same as the lease ID of the corresponding queue fragment last recorded in the locally stored map table;
  • the map monitoring time update submodule is configured to, if the lease ID of a queue fragment in the queue lease table differs from the lease ID of the corresponding queue fragment last recorded in the map table, update the lease ID of the queue fragment in the map table to the lease ID in the queue lease table, and update the last monitoring time field of the queue fragment in the map table to the current system time;
  • the map monitoring time maintaining submodule is configured to, if the lease ID of a queue fragment in the queue lease table is the same as the lease ID of the corresponding queue fragment last recorded in the map table, keep the system time under the last monitoring time field unchanged, and determine whether the current system time minus the last monitored system time is greater than the first time period;
  • the timeout judging submodule is configured to determine that the corresponding queue fragment has timed out if the current system time minus the last monitored system time is greater than the first time period.
  • the queue fragment determining module includes:
  • the first queue fragment determining submodule is configured to determine, according to the lease agreement, the queue fragment to be consumed by the current client every first time period.
  • the state data obtaining module 310 includes:
  • the first state data obtaining sub-module is configured to obtain the state data of the queue fragment from the queue lease table stored in the persistent storage space for a queue fragment to be consumed.
  • the state data includes the value of the consumer field under the queue fragment, the value of the renter field, and the consumption time under the update time field in the queue lease table; the consumption time is the value to which the update time field is modified after the current client preempts the queue fragment;
  • the consumption determining module 320 includes:
  • a consumer field determining sub-module configured to determine whether the value of the consumer field of the queue fragment is the current client;
  • the preemption time judging sub-module is configured to determine, if the value of the consumer field of the queue fragment is the current client, whether the preemption time of the current client is greater than the consumption time;
  • the determining submodule is configured to determine that no other client is consuming the queue fragment if the preemption time of the current client is greater than the consumption time.
  • the preemption time is a system time of the current client
  • the consumption time is the sum of the client's system time at the moment it preempts the queue fragment and the first time period.
  • the progress obtaining module 330 includes:
  • the renting sub-module is configured to lease the queue fragment, and update the consumer field under the queue fragment in the queue lease table stored in the persistent storage space to the current client;
  • the progress field reading module is configured to obtain the consumption progress under the progress field of the queue fragment from the queue lease table stored in the persistent storage space.
  • the consumption module 340 includes:
  • a renter judgment sub-module configured to determine, at each second time period, whether the renter field of the queue fragment in the queue lease table is the current client;
  • a consumption progress update sub-module configured to, if the renter field of the queue fragment is the current client, update the current client's consumption progress for the queue fragment to the progress field of the queue fragment in the queue lease table.
  • The system further includes:
  • the renewal lease judgment sub-module is configured to determine, in the queue lease table, whether the preemptor field and the consumer field under the queue fragment are both the current client;
  • the first renewal sub-module is configured to renew the lease on the queue fragment for the current client if the preemptor field and the consumer field under the queue fragment are both the current client.
  • each compute node is a client.
  • the scheduling server can allocate queue fragments.
  • Each client includes a state data acquisition module 310, a consumption determination module 320, a progress acquisition module 330, and a consumption module 340.
  • each client can also include a corresponding preferred module.
  • For a queue fragment to be consumed by a client, the client first obtains the state data of the queue fragment based on the lease agreement and determines from the state data whether another client is consuming the queue fragment. After determining that no other client is consuming the queue fragment, the client obtains the current consumption progress of the queue fragment and updates the state data of the queue fragment, then continues to consume the queue fragment from the current consumption progress and records the new consumption progress.
  • In this way, the consumption progress of the queue fragment can be seamlessly transferred to the current client when queue fragments are load-balanced, or when a client that is consuming the queue fragment goes down and its queue fragment is preempted by the current client. The current client can then continue to consume the queue fragment from the progress already reached, which avoids repeated consumption of part of the data and makes the consumption result more accurate.
  • Because the apparatus embodiment is basically similar to the method embodiment, its description is relatively brief; for related details, refer to the description of the method embodiment.
  • Those skilled in the art will understand that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. The embodiments may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
  • In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions.
  • These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device.
  • The instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

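A method and apparatus for coordinating queue consumption in a distributed environment, relating to the field of distributed technology. The method includes: for a queue shard to be consumed, obtaining state data of the queue shard based on a lease protocol (110); determining, according to the state data, whether another client is consuming the queue shard (120); if it is determined that no other client is consuming the queue shard, updating the state data of the queue shard and obtaining its current consumption progress (130); and continuing to consume the queue shard from the current consumption progress and recording the new consumption progress (140). When the current client A preempts a queue shard that another client is consuming, for example during load balancing or after a client crashes, the shard's consumption progress passes seamlessly to the current client, which avoids consuming part of the data twice and makes the consumption result more accurate.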

Description

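Method and apparatus for coordinating queue consumption in a distributed environment
Technical Field
The present application relates to the field of distributed technology, and in particular to a method and an apparatus for coordinating queue consumption in a distributed environment.
Background
With the arrival of cloud computing and big data, data comes from ever more sources, ever faster, and in ever larger volumes. Web servers, clients of all kinds, sensors and the like generate massive amounts of data in real time, recording user access requests, monitoring data, program running states, and other information. To better mine the value of this data, multiple systems usually consume it, for example real-time and offline systems that perform user behavior analysis, monitoring and alerting, and program intrusion detection.
To reduce the coupling between data production and consumption, a queue system is usually used to buffer and aggregate the data produced by multiple producers, and multiple clients acting as compute nodes then consume the data from the queue. To handle massive data, a queue system generally uses multiple queue shards (called shards or partitions) to scale horizontally. To increase write throughput, it is enough to increase the number of shards (or partitions).
When clients acting as compute nodes need to consume data from multiple shards (or partitions), the same program has to run on multiple clients to consume the data cooperatively. When consuming, each client must also specify which shard (or partition) to fetch data from. For a data queue composed of N shards (or partitions), M consuming clients (with M ranging from 1 to N) typically consume the shards cooperatively, so that each client consumes N/M shards (or partitions) on average. To consume the data in all shards (or partitions) correctly, these M clients need to cooperate.
In the prior art, schemes exist for coordinating the clients in a distributed environment as they consume the queue shards, for example the Kinesis Client Library.
Kinesis is a real-time data queue service provided by AWS, and the Kinesis Client Library is a coordination library for consuming Kinesis data. It relies on DynamoDB for coordination among clients. Although the Kinesis Client Library supports adding clients to consume data, during operations such as load balancing a client that preempts a queue shard simply starts consuming it over again, so part of the data is consumed repeatedly and the consumption result is inaccurate.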
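Summary of the Invention
In view of the above problems, the embodiments of the present application are proposed to provide a method for coordinating queue consumption in a distributed environment, and a corresponding apparatus, that overcome the above problems or at least partially solve them.
To solve the above problems, the present application discloses a method for coordinating queue consumption in a distributed environment, including:
for a queue shard to be consumed, obtaining state data of the queue shard based on a lease protocol;
determining, according to the state data, whether another client is consuming the queue shard;
if it is determined that no other client is consuming the queue shard, updating the state data of the queue shard and obtaining the current consumption progress of the queue shard;
continuing to consume the queue shard from the current consumption progress, and recording the new consumption progress of the queue shard.
The present application also discloses an apparatus for coordinating queue consumption in a distributed environment, including:
a state data acquisition module, configured to obtain, for a queue shard to be consumed, state data of the queue shard based on a lease protocol;
a consumption determination module, configured to determine, according to the state data, whether another client is consuming the queue shard;
a progress acquisition module, configured to, if it is determined that no other client is consuming the queue shard, update the state data of the queue shard and obtain the current consumption progress of the queue shard;
a consumption module, configured to continue consuming the queue shard from the current consumption progress and to record the new consumption progress of the queue shard.
The embodiments of the present application have the following advantages:
In a distributed environment, once a client has preempted a queue shard, that shard is the client's queue shard to be consumed. The client first obtains the state data of the queue shard based on the lease protocol and uses it to determine whether another client is consuming the shard. Only after determining that no other client is consuming it does the client obtain the shard's current consumption progress and update its state data; the client then continues consuming the shard from that progress and records the new progress. Through this process, when the current client A preempts a queue shard that another client is consuming, the shard's consumption progress passes seamlessly to the current client. Whether the preemption happens during load balancing of queue shards or because the client consuming the shard has crashed, client A continues consuming the shard from the progress already consumed, which avoids consuming part of the data twice and makes the consumption result more accurate.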
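Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of an embodiment of a method for coordinating queue consumption in a distributed environment according to the present application;
FIG. 2 is a structural block diagram of an embodiment of an apparatus for coordinating queue consumption in a distributed environment according to the present application;
FIG. 2A is an example architecture of a distributed environment according to the present application.
Detailed Description
To make the above objects, features, and advantages of the present application easier to understand, the present application is described in further detail below with reference to the drawings and specific embodiments.
One of the core ideas of the embodiments is as follows. In a distributed environment, when a producer writes data, the data can be written to different queue shards (shards or partitions) by hashing or round-robin scheduling to achieve load balancing. A client acting as a compute node in the distributed environment must likewise specify which queue shard to fetch data from when it consumes. For a data queue composed of N queue shards, M consuming clients (with M ranging from 1 to N) typically consume the shards cooperatively, so that each client consumes N/M queue shards on average. For ease of description, a queue shard is referred to as a shard in the embodiments.
To consume the data in all shards correctly, the M clients need to cooperate, and the following aspects must be considered. First, how each client correctly selects the shards it consumes, so that at any moment every shard has exactly one consuming client. Second, how to ensure that, when one or more clients crash and load is rebalanced, the data on all shards is still processed correctly and is not consumed twice. Third, how to rebalance load automatically when processing pressure grows and clients are added, while still guaranteeing that no data is consumed twice.
To guarantee that data is not consumed twice, the embodiments work as follows. In a distributed environment, once a client A has preempted a queue shard, that shard is client A's queue shard to be consumed. Client A first obtains the state data of the queue shard based on the lease protocol and uses it to determine whether another client is consuming the shard. Only after determining that no other client is consuming it does client A obtain the shard's current consumption progress and update its state data; client A then continues consuming the shard from that progress and records the new progress. The cycle repeats: when another client B preempts the shard from client A, the same logic applies. Through this process, when the current client A preempts a queue shard that another client is consuming, the shard's consumption progress passes seamlessly to the current client. Whether the preemption happens during load balancing of queue shards or because the client consuming the shard has crashed, client A continues consuming the shard from the progress already consumed, which avoids consuming part of the data twice and makes the consumption result more accurate.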
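To describe the embodiments more clearly, the present application defines the following terms:
Shard: a queue shard, the container in a data queue that actually holds data. A data queue consists of multiple shards, and each worker must select specific shards when it consumes data. The term partition may also be used; the present application does not restrict this.
Worker: a client, or equivalently a worker process, corresponding to one compute node. A worker can consume the data in one or more shards.
Worker_name: the name of each worker, used to distinguish different clients.
Lease: each shard has a lock (the lease); only the worker that has seized the lease can consume the data in that shard.
Check Point: the consumption progress, which records the position up to which a shard has been consumed and so indicates which data has already been consumed.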
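Embodiment 1
Referring to FIG. 1, a flowchart of the steps of an embodiment of a method for coordinating queue consumption in a distributed environment according to the present application is shown. The method may include the following steps:
Step 110: for a queue shard to be consumed, obtain the state data of the queue shard based on the lease protocol.
The embodiment provides a lease protocol under which multiple consumers in a distributed environment cooperatively consume the shards of a queue, so that multiple compute nodes can jointly consume the data in multiple shards.
Under this lease protocol, a client that consumes a shard records its consumption progress in a persistent storage space and updates the shard's state data. The state data is used to determine whether a client is currently consuming the shard.
For the current client worker A, a shard that it has preempted is a shard to be consumed, so worker A needs to obtain the shard's state data based on the lease protocol and then proceed to step 120.
In a preferred embodiment of the present application, before step 110 the method further includes:
Step 100: determine, based on the lease protocol, the queue shards to be consumed that the current client requires.
In practice, the current client first determines, according to the lease protocol, how many shards it can consume and where to preempt them from; those shards become worker A's shards to be consumed.
In a preferred embodiment of the present application, step 100 includes sub-step M1:
Sub-step M1: every first time period, determine, based on the lease protocol, the queue shards to be consumed that the current client requires.
In this embodiment, the lease protocol specifies that worker A determines, every first time period, how many shards it can consume and where to preempt them from. In this way the shards in the distributed environment are load-balanced continuously.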
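In a preferred embodiment of the present application, step 100 includes sub-steps 101 and 102:
Sub-step 101: obtain the total number of active clients U, the total number of queue shards P, and the number of queue shards Q already consumed by the current client, in order to compute the number of queue shards N that the current client needs to preempt.
In practice, an active client is one that is working normally and can share in processing shards. The number N of shards that client A needs to preempt must keep the distributed environment load-balanced, so U, P, and Q are obtained first and N is computed from them.
In another preferred embodiment of the present application, sub-step 101 includes sub-steps A11 to A14:
Sub-step A11: obtain the total number of active clients U from the client instance table and the queue lease table stored in the persistent storage space.
In this embodiment, the client instance table and the queue lease table can be created in the persistent storage space in advance. The persistent storage space can be a database, a distributed cache, or another type of persistent storage; the embodiment does not restrict this. The preferred persistent storage space here is a database, which can be a MySQL database or another type of database.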
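The queue lease table client_shard_lease may take the form of Table 1:
consume_group | shard_id | lease_id | lease_owner | consumer_owner | check_point | update_time
Table 1
In Table 1:
consume_group: the consume-group field, of type Char(64), part of the table's primary key. It holds the name of a consume group for a given queue. In practice shards are divided by consume group and clients are created per consume group, so consume_group is used to distinguish consume groups.
shard_id: the shard-identifier field, of type Char(64), part of the table's primary key. It identifies each shard in the queue.
lease_id: the lease-identifier field, of type int(20), the id a worker uses when leasing the shard. The embodiment uses an atomic test-and-set operation to guarantee that at any moment only one owner can modify the lease value, that is, only one owner can seize the shard. The "test" and the "set" are performed in a single indivisible atomic operation, so only one client can preempt the shard at a time, which keeps the data operation correct.
lease_owner: the lease-owner field, of type Char(64), the owner that has seized the shard's lease, that is, some worker_name.
consumer_owner: the consumer field, of type Char(64), the owner currently consuming the shard, that is, some worker_name.
check_point: the progress field, of type Text, which records the shard's consumption progress so far.
update_time: the update-time field, of type DateTime, which records the update time for monitoring.
In practice, the client_shard_lease table may contain only shard_id, lease_owner, consumer_owner, and check_point, which suffice to pass the consumption progress of a preempted shard from the preempted client to the preempting client. The other fields are optional.
In this embodiment, whenever a client consumes a shard, it updates that shard's state data in Table 1.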
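The client instance table client_worker_instance may take the form of Table 2:
consume_group | worker_name | create_time
Table 2
consume_group: the consume-group field, of type Char(128), part of the table's primary key. It holds the name of a consume group for a given queue.
worker_name: the client name, of type Char(64), part of the table's primary key.
create_time: of type DateTime, the time at which the client started.
Table 2 records the clients created for a consume group.
In practice, after a client for a consume group starts, its worker_name is written to the client instance table together with its start time.
Specifically, in this embodiment a client consists of a client lib and data processing logic. The client lib executes the preemption and renewal logic of the present application, and the data processing logic performs the normal consumption of shard data, for example user behavior analysis, monitoring and alerting, or program intrusion detection. The client lib can be thought of as a thread of the client, and it runs roughly as follows:
1. When the worker starts, the client lib writes the worker's name worker_name and the current system time into the client_worker_instance table shown in Table 2.
2. The client lib obtains the names of all shards in the queue, that is, their shard_id values.
3. The client lib obtains the information for all shard_id values from client_shard_lease, including each shard's lease_id and lease_owner.
4. The client lib checks whether each shard_id obtained in step 2 appears in client_shard_lease. If not, it adds a record for that shard_id with lease_id set to 0 and with lease_owner, consumer_owner, check_point, and update_time left empty.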
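The two tables and the start-up steps above can be sketched as follows. This is a minimal illustration only: it uses SQLite from the Python standard library in place of the MySQL database the text mentions, and the function name `start_worker` is a hypothetical helper, not something the application defines.

```python
import sqlite3

# Column types follow Table 1 and Table 2; SQLite is an assumption made
# so that the sketch runs without an external database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS client_shard_lease (
    consume_group  CHAR(64) NOT NULL,
    shard_id       CHAR(64) NOT NULL,
    lease_id       INTEGER  NOT NULL DEFAULT 0,
    lease_owner    CHAR(64),
    consumer_owner CHAR(64),
    check_point    TEXT,
    update_time    DATETIME,
    PRIMARY KEY (consume_group, shard_id)
);
CREATE TABLE IF NOT EXISTS client_worker_instance (
    consume_group CHAR(128) NOT NULL,
    worker_name   CHAR(64)  NOT NULL,
    create_time   DATETIME  NOT NULL,
    PRIMARY KEY (consume_group, worker_name)
);
"""


def start_worker(conn, group, worker_name, now, shard_ids):
    """Register the worker, then add a row for every shard not yet listed."""
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO client_worker_instance VALUES (?, ?, ?)",
        (group, worker_name, now),
    )
    for shard_id in shard_ids:
        # New shards start with lease_id 0 and all other fields empty (NULL).
        conn.execute(
            "INSERT OR IGNORE INTO client_shard_lease "
            "(consume_group, shard_id, lease_id) VALUES (?, ?, 0)",
            (group, shard_id),
        )
    conn.commit()
```

Rows that already exist are left untouched, so a second worker starting later does not reset a lease that another worker holds.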
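In practice, the client also creates a preemption thread and a lease thread. The preemption thread preempts shards, and the lease thread leases shards for consumption.
Accordingly, the embodiment can obtain the active clients from Table 1 and Table 2 and so compute the total number of active clients U.
Further, sub-step A11 includes sub-steps A111 to A113:
Sub-step A111: obtain live1, the number of clients recorded in the client instance table that started within the most recent first time period.
In practice, a client that has only just started may not yet have preempted any shard, in which case the lease-owner field in Table 1 holds no record of it.
As described for the client_worker_instance table above, every client, once started, writes its worker_name and start time into that table under the consume group of the queue it processes.
The embodiment computes t as the client's system time minus its start time and checks whether t is less than the first time period T. If so, the client is active; counting such clients gives live1.
Sub-step A112: obtain live2, the number of clients in the queue lease table that correspond to queue shards whose lease has not timed out; these clients are the ones recorded under the lease-owner field of those shards.
The embodiment also reads all shard records from the client_shard_lease table, keeps those whose lease has not timed out, and extracts the worker_name under lease_owner. The corresponding clients are active; counting them gives live2.
Sub-step A113: add live1 and live2 to obtain the total number of active clients U.
U = live1 + live2, where U is the total number of active clients.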
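A minimal sketch of this count follows. The data shapes (`instances`, `leases`) and the function name are assumptions made for illustration. Note one deliberate deviation: the sketch takes the union of the two groups, so a worker that both started recently and already holds a live lease is counted once, whereas the text simply adds live1 and live2.

```python
def count_active_clients(instances, leases, now, period):
    """Return U, the number of active clients.

    instances: {worker_name: create_time} from the client instance table.
    leases: iterable of (lease_owner, timed_out) pairs, one per shard.
    """
    # live1: workers that started within the most recent period T.
    recent = {name for name, created in instances.items() if now - created < period}
    # live2: owners of shards whose lease has not timed out.
    holders = {owner for owner, timed_out in leases if owner and not timed_out}
    # Union rather than sum, so nobody is counted twice (an assumption).
    return len(recent | holders)
```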
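Sub-step A12: obtain the total number of queue shards P from the queue system.
In this embodiment, for a given consume group, the total number of shards P under that consume group is obtained from the queue system.
Sub-step A13: obtain, from the queue lease table stored in the persistent storage space, the number of queue shards Q already consumed by the current client.
For any shard record in the client_shard_lease table, if lease_owner and consumer_owner hold the same worker_name, and that name is the current preempting worker, then that worker occupies the shard. If lease_owner and consumer_owner differ, the worker named in consumer_owner is no longer allowed to occupy the shard, which has been preempted by another worker.
For a given worker_name, the embodiment therefore counts the shards in client_shard_lease whose lease_owner and consumer_owner both equal that worker_name. This count Q is the number of shards the worker already occupies.
Suppose the records of Table 1 are as shown in Table 3:
consume_group | shard_id | lease_id | lease_owner | consumer_owner | check_point | update_time
1 | shard_1 | 1 | worker A | worker A | 1/3 | ……
1 | shard_2 | 2 | worker A | worker A | 1/4 | ……
1 | shard_3 | 3 | worker B | worker B | 2/5 | ……
Table 3
For the current client worker A, the client_shard_lease table shows worker A as both lease_owner and consumer_owner of shard_1 and shard_2, while shard_3 has worker B as both lease_owner and consumer_owner. Worker A therefore occupies shard_1 and shard_2 but not shard_3, so the number of shards Q occupied by worker A is 2.
The embodiment does not restrict the order of sub-steps A11 to A13.
Sub-step A14: compute the number of queue shards N that the current client needs to preempt as N = [P/U] - Q.
The embodiment uses N = [P/U] - Q to compute the number of queue shards N that the current client needs to preempt.
Here [P/U] gives the number of shards M that the worker should occupy, where [P/U] denotes P/U rounded up.
For example, before worker A preempts anything, worker F occupies 11 shards, worker B 10, worker C 10, worker D 10, and worker E 11. Worker A is newly created and occupies none. With 54 shards in total, worker A should occupy 54/6 = 9 shards and therefore needs to preempt 9 - 0 = 9. Other cases follow by analogy.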
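The arithmetic of sub-step A14 can be written down directly. The function name below is a hypothetical helper; the formula and the 54-shard example come from the text.

```python
def shards_to_preempt(total_shards, active_clients, already_owned):
    """N = [P/U] - Q, where [x] rounds up to the next integer."""
    fair_share = -(-total_shards // active_clients)  # integer ceiling of P/U
    return fair_share - already_owned


# Worked example from the text: 54 shards, 6 active workers, worker A owns none.
print(shards_to_preempt(54, 6, 0))  # 9
```

The result can be zero or negative for a worker that already holds its share or more; the later sub-steps only preempt when N is greater than 0.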
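Sub-step 102: preempt N queue shards, from the timed-out queue shards and/or the queue shards that other clients are consuming, as the current client's queue shards to be consumed.
In this embodiment, the shards as a whole may be in one of the following situations:
1. There are timed-out queue shards, and their number L is greater than or equal to N.
2. There are timed-out queue shards, and their number L is less than N.
3. There are no timed-out queue shards.
In case 1, worker A simply preempts N queue shards from the timed-out shards.
In case 2, worker A preempts up to L queue shards from the timed-out shards and the remaining N-L shards from other workers.
In case 3, worker A preempts all N shards from other workers.
In another preferred embodiment of the present application, sub-step 102 includes sub-steps A21 to A24:
Sub-step A21: determine whether N is greater than 0.
If the N obtained in the preceding steps is 0, the worker does not need to preempt. It preempts only if N > 0.
Sub-step A22: if N is greater than 0, determine whether the number of timed-out queue shards L is less than N.
Sub-step A23: if the number of timed-out queue shards L is greater than or equal to N, preempt N queue shards from the timed-out queue shards.
In this embodiment, if L ≥ N, N shards are picked at random from the L timed-out queue shards.
Sub-step A24: if the number of timed-out queue shards L is less than N, preempt the L timed-out queue shards, and preempt N-L queue shards from those that other clients are consuming.
If L < N, the L timed-out queue shards are preempted, and the remaining N-L are preempted from the shards that other workers are consuming.
If L = 0, worker A needs to preempt all N shards from those that other workers are consuming.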
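The three cases above reduce to one small planning function. The sketch below is illustrative: `plan_preemption` and its return shape are assumptions, while the random choice among timed-out shards follows the text.

```python
import random


def plan_preemption(n, timed_out_shards):
    """Split a demand of n shards between timed-out shards and other workers.

    Returns (shards taken from the timed-out pool, count still to be taken
    from shards that other workers are consuming).
    """
    if n <= 0:
        return [], 0  # sub-step A21: nothing to preempt
    pool = list(timed_out_shards)
    if len(pool) >= n:
        # Case L >= N: pick N of the timed-out shards at random.
        return random.sample(pool, n), 0
    # Case L < N (including L == 0): take all L, steal the remaining N - L.
    return pool, n - len(pool)
```

For the running example, `plan_preemption(9, ["shard_53", "shard_54"])` returns both timed-out shards and leaves 7 to be taken from other workers (the shard names here are made up).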
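Further, in another preferred embodiment of the present application, the step in sub-step A24 of preempting N-L queue shards from those that other clients are consuming includes sub-step A241:
Sub-step A241: the current client preempts N-L queue shards from those occupied by other clients such that the difference between the largest and the smallest number of queue shards occupied by any client does not exceed a specified number.
In this embodiment, each worker's preemption thread ensures, when preempting, that the shard counts of the workers satisfy the following condition: the difference between the largest and the smallest number of queue shards occupied does not exceed a specified number, for example 1.
When the current client worker A preempts shards from other workers, it likewise keeps that difference within the specified number.
For example, before worker A preempts anything, worker B occupies 11 shards, worker C 10, worker D 10, worker E 10, and worker F 11. There are 54 shards in total, 2 of which have timed out. Worker A is a newly started worker.
Worker A then needs to take 2 shards from worker B, 1 from worker C, 1 from worker D, 1 from worker E, 2 from worker F, and the 2 timed-out shards. This keeps the difference between the shard counts of any two workers within 1. Other cases follow by analogy.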
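One simple way to keep the counts within 1 of each other is to take each shard from whichever worker currently holds the most. The sketch below uses that greedy rule, which is a simplification chosen for illustration and not necessarily the selection procedure the application describes; `steal_plan` is a hypothetical name. On the example above it yields the same split.

```python
import heapq


def steal_plan(loads, needed):
    """Decide how many shards to take from each worker.

    loads: {worker_name: shards currently held}. Each shard is taken from
    the worker with the most shards at that moment (ties broken by name).
    """
    heap = [(-count, name) for name, count in loads.items()]
    heapq.heapify(heap)
    taken = {}
    for _ in range(needed):
        neg_count, name = heapq.heappop(heap)
        if neg_count == 0:
            break  # nobody has any shards left to give
        taken[name] = taken.get(name, 0) + 1
        heapq.heappush(heap, (neg_count + 1, name))
    return taken


loads = {"worker B": 11, "worker C": 10, "worker D": 10, "worker E": 10, "worker F": 11}
print(steal_plan(loads, 7))
# {'worker B': 2, 'worker F': 2, 'worker C': 1, 'worker D': 1, 'worker E': 1}
```

After the plan is applied every worker, including worker A with its 2 timed-out shards plus these 7, holds 9 shards.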
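In another preferred embodiment of the present application, sub-step A241 includes sub-steps A2411 and A2412:
Sub-step A2411: based on the queue lease table, sort the clients by the number of queue shards they occupy, from most to fewest.
In this embodiment, before worker A preempts shards, it first counts, from the client_shard_lease table, the number of shard_id values occupied by each worker_name.
For example, sorting by count gives Table 4:
worker B | 11
worker C | 11
worker D | 10
worker E | 10
worker F | 10
worker A | 0
Table 4
Here 2 shards have timed out, and worker A needs to preempt 9.
Sub-step A2412: preempt one or more queue shards at a time from the first K clients such that, after J shards have been taken from them, the average number of queue shards remaining on the first K clients is greater than the number currently occupied by the (K+1)-th client, and the number remaining on each of the first K clients differs from that average by no more than the specified number; repeat until N-L queue shards have been preempted.
When worker A preempts shards from other workers, the number taken from a later worker in the ordering does not exceed the number taken from an earlier one.
Take the specified number as 1 and apply the procedure to Table 4. Worker A can preempt the 2 timed-out shards, so it needs to take 7 shards from other workers.
First set K=1 and try to take from the first worker. This fails, because the number remaining would not be greater than the second worker's count of 11.
Then set K=2. Two conditions apply: each of the first K workers' remaining shard count must differ from the average by no more than the specified number, and the average remaining after preemption must be greater than the count of the (K+1)-th worker. Under these conditions no more than 2 shards can be taken from the first two workers, so 1 shard is taken from the first worker and none from the second. Worker A still needs 6 more shards, so preemption continues.
Table 4 now becomes Table 5:
worker B | 10
worker C | 11
worker D | 10
worker E | 10
worker F | 10
worker A | 3
Table 5
Then set K=3. Because the average remaining after preemption must be greater than the count of the (K+1)-th worker, no shard can be taken from the first three workers. K=4 is similar.
With K=5, taking 2 shards from worker C and 1 each from worker B, worker D, worker E, and worker F satisfies both conditions; no other way of preempting does.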
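Based on the queue lease table above, in another preferred embodiment of the present application, sub-step 102 includes:
Sub-step 1011: when preempting a queue shard, change the lease-owner field (lease_owner) of the preempted queue shard in the queue lease table to the current client.
For example, in Table 3, after worker A preempts shard_3 from worker B, lease_owner is changed to worker A, and Table 3 becomes Table 6:
consume_group | shard_id | lease_id | lease_owner | consumer_owner | check_point | update_time
1 | shard_1 | 1 | worker A | worker A | 1/3 | ……
1 | shard_2 | 2 | worker A | worker A | 1/4 | ……
1 | shard_3 | 3 | worker A | worker B | 2/5 | ……
Table 6
In other words, after preempting a queue shard a client also updates the shard's state data according to the lease protocol.
Through the above steps, client worker A has preempted N shards, and these N shards are worker A's queue shards to be consumed.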
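In practice, timed-out queue shards must be identified before sub-step 102. Preferably, before sub-step 102 the method further includes sub-steps A31 to A34:
Sub-step A31: every first time period, read the queue lease table and determine whether the lease ID of each queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in a locally stored map table.
In this embodiment, each worker's client lib creates a map table with a shard-identifier field shard_id, a lease-identifier field lease_id, and a last-monitored-time field last_update_time.
The map table resides in the memory of the compute server on which the worker runs.
After the map table is created, the information for all shards is first read from the client_shard_lease table and written into it: keyed by shard_id, one record per shard holding shard_id, lease_id, and the current system time. The timeout-monitoring procedure below can then begin.
In this embodiment, a worker's client lib reads each shard_id and its lease_id from the client_shard_lease table in the database every first time period T.
It then matches each shard_id and lease_id pair from the client_shard_lease table against the pairs in the map table.
If the shard_id has no match, the shard is new and the map table has no record of it. The shard_id and lease_id are added to the map table, and the current system time is recorded in the corresponding last_update_time field.
New shards are handled further as follows. If the lease_id of the shard_id is not 0, some worker has preempted or occupies the shard, so shard_id and lease_id are recorded in the map table with the current time in last_update_time. If the lease_id is 0, no worker has preempted or occupies the shard; the map table records lease_id = 0 and last_update_time = 0, and the shard can be regarded as timed out immediately.
If the shard_id matches but the lease_id does not, proceed to sub-step A32.
If both shard_id and lease_id match, proceed to sub-step A33.
Sub-step A32: if the lease ID of a queue shard in the queue lease table differs from the lease ID last recorded for that shard in the map table, update the shard's lease ID in the map table to the one in the queue lease table, and update the shard's last-monitored-time field in the map table to the current system time.
In this embodiment, each worker's renewal thread renews, every second time period, the shards for which the worker is both lease_owner and consumer_owner. The second time period is shorter than the first, for example T/2. Renewing a shard changes its lease_id in the client_shard_lease table to lease_id+1. So if a worker is renewing a shard, the lease_id keeps changing, which shows that the worker is consuming the shard normally. If lease_owner and consumer_owner differ, the worker is not allowed to renew the shard.
Thus, when the current client worker A finds, for a shard, that the shard_id in the map table matches the queue lease table but the lease_id does not, it concludes that the shard has not timed out and updates the system time recorded in the shard's last_update_time field in the map table to the current system time.
Sub-step A33: if the lease ID of a queue shard in the queue lease table is the same as the lease ID last recorded for that shard in the map table, keep the system time in the last-monitored-time field unchanged, and determine whether the current system time minus the last monitored system time is greater than the first time period.
As described above, if no worker is renewing the shard, the lease_id does not change, which shows that no worker is consuming the shard normally.
Worker A therefore leaves the system time in the last_update_time field of its map table unchanged and checks whether the current system time sys_time minus last_update_time is greater than the first time period T.
Sub-step A34: if the current system time minus the last monitored system time is greater than the first time period, determine that the queue shard has timed out.
If the current system time sys_time minus last_update_time is greater than the first time period T, the shard has timed out and becomes a candidate for preemption. The flow can then proceed to sub-step A11.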
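The monitoring loop of sub-steps A31 to A34 can be sketched as a single function called once per period. The in-memory shapes below (`local_map` as a dict of two-element lists, `lease_rows` as a dict of lease IDs) and the function name are assumptions for illustration.

```python
def detect_timeouts(local_map, lease_rows, now, period):
    """One monitoring round; returns the shard IDs considered timed out.

    local_map: {shard_id: [lease_id, last_update_time]}, kept in memory.
    lease_rows: {shard_id: lease_id} as just read from client_shard_lease.
    """
    timed_out = []
    for shard_id, lease_id in lease_rows.items():
        entry = local_map.get(shard_id)
        if entry is None:
            if lease_id == 0:
                # Never leased: record time 0 and treat as timed out at once.
                local_map[shard_id] = [0, 0]
                timed_out.append(shard_id)
            else:
                local_map[shard_id] = [lease_id, now]
        elif entry[0] != lease_id:
            # Lease was renewed since the last look: its owner is alive.
            entry[0], entry[1] = lease_id, now
        elif now - entry[1] > period:
            # Lease ID unchanged for longer than T: nobody is renewing it.
            timed_out.append(shard_id)
    return timed_out
```

Only the monitoring worker's own clock is used, so the check does not depend on clocks being synchronized across machines.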
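After the current client has preempted its queue shards, step 110 includes sub-step 111:
Sub-step 111: for a queue shard to be consumed, obtain the state data of the queue shard from the queue lease table stored in the persistent storage space.
In this embodiment, a queue shard's state data is written to the queue lease table whenever any client consumes the shard or when the shard is newly created. After preempting N queue shards, the current client worker A can therefore obtain their state data from the queue lease table and proceed to step 120.
Step 120: determine, according to the state data, whether another client is consuming the queue shard.
In this embodiment, the preceding step has obtained the state data of each of the N shards preempted by worker A, so the state data can be analyzed to see whether another worker is still using each shard. If not, proceed to step 130. If so, keep fetching the shard's latest state data and repeat the check.
Preferably, based on the queue lease table above, the state data includes the value of the consumer field, the value of the lease-owner field, and the consume time in the update-time field of the queue shard. The consume time is set when the current client, after preempting the queue shard, modifies the value of the update-time field.
For each shard preempted by worker A, the shard's state data, such as the values of lease_owner, consumer_owner, and update_time, is read from the queue lease table by shard_id.
The update_time value is set when worker A preempts the shard: worker A builds a time-update request from the consume time and the shard_id and sends it to the database, which updates the shard's update_time in the queue lease table.
In practice, the consume group can also be used when reading a shard's state data and when sending update requests to the database to update the state data in the queue lease table.
Further, step 120 includes sub-steps 121 to 123:
Sub-step 121: determine whether the value of the lease-owner field of the queue shard is the current client.
In practice, for each shard to be consumed, the current client worker A sends a state-data request to the database using the shard_id. From the returned state data it extracts the value of lease_owner; if the value is worker A, the client has successfully preempted the shard, and the flow proceeds to sub-step 122.
Sub-step 122: if the lease-owner field of the queue shard is the current client, determine whether the current client's preempt time is greater than the consume time.
Sub-step 123: if the current client's preempt time is greater than the consume time, determine that no other client is consuming the queue shard.
In practice, after worker A preempts a shard, it updates the update_time value of the corresponding shard_id in the queue lease table in the database to the consume time.
The preempt time can be recorded locally by worker A.
The current client's preempt time can then be compared with the consume time. If the preempt time is greater than the consume time, no other client is consuming the queue shard and no other client will update its consumption progress. If the preempt time is not greater than the consume time, another client may still be consuming the queue shard, and its consumption progress may not yet have been written to the queue lease table.
Preferably, the preempt time is the current client's system time, and the consume time is the sum of the client's system time at the moment it preempted the queue shard and the first time period.
In this embodiment, to simplify the computation, when worker A preempts a shard, that is, when lease_owner in the queue lease table is updated to worker A, worker A also adds the first time period to its current system time to obtain the consume time and writes it to update_time for that shard_id.
After step 110 has obtained the state data for the shard_id, sub-step 122 extracts the consume time from update_time and compares worker A's current system time with it directly. If the current system time is later than the consume time, no other client is consuming the queue shard.
In practice the workers run on different compute servers, and because those servers differ, the workers' system times may differ too. The present application therefore lets each worker record times on its own clock while using a common first time period T. Taking worker A as an example, the steps above readily yield the system time that worker A's clock should reach once T has elapsed since the preemption; that time is the consume time. When worker A's system time passes the consume time for shard_3, no other client is consuming the shard. For example, if worker A's system time when it preempts shard_3 is 12:00:00:000 and the first time period T is 50 ms, the consume time is 12:00:00:050. Worker A can fetch the state data for shard_3, which includes update_time, from the queue lease table every T/2. Once its system time reaches 12:00:00:051, which is greater than the consume time, any other client has had the full first time period T to write its latest consumption progress into the check_point of shard_3, and the flow can proceed to step 130. If the system time is not greater than the consume time, another client may still be consuming the shard. This prevents differences between the clients' system clocks from causing the current client to conclude wrongly that no other client is consuming the shard.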
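The waiting rule can be checked with a few lines of arithmetic. The sketch below measures time in milliseconds since midnight on the preempting worker's own clock; both function names are hypothetical, and the numbers reproduce the 12:00:00:000 example.

```python
def consume_time(preempt_clock_ms, period_ms):
    """Value written to update_time when the shard is preempted."""
    return preempt_clock_ms + period_ms


def no_other_consumer(lease_owner, me, now_ms, consume_time_ms):
    """True once `me` holds the lease and a full period T has elapsed."""
    return lease_owner == me and now_ms > consume_time_ms


NOON_MS = 12 * 60 * 60 * 1000            # 12:00:00:000 on worker A's clock
deadline = consume_time(NOON_MS, 50)     # 12:00:00:050 with T = 50 ms
print(no_other_consumer("worker A", "worker A", NOON_MS + 50, deadline))  # False
print(no_other_consumer("worker A", "worker A", NOON_MS + 51, deadline))  # True
```

Both the stored deadline and the later comparison use the same worker's clock, which is why clock differences between machines do not affect the result.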
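Step 130: if it is determined that no other client is consuming the queue shard, update the state data of the queue shard and obtain the current consumption progress of the queue shard.
In this embodiment, if the preempt time is greater than the consume time, no other client is consuming the queue shard. The client can send an update request to the database to update the shard's state data in the queue lease table and, at the same time, read the consumption progress from the shard's progress field in the queue lease table.
As described above, if the preempt time is not greater than the consume time, another client may be consuming the shard, and the client can return to step 110 to fetch the state data and check again.
Based on the queue lease table above, in another preferred embodiment of the present application, step 130 includes sub-steps 131 and 132:
Sub-step 131: lease the queue shard, and update the consumer field of the queue shard in the queue lease table stored in the persistent storage space to the current client.
Sub-step 132: obtain the consumption progress under the progress field of the queue shard from the queue lease table stored in the persistent storage space.
In practice, for any shard to be consumed that worker A has preempted, once the preempt time is found to be greater than the consume time, the consumption progress recorded for the shard in the queue lease table is the latest, and no other worker is consuming the shard.
Worker A's lease thread can therefore lease the shard and send a request to the database to update consumer_owner for the corresponding shard_id in the queue lease table to worker A. In practice it also updates lease_id for that shard_id to lease_id+1, indicating that worker A now occupies the queue shard.
At the same time, the recorded consumption progress can be read from check_point for the corresponding shard_id in the queue lease table.
If the preempt time is greater than the consume time, it is also possible to update only lease_id for that shard_id to lease_id+1, which keeps other clients from preempting the shard.
In this way, for a shard that another client has been consuming, the final consumption progress transfers seamlessly to the current client, and no data is consumed twice.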
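Sub-steps 131 and 132 amount to a guarded update followed by a read. The sketch below models one row of the queue lease table as a Python dict; the function name, the dict representation, and the fallback string "begin" for an empty check point are assumptions for illustration.

```python
def try_take_over(row, me, now):
    """Return the stored check point if `me` may start consuming, else None.

    `row` stands in for one row of the queue lease table; its update_time
    holds the consume time written at preemption.
    """
    if row["lease_owner"] != me or now <= row["update_time"]:
        return None  # another client may still be consuming: poll again later
    row["consumer_owner"] = me   # sub-step 131: become the consumer
    row["lease_id"] += 1         # mark the lease as taken
    # Sub-step 132: hand back the progress left by the previous consumer.
    return row["check_point"] or "begin"


row = {"lease_owner": "worker A", "consumer_owner": "worker B",
       "lease_id": 3, "check_point": "2/5", "update_time": 1050}
print(try_take_over(row, "worker A", 1051))  # 2/5
```

In a real deployment the read and the two writes would need to happen in one database transaction; the dict version only shows the order of the checks.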
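In another preferred embodiment of the present application, after sub-step 132 the method further includes sub-steps 133 and 134:
Sub-step 133: determine whether, in the queue lease table, both the preemptor field and the consumer field of the queue shard are the current client.
Sub-step 134: if both the preemptor field and the consumer field of the queue shard are the current client, renew the lease on the queue shard for the current client.
That is, each worker's renewal thread checks, for every shard the worker occupies, whether consumer_owner and lease_owner under that shard_id in the client_shard_lease table are the same. If they are, it renews the shard for the worker; if not, it does not.
In practice, each worker's renewal thread executes the process of steps A41 to A42 every T/2.
While each worker process is consuming, the method further includes:
Step A31: for each worker process, while it consumes the queue shards it occupies, generate a new lease ID in the worker's memory every first time period, based on the previous lease ID.
In this embodiment, this happens every first time period T while a worker is consuming any shard.
Step A32: determine whether the previous lease ID recorded in the worker's memory is the same as the lease ID of the corresponding queue shard in the queue lease table.
Step A33: if they are the same, update the lease-identifier field in the queue lease table to the new lease ID.
Step A34: if they differ, refuse to update the lease-identifier field in the queue lease table to the new lease ID.
In this embodiment, each worker's client lib updates a shard's lease_id in the database from the lease_id it holds in memory. Every period T the client lib increments the shard's lease_id in memory by 1 and compares the previous lease_id with the shard's lease_id in the client_shard_lease table. If they are the same, the update of lease_id to lease_id+1 in the client_shard_lease table is allowed; if not, it is refused. Because the operation is atomic, only one worker can succeed in the update at any moment, that is, only one worker can seize the shard's lease.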
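The compare-then-update of steps A32 to A34 maps naturally onto a conditional SQL UPDATE, which a database executes atomically for a single row. The sketch below is one possible rendering using SQLite from the Python standard library; the function name and the use of SQLite in place of the MySQL database mentioned earlier are assumptions.

```python
import sqlite3


def renew_lease(conn, group, shard_id, expected_lease_id):
    """Set lease_id to expected_lease_id + 1 only if it still equals expected.

    The WHERE clause is the "test" and the SET clause is the "set"; the
    database applies both in one statement, so two workers racing with the
    same expected value cannot both succeed.
    """
    cur = conn.execute(
        "UPDATE client_shard_lease SET lease_id = ? "
        "WHERE consume_group = ? AND shard_id = ? AND lease_id = ?",
        (expected_lease_id + 1, group, shard_id, expected_lease_id),
    )
    conn.commit()
    return cur.rowcount == 1  # True: this worker now holds lease_id + 1
```

A worker whose call returns False has lost the lease: it should re-read the row and stop consuming the shard.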
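Step 140: continue consuming the queue shard from the current consumption progress, and record the new consumption progress of the queue shard.
Once the shard's consumption progress has been obtained, worker A can continue consuming the shard from that position. While consuming, it writes the new consumption progress to the queue lease table in the database.
Preferably, the step in step 140 of recording the new consumption progress of the queue shard includes sub-steps A41 and A42:
Sub-step A41: every second time period, determine whether the lease-owner field of the queue shard in the queue lease table is the current client.
Sub-step A42: if the lease-owner field of the queue shard is the current client, write the current client's consumption progress on the queue shard into the shard's progress field in the queue lease table.
For example, for Table 6 above, once the current client worker A has determined that it may consume shard_3, it reads the consumption progress 2/5 from the check_point field of shard_3 and starts reading data from position 2/5 of shard_3.
Note that if the shard's check_point in the database is empty, data is read from the begin or the end of the shard according to a preset configuration. Reading from begin is the preferred configuration, because it avoids missing data.
While worker A consumes shard_3, it writes the new consumption progress to the check_point field of shard_3 in the queue lease table in the database.
Note also that the client lib of this embodiment provides a check point interface, which is used to save check point information and so ensures that workers consume shard data correctly when a worker fails over or a shard is preempted by a different worker. Use of the check point has two parts:
(1) Initializing the check point in the worker:
(1.1) When a shard is determined to be consumable, the client automatically loads the check point information from the database, that is, it executes the method of step 120.
If the shard's check_point in the database is empty, data is read from the begin or the end of the shard according to the preset configuration.
(2) Persisting the check point from the worker, that is, executing the method of step 140:
(2.1) The client lib provides the interface saveCheckPoint(Bool persistent) for check point operations. The parameter persistent controls whether the progress must be persisted to the external database immediately: if persistent is true, it is persisted at once; otherwise it is persisted at regular intervals. In this embodiment the value of persistent is controlled by the client lib.
(2.11) If persistent is true, the consumption progress is persisted to the check_point field of the corresponding shard in the database. In this embodiment, after the lease-owner field of a queue shard in the queue lease table has been changed to a preempting worker process, persistent is set to true at the end of the current first time period T, so that the consumption progress is persisted immediately.
(2.12) If persistent is not true, the consumption progress is kept in memory, and a background task periodically persists any progress that has gone unpersisted for longer than a specified duration to the check_point field of the corresponding shard in the database.
In this embodiment, while a queue shard's lease_owner and consumer_owner are the same, persistent can stay false, so that progress left unpersisted for longer than the specified duration is persisted periodically, for example every 2T.
(2.2) For a shard, under normal circumstances the client lib persists the consumption progress to the shard's check_point field only when consumer_owner in the database equals worker_name.
(2.3) When a shard is preempted by another worker, or the worker goes down, the client lib calls the shutdown interface to notify the upper-layer application, and the consumption progress is persisted to the check_point field of the corresponding shard in the database.
After a worker's shard has been preempted, the shard's consumer_owner in the database is not updated until the first time period T has elapsed, so the preempted worker can still correctly persist the check point of each shard it was consuming.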
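The persistence rules in (2.1) to (2.2) can be sketched as a small class. Everything here is illustrative: the class name, the snake_case method `save_check_point`, the explicit `now` argument, and the dict standing in for the shard's database row are assumptions, and the periodic flush is checked on each call rather than by a separate background task as the text describes.

```python
class CheckPointSaver:
    """Keeps progress in memory and persists it immediately or periodically."""

    def __init__(self, worker_name, row, flush_interval):
        self.worker_name = worker_name
        self.row = row                      # stand-in for the shard's table row
        self.flush_interval = flush_interval
        self.pending = None
        self.last_flush = 0

    def save_check_point(self, progress, persistent, now):
        """Record progress; return True if it was written to the row."""
        self.pending = progress
        if persistent or now - self.last_flush >= self.flush_interval:
            return self._flush(now)
        return False

    def _flush(self, now):
        # Rule (2.2): only the current consumer_owner may persist progress.
        if self.row.get("consumer_owner") != self.worker_name:
            return False
        self.row["check_point"] = self.pending
        self.last_flush = now
        return True
```

A caller would pass `persistent=True` once it learns that its lease has been preempted, so that the successor reads progress that is as fresh as possible.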
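In a distributed environment, once a client has preempted a queue shard, that shard is the client's queue shard to be consumed. The client first obtains the state data of the queue shard based on the lease protocol and uses it to determine whether another client is consuming the shard. Only after determining that no other client is consuming it does the client obtain the shard's current consumption progress and update its state data; the client then continues consuming the shard from that progress and records the new progress. Through this process, when the current client preempts a queue shard that another client is consuming, the shard's consumption progress passes seamlessly to the current client. Whether the preemption happens during load balancing of queue shards or because the client consuming the shard has crashed, the current client continues from the progress already consumed, which avoids consuming part of the data twice and makes the consumption result more accurate.
In addition, recall the three considerations above. First, how each compute server correctly selects the shards it consumes, so that at any moment every shard has exactly one consuming compute server. Second, how to ensure that, when one or more compute servers crash and load is rebalanced, the data on all shards is still processed correctly and is not consumed twice. Third, how to rebalance load automatically when processing pressure grows and compute servers are added, while still guaranteeing that no data is consumed twice.
The prior art also includes a scheme based on the Kafka client. Kafka is a distributed message queue designed for processing logs that provides publish-subscribe messaging. Kafka provides a high-level consumer API (Application Programming Interface) for synchronizing multiple worker processes, and this API depends on the ZooKeeper system behind Kafka. Because of that dependence, after worker process A updates the data under the key of queue shard A, the update may not reach ZooKeeper in time, so during rebalancing other worker processes still obtain the pre-update data for that key. In a Kafka system, different worker processes may therefore see different versions of the ZooKeeper data, which causes the load-balancing operation to fail during failover.
In the process of the embodiments, the worker processes deal with queue shards directly and process them directly, which also avoids the problem in Kafka of different worker processes seeing different data versions during rebalancing.
Furthermore, in the prior art the Kinesis Client Library can rebalance only one queue shard at a time. For example, if worker process A is consuming 100 queue shards and worker process B needs to take shards from A during load balancing, B can take only 1 queue shard from A per load-balancing round. If B actually needs to take 30 queue shards from A, 30 load-balancing operations are triggered, so load balancing takes a long time and a stable state cannot be reached quickly.
In the present application, by contrast, a worker process can preempt all N queue shards it should take in a single load-balancing round. The embodiments provide a way to select the shards to preempt quickly and in batches, so that the workers reach a stable state quickly and the number of shards consumed by each worker is balanced.
In particular, the embodiments build on the client instance table and the queue lease table stored in the persistent storage space. In the queue lease table, the lease-owner field lease_owner, the consumer field consumer_owner, and the progress field check_point record, respectively, each queue shard's preemptor, its occupier, and the occupier's consumption progress. When one worker process preempts a queue shard occupied by another, these three fields pass the consumption progress to the preempting worker process. As a result, when load is rebalanced after a worker process is added, after the compute server hosting a worker process crashes, or after a worker process fails over, and the queue shards it occupied are preempted by other worker processes, consumption of those shards continues from the position already consumed, which avoids consuming part of the data twice and makes the consumption result more accurate.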
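To describe the present application more clearly, this embodiment takes the queue lease table and the client instance table as a basis and describes the case in which a client worker A is newly started:
Step 201: create the client instance table and the queue lease table in the database in advance.
The client_worker_instance table and the client_shard_lease table are created on the same principle as in Embodiment 1.
Steps 202 to 227 are then executed for each newly started client.
Step 202: start the client and write the client name into the client instance table.
When worker A starts, it starts its client lib thread, which writes worker_name and the current system time into the client_worker_instance table.
Step 203: obtain the queue shard identifiers of all queue shards from the queue system.
The client lib thread obtains the shard_id of every shard in the queue system.
Step 204: obtain the lease-ID field and the lease-owner field of all queue shards from the queue lease table in the database.
The client lib thread uses the shard_id values to obtain the lease_id and lease_owner of all shards from the client_shard_lease table.
Step 205: determine whether any queue shard in the queue system is missing from the queue lease table; if so, write the information for that queue shard into the queue lease table in the database.
When the client lib thread finds a new shard_id that does not appear in the client_shard_lease table, it writes a record for that shard_id into the table with lease_id set to 0 and the other fields left empty.
The client lib thread then creates a preemption thread and a renewal thread. The preemption thread carries out the preemption of queue shards, and the renewal thread carries out the leasing of queue shards.
Step 206: obtain live1, the number of clients recorded in the client instance table that started within the most recent first time period.
Step 207: obtain live2, the number of clients in the queue lease table that correspond to queue shards whose lease has not timed out; these clients are the ones recorded under the lease-owner field of those shards.
Step 208: add live1 and live2 to obtain the total number of active clients U.
Step 209: obtain the total number of queue shards P from the queue system.
Step 210: obtain, from the queue lease table, the number of queue shards Q already consumed by the current client.
Step 211: compute the number of queue shards N that the current client needs to preempt as N = [P/U] - Q.
Step 212: every first time period, read the queue lease table and determine whether the lease ID of each queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in the locally stored map table.
Every first time period, the client's preemption thread reads the lease_id of each shard from the client_shard_lease table in the database and compares it with the lease_id last recorded for that shard in the map table.
Step 213: if the lease ID of a queue shard in the queue lease table differs from the lease ID last recorded for that shard in the map table, update the shard's lease ID in the map table to the one in the queue lease table, and update the shard's last-monitored-time field in the map table to the current system time.
That is, if a shard's lease_id in the client_shard_lease table differs from the lease_id last recorded in the map table, the map table's lease_id for the shard is updated to the value in client_shard_lease, and the shard's monitoring field in the map table is updated to the current system time.
Step 214: if the lease ID of a queue shard in the queue lease table is the same as the lease ID last recorded for that shard in the map table, keep the system time in the last-monitored-time field unchanged, and determine whether the current system time minus the last monitored system time is greater than the first time period.
That is, if a shard's lease_id in the client_shard_lease table is the same as the lease_id last recorded in the map table, the system time in the monitoring field is kept, and the current system time minus the last monitored system time is compared with the first time period T.
Step 215: if the current system time minus the last monitored system time is greater than the first time period, determine that the queue shard has timed out.
Step 216: determine whether N is greater than 0.
Step 217: if N is greater than 0, determine whether the number of timed-out queue shards L is less than N.
Step 218: if the number of timed-out queue shards L is greater than or equal to N, preempt N queue shards from the timed-out queue shards.
Step 219: if the number of timed-out queue shards L is less than N, preempt the L timed-out queue shards, and preempt N-L queue shards from those that other clients are consuming.
For each preempted queue shard, at the moment of preemption the lease-owner field of the shard in the queue lease table in the database is changed to the current client, and the update-time field in the queue lease table is changed to the consume time. The consume time is the sum of the client's system time at the moment it preempted the queue shard and the first time period.
That is, for a preempted shard, its lease_owner in the client_shard_lease table is changed to worker A, the consume time is recorded under update_time, and the shard is handed to the renewal thread.
For a shard preempted from another client, the preempted client finds that the lease ID in its local map differs from the one in the queue lease table. At that point the consumer field still names the preempted client, so the preempted client has time to update its consumption progress. Because the renewal period is T/2, the preempted client can no longer renew the shard, and without renewal it cannot continue processing the shard. This ensures that a queue shard's consumption progress passes seamlessly between clients.
Steps 212 to 219 are executed by the preemption thread, which runs once every 2*T. When client worker A first joins, all its queue shards are obtained by preemption through these steps. On each later run, preemption is carried out if the steps determine that the client should take over queue shards from other clients for load balancing.
Step 220: for a preempted queue shard, obtain the value of the shard's lease-owner field from the queue lease table and determine whether it is the current client.
Step 221: if the value of the lease-owner field of the queue shard is the current client, determine whether the current client's preempt time is greater than the consume time.
This step determines whether the preempting client's system time is greater than the consume time under update_time.
Step 222: if the current client's preempt time is greater than the consume time, lease the queue shard and update the consumer field of the queue shard in the queue lease table stored in the persistent storage space to the current client.
That is, if the client's system time is later than the consume time under update_time, the consumer_owner of the preempted shard is changed to worker A and the shard's lease_id is changed to lease_id+1. Worker A has then leased the shard, which means that it has preempted the shard successfully.
Step 223: for each queue shard the client occupies, obtain the values of the preemptor field (lease_owner) and the consumer field (consumer_owner) from the queue lease table in the database, and determine whether both are the current client.
Step 224: if both the preemptor field and the consumer field of the queue shard are the current client, renew the lease on the queue shard for the current client.
Steps 220 to 224 are executed by the renewal thread. Every T/2, for each shard the current client occupies, the renewal thread renews the lease if lease_owner and consumer_owner in the client_shard_lease table are the same.
Step 225: for a queue shard that the client has preempted successfully, obtain the consumption progress from the shard's progress field in the queue lease table in the database.
The client extracts the consumption progress from the check_point field of the client_shard_lease table. If the consumption progress is empty, the shard is consumed from its head.
Step 226: the client continues consuming the queue shard from that consumption progress position.
Step 227: while consuming the queue shard, the client writes the consumption progress to the shard's progress field in the database.
While a worker consumes a shard, the client lib thread calls the checkpoint interface every period T to write the consumption progress to check_point in the client_shard_lease table. Alternatively, after the shard's lease_owner has been preempted and the renewal period arrives, it calls the checkpoint shutdown interface to notify the upper-layer application, and the check point of the data consumed so far is persisted to check_point in the client_shard_lease table. Likewise, when worker A goes down, the checkpoint shutdown interface is called to notify the upper-layer application, and the check point of the data consumed so far is persisted to check_point in the client_shard_lease table.
After a client has performed its first preemption of queue shards, it can execute step 203 and the subsequent steps again.
It should be noted that, for simplicity, the method embodiments are described as a series of combined actions. Those skilled in the art will understand, however, that the embodiments are not limited by the order of actions described, because some steps may be performed in a different order or simultaneously. Those skilled in the art will also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.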
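Embodiment 2
Referring to FIG. 2, a structural block diagram of an embodiment of an apparatus for coordinating queue consumption in a distributed environment according to the present application is shown. The apparatus may include the following modules:
a state data acquisition module 310, configured to obtain, for a queue shard to be consumed, state data of the queue shard based on a lease protocol;
a consumption determination module 320, configured to determine, according to the state data, whether another client is consuming the queue shard;
a progress acquisition module 330, configured to, if it is determined that no other client is consuming the queue shard, update the state data of the queue shard and obtain the current consumption progress of the queue shard;
a consumption module 340, configured to continue consuming the queue shard from the current consumption progress and to record the new consumption progress of the queue shard.
In a preferred embodiment of the present application, before the state data acquisition module 310 the apparatus further includes:
a queue shard determination module, configured to determine, based on the lease protocol, the queue shards to be consumed that the current client requires.
In a preferred embodiment of the present application, the queue shard determination module includes:
a required-number determination sub-module, configured to obtain the total number of active clients U, the total number of queue shards P, and the number of queue shards Q already consumed by the current client, in order to compute the number of queue shards N that the current client needs to preempt;
a preemption sub-module, configured to preempt N queue shards, from the timed-out queue shards and/or the queue shards that other clients are consuming, as the current client's queue shards to be consumed.
In a preferred embodiment of the present application, the required-number determination sub-module includes:
a client total acquisition sub-module, configured to obtain the total number of active clients U from the client instance table and the queue lease table stored in the persistent storage space;
an overall shard total acquisition sub-module, configured to obtain the total number of queue shards P from the queue system;
an individual shard total acquisition sub-module, configured to obtain, from the queue lease table stored in the persistent storage space, the number of queue shards Q already consumed by the current client;
a required-number calculation sub-module, configured to compute the number of queue shards N that the current client needs to preempt as N = [P/U] - Q.
In a preferred embodiment of the present application, the client total acquisition sub-module includes:
a new-client count acquisition sub-module, configured to obtain live1, the number of clients recorded in the client instance table that started within the most recent first time period;
a non-timed-out client count acquisition sub-module, configured to obtain live2, the number of clients in the queue lease table that correspond to queue shards whose lease has not timed out, these clients being the ones recorded under the lease-owner field of those shards;
a client count summation sub-module, configured to add live1 and live2 to obtain the total number of active clients U.
In a preferred embodiment of the present application, the preemption sub-module includes:
a first judgment sub-module, configured to determine whether N is greater than 0;
a second judgment sub-module, configured to determine, if N is greater than 0, whether the number of timed-out queue shards L is less than N;
a full-timeout preemption sub-module, configured to preempt N queue shards from the timed-out queue shards if L is greater than or equal to N;
a mixed preemption sub-module, configured to preempt the L timed-out queue shards and to preempt N-L queue shards from those that other clients are consuming, if L is less than N.
In a preferred embodiment of the present application, the mixed preemption sub-module includes:
a first mixed preemption sub-module, configured for the current client to preempt N-L queue shards from those occupied by other clients such that the difference between the largest and the smallest number of queue shards occupied by any client does not exceed a specified number.
In a preferred embodiment of the present application, the first mixed preemption sub-module includes:
a sorting sub-module, configured to sort the clients, based on the queue lease table, by the number of queue shards they occupy, from most to fewest;
a second mixed preemption sub-module, configured to preempt one or more queue shards at a time from the first K clients such that, after J shards have been taken from them, the average number of queue shards remaining on the first K clients is greater than the number currently occupied by the (K+1)-th client, and the number remaining on each of the first K clients differs from that average by no more than the specified number, until N-L queue shards have been preempted.
In a preferred embodiment of the present application, the preemption sub-module includes:
a lease-owner field modification sub-module, configured to change, when a queue shard is preempted, the lease-owner field of the preempted queue shard in the queue lease table to the current client.
In a preferred embodiment of the present application, before the preemption sub-module the apparatus further includes:
a map table judgment sub-module, configured to read the queue lease table every first time period and determine whether the lease ID of each queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in the locally stored map table;
a map monitoring-time update sub-module, configured to, if the lease ID of a queue shard in the queue lease table differs from the lease ID last recorded for that shard in the map table, update the shard's lease ID in the map table to the one in the queue lease table and update the shard's last-monitored-time field in the map table to the current system time;
a map monitoring-time retention sub-module, configured to, if the lease ID of a queue shard in the queue lease table is the same as the lease ID last recorded for that shard in the map table, keep the system time in the last-monitored-time field unchanged and determine whether the current system time minus the last monitored system time is greater than the first time period;
a timeout judgment sub-module, configured to determine that the queue shard has timed out if the current system time minus the last monitored system time is greater than the first time period.
In a preferred embodiment of the present application, the queue shard determination module includes:
a first queue shard determination sub-module, configured to determine, every first time period and based on the lease protocol, the queue shards to be consumed that the current client requires.
In a preferred embodiment of the present application, the state data acquisition module 310 includes:
a first state data acquisition sub-module, configured to obtain, for a queue shard to be consumed, the state data of the queue shard from the queue lease table stored in the persistent storage space.
In a preferred embodiment of the present application, the state data includes the value of the consumer field, the value of the lease-owner field, and the consume time in the update-time field of the queue shard in the queue lease table; the consume time is set when the current client, after preempting the queue shard, modifies the value of the update-time field.
The consumption determination module 320 then includes:
a lease-owner field judgment sub-module, configured to determine whether the value of the lease-owner field of the queue shard is the current client;
a preempt-time judgment sub-module, configured to determine, if the value of the lease-owner field of the queue shard is the current client, whether the current client's preempt time is greater than the consume time;
a determination sub-module, configured to determine that no other client is consuming the queue shard if the current client's preempt time is greater than the consume time.
In a preferred embodiment of the present application, the preempt time is the current client's system time, and the consume time is the sum of the client's system time at the moment it preempted the queue shard and the first time period.
In a preferred embodiment of the present application, the progress acquisition module 330 includes:
a lease sub-module, configured to lease the queue shard and update the consumer field of the queue shard in the queue lease table stored in the persistent storage space to the current client;
a progress field reading module, configured to obtain the consumption progress under the progress field of the queue shard from the queue lease table stored in the persistent storage space.
In a preferred embodiment of the present application, the consumption module 340 includes:
a lease-owner judgment sub-module, configured to determine, every second time period, whether the lease-owner field of the queue shard in the queue lease table is the current client;
a consumption progress update sub-module, configured to, if the lease-owner field of the queue shard is the current client, write the current client's consumption progress on the queue shard into the shard's progress field in the queue lease table.
In a preferred embodiment of the present application, after the lease sub-module the apparatus further includes:
a renewal judgment sub-module, configured to determine whether, in the queue lease table, both the preemptor field and the consumer field of the queue shard are the current client;
a first renewal sub-module, configured to renew the lease on the queue shard for the current client if both the preemptor field and the consumer field of the queue shard are the current client.
FIG. 2A shows an example of the distributed environment of the present application, in which each compute node is a client. The scheduling server can allocate queue shards. Each client includes a state data acquisition module 310, a consumption determination module 320, a progress acquisition module 330, and a consumption module 340. Each client can also include the corresponding preferred modules.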
在分布式环境下,对于一客户端,其在抢占到队列分片后,该队列分片对于该客户端即是待消费的队列分片,那么该客户端需要首先基于租赁协议获取的所述队列分片的状态数据,然后根据所述状态数据,判断是否有其他客户端在消费所述队列分片,在确定没有其他客户端在消费该队列分片后,该客户端才会获取该队列分片当前的消费进度,并且更新该队列分片的状态数据,然后该客户端再以当前的消费进度继续消费该队列 分片并记录新的消费进度。本发明实施例通过上述过程,在当前客户端抢占其他客户端正在消费的队列分片时,该队列分片的消费进度可以无缝传递到当前客户端中,使在进行队列分片负载均衡时,或者某个正在消费队列分片的客户端宕掉,某个客户端的队列分片被当前客户端抢占后,当前客户端可以按照已消费的消费进度继续消费该队列分片,避免部分数据的重复消费,使消费结果更精确
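Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar among the embodiments, the embodiments may be referred to one another.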
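Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.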
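In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.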
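The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.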
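Although preferred embodiments of the present application have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The method and the system for coordinating queue consumption in a distributed environment provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.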

Claims (34)

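  1. A method for coordinating queue consumption in a distributed environment, characterized by comprising:
    for a queue shard to be consumed, obtaining state data of the queue shard based on a lease protocol;
    determining, according to the state data, whether another client is consuming the queue shard;
    if it is determined that no other client is consuming the queue shard, updating the state data of the queue shard and obtaining the current consumption progress of the queue shard;
    continuing to consume the queue shard from the current consumption progress, and recording the new consumption progress of the queue shard.
  2. The method according to claim 1, characterized in that, before the step of obtaining, for a queue shard to be consumed, state data of the queue shard based on a lease protocol, the method further comprises:
    determining, based on the lease protocol, the queue shards to be consumed that the current client requires.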
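  3. The method according to claim 2, characterized in that the step of determining, based on the lease protocol, the queue shards to be consumed that the current client requires comprises:
    obtaining the total number U of active clients, the total number P of queue shards, and the total number Q of queue shards already consumed by the current client, so as to calculate the number N of queue shards that the current client needs to preempt;
    preempting N queue shards, from timed-out queue shards and/or queue shards being consumed by other clients, as the queue shards to be consumed by the current client.
  4. The method according to claim 3, characterized in that the step of obtaining the total number U of active clients, the total number P of queue shards, and the total number Q of queue shards already consumed by the current client comprises:
    obtaining the total number U of active clients from the client instance table and the queue lease table stored in persistent storage;
    obtaining the total number P of queue shards from the queue system;
    obtaining the total number Q of queue shards already consumed by the current client from the queue lease table stored in persistent storage;
    calculating the number N of queue shards that the current client needs to preempt by N=[P/U]-Q.
  5. The method according to claim 4, characterized in that the step of obtaining the total number U of active clients from the client instance table and the queue lease table stored in persistent storage comprises:
    obtaining the number live1 of clients recorded in the client instance table as started within the most recent first time period;
    obtaining, from the queue lease table, the number live2 of clients corresponding to queue shards that have not timed out, the clients that have not timed out being the clients recorded in the lessee field of the queue shards;
    adding live1 and live2 to obtain the total number U of active clients.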
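  6. The method according to claim 3, characterized in that the step of preempting N queue shards, from timed-out queue shards and/or queue shards being consumed by other clients, as the queue shards to be consumed by the current client comprises:
    determining whether N is greater than 0;
    if N is greater than 0, determining whether the number L of timed-out queue shards is less than N;
    if the number L of timed-out queue shards is greater than or equal to N, preempting N queue shards from the timed-out queue shards;
    if the number L of timed-out queue shards is less than N, preempting L queue shards from the timed-out queue shards and preempting N-L queue shards from the queue shards being consumed by other clients.
  7. The method according to claim 6, characterized in that the step of preempting N-L queue shards from the queue shards being consumed by other clients comprises:
    preempting, by the current client, N-L queue shards from the queue shards held by other clients, such that the difference between the largest number and the smallest number of queue shards held by any client does not exceed a specified amount.
  8. The method according to claim 7, characterized in that the step of preempting, by the current client, N-L queue shards from the queue shards held by other clients, such that the difference between the largest number and the smallest number of queue shards held by any client does not exceed a specified amount, comprises:
    sorting the clients, based on the queue lease table, in descending order of the number of queue shards each holds;
    preempting one or more queue shards at a time from the first K clients such that, after J queue shards have been preempted from the first K clients, the average number of queue shards remaining on the first K clients is greater than the number of queue shards currently held by the (K+1)-th client, and the number of queue shards remaining on each of the first K clients differs from that average by no more than a specified amount, until N-L queue shards have been successfully preempted.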
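  9. The method according to claim 3, characterized in that the step of preempting N queue shards, from timed-out queue shards and/or queue shards being consumed by other clients, as the queue shards to be consumed by the current client comprises:
    when a queue shard is preempted, changing the consumer field of the preempted queue shard in the queue lease table to the current client.
  10. The method according to claim 3, characterized in that, before the step of preempting N queue shards, from timed-out queue shards and/or queue shards being consumed by other clients, as the queue shards to be consumed by the current client, the method further comprises:
    obtaining the queue lease table every first time period, and determining whether the lease ID in the lessee field of each queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in a locally stored map table;
    if the lease ID of a queue shard in the queue lease table differs from the lease ID last recorded for that queue shard in the map table, updating the lease ID of the queue shard in the map table to the lease ID in the queue lease table, and updating the last-monitored-time field of the queue shard in the map table to the current system time;
    if the lease ID of a queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in the map table, keeping the system time in the last-monitored-time field unchanged, and determining whether the current system time minus the last-monitored system time is greater than the first time period;
    if the current system time minus the last-monitored system time is greater than the first time period, determining that the queue shard has timed out.
  11. The method according to claim 2, characterized in that the step of determining, based on the lease protocol, the queue shards to be consumed that the current client requires comprises:
    determining, based on the lease protocol and every first time period, the queue shards to be consumed that the current client requires.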
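  12. The method according to any one of claims 1 to 11, characterized in that the step of obtaining, for a queue shard to be consumed, state data of the queue shard comprises:
    for a queue shard to be consumed, obtaining the state data of the queue shard from the queue lease table stored in persistent storage.
  13. The method according to claim 12, characterized in that the state data includes the value of the consumer field, the value of the lessee field, and the consumption time in the update-time field of the queue shard in the queue lease table; the consumption time is obtained when the current client modifies the value of the update-time field after preempting the queue shard;
    accordingly, the step of determining, according to the state data, whether another client is consuming the queue shard comprises:
    determining whether the value of the consumer field of the queue shard is the current client;
    if the value of the consumer field of the queue shard is the current client, determining whether the preemption time of the current client is greater than the consumption time;
    if the preemption time of the current client is greater than the consumption time, determining that no other client is consuming the queue shard.
  14. The method according to claim 13, characterized in that the preemption time is the system time of the current client, and the consumption time is the sum of the system time at which the client preempted the queue shard and the first time period.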
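  15. The method according to claim 12, characterized in that the step of updating the state data of the queue shard and obtaining the current consumption progress of the queue shard comprises:
    leasing the queue shard, and updating the consumer field of the queue shard in the queue lease table stored in persistent storage to the current client;
    obtaining, from the queue lease table stored in persistent storage, the consumption progress in the progress field of the queue shard.
  16. The method according to claim 15, characterized in that the step of recording the new consumption progress of the queue shard comprises:
    determining, every second time period, whether the lessee field of the queue shard in the queue lease table is the current client;
    if the lessee field of the queue shard is the current client, writing the current client's consumption progress for the queue shard to the progress field of the queue shard in the queue lease table.
  17. The method according to claim 15, characterized in that, after the step of obtaining, from the queue lease table stored in persistent storage, the consumption progress in the progress field of the queue shard, the method further comprises:
    determining whether, in the queue lease table, the preemptor field and the consumer field of the queue shard are both the current client;
    if the preemptor field and the consumer field of the queue shard are both the current client, renewing the lease of the queue shard for the current client.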
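  18. An apparatus for coordinating queue consumption in a distributed environment, characterized by comprising:
    a state data acquisition module, configured to, for a queue shard to be consumed, obtain state data of the queue shard based on a lease protocol;
    a consumption judgment module, configured to determine, according to the state data, whether another client is consuming the queue shard;
    a progress acquisition module, configured to, if it is determined that no other client is consuming the queue shard, update the state data of the queue shard and obtain the current consumption progress of the queue shard;
    a consumption module, configured to continue consuming the queue shard from the current consumption progress and to record the new consumption progress of the queue shard.
  19. The apparatus according to claim 18, characterized by further comprising, before the state data acquisition module:
    a queue shard determination module, configured to determine, based on the lease protocol, the queue shards to be consumed that the current client requires.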
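  20. The apparatus according to claim 19, characterized in that the queue shard determination module comprises:
    a required-quantity determination submodule, configured to obtain the total number U of active clients, the total number P of queue shards, and the total number Q of queue shards already consumed by the current client, so as to calculate the number N of queue shards that the current client needs to preempt;
    a preemption submodule, configured to preempt N queue shards, from timed-out queue shards and/or queue shards being consumed by other clients, as the queue shards to be consumed by the current client.
  21. The apparatus according to claim 19, characterized in that the required-quantity determination submodule comprises:
    a client total acquisition submodule, configured to obtain the total number U of active clients from the client instance table and the queue lease table stored in persistent storage;
    an overall queue shard total acquisition submodule, configured to obtain the total number P of queue shards from the queue system;
    an individual queue shard total acquisition submodule, configured to obtain the total number Q of queue shards already consumed by the current client from the queue lease table stored in persistent storage;
    a required-quantity calculation submodule, configured to calculate the number N of queue shards that the current client needs to preempt by N=[P/U]-Q.
  22. The apparatus according to claim 21, characterized in that the client total acquisition submodule comprises:
    a new client quantity acquisition submodule, configured to obtain the number live1 of clients recorded in the client instance table as started within the most recent first time period;
    a non-timed-out client quantity acquisition submodule, configured to obtain, from the queue lease table, the number live2 of clients corresponding to queue shards that have not timed out, the clients that have not timed out being the clients recorded in the lessee field of the queue shards;
    a client quantity accumulation submodule, configured to add live1 and live2 to obtain the total number U of active clients.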
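  23. The apparatus according to claim 20, characterized in that the preemption submodule comprises:
    a first judgment submodule, configured to determine whether N is greater than 0;
    a second judgment submodule, configured to, if N is greater than 0, determine whether the number L of timed-out queue shards is less than N;
    a full-timeout preemption submodule, configured to, if the number L of timed-out queue shards is greater than or equal to N, preempt N queue shards from the timed-out queue shards;
    a hybrid preemption submodule, configured to, if the number L of timed-out queue shards is less than N, preempt L queue shards from the timed-out queue shards and preempt N-L queue shards from the queue shards being consumed by other clients.
  24. The apparatus according to claim 23, characterized in that the hybrid preemption submodule comprises:
    a first hybrid preemption submodule, configured for the current client to preempt N-L queue shards from the queue shards held by other clients, such that the difference between the largest number and the smallest number of queue shards held by any client does not exceed a specified amount.
  25. The apparatus according to claim 24, characterized in that the first hybrid preemption submodule comprises:
    a sorting submodule, configured to sort the clients, based on the queue lease table, in descending order of the number of queue shards each holds;
    a second hybrid preemption module, configured to preempt one or more queue shards at a time from the first K clients such that, after J queue shards have been preempted from the first K clients, the average number of queue shards remaining on the first K clients is greater than the number of queue shards currently held by the (K+1)-th client, and the number of queue shards remaining on each of the first K clients differs from that average by no more than a specified amount, until N-L queue shards have been successfully preempted.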
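  26. The apparatus according to claim 20, characterized in that the preemption submodule comprises:
    a consumer field modification submodule, configured to, when a queue shard is preempted, change the consumer field of the preempted queue shard in the queue lease table to the current client.
  27. The apparatus according to claim 20, characterized by further comprising, before the preemption submodule:
    a map table judgment submodule, configured to obtain the queue lease table every first time period and to determine whether the lease ID in the lessee field of each queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in a locally stored map table;
    a map monitoring-time update submodule, configured to, if the lease ID of a queue shard in the queue lease table differs from the lease ID last recorded for that queue shard in the map table, update the lease ID of the queue shard in the map table to the lease ID in the queue lease table, and update the last-monitored-time field of the queue shard in the map table to the current system time;
    a map monitoring-time maintaining submodule, configured to, if the lease ID of a queue shard in the queue lease table is the same as the lease ID last recorded for that queue shard in the map table, keep the system time in the last-monitored-time field unchanged and determine whether the current system time minus the last-monitored system time is greater than the first time period;
    a timeout judgment submodule, configured to determine that the queue shard has timed out if the current system time minus the last-monitored system time is greater than the first time period.
  28. The apparatus according to claim 19, characterized in that the queue shard determination module comprises:
    a first queue shard determination submodule, configured to determine, based on the lease protocol and every first time period, the queue shards to be consumed that the current client requires.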
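  29. The apparatus according to any one of claims 18 to 28, characterized in that the state data acquisition module comprises:
    a first state data acquisition submodule, configured to, for a queue shard to be consumed, obtain the state data of the queue shard from the queue lease table stored in persistent storage.
  30. The apparatus according to claim 29, characterized in that the state data includes the value of the consumer field, the value of the lessee field, and the consumption time in the update-time field of the queue shard in the queue lease table; the consumption time is obtained when the current client modifies the value of the update-time field after preempting the queue shard;
    accordingly, the consumption judgment module comprises:
    a consumer field judgment submodule, configured to determine whether the value of the consumer field of the queue shard is the current client;
    a preemption time judgment submodule, configured to, if the value of the consumer field of the queue shard is the current client, determine whether the preemption time of the current client is greater than the consumption time;
    a determination submodule, configured to determine that no other client is consuming the queue shard if the preemption time of the current client is greater than the consumption time.
  31. The apparatus according to claim 30, characterized in that the preemption time is the system time of the current client, and the consumption time is the sum of the system time at which the client preempted the queue shard and the first time period.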
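  32. The apparatus according to claim 29, characterized in that the progress acquisition module comprises:
    a leasing submodule, configured to lease the queue shard and update the consumer field of the queue shard in the queue lease table stored in persistent storage to the current client;
    a progress field reading module, configured to obtain, from the queue lease table stored in persistent storage, the consumption progress in the progress field of the queue shard.
  33. The apparatus according to claim 32, characterized in that the consumption module comprises:
    a lessee judgment submodule, configured to determine, every second time period, whether the lessee field of the queue shard in the queue lease table is the current client;
    a consumption progress update submodule, configured to, if the lessee field of the queue shard is the current client, write the current client's consumption progress for the queue shard to the progress field of the queue shard in the queue lease table.
  34. The apparatus according to claim 32, characterized by further comprising, after the leasing submodule:
    a renewal judgment submodule, configured to determine whether, in the queue lease table, the preemptor field and the consumer field of the queue shard are both the current client;
    a first renewal submodule, configured to renew the lease of the queue shard for the current client if the preemptor field and the consumer field of the queue shard are both the current client.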
PCT/CN2016/110230 2015-12-30 2016-12-16 Method and apparatus for coordinating queue consumption in a distributed environment WO2017114176A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511021136.1A CN106933672B (zh) 2015-12-30 2015-12-30 Method and apparatus for coordinating queue consumption in a distributed environment
CN201511021136.1 2015-12-30

Publications (1)

Publication Number Publication Date
WO2017114176A1 true WO2017114176A1 (zh) 2017-07-06

Family

ID=59224458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110230 WO2017114176A1 (zh) 2015-12-30 2016-12-16 Method and apparatus for coordinating queue consumption in a distributed environment

Country Status (2)

Country Link
CN (1) CN106933672B (zh)
WO (1) WO2017114176A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
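CN111343252A (zh) * 2020-02-13 2020-06-26 深圳壹账通智能科技有限公司 High-concurrency data transmission method based on the http2 protocol and related device
CN111913909A (zh) * 2019-05-08 2020-11-10 厦门网宿有限公司 Re-sharding method and system in a distributed storage system
CN111988359A (zh) * 2020-07-15 2020-11-24 中科物缘科技(杭州)有限公司 Message-queue-based data shard synchronization method and system
CN114448989A (zh) * 2022-01-26 2022-05-06 北京百度网讯科技有限公司 Method, apparatus, electronic device, storage medium, and product for adjusting message distribution
CN117251508A (zh) * 2023-09-22 2023-12-19 湖南长银五八消费金融股份有限公司 Method, apparatus, device, and storage medium for batch posting of loan receipts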

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
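CN108199912B (zh) * 2017-12-15 2020-09-22 北京奇艺世纪科技有限公司 Method and apparatus for managing and consuming distributed messages in a multi-site active-active deployment
CN111309700B (zh) * 2020-02-14 2022-11-29 苏州浪潮智能科技有限公司 Control method and system for multiple shared directory trees
CN112527527A (zh) * 2020-12-16 2021-03-19 深圳市分期乐网络科技有限公司 Method, apparatus, electronic device, and medium for controlling message queue consumption speed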

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289392A (zh) * 2011-09-08 2011-12-21 曙光信息产业股份有限公司 Checkpoint-based job scheduling method and system
CN104468330A (zh) * 2014-12-03 2015-03-25 北京国双科技有限公司 Data processing method and apparatus for a distributed message queue system
US20150180792A1 (en) * 2010-06-23 2015-06-25 Amazon Technologies, Inc. Balancing a load on a multiple consumer queue
CN104754036A (zh) * 2015-03-06 2015-07-01 合一信息技术(北京)有限公司 Kafka-based message processing system and processing method
US20150200886A1 (en) * 2014-01-14 2015-07-16 International Business Machines Corporation Message switch file sharing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103379021B (zh) * 2012-04-24 2017-02-15 中兴通讯股份有限公司 Method and system for implementing a distributed message queue

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
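CN111913909A (zh) * 2019-05-08 2020-11-10 厦门网宿有限公司 Re-sharding method and system in a distributed storage system
CN111913909B (zh) * 2019-05-08 2024-02-23 厦门网宿有限公司 Re-sharding method and system in a distributed storage system
CN111343252A (zh) * 2020-02-13 2020-06-26 深圳壹账通智能科技有限公司 High-concurrency data transmission method based on the http2 protocol and related device
CN111988359A (zh) * 2020-07-15 2020-11-24 中科物缘科技(杭州)有限公司 Message-queue-based data shard synchronization method and system
CN111988359B (zh) * 2020-07-15 2023-08-15 中国科学院计算技术研究所数字经济产业研究院 Message-queue-based data shard synchronization method and system
CN114448989A (zh) * 2022-01-26 2022-05-06 北京百度网讯科技有限公司 Method, apparatus, electronic device, storage medium, and product for adjusting message distribution
CN114448989B (zh) * 2022-01-26 2024-04-05 北京百度网讯科技有限公司 Method, apparatus, electronic device, storage medium, and product for adjusting message distribution
CN117251508A (zh) * 2023-09-22 2023-12-19 湖南长银五八消费金融股份有限公司 Method, apparatus, device, and storage medium for batch posting of loan receipts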

Also Published As

Publication number Publication date
CN106933672B (zh) 2021-04-13
CN106933672A (zh) 2017-07-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880966

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880966

Country of ref document: EP

Kind code of ref document: A1