Disclosure of Invention
The embodiments of the present specification aim to provide a more efficient scheme for obtaining messages from message queue middleware, so as to solve the deficiencies in the prior art.
To achieve the above object, one aspect of the present specification provides a method for obtaining a message from message queue middleware, the method being performed by a process in a computing platform for processing the message, and comprising:
obtaining current allocation information, the current allocation information being related to a first number of first message queues currently allocated to the process, wherein the first message queues are message queues in the message queue middleware;
determining whether the current allocation information changes compared to pre-stored initial allocation information, the initial allocation information being associated with a second number of second message queues allocated to the process when the process is started, wherein the second message queues are message queues in the message queue middleware; and
restarting, in a case where the current allocation information has changed compared to the initial allocation information, to establish connections with the first number of first message queues, so as to obtain messages from the first number of first message queues.
In one embodiment, in the method of retrieving messages from message queue middleware, the message queues in the message queue middleware have corresponding topics, the process is configured to retrieve messages from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the current allocation information is related to the current at least one message queue.
In one embodiment, in the method for retrieving a message from message queue middleware, the process is one process in a group of processes, a message queue in the message queue middleware has a corresponding topic, the group of processes is configured to retrieve a message from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the retrieving current assignment information includes retrieving first information from the message queue middleware, the first information being related to the current at least one message queue, and retrieving the current assignment information based on the first information.
In one embodiment, in the method of retrieving a message from message queue middleware, retrieving the current allocation information based on the first information includes retrieving the current allocation information based on the first information according to a polling algorithm.
In one embodiment, in the method for retrieving a message from message queue middleware, the process is one process in a group of processes, a message queue in the message queue middleware has a corresponding topic, the group of processes is configured to retrieve a message from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the retrieving current assignment information includes:
obtaining first information from the message queue middleware, wherein the first information is related to the current at least one message queue; determining whether the first information has changed compared to second information, wherein the second information is related to the at least one message queue at the time the process was started; and obtaining the current allocation information based on the first information in a case where the first information has changed compared to the second information.
In an embodiment, in the method for obtaining a message from a message queue middleware, the current allocation information includes the value of the first number, and a topic of each message queue in the first number of first message queues, an identifier of a server where the message queue is located, and a queue identifier, and the initial allocation information includes the value of the second number, and a topic of each message queue in the second number of second message queues, an identifier of a server where the message queue is located, and a queue identifier.
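By way of illustration only, the allocation information described above could be represented as follows. This is a minimal sketch: the names `QueueRef` and `AllocationInfo`, the field names, and the sample values are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QueueRef:
    """Identifies one message queue (hypothetical field names)."""
    topic: str       # topic of the message queue
    server_id: str   # identifier of the server where the queue is located
    queue_id: int    # queue identifier


@dataclass(frozen=True)
class AllocationInfo:
    """The message queues allocated to one process."""
    queues: FrozenSet[QueueRef]

    @property
    def count(self) -> int:
        # The "first number" (current) or "second number" (initial).
        return len(self.queues)


# Hypothetical sample data: one queue at startup, two queues now.
initial = AllocationInfo(frozenset({QueueRef("trade", "broker-a", 0)}))
current = AllocationInfo(frozenset({QueueRef("trade", "broker-a", 0),
                                    QueueRef("trade", "broker-b", 1)}))
print(initial.count, current.count, initial == current)
```

Because both classes are frozen and the queues are held in a frozen set, two allocation records compare equal exactly when they contain the same queues, regardless of order.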
In one embodiment, the method for obtaining messages from the message queue middleware further includes, after the restart, for a third message queue that is connected to the process both before and after the restart, obtaining processing progress information of the third message queue from a storage unit, and determining a plurality of messages to be obtained from the third message queue according to the processing progress information.
In one embodiment, in the method of retrieving a message from message queue middleware, the processing progress information is retrieved from a processing schedule in a storage unit, and the processing schedule includes values of the following fields: a topic, a server identification, a message queue identification, and a message identification.
In one embodiment, the method of retrieving messages from the message queue middleware further comprises, after determining a plurality of messages to be retrieved from the third message queue, updating the processing schedule in the storage unit upon completion of processing the retrieved plurality of messages.
In one embodiment, in the method of retrieving a message from message queue middleware, the storage unit is a third party storage unit independent of the message queue middleware and the computing platform.
In one embodiment, in the method for retrieving a message from message queue middleware, the computing platform is a Kepler computing platform.
Another aspect of the present specification provides an apparatus for retrieving a message from message queue middleware, the apparatus being implemented by a process in a computing platform for processing the message, comprising:
a first obtaining unit, configured to obtain current allocation information, where the current allocation information is related to a first number of first message queues currently allocated to the process, where the first message queues are message queues in the message queue middleware;
a determining unit, configured to determine whether the current allocation information changes compared to pre-stored initial allocation information, where the initial allocation information is related to a second number of second message queues allocated to the process when the process is started, where the second message queues are message queues in the message queue middleware; and
a restarting unit, configured to restart, when the current allocation information has changed compared to the initial allocation information, to establish connections with the first number of first message queues so as to obtain messages from the first number of first message queues.
In one embodiment, in the apparatus for retrieving a message from message queue middleware, the process is one process in a group of processes, a message queue in the message queue middleware has a corresponding topic, the group of processes is configured to retrieve a message from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the first retrieving unit further includes: a first obtaining subunit, configured to obtain first information from the message queue middleware, where the first information is related to the current at least one message queue; and a second obtaining subunit configured to obtain the current allocation information based on the first information.
In one embodiment, in the apparatus for retrieving a message from message queue middleware, the process is one process in a group of processes, a message queue in the message queue middleware has a corresponding topic, the group of processes is configured to retrieve a message from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the first retrieving unit further includes: a first obtaining subunit, configured to obtain first information from the message queue middleware, where the first information is related to the current at least one message queue; a determining subunit configured to determine whether the first information changes compared to second information, wherein the second information is related to the at least one message queue at the time of the process start; and a second acquisition subunit configured to acquire the current allocation information based on the first information in a case where the first information is changed from the second information.
In an embodiment, the apparatus for obtaining a message from message queue middleware further includes a second obtaining unit, configured to, after the restart, for a third message queue that is connected to the process both before and after the restart, obtain processing progress information of the third message queue from a storage unit; and a determining unit configured to determine a plurality of messages to be obtained from the third message queue according to the processing progress information.
In one embodiment, the apparatus for retrieving a message from the message queue middleware further includes an updating unit configured to update the processing schedule in the storage unit when the retrieved plurality of messages are processed after determining the plurality of messages to be retrieved from the third message queue.
With the message acquisition scheme according to the embodiments of the present specification, changes to the message queues in the message queue middleware can be monitored in near real time, and only the processes (nodes) whose queue allocation has changed are restarted, so that the computing task as a whole is not interrupted and the impact is minimized. In addition, the processing progress information of the message queues is stored in an independent storage unit, which makes the processing progress information more stable and reliable.
Detailed Description
The embodiments of the present specification will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system 100 according to an embodiment of the present specification. As shown in Fig. 1, the system 100 includes message queue middleware 101, a computing platform 102, and a storage unit 103. The message queue middleware 101 receives messages from message producers and provides them to the message processing processes 12 (nodes 12) in the computing platform 102. The message queue middleware 101 is, for example, Antq, a message queue middleware adapted to the Ant LDC architecture that includes distributed pull-mode message queues. However, the message queue middleware 101 is not limited to the pull mode, and may also use, for example, a push mode, a point-to-point mode, or the like.
A message generally has three main parts: a topic, a body, and attributes. The topic defines the type of the message, such as a transaction message or a click message; the body mainly contains the content of the message; and the attributes are custom attributes that the producer sets on the message. The message queue middleware 101 includes a plurality of message queues 11, each of which has a corresponding topic, and the topic of a message queue is the same as the topic of the messages it contains.
The computing platform 102 is, for example, the Kepler computing platform developed by Ant, and includes a plurality of consuming nodes 12 (message processing processes) corresponding to a computing task. The consuming nodes 12 are configured to obtain messages from the message queue middleware 101, perform task processing, and store the consumption progress. The task processing here is, for example, a real-time computing task, such as calculating the risk in a loan transaction. The computing platform 102 is not limited to Kepler, and may be, for example, a small computing unit such as a single server (i.e., one consuming node), or a large computing platform, as long as it has computing power sufficient for the task processing. The task processing is likewise not limited to real-time computing tasks, and may be any of various types of tasks such as asynchronous processing, application decoupling, traffic peak shaving, and log processing.
The message queue middleware 101 evenly allocates the message queues 11 corresponding to a computing task to the consuming nodes 12 corresponding to that task by, for example, a polling (round-robin) allocation algorithm, and each consuming node 12 obtains messages from the message queues 11 allocated to it. Each consuming node 12 periodically (e.g., every minute) determines whether the total number of message queues corresponding to the topic has changed and whether the message queues allocated to it have changed. When the message queues allocated to a node change, for example, when a queue is added, removed, or replaced, the node (process) restarts in order to reconnect with the message queues allocated to it and receive messages from them. However, the allocation is not limited to the polling algorithm, and any algorithm that can evenly allocate the message queues under the same topic to the consuming nodes may be used.
In addition, before a node 12 obtains messages from the middleware 101, the node reads the message processing progress of the corresponding message queue from the storage unit 103, so that it pulls a new batch of messages starting from the position of the first unprocessed message in the message queue. After the node 12 finishes processing the batch of messages, it writes the new message processing progress to the storage unit 103. The storage unit 103 may be a storage unit inside the computing platform 102, or may be a storage unit outside the computing platform.
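As a purely illustrative sketch, the relationship between a message and the topic of its queue could be modeled as below. The class names `Message` and `MessageQueue`, the `append` method, and the choice to reject a mismatched topic with an exception are assumptions made for the example; the specification does not prescribe any particular data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    topic: str                       # type of the message, e.g. "transaction"
    body: bytes                      # content of the message
    attributes: Dict[str, str] = field(default_factory=dict)  # set by the producer


@dataclass
class MessageQueue:
    topic: str
    messages: List[Message] = field(default_factory=list)

    def append(self, msg: Message) -> None:
        # A queue only holds messages whose topic equals its own topic.
        if msg.topic != self.topic:
            raise ValueError(
                f"queue topic {self.topic!r} does not match message topic {msg.topic!r}")
        self.messages.append(msg)


queue = MessageQueue(topic="transaction")
queue.append(Message("transaction", b"order-1", {"source": "app"}))
```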
Fig. 2 illustrates a method for obtaining a message from message queue middleware, the method being performed by a process in a computing platform for processing the message, according to an embodiment of the specification, and comprising:
at step S21, obtaining current allocation information, the current allocation information being related to a first number of first message queues currently allocated to the process, wherein the first message queues are message queues in the message queue middleware;
at step S22, determining whether the current allocation information changes from pre-stored initial allocation information associated with a second number of second message queues allocated to the process at the time of start-up of the process, wherein the second message queues are message queues in the message queue middleware; and
in step S23, in case that the current allocation information changes compared to the initial allocation information, restarting is performed to establish a connection with the first number of first message queues, so as to obtain messages from the first number of first message queues.
First, at step S21, current allocation information is obtained, the current allocation information relating to a first number of first message queues currently allocated to the process, wherein the first message queues are message queues in the message queue middleware.
As described above, the message queue middleware is, for example, Antq middleware, and the computing platform is, for example, a Kepler computing platform. The Kepler computing platform may process, for example, a real-time computing task, which in the case of big data processing may be a distributed computing task, i.e., a task processed on a plurality of distributed server hosts. The application instance running on each host to perform the computing task is one process of the computing task. The process performs several operations, such as obtaining messages from Antq, processing the data of the messages, and storing the message consumption progress. The process corresponds to a consumer in the system formed by the message queue middleware and the computing platform.
The real-time computing task subscribes to one or more message topics in Antq. Each message topic has a plurality of queues, and the number of queues is determined according to the number of messages. A queue is equivalent to a container in which the messages obtained successively from message producers are arranged in order.
When a real-time computing task is started, that is, when each process of the real-time computing task is started, each process performs the following processing: obtaining all topics subscribed to by the real-time computing task and all message queues under those topics, and obtaining the total number of processes of the real-time computing task; allocating all the message queues among all the processes according to a specific algorithm, such as a polling (round-robin) algorithm, so as to obtain information on the message queues allocated to the process, such as the total number of those message queues (i.e., the second number) and, for each of them, the topic, the identifier of the server where it is located, and the queue identifier; and establishing connections with the message queues allocated to the process to obtain messages from them, and recording the information as the initial allocation information of the process.
In Antq, the message queues under the topics to which the computing task subscribes may change according to how messages are being written. For example, the number of message queues is increased when messages are written too quickly, and is decreased when the write rate slows down. The process may periodically (e.g., every minute) obtain from Antq information about all current message queues corresponding to the computing task, and derive its own current allocation information according to the polling algorithm, for example, the total number of message queues currently allocated to the process (i.e., the first number) and, for each of them, the topic, the identifier of the server where it is located, and the queue identifier.
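The startup processing described above can be sketched as follows. This is only one possible reading, under stated assumptions: each queue is represented as a hypothetical `(topic, server_id, queue_id)` tuple, each process knows its own zero-based index among the processes of the task, and all processes sort the queue list identically so that they arrive at consistent results. The function name `allocate_round_robin` and the sample data are invented for the example.

```python
from typing import List, Tuple

Queue = Tuple[str, str, int]  # (topic, server_id, queue_id); layout is assumed


def allocate_round_robin(queues: List[Queue],
                         process_index: int,
                         process_count: int) -> List[Queue]:
    """Return the queues that the process at `process_index` receives."""
    ordered = sorted(queues)  # every process must use the same ordering
    return [q for i, q in enumerate(ordered)
            if i % process_count == process_index]


# Hypothetical task subscribed to two topics with two queues each.
all_queues = [("trade", "broker-a", 0), ("trade", "broker-a", 1),
              ("click", "broker-b", 0), ("click", "broker-b", 1)]

# Process 0 of 3 records this result as its initial allocation information.
initial_allocation = allocate_round_robin(all_queues, process_index=0, process_count=3)
print(len(initial_allocation), initial_allocation)
```

Because the computation depends only on the queue list, the process index, and the process count, every process can derive its own allocation locally without coordinating with the others.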
For example, in the case shown in Fig. 1, there are 4 message queues corresponding to a computing task and 3 corresponding processes (nodes 12) on the computing platform. In this case, by polling allocation, the first node 12 is allocated 2 message queues, and the second and third nodes 12 are each allocated 1 message queue. In another example, if the total number of corresponding message queues is 2 and the number of corresponding processes is 3, then according to the polling algorithm the first and second processes are each allocated one message queue, and the third process is allocated none. That is, the largest and the smallest numbers of message queues allocated to any process differ by at most 1, which ensures that the message queues are allocated to the processes in a balanced manner. In addition, according to the polling algorithm, the identifiers of the message queues allocated to a process can be uniquely determined from the process identifier (node identifier). In other words, the first number (the number of message queues currently allocated to the process) and the second number (the number of message queues allocated to the process at startup) are determined by the specific allocation algorithm; with the polling algorithm, for example, each of them may be zero, one, or more, depending on the number of message queues and the number of nodes.
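The two numerical examples above can be checked with a short sketch. The function name `round_robin_counts` is an assumption made for illustration; it simply deals queue positions to processes in turn and counts how many each process receives.

```python
from typing import List


def round_robin_counts(queue_count: int, process_count: int) -> List[int]:
    """Number of queues each process receives under round-robin allocation."""
    counts = [0] * process_count
    for i in range(queue_count):
        counts[i % process_count] += 1
    return counts


print(round_robin_counts(4, 3))  # [2, 1, 1]
print(round_robin_counts(2, 3))  # [1, 1, 0]
```

In both cases the difference between the largest and the smallest count is at most 1, which is the balance property stated above.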
At step S22, it is determined whether the current allocation information has changed from pre-stored initial allocation information associated with a second number of second message queues allocated to the process at startup of the process, wherein the second message queues are message queues in the message queue middleware.
As described in the above example, once the initial allocation information has been obtained at process startup and the current allocation information has been obtained when the method is executed, the two may be compared to determine whether the current allocation information has changed compared to the initial allocation information. For example, whether the current allocation information has changed may be determined by comparing the total number of message queues in the current allocation information with the total number of message queues in the initial allocation information. As another example, whether the current allocation information has changed may be determined by comparing, based on the topic of each message queue, the identifier of the server where it is located, the queue identifier, and the like, whether the message queues in the current allocation information are the same message queues as those in the initial allocation information.
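The two comparison strategies mentioned above can be contrasted in a small sketch. The function names and the tuple layout `(topic, server_id, queue_id)` are assumptions for illustration. The sample data is chosen to show one observation that follows from the definitions: comparing only the totals does not detect a case in which one queue is replaced by another while the total stays the same, whereas comparing queue identities does.

```python
from typing import FrozenSet, Tuple

Queue = Tuple[str, str, int]  # (topic, server_id, queue_id); layout is assumed


def count_changed(initial: FrozenSet[Queue], current: FrozenSet[Queue]) -> bool:
    """Compare only the total numbers of allocated queues."""
    return len(initial) != len(current)


def identity_changed(initial: FrozenSet[Queue], current: FrozenSet[Queue]) -> bool:
    """Compare the queues themselves by topic, server and queue identifier."""
    return initial != current


initial = frozenset({("trade", "broker-a", 0), ("trade", "broker-a", 1)})
replaced = frozenset({("trade", "broker-a", 0), ("trade", "broker-b", 2)})

print(count_changed(initial, replaced))     # False
print(identity_changed(initial, replaced))  # True
```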
In step S23, in case that the current allocation information changes compared to the initial allocation information, restarting is performed to establish a connection with the first number of first message queues, so as to obtain messages from the first number of first message queues.
When the above determination shows that the current allocation information of the process has changed, the process is still connected to the message queues recorded in the initial allocation information. In the Kepler platform, a process connects with the message queues allocated to it at startup. Therefore, by restarting only that specific process, the process can establish connections with the message queues currently allocated to it and obtain messages from the reconnected message queues. Thus, with this method, changes to the message queues in the Antq middleware can be monitored in near real time (e.g., every minute as described above), and only the processes (nodes) whose queue allocation has changed are restarted to establish connections with the message queues currently allocated to them, so that the computing task as a whole is not interrupted and the impact is minimized.
In the description of the above method, the Kepler computing platform is used as an example of the computing platform. However, the computing platform is not limited to the Kepler computing platform, and may be another big data computing platform such as JStorm, Storm, or Spark Streaming, or may be a smaller computing platform, such as a single server host, according to business requirements.
In the description of the above method, the Antq middleware is used as an example of the message queue middleware. However, the message queue middleware is not limited to the Antq middleware, and may be, for example, other message queue middleware such as ActiveMQ, RabbitMQ, Kafka, or RocketMQ.
In the description of the above method, real-time computing tasks are used as an example of the tasks processed in the computing platform. However, the tasks are not limited to such tasks, and may be various types of tasks, such as asynchronous processing, application decoupling, traffic peak shaving, and log processing.
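Steps S21 to S23 can be summarized in the following sketch. It is not the implementation of any particular platform: the function `check_and_restart` and its two callbacks are hypothetical, with `fetch_current` standing in for whatever computes the current allocation information and `restart` standing in for whatever restarts the process.

```python
from typing import Callable, FrozenSet, List, Tuple

Queue = Tuple[str, str, int]  # (topic, server_id, queue_id); layout is assumed


def check_and_restart(initial: FrozenSet[Queue],
                      fetch_current: Callable[[], FrozenSet[Queue]],
                      restart: Callable[[FrozenSet[Queue]], None]) -> bool:
    """Run one monitoring pass; return True if a restart was triggered."""
    current = fetch_current()   # step S21: obtain current allocation information
    if current == initial:      # step S22: compare with initial allocation information
        return False            # unchanged: this process keeps running
    restart(current)            # step S23: restart to connect to the current queues
    return True


# Usage with stand-in callbacks that only record what would happen.
initial = frozenset({("trade", "broker-a", 0)})
changed = frozenset({("trade", "broker-a", 0), ("trade", "broker-a", 1)})
restarted: List[FrozenSet[Queue]] = []

print(check_and_restart(initial, lambda: initial, restarted.append))  # False
print(check_and_restart(initial, lambda: changed, restarted.append))  # True
```

How the `restart` callback is realized (for example, by re-executing the process or by asking a supervisor to do so) is outside the scope of this sketch and is not specified by the source.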
In the description of the above method, the message queue to which each process is assigned is calculated using a polling algorithm, however, the calculation of the message queue assignment is not limited to the polling algorithm, and may be any other algorithm that can achieve equal assignment, such as an equal assignment algorithm, a benchmarking algorithm, and the like.
In one embodiment, the computing platform includes only one server host, that is, for one task in the computing platform, the computing platform includes only one corresponding process. In this case, the process is configured to obtain messages from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the current allocation information is related to the current at least one message queue. That is, in the case of the real-time computing task described above, the process is connected to all message queues corresponding to all topics to which the real-time computing task subscribes, where the specific topic is the at least one topic subscribed to by the real-time computing task. Thus, the current allocation information of the process is information about all of these message queues, such as the total number of the message queues and, for each of them, the topic, the identifier of the server where it is located, and the queue identifier.
In one embodiment, the computing platform performs the computing task through a group of processes, as in the method described above. That is, the group of processes is connected to all message queues under all topics to which the computing task subscribes. Unlike the method described above, in this embodiment each process performs the following: obtaining first information from the message queue middleware, wherein the first information is related to the current at least one message queue; determining whether the first information has changed compared to pre-stored second information, wherein the second information is related to the at least one message queue at the time the process was started; and obtaining the current allocation information based on the first information in a case where the first information has changed compared to the second information. In a case where the first information has not changed compared to the second information, the method ends. In other words, compared with the above description, a comparison between the current full set of message queues of the task (i.e., the first information) and the initial full set of message queues (i.e., the second information) is added, and the subsequent comparison between the current allocation information and the initial allocation information of the process is performed only when the first information has changed compared to the second information. This saves computing resources and improves efficiency.
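The two-stage check of this embodiment might look like the following sketch, in which the allocation is recomputed only when the full set of queues has changed. The function names, the `(topic, server_id, queue_id)` tuple layout, and the use of a round-robin allocation over a sorted queue list are assumptions for illustration.

```python
from typing import List, Tuple

Queue = Tuple[str, str, int]  # (topic, server_id, queue_id); layout is assumed


def allocate(queues: List[Queue], index: int, count: int) -> List[Queue]:
    """Round-robin allocation over an identically sorted queue list."""
    return [q for i, q in enumerate(sorted(queues)) if i % count == index]


def needs_restart(initial_all: List[Queue], current_all: List[Queue],
                  index: int, count: int) -> bool:
    # Stage 1: compare the first information with the second information.
    if set(current_all) == set(initial_all):
        return False  # nothing changed for the task; skip the allocation step
    # Stage 2: compare current and initial allocation information of this process.
    return allocate(current_all, index, count) != allocate(initial_all, index, count)


# Hypothetical task: 4 queues at startup, a fifth queue is added later.
initial_all = [("trade", "broker-a", i) for i in range(4)]
current_all = initial_all + [("trade", "broker-a", 4)]

print([needs_restart(initial_all, current_all, p, 3) for p in range(3)])
# [False, True, False]
```

In this example only the second of the three processes receives the new queue, so only that process would restart; the other two pass the first comparison but find their own allocation unchanged in the second.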
In one embodiment, after the process is restarted, for a third message queue connected to the process before and after the restart, processing progress information of the third message queue is obtained in a storage unit, and a plurality of messages to be obtained from the third message queue are determined according to the processing progress information. Wherein the processing progress information is acquired from a processing schedule in a storage unit, the processing schedule including values of: a topic, a server identification, a message queue identification, and a message identification. In addition, after determining a plurality of messages to be acquired from the third message queue, the processing schedule is updated in the storage unit when the acquired plurality of messages are processed.
The storage unit may be a third-party storage unit independent of the computing platform and the message queue middleware, so that the processing progress information is stable and reliable. For example, in the example of the Antq middleware and the Kepler platform described above, a process in the computing platform pulls messages from the Antq middleware in pull mode. Each time before pulling messages from a specific message queue, the process reads the processing schedule in the storage unit, locates the entry of the specific message queue in the table by the topic, the server identifier, and the message queue identifier, and reads the message identifier recorded for the specific message queue. The message identifier is an offset value in the message queue, i.e., the processing progress. The process then pulls messages from the specific message queue according to the processing progress. For example, if the message identifier recorded in the processing schedule for the specific message queue indicates the 100th message, then when pulling messages from the specific message queue the process pulls, for example, the 10 messages from the 101st to the 110th. After processing these 10 messages, the process updates the processing schedule, i.e., writes the message identifier 110 into the entry of the processing schedule corresponding to the specific message queue.
When a process is restarted, for a third message queue that is connected to the process both before and after the restart, the processing progress (offset value) of the message queue is already recorded in the storage unit, because the third message queue was connected to the process before the restart. Therefore, the process can obtain the processing progress by reading the processing schedule in the storage unit, and obtain messages from the third message queue according to the processing progress, thereby avoiding missed messages and repeated processing of messages. For a fourth message queue that is connected to the process only after the restart, i.e., a message queue newly allocated to the process, the process pulls messages starting from the first message of the fourth message queue.
Fig. 3 shows a timing diagram of a message acquisition method according to an embodiment of the present specification. As shown in Fig. 3, in a case where a computing task in a computing platform includes multiple processes, one of the processes first obtains, from the message queue middleware (Antq is used as an example in the figure), the total message queue information currently corresponding to the computing task. As described above, the message queues covered by the total message queue information may correspond to one or more topics subscribed to by the computing task, and the total message queue information includes the total number of the corresponding message queues, the identifiers of the servers where they are located, the queue identifiers, and the like. Then, the process compares the current total message queue information with the total message queue information stored when the process was started (comparison 1) to determine whether the current total message queue information has changed. In a case where the current total message queue information has changed, the information on the message queues currently allocated to the process is obtained through a predetermined algorithm, for example, a polling algorithm, an average allocation algorithm, or the like. Thereafter, the process compares its current allocation information with the allocation information stored when the process was started (comparison 2) to determine whether the current allocation information of the process has changed. When the current allocation information of the process has changed, the process is restarted to establish connections with the message queues now allocated to it.
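The numerical example above (progress 100, pull messages 101 to 110, then record 110) can be traced with a minimal sketch. An in-memory dictionary stands in for the storage unit, and messages are represented only by their positions in the queue; the names `progress_table`, `next_batch`, and `commit` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

QueueKey = Tuple[str, str, int]  # (topic, server_id, queue_id); layout is assumed

# Stand-in for the processing schedule held in the storage unit.
progress_table: Dict[QueueKey, int] = {("trade", "broker-a", 0): 100}


def next_batch(table: Dict[QueueKey, int], key: QueueKey, batch_size: int) -> List[int]:
    """Positions of the next messages to pull, based on the recorded progress."""
    last_done = table.get(key, 0)  # 0 means nothing has been processed yet
    return list(range(last_done + 1, last_done + batch_size + 1))


def commit(table: Dict[QueueKey, int], key: QueueKey, batch: List[int]) -> None:
    """Record the last processed position once the whole batch is done."""
    table[key] = batch[-1]


key = ("trade", "broker-a", 0)
batch = next_batch(progress_table, key, 10)  # positions 101 through 110
# ... the messages of the batch would be processed here ...
commit(progress_table, key, batch)
print(batch[0], batch[-1], progress_table[key])  # 101 110 110
```

Writing the progress only after the batch has been processed means that a restart in the middle of a batch leads to the batch being pulled again rather than skipped.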
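The distinction between a third message queue (connected before and after the restart) and a fourth message queue (connected only after the restart) can be illustrated as follows. The function name `start_positions` and the sample data are hypothetical, and a dictionary again stands in for the storage unit.

```python
from typing import Dict, List, Tuple

QueueKey = Tuple[str, str, int]  # (topic, server_id, queue_id); layout is assumed


def start_positions(allocated: List[QueueKey],
                    progress: Dict[QueueKey, int]) -> Dict[QueueKey, int]:
    """Position of the first message to pull from each queue after a restart."""
    positions = {}
    for queue in allocated:
        if queue in progress:
            # Connected before the restart: continue after the recorded progress.
            positions[queue] = progress[queue] + 1
        else:
            # Newly allocated: start from the first message of the queue.
            positions[queue] = 1
    return positions


third = ("trade", "broker-a", 0)    # also connected before the restart
fourth = ("trade", "broker-a", 4)   # connected only after the restart
recorded = {third: 110}

print(start_positions([third, fourth], recorded))
```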
After the restart, the process reads the processing progress (message offset) of each message queue to which it is connected from the storage unit to acquire the position where the message will be read from the message queue. For a message queue that is not recorded in the storage unit, i.e., a message queue that is not connected to the process before restart, the process reads a message starting with the first message in the message queue. Then, the process obtains a batch of messages from the message queue middleware according to the processing progress and processes the batch of messages. After processing the batch of messages, the process writes the processing progress into the storage unit.
The timing diagram shown in fig. 3 is merely an example, and the method according to the embodiments of the present specification is not limited to this sequence. For example, in the case where the computing task includes only one process, the current total message queue information obtained by the process from the message queue middleware is the current allocation information of the process, so that the first two steps in the figure can be omitted. In addition, in the case where the calculation task includes a plurality of processes, the step of comparison 1 in the figure may be omitted, and comparison 2 may be performed directly.
The procedure shown in the timing diagram of fig. 3 may be executed periodically, for example, once per minute, to ensure that message queue changes in the message queue middleware are monitored in near real time and that the connections to the message queues are updated in a timely manner.
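A periodic check of this kind might be sketched as below. The function name `monitor` and its parameters are hypothetical; `check_changed` stands for the comparison of the current and initial allocation information, and `restart` stands for whatever mechanism restarts the process. The `max_checks` and `sleep` parameters exist only to make the sketch testable.

```python
import time
from typing import Callable, Optional


def monitor(check_changed: Callable[[], bool], restart: Callable[[], None],
            interval_seconds: float = 60.0, max_checks: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Periodically check for an allocation change; restart once on the first change.

    Returns the number of checks performed.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check_changed():
            # After the restart, the new process instance would begin monitoring afresh.
            restart()
            break
        sleep(interval_seconds)
    return checks
```

The default interval of 60 seconds corresponds to the once-per-minute example; a shorter interval detects changes sooner at the cost of more requests to the message queue middleware.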
Fig. 4 illustrates an apparatus 400 for retrieving a message from message queue middleware, implemented by a process in a computing platform for processing the message, according to an embodiment of the specification, comprising:
a first obtaining unit 41, configured to obtain current allocation information, where the current allocation information is related to a first number of first message queues currently allocated to the process, where the first message queues are message queues in the message queue middleware;
a determining unit 42, configured to determine whether the current allocation information changes compared to pre-stored initial allocation information, where the initial allocation information is related to a second number of second message queues allocated to the process when the process is started, where the second message queues are message queues in the message queue middleware; and
a restarting unit 43, configured to restart to establish a connection with the first number of first message queues to obtain messages from the first number of first message queues when the current allocation information changes compared to the initial allocation information.
In one embodiment, in the apparatus for retrieving a message from message queue middleware, the process is one process in a group of processes, a message queue in the message queue middleware has a corresponding topic, the group of processes is configured to retrieve messages from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the first obtaining unit 41 further includes: a first obtaining subunit 411, configured to obtain first information from the message queue middleware, where the first information is related to the current at least one message queue; and a second obtaining subunit 412, configured to obtain the current allocation information based on the first information.
In one embodiment, in the apparatus for retrieving a message from message queue middleware, the process is one process in a group of processes, a message queue in the message queue middleware has a corresponding topic, the group of processes is configured to retrieve messages from at least one message queue in the message queue middleware corresponding to at least one specific topic, and the first obtaining unit 41 further includes: a first obtaining subunit 411, configured to obtain first information from the message queue middleware, where the first information is related to the current at least one message queue; a determining subunit 413, configured to determine whether the first information has changed compared to second information, where the second information is related to the at least one message queue at the time the process was started; and a second obtaining subunit 412, configured to, in a case where the first information has changed compared to the second information, obtain the current allocation information based on the first information.
In one embodiment, the apparatus 400 for retrieving a message from message queue middleware further includes: a second obtaining unit 44, configured to, after the restart is performed, obtain from a storage unit processing progress information of a third message queue, where the third message queue is a message queue connected to the process both before and after the restart; and a determining unit 45, configured to determine, according to the processing progress information, a plurality of messages to be acquired from the third message queue.
In one embodiment, the apparatus 400 for retrieving a message from message queue middleware further includes an updating unit 46, configured to, after the plurality of messages to be acquired from the third message queue are determined, update the processing progress information in the storage unit once the acquired plurality of messages have been processed.
With the message acquisition scheme according to the embodiments of the present specification, message queue changes in the message queue middleware can be monitored in near real time, and only the processes (nodes) whose queue allocation has changed are restarted, so that the computing task as a whole is not interrupted and the impact of the change is minimized. In addition, the processing progress information of the message queues is stored in an independent storage unit, which makes the processing progress information more stable and reliable.
It will be further appreciated by those of ordinary skill in the art that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether these functions are performed in hardware or software depends on the particular application of the solution and its design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments illustrate the objects, technical solutions and advantages of the present invention in further detail. It should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention. Any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.