CN106354722B

CN106354722B - Message processing method and device for streaming computing system

Info

Publication number: CN106354722B
Application number: CN201510413095.4A
Authority: CN
Inventors: 陈昱; 刘键; 封仲淹
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2015-07-15
Filing date: 2015-07-15
Publication date: 2019-12-24
Anticipated expiration: 2035-07-15
Also published as: CN106354722A

Abstract

The invention provides a message processing method and device for a streaming computing system. The method comprises the following steps: S1, receiving a message; S2, determining whether a first storage unit already stores the message, and if so, not processing the message, otherwise going to S3; S3, determining whether a second storage unit already holds a service processing result corresponding to the message, and if so, executing S6, otherwise executing S4; S4, performing service processing on the message; S5, writing the message and its service processing result into the second storage unit; and S6, writing the message into the first storage unit. The invention ensures that each message in a stream computing system undergoes service processing only once, thereby improving system performance and efficiency.

Description

Message processing method and device for streaming computing system

[ technical field ]

The present invention relates to the field of computer application technologies, and in particular, to a method and an apparatus for processing a message in a streaming computing system.

[ background of the invention ]

JStorm is a real-time stream computing framework modeled on Storm. Thanks to continuous improvements in network I/O, the threading model, resource scheduling, availability, and stability, it has been adopted by more and more enterprises. Message processing in the stream computing field currently guarantees at-least-once or at-most-once semantics, and a simple, efficient way to ensure that each message is processed exactly once is urgently needed.

JStorm already provides an Acker mechanism that implements at-least-once message processing. The core problem to be solved is therefore deduplication, i.e., not processing messages that have already been processed.

[ summary of the invention ]

In view of the above, the present invention provides a message processing method and apparatus for a streaming computing system, so as to reduce repeated processing of messages.

The specific technical scheme is as follows:

the invention provides a message processing method for a streaming computing system, which comprises the following steps:

S1, receiving a message;

S2, determining whether the first storage unit already stores the message; if so, not processing the message; otherwise, going to S3;

S3, determining whether the second storage unit already contains a service processing result corresponding to the message; if so, executing S6; otherwise, executing S4;

S4, performing service processing on the message;

S5, writing the message and its service processing result into the second storage unit; and

S6, writing the message into the first storage unit.
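The S1 to S6 flow can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: both storage units are in-memory stand-ins (a set for the first storage unit, a dict mapping each message to its result for the second), the names `MessageProcessor`, `on_message`, and the `crash_before_s6` fault-injection flag are hypothetical, and the service processing is an arbitrary callable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set


@dataclass
class MessageProcessor:
    """Illustrative sketch of S1-S6; both storage units are in-memory stand-ins."""
    process: Callable[[str], Any]                               # hypothetical service logic
    first_store: Set[str] = field(default_factory=set)          # first (deduplication) storage unit
    second_store: Dict[str, Any] = field(default_factory=dict)  # second unit: message -> result

    def on_message(self, msg: str, crash_before_s6: bool = False) -> str:  # S1
        if msg in self.first_store:                 # S2: already handled, do nothing
            return "skipped"
        status = "recovered"                        # S3 hit: result exists, go straight to S6
        if msg not in self.second_store:            # S3 miss
            result = self.process(msg)              # S4: service processing
            self.second_store[msg] = result         # S5: message stored together with its result
            status = "processed"
        if crash_before_s6:                         # fault injection for the demo only
            raise RuntimeError("simulated crash before S6")
        self.first_store.add(msg)                   # S6
        return status


# Usage: a crash between S5 and S6 does not cause the message to be processed twice.
calls = []
p = MessageProcessor(process=lambda m: calls.append(m) or m.upper())
try:
    p.on_message("m1", crash_before_s6=True)
except RuntimeError:
    pass
print(p.on_message("m1"), p.on_message("m1"), calls)  # recovered skipped ['m1']
```

The usage lines reproduce the failure the method targets: the first delivery stops after S5, so on redelivery S3 finds the stored result and only S6 runs.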

According to a preferred embodiment of the invention, the contents of the first storage unit are mapped into a Bloom filter (BloomFilter);

before S2, the method further comprises: determining whether the message hits the BloomFilter; if not, executing S3 directly; if so, executing S2.
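A minimal sketch of the BloomFilter pre-check, assuming a hand-rolled filter (k bit positions derived from SHA-256 over an m-bit bitmap held in a Python int) rather than any particular library; the class and function names are illustrative. A miss proves the message is absent from the first storage unit, so the S2 lookup can be skipped; a hit may be a false positive and must still be confirmed by S2.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-bit bitmap held in a Python int."""

    def __init__(self, m_bits: int = 1024, k: int = 3) -> None:
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def in_first_store(msg: str, bloom: BloomFilter, first_store: Set[str]) -> bool:
    if not bloom.might_contain(msg):
        return False              # Bloom miss: skip the S2 lookup and go to S3
    return msg in first_store     # Bloom hit: confirm with the real S2 lookup


def write_first_store(msg: str, bloom: BloomFilter, first_store: Set[str]) -> None:
    first_store.add(msg)          # S6, keeping the filter in sync with the store
    bloom.add(msg)
```

Because the filter never yields false negatives, skipping S2 on a miss is always safe; the cost of a false positive is only one extra store lookup.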

According to a preferred embodiment of the present invention, in S4, if the second storage unit already contains a message list with the same computation identifier as the message, service processing is performed on the message using the service processing result corresponding to that message list;

in S5, the message is added to the message list, and the service processing result corresponding to the message list in the second storage unit is updated with the service processing result obtained in S4; and

in S6, the message list containing the message is written into the first storage unit.

According to a preferred embodiment of the present invention, the method further comprises, between S4 and S5: placing the message and its service processing result into a write queue;

and S5 and S6 are performed using the contents of the write queue.

According to a preferred embodiment of the present invention, performing S5 and S6 using the contents of the write queue comprises:

S51, taking a cache node out of the write queue, the cache node containing a message and its service processing result;

S52, writing the contents of the cache node into the second storage unit;

S53, placing the message from the cache node into a deduplication storage write queue; and

S54, taking the message out of the deduplication storage write queue and writing it into the first storage unit.
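Steps S51 to S54 can be sketched as two queue-draining functions. This is an illustrative, single-threaded sketch: `CacheNode`, the function names, and the use of `collections.deque` with in-memory stores are assumptions, and in a real deployment the two drains would run asynchronously.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, Set


@dataclass
class CacheNode:
    message: str   # the message (or message list) being written
    result: Any    # its service processing result


def drain_write_queue(write_q: Deque[CacheNode], dedup_q: Deque[str],
                      second_store: Dict[str, Any]) -> None:
    while write_q:
        node = write_q.popleft()                   # S51: take out a cache node
        second_store[node.message] = node.result   # S52: write it to the second storage unit
        dedup_q.append(node.message)               # S53: hand the message to the dedup queue


def drain_dedup_queue(dedup_q: Deque[str], first_store: Set[str]) -> None:
    while dedup_q:
        first_store.add(dedup_q.popleft())         # S54: write to the first storage unit
```

Between the two drains, the second storage unit already holds a result for each message that the first storage unit does not yet record; this is exactly the window that the S3 check covers if the process fails before S54.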

According to a preferred embodiment of the present invention, the method further comprises, after S2 and before S3:

determining whether the message already exists in the write queue or the deduplication storage write queue; if so, not performing service processing on the message; otherwise, continuing with S3.

According to a preferred embodiment of the present invention, the method further comprises, between S51 and S52:

determining whether the deduplication storage write queue already contains a message with the same computation identifier as the message in the cache node; if so, returning the cache node to the write queue to wait for S51 to be executed again; otherwise, continuing with S52.
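The two additional checks (the in-flight check between S2 and S3, and the same-identifier re-queue between S51 and S52) could look like the sketch below. Assumptions: the deduplication storage write queue holds whole cache nodes so that their computation identifier is available, the second storage unit is keyed by computation identifier, and all names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict


@dataclass
class CacheNode:
    message: str
    calc_key: str  # computation identifier, e.g. a user ID (illustrative)
    result: Any


def in_flight(message: str, write_q: Deque[CacheNode], dedup_q: Deque[CacheNode]) -> bool:
    """Check between S2 and S3: is the message still waiting in either queue?"""
    return any(node.message == message for node in (*write_q, *dedup_q))


def write_step(write_q: Deque[CacheNode], dedup_q: Deque[CacheNode],
               second_store: Dict[str, Any]) -> bool:
    """One pass of S51-S53; returns False if the node had to be re-queued."""
    node = write_q.popleft()                                   # S51
    if any(pending.calc_key == node.calc_key for pending in dedup_q):
        write_q.append(node)  # same identifier still awaiting S54: retry later
        return False
    second_store[node.calc_key] = node.result                  # S52
    dedup_q.append(node)                                       # S53
    return True
```

Re-queuing rather than blocking keeps the writer free to handle nodes with other identifiers while the earlier write for this identifier completes.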

The invention also provides a message processing device of the streaming computing system, which comprises:

a receiving unit for receiving a message;

a first judging unit, configured to determine whether the first storage unit stores the message, and if so, not to process the message;

a second judging unit, configured to determine, when the result of the first judging unit is negative, whether a service processing result corresponding to the message already exists in the second storage unit;

a processing unit, configured not to perform service processing on the message when the result of the second judging unit is positive, and to perform service processing on the message when the result of the second judging unit is negative;

a first writing unit, configured to write the service processing result of the message into the second storage unit; and

a second writing unit, configured to write the message into the first storage unit after the first writing unit finishes writing, or when the result of the second judging unit is positive.

According to a preferred embodiment of the invention, the contents of the first storage unit are mapped into a BloomFilter;

the apparatus further comprises: a third judging unit, configured to determine, after the receiving unit receives the message, whether the message hits the BloomFilter; if not, to trigger the second judging unit to perform its judgment; and if so, to trigger the first judging unit to perform its judgment.

According to a preferred embodiment of the present invention, the processing unit is specifically configured to, if the second storage unit already contains a message list with the same computation identifier as the message, perform service processing on the message using the service processing result corresponding to that message list;

the first writing unit is specifically configured to add the message to the message list and to update the service processing result corresponding to the message list in the second storage unit with the service processing result produced by the processing unit this time; and

the second writing unit is specifically configured to write the message list containing the message into the first storage unit.

According to a preferred embodiment of the present invention, the processing unit is further configured to place the message and the corresponding service processing result into a write queue;

the first writing unit and the second writing unit perform their write operations using the contents of the write queue.

According to a preferred embodiment of the present invention, the first write unit is specifically configured to take out a cache node from the write queue, where the cache node includes a message and a service processing result corresponding to the message, write the content in the cache node into the second storage unit, and place the message in the cache node into a deduplication storage write queue;

the second write unit is specifically configured to fetch a message from the deduplication storage write queue and write the message into the first storage unit.

According to a preferred embodiment of the invention, the apparatus further comprises:

a fourth judging unit, configured to determine, when the result of the first judging unit is negative, whether the message already exists in the write queue or the deduplication storage write queue; if so, not to perform service processing on the message; otherwise, to trigger the second judging unit to perform its judgment.

According to a preferred embodiment of the present invention, before writing the contents of the cache node into the second storage unit, the first writing unit is further configured to determine whether the deduplication storage write queue already contains a message with the same computation identifier as the message in the cache node; if so, to put the cache node back into the write queue to wait to be fetched again; otherwise, to proceed with writing the contents of the cache node into the second storage unit.

According to the above technical solution, each message is written into the second storage unit together with its processing result. Even if writing the message into the first storage unit fails, when the same message is received again its service processing result already exists in the second storage unit, so the message does not need to be processed again and only needs to be written into the first storage unit. This guarantees that each message undergoes service processing only once, improving system performance and efficiency.

[ description of the drawings ]

FIG. 1 is a processing flowchart according to an embodiment of the present invention;

FIG. 2 is a flowchart of a message processing method according to an embodiment of the present invention;

FIG. 3 is a flowchart of writing to the computation storage unit and the deduplication storage unit according to an embodiment of the present invention;

FIG. 4 is a block diagram of a message processing apparatus according to an embodiment of the present invention.

[ detailed description of the embodiments ]

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

The present invention uses two storage units: a computation storage unit (the second storage unit) and a deduplication storage unit (the first storage unit). The computation storage unit stores the service processing results of messages. The deduplication storage unit stores messages that have already undergone service processing and is used for deduplication, so that no message undergoes service processing more than once.

When a message is received, the processing flow may be as shown in FIG. 1 and includes the following steps:

In 101, determine whether the received message is stored in the deduplication storage unit; if so, do not process the message; otherwise, execute 102.

In 102, perform service processing on the message.

The service processing performed on a message in the embodiments of the present invention is determined by specific business requirements and may use historical processing results. If a historical processing result is needed, it is first obtained from the computation storage unit during service processing, and the message is then processed using that result. Under some business requirements, the service processing of a message is a calculation over the message. For example, suppose message 2 counts the total order quantity of class-A users, and an earlier message 1 also counted the total order quantity of class-A users. When message 2 is processed, the calculation result of message 1 can be obtained from the computation storage unit and added to the class-A order quantity contained in message 2 to obtain the calculation result of message 2.
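The order-count example can be sketched as follows, assuming the computation storage unit is a dict keyed by user class and each message carries an order quantity; the function name and data layout are illustrative.

```python
from typing import Dict


def process_order_message(user_class: str, order_qty: int,
                          compute_store: Dict[str, int]) -> int:
    """Illustrative 102-103: add this message's orders to the stored historical total."""
    previous = compute_store.get(user_class, 0)  # historical result, e.g. from message 1
    total = previous + order_qty                 # 102: service processing uses the history
    compute_store[user_class] = total            # 103: write the new result back
    return total


store: Dict[str, int] = {}
process_order_message("A", 5, store)          # message 1
print(process_order_message("A", 7, store))   # message 2 -> 12
```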

In 103, write the service processing result into the computation storage unit.

In 104, write the message into the deduplication storage unit.

When a new message is received, the flow starts again from 101.

The above method reduces repeated processing of messages to some extent. However, if a fault such as a system crash occurs after 103 has been executed and causes 104 to fail, the service processing result has been written into the computation storage unit but the message has not been written into the deduplication storage unit. In this case the message will still undergo service processing again, which affects system performance and efficiency: when the message is received again, it is found not to exist in the deduplication storage unit, so it is processed again, the processing result is written into the computation storage unit again, and the message is written into the deduplication storage unit.
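The failure case can be reproduced with a small fault-injection sketch of the FIG. 1 flow. It assumes the order-count processing from the example above, an upstream that redelivers the message after the crash (as an at-least-once mechanism such as the Acker would), and illustrative names throughout. Because this particular processing accumulates, the repeated processing does not merely waste work: it inflates the count.

```python
from typing import Dict, Set


def fig1_handle(msg: str, user_class: str, qty: int,
                dedup_store: Set[str], compute_store: Dict[str, int],
                crash_before_104: bool = False) -> None:
    if msg in dedup_store:                                   # 101
        return
    total = compute_store.get(user_class, 0) + qty           # 102
    compute_store[user_class] = total                        # 103
    if crash_before_104:                                     # fault injection (demo only)
        raise RuntimeError("simulated crash between 103 and 104")
    dedup_store.add(msg)                                     # 104


dedup, compute = set(), {}
try:
    fig1_handle("m2", "A", 7, dedup, compute, crash_before_104=True)
except RuntimeError:
    pass
fig1_handle("m2", "A", 7, dedup, compute)   # upstream redelivers m2
print(compute["A"])  # 14: m2's 7 orders were counted twice
```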

To address this problem, an embodiment of the present invention further provides a solution whose execution flow may be as shown in FIG. 2. When a message is received, the following steps are performed:

In 201, determine whether the received message is already stored in the deduplication storage unit; if so, do not process it further; otherwise, execute 202.

In 202, determine whether the service processing result corresponding to the message already exists in the computation storage unit; if so, skip service processing and go directly to 205; otherwise, execute 203.

Whether the service processing result corresponding to the message already exists in the computation storage unit can be determined by first checking whether the computation storage unit contains a service processing result and then checking whether it contains the received message. If both hold, the service processing result corresponding to the message already exists, and the received message does not need to be processed again.

In 203, perform service processing on the message.

In 204, write the message, together with its service processing result, into the computation storage unit.

In 205, write the message into the deduplication storage unit.

Several scenarios are exemplified as follows:

If message a is received for the first time, it is found not to exist in the deduplication storage unit, and no corresponding service processing result exists in the computation storage unit. The message therefore undergoes service processing, message a and its service processing result are written together into the computation storage unit, and message a is written into the deduplication storage unit.

Suppose that writing message a into the deduplication storage unit fails.

When message a is received again, it is found not to exist in the deduplication storage unit, but its service processing result already exists in the computation storage unit, so message a is written directly into the deduplication storage unit without undergoing service processing again. This avoids reprocessing message a and rewriting it into the computation storage unit, which improves system performance and efficiency.

In addition, to improve efficiency, the contents of the deduplication storage unit may be mapped into a BloomFilter (Bloom filter). When a message is received, first determine whether it hits the BloomFilter. If it does not, the message does not exist in the deduplication storage unit, and step 202 of the flow in FIG. 2 is executed next. If it does, the message may be stored in the deduplication storage unit, and step 201 of the flow in FIG. 2 is executed next.

In the flow shown in FIG. 2, the service processing of message A may depend on the service processing result of message B. In this case, messages A and B share the same calculation key (computation identifier). For example, if each message accumulates the order quantity of user u, the order quantity of user u contained in a message must be added to the accumulated order quantity of user u in the service processing result of the previous message; all such messages have the same calculation key, which may be, for example, the identifier of user u. For such scenarios, to make deduplication more accurate: in 203, if the computation storage unit already contains a message list with the same calculation key as the received message, the received message is processed using the service processing result corresponding to that message list. In 204, the received message is added to the message list, and the service processing result corresponding to the message list in the computation storage unit is updated with the new service processing result. In 205, the message list containing the received message is written into the deduplication storage unit.

For example:

Message a is received. Neither the deduplication storage unit nor the computation storage unit contains message a, so message a undergoes service processing to obtain service processing result a. Message a and result a are written into the computation storage unit; at this point, the message list corresponding to result a contains only message a. Message a is then written into the deduplication storage unit.

Next, message b is received. Neither the deduplication storage unit nor the computation storage unit contains message b, but the computation storage unit contains a message list with the same calculation key as message b, so message b is processed using service processing result a to obtain service processing result b. Message b is added to the message list in the computation storage unit, which now contains messages a and b, and result a is replaced by result b, so the service processing result corresponding to the list containing messages a and b is result b. Messages a and b are then written into the deduplication storage unit.

Now suppose message a was received but not written into the deduplication storage unit, so the computation storage unit contains message a while the deduplication storage unit does not. If message b is then received, it is processed through the above flow and both messages a and b are written into the deduplication storage unit; that is, receiving message b causes message a to be written into the deduplication storage unit retroactively. Even if message a is received again later, it will not undergo service processing again.
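The message-list variant and the retroactive write of message a can be sketched as below. Assumptions: the computation storage unit maps each calculation key to a `KeyedEntry` holding the message list and an accumulated total, the `dedup_write_fails` flag simulates the lost write of step 205, and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class KeyedEntry:
    messages: List[str] = field(default_factory=list)  # message list sharing one calculation key
    total: int = 0                                      # service processing result for that list


def handle_keyed(msg: str, calc_key: str, qty: int,
                 dedup_store: Set[str], compute_store: Dict[str, KeyedEntry],
                 dedup_write_fails: bool = False) -> None:
    if msg in dedup_store:                          # 201
        return
    entry = compute_store.setdefault(calc_key, KeyedEntry())
    if msg not in entry.messages:                   # 202: no result covers this message yet
        new_total = entry.total + qty               # 203: build on the list's previous result
        entry.messages.append(msg)                  # 204: extend the message list ...
        entry.total = new_total                     #      ... and update its result
    if not dedup_write_fails:                       # fault injection (demo only)
        dedup_store.update(entry.messages)          # 205: write the whole message list


dedup, compute = set(), {}
handle_keyed("a", "user-u", 5, dedup, compute, dedup_write_fails=True)  # a's 205 write is lost
handle_keyed("b", "user-u", 7, dedup, compute)   # writing b's list also writes a
handle_keyed("a", "user-u", 5, dedup, compute)   # redelivered a is skipped at 201
print(compute["user-u"].total, sorted(dedup))    # 12 ['a', 'b']
```

Writing the whole list in 205, rather than only the current message, is what repairs the earlier lost write as a side effect of processing message b.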

After step 203 in FIG. 2, the message and its service processing result may be placed into a write queue. In addition, because a streaming service processing system may process messages with multiple threads, before step 202 is executed it may be determined whether the write queue or the deduplication storage write queue (described below) already contains the message. If so, 202 is not executed and the received message is not processed further; otherwise, step 202 is executed.

The specific write flow, i.e., steps 204 and 205 in FIG. 2, may be as shown in FIG. 3 and includes the following steps:

In 301, take a CacheNode (cache node) out of the write queue. The CacheNode contains a message and its corresponding service processing result; the message may be part of a message list.

In 302, determine whether the deduplication storage write queue already contains a message with the same calculation key as the message in the CacheNode; if so, put the CacheNode back into the write queue and wait to re-execute 301; otherwise, execute 303.

To prevent a message from being overwritten by another message with the same calculation key before it has been stored in the deduplication storage unit, this step first checks whether the deduplication storage write queue already contains a message with the same calculation key as the message in the CacheNode. If it does, a message with that calculation key is currently being written into the deduplication storage unit, and the new message should be written only after that write completes; the CacheNode is therefore put back into the write queue to wait for a retry.

In 303, write the contents of the CacheNode into the computation storage unit.

In 304, put the message in the CacheNode into the deduplication storage write queue.

In 305, take the message out of the deduplication storage write queue and write it into the deduplication storage unit.

Note that the write queue is in effect the master queue and is mainly used for writing to the computation storage unit, while the deduplication storage write queue is used only for writing to the deduplication storage unit. Writing to the computation storage unit and writing to the deduplication storage unit are thus asynchronous, which improves throughput.
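A sketch of this asynchronous arrangement using two threads and the standard-library `queue.Queue`. The thread layout, the `None` shutdown sentinel, and the tuple form of a CacheNode are assumptions made for illustration; the same-key re-queue of step 302 is omitted to keep the example short.

```python
import queue
import threading
from typing import Any, Dict, Iterable, Set, Tuple

_STOP = None  # sentinel that tells a writer thread to exit


def compute_writer(write_q: queue.Queue, dedup_q: queue.Queue,
                   compute_store: Dict[str, Any]) -> None:
    while True:
        node = write_q.get()
        if node is _STOP:
            dedup_q.put(_STOP)          # propagate shutdown downstream
            return
        message, result = node          # 301: a CacheNode as a (message, result) tuple
        compute_store[message] = result # 303
        dedup_q.put(message)            # 304


def dedup_writer(dedup_q: queue.Queue, dedup_store: Set[str]) -> None:
    while True:
        message = dedup_q.get()
        if message is _STOP:
            return
        dedup_store.add(message)        # 305


def run_pipeline(nodes: Iterable[Tuple[str, Any]]):
    write_q: queue.Queue = queue.Queue()
    dedup_q: queue.Queue = queue.Queue()
    compute_store: Dict[str, Any] = {}
    dedup_store: Set[str] = set()
    threads = [
        threading.Thread(target=compute_writer, args=(write_q, dedup_q, compute_store)),
        threading.Thread(target=dedup_writer, args=(dedup_q, dedup_store)),
    ]
    for t in threads:
        t.start()
    for node in nodes:
        write_q.put(node)   # the processing stage enqueues and moves on
    write_q.put(_STOP)
    for t in threads:
        t.join()
    return compute_store, dedup_store
```

Each writer blocks only on its own queue, so a slow deduplication store delays the second thread without holding up writes to the computation store.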

FIG. 4 is a structural diagram of an apparatus according to an embodiment of the present invention. The apparatus may be deployed in a device that processes messages in a streaming service processing system. As shown in FIG. 4, the apparatus may include a receiving unit 01, a first judging unit 02, a second judging unit 03, a processing unit 04, a first writing unit 05, and a second writing unit 06, and may further include a third judging unit 07 and a fourth judging unit 08. The main functions of each unit are as follows:

The receiving unit 01 is responsible for receiving messages.

The first judging unit 02 determines whether the message is stored in the deduplication storage unit; if so, the message is not processed further. When the result of the first judging unit 02 is negative, the second judging unit 03 determines whether a service processing result corresponding to the message already exists in the computation storage unit. When the result of the second judging unit 03 is positive, the processing unit 04 does not process the message; when it is negative, the processing unit 04 performs service processing on the message.

The first writing unit 05 writes the service processing result of the message into the computation storage unit. The second writing unit 06 writes the message into the deduplication storage unit after the first writing unit 05 completes its write, or when the result of the second judging unit 03 is positive.

Preferably, to improve efficiency, the contents of the deduplication storage unit may be mapped into a BloomFilter. After the receiving unit 01 receives a message, the third judging unit 07 may first determine whether the message hits the BloomFilter. If it does not, the message does not exist in the deduplication storage unit, and the second judging unit 03 is triggered to perform its judgment; if it does, the message may be stored in the deduplication storage unit, and the first judging unit 02 is triggered to perform its judgment.

For scenarios in which service processing requires a historical service processing result, the following approach is preferably adopted to make deduplication more accurate: if the computation storage unit already contains a message list with the same calculation key as the received message, the processing unit 04 processes the message using the service processing result corresponding to that message list.

In this case, the first writing unit 05 adds the message to the message list and updates the service processing result corresponding to the message list in the computation storage unit with the result produced by the processing unit 04. The second writing unit 06 writes the message list containing the received message into the deduplication storage unit.

In addition, the processing unit 04 may put the message (or the message list containing it) and the corresponding service processing result into the write queue. The first writing unit 05 and the second writing unit 06 perform their writes using the write queue.

Specifically, the first writing unit 05 may take a CacheNode out of the write queue, the CacheNode containing a message (message list) and its corresponding service processing result, write the contents of the CacheNode into the computation storage unit, and put the message in the CacheNode into the deduplication storage write queue. The second writing unit 06 takes the message out of the deduplication storage write queue and writes it into the deduplication storage unit. Writing to the computation storage unit and writing to the deduplication storage unit are thus asynchronous, which improves throughput.

Preferably, when the result of the first judging unit 02 is negative, the fourth judging unit 08 determines whether the message already exists in the write queue or the deduplication storage write queue; if so, the message is not processed; otherwise, the second judging unit 03 is triggered to perform its judgment.

Further, before writing the contents of the CacheNode into the computation storage unit, the first writing unit 05 may first determine whether the deduplication storage write queue already contains a message with the same calculation key as the message in the CacheNode; if so, it puts the CacheNode back into the write queue to wait to be fetched again; otherwise, it proceeds to write the contents of the CacheNode into the computation storage unit.

It should be noted that the method and apparatus provided by the embodiment of the present invention are not limited to JStorm and Storm, and can be applied to any message processing system that adopts a similar mechanism.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A message processing method for a streaming computing system, the method comprising:

S1, receiving a message;

s4, carrying out service processing on the message;

and S6, writing the message into the first storage unit.

2. The method of claim 1, wherein the content in the first storage unit is mapped to a bloom filter, BloomFilter;

3. The method according to claim 1, wherein, when S4 is executed, if a message list having the same computation identifier as the message already exists in the second storage unit, service processing is performed on the message using the service processing result corresponding to the message list having the same computation identifier as the message;

4. The method of claim 1, 2 or 3, further comprising, between the S4 and the S5: putting the message and the corresponding service processing result into a write-in queue;

performing the S5 and the S6 on the contents of the write queue.

5. The method of claim 4, wherein performing the S5 and the S6 on the contents of the write queue comprises:

S52, writing the content in the cache node into the second storage unit;

6. The method of claim 5, further comprising, after the S2 and before the S3:

7. The method of claim 5, further comprising, between the S51 and the S52:

8. A message processing apparatus of a streaming computing system, the apparatus comprising:

a receiving unit for receiving a message;

a second judging unit, configured to judge whether a service processing result corresponding to the message already exists in a second storage unit when the judgment result of the first judging unit is negative;

a first writing unit, configured to write a service processing result of the message into a second storage unit;

9. The apparatus of claim 8, wherein the content in the first storage unit is mapped to a BloomFilter;

10. The apparatus according to claim 8, wherein the processing unit is specifically configured to perform, if a message list having a same computation identifier as the message already exists in the second storage unit, service processing on the message by using a service processing result corresponding to the message list having the same computation identifier as the message;

11. The apparatus according to claim 8, 9 or 10, wherein the processing unit is further configured to place the message and the corresponding service processing result into a write queue;

12. The apparatus according to claim 11, wherein the first write unit is specifically configured to fetch a cache node from the write queue, where the cache node includes a message and a service processing result corresponding to the message, write the content in the cache node into the second storage unit, and place the message in the cache node into a deduplication storage write queue;

13. The apparatus of claim 11, further comprising:

a fourth judging unit, configured to, when the judgment result of the first judging unit is no, judge whether the message already exists in the write queue or the deduplication storage write queue, and if yes, not process the message; otherwise, trigger the second judging unit to execute the judgment operation.

14. The apparatus according to claim 11, wherein the first writing unit, before writing the content in the cache node into the second storage unit, is further configured to determine whether a message having the same computation identifier as the message included in the cache node already exists in a deduplication storage write queue, and if so, place the cache node back into the write queue to wait for re-fetching; otherwise, continuing to execute the operation of writing the content in the cache node into the second storage unit.