CN111338814A

CN111338814A - Message processing method and device, storage medium and electronic device

Info

Publication number: CN111338814A
Application number: CN202010091950.5A
Authority: CN
Inventors: 杨学毅; 李仓良; 祝梦遥
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2020-06-26

Abstract

The application provides a message processing method and device, a storage medium, and an electronic device. The method comprises: acquiring data to be processed that carries a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is the stream processing task in which a target message goes from being produced by a production node to being consumed by a consumption node, and the target message identifier uniquely identifies the target message; and, when it is determined from the data to be processed that the target message has not been normally consumed, adding the target message to a target message queue, wherein the target message queue stores messages corresponding to stream processing tasks to be processed. The method and device solve the problem in the related art that data is easily lost during data distribution, ensure data correctness, and save labor cost.

Description

Message processing method and device, storage medium and electronic device

Technical Field

The present application relates to the field of computers, and in particular, to a message processing method and apparatus, a storage medium, and an electronic apparatus.

Background

Currently, the message services of some companies provide data change messages, and an RPC (Remote Procedure Call) interface is called to obtain the entity data. Because the data volume of a single entity in such a message service is large, the message service can be accessed in a unified manner to save development and resource costs, and Kafka (an open source stream processing platform) is used to distribute real-time data to each business party.

Although Kafka offers high throughput, high concurrency, and low latency, it is prone to data loss. For example, when a stream processing task relies on checkpoints, a Kafka cluster upgrade requires the checkpoint to be deleted manually, or two stream programs to be run simultaneously, or the offset to be stored in an external database. Data loss caused by Kafka cluster upgrades or business logic errors is hard to detect, requires manual intervention, and is costly to maintain.

Therefore, the data distribution method in the related art has a problem that data is easily lost.

Disclosure of Invention

The embodiment of the application provides a message processing method and device, a storage medium and an electronic device, and aims to at least solve the problem that data is easy to lose in a data distribution mode in the related technology.

According to an aspect of an embodiment of the present application, there is provided a message processing method, including: acquiring data to be processed that carries a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is the stream processing task in which a target message goes from being produced by a production node to being consumed by a consumption node, and the target message identifier uniquely identifies the target message; and, when it is determined from the data to be processed that the target message has not been normally consumed, adding the target message to a target message queue, wherein the target message queue stores messages corresponding to stream processing tasks to be processed.

Optionally, the acquiring the to-be-processed data with the target message identifier includes: reading target data in a target time period from a database, wherein the database stores data collected from each node in a node cluster, and the time difference between the starting time and the current time of the target time period is a target difference value; and matching the data to be processed with the target message identification from the target data by using the target message identification.

Optionally, before acquiring the to-be-processed data with the target message identifier, the method further includes: receiving to-be-processed data sent by a plurality of nodes, wherein the to-be-processed data comprises first data sent by a production node and second data sent by other nodes, the first data comprises a target message, a target message identifier and a timestamp, the other nodes are nodes of the plurality of nodes except the production node, and the second data comprises the target message identifier and the timestamp; and storing the data to be processed into a database.

Optionally, after acquiring the to-be-processed data with the target message identifier, the method further includes: determining, according to the to-be-processed data with the target message identifier, whether the target message is normally consumed at the latter node of each pair of adjacent nodes, wherein adjacent nodes are two of the plurality of nodes that are adjacent according to the target stream processing task; and determining that the target message is not normally consumed when it is determined that the target message is not normally consumed at a target node among the plurality of nodes.

Optionally, before adding the target message to the target message queue, the method further includes: sending data to be processed to target equipment so as to display a target message and a target node through the target equipment; receiving a modification instruction sent by target equipment, wherein the modification instruction is used for modifying a task processing flow of a target stream processing task; after the target message is added to the target message queue, the method further comprises: and processing the target message according to the modified task processing flow.

Optionally, in a case that it is determined that the target message has been normally consumed by the consuming node according to the data to be processed, the method further includes: and clearing the data to be processed from a database, wherein the database stores the data collected from each node in the node cluster.

According to another aspect of the embodiments of the present application, there is provided a message processing apparatus, including: an acquisition unit, configured to acquire data to be processed that carries a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is the stream processing task in which a target message goes from being produced by a production node to being consumed by a consumption node, and the target message identifier uniquely identifies the target message; and an adding unit, configured to add the target message to a target message queue when it is determined from the data to be processed that the target message has not been normally consumed, wherein the target message queue stores messages corresponding to stream processing tasks to be processed.

Optionally, the obtaining unit includes: the reading module is used for reading target data in a target time period from a database, wherein the database stores data collected from each node in the node cluster, and the time difference between the starting time and the current time of the target time period is a target difference value; and the matching module is used for matching the data to be processed with the target message identification from the target data by using the target message identification.

Optionally, the apparatus further comprises: the first receiving unit is used for receiving the data to be processed sent by the plurality of nodes before the data to be processed with the target message identification is obtained, wherein the data to be processed comprises first data sent by a production node and second data sent by other nodes, the first data comprises the target message, the target message identification and a timestamp, the other nodes are nodes except the production node in the plurality of nodes, and the second data comprises the target message identification and the timestamp; and the storage unit is used for storing the data to be processed into the database.

Optionally, the apparatus further comprises: a first determining unit, configured to determine, after the data to be processed with the target message identifier is acquired and according to that data, whether the target message is normally consumed at the latter node of each pair of adjacent nodes, wherein adjacent nodes are two of the plurality of nodes that are adjacent according to the target stream processing task; and a second determining unit, configured to determine that the target message is not normally consumed when it is determined that the target message is not normally consumed at a target node among the plurality of nodes.

Optionally, the apparatus further comprises: the sending unit is used for sending the data to be processed to the target equipment before the target message is added into the target message queue so as to display the target message and the target node through the target equipment; the second receiving unit is used for receiving a modification instruction sent by the target equipment, wherein the modification instruction is used for modifying the task processing flow of the target stream processing task; and the processing unit is used for processing the target message according to the modified task processing flow after the target message is added into the target message queue.

Optionally, the apparatus further comprises: and the clearing unit is used for clearing the data to be processed from a database under the condition that the target message is determined to be normally consumed by the consuming node according to the data to be processed, wherein the database stores the data collected from each node in the node cluster.

In the present application, each core service processing stage of a stream processing task is monitored. Data to be processed that carries a target message identifier is acquired, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is the stream processing task in which a target message goes from being produced by a production node to being consumed by a consumption node, and the target message identifier uniquely identifies the target message. When it is determined from the data to be processed that the target message has not been normally consumed, the target message is added to a target message queue, wherein the target message queue stores messages corresponding to stream processing tasks to be processed. Because whether stream data has been lost is judged from the data collected from the nodes corresponding to the stream processing task, and lost data is automatically reissued, data correctness is ensured and labor cost is saved, which solves the problem in the related art that data is easily lost during data distribution.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. Those skilled in the art can obviously obtain other drawings from these drawings without inventive effort.

FIG. 1 is a block diagram of an alternative server hardware configuration according to an embodiment of the present application;

FIG. 2 is a flow chart illustrating an alternative message processing method according to an embodiment of the present application;

FIG. 3 is a diagram of an alternative message processing method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of another alternative message processing method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of yet another alternative message processing method according to an embodiment of the present application;

FIG. 6 is a flow diagram illustrating an alternative message processing method according to an embodiment of the present application; and

FIG. 7 is a block diagram of an alternative message processing apparatus according to an embodiment of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

According to an aspect of an embodiment of the present application, a message processing method is provided. Optionally, the method may be performed in a server or a similar computing device. Taking execution on a server as an example, FIG. 1 is a block diagram of the hardware structure of an optional server according to an embodiment of the present application. As shown in FIG. 1, the server 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as an MCU (Microcontroller Unit) or an FPGA (Field Programmable Gate Array)) and a memory 104 for storing data. Optionally, the server may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the server. For example, the server 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.

The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the message processing method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the server 10. In one example, the transmission device 106 includes a NIC (Network Interface Controller) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be an RF (Radio Frequency) module, which communicates with the internet wirelessly.

In this embodiment, a message processing method operating in the server is provided, and fig. 2 is a schematic flowchart of an optional message processing method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

Step S202: acquire data to be processed that carries a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is the stream processing task in which a target message goes from being produced by a production node to being consumed by a consumption node, and the target message identifier uniquely identifies the target message;

Step S204: when it is determined from the data to be processed that the target message has not been normally consumed, add the target message to a target message queue, wherein the target message queue stores messages corresponding to stream processing tasks to be processed.

Alternatively, the executing body of the above steps may be a server, etc., but is not limited thereto, and other devices capable of performing message processing may be used to execute the method in the embodiment of the present application.

Optionally, the message processing method in the embodiment of the present application may be applied to, but is not limited to, a processing process of a stream processing task, and the stream processing task may be executed by a server node cluster. The server executing the message processing method may be one server node in the server node cluster, or may be another server independent of the server node cluster. This is not particularly limited in this embodiment.

In this embodiment, each core service processing stage of the stream processing task is monitored, whether stream data has been lost is judged from the data collected from each node corresponding to the stream processing task, and lost data is automatically reissued. This solves the problem in the related art that data is easily lost during data distribution, ensures data correctness, and saves labor cost.
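The two steps can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the `Record` type, the `recheck` function, the `path` list of node names, and the use of a `deque` as the target message queue are all assumptions introduced here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional, Tuple


@dataclass
class Record:
    """One row collected from a node (hypothetical layout)."""
    key: str                       # target message identifier
    node: str                      # node that reported the row
    payload: Optional[str] = None  # message body, reported by the production node only


def recheck(key: str,
            collected: Iterable[Record],
            path: List[str],
            queue: Deque[Tuple[str, str]]) -> bool:
    """Return True if every node on the path reported the message; otherwise re-queue it."""
    # Step S202: keep only the rows carrying the target message identifier.
    rows = [r for r in collected if r.key == key]
    reported = {r.node for r in rows}

    # Assumption: a message counts as normally consumed when every node on its path reported.
    if all(node in reported for node in path):
        return True

    # Step S204: reissue the message body recorded by the production node.
    payload = next((r.payload for r in rows if r.payload is not None), None)
    if payload is not None:
        queue.append((key, payload))
    return False


queue: Deque[Tuple[str, str]] = deque()
collected = [Record("M", "A", payload="body-of-M"), Record("M", "B1")]
consumed = recheck("M", collected, ["A", "B1", "B2", "C"], queue)
```

In this sketch only nodes A and B1 reported message M, so `recheck` returns `False` and the message body is appended to the queue for reprocessing.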

The message processing method in the embodiment of the present application is explained below with reference to fig. 2.

In step S202, to-be-processed data with a target message identifier is acquired, where the to-be-processed data is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is the stream processing task in which a target message goes from being produced by a production node to being consumed by a consumption node, and the target message identifier uniquely identifies the target message.

The stream processing tasks may be tasks handled by Kafka clusters that employ the Flink real-time computation framework and Kafka for the production and consumption of messages. A cluster may include one or more servers, and these server nodes are referred to as brokers. The Kafka cluster is based on a Kafka distributed queue and has a plurality of data producers (i.e., production nodes), a plurality of intermediate data processing stages (corresponding to the brokers between producers and consumers), and a plurality of data consumers (i.e., consumption nodes). The Flink real-time computing framework is used to process the Kafka message queue. The message processing flow is shown in FIG. 3.

Each stream processing task may correspond to a message. In order to distinguish different messages accurately, each message can be uniquely identified by a message identifier. The target message is uniquely identified by the target message identifier.

In order to improve the utilization rate of the message identifier, the message identifier of the message can be ensured to have uniqueness within the validity period of the message; after the validity period is exceeded, the message identification is reclaimed for use by other messages. The validity period of a message may be the time period between the message being produced and eventually successfully consumed, or the time period between the message being produced and being deleted from the database (at which point it has been determined that the message has eventually successfully been consumed).

As an optional embodiment, before acquiring the to-be-processed data with the target message identifier, the method further includes: receiving to-be-processed data sent by a plurality of nodes, where the to-be-processed data includes first data sent by a production node and second data sent by other nodes, the first data includes the target message, the target message identifier, and a timestamp, the other nodes are the nodes other than the production node among the plurality of nodes, and the second data includes the target message identifier and the timestamp; and storing the to-be-processed data in a database.

At key nodes of the processing flow, data can be stored in a database in a specified storage format. Each message is assigned a unique key (the message identifier). For a source message, the production end (producer, production node) can store the key, basic data, and timestamp of the message, and may also store the size of the message. The consumption end (consumer, consumption node) may store only the key and timestamp of the message.

The production node may stamp the target message with a timestamp before or after producing it. The timestamp may be the time at which the target message was produced, or the time at which the target message was sent after being produced. The specific time used for the timestamp may be set as needed and is not specifically limited in this embodiment.

After the target message is produced, the producing node of the target message may encapsulate the target message into a specific data form, and the encapsulated data may include: target message identification, target message, and timestamp, and may also include other information related to the target message, such as the size of the message, etc. The production node may send the encapsulated data to a node between the production node and the consumption node, and after being processed and forwarded by one or more nodes, the encapsulated data is finally sent to the consumption node.

The target message may pass through multiple processing nodes between being produced by the production node and finally being consumed by the consumption node. Data may be collected at the key nodes among these processing nodes (as shown in FIG. 4). The key nodes include at least the production node (production side) and the consumption node (consumption side), and may further include nodes corresponding to other core service processing stages. The key nodes may be defined by service personnel according to experience, which is not specifically limited in this embodiment.

First data of the target message may be collected from the production node of the target message. The first data includes a target key (the target message identifier), basic data (the target message), and a timestamp of the target message, and may further include the size of the target message. Second data of the target message is collected from the other nodes of the plurality of nodes; the second data may include the target key and the timestamp.

The timestamp in the first data sent by the production node is the timestamp stamped by the production node for the target message, and the timestamp in the second data sent by other nodes is the timestamp obtained by the other nodes through analyzing the received data. That is, the time stamp included in the first data and the time stamp included in the second data are the same time stamp.
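The two record shapes described above can be written down as a short Python sketch. The class names `FirstData` and `SecondData`, the field names, and the two helper functions are hypothetical and serve only to show that the second data reuses the timestamp parsed from the received data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondData:
    """Record reported by nodes other than the production node."""
    key: str        # target message identifier
    timestamp: str  # timestamp stamped by the production node


@dataclass(frozen=True)
class FirstData(SecondData):
    """Record reported by the production node: key, timestamp, message body, size."""
    message: bytes
    size: int


def make_first_data(key: str, message: bytes, timestamp: str) -> FirstData:
    # The production node stamps the message and records its size in bytes.
    return FirstData(key=key, timestamp=timestamp, message=message, size=len(message))


def make_second_data(received: FirstData) -> SecondData:
    # A downstream node parses the received data and reuses its key and timestamp.
    return SecondData(key=received.key, timestamp=received.timestamp)


first = make_first_data("M", b"payload", "08:55")
second = make_second_data(first)
```

Because every record of one message carries the same timestamp, a later query by timestamp range returns either all records of that message stored so far or none of them.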

It should be noted that if the target message is normally consumed finally, data may be collected from each of the plurality of nodes, and if the target message is not normally consumed, data may not be collected from some of the plurality of nodes. Data collection may be automated by key nodes in data processing.

The data to be processed collected from the plurality of nodes can be saved in a database for subsequent data processing analysis.

The manner in which data is collected from multiple processing nodes is described below in connection with an alternative example.

Production node A produces message M at 8:55 and stamps it with the timestamp 8:55 (this is only an example of a timestamp; other forms of timestamp are handled similarly). After producing and encapsulating message M, production node A sends it to processing node B1, and sends the key, basic data, timestamp (i.e., 8:55), and size of message M, together with any auxiliary information (which may include the identifiers of production node A, processing nodes B1 and B2, and consumption node C), to a database for storage.

Processing node B1 performs the necessary processing on the received data, sends the processed and encapsulated data to processing node B2 at 8:58, and sends the key and timestamp (i.e., 8:55) of message M, together with any auxiliary information (which may include the identifier of processing node B1), to the database for storage.

Processing node B2 performs the necessary processing on the received data, sends the processed and encapsulated data to consumption node C at 9:01, and sends the key and timestamp (i.e., 8:55) of message M, together with any auxiliary information (which may include the identifier of processing node B2), to the database for storage.

After consuming message M in the received data at 9:04, consumption node C sends the key and timestamp (i.e., 8:55) of message M, together with any auxiliary information (which may include the identifier of consumption node C), to the database for storage. Message M thus takes a total of 9 minutes from being produced (8:55) to being finally consumed (9:04).

The database may store data collected from production node a, processing node B1, processing node B2, and consumption node C so that the server can determine the production node, intermediate nodes, and consumption node of the message and at which nodes the processing was successful based on the data stored in the database.
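The example of message M can be replayed against an in-memory SQLite database. The table layout, the column names (including the `reported_at` column that records when each node reported), and the helper functions are assumptions made for this sketch; the embodiment does not prescribe a particular database or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " key TEXT, node TEXT, ts TEXT, reported_at TEXT, payload TEXT, size INTEGER)"
)


def report(key, node, ts, reported_at, payload=None):
    """Store one record; only the production node supplies a payload."""
    # Character count is used here as a simple stand-in for the message size.
    size = len(payload) if payload is not None else None
    conn.execute(
        "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
        (key, node, ts, reported_at, payload, size),
    )


# Every node stores the production timestamp 08:55 for message M.
report("M", "A", "08:55", "08:55", payload="body-of-M")
report("M", "B1", "08:55", "08:58")
report("M", "B2", "08:55", "09:01")
report("M", "C", "08:55", "09:04")


def nodes_reached(key):
    """List the nodes that reported the message, in reporting order."""
    rows = conn.execute(
        "SELECT node FROM records WHERE key = ? ORDER BY reported_at", (key,)
    )
    return [node for (node,) in rows]


reached = nodes_reached("M")
```

Here `reached` lists the production node, the two processing nodes, and the consumption node, which is the information the server needs to decide where processing succeeded.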

By the embodiment, the accuracy of data collection can be ensured by collecting the data to be processed from a plurality of nodes in the process of flow task processing.

As an alternative embodiment, the acquiring the to-be-processed data with the target message identifier includes: reading target data in a target time period from a database, wherein the database stores data collected from each node in a node cluster, and the time difference between the starting time and the current time of the target time period is a target difference value; and matching the data to be processed with the target message identification from the target data by using the target message identification.

For data in the database, the server may analyze the collected data through a near real-time off-line task. For example, data may be periodically read from the database for processing, and the read data is data within a certain time period before the current time, that is, data with a timestamp in the target time period.

Reading the target data in the target time period from the database may include reading the target data whose timestamps fall within the target time period. For a given message, the timestamp contained in the data collected from the production node is the timestamp that the production node stamped on the message, and the timestamp contained in the data collected from any other node is that same timestamp.

For the target message identification (target key), target data may be read from the database, and the time difference between the time stamp of the read target data and the current time is greater than or equal to a target difference value (e.g., 10 minutes). For messages whose time difference from the current time is greater than or equal to the target difference value, it can be considered that if not lost, they have been finally consumed. Data loss refers to the failure of a message to be consumed successfully by a consuming node due to data processing errors occurring in the process from being produced to being consumed.

A message is produced by a producer to be finally consumed by a consumer, and needs to go through a plurality of processes such as production, transmission, consumption and the like, and each process needs a certain time. By setting the target difference value, it is possible to avoid acquiring data corresponding to the message being processed (e.g., the message being transmitted, the message being finally consumed), thereby ensuring that the target data read from the database is complete data, i.e., the message capable of being finally and normally consumed has been finally and normally consumed by the consuming node.

The manner in which the server reads data from the database is described below in connection with an alternative example. In this example, the server reads data from the database for processing every 30 minutes in a polling manner.

If no target difference is set, then at 9:00 the server reads the data received from 8:30 to 9:00. For message M, the server can read only the data collected from production node A and processing node B1. From this data the server would determine that message M was not normally consumed, which is an analysis error, because message M is still being processed at that time.

If the target difference is set to 10 minutes, then at 9:30 the server reads the data received from 8:50 to 9:20. The server can read the data corresponding to message M collected from production node A, processing node B1, processing node B2, and consumption node C, and can determine from this data that message M was finally and normally consumed by consumption node C.
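The window arithmetic of this example can be checked with a few lines of Python. The function name `read_window` and its parameters are hypothetical, and the sketch follows the example in treating the target difference as the gap between the current time and the end of the window; the date used is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Tuple


def read_window(now: datetime,
                poll_interval: timedelta,
                target_difference: timedelta) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the period to read at time `now`."""
    end = now - target_difference  # leave time for in-flight messages to finish
    start = end - poll_interval    # one polling period of data per read
    return start, end


day = datetime(2020, 2, 13)
half_hour = timedelta(minutes=30)

# No target difference: the 9:00 read covers 8:30 to 9:00.
no_gap = read_window(day.replace(hour=9), half_hour, timedelta(0))

# Target difference of 10 minutes: the 9:30 read covers 8:50 to 9:20.
with_gap = read_window(day.replace(hour=9, minute=30), half_hour, timedelta(minutes=10))
```

With the 10-minute gap, message M (stamped 8:55 and consumed at 9:04) is read only after all four nodes have had time to report.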

Through the embodiment, the collected data are stored through the database, and the reading time range is set, so that the accuracy of data analysis can be ensured, and data processing errors caused by incomplete consumption of the data are avoided.

In step S204, in the case that it is determined that the target message is not normally consumed according to the data to be processed, the target message is added to a target message queue, where a message corresponding to the stream processing task to be processed is stored in the target message queue.

After the data to be processed is acquired, the data lost in each data processing stage can be screened out.

As an optional embodiment, after acquiring the to-be-processed data with the target message identifier, the method further includes: determining, according to the to-be-processed data with the target message identifier, whether the target message is normally consumed at the latter node of each pair of adjacent nodes, where adjacent nodes are two of the plurality of nodes that are adjacent according to the target stream processing task; and determining that the target message is not normally consumed when it is determined that the target message is not normally consumed at a target node among the plurality of nodes.

For each message, whether the data of each stage is normally consumed can be analyzed from production to final consumption. That is, between production and final consumption, a message is produced by a producer, consumed by an intermediate broker (an intermediate node; here consumption also covers processing such as forwarding), and finally consumed by a consumer.

For any two adjacent nodes in the plurality of nodes, whether data has been collected from each of the two nodes can be determined according to the data to be processed. If data is received from the preceding node of the pair but not from the subsequent node, it may be determined that a message loss occurs at the subsequent node.

For example, for message M above, suppose the server reads only the data corresponding to message M collected from production node A and processing node B1. From the read data, the server determines that message M was consumed normally at processing node B1 but not at processing node B2, so a message loss occurs at processing node B2.
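The adjacent-node check described above can be sketched in Python as follows. The function name `find_loss_node` and the representation of the task as an ordered list of node names are hypothetical choices made for illustration only.

```python
def find_loss_node(pipeline, reported_nodes):
    """Return the first node at which the message appears to be lost, or None.

    pipeline: node names in the order defined by the stream processing task.
    reported_nodes: set of nodes from which data for this message was collected.
    """
    for previous, following in zip(pipeline, pipeline[1:]):
        # Data from the preceding node but none from the following node
        # indicates a loss at the following node.
        if previous in reported_nodes and following not in reported_nodes:
            return following
    return None


pipeline = ["A", "B1", "B2", "C"]
lost_at = find_loss_node(pipeline, {"A", "B1"})            # loss at "B2"
all_good = find_loss_node(pipeline, {"A", "B1", "B2", "C"})  # None
```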

It should be noted that a message may undergo data processing at a node multiple times, that is, the node corresponds to multiple message processing stages of the message (each round of data processing corresponds to one message processing stage). In that case, whether the message is normally consumed at the node can be determined by checking whether data corresponding to each message processing stage is received from the node. If data corresponding to every message processing stage is received, it may be determined that the message is normally consumed at the node. If data corresponding to only some of the message processing stages is received, it may be determined that the message is not normally consumed at the node, and the message processing stage in which the data loss occurred can also be determined.

For example, message M undergoes two stages of data processing at processing node B1. After the first stage is complete, processing node B1 sends the key of message M, the timestamp, the identifier of processing node B1, and the identifier of the first stage to the database. If the message is lost after the first stage, processing node B1 does not send the data associated with the second stage to the database. From the data read from the database, the server may determine that the message was not normally consumed at processing node B1 and that the data loss occurred at the second stage (possibly due to a data processing failure).
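The per-stage check for a single node can be sketched as below. The record layout (dictionaries with `node` and `stage` fields) and the function name `missing_stages` are assumptions for illustration; the description only states that the node identifier and stage identifier are sent to the database.

```python
def missing_stages(records, node, expected_stages):
    """Return the stages of `node` for which no data was collected.

    An empty result means the message was normally consumed at the node.
    """
    seen = {r["stage"] for r in records if r["node"] == node}
    return [stage for stage in expected_stages if stage not in seen]


# Message M at processing node B1: only the first of two stages reported.
records_for_m = [
    {"key": "M", "node": "A", "stage": "produce", "timestamp": 1.0},
    {"key": "M", "node": "B1", "stage": "stage-1", "timestamp": 2.0},
]
lost = missing_stages(records_for_m, "B1", ["stage-1", "stage-2"])
```

Here `lost` contains only the second stage, which matches the example in which the loss is located at the second stage of processing node B1.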

By the embodiment, the message loss analysis is carried out according to the data to be processed, the node with the data loss is determined, the reason of the data loss can be conveniently determined, and the safety, the stability and the reliability of the cluster are improved.

If the target message is not normally consumed (data is missing, that is, the message was lost at some intermediate stage), automatic message complementation can be triggered (as shown in fig. 4): the target message is added to the target message queue (a kafka queue) to ensure data correctness.

As an alternative embodiment, before the target message is added to the target message queue, the data to be processed is sent to a target device so that the target message and the target node are displayed through the target device, and a modification instruction sent by the target device is received, where the modification instruction is used to modify a task processing flow of the target stream processing task. After the target message is added to the target message queue, the target message may be processed according to the modified task processing flow.

Lost data can be reviewed manually: the reason for the data loss is analyzed, the service logic is optimized, and message complementation is triggered after the review is passed.

For convenience of manual review, the to-be-processed data of a message that was not normally consumed may be sent to the target device. The target device displays the target message and the target node, and the reason for the data loss is analyzed manually so that the service logic can be optimized (e.g., by modifying the task processing flow).

Message complementation is triggered after the service logic has been optimized, which prevents the message from being lost again. For example, a modification instruction sent by the target device for modifying the task processing flow of the target stream processing task may be received. After the modification instruction is received, message complementation can be triggered. The complemented target message can then be processed according to the modified task processing flow.

Through the embodiment, the data to be processed of the abnormal messages is sent to the target device for display, so that manual examination and verification can be conveniently carried out, and the reliability of message processing is improved.
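One way to picture the manual review step is a gate that holds lost messages until the review is passed. The class `ReviewGate`, its method names, and the in-memory `deque` standing in for the target message queue are all hypothetical; the approval call loosely models receipt of the modification instruction and is only a sketch under those assumptions.

```python
from collections import deque


class ReviewGate:
    """Holds lost messages until a reviewer approves complementation."""

    def __init__(self):
        self.pending = {}     # message key -> (message, node where loss occurred)
        self.queue = deque()  # stand-in for the target message queue

    def report(self, key, message, loss_node):
        # Corresponds to sending the data to the target device for display.
        self.pending[key] = (message, loss_node)

    def approve(self, key):
        # Corresponds to the review passing: the message is complemented.
        message, _ = self.pending.pop(key)
        self.queue.append(message)
        return message


gate = ReviewGate()
gate.report("k1", "message M", "B2")
queued_before = len(gate.queue)  # nothing is queued before approval
gate.approve("k1")
```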

As an alternative embodiment, in the case that it is determined that the target message has been normally consumed by the consuming node according to the pending data, the pending data is cleared from the database, wherein the database stores the data collected from each node in the node cluster.

If it is determined that the target message has been consumed normally by the consuming node (i.e., the message is successful from production to final consumption), the pending data is considered normal data. In order to prevent the data backlog in the database from being too much, the normal data of the database can be cleared in time.

The cleanup may be performed in real time: for each message, once it is determined that the message has been normally consumed, the data related to the message is deleted directly. The cleanup may also be performed in non-real time: all normal data is deleted after all the data to be processed in the current period has been processed.

It should be noted that a message that is not normally consumed may be one that is not normally consumed by an intermediate node (a broker between the producer and the consumer) or one that is not normally consumed by the consuming node. A message is considered to be finally normally consumed only if it is normally consumed by the consuming node.

Through the embodiment, the backlog of the data in the database can be avoided by clearing the normal data in the database, and the rationality of the resource utilization of the database is improved.

The following describes a message processing method in the embodiment of the present application with reference to an optional example. As shown in fig. 5, data collection is performed at key nodes (a data production end, a data consumption end, and a core service processing stage) in a stream processing task, and the collected data is written into a database. Using the message auto-compensation tool, the collected data is analyzed by a quasi-real-time offline task (e.g., reading data in a certain time period before the current time in a polling manner), and messages lost at each stage are screened out and rewritten into the kafka message queue. In this example, each consumer consumes the complemented messages on demand, so that zero data loss is achieved.

Fig. 6 is a schematic flowchart of another alternative message processing method according to an embodiment of the present application, and as shown in fig. 6, the flowchart includes the following steps:

Step S602, data collection.

Data collection is performed at key nodes (the production end, the consumption end, and the core business processing stage of the data) in the stream processing task, and the data is stored in a database in a specified data format. A unique key is determined for each message. The source message at the production end can store the basic data, timestamp, size, and the like of the message, while the consumption end can store only the key and timestamp of the message.
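The specified data format is not given in the description. The following Python sketch shows one possible record layout consistent with the paragraph above: production-end records carry the message body and size, while other records carry only the key and timestamp. The class name `CollectedRecord`, the field names, and the inclusion of a `node` field (based on the node identifier mentioned earlier) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectedRecord:
    key: str                         # unique key of the message
    node: str                        # node that reported the record
    timestamp: float                 # time at which the node handled the message
    payload: Optional[bytes] = None  # message body, production end only
    size: Optional[int] = None       # message size, production end only


def producer_record(key, node, timestamp, payload):
    """Record written by the production end: includes the message itself."""
    return CollectedRecord(key, node, timestamp, payload, len(payload))


def consumer_record(key, node, timestamp):
    """Record written by other nodes: key and timestamp only."""
    return CollectedRecord(key, node, timestamp)


source = producer_record("k1", "A", 1.0, b"hello")
sink = consumer_record("k1", "C", 4.0)
```

Keeping the body only in the production-end record is what later allows a lost message to be rewritten into the queue without storing the body at every node.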

Step S604, reading the database and performing data analysis.

The collected data may be analyzed by a near-real-time offline task, which reads the database and analyzes the data.

Step S606, determine whether the message is a normal message.

According to the data analysis result, it is determined for each message whether the data of each stage from production to final consumption is normally consumed. If so, step S608 is executed; otherwise, step S610 is executed.

Step S608, the normal data is cleared in time.

A message that succeeds from production to final consumption is regarded as normal data. Normal data in the database is cleared in time to prevent excessive data backlog.

Step S610, automatically compensating the lost data.

If data is lost at an intermediate stage, automatic message complementation can be triggered and the data is written into the kafka queue. A manual review mechanism can also be introduced: the lost message is checked manually, the reason for the message loss is analyzed, the service logic is optimized, and message complementation is triggered after the review is passed.
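Steps S604 to S610 can be sketched end to end in Python. In this sketch an in-memory dictionary stands in for the database and a plain list stands in for the kafka queue; no real Kafka client API is used. The function name `analyze_and_complement` and the record fields are hypothetical.

```python
def analyze_and_complement(db, pipeline, queue):
    """Analyze collected records per message key.

    db: dict mapping message key -> list of records (dicts with 'node' and,
        for the production node, 'payload').
    pipeline: node names in task order; pipeline[0] is the production node.
    queue: list standing in for the target message queue.
    Returns (cleared_keys, complemented_keys).
    """
    cleared, complemented = [], []
    for key in list(db):
        records = db[key]
        nodes = {r["node"] for r in records}
        if nodes.issuperset(pipeline):
            # S608: consumed normally at every node, so clear the normal data.
            del db[key]
            cleared.append(key)
            continue
        # S610: lost at some stage, so rewrite the source message into the queue.
        source = next((r for r in records if r["node"] == pipeline[0]), None)
        if source is not None:
            queue.append(source["payload"])
            complemented.append(key)
    return cleared, complemented


db = {
    "k1": [{"node": n, "payload": b"m1"} for n in ("A", "B1", "B2", "C")],
    "k2": [{"node": "A", "payload": b"m2"}, {"node": "B1"}],
}
queue = []
cleared, complemented = analyze_and_complement(db, ["A", "B1", "B2", "C"], queue)
```

In this run the records for `k1` are cleared as normal data, and the body of `k2` is appended to the stand-in queue because no data was collected from the later nodes.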

Through this embodiment, the loss of real-time stream data can be discovered in time, which saves maintenance cost. With the automatic message complementation tool, when data is lost due to cluster maintenance, program exceptions, and the like, messages are complemented automatically without manual intervention, which saves labor cost and ensures data correctness. This solves the problems that lost stream data is difficult to discover and that complementing the data is very costly.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

According to another aspect of the embodiments of the present application, there is provided a message processing apparatus for implementing the message processing method in the above embodiments. Optionally, the apparatus is used to implement the above embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.

Fig. 7 is a block diagram of an alternative message processing apparatus according to an embodiment of the present application, and as shown in fig. 7, the apparatus includes:

(1) an obtaining unit 72, configured to obtain to-be-processed data with a target message identifier, where the to-be-processed data is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is a stream processing task in which a target message is produced by a production node to be consumed by a consumption node, and the target message identifier is used for uniquely identifying the target message;

(2) an adding unit 74, connected to the obtaining unit 72 and configured to add the target message to a target message queue in the case that it is determined according to the to-be-processed data that the target message is not normally consumed, where the target message queue stores messages corresponding to the stream processing tasks to be processed.

Optionally, the obtaining unit 72 may be used to perform step S202 in the above embodiment, and the adding unit 74 may be used to perform step S204 in the above embodiment.

As an alternative embodiment, the obtaining unit 72 includes:

(1) a reading module, configured to read target data in a target time period from a database, where the database stores data collected from each node in the node cluster, and the time difference between the starting time of the target time period and the current time is a target difference value;

(2) a matching module, configured to match the to-be-processed data with the target message identifier from the target data by using the target message identifier.

As an alternative embodiment, the apparatus further comprises:

(1) a first receiving unit, configured to receive the to-be-processed data sent by the plurality of nodes before the to-be-processed data with the target message identifier is obtained, where the to-be-processed data includes first data sent by the production node and second data sent by other nodes, the first data includes the target message, the target message identifier, and a timestamp, the other nodes are nodes of the plurality of nodes other than the production node, and the second data includes the target message identifier and a timestamp;

(2) a storage unit, configured to store the to-be-processed data in the database.

As an alternative embodiment, the apparatus further comprises:

(1) a first determining unit, configured to determine, after the to-be-processed data with the target message identifier is obtained and according to that data, whether the target message is normally consumed at the latter node of each pair of adjacent nodes, where the adjacent nodes are two nodes, among the plurality of nodes, that are adjacent according to the target stream processing task;

(2) a second determining unit, configured to determine that the target message is not normally consumed in the case that it is determined that the target message is not normally consumed at a target node among the plurality of nodes.

As an alternative embodiment, the apparatus further comprises:

(1) a sending unit, configured to send the to-be-processed data to a target device before the target message is added to the target message queue, so as to display the target message and the target node through the target device;

(2) a second receiving unit, configured to receive a modification instruction sent by the target device, where the modification instruction is used for modifying the task processing flow of the target stream processing task;

(3) a processing unit, configured to process the target message according to the modified task processing flow after the target message is added to the target message queue.

As an alternative embodiment, the apparatus further comprises:

(1) a clearing unit, configured to clear the to-be-processed data from a database in the case that it is determined according to the to-be-processed data that the target message has been normally consumed by the consuming node, where the database stores the data collected from each node in the node cluster.

It should be noted that the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.

According to yet another aspect of embodiments herein, there is provided a computer-readable storage medium. Optionally, the storage medium has a computer program stored therein, where the computer program is configured to execute the steps in any one of the methods provided in the embodiments of the present application when the computer program is executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

S1, acquiring data to be processed with a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is a stream processing task in which a target message is produced by a production node to be consumed by a consumption node, and the target message identifier is used for uniquely identifying the target message;

S2, adding the target message to a target message queue in the case that it is determined according to the data to be processed that the target message is not normally consumed, wherein the target message queue stores messages corresponding to the stream processing tasks to be processed.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store computer programs, such as a USB flash drive, a ROM (Read-Only Memory), a RAM (Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.

According to still another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor (which may be the processor 102 in fig. 1) and a memory (which may be the memory 104 in fig. 1) having a computer program stored therein, the processor being configured to execute the computer program to perform the steps of any of the above methods provided in embodiments of the present application.

Optionally, the electronic apparatus may further include a transmission device (the transmission device may be the transmission device 106 in fig. 1) and an input/output device (the input/output device may be the input/output device 108 in fig. 1), wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

Optionally, for an optional example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims

1. A message processing method, comprising:

acquiring data to be processed with a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is a stream processing task in which a target message is produced by a production node to be consumed by a consumption node, and the target message identifier is used for uniquely identifying the target message;

and under the condition that the target message is determined not to be normally consumed according to the data to be processed, adding the target message into a target message queue, wherein the target message queue stores messages corresponding to the stream processing tasks to be processed.

2. The method of claim 1, wherein obtaining the data to be processed with the target message identification comprises:

reading target data in a target time period from a database, wherein the database stores data collected from each node in a node cluster, and the time difference between the starting time and the current time of the target time period is a target difference value;

and matching the data to be processed with the target message identification from the target data by using the target message identification.

3. The method of claim 2, wherein prior to said obtaining pending data having a target message identification, the method further comprises:

receiving the to-be-processed data sent by the plurality of nodes, wherein the to-be-processed data includes first data sent by the production node and second data sent by other nodes, the first data includes the target message, the target message identifier and a timestamp, the other nodes are nodes of the plurality of nodes other than the production node, and the second data includes the target message identifier and a timestamp;

and storing the data to be processed into the database.

4. The method of claim 1, wherein after the obtaining the pending data with the target message identification, the method further comprises:

determining, according to the to-be-processed data with the target message identifier, whether the target message is normally consumed at the latter node of each pair of adjacent nodes, wherein the adjacent nodes are two nodes, among the plurality of nodes, that are adjacent according to the target stream processing task;

determining that the target message is not normally consumed in a case where it is determined that the target message is not normally consumed in a target node of the plurality of nodes.

5. The method of claim 4,

prior to the adding the target message to the target message queue, the method further comprises: sending the data to be processed to a target device so as to display the target message and the target node through the target device; receiving a modification instruction sent by the target device, wherein the modification instruction is used for modifying a task processing flow of the target stream processing task;

after the adding the target message to the target message queue, the method further comprises: and processing the target message according to the modified task processing flow.

6. The method of claim 1, wherein in the case that it is determined from the data to be processed that the target message has been normally consumed by the consuming node, the method further comprises:

and clearing the data to be processed from a database, wherein the database stores the data collected from each node in the node cluster.

7. A message processing apparatus, comprising:

an obtaining unit, configured to obtain data to be processed with a target message identifier, wherein the data to be processed is data collected from a plurality of nodes corresponding to a target stream processing task, the target stream processing task is a stream processing task in which a target message is produced by a production node to be consumed by a consumption node, and the target message identifier is used for uniquely identifying the target message;

and an adding unit, configured to add the target message to a target message queue in the case that it is determined according to the data to be processed that the target message is not normally consumed, wherein the target message queue stores messages corresponding to the stream processing tasks to be processed.

8. The apparatus of claim 7, wherein the obtaining unit comprises:

a reading module, configured to read target data in a target time period from a database, wherein the database stores data collected from each node in a node cluster, and the time difference between the starting time of the target time period and the current time is a target difference value;

and a matching module, configured to match the data to be processed with the target message identifier from the target data by using the target message identifier.

9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.

10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.