CN115543613A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115543613A
Authority
CN
China
Prior art keywords
data
buffer queue
data buffer
output
capacity
Prior art date
Legal status
Pending
Application number
CN202211168215.5A
Other languages
Chinese (zh)
Inventor
胡建平
Current Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd filed Critical Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202211168215.5A priority Critical patent/CN115543613A/en
Publication of CN115543613A publication Critical patent/CN115543613A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/5011: Pool

Abstract

Embodiments of the invention relate to a data processing method and apparatus, an electronic device, and a storage medium in the field of computers. In the method, a Sink operator of the Flink architecture stores received data into corresponding data buffer queues according to a preset grouping rule; data are then acquired from each data buffer queue in its dequeue order and sent to multiple threads in a thread pool for data output. By first storing different types of data into their respective buffer queues according to the grouping rule and then sending the queued data to multiple threads for output, the embodiments achieve grouped processing of different data types and in-order output of data of the same type, while the multithreading increases data throughput and improves the utilization of server resources.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
The native Sink operator of the Flink framework writes data out (i.e., "lands" data to external storage) in a single-threaded mode. Because of this single-thread limitation on message sinking, the full performance of the task server cannot be exploited: server resource utilization is low and data throughput is insufficient.
In the prior art, data throughput is usually improved by increasing the parallelism of the Sink operator (i.e., adding processors). However, this approach increases server cost, wastes resources, and cannot guarantee in-order output of data of the same type.
Disclosure of Invention
The invention provides a data processing method and apparatus, an electronic device, and a storage medium, aiming to solve the prior-art problems that increasing the parallelism of the Sink operator wastes resources and that data of the same type cannot be output in order.
In a first aspect, the present invention provides a data processing method, including: the Sink operator based on the Flink architecture stores the received data into a corresponding data buffer queue according to a preset grouping rule; and sequentially acquiring data from the data buffer queue according to the dequeue sequence of the data buffer queue, and sending the data to multiple threads in the thread pool for data output.
As an optional embodiment of the present invention, the data carries priority information, and after the received data is stored in a corresponding data buffer queue, the method further includes: reordering the dequeue order of all the data in the data buffer queue according to the priority information carried by the data.
As an optional embodiment of the present invention, before storing the received data in the corresponding data buffer queue, the method further includes: receiving a configuration instruction of a user and setting configuration information of the Sink operator of the Flink architecture according to the configuration instruction, wherein the configuration information includes at least one of the following: the capacity upper limit of the data buffer queue, the thread count of the thread pool, a retry strategy after data output failure, and a consistency checkpoint strategy; the capacity upper limit of the data buffer queue is determined according to the memory capacity of the virtual machine heap, and the thread count of the thread pool is not greater than a thread-count upper limit, which is determined according to the capability of the processor.
As an optional embodiment of the present invention, before storing the received data in the corresponding data buffer queue, the method further includes: determining whether the data capacity currently held in the data buffer queue has reached the capacity upper limit of the data buffer queue; if so, blocking the data from being stored in the corresponding data buffer queue; if not, executing the step of storing the received data into the corresponding data buffer queue.
As an optional embodiment of the present invention, the method further comprises: in the case of a data output failure, processing according to the retry strategy for data output failures.
As an optional embodiment of the present invention, the method further comprises: before consistency check, determining whether to output all data in the data buffer queue according to the consistency checkpoint strategy.
As an alternative embodiment of the present invention, the method further comprises: storing the received data into a single data buffer queue in the case that no grouping rule exists.
In a second aspect, the present invention provides a data processing apparatus, where the apparatus is based on a Sink operator of a Flink architecture, and the apparatus includes: the storage module is used for storing the received data into a corresponding data buffer queue according to a preset grouping rule; and the processing module is used for sequentially acquiring data from the data buffer queue according to the dequeue sequence of the data buffer queue and sending the data to multiple threads in the thread pool for data output.
In a third aspect, the present invention provides an electronic device, including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus; a memory for storing a computer program; a processor configured to implement the steps of the data processing method according to any one of the first aspect when executing the program stored in the memory.
In a fourth aspect, the invention provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the data processing method according to any one of the first aspect.
According to the data processing method and apparatus, the electronic device, and the storage medium provided herein, a Sink operator based on the Flink architecture stores received data into corresponding data buffer queues according to a preset grouping rule, and data are acquired from each data buffer queue in its dequeue order and sent to multiple threads in a thread pool for data output. By first storing different types of data into their respective buffer queues according to the grouping rule and then sending the queued data to multiple threads for output, the embodiments achieve grouped processing of different data types and in-order output of data of the same type, increase data throughput through multithreading, and improve the utilization of server resources.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below; it will be apparent that those skilled in the art can derive other drawings from these without inventive effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
fig. 2 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 3 is a block flow diagram of another data processing method according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating another data processing method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
First, the terms to which the present invention relates will be explained:
Flink: a distributed processing engine and framework that performs stateful computations over bounded and unbounded data streams. In short, Flink is a stream computing framework mainly used to process streaming data.
Sink: the final stage in Flink's three-stage logical structure (Source, Transformation, Sink); its function is to output the data processed by Flink to an external system, which is also referred to as data landing.
In the traditional Flink framework, the native Sink operator lands data in a single-threaded mode. Owing to this single-thread limitation on message sinking, data throughput is low, the full performance of the task server cannot be exploited, and server resources such as the Central Processing Unit (CPU) sit idle with low utilization, wasting resources.
In the prior art, Sink throughput is improved by increasing the parallelism of the Sink operators, i.e., by adding a corresponding number of CPUs; however, this not only increases server cost and wastes resources, but also cannot guarantee in-order output of data of the same type.
In view of the above technical problems, the technical idea of the present invention is as follows: a user presets a grouping rule as needed; after the Sink operator receives data, the data are stored into the corresponding data buffer queues according to the grouping rule; multiple threads are then started to output the data held in those queues. The grouping rule enables grouped processing of different data types, covering more usage scenarios, while the multithreading increases data throughput and improves server resource utilization.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention, and as shown in fig. 1, the data processing method includes:
step S101, the Sink operator based on the Flink architecture stores the received data into a corresponding data buffer queue according to a preset grouping rule.
Specifically, this embodiment customizes the Sink operator of the Flink architecture: the Sink operator allocates the data buffer queues and the thread pool, and the user can set the grouping rule as needed; for example, the data involved when a customer purchases goods on a shopping website can be divided into an order-placement category, an outbound (warehouse dispatch) category, and so on. In this step, the data carried in an Input message enters the Sink operator, which parses the message to obtain the data and writes it into different local message buffer queues according to the user-specified grouping rule: for example, order-placement data is written into a first buffer queue and outbound data into a second buffer queue.
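The queue-routing behaviour described in this step can be sketched in plain Java, without any real Flink dependency; the class and method names (GroupingRouter, store, takeHead) are illustrative assumptions, not identifiers from the patent.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: route each record to the buffer queue of its group, as chosen
// by a user-defined grouping rule (represented here by the group key).
public class GroupingRouter {
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();
    private final int capacityLimit;

    public GroupingRouter(int capacityLimit) {
        this.capacityLimit = capacityLimit;
    }

    // Write the record to the tail of its group's queue, creating the queue
    // on first use; put() blocks while the queue is at its capacity limit.
    public void store(String groupKey, String record) {
        BlockingQueue<String> q =
                queues.computeIfAbsent(groupKey, k -> new LinkedBlockingQueue<>(capacityLimit));
        try {
            q.put(record);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Take the next record from the head of a group's queue (FIFO), so
    // records of the same group leave in the order they arrived.
    public String takeHead(String groupKey) {
        BlockingQueue<String> q = queues.get(groupKey);
        return q == null ? null : q.poll();
    }
}
```

With this shape, in-order output within a group follows directly from the FIFO queue, while different groups stay independent.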
And S102, sequentially acquiring data from the data buffer queue according to the dequeue sequence of the data buffer queue, and sending the data to multiple threads in the thread pool for data output.
Specifically, data is taken from the head of each data buffer queue and sent, in a polling or random manner, to the threads in the thread pool for the data output operation, thereby realizing in-order output of data of the same type.
For example, fig. 2 is a flow chart of a data processing method according to an embodiment of the present invention. As shown in fig. 2, the Sink operator receives the message and then analyzes the message to obtain data A3 and B3, determines a key1 message buffer queue corresponding to the data A3 according to a preset grouping key, writes the data A3 into the tail of the key1 message buffer queue, sequentially takes out the data from the head of the key1 message buffer queue for dequeuing, and sends the data to the thread pool for output; and determining the key2 message buffer queue corresponding to the data B3, writing the data B3 into the tail of the key2 message buffer queue, sequentially taking out the data from the head of the key2 message buffer queue for dequeuing, and sending the dequeued data to the thread pool for outputting.
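Step S102 can likewise be sketched outside Flink: records are taken from the head of each buffer queue and handed to a fixed thread pool, whose "output" here merely counts completed operations. All names are illustrative, not from the patent.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: drain each buffer queue from its head and submit every record to
// a thread pool; the pool performs the (simulated) data-output operation.
public class PoolDispatcher {
    public static int drainAndOutput(List<Queue<String>> queues, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger outputs = new AtomicInteger();
        for (Queue<String> q : queues) {
            String record;
            while ((record = q.poll()) != null) {   // dequeue from the head (FIFO)
                final String r = record;
                pool.submit(() -> {
                    // stand-in for the real data-output of r
                    // (e.g. a write to an external store)
                    outputs.incrementAndGet();
                });
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outputs.get();
    }
}
```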
As an optional embodiment, before the step S101, the method further includes: receiving a configuration instruction of a user, and setting configuration information of a Sink operator of the Flink architecture according to the configuration instruction, wherein the configuration information comprises at least one of the following: the upper limit value of the capacity of the data buffer queue, the number of threads of the thread pool, a retry strategy after data output fails and a consistency check point strategy; the upper limit value of the capacity of the data buffer queue is determined according to the memory capacity of a virtual machine heap, the number of threads in the thread pool is not greater than the upper limit value of the number of threads, and the upper limit value of the number of threads is determined according to the capacity of a processor.
Specifically, a user may configure the Sink operator in advance. For example, the capacity upper limit of a data buffer queue maintains the size of the current data-processing pool in the multithreaded environment and coordinates backpressure, preventing a buffer queue from overflowing through unbounded writes; it defaults to 1024 if unset. As another example, setting the thread count of the thread pool to 0 means no thread pool is used, in which case behavior is identical to the native Sink operator. For the retry strategy after an output failure, the default may be never to retry; alternatively, retries are allowed but the task throws an exception once the retry count is exhausted; the user may also define a custom retry strategy. For the consistency checkpoint (checkPoint) strategy, whether all messages in the buffer queues must be output before the Flink task makes a checkpoint may be enabled by default; if it is disabled, data may be lost when the task restarts.
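The configuration items listed above might be held in a simple settings object; the field names, defaults, and the CPU-derived thread bound below are assumptions for illustration, not the patent's actual interface.

```java
// Sketch of a settings object for the customized Sink; field names and the
// validation rule are illustrative assumptions.
public class SinkConfig {
    public int queueCapacity = 1024;             // per-queue upper limit; 1024 is the stated default
    public int threadCount = 0;                  // 0 = no thread pool, i.e. native-Sink behaviour
    public int maxRetries = 0;                   // 0 = never retry after an output failure (default)
    public boolean flushBeforeCheckpoint = true; // drain queues before a checkpoint (default on)

    // The thread count must not exceed an upper bound derived from the
    // processor's capability, and the queue capacity must be positive.
    public void validate(int cpuDerivedThreadLimit) {
        if (threadCount > cpuDerivedThreadLimit) {
            throw new IllegalArgumentException("threadCount exceeds the CPU-derived limit");
        }
        if (queueCapacity <= 0) {
            throw new IllegalArgumentException("queueCapacity must be positive");
        }
    }
}
```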
As an optional embodiment, after the step S101, the method further includes: determining whether the currently stored data capacity of the data buffer queue reaches the upper limit value of the capacity of the data buffer queue; if yes, blocking data from being stored in a corresponding data buffer queue; if not, the step of storing the received data into the corresponding data buffer queue is executed.
Specifically, after receiving data, the Sink operator determines the data buffer queue into which the data is to be written according to the preset grouping rule, and then determines whether the currently used capacity of that queue has reached its upper limit. If not, the data is written to the tail of the queue; if so, the write is blocked and, optionally, the used capacity of the queue is re-checked against the upper limit after a preset time interval.
Continuing to refer to fig. 2, before writing A3 into the Key1 message buffer queue, it is determined whether the current message number of the Key1 message buffer queue reaches the upper limit value of the capacity of the Key1 message buffer queue, if not, writing is performed, if so, writing is blocked, and after a preset time interval, it is determined again whether the current message number reaches the upper limit value of the capacity of the Key1 message buffer queue; similarly, before writing B3 into the Key2 message buffer queue, it is determined whether the current message number of the Key2 message buffer queue reaches the upper limit of the capacity of the Key2 message buffer queue, if not, writing is performed, if so, writing is blocked, and after a preset time interval, it is determined whether the current message number reaches the upper limit of the capacity of the Key2 message buffer queue again.
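The capacity check with blocking and periodic re-checks maps naturally onto a bounded blocking queue with a timed offer; this is one plausible realization under assumed names, not the patent's code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: before the write lands, the queue's used capacity is checked; a
// full queue blocks the writer, which re-checks after a preset interval.
public class BoundedWriter {
    private final BlockingQueue<String> queue;

    public BoundedWriter(int capacityLimit) {
        this.queue = new LinkedBlockingQueue<>(capacityLimit);
    }

    public boolean isFull() {
        return queue.remainingCapacity() == 0;
    }

    // Returns true once the record is appended at the tail; while the queue
    // stays at its capacity limit, waits intervalMillis and re-checks, up
    // to maxAttempts times.
    public boolean write(String record, long intervalMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (queue.offer(record, intervalMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // still full after every attempt: the write stays blocked
    }
}
```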
As an alternative embodiment, the method further comprises: if data output fails, processing according to the retry strategy for output failures. Specifically, after data is sent to the thread pool, a failed output is handled according to the retry strategy: under a never-retry strategy the task does not retry and throws an exception; under a retry-allowed strategy the output is retried until it succeeds or the retry count is exhausted, at which point the task throws an exception.
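A hedged sketch of such a retry strategy: never-retry corresponds to maxRetries = 0, and a retry-allowed strategy throws once the count is exhausted. The helper name outputWithRetry is invented for illustration.

```java
import java.util.function.Supplier;

// Sketch of the retry strategy: the output action reports success or
// failure; maxRetries = 0 means never retry, and once the allowed retries
// are used up the task throws an exception.
public class RetryPolicy {
    // Returns the number of attempts that were needed for a success.
    public static int outputWithRetry(Supplier<Boolean> output, int maxRetries) {
        int attempts = 0;
        while (true) {
            attempts++;
            if (output.get()) {
                return attempts;                 // output succeeded
            }
            if (attempts > maxRetries) {         // retries exhausted
                throw new RuntimeException("output failed after " + attempts + " attempts");
            }
        }
    }
}
```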
As an alternative embodiment, the method further comprises: before a consistency checkpoint, determining whether to output all data in the data buffer queue according to the consistency checkpoint strategy. Specifically, the user may set in advance whether all messages in the buffer queues must be output before the Flink task makes a checkpoint (checkPoint). This may be enabled by default, meaning the buffer queues are fully output before each checkpoint; if it is disabled, data may be lost when the task restarts.
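One way such a checkpoint policy could look, reduced to its essence: a flush hook that drains the buffer before the checkpoint completes when the policy is enabled. This is an assumption-level sketch, not Flink's actual checkpoint API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch: when the flush-before-checkpoint policy is on, every buffered
// record is output before the checkpoint completes; with the policy off,
// records left behind in the buffer risk being lost on restart.
public class CheckpointFlush {
    public static List<String> beforeCheckpoint(Queue<String> buffer,
                                                boolean flushBeforeCheckpoint) {
        List<String> flushed = new ArrayList<>();
        if (flushBeforeCheckpoint) {
            String record;
            while ((record = buffer.poll()) != null) {
                flushed.add(record);   // stand-in for outputting the record
            }
        }
        return flushed;
    }
}
```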
As an alternative embodiment, the method further comprises: and storing the received data into a data buffer queue under the condition that the grouping rule does not exist.
Specifically, if the user has not preset a grouping rule, the received data can be stored directly in a single data buffer queue, i.e., a multithreaded, non-grouped, in-order write mode. Fig. 3 is a flowchart of another data processing method according to an embodiment of the present invention. As shown in fig. 3, after an Input message enters the Sink operator, the cached-message count of the local data buffer queue is incremented by 1; it is then determined whether the local data buffer queue has reached the user-set message limit (i.e., the preset capacity upper limit that prevents memory overflow). If the limit has been reached, the Input message is blocked from being written; otherwise the data is placed at the tail of the local data buffer queue. A message is taken from the head of the queue and sent to the thread pool for the data output operation; after the data is output, the cached-message count is decremented by 1. If execution fails, the user-specified retry policy is executed.
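The counter-based, non-grouped flow of fig. 3 can be condensed into a small sketch (single-threaded here for clarity; all names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the non-grouped flow: one local buffer queue plus a message
// counter, incremented on enqueue and decremented after output.
public class UngroupedBuffer {
    private final Queue<String> queue = new ArrayDeque<>();
    private final AtomicInteger buffered = new AtomicInteger();
    private final int capacityLimit;

    public UngroupedBuffer(int capacityLimit) {
        this.capacityLimit = capacityLimit;
    }

    // Returns false when the user-set limit is reached, i.e. the Input
    // message is blocked from being written.
    public boolean enqueue(String record) {
        if (buffered.get() >= capacityLimit) {
            return false;
        }
        buffered.incrementAndGet();   // cached-message count + 1
        queue.add(record);            // placed at the tail of the queue
        return true;
    }

    // Take a message from the head and "output" it; the count drops by 1.
    public String outputOne() {
        String record = queue.poll();
        if (record != null) {
            buffered.decrementAndGet();
        }
        return record;
    }

    public int size() {
        return buffered.get();
    }
}
```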
In addition, the customized Sink operator of this embodiment provides a corresponding interface for the user. Compared with the native Sink, its usage adds only one layer of encapsulation, so the way of use remains essentially the same and the learning cost is low.
According to the data processing method provided by this embodiment, a Sink operator based on the Flink architecture stores received data into corresponding data buffer queues according to a preset grouping rule, and data are acquired from each data buffer queue in its dequeue order and sent to multiple threads in a thread pool for data output. Storing different types of data into their respective buffer queues according to the grouping rule and then sending the queued data to multiple threads for output achieves grouped processing of different data types and in-order output of data of the same type, increases data throughput through multithreading, and improves the utilization of server resources.
On the basis of the foregoing embodiment, fig. 4 is a schematic flow chart of another data processing method provided in an embodiment of the present invention, where the data carries priority information, and as shown in fig. 4, the data processing method includes:
step S201, the Sink operator based on the Flink architecture stores the received data into a corresponding data buffer queue according to a preset grouping rule.
Step S202, according to the priority information carried by the data, reordering the dequeuing sequence of all the data in the data buffer queue.
And step S203, sequentially acquiring data from the data buffer queue according to the dequeue sequence of the data buffer queue, and sending the data to multiple threads in the thread pool for data output.
The implementation manners of steps S201 and S203 in the embodiment of the present invention are similar to the implementation manners of steps S101 and S102 in the above embodiment, and are not described herein again.
The difference from the above embodiment is that, to further reflect the urgency of different data, the data in this embodiment carries priority information, and the dequeue order of all data in the data buffer queue is reordered according to that priority information.
Specifically, the data buffer queue may be a priority blocking queue (PriorityBlockingQueue). After receiving a message, the Sink operator parses out the data, first writes the different types of data to the tails of the corresponding data buffer queues according to the grouping rule, and then reorders the dequeue order according to the priority information carried by each datum: data with higher priority dequeue before data with lower priority and are sent to the thread pool, realizing different task priorities (see fig. 2). Preferably, the priority information is time information, with earlier times having higher priority and later times lower priority, so that the data are output to the external device in chronological order.
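Java's PriorityBlockingQueue gives exactly this dequeue-by-priority behaviour; the sketch below orders records by an assumed timestamp field so that earlier times dequeue first. The class and field names are illustrative.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Sketch: a priority blocking queue reorders the dequeue order by the
// priority information each record carries; here the priority is a
// timestamp, and earlier timestamps dequeue first.
public class PriorityBuffer {
    static final class Msg {
        final long timestamp;
        final String payload;
        Msg(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    private final PriorityBlockingQueue<Msg> queue =
            new PriorityBlockingQueue<>(16, Comparator.comparingLong((Msg m) -> m.timestamp));

    public void add(long timestamp, String payload) {
        queue.offer(new Msg(timestamp, payload));
    }

    // Highest priority = earliest timestamp; null when the queue is empty.
    public String next() {
        Msg m = queue.poll();
        return m == null ? null : m.payload;
    }
}
```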
According to the data processing method provided by the embodiment of the invention, the data carries priority information, and the dequeuing sequence of all data in the data buffer queue is reordered according to the priority information carried by the data; namely, the embodiment of the invention determines the output sequence of the data according to the priority information, thereby meeting more use scenes.
Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, where the apparatus is based on a Sink operator of a Flink architecture, and as shown in fig. 5, the data processing apparatus includes a storage module 10 and a processing module 20;
the storage module 10 is configured to store the received data into a corresponding data buffer queue according to a preset grouping rule; the processing module 20 is configured to sequentially obtain data from the data buffer queue according to a dequeue sequence of the data buffer queue, and send the data to multiple threads in the thread pool for data output.
As an optional embodiment of the present invention, the data carries priority information, and the storage module 10 is further configured to: and reordering the dequeuing sequence of all the data in the data buffer queue according to the priority information carried by the data.
As an optional embodiment of the present invention, the apparatus further includes a configuration module 30, where the configuration module 30 is configured to receive a configuration instruction of a user, and set configuration information of a Sink operator of the Flink architecture according to the configuration instruction, where the configuration information includes at least one of: the upper limit value of the capacity of the data buffer queue, the number of threads of the thread pool, a retry strategy after data output fails and a consistency check point strategy; the upper limit value of the capacity of the data buffer queue is determined according to the memory capacity of the virtual machine heap, the number of threads in the thread pool is not greater than the upper limit value of the number of threads, and the upper limit value of the number of threads is determined according to the capacity of the processor.
As an alternative embodiment of the present invention, the storage module 10 is further configured to: determining whether the currently stored data capacity of the data buffer queue reaches the upper limit value of the capacity of the data buffer queue; if yes, blocking data from being stored in a corresponding data buffer queue; if not, the step of storing the received data into the corresponding data buffer queue is executed.
As an alternative embodiment of the present invention, the processing module 20 is further configured to: and under the condition of data output failure, processing according to the retry strategy after the data output failure.
As an alternative embodiment of the present invention, the processing module 20 is further configured to: before consistency check, determining whether to output all data in the data buffer queue according to the consistency checkpoint strategy.
As an alternative embodiment of the present invention, the storage module 10 is further configured to: and storing the received data into a data buffer queue under the condition that the grouping rule does not exist.
The data processing apparatus provided in this embodiment has similar implementation principles and technical effects to those of the foregoing embodiments, and is not described herein again.
The data processing apparatus provided by this embodiment comprises a storage module, configured to store received data into corresponding data buffer queues according to a preset grouping rule, and a processing module, configured to acquire data from each data buffer queue in its dequeue order and send the data to multiple threads in a thread pool for data output. Storing different types of data into their respective buffer queues according to the grouping rule and then sending the queued data to multiple threads for output achieves grouped processing of different data types and in-order output of data of the same type, increases data throughput through multithreading, and improves the utilization of server resources.
As shown in fig. 6, an embodiment of the present invention provides an electronic device, which includes a processor 111, a communication interface 112, a memory 113, and a communication bus 114, where the processor 111, the communication interface 112, and the memory 113 complete mutual communication via the communication bus 114,
a memory 113 for storing a computer program;
in an embodiment of the present invention, the processor 111 is configured to implement the steps of the data processing method provided in any one of the foregoing method embodiments when executing the program stored in the memory 113.
The implementation principle and technical effect of the electronic device provided by the embodiment of the invention are similar to those of the above embodiment, and are not described herein again.
The memory 113 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 113 has storage space for program code that performs any of the method steps described above. For example, the storage space may contain respective pieces of program code, each implementing a corresponding step of the method. The program code can be read from or written into one or more computer program products, which comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such computer program products are typically portable or fixed storage units. The storage unit may have a storage segment or storage space arranged similarly to the memory 113 in the electronic device described above. The program code may, for example, be compressed in a suitable form. Typically, the storage unit comprises program code readable by a processor such as the processor 111, which, when run by the electronic device, causes the electronic device to perform the steps of the method described above.
Embodiments of the present invention also provide a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method as described above.
The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above description is merely illustrative of particular embodiments of the invention that enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data processing method, comprising:
storing, by a Sink operator of a Flink architecture, received data into a corresponding data buffer queue according to a preset grouping rule; and
sequentially acquiring data from the data buffer queue according to a dequeue sequence of the data buffer queue, and sending the data to multiple threads in a thread pool for data output.
2. The method of claim 1, wherein the data carries priority information, and after storing the received data in a corresponding data buffer queue, the method further comprises:
reordering the dequeue sequence of all the data in the data buffer queue according to the priority information carried by the data.
3. The method according to claim 1 or 2, further comprising, before storing the received data in the corresponding data buffer queue:
receiving a configuration instruction from a user, and setting configuration information of the Sink operator of the Flink architecture according to the configuration instruction, wherein the configuration information comprises at least one of the following: an upper limit value of the capacity of the data buffer queue, a number of threads of the thread pool, a retry strategy for data output failure, and a consistency checkpoint strategy;
wherein the upper limit value of the capacity of the data buffer queue is determined according to a heap memory capacity of a virtual machine, the number of threads in the thread pool is not greater than an upper limit value of the number of threads, and the upper limit value of the number of threads is determined according to a capacity of the processor.
4. The method of claim 3, wherein after storing the received data in the corresponding data buffer queue, further comprising:
determining whether the data capacity currently stored in the data buffer queue reaches the upper limit value of the capacity of the data buffer queue;
if so, blocking further data from being stored into the corresponding data buffer queue;
if not, continuing to execute the step of storing the received data into the corresponding data buffer queue.
5. The method of claim 3, further comprising:
in the case of a data output failure, performing processing according to the retry strategy for data output failure.
6. The method of claim 3, further comprising:
before a consistency check, determining whether to output all data in the data buffer queue according to the consistency checkpoint strategy.
7. The method according to claim 1 or 2, characterized in that the method further comprises:
storing the received data into a data buffer queue in a case where the grouping rule does not exist.
8. A data processing apparatus, characterized in that the apparatus is based on a Sink operator of a Flink architecture, the apparatus comprising:
the storage module is used for storing the received data into a corresponding data buffer queue according to a preset grouping rule;
and the processing module is used for sequentially acquiring data from the data buffer queue according to the dequeue sequence of the data buffer queue and sending the data to multiple threads in the thread pool for data output.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the data processing method of any one of claims 1 to 7 when executing the program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 7.
CN202211168215.5A 2022-09-23 2022-09-23 Data processing method and device, electronic equipment and storage medium Pending CN115543613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211168215.5A CN115543613A (en) 2022-09-23 2022-09-23 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211168215.5A CN115543613A (en) 2022-09-23 2022-09-23 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115543613A true CN115543613A (en) 2022-12-30

Family

ID=84729614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211168215.5A Pending CN115543613A (en) 2022-09-23 2022-09-23 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115543613A (en)

Similar Documents

Publication Publication Date Title
Zhang et al. {FlashShare}: Punching Through Server Storage Stack from Kernel to Firmware for {Ultra-Low} Latency {SSDs}
WO2018006864A1 (en) Method, apparatus and system for creating virtual machine, control device and storage medium
CN111324427B (en) Task scheduling method and device based on DSP
CN106708627A (en) Multi-virtual-machine mapping and multipath fuse acceleration method and system based on kvm
CN109417488A (en) The method and apparatus of virtual network function resource management
CN109359060B (en) Data extraction method, device, computing equipment and computer storage medium
US20210208944A1 (en) Thread pool management for multiple applications
CN113010265A (en) Pod scheduling method, scheduler, memory plug-in and system
CN110908797B (en) Call request data processing method, device, equipment, storage medium and system
CN108304272B (en) Data IO request processing method and device
WO2021086693A1 (en) Management of multiple physical function non-volatile memory devices
US9507637B1 (en) Computer platform where tasks can optionally share per task resources
CN113254223B (en) Resource allocation method and system after system restart and related components
CN104679575A (en) Control system and control method for input and output flow
CN109002286A (en) Data asynchronous processing method and device based on synchronous programming
CN116089477B (en) Distributed training method and system
CN111831408A (en) Asynchronous task processing method and device, electronic equipment and medium
CN108062224B (en) Data reading and writing method and device based on file handle and computing equipment
CN115543613A (en) Data processing method and device, electronic equipment and storage medium
CN113821174B (en) Storage processing method, storage processing device, network card equipment and storage medium
CN115981893A (en) Message queue task processing method and device, server and storage medium
US20230393782A1 (en) Io request pipeline processing device, method and system, and storage medium
CN108933837A (en) A kind of transmission method and device of image file
US20160274799A1 (en) Increase memory scalability using table-specific memory cleanup
WO2015058628A1 (en) File access method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination