CN112118315A

CN112118315A - Data processing system, method, device, electronic equipment and storage medium

Info

Publication number: CN112118315A
Application number: CN202010986270.XA
Authority: CN
Inventors: 吴海涛; 王硕
Original assignee: Beijing Youzhuju Network Technology Co Ltd
Current assignee: Beijing Youzhuju Network Technology Co Ltd
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2020-12-22

Abstract

The disclosed embodiments relate to a data processing system, a method, an apparatus, an electronic device, and a storage medium, wherein the data processing system includes: a data source device and a distributed cluster; the distributed cluster comprises at least two nodes; the data source equipment is used for sending the data fragments to first nodes corresponding to the data fragments in the distributed cluster based on a preset data fragment distribution strategy; and the first node is used for preprocessing the data fragments after receiving the data fragments and sending the preprocessed data fragments to the service processing equipment. According to the embodiment of the invention, the data volume transmitted to the service processing equipment in a short time is reduced, namely, aiming at a service scene with high data concurrency, the distributed cluster realizes the peak clipping effect on the full data transmission between the data source equipment and the service processing equipment, reduces the resource occupation on the service processing equipment, and further improves the service processing performance in the service processing equipment.

Description

Data processing system, method, device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data processing system, a data processing method, a data processing apparatus, an electronic device, and a storage medium.

Background

At present, internet service development often involves scenarios in which a large amount of data must be transmitted in real time between a data source end and a service end. For example, for any application, the volume of data such as user login logs and user behavior data is usually on the order of ten thousand entries per second or even higher.

If the service end receives too much data in a short time, a large amount of its software and hardware resources is inevitably occupied, and its normal service processing may even be affected, causing service anomalies.

Disclosure of Invention

To solve the technical problem or at least partially solve the technical problem, embodiments of the present disclosure provide a data processing system, a method, an apparatus, an electronic device, and a storage medium.

In a first aspect, an embodiment of the present disclosure provides a data processing system, including:

a data source device and a distributed cluster; the distributed cluster comprises at least two nodes;

the data source device is used for sending the data fragments to first nodes corresponding to the data fragments in the distributed cluster based on a preset data fragment distribution strategy;

and the first node is used for preprocessing the data fragment after receiving the data fragment and sending the preprocessed data fragment to service processing equipment.

In a second aspect, an embodiment of the present disclosure further provides a data processing method, which is applied to a node in a distributed cluster, where the method includes:

receiving data fragments sent by data source equipment; the data fragments are sent by the data source equipment based on a preset data fragment distribution strategy;

and preprocessing the data fragments, and sending the preprocessed data fragments to service processing equipment.

In a third aspect, an embodiment of the present disclosure further provides a data processing method, which is applied to a data source device, and the method includes:

carrying out fragmentation processing on source data to obtain data fragments;

based on a preset data fragment distribution strategy, sending the data fragments to first nodes corresponding to the data fragments in the distributed cluster;

the first node is configured to, after receiving the data fragment, pre-process the data fragment, and send the pre-processed data fragment to a service processing device.

In a fourth aspect, an embodiment of the present disclosure further provides a data processing apparatus configured in a node in a distributed cluster, where the apparatus includes:

the data receiving module is used for receiving the data fragments sent by the data source equipment; the data fragments are sent by the data source equipment based on a preset data fragment distribution strategy;

and the data sending module is used for preprocessing the data fragments and sending the preprocessed data fragments to the service processing equipment.

In a fifth aspect, an embodiment of the present disclosure further provides a data processing apparatus configured to a data source device, where the apparatus includes:

the fragment processing module is used for carrying out fragment processing on the source data to obtain data fragments;

the fragment sending module is used for sending the data fragments to first nodes corresponding to the data fragments in the distributed cluster based on a preset data fragment distribution strategy;

In a sixth aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the executable instructions to realize any data processing method provided by the embodiment of the disclosure.

In a seventh aspect, this disclosed embodiment also provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program, when executed by a processor, implements any one of the data processing methods provided by this disclosed embodiment.

Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has the following advantages. A distributed cluster is added between the data source device and the service processing device. The distributed cluster preprocesses the data fragments from the data source device, where the preprocessing operation depends on the service requirement, and then sends the preprocessed data fragments to the service processing device. Because the nodes in the distributed cluster receive the data fragments at different times, take different amounts of time to preprocess them, and transmit data to the service processing device at different speeds, the large number of data fragments from the data source device reach the service processing device at different times after passing through the distributed cluster. This reduces the amount of data transmitted to the service processing device in a short time. In other words, for service scenarios with high data concurrency, the distributed cluster achieves a peak clipping effect on the full-volume data transmission between the data source device and the service processing device. This solves the problem in existing schemes of heavy resource consumption caused by the service processing device receiving the full data in a short time, reduces resource occupation on the service processing device, improves its service processing performance, and ensures normal service processing.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 is a block diagram of a data processing system according to an embodiment of the present disclosure;

FIG. 2 is a block diagram of another data processing system according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a data processing process based on a data processing system according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of a distributed cluster management method provided in the embodiment of the present disclosure;

FIG. 5 is a block diagram of another data processing system according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a disaster recovery processing process of a distributed cluster according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of a data processing method provided by an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 9 is a flow chart of another data processing method provided by the embodiments of the present disclosure;

FIG. 10 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

Fig. 1 is a schematic structural diagram of a data processing system according to an embodiment of the present disclosure. The embodiment may be applied where data in a data source device needs to be transmitted to a service processing device in a reasonable manner in a high data concurrency scenario. Any device or apparatus included in the data processing system may be implemented by software and/or hardware.

As shown in fig. 1, a data processing system provided by the embodiment of the present disclosure may include a data source device 101 and a distributed cluster 102; the distributed cluster 102 includes at least two nodes; the data source device 101 is configured to send the data fragment to a first node corresponding to the data fragment in the distributed cluster 102 based on a preset data fragment distribution policy; and the first node is configured to preprocess the data fragment after receiving it and send the preprocessed data fragment to the service processing device. The service processing device is configured to determine a service processing result based on the received data fragments; for example, in response to a data access request sent by a user terminal, it feeds back the data corresponding to the request to the user terminal.

Specifically, for different service processing scenarios, the data source device 101 may generate source data (also referred to as streaming data) related to the service processing scenario in real time, and fragment the generated source data according to a preset data fragmentation policy. The preset data fragmentation policy may include, but is not limited to: fragmenting according to the data size of the source data; fragmenting according to a hash value obtained by performing hash calculation on the source data (the hash calculation may be implemented by any available hash algorithm); or fragmenting according to a preset field included in the source data, so that the source data can subsequently be distributed fragment by fragment. After completing the fragmentation of the source data, the data source device 101 sends the data fragments to the corresponding nodes according to a preset data fragment distribution policy (which defines how the data fragments are sent), for example, according to a correspondence between the data fragments and the nodes in the distributed cluster 102. The correspondence between the data fragments and the nodes may be specified in advance by service personnel, may be determined by the data source device 101 randomly distributing the data fragments in the distributed cluster 102, or may be established by the data source device 101 according to data subscription requests from the nodes in the distributed cluster 102. The correspondence between a data fragment and a node in the distributed cluster 102 may be, for example, a correspondence between the sequence number of the data fragment and the node, between the hash value of the data fragment and the node, or between a preset field included in the data fragment and the node.
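The hash-based fragmentation policy and the fragment-to-node correspondence described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names, the choice of SHA-256, the record layout and the node names are all assumptions made for illustration.

```python
import hashlib


def shard_index(record_key: str, num_shards: int) -> int:
    """Map a key to a fragment number with a stable hash (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fragment(records, key_field, num_shards):
    """Split source records into fragments by hashing a preset field (hypothetical layout)."""
    shards = {i: [] for i in range(num_shards)}
    for record in records:
        shards[shard_index(str(record[key_field]), num_shards)].append(record)
    return shards


def distribute(shards, shard_to_node):
    """Group fragments by destination node using a fragment-number-to-node correspondence."""
    outbox = {}
    for shard_id, shard_records in shards.items():
        node = shard_to_node[shard_id]
        outbox.setdefault(node, []).extend(shard_records)
    return outbox
```

In this sketch two fragments may map to the same node, which matches the case where at least two data fragments correspond to one node.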

Optionally, the first node in the distributed cluster 102 is further configured to send a data subscription request to the data source device 101. The data subscription request is used to request that a correspondence be established between the first node and a data fragment; that is, if a node in the distributed cluster 102 needs to receive data fragments, it must individually send a data subscription request to the data source device 101. Having the nodes in the distributed cluster 102 actively send data subscription requests to the data source device 101 makes each node's subscription to data fragments more targeted. In addition, a successfully sent subscription request indicates that the node is currently running normally, which ensures the validity of the currently established correspondence between data fragments and nodes.
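As a rough illustration of subscription-driven correspondence, the sketch below lets a data source record which node asked for which fragments. The class and method names are hypothetical, and the request is modeled as a plain method call rather than a network message.

```python
class DataSource:
    """Toy data source that builds its fragment-to-node correspondence from subscriptions."""

    def __init__(self):
        self.fragment_to_node = {}

    def handle_subscription(self, node_id, fragment_ids):
        # A subscription request establishes the correspondence for each requested fragment.
        for fragment_id in fragment_ids:
            self.fragment_to_node[fragment_id] = node_id

    def node_for(self, fragment_id):
        # Returns None when no node has subscribed to this fragment yet.
        return self.fragment_to_node.get(fragment_id)
```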

Fig. 1 shows, as an example, a one-to-one correspondence relationship between data fragments and nodes in the distributed cluster 102, but this is not to be understood as a specific limitation to the embodiment of the present disclosure, that is, a correspondence relationship between a data fragment and a node in the distributed cluster 102 may be that one data fragment corresponds to one node, or that at least two data fragments correspond to one node. Preferably, the number of data fragments which can be received by each node in each data transmission process can be reasonably controlled, so that the data concurrency of each node is reduced, and the data processing pressure of each node is reduced.

After receiving a data fragment, a node in the distributed cluster 102 may determine, according to the current service processing scenario, the preprocessing operation that needs to be performed on the data fragment, and then send the preprocessed data fragment to the service processing device. The preprocessing operation may include any data processing operation that needs to be performed on the data fragment before the service processing device determines the service processing result, and may include at least one of the following: data parsing, data assembly, data association, and data screening. Data parsing may refer to parsing the data fragment according to preset data parsing logic; data assembly may refer to assembling a plurality of data items in the data fragment according to a preset data assembly requirement; data association may refer to associating related data within the data fragment; and data screening may refer to screening specific data out of the data fragment according to a preset data screening policy. The preprocessing operation may be determined adaptively for different service processing scenarios, and the embodiments of the present disclosure are not particularly limited in this respect. In addition, within the same service processing scenario, the preprocessing operations executed by different nodes may be the same or different, depending on the specific service processing requirements, and the embodiments of the present disclosure are not specifically limited in this respect.
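A small sketch of chaining several of these preprocessing operations on one fragment follows. It assumes, purely for illustration, that a fragment is a list of JSON lines with hypothetical "user" and "event" fields; the disclosure does not prescribe any such format.

```python
import json


def parse(raw_lines):
    """Data parsing: decode each raw JSON line into a dict."""
    return [json.loads(line) for line in raw_lines]


def screen(records, predicate):
    """Data screening: keep only records matching a preset policy."""
    return [record for record in records if predicate(record)]


def associate(records, key):
    """Data association: group related records under a shared key."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    return groups


def preprocess(raw_lines):
    """Parse a fragment, keep login events, and group them per user."""
    records = parse(raw_lines)
    logins = screen(records, lambda record: record["event"] == "login")
    return associate(logins, "user")
```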

For example, in a scenario in which a client interacts with the service processing device (which is equivalent to a server), the data source device 101 may continuously generate user data. The user data may include basic information such as user name, occupation, and gender, and may also include information such as the user's community relationships. The data source device 101 sends the user data to the nodes in the distributed cluster 102 as data fragments. Each node may parse the received data fragment, assemble the user basic information and community relationship information in the data fragment according to a preset data structure, and then send the assembled user data to the service processing device. As a result, after the service processing device receives a user information query request from a user terminal, it can skip assembling the user information, which improves the efficiency of feeding back the query result to the user terminal, that is, the efficiency of service processing.
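The assembly step in this example might look like the following sketch. The record types ("basic", "relation"), the field names and the output structure are assumptions introduced here; the source only states that assembly follows a preset data structure.

```python
def assemble_users(fragment):
    """Merge basic-info and community-relationship records into one entry per user."""
    users = {}
    for record in fragment:
        user = users.setdefault(
            record["user_id"],
            {"user_id": record["user_id"], "profile": {}, "community": []},
        )
        if record["type"] == "basic":
            user["profile"].update(record["data"])
        elif record["type"] == "relation":
            user["community"].append(record["data"]["friend_id"])
    return list(users.values())
```

With the users pre-assembled in this way, a later query for a user can be answered from a single entry instead of joining two kinds of records at request time.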

In addition, it should be noted that the data source device 101 fragments source data generated in real time and then transmits the data fragments to the corresponding nodes in the distributed cluster according to the preset data fragment distribution policy. Because the time needed to process the data fragments is not negligible, different data fragments are sent at different times; and because the data transmission speed between the data source device 101 and different nodes also varies, different nodes receive their data fragments at different times. Furthermore, different nodes differ in data processing performance, which further widens the differences in the times at which they send preprocessed data fragments to the service processing device. Finally, the data transmission speeds between different nodes and the service processing device also differ. Taken together, these factors mean that a large number of data fragments take different amounts of time to travel from the data source device 101 to the service processing device; that is, a large amount of source data does not arrive at the service processing device at the same moment within a short time, so that, overall, the transmission of the source data is split into separate flows.

In the technical solution of the embodiment of the present disclosure, the distributed cluster 102 is added between the data source device 101 and the service processing device. The distributed cluster 102 preprocesses the data fragments from the data source device 101, where the preprocessing operation depends on the service requirement, and then sends the preprocessed data fragments to the service processing device. Because the nodes in the distributed cluster 102 receive the data fragments at different times, take different amounts of time to preprocess them, and transmit data to the service processing device at different speeds, the large number of data fragments from the data source device 101 reach the service processing device at different times after being forwarded by the distributed cluster 102. This reduces the amount of data transmitted to the service processing device in a short time. In other words, for service scenarios with high data concurrency, the distributed cluster 102 achieves a peak clipping effect on the full-volume data transmission between the data source device 101 and the service processing device. This solves the problem in existing schemes of heavy resource consumption caused by the service processing device receiving the full data in a short time, reduces resource occupation on the service processing device, improves its service processing performance, and ensures normal service processing.
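The peak clipping effect can be made concrete with a small worked calculation. All numbers below (four nodes, 1000 entries per fragment, and the per-node delays in milliseconds) are invented for illustration and do not come from the disclosure.

```python
from collections import Counter


def peak_per_second(arrivals):
    """Return the largest number of entries landing in any one-second bucket.

    arrivals: iterable of (arrival_time_ms, entry_count) pairs.
    """
    buckets = Counter()
    for arrival_ms, count in arrivals:
        buckets[arrival_ms // 1000] += count
    return max(buckets.values())


# Hypothetical per-node delays: (receive, preprocess, send) in milliseconds.
node_delays_ms = {
    "A": (200, 500, 100),
    "B": (400, 1000, 300),
    "C": (300, 2000, 200),
    "D": (100, 600, 200),
}
ENTRIES_PER_FRAGMENT = 1000

# Without the cluster, all four fragments hit the service device in the same second.
direct_peak = peak_per_second((0, ENTRIES_PER_FRAGMENT) for _ in node_delays_ms)

# With the cluster, each fragment arrives after its node's accumulated delay.
cluster_peak = peak_per_second(
    (sum(delays), ENTRIES_PER_FRAGMENT) for delays in node_delays_ms.values()
)

print(direct_peak, cluster_peak)  # prints "4000 2000"
```

Under these assumed delays, fragments from nodes A and D arrive within the first second, while those from B and C arrive in the second and third seconds, so the per-second peak drops from 4000 to 2000 entries.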

Fig. 2 is a schematic structural diagram of another data processing system provided in the embodiment of the present disclosure, which is further optimized based on the above technical solution. As shown in fig. 2, a data processing system provided by the embodiment of the present disclosure may include a data source device 101, a distributed cluster 102, and a management device 103, where:

the data source device 101 is configured to send the data fragment to a first node corresponding to the data fragment in the distributed cluster 102 based on a preset data fragment distribution policy;

the first node in the distributed cluster 102 is configured to, after receiving the data fragment, pre-process the data fragment, and send the pre-processed data fragment to the service processing device;

a management device 103 for: acquiring operation information of a first node, and determining an operation state of the first node based on the operation information; if the first node is determined to be in an abnormal operation state (at this time, the first node is an abnormal node), determining a second node in a normal operation state for the first node from the distributed cluster, and sending the corresponding relation between the first node and the second node to the data source device 101;

correspondingly, the data source device 101 is further configured to establish a corresponding relationship between the target data fragment corresponding to the first node and the second node based on the corresponding relationship between the first node and the second node, and delete the corresponding relationship between the first node and the target data fragment.

The first node and the second node may each be any node in the distributed cluster 102. In the embodiment of the present disclosure, the management device 103 is used to manage and maintain the nodes in the distributed cluster 102. The management device 103 may interact with the nodes in the distributed cluster 102 in real time to acquire the operation information of each node, for example by exchanging heartbeat data packets based on a heartbeat mechanism or by acquiring node operation logs. The operation information may include performance information of the node's central processing unit, data receiving and sending times, and the like, and is used to determine the operation state of each node. If it is determined from a node's operation information that the node has a fault, such as downtime or a network connection error, the node is an abnormal node; otherwise, it is a normal node. To ensure that the data fragments that have an established correspondence with the abnormal node can still be sent to the service processing device through the distributed cluster 102, the management device 103 may select a takeover node for the abnormal node from the normal nodes. For example, after determining that the first node is in an abnormal operation state, the management device 103 determines a second node corresponding to the first node, and the second node takes over the data fragments that the first node subscribed to from the data source device 101.
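One simple way to derive an operation state from heartbeats is a timeout check, sketched below. The class name, the 10-second timeout and the rule "no heartbeat within the timeout means abnormal" are assumptions; the disclosure leaves the exact criterion open.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that have gone silent."""

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be exercised without waiting
        self.last_seen = {}

    def record_heartbeat(self, node_id):
        self.last_seen[node_id] = self.clock()

    def abnormal_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items() if now - seen > self.timeout_s
        )
```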

Continuing with the example in which the first node is in an abnormal operation state, the management device 103 may randomly select an available second node for the first node from the normal nodes; or it may determine the available idle resources of each normal node from that node's operation information and preferentially select a normal node with more idle resources as the second node corresponding to the first node; or it may select the second node from among the normal nodes that have sent takeover requests; or it may determine the second node based on a voting mechanism among the normal nodes. Other ways of determining the corresponding second node for the first node may also be used flexibly.
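Two of the selection strategies listed above, random selection and selection by idle resources, can be sketched as follows. The function names and the representation of idle resources as a single number per node are illustrative assumptions.

```python
import random


def pick_random(normal_nodes, rng=random):
    """Randomly choose a takeover node among the normal nodes."""
    return rng.choice(sorted(normal_nodes))


def pick_most_idle(idle_by_node):
    """Choose the normal node reporting the most idle resources.

    Ties are broken by node name so the result is deterministic.
    """
    return max(sorted(idle_by_node), key=lambda node: idle_by_node[node])
```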

After the management device 103 determines a corresponding second node for the first node, the management device 103 sends the corresponding relationship between the first node and the second node to the data source device 101, so that the data source device 101 sends the target data fragment corresponding to the first node, that is, the data fragment subscribed by the first node before the abnormality occurs, to the second node, thereby ensuring normal distribution of the target data fragment and integrity of data that can be received by the service processing device.
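On the data source side, applying a received first-node-to-second-node correspondence amounts to rewriting the fragment-to-node map. The sketch below is one hypothetical way to do it; the function name and the dictionary representation are assumptions.

```python
def reassign_fragments(fragment_to_node, first_node, second_node):
    """Point every fragment of the abnormal first node at the second node.

    Overwriting the entry both establishes the new correspondence and removes
    the old one. Returns the list of fragment ids that were moved.
    """
    moved = sorted(f for f, node in fragment_to_node.items() if node == first_node)
    for fragment_id in moved:
        fragment_to_node[fragment_id] = second_node
    return moved
```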

The management device 103 may also be configured to: after a second node is determined for the first node, the corresponding relation between the first node and the target data fragment is adjusted to the corresponding relation between the second node and the target data fragment; sending the corresponding relation between the second node and the target data fragment to the data source device 101; correspondingly, the data source device 101 is specifically configured to: and sending the target data fragment to the second node according to the received corresponding relation between the second node and the target data fragment. The management device 103 performs an operation of adjusting the correspondence between the data fragments and the nodes, which is helpful for improving the efficiency of distributing the target data fragments by the data source device 101, and is also helpful for the management device 103 to timely grasp the adjustment state of the correspondence between the data fragments and the nodes when abnormal nodes exist.

On the basis of the above technical solution, optionally, the management device 103 is further configured to: after determining that the first node is in an abnormal operation state, send a node abnormal event of the first node to the remaining nodes in the distributed cluster 102 other than the first node (preferably, nodes in a normal operation state). The node abnormal event is used to notify the remaining nodes of the distributed cluster 102 that a node with an abnormal operation state exists and to provide information about that node; that is, the node abnormal event may include an identifier of the node with the abnormal operation state and information on the data fragments corresponding to that node, where the identifier uniquely identifies the node;

correspondingly, the remaining nodes in the distributed cluster 102 are configured to determine the receiving time of the node abnormal event, and send a takeover request to the management device after a preset time interval; the preset time interval takes the receiving time of the node abnormal event as a time starting point, the specific value can be determined adaptively, for example, after the remaining nodes receive the node abnormal event and delay or sleep for 1 minute, a takeover request is sent to the management device 103, and the takeover request is used for requesting the management device 103 to determine the local node as a second node corresponding to the first node;

the management device 103 is specifically configured to determine a second node for the first node from the remaining nodes according to the receiving time of the takeover request.

Because the remaining nodes in the distributed cluster 102 send the takeover request only after the preset time interval has elapsed, a redundant determination of a second node can be avoided in the case where the first node is only briefly abnormal, for example due to a network anomaly, and recovers to a normal node within the preset time interval. This in turn avoids errors in distributing the data fragments.

Of course, if there is no case of a node being only temporarily in an abnormal state, the remaining nodes in the distributed cluster 102 may send a takeover request to the management device 103 upon receiving the node abnormal event, which improves the efficiency with which the management device 103 determines the normal node corresponding to the abnormal node.

When the management device 103 determines a second node for the first node from the remaining nodes according to the receiving time of the takeover requests, the time at which the management device 103 receives a takeover request reflects, to a certain extent, the data processing performance or data transmission performance of the node that sent it. For example, the normal node whose takeover request is received first is preferably determined as the second node corresponding to the first node. In this way, each remaining node can compete fairly for the right to take over the first node, while a normal node with relatively better data processing or data transmission performance is selected as the second node, which improves the stability and availability of the data processing system.
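The "first takeover request received wins" rule can be sketched as a small arbiter on the management device. The class and method names are hypothetical, and request arrival order is modeled simply as call order.

```python
class TakeoverArbiter:
    """Grants takeover of each abnormal node to the first requester that arrives."""

    def __init__(self):
        self.assigned = {}  # abnormal node id -> second node id

    def handle_request(self, abnormal_node, requester):
        """Return True if the requester is (or already was) granted the takeover."""
        if abnormal_node in self.assigned:
            return self.assigned[abnormal_node] == requester
        self.assigned[abnormal_node] = requester
        return True
```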

Further, the remaining nodes in the distributed cluster 102 may also be configured to:

after receiving the node abnormal event, determining the number of abnormal nodes that the local node has already taken over;

if the number of abnormal nodes is determined to be smaller than a number threshold, sending a takeover request to the management device 103 after the preset time interval. The number threshold may be set according to the actual situation. For example, it may be set to 1, meaning that each normal node may take over at most 1 additional abnormal node, so that each normal node receives the data fragments corresponding to at most 2 nodes. If a normal node determines that it has already taken over 1 abnormal node, the number of abnormal nodes is not less than the number threshold, and the node does not send a takeover request to the management device 103. Setting the number threshold effectively bounds the amount of data each normal node must process and prevents a normal node from coming under heavy data processing pressure because it has taken over too many abnormal nodes.
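The node-side decision described above, combining the number threshold with the preset time interval, can be written as a single predicate. The function name, parameter names and the use of seconds as the time unit are assumptions made for this sketch.

```python
def should_request_takeover(taken_over, threshold, event_received_at, now, interval_s):
    """Decide whether a normal node should send a takeover request right now.

    taken_over: abnormal nodes this node has already taken over.
    threshold: maximum number of abnormal nodes one node may take over.
    event_received_at / now: timestamps in seconds from the same clock.
    interval_s: preset wait after receiving the node abnormal event.
    """
    if taken_over >= threshold:
        return False  # already at capacity, never compete for another node
    return now - event_received_at >= interval_s
```

With a threshold of 1 and a 60-second interval, a node that has taken over nothing stays silent for the first minute after the event and may send the request afterwards.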

Optionally, the node in the distributed cluster 102, for example, the first node, is further configured to: before receiving the data fragment sent by the data source device 101, sending a registration request to the management device 103; the registration request is used to request the management device to record information of the local node, where the information may include any information related to the current node, such as a node identifier, a node IP address, operation information, and a correspondence between a data fragment and the current node.
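A registration record and registry along these lines might be sketched as follows. The field names and the in-memory dictionary are illustrative assumptions; the disclosure only lists the kinds of information a registration request may carry.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    """Information a node reports when registering (hypothetical field names)."""
    node_id: str
    ip_address: str
    fragment_ids: list = field(default_factory=list)


class Registry:
    """Toy management-device registry keyed by node identifier."""

    def __init__(self):
        self.nodes = {}

    def register(self, info):
        # Re-registering the same node id replaces the earlier record.
        self.nodes[info.node_id] = info

    def fragments_of(self, node_id):
        return list(self.nodes[node_id].fragment_ids)
```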

In the embodiment of the present disclosure, each node in the distributed cluster 102 has equality, that is, each node may implement the same function or perform the same data processing operation, for example, for the aforementioned first node, when the running state of the first node is normal, the first node may also perform an operation of receiving a node exception event sent by the management device 103 and an operation of sending a takeover request to the management device 103.

On the basis of the above technical solution, optionally, the nodes in the distributed cluster 102 are specifically configured to, after receiving the data fragments, preprocess the data fragments and send the preprocessed data fragments to the service processing device by calling the message middleware. The message middleware may include, but is not limited to, Kafka, RabbitMQ, RocketMQ, and the like. Among them, Kafka provides a unified, high-throughput, low-latency platform for processing real-time data, and its persistence layer is essentially a "large-scale publish/subscribe message queue built on a distributed transaction log architecture".

Fig. 3 is a schematic diagram illustrating a data processing process based on a data processing system according to an embodiment of the present disclosure. As shown in fig. 3, the data source device 101 transmits the data fragments to corresponding nodes in the distributed cluster 102; the management device 103 is configured to manage nodes in the distributed cluster 102, and when a first node with an abnormal operating state occurs in the distributed cluster 102, determine a second node with a normal operating state for the first node, so that the data source device 101 may distribute data fragments subscribed by the first node to the second node corresponding to the first node, and ensure normal distribution of the data fragments; after receiving the data fragment, the nodes in the distributed cluster 102 preprocess the data fragment, and then transmit the preprocessed data fragment to the service processing device by calling any message queue in the message middleware. By calling the message middleware, the nodes in the distributed cluster 102 and the preprocessed data fragments can be decoupled, the preprocessed data fragments are cached in a message queue, the nodes in the distributed cluster 102 do not need to maintain the preprocessed data fragments, and the service processing equipment can acquire required data from the message queue according to the current requirement, so that the data concurrence amount in the service processing equipment is reduced.
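The decoupling effect of the message queue can be illustrated with a small sketch. Here Python's standard `queue.Queue` stands in for the message middleware, and the `preprocess` rule (trimming, lower-casing, and dropping empty records) is purely an assumption, since the actual preprocessing depends on the service requirement.

```python
import queue

message_queue = queue.Queue()  # stand-in for one queue in the message middleware


def preprocess(fragment):
    # Hypothetical preprocessing: drop empty records and normalise case.
    return [record.strip().lower() for record in fragment if record.strip()]


def node_handle_fragment(fragment, mq):
    """A cluster node preprocesses a fragment and hands it to the queue."""
    mq.put(preprocess(fragment))


def service_pull(mq, max_batches):
    """The service processing device pulls only as much as it currently wants."""
    batches = []
    while len(batches) < max_batches:
        try:
            batches.append(mq.get_nowait())
        except queue.Empty:
            break
    return batches


node_handle_fragment([" Login ", "", "CLICK"], message_queue)
node_handle_fragment(["Logout"], message_queue)
first = service_pull(message_queue, max_batches=1)
remaining = message_queue.qsize()  # the second batch stays buffered in the queue
```

Because the consumer decides how many batches to pull, data that arrives faster than the service can handle simply waits in the queue instead of hitting the service at once.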

Moreover, after completing the preprocessing operation of the data fragments, each node in the distributed cluster may choose to distribute the preprocessed data fragments to the same message middleware or to different message middleware, that is, one or more message middleware may be deployed in the data processing system of the embodiment of the present disclosure. Taking N message queues as an example, since the distributed cluster 102 has already performed peak clipping on the data transmitted from the data source device 101 to the service processing device, the concurrency arriving at each message queue may be considered to be 1/N of the original concurrency, and therefore the availability of the message middleware may also be greatly improved.
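The 1/N figure can be checked with a short worked example. The sketch assumes that messages are spread evenly over the queues (round-robin dispatch here), which the disclosure does not prescribe; the message rate and queue count are made-up numbers.

```python
from collections import Counter
from itertools import cycle


def spread_round_robin(message_count, queue_count):
    """Count how many messages each queue receives under round-robin dispatch."""
    per_queue = Counter()
    targets = cycle(range(queue_count))
    for _ in range(message_count):
        per_queue[next(targets)] += 1
    return dict(per_queue)


# 30000 messages over N = 3 queues: each queue sees one third of the load.
load = spread_round_robin(message_count=30000, queue_count=3)
```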

In addition, it should be noted that, in the embodiment of the present disclosure, the service processing device may also be implemented as a distributed cluster, in which one node may correspond to one service system. The specific arrangement depends on the service deployment and is not limited by the embodiment of the present disclosure.

The management device 103 may also be implemented by a distributed component, which may include, but is not limited to, ZooKeeper, a coordination component commonly used to manage the components of big data ecosystems, and the like. ZooKeeper is an open-source distributed coordination service for distributed applications. It can provide a consistency service for distributed applications, and its functions include configuration maintenance, naming service, distributed synchronization, group service, and the like.

Fig. 4 is a flowchart of a distributed cluster management method provided in the embodiment of the present disclosure, specifically taking an example that a management device is implemented by using a distributed ZooKeeper component, the function of the management device in the embodiment of the present disclosure is exemplarily described, but should not be construed as a specific limitation to the embodiment of the present disclosure.

As shown in fig. 4, a node in the distributed cluster for receiving data fragments is deployed in a server instance. After the program of the server instance is started, that is, after the node in the distributed cluster is started, the server instance sends a registration request to the ZooKeeper cluster to request to become one node (referred to as a zk node for short) in the ZooKeeper cluster. The node name may be denoted by X, and the node will subscribe to data fragment X in the data source device. In the process of registering as a zk node, the server instance may query the registered zk node information in the ZooKeeper cluster to determine whether the current server instance has registered with the ZooKeeper cluster within a history time (the history time may be a preset time period before the new zk node is created on the current server instance), that is, to check whether a duplicate zk node belonging to the same instance exists. If so, the duplicate zk nodes previously registered by the current server instance are deleted, a new zk node is created, and a registration request is sent to the ZooKeeper cluster. If not, that is, the ZooKeeper cluster holds no information about the current server instance, a new zk node may be created and a registration request sent to the ZooKeeper cluster. After the current server instance successfully registers with the ZooKeeper cluster, it may send a data subscription request to the data source device to subscribe to the data fragments in the data source device.

Moreover, when information about the current server instance already exists on the ZooKeeper cluster, that is, the current server instance joined the ZooKeeper cluster within the history time, left it, and is now rejoining, the current server instance needs to send a data fragment query request to the ZooKeeper cluster to obtain the data fragment A that it subscribed to from the data source device after becoming a zk node within the history time. After determining data fragment A, the current server instance further needs to request the ZooKeeper cluster to notify the other zk nodes that the subscription to data fragment A made by the current server instance within the history time is no longer in effect. Then, after the current server instance is successfully registered as a zk node, it subscribes to data fragment A from the data source device again, which avoids repeated subscription of the same data fragment by different zk nodes.
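The registration flow with duplicate cleanup can be sketched as follows. The `InMemoryRegistry` class is a hypothetical in-memory stand-in for the ZooKeeper cluster, not a ZooKeeper client, and all names in it are assumptions made for illustration; notification of the other zk nodes is omitted.

```python
class InMemoryRegistry:
    """Hypothetical stand-in for the ZooKeeper cluster (not a real client)."""

    def __init__(self):
        self.zk_nodes = {}  # zk node name -> {"instance": ..., "fragment": ...}

    def find_by_instance(self, instance_id):
        return [name for name, info in self.zk_nodes.items()
                if info["instance"] == instance_id]

    def delete(self, name):
        return self.zk_nodes.pop(name)

    def create(self, name, instance_id, fragment):
        self.zk_nodes[name] = {"instance": instance_id, "fragment": fragment}


def register(registry, instance_id, node_name, default_fragment):
    """Register an instance; a stale registration is deleted and its fragment reused."""
    fragment = default_fragment
    for stale_name in registry.find_by_instance(instance_id):
        # The instance registered before: recover the fragment it subscribed to.
        fragment = registry.delete(stale_name)["fragment"]
    registry.create(node_name, instance_id, fragment)
    return fragment  # the fragment to subscribe to from the data source device


registry = InMemoryRegistry()
registry.create("X-old", "srv-1", "A")  # leftover from an earlier run of srv-1
fragment = register(registry, "srv-1", "X", default_fragment="X")
```

After the call, only the new zk node `X` remains for `srv-1`, and the instance resubscribes to fragment `A` rather than claiming a second fragment.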

When all server instances in the distributed cluster for receiving data fragments have started normally and subscribed to their corresponding data fragments, all of them have successfully registered with the ZooKeeper cluster and become zk nodes. As shown in fig. 4, if a zk node is deleted due to a failure, the remaining server instances in the distributed cluster observe the deletion event of the zk node through the ZooKeeper cluster and start to compete for the right to take over that zk node. According to a competition policy, for example the order in which the ZooKeeper cluster receives the takeover requests sent by the remaining server instances, one normally operating server instance successfully takes over the deleted zk node. Depending on the actual service scenario, a delay strategy may be applied: after observing the node deletion event, each remaining normally operating server instance first sleeps for a preset time, for example 1 minute, and only then competes for the authority to take over the deleted zk node. The reason for the sleep is that, based on the retry mechanism of the ZooKeeper cluster, a zk node that was deleted only temporarily, for example because of a network failure, may recover to the normal operating state within the preset time, in which case no other server instance needs to take it over.
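The delay strategy can be sketched as a small function. The callbacks `node_exists` and `try_take_over` are hypothetical hooks into the management service, and the `sleep` parameter is injectable so that the example runs without actually waiting one minute.

```python
import time


def contend_after_delay(failed_node, node_exists, try_take_over,
                        delay_seconds=60.0, sleep=time.sleep):
    """Wait, then compete for the failed zk node only if it has not recovered."""
    sleep(delay_seconds)
    if node_exists(failed_node):
        return False  # the node came back by itself; no takeover is needed
    return try_take_over(failed_node)


attempts = []


def record_take_over(node):
    attempts.append(node)
    return True


no_wait = lambda seconds: None  # skip the real sleep in this demonstration

recovered = contend_after_delay("zk-5", lambda n: True, record_take_over, sleep=no_wait)
taken = contend_after_delay("zk-5", lambda n: False, record_take_over, sleep=no_wait)
```

In the first call the node has recovered during the wait, so no takeover is attempted; in the second it is still missing, so the instance competes for it.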

In order to ensure the correctness and consistency of data transmission in the data processing system, the ZooKeeper cluster can be used to run a timed check task and to send heartbeat data packets in real time to the server instances that receive data fragments, so as to ensure that each server instance is successfully registered in the ZooKeeper cluster and actually subscribes to the data fragments recorded for it when it registered as a zk node.
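One way to picture the timed check is as a comparison between the fragments recorded at registration and the fragments each instance reports as actually subscribed. The sketch below assumes both views are available as plain dictionaries; the function name and data layout are hypothetical.

```python
def find_inconsistent_nodes(registered, subscribed):
    """Return zk node names whose live subscription differs from the registry."""
    return sorted(
        name for name, fragment in registered.items()
        if subscribed.get(name) != fragment
    )


registered = {"zk-1": "fragment-1", "zk-2": "fragment-2", "zk-3": "fragment-3"}
# zk-2 reports a different fragment and zk-3 did not report at all.
subscribed = {"zk-1": "fragment-1", "zk-2": "fragment-9"}
stale = find_inconsistent_nodes(registered, subscribed)
```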

Also, the number of zk nodes that each server instance may own can be preset. Taking fig. 4 as an example, the number may be set to 2, that is, each zk node may take over one failed zk node. Therefore, before beginning to compete to take over a failed zk node, each server instance may check the number of zk nodes it currently holds. If the number is smaller than the preset zk threshold, for example smaller than 2, it may continue to compete; if the number is not less than the preset zk threshold, for example the current server instance already holds 2 zk nodes, it keeps its current information and continues to subscribe to the data fragments of all zk nodes under the current server instance.

Fig. 5 is a schematic structural diagram of another data processing system according to an embodiment of the present disclosure. As shown in fig. 5, different nodes (also referred to as server instances) in the distributed cluster may be deployed in different network environments, for example the 3 computer rooms shown in the figure, and the instance information of each node is stored through a unified management service, for example a ZooKeeper cluster, whose management nodes are also deployed in different network environments. The advantage of such a deployment is that even if a problem occurs in one network environment, the operation of the nodes in the other network environments is not affected, which ensures the high availability of the data processing system. Moreover, by using the ZooKeeper cluster, the operating condition of the nodes of the distributed cluster in each network environment can be learned in time. Once an abnormal condition such as node downtime or a network environment fault occurs, other available normal nodes can promptly take over the data fragments that the abnormal nodes subscribed to from the data source device, which ensures the consistency of data in the data processing system.

Fig. 6 is a schematic diagram of a disaster recovery processing process of a distributed cluster according to an embodiment of the present disclosure. As shown in fig. 6, under normal conditions, the nodes of the distributed cluster deployed in computer room 1, computer room 2, and computer room 3 each subscribe to data fragments from the data source device; in fig. 6, the data fragment subscribed by a node is represented by the same number as the node. Assume that a problem occurs in the network environment of computer room 3 and, at the same time, node 2 and node 4 in computer room 1 fail, so that the related nodes become abnormal nodes. With the disaster recovery scheme provided in the embodiment of the present disclosure, abnormal node 5, abnormal node 6, and abnormal node 7 in computer room 3 may be taken over by normal node 8 in computer room 2, normal node 1 in computer room 1, and normal node 9 in computer room 2, respectively, while abnormal node 2 and abnormal node 4 in computer room 1 may be taken over by normal node 10 in computer room 2 and normal node 3 in computer room 1, respectively. A normal node that takes over an abnormal node receives the data fragment that the abnormal node subscribed to before it became abnormal, so from the service point of view no data is lost in the data processing system, and a feasible high-availability physical deployment architecture is realized. In fig. 6, each normal node is responsible for the data of two fragments, which effectively controls the data processing pressure on each node. In addition, the number of nodes and the number of data fragments in the distributed cluster shown in fig. 5 and fig. 6 are only examples and should not be understood as a specific limitation to the embodiments of the present disclosure.
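The takeover outcome described for fig. 6 can be checked with a short sketch. The node numbers and the takeover pairs are taken from the description above; representing them as a dictionary is an assumption made for illustration.

```python
# abnormal node -> normal node that takes it over, as described for fig. 6
takeovers = {5: 8, 6: 1, 7: 9, 2: 10, 4: 3}

all_nodes = range(1, 11)  # nodes 1..10, each originally owning the same-numbered fragment

# Every surviving node starts with its own fragment.
assignment = {node: [node] for node in all_nodes if node not in takeovers}

# Each surviving node additionally receives the fragment of the node it took over.
for failed, successor in takeovers.items():
    assignment[successor].append(failed)

covered = sorted(f for fragments in assignment.values() for f in fragments)
```

After the takeover, all ten fragments are still covered and each of the five normal nodes is responsible for exactly two fragments.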

Fig. 7 is a flowchart of a data processing method provided in an embodiment of the present disclosure, which is applied to nodes in a distributed cluster. The embodiment of the disclosure can be applied to the situation that data in the data source device is reasonably transmitted to the service processing device in a high-concurrency data scene. The method provided by the embodiments of the present disclosure may be performed by a data processing apparatus, which may be implemented in software and/or hardware.

The data processing method provided by the embodiment of the present disclosure is the same as the implementation logic of the foregoing data processing system, and the details not described in the following embodiments may refer to the description in the foregoing embodiments.

As shown in fig. 7, a data processing method provided in an embodiment of the present disclosure may include:

S501, receiving data fragments sent by data source equipment; and the data fragments are sent by the data source equipment based on a preset data fragment distribution strategy.

S502, preprocessing the data fragments and sending the preprocessed data fragments to the service processing equipment.

Optionally, the method provided by the embodiment of the present disclosure further includes:

sending a data subscription request to data source equipment; the data subscription request is used for requesting to establish a corresponding relation with the data fragments for the local node.

Optionally, the method further includes: sending the running information of the local node to the management equipment; the running information is used for the management equipment to determine the running state of the local node, and after the management equipment determines that the local node is in the abnormal running state, the management equipment determines the node with the normal running state for the local node from the distributed cluster.

sending a registration request to the management device; the registration request is used to request the management device to record information of the local node, where the information may include any information related to the current node, such as a node identifier, a node IP address, operation information, and a correspondence between a data fragment and the current node.

Optionally, the sending the preprocessed data fragments to the service processing device includes:

and transmitting the preprocessed data fragments to the service processing equipment by calling message middleware.

In the embodiment of the disclosure, a distributed cluster is added between the data source device and the service processing device. The distributed cluster preprocesses the data fragments from the data source device, where the preprocessing operation depends on the service requirement, and then sends the preprocessed data fragments to the service processing device. The nodes in the distributed cluster differ in when they receive data fragments, in how long they spend preprocessing them, and in their data transmission speed to the service processing device. As a result, the large number of data fragments from the data source device are forwarded to the service processing device at different times, which reduces the amount of data transmitted to the service processing device in a short time. That is, for a service scene with high data concurrency, the distributed cluster achieves a peak clipping effect on the full data transmission between the data source device and the service processing device. This solves the problem in existing schemes that the service processing device consumes a large amount of resources because it receives the full data in a short time, reduces the resource occupation of the service processing device, further improves its service processing performance, and ensures that service processing proceeds normally. In addition, by using the management device, the embodiment of the present disclosure provides an effective disaster recovery mechanism for the data processing system, thereby improving the stability and availability of the data processing system and ensuring the consistency of data in the data processing system.

Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure, where the data processing apparatus may be configured as a node in a distributed cluster, and the data processing apparatus may be implemented by software and/or hardware.

As shown in fig. 8, the data processing apparatus provided in the embodiment of the present disclosure may include a data receiving module 601 and a data sending module 602, where:

a data receiving module 601, configured to receive data fragments sent by a data source device; the data fragmentation is sent by the data source equipment based on a preset data fragmentation distribution strategy;

the data sending module 602 is configured to pre-process the data fragments, and send the pre-processed data fragments to the service processing device.

Optionally, the apparatus provided in the embodiment of the present disclosure further includes:

a subscription request sending module, configured to send a data subscription request to a data source device; the data subscription request is used for requesting to establish a corresponding relation with the data fragments for the local node.

the running information sending module is used for sending the running information of the local node to the management equipment; the running information is used for the management equipment to determine the running state of the local node, and after the management equipment determines that the local node is in the abnormal running state, the management equipment determines the node with the normal running state for the local node from the distributed cluster.

a registration request sending module, configured to send a registration request to the management device; the registration request is used to request the management device to record information of the local node, where the information may include any information related to the current node, such as a node identifier, a node IP address, operation information, and a correspondence between a data fragment and the current node.

Optionally, the data sending module 602 is specifically configured to: and preprocessing the data fragments, and transmitting the preprocessed data fragments to the service processing equipment by calling message middleware.

The data processing device configured to the nodes in the distributed cluster provided by the embodiment of the present disclosure can execute the data processing method applied to the nodes in the distributed cluster provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method. Reference may be made to the description of any method embodiment or any system embodiment of the disclosure for those details not described in the apparatus embodiments of the disclosure.

Fig. 9 is a flowchart of another data processing method provided in the embodiment of the present disclosure, which is applied to a data source device. The method provided by the embodiment of the present disclosure may be executed by a data processing apparatus, which may be implemented by software and/or hardware, and may be integrated on a data source device. The data processing method provided by the embodiment of the present disclosure is the same as the implementation logic of the foregoing data processing system, and the details not described in the following embodiments may refer to the description in the foregoing embodiments.

As shown in fig. 9, a data processing method provided in an embodiment of the present disclosure may include:

S801, carrying out fragmentation processing on the source data to obtain data fragments.

Specifically, any available data fragmentation policy may be used to perform fragmentation processing on the source data.

S802, based on a preset data fragment distribution strategy, the data fragments are sent to first nodes corresponding to the data fragments in the distributed cluster.

The first node is used for preprocessing the data fragments after receiving the data fragments and sending the preprocessed data fragments to the service processing equipment. The preset data fragment distribution policy may also be determined according to needs, and the embodiment of the present disclosure is not particularly limited.
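Steps S801 and S802 can be sketched as follows. Since the disclosure leaves both the fragmentation policy and the distribution policy open, the sketch assumes hash-based fragmentation on a hypothetical `user_id` field and a fixed fragment-to-node table; the `send` callback and all names are illustrative.

```python
import zlib
from collections import defaultdict


def fragment_records(records, fragment_count):
    """S801 (assumed policy): group records by a stable hash of the user id."""
    fragments = defaultdict(list)
    for record in records:
        key = record["user_id"].encode("utf-8")
        fragments[zlib.crc32(key) % fragment_count].append(record)
    return dict(fragments)


def distribute(fragments, fragment_to_node, send):
    """S802: send each fragment to the first node that corresponds to it."""
    for fragment_id, fragment_records_ in fragments.items():
        send(fragment_to_node[fragment_id], fragment_id, fragment_records_)


records = [
    {"user_id": "u1", "event": "login"},
    {"user_id": "u2", "event": "click"},
    {"user_id": "u1", "event": "logout"},
    {"user_id": "u3", "event": "login"},
]
fragment_to_node = {0: "node-1", 1: "node-2", 2: "node-3"}

fragments = fragment_records(records, fragment_count=3)
sent = []
distribute(fragments, fragment_to_node,
           lambda node, fid, recs: sent.append((node, fid, len(recs))))
```

A stable hash keeps all records of one user in the same fragment, so the same node always sees that user's data; any other policy could be substituted without changing `distribute`.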

Optionally, the method provided by the embodiment of the present disclosure may further include:

and receiving a data subscription request sent by the first node, and establishing a corresponding relation with the data fragments for the first node.

receiving a corresponding relation between a first node and a second node sent by a management device;

establishing a corresponding relation between a target data fragment corresponding to the first node and the second node based on the corresponding relation between the first node and the second node, and deleting the corresponding relation between the first node and the target data fragment;

the current running state of the first node is an abnormal running state, and the second node is a node with a normal running state determined for the first node from the distributed cluster by the management device, that is, the second node may be used to take over the data fragmentation subscribed by the first node.
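The adjustment of correspondences on the data source device can be sketched as a mapping update. The dictionary layout and the function name are assumptions; the sketch only shows that the target data fragment is re-pointed from the first node to the second node.

```python
def apply_takeover(fragment_to_node, first_node, second_node):
    """Re-point every fragment of the abnormal first node to the second node."""
    updated = dict(fragment_to_node)  # leave the caller's mapping untouched
    for fragment_id, node in fragment_to_node.items():
        if node == first_node:
            updated[fragment_id] = second_node
    return updated


before = {"A": "node-1", "B": "node-2", "C": "node-3"}
after = apply_takeover(before, first_node="node-1", second_node="node-2")
```

Replacing the entry both establishes the new correspondence and deletes the old one in a single step, so fragment `A` is never mapped to two nodes at once.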

By the technical scheme of the embodiment of the disclosure, the data volume transmitted to the service processing equipment in a short time is reduced, that is, for a service scene with high data concurrency, the distributed cluster realizes a peak clipping effect on the full data transmission between the data source equipment and the service processing equipment, reduces the resource occupation on the service processing equipment, and further improves the service processing performance in the service processing equipment.

Fig. 10 is a schematic structural diagram of another data processing apparatus provided in the embodiment of the present disclosure, where the data processing apparatus may be configured in a data source device, and the data processing apparatus may be implemented by software and/or hardware.

As shown in fig. 10, the data processing apparatus provided in the embodiment of the present disclosure may include a fragment processing module 1001 and a fragment sending module 1002, where:

a fragment processing module 1001, configured to perform fragment processing on source data to obtain data fragments;

the fragment sending module 1002 is configured to send a data fragment to a first node corresponding to the data fragment in the distributed cluster based on a preset data fragment distribution strategy;

the first node is used for preprocessing the data fragments after receiving the data fragments and sending the preprocessed data fragments to the service processing equipment.

Optionally, the apparatus provided in the embodiment of the present disclosure may further include:

and the subscription request receiving module is used for receiving the data subscription request sent by the first node and establishing the corresponding relation with the data fragments for the first node.

the corresponding relation receiving module is used for receiving the corresponding relation between the first node and the second node sent by the management equipment;

the corresponding relation adjusting module is used for establishing the corresponding relation between the target data fragment corresponding to the first node and the second node based on the corresponding relation between the first node and the second node, and deleting the corresponding relation between the first node and the target data fragment;

The data processing device configured in the data source equipment provided by the embodiment of the disclosure can execute any data processing method applied to the data source equipment provided by the embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method. Reference may be made to the description of any method embodiment or any system embodiment of the disclosure for those details not described in the apparatus embodiments of the disclosure.

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, which is used to exemplarily illustrate an electronic device that executes a data processing method according to the embodiment of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 11, the electronic device 700 may include a processing device (or referred to as a processor, e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage device 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, microphone, accelerometer, gyroscope, camera, etc.; output devices 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 11 illustrates an electronic device 700 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 709, or may be installed from the storage device 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive data fragments sent by a data source device, the data fragments being sent by the data source device based on a preset data fragment distribution strategy; and preprocess the data fragments and send the preprocessed data fragments to the service processing equipment.
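The node-side steps above (receive a data fragment, preprocess it, forward the result) can be illustrated with a minimal sketch. The JSON record layout, the field names `user` and `event`, and the `send` callback standing in for the message middleware are all assumptions made for illustration; the disclosure does not specify them.

```python
import json
from typing import Callable, Optional

# Hypothetical stand-in for a message middleware client; the disclosure names none.
Sender = Callable[[dict], None]


def preprocess(raw_fragment: bytes) -> Optional[dict]:
    """Parse, screen, and assemble one data fragment (assumed JSON layout)."""
    record = json.loads(raw_fragment)      # data analysis: parse the raw bytes
    if record.get("event") is None:        # data screening: drop records without an event
        return None
    # data assembly: keep only the fields the service side is assumed to need
    return {"user": record.get("user"), "event": record["event"]}


def handle_fragment(raw_fragment: bytes, send: Sender) -> bool:
    """Preprocess a received fragment and forward it; return True if it was forwarded."""
    processed = preprocess(raw_fragment)
    if processed is None:
        return False
    send(processed)
    return True
```

Passing `list.append` of an ordinary list as `send` is enough to try the sketch; in a deployment the callback would wrap whatever middleware producer is in use.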

Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: perform fragmentation processing on source data to obtain data fragments; and, based on a preset data fragment distribution strategy, send each data fragment to the first node corresponding to that data fragment in the distributed cluster, wherein the first node is used for preprocessing the data fragment after receiving it and sending the preprocessed data fragment to the service processing equipment.
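The data-source-side steps above can be sketched in the same spirit. The hash-based fragmentation rule, the `fragment_to_node` table, and the `send` callback are assumptions chosen for illustration; the disclosure only requires that some preset distribution strategy map each data fragment to a first node.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def fragment_source_data(records: Iterable[str], num_fragments: int) -> Dict[int, List[str]]:
    """Split records into fragments by a stable hash of each record (assumed rule)."""
    fragments: Dict[int, List[str]] = defaultdict(list)
    for record in records:
        fragment_id = zlib.crc32(record.encode("utf-8")) % num_fragments
        fragments[fragment_id].append(record)
    return dict(fragments)


def dispatch(
    fragments: Dict[int, List[str]],
    fragment_to_node: Dict[int, str],
    send: Callable[[str, int, List[str]], None],
) -> None:
    """Send every fragment to the node it corresponds to under the distribution strategy."""
    for fragment_id, records in fragments.items():
        node = fragment_to_node[fragment_id]   # lookup in the preset correspondence
        send(node, fragment_id, records)
```

Keeping the correspondence in a plain table means that reassigning a fragment to another node is a single table update, with no change to the fragmentation rule itself.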

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module or unit does not in some cases form a limitation on the module or unit itself, for example, a data receiving module may also be described as a "module for receiving data fragments sent by a data source device".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data processing system, comprising:

2. The system of claim 1, wherein the first node is further configured to:

sending a data subscription request to the data source device, wherein the data subscription request is used for requesting that a correspondence be established between the first node and a data fragment.

3. The system of claim 1, further comprising a management device configured to:

acquiring operation information of the first node, and determining an operation state of the first node based on the operation information;

if the first node is determined to be in an abnormal operation state, determining a second node with a normal operation state for the first node from the distributed cluster, and sending the corresponding relation between the first node and the second node to the data source equipment;

correspondingly, the data source device is further configured to establish a correspondence between a target data fragment corresponding to the first node and the second node based on the correspondence between the first node and the second node, and delete the correspondence between the first node and the target data fragment.

4. The system of claim 3, wherein the management device is further configured to:

after the first node is determined to be in an abnormal operation state, sending a node abnormal event of the first node to the remaining nodes except the first node in the distributed cluster;

correspondingly, the remaining nodes are configured to determine the receiving time of the node abnormal event, and send a takeover request to the management device after a preset time interval; wherein the preset time interval takes the receiving time of the node abnormal event as a time starting point;

the management device is specifically configured to determine the second node for the first node from the remaining nodes according to the time of receiving the takeover request.

5. The system of claim 4, wherein the remaining nodes are further configured to:

after the node abnormal event is received, determining the number of abnormal nodes corresponding to the local node;

and if the number of the abnormal nodes is smaller than the number threshold, sending the takeover request to the management equipment after the preset time interval.

6. The system of claim 1, wherein the nodes in the distributed cluster are specifically configured to:

and sending the preprocessed data fragments to the service processing equipment by calling message middleware.

7. The system of claim 1, wherein the pre-processing comprises at least one of: data analysis, data assembly, data association and data screening.

8. A data processing method applied to a node in a distributed cluster, the method comprising:

9. The method of claim 8, further comprising:

sending the running information of the local node to the management equipment;

the running information is used for the management equipment to determine the running state of the local node, and after the management equipment determines that the local node is in an abnormal running state, the management equipment determines a node with a normal running state for the local node from the distributed cluster.

10. A data processing method applied to a data source device, the method comprising:

carrying out fragmentation processing on source data to obtain data fragments;

based on a preset data fragment distribution strategy, sending the data fragment to a first node corresponding to the data fragment in a distributed cluster;

11. A data processing apparatus configured for nodes in a distributed cluster, the apparatus comprising:

12. A data processing apparatus, arranged at a data source device, the apparatus comprising:

13. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the data processing method of any one of claims 8 to 9, or to implement the data processing method of claim 10.

14. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the data processing method of any one of claims 8 to 9, or implements the data processing method of claim 10.
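For illustration only, the node takeover flow recited in claims 3 to 5 can be sketched as follows. This is a simplified simulation under stated assumptions, not the claimed implementation: time is modeled as plain numbers instead of real timers, network latency is taken as zero so a takeover request is received at the moment it is sent, each node has its own preset interval `delay`, and the names `Node`, `choose_second_node`, and `remap_fragments` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    delay: float          # preset interval, counted from receipt of the abnormal event
    taken_over: int = 0   # number of abnormal nodes this node already covers

    def takeover_request_time(self, event_received_at: float, threshold: int) -> Optional[float]:
        """Return when this node sends its takeover request, or None if it abstains."""
        if self.taken_over >= threshold:   # claim 5: request only while below the threshold
            return None
        return event_received_at + self.delay


def choose_second_node(remaining: List[Node], event_received_at: float, threshold: int) -> Optional[Node]:
    """Management side: pick the node whose takeover request arrives first."""
    candidates = []
    for node in remaining:
        sent_at = node.takeover_request_time(event_received_at, threshold)
        if sent_at is not None:
            candidates.append((sent_at, node))
    if not candidates:
        return None
    # Ties are broken by node name so the choice is deterministic (an assumption).
    _, winner = min(candidates, key=lambda item: (item[0], item[1].name))
    winner.taken_over += 1
    return winner


def remap_fragments(fragment_to_node: Dict[int, str], first: str, second: str) -> Dict[int, str]:
    """Data source side: move the abnormal first node's fragments to the second node."""
    return {frag: (second if node == first else node) for frag, node in fragment_to_node.items()}
```

In this sketch a node that already covers as many abnormal nodes as the threshold allows simply stays silent, so the choice falls to the remaining node with the shortest interval; the data source then only has to rewrite its fragment-to-node table.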