CN115629918B - Data processing method, device, electronic equipment and storage medium - Google Patents

Publication number
CN115629918B
Authority
CN
China
Prior art keywords
data
standby
water level
level time
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211305396.1A
Other languages
Chinese (zh)
Other versions
CN115629918A (en)
Inventor
谭伟良
程怡
李成吉
龙飞
尚忠彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211305396.1A
Publication of CN115629918A
Application granted
Publication of CN115629918B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/1658: Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant

Abstract

The disclosure provides a data processing method, a data processing apparatus, an electronic device and a storage medium, and relates to the technical field of big data. The implementation scheme is as follows: in response to a primary device ceasing to process a data stream, obtaining first data in the data stream that has been processed by the primary device; based on the first data, deduplicating second data in the data stream that is to be processed by a standby device; processing the deduplicated second data; and in response to the data processing progress of the standby device leading that of the primary device, taking the standby device as the primary device for processing the data stream.

Description

Data processing method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, in particular to the field of big data technology, and more specifically to a data processing method and apparatus, an electronic device, a computer readable storage medium, and a computer program product.
Background
A data stream, also called an event stream, is an unbounded data set. New data is added to the data stream over time, so the data in the data stream grows continuously and indefinitely. Common data streams include user behavior data in an application (App) (e.g., browsing, clicking, paying), network switch traffic data, sensor-collected data, etc.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a data processing method and apparatus, an electronic device, a computer readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a data processing method applied to a standby device, the standby device and a primary device being configured to process the same data stream, the data processing progress of the standby device lagging behind that of the primary device, the method comprising: in response to the primary device ceasing to process the data stream, obtaining first data in the data stream that has been processed by the primary device; based on the first data, deduplicating second data in the data stream that is to be processed by the standby device; processing the deduplicated second data; and in response to the data processing progress of the standby device leading that of the primary device, taking the standby device as the primary device for processing the data stream.
According to an aspect of the present disclosure, there is provided a data processing apparatus applied to a standby device, the standby device and a primary device being configured to process the same data stream, the data processing progress of the standby device lagging behind that of the primary device, the apparatus comprising: an acquisition module configured to acquire first data in the data stream that has been processed by the primary device in response to the primary device stopping processing the data stream; a deduplication module configured to deduplicate, based on the first data, second data in the data stream that is to be processed by the standby device; a processing module configured to process the deduplicated second data; and a primary-standby switching module configured to take the standby device as the primary device for processing the data stream in response to the data processing progress of the standby device leading that of the primary device.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method described above.
According to an aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above-described data processing method.
According to an aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above-described data processing method.
According to one or more embodiments of the present disclosure, end-to-end exactly-once consistency of data states (from the data processing system to the downstream system) can be ensured when a primary-standby switch of the data processing system occurs.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to some embodiments of the present disclosure;
FIG. 2 illustrates a flow chart of a data processing method according to some embodiments of the present disclosure;
FIG. 3 illustrates a schematic diagram of high water level times and low water level times according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of first data according to some embodiments of the present disclosure;
FIG. 5 illustrates a block diagram of a data processing apparatus according to some embodiments of the present disclosure; and
FIG. 6 illustrates a block diagram of an exemplary electronic device that can be used to implement some embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another element. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.
A stream processing system is used for processing data streams. It receives a data stream sent from an upstream system (such as a business system), processes the data stream, and outputs the processing result to a downstream system (such as a sink system or an application system). In a stream computing scenario, end-to-end state consistency includes state consistency between the stream processing system and the downstream system, i.e., consistency in data state between the two after the stream processing system experiences an exception and recovers. The degree of state consistency between a stream processing system and a downstream system can be described by the following three semantics:
1. At least once: each piece of data is processed at least once. After the stream processing system encounters an exception and recovers, some data may be processed repeatedly.
2. At most once: each piece of data is processed at most once. Data is not processed while the stream processing system is in an abnormal state, so some data may be lost after the system recovers.
3. Exactly once: each piece of data is processed once and only once. After the stream processing system encounters an exception and recovers, data is neither processed repeatedly nor lost.
It will be appreciated that "exactly once" is the most stringent state consistency requirement among the three semantics described above.
For important streaming services (e.g., billing services), in order to ensure high availability, stream processing systems typically employ multiple redundant computing modules (e.g., a primary device and a standby device) that each perform the computation, while ensuring that only one computing module (i.e., the primary device) sends computation results to downstream systems. When the primary device becomes abnormal, a primary-standby switch is needed, in which the standby device takes over from the current primary device to provide service. For important streaming services such as billing, end-to-end inconsistency (i.e., duplication or loss of data) is unacceptable. Therefore, during the primary-standby switch, exactly-once consistency between the stream processing system and the downstream system must be ensured.
In the related art, exactly-once consistency between a stream processing system and a downstream system is typically achieved through idempotent writes to the downstream system. With idempotent writes, data may be processed multiple times by the stream processing system, but the state in the downstream system is updated only once. This scheme relies on the idempotent write capability of the downstream system and therefore has poor generality. If the downstream system lacks idempotent write capability, data may be processed repeatedly, and end-to-end exactly-once consistency cannot be ensured.
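The limitation described above can be illustrated with a minimal Python sketch. Both sink classes are hypothetical stand-ins for a downstream system and are not part of the disclosed method: `IdempotentSink` ignores a record whose identifier it has already applied, while `AppendOnlySink` has no such capability. Replaying the same record, as could happen after a switch without deduplication, leaves the first sink correct and double-counts in the second.

```python
class IdempotentSink:
    """Hypothetical downstream store that applies each record ID at most once."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def write(self, record_id, amount):
        if record_id in self.seen:
            return  # duplicate delivery: state is left unchanged
        self.seen.add(record_id)
        self.total += amount


class AppendOnlySink:
    """Hypothetical downstream store without idempotent write capability."""

    def __init__(self):
        self.total = 0

    def write(self, record_id, amount):
        self.total += amount  # every delivery changes the state


def replay(sink, records):
    """Deliver every record to the sink and return the resulting total."""
    for record_id, amount in records:
        sink.write(record_id, amount)
    return sink.total


# Record "a" is delivered twice, as in an at-least-once replay.
records = [("a", 10), ("b", 5), ("a", 10)]
idempotent_total = replay(IdempotentSink(), records)
append_only_total = replay(AppendOnlySink(), records)
```

The idempotent sink ends at the correct total of 15, while the append-only sink reports 25, which is the duplication that a billing service cannot tolerate.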
In view of the above problems, an embodiment of the present disclosure provides a data processing method that optimizes the primary-standby switching process of a data processing system, so as to ensure end-to-end exactly-once consistency from the data processing system to a downstream system.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, according to some embodiments of the present disclosure. Referring to FIG. 1, the system 100 includes one or more client devices 111, 112, 113, 114, 115, and 116, an upstream business system 120, a data processing system 130, and a downstream storage system 140. Client devices 111-116, upstream business system 120, data processing system 130, and downstream storage system 140 may communicate via a network.
Client devices 111-116 may provide interfaces that enable users of the client devices to interact with the client devices. The client device may also output information to the user via the interface. Although FIG. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.
Client devices 111-116 may include various types of computer devices, such as portable handheld devices, general-purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, vehicle devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head mounted displays (such as smart glasses) and other devices. The gaming system may include various handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Client devices 111-116 may be configured to execute one or more application programs. Accordingly, the upstream business system 120 may be a server to which the applications correspond.
In some embodiments, the client devices 111-116 send data (also referred to as "events") to the upstream business system 120 based on user interactions, such as behavior data for browsing, clicking, downloading, and payment. The upstream business system 120 receives data over time, and the received data forms a data stream to be processed. Each piece of data in the data stream includes an event time and a globally unique identifier. The event time represents the time at which the corresponding data was generated.
The data stream generated by the upstream business system 120 is sent to the data processing system 130 for processing. The data processing system 130 is a stream processing system.
In embodiments of the present disclosure, data processing system 130 includes a primary device 132 and a standby device 134, both of which are configured to process data streams, thereby improving the availability and disaster recovery capability of data processing system 130.
It should be noted that the data processing system 130 may include one or more primary devices 132 and may also include one or more standby devices 134. In the case where the data processing system 130 includes a plurality of primary devices 132, the plurality of primary devices 132 form a primary cluster. In the case where the data processing system 130 includes a plurality of standby devices 134, the plurality of standby devices 134 form a standby cluster.
In an embodiment of the present disclosure, the data stream generated by the upstream business system 120 is sent to both the primary device 132 and the standby device 134. The data received by the primary device 132 and the standby device 134 are identical (both receive the complete data stream). The primary device 132 and the standby device 134 each process the data stream, and the standby device 134 acts as a backup for the primary device 132, with its data processing progress lagging behind that of the primary device 132.
It should be noted that the data streams received by the primary device 132 and the standby device 134 may be out of order to some extent. For example, due to network conditions or the like, data A having a later event time may arrive at the primary device 132 or the standby device 134 earlier than data B having an earlier event time.
In an embodiment of the present disclosure, the primary device 132 processes the data stream and writes the data processing results to the downstream storage system 140. The downstream storage system 140 may be a temporary storage system (e.g., a message queue) or a persistent storage system (e.g., a distributed file system or an object storage system).
According to some embodiments, the downstream storage system 140 may include a plurality of groups (i.e., storage locations). For example, where the downstream storage system 140 is a message queue, a group may be a partition within a message topic. Where the downstream storage system 140 is a distributed file system, a group may be a file path or a file fragment under a file path.
The primary device 132 may determine the group corresponding to a piece of data based on the identifier of the data and write the processing result of the data to the corresponding group. Thus, the storage location of each piece of data in the data stream is fixed.
According to some embodiments, each group of the downstream storage system 140 may correspond to a different hash value. The primary device 132 may calculate a hash value of the identifier of the data and store the data into the corresponding group based on the hash value. Determining groups based on hash values allows data to be distributed relatively evenly among the groups, thereby improving the utilization efficiency of the downstream storage system 140.
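One way to sketch this mapping in Python is shown below. The function name `group_for`, the choice of SHA-256, and the modulo step are all assumptions made for illustration; the disclosure does not specify a hash function. A stable digest is used instead of Python's built-in `hash()`, because the built-in is salted per process for strings and would not keep the storage location of a record fixed across restarts or across devices.

```python
import hashlib


def group_for(record_id: str, num_groups: int) -> int:
    """Map a globally unique identifier to a fixed group index (illustrative)."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    # SHA-256 is an assumed choice; any stable hash gives the same property.
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then fold into range.
    return int.from_bytes(digest[:8], "big") % num_groups


# The same identifier always lands in the same group.
first_lookup = group_for("order-12345", 8)
second_lookup = group_for("order-12345", 8)
```

Because the mapping depends only on the identifier and the number of groups, the primary device and the standby device would compute the same group for the same record.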
According to some embodiments, system 100 further includes a standby storage system 150. The standby device 134 processes the data stream and writes the data processing results to the standby storage system 150. The standby storage system 150 may be a temporary storage system (e.g., a message queue) or a persistent storage system (e.g., a distributed file system or an object storage system). In the event that system 100 does not include a standby storage system 150, the standby device 134 may instead discard its data processing results.
According to some embodiments, the standby storage system 150 may also include multiple groups, similar to the downstream storage system 140. The standby device 134 may determine the group to which a piece of data corresponds based on the identifier of the data and write the processing result of the data into the corresponding group of the standby storage system 150. The process by which the standby device 134 writes data processing results into the groups of the standby storage system 150 is similar to the process by which the primary device 132 writes data processing results into the groups of the downstream storage system 140, and will not be described in detail herein.
In an embodiment of the present disclosure, the data processing system 130 may perform a primary-standby switch, i.e., the original standby device 134 becomes the primary device for processing the data stream, and the original primary device 132 becomes the standby device. When data processing system 130 requires a primary-standby switch (e.g., when primary device 132 fails or its processing performance fails to meet expectations), primary device 132 stops processing the data stream, and standby device 134 may execute data processing method 200 of embodiments of the present disclosure to effect the switch and ensure exactly-once consistency from data processing system 130 to downstream storage system 140.
FIG. 2 shows a flow chart of a data processing method 200 according to an embodiment of the present disclosure. The method 200 is applied to a standby device, i.e., the execution subject of the method 200 is a standby device (e.g., the standby device 134 in FIG. 1). As described above, the standby device and the primary device are configured to process the same data stream, and the data processing progress of the standby device lags behind that of the primary device.
As shown in FIG. 2, the method 200 includes steps S210-S240.
In step S210, in response to the primary device stopping processing the data stream, first data in the data stream that has been processed by the primary device is acquired.
In step S220, the second data in the data stream that is to be processed by the standby device is deduplicated based on the first data.
In step S230, the deduplicated second data is processed.
In step S240, in response to the data processing progress of the standby device leading that of the primary device, the standby device is taken as the primary device for processing the data stream.
According to the embodiment of the disclosure, end-to-end exactly-once consistency can be ensured when the data processing system performs a primary-standby switch. After the primary device stops processing the data stream, the standby device acquires the data already processed by the primary device (namely, the first data) and uses it to deduplicate the data it has yet to process (namely, the second data), so that the data processed by the standby device after the primary device stops serving neither duplicates the data processed by the primary device nor omits any data.
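Steps S210-S240 can be sketched roughly as follows. This is an illustrative Python outline under several assumptions that are not stated in the disclosure: records are deduplicated by their globally unique identifier, event times are modeled as integers, progress is compared through event time, and `Record`, `take_over`, `primary_high`, and `emit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    record_id: str   # globally unique identifier
    event_time: int  # event time, modeled as an integer for illustration


def take_over(first_data: Iterable[Record],
              second_data: Iterable[Record],
              primary_high: int,
              emit: Callable[[Record], None]) -> str:
    """Run S220-S240 on the standby device; first_data is the result of S210."""
    processed_ids = {r.record_id for r in first_data}
    role = "standby"
    for record in second_data:
        # S220: skip anything the primary device has already processed.
        if record.record_id in processed_ids:
            continue
        # S230: process the deduplicated record (here: hand it to emit).
        emit(record)
        # S240: once progress passes that of the stopped primary, take over.
        if record.event_time > primary_high:
            role = "primary"
    return role


emitted: List[str] = []
role = take_over(
    first_data=[Record("a", 1), Record("b", 2)],
    second_data=[Record("a", 1), Record("b", 2), Record("c", 3)],
    primary_high=2,
    emit=lambda r: emitted.append(r.record_id),
)
```

In this run only record "c" is emitted, and the standby device ends in the primary role, since "a" and "b" were already handled by the primary device before it stopped.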
The steps of method 200 are described in detail below.
According to some embodiments, the operational status of the primary device may be monitored during operation of the data processing system. When a failure of the primary device is detected, or its data processing performance fails to meet expectations, a primary-standby switch is triggered, and the primary device stops processing the data stream and waits to be replaced by the standby device.
For step S210, the standby device acquires first data in the data stream that has been processed by the primary device in response to the primary device stopping processing the data stream. The first data is the data set obtained by tracing back over a certain time window, starting from the latest piece of data processed by the primary device. It can be understood that if the backtracking time window is too short, complete deduplication of the data to be processed by the standby device cannot be guaranteed; if the backtracking time window is too long, the first data becomes too large, occupies more memory on the standby device, and takes longer to trace back, which prolongs the interruption of the data processing service and reduces the availability of the data processing system.
According to some embodiments, the first data may be determined according to the following steps S212-S216:
In step S212, a high water level time of the primary device is acquired, which indicates the maximum event time of the data processed by the primary device. In other words, the high water level time is the event time of the latest piece of data that the primary device has processed.
In step S214, a low water level time of the standby device is acquired, the low water level time indicating that all data in the data stream whose event time is less than or equal to the low water level time has been processed by the standby device.
In step S216, data having an event time greater than or equal to the low water level time of the standby device and less than or equal to the high water level time of the primary device is taken as the first data.
The first data determined in steps S212-S216 is the minimum set of data needed for deduplication. It ensures that the data to be processed by the standby device is completely deduplicated while maintaining data processing efficiency, shortens the interruption of the data processing service as much as possible, and makes the primary-standby switch imperceptible to the downstream system.
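The window selection in step S216 amounts to a closed-interval filter on event time. The sketch below is a minimal Python illustration; the function name `select_first_data`, the tuple layout `(identifier, event_time)`, and the integer event times are assumptions made for the example.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (identifier, event_time); layout is illustrative


def select_first_data(primary_processed: List[Record],
                      standby_low: int,
                      primary_high: int) -> List[Record]:
    """Keep records with standby_low <= event_time <= primary_high (S216)."""
    return [
        (record_id, event_time)
        for record_id, event_time in primary_processed
        if standby_low <= event_time <= primary_high
    ]


# Records the primary device processed before it stopped.
primary_processed = [("a", 1), ("b", 3), ("c", 5), ("d", 8)]
# The standby device has processed everything up to event time 3;
# the latest record processed by the primary device has event time 8.
first_data = select_first_data(primary_processed, standby_low=3, primary_high=8)
```

Record "a" falls below the low water level time of the standby device, so both sides have already processed it and it does not need to be traced back.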
It should be noted that, in the above embodiment, only the high water level time of the primary device and the low water level time of the standby device are used in determining the first data. According to other embodiments, in addition to the high water level time of the primary device and the low water level time of the standby device, the low water level time of the primary device and the high water level time of the standby device are also acquired. Although the latter two are not used to determine the first data, both may serve as references for the operating state of the system.
According to some embodiments, for step S212, where there are a plurality of primary devices (forming a primary cluster), the high water level time of the primary device may be determined as follows: acquiring the maximum event time of the data processed by each of the plurality of primary devices to obtain a plurality of first maximum event times; and taking the maximum value of the first maximum event times as the high water level time of the primary device. Further, the minimum value among the plurality of first maximum event times may be taken as the low water level time of the primary device.
Similarly, according to some embodiments, for step S214, where there are a plurality of standby devices (forming a standby cluster), the low water level time of the standby device may be determined as follows: acquiring the maximum event time of the data processed by each of the plurality of standby devices to obtain a plurality of second maximum event times; and taking the minimum value of the second maximum event times as the low water level time of the standby device. Further, the maximum value of the plurality of second maximum event times may be taken as the high water level time of the standby device.
According to the above embodiment, for a cluster (e.g., a primary cluster or a standby cluster), the high water level time of the cluster refers to the fastest data processing progress among the devices within the cluster, and the low water level time refers to the slowest data processing progress among the devices within the cluster. According to some embodiments, since the standby cluster serves as a backup for the primary cluster, the fastest data processing progress of the standby cluster cannot exceed the slowest data processing progress of the primary cluster, i.e., the high water level time of the standby cluster is less than or equal to the low water level time of the primary cluster.
FIG. 3 illustrates a schematic diagram of a high water level time and a low water level time according to some embodiments of the present disclosure. In the embodiment shown in FIG. 3, the primary cluster comprises four primary devices, primary device 1 to primary device 4. Rectangular boxes 311-314 in FIG. 3 represent the data processing progress of primary devices 1-4, respectively. As shown by rectangular boxes 311-314, the maximum event times (i.e., first maximum event times) of the data processed by primary devices 1-4 are t7, t4, t5, and t6, respectively. Among t4-t7, the maximum value t7 is the high water level time of the primary device (primary cluster), and the minimum value t4 is the low water level time of the primary device (primary cluster).
The standby cluster comprises three standby devices, standby device 1 to standby device 3. Rectangular boxes 321-323 in FIG. 3 represent the data processing progress of standby devices 1-3, respectively. As shown by rectangular boxes 321-323, the maximum event times (i.e., second maximum event times) of the data processed by standby devices 1-3 are t1, t3, and t2, respectively. Among t1-t3, the maximum value t3 is the high water level time of the standby device (standby cluster), and the minimum value t1 is the low water level time of the standby device (standby cluster).
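The FIG. 3 example can be reproduced with a short Python sketch. The integers 1 through 7 stand in for t1 through t7, which is an assumption made only so the values can be compared; the function name `water_levels` is likewise hypothetical.

```python
from typing import Sequence, Tuple


def water_levels(max_event_times: Sequence[int]) -> Tuple[int, int]:
    """Return (low, high) water level times from per-device maximum event times."""
    if not max_event_times:
        raise ValueError("cluster must contain at least one device")
    # Low water level: slowest device. High water level: fastest device.
    return min(max_event_times), max(max_event_times)


# Primary devices 1-4 have processed up to t7, t4, t5, t6.
primary_low, primary_high = water_levels([7, 4, 5, 6])
# Standby devices 1-3 have processed up to t1, t3, t2.
standby_low, standby_high = water_levels([1, 3, 2])

# Window used to select the first data in step S216.
first_data_window = (standby_low, primary_high)
```

With these values the primary cluster has water level times (t4, t7), the standby cluster has (t1, t3), and the first data covers event times from t1 to t7. The standby high water level time t3 stays below the primary low water level time t4, matching the constraint stated above.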
After the high water level time of the primary device and the low water level time of the standby device are determined in steps S212 and S214, data having an event time greater than or equal to the low water level time of the standby device and less than or equal to the high water level time of the primary device may be taken as the first data in step S216. The first data is the minimum set of data needed for deduplication, so the data to be processed by the standby device can be completely deduplicated while data processing efficiency is maintained, the interruption of the data processing service is shortened as much as possible, and the primary-standby switch is imperceptible to the downstream system.
The rationale for taking data having an event time greater than or equal to the low water level time of the standby device and less than or equal to the high water level time of the primary device as the first data is analyzed below with reference to FIG. 4.
In FIG. 4, rectangular boxes 411 and 412 respectively represent the fastest and slowest data processing progress of the primary cluster; as shown by these boxes, the high water level time and the low water level time of the primary device are t4 and t3, respectively. Rectangular boxes 421 and 422 respectively represent the fastest and slowest data processing progress of the standby cluster; as shown by these boxes, the high water level time and the low water level time of the standby device are t2 and t1, respectively. In some embodiments, the high water level time of the standby device may be equal to the low water level time of the primary device, i.e., t2 = t3.
For data with event time T preceding t1 (i.e., T < t1): both the primary cluster and the backup cluster have already processed this data, so there is no need to trace back this part of the data during the primary/backup switching.
For data with event time T between t1 and t3 (i.e., t1 ≤ T ≤ t3): the primary cluster has processed all of this data. The backup cluster has processed some of it and has not processed the rest. Data that has been processed by the primary cluster but not by the backup cluster needs to be deduplicated, to prevent the backup cluster from sending such data to downstream systems again after taking over from the primary cluster. Thus, the backup cluster needs to trace back the data in the period t1-t3 that the primary cluster has processed.
For data with event time T between t3 and t4 (i.e., t3 ≤ T ≤ t4): the primary cluster has processed some of this data and has not processed the rest. The backup cluster has processed none of it. The backup cluster needs to deduplicate the data processed by the primary cluster, so the backup cluster needs to trace back the data in the period t3-t4 that the primary cluster has processed.
In summary, the backup cluster needs to trace back the data in the period of t1-t4 processed by the main cluster, that is, the data with the event time greater than or equal to the low water level time of the backup device and less than or equal to the high water level time of the main device is used as the first data.
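The selection of the first data and the subsequent deduplication can be sketched as follows. This is an illustrative sketch only: the `Record` type, its `record_id` field (standing in for the identification of a piece of data), and both function names are assumptions introduced here, and the integer event times are stand-ins for t1 to t4.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Record:
    record_id: str    # hypothetical identification of the piece of data
    event_time: int


def select_first_data(master_processed: Iterable[Record],
                      standby_low: int, master_high: int) -> List[Record]:
    """Keep master-processed records with standby_low <= event_time <= master_high."""
    return [r for r in master_processed
            if standby_low <= r.event_time <= master_high]


def deduplicate(second_data: Iterable[Record],
                first_data: Iterable[Record]) -> List[Record]:
    """Drop records that the master has already processed, matched by identification."""
    seen: Set[str] = {r.record_id for r in first_data}
    return [r for r in second_data if r.record_id not in seen]


# Records the master cluster processed before it stopped (hypothetical values).
master_done = [Record("a", 0), Record("b", 1), Record("c", 2), Record("d", 4)]
# Records the standby cluster still has to handle.
standby_todo = [Record("c", 2), Record("e", 3), Record("d", 4), Record("f", 5)]

first = select_first_data(master_done, standby_low=1, master_high=4)
remaining = deduplicate(standby_todo, first)
```

Record "a" lies before the standby low water level time and is therefore not traced back, while "c" and "d" are removed from the standby device's pending data because the master cluster has already processed them.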
According to some embodiments, the first data may be retrieved from a respective downstream storage location, the retrieving the first data comprising: determining a storage location of the first data based on the identification of the first data; and retrieving the first data from the storage location.
According to the above embodiment, the storage locations of the data in the downstream storage system are determined based on the identification of the data, and thus each piece of data has a fixed storage location in the downstream storage system. The first data is obtained from the fixed storage location, so that the total backtracking of all storage locations (i.e. all packets) of the downstream storage system can be avoided, and the time consumption and the memory occupation for backtracking the first data are reduced.
According to some embodiments, the storage location of the first data may be determined based on the hash value of the identification of the first data. The storage positions are determined based on the hash values, so that data can be distributed relatively uniformly in each storage position, the utilization efficiency of a downstream storage system is improved, and excessive data backtracking pressure of certain storage positions is avoided.
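A hash-based mapping from an identification to a storage location might look like the sketch below. The function name `storage_location`, the parameter `num_locations`, and the choice of SHA-256 are assumptions for illustration; the disclosure does not specify a hash function. A stable digest is used here because Python's built-in `hash()` for strings is randomized per process by default and would not yield a fixed location across restarts.

```python
import hashlib


def storage_location(record_id: str, num_locations: int) -> int:
    """Map an identification to one of num_locations storage locations."""
    if num_locations <= 0:
        raise ValueError("num_locations must be positive")
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the
    # remainder so that the result indexes a storage location (packet).
    return int.from_bytes(digest[:8], "big") % num_locations


# The same identification always maps to the same location, so only that
# location has to be read when the first data is traced back.
location = storage_location("order-42", num_locations=16)
```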
According to some embodiments, the method 200 further comprises: storing the first data to a memory; and in response to the data processing progress of the standby device leading that of the primary device, clearing the first data from the memory.
According to the above embodiment, the standby device stores the first data to the local memory after acquiring the first data. And when the data processing progress of the standby equipment is ahead of that of the main equipment, the standby equipment takes over the main equipment to complete the main-standby switching. At this time, the first data is cleared from the memory of the standby device, so that the memory space of the standby device is released, and the memory is saved for other use.
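The life cycle of the first data in the standby device's memory can be sketched as a small cache. The class `FirstDataCache` and its method names are hypothetical, and the sketch assumes that keeping only the identifications of the first data is sufficient for deduplication, which the disclosure does not state.

```python
from typing import Iterable, Set


class FirstDataCache:
    """Holds identifications of the first data until the standby device takes over."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def store(self, record_ids: Iterable[str]) -> None:
        """Store the acquired first data in local memory."""
        self._ids.update(record_ids)

    def contains(self, record_id: str) -> bool:
        """Report whether the master has already processed this record."""
        return record_id in self._ids

    def on_progress(self, standby_leads_master: bool) -> None:
        """Release the memory once the standby device leads the master."""
        if standby_leads_master:
            self._ids.clear()

    def __len__(self) -> int:
        return len(self._ids)


cache = FirstDataCache()
cache.store(["b", "c", "d"])
cache.on_progress(standby_leads_master=False)   # still catching up: keep data
size_while_lagging = len(cache)
cache.on_progress(standby_leads_master=True)    # takeover: free the memory
size_after_takeover = len(cache)
```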
For step S240, in response to the data processing progress of the standby device leading the primary device, the standby device is taken as the primary device for processing the data stream, and the original primary device is taken as the standby device for processing the data stream, thereby completing the primary-standby switching. After the switching of the main device and the standby device is completed, the new standby device and the new main device can process the data stream, and the data processing progress of the new standby device lags behind that of the new main device.
According to some embodiments, in a case where the main device and the standby device are respectively plural, it may be determined that the data processing progress of the standby device leads the main device in response to the low water level time of the standby device being greater than the high water level time of the main device. That is, when the slowest data processing progress of the standby cluster exceeds the fastest data processing progress of the main cluster, the main/standby switching is performed.
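The switching condition for the multi-device case can be expressed as a single comparison. The sketch below is illustrative; the function name `standby_leads_master` and the integer event times are assumptions.

```python
from typing import Sequence


def standby_leads_master(standby_max_event_times: Sequence[int],
                         master_max_event_times: Sequence[int]) -> bool:
    """True when the slowest standby device is ahead of the fastest master device."""
    standby_low = min(standby_max_event_times)    # low water level time of standby
    master_high = max(master_max_event_times)     # high water level time of master
    return standby_low > master_high


# The master cluster stopped with per-device progress 7, 4, 5, 6.
stopped_master = [7, 4, 5, 6]
still_lagging = standby_leads_master([1, 3, 2], stopped_master)
not_yet = standby_leads_master([7, 9, 8], stopped_master)     # equal is not "greater"
ready_to_switch = standby_leads_master([8, 9, 8], stopped_master)
```

Comparing the standby low water level time against the master high water level time is the conservative choice: the switch happens only once every standby device has passed everything that any master device had processed.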
According to an embodiment of the present disclosure, there is also provided a data processing apparatus.
Fig. 5 shows a block diagram of a data processing apparatus 500 according to an embodiment of the present disclosure. The apparatus 500 is applied to a standby device. The standby device and the primary device are configured to process the same data stream, and the data processing progress of the standby device lags behind that of the primary device. As shown in fig. 5, the apparatus 500 includes an acquisition module 510, a deduplication module 520, a processing module 530, and a primary/backup switching module 540.
The acquisition module 510 is configured to acquire first data in the data stream that has been processed by the master device in response to the master device ceasing to process the data stream;
the deduplication module 520 is configured to deduplicate second data in the data stream to be processed by the standby device based on the first data;
the processing module 530 is configured to process the deduplicated second data; and
the primary/backup switching module 540 is configured to take the backup device as the primary device for processing the data stream in response to the data processing progress of the backup device leading that of the primary device.
According to the embodiments of the present disclosure, exact end-to-end consistency can be ensured when the data processing system performs primary/standby switching. After the main device stops processing the data stream, the standby device acquires the data already processed by the main device (namely, the first data) and uses it to deduplicate the data that the standby device is to process (namely, the second data), so that the data processed by the standby device after the main device stops serving neither duplicates the data processed by the main device nor omits any data.
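The cooperation of modules 510 to 540 can be sketched end to end as follows. This is a simplified illustration with a single main device and a single standby device. The function `run_takeover`, the `(identification, event_time)` tuple layout, and the assumption that the standby device handles its pending data in event-time order are all introduced here and are not taken from the disclosure.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, int]  # (identification, event_time), a hypothetical layout


def run_takeover(master_processed: List[Record],
                 standby_pending: List[Record],
                 standby_low: int,
                 master_high: int,
                 emit: Callable[[Record], None]) -> bool:
    """Return True once the standby device leads the stopped main device."""
    # Acquisition module 510: first data lies in [standby_low, master_high].
    first_ids = {rid for rid, t in master_processed
                 if standby_low <= t <= master_high}
    progress = standby_low
    for rid, t in sorted(standby_pending, key=lambda r: r[1]):
        # Deduplication module 520: skip what the main device already sent.
        if rid not in first_ids:
            emit((rid, t))          # Processing module 530
        progress = max(progress, t)
    # Switching module 540: take over when progress passes the main device.
    return progress > master_high


emitted: List[Record] = []
switched = run_takeover(
    master_processed=[("a", 1), ("b", 2), ("d", 4)],
    standby_pending=[("b", 2), ("c", 3), ("d", 4), ("e", 5)],
    standby_low=1, master_high=4, emit=emitted.append)
```

In this run, records "b" and "d" are suppressed because the main device already delivered them downstream, so the downstream system receives each record exactly once across the switch.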
According to some embodiments, the data stream includes a plurality of pieces of data, each piece of data of the plurality of pieces of data includes an event time, and the acquisition module 510 includes: a first acquisition unit configured to acquire a high water level time of the master device, wherein the high water level time indicates a maximum value of event times of data processed by the master device; a second acquisition unit configured to acquire a low water level time of the standby device, wherein the low water level time indicates that data of which event time in the data stream is less than or equal to the low water level time has been processed by the standby device; and a first determination unit configured to take, as the first data, data having an event time greater than or equal to the low water level time and less than or equal to the high water level time.
According to some embodiments, the master device has a plurality, and the first acquisition unit is further configured to: acquiring the maximum event time of the processed data of each of the plurality of main devices to obtain a plurality of first maximum event times; and taking the maximum value of the plurality of first maximum event times as the high water level time.
According to some embodiments, the backup device has a plurality, and the second acquisition unit is further configured to: acquiring the maximum event time of the processed data of each standby device in the plurality of standby devices to obtain a plurality of second maximum event times; and taking a minimum value of the plurality of second maximum event times as the low water level time.
According to some embodiments, the acquisition module 510 includes: a second determination unit configured to determine a storage location of the first data based on an identification of the first data; and a third acquisition unit configured to acquire the first data from the storage location.
According to some embodiments, the second determining unit is further configured to: a storage location of the first data is determined based on the hash value of the identification of the first data.
According to some embodiments, the apparatus 500 further comprises: a storage module configured to store the first data to a memory; and a clearing module configured to clear the first data from the memory in response to the data processing progress of the standby device leading the primary device.
According to some embodiments, the apparatus 500 further comprises: a determination module configured to determine that the data processing progress of the standby device leads that of the main device in response to the low water level time being greater than the high water level time.
It should be appreciated that the various modules or units of the apparatus 500 shown in fig. 5 may correspond to the various steps in the method 200 described with reference to fig. 2. Thus, the operations, features and advantages described above with respect to method 200 apply equally to apparatus 500 and the modules and units comprised thereof. For brevity, certain operations, features and advantages are not described in detail herein.
Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of the various modules discussed herein may be divided into multiple modules and/or at least some of the functions of the multiple modules may be combined into a single module.
It should also be appreciated that various techniques may be described herein in the general context of software, hardware elements, or program modules. The various modules described above with respect to fig. 5 may be implemented in hardware or in hardware in combination with software and/or firmware. For example, the modules may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium. Alternatively, these modules may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the modules 510-540 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip including one or more components of a processor (e.g., a central processing unit (Central Processing Unit, CPU), microcontroller, microprocessor, digital signal processor (Digital Signal Processor, DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.
There is further provided, in accordance with an embodiment of the present disclosure, a data processing system including a master device and a standby device for processing the same data stream, the data processing progress of the standby device lagging behind that of the master device, wherein the master device is configured to: stop processing the data stream in response to the data processing system requiring primary/standby switching; and the standby device is configured to: in response to the master device ceasing to process the data stream, acquire first data in the data stream that has been processed by the master device; deduplicate, based on the first data, second data in the data stream to be processed by the standby device; process the deduplicated second data; and in response to the data processing progress of the standby device leading that of the master device, take the standby device as the master device for processing the data stream.
There is also provided, in accordance with an embodiment of the present disclosure, an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to enable the at least one processor to perform the data processing methods of the embodiments of the present disclosure.
According to an embodiment of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data processing method of the embodiment of the present disclosure.
According to an embodiment of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the data processing method of the embodiments of the present disclosure.
Referring to fig. 6, a block diagram of an electronic device 600 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data required for the operation of the electronic device 600 can also be stored. The computing unit 601, ROM 602, and RAM603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the electronic device 600. The input unit 606 may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 608 may include, but is not limited to, magnetic disks and optical disks. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as Bluetooth™ devices, 802.11 devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. One or more of the steps of the method 200 described above may be performed when a computer program is loaded into RAM 603 and executed by the computing unit 601. Alternatively, in other embodiments, computing unit 601 may be configured to perform method 200 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and no limitation is imposed herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely illustrative embodiments or examples, and that the scope of the present disclosure is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (19)

1. A data processing method applied to a standby device, the standby device and a primary device being configured to process the same data stream, the data processing progress of the standby device lagging behind that of the primary device, the method comprising:
in response to the primary device ceasing to process the data stream, acquiring first data in the data stream that has been processed by the primary device;
deduplicating, based on the first data, second data in the data stream to be processed by the standby device;
processing the deduplicated second data; and
in response to the data processing progress of the standby device leading that of the primary device, taking the standby device as the primary device for processing the data stream.
2. The method of claim 1, wherein the data stream comprises a plurality of pieces of data, each piece of data comprising an event time, and the acquiring first data in the data stream that has been processed by the primary device comprises:
acquiring a high water level time of the primary device, wherein the high water level time indicates the maximum value of the event times of the data processed by the primary device;
acquiring a low water level time of the standby device, wherein the low water level time indicates that data in the data stream having an event time less than or equal to the low water level time has been processed by the standby device; and
taking data having an event time greater than or equal to the low water level time and less than or equal to the high water level time as the first data.
3. The method of claim 2, wherein there are a plurality of primary devices, and the acquiring a high water level time of the primary device comprises:
acquiring the maximum event time of the data processed by each of the plurality of primary devices to obtain a plurality of first maximum event times; and
taking the maximum value of the plurality of first maximum event times as the high water level time.
4. The method of claim 2 or 3, wherein there are a plurality of standby devices, and the acquiring a low water level time of the standby device comprises:
acquiring the maximum event time of the data processed by each of the plurality of standby devices to obtain a plurality of second maximum event times; and
taking the minimum value of the plurality of second maximum event times as the low water level time.
5. The method of claim 1, wherein the acquiring first data in the data stream that has been processed by the primary device comprises:
determining a storage location of the first data based on an identification of the first data; and
acquiring the first data from the storage location.
6. The method of claim 5, wherein the determining a storage location of the first data based on an identification of the first data comprises:
determining the storage location of the first data based on a hash value of the identification of the first data.
7. The method of claim 1, further comprising:
storing the first data to a memory; and
in response to the data processing progress of the standby device leading that of the primary device, clearing the first data from the memory.
8. The method of claim 2, further comprising:
in response to the low water level time being greater than the high water level time, determining that the data processing progress of the standby device leads that of the primary device.
9. A data processing apparatus applied to a standby device, the standby device and a primary device being configured to process the same data stream, the data processing progress of the standby device lagging behind that of the primary device, the apparatus comprising:
an acquisition module configured to acquire first data in the data stream that has been processed by the primary device in response to the primary device ceasing to process the data stream;
a deduplication module configured to deduplicate, based on the first data, second data in the data stream to be processed by the standby device;
a processing module configured to process the deduplicated second data; and
a primary/standby switching module configured to take the standby device as the primary device for processing the data stream in response to the data processing progress of the standby device leading that of the primary device.
10. The apparatus of claim 9, wherein the data stream comprises a plurality of pieces of data, each piece of data of the plurality of pieces of data comprising an event time, and the acquisition module comprises:
a first acquisition unit configured to acquire a high water level time of the primary device, wherein the high water level time indicates the maximum value of the event times of the data processed by the primary device;
a second acquisition unit configured to acquire a low water level time of the standby device, wherein the low water level time indicates that data in the data stream having an event time less than or equal to the low water level time has been processed by the standby device; and
a first determination unit configured to take, as the first data, data having an event time greater than or equal to the low water level time and less than or equal to the high water level time.
11. The apparatus of claim 10, wherein there are a plurality of primary devices, and the first acquisition unit is further configured to:
acquire the maximum event time of the data processed by each of the plurality of primary devices to obtain a plurality of first maximum event times; and
take the maximum value of the plurality of first maximum event times as the high water level time.
12. The apparatus of claim 10 or 11, wherein there are a plurality of standby devices, and the second acquisition unit is further configured to:
acquire the maximum event time of the data processed by each of the plurality of standby devices to obtain a plurality of second maximum event times; and
take the minimum value of the plurality of second maximum event times as the low water level time.
13. The apparatus of claim 9, wherein the acquisition module comprises:
a second determination unit configured to determine a storage location of the first data based on an identification of the first data; and
a third acquisition unit configured to acquire the first data from the storage location.
14. The apparatus of claim 13, wherein the second determination unit is further configured to:
determine the storage location of the first data based on a hash value of the identification of the first data.
15. The apparatus of claim 9, further comprising:
a storage module configured to store the first data to a memory; and
a clearing module configured to clear the first data from the memory in response to the data processing progress of the standby device leading that of the primary device.
16. The apparatus of claim 10, further comprising:
a determination module configured to determine that the data processing progress of the standby device leads that of the primary device in response to the low water level time being greater than the high water level time.
17. A data processing system comprising a primary device and a standby device for processing the same data stream, the data processing progress of the standby device lagging behind that of the primary device, wherein
the primary device is configured to:
stop processing the data stream in response to the data processing system requiring primary/standby switching; and
the standby device is configured to:
in response to the primary device ceasing to process the data stream, acquire first data in the data stream that has been processed by the primary device;
deduplicate, based on the first data, second data in the data stream to be processed by the standby device;
process the deduplicated second data; and
in response to the data processing progress of the standby device leading that of the primary device, take the standby device as the primary device for processing the data stream.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
19. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
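The switchover recited in claims 14 to 17 can be illustrated with a minimal sketch. It is not the patented implementation: the names `shard_for`, `ProcessedStore` and `take_over` are hypothetical, SHA-256 stands in for whatever hash the applicant uses, and the sketch assumes that the low water level time belongs to the standby device (the event time of the last record it has consumed) while the high water level time belongs to the main device, which this section does not state explicitly.

```python
import hashlib


def shard_for(record_id: str, num_shards: int) -> int:
    """Pick a storage location from a stable hash of the identifier (cf. claim 14)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ProcessedStore:
    """Hypothetical in-memory store of identifiers already processed by the main device."""

    def __init__(self, num_shards: int = 4) -> None:
        self.num_shards = num_shards
        self.shards = [set() for _ in range(num_shards)]

    def add(self, record_id: str) -> None:
        self.shards[shard_for(record_id, self.num_shards)].add(record_id)

    def contains(self, record_id: str) -> bool:
        return record_id in self.shards[shard_for(record_id, self.num_shards)]

    def clear(self) -> None:
        for shard in self.shards:
            shard.clear()


def take_over(pending, store, main_high_water_time):
    """Run on the standby device after the main device has stopped.

    pending: ordered (record_id, event_time) pairs the standby has yet to consume.
    Returns (ids the standby actually processed, whether it was promoted).
    """
    processed = []
    standby_low_water_time = None  # assumed meaning: last event time consumed
    for record_id, event_time in pending:
        # De-duplicate: skip records the main device already handled.
        if not store.contains(record_id):
            processed.append(record_id)
        standby_low_water_time = event_time
    # Assumed reading of claim 16: the standby leads once its low water level
    # time is greater than the main device's high water level time.
    promoted = (standby_low_water_time is not None
                and standby_low_water_time > main_high_water_time)
    if promoted:
        store.clear()  # cf. claim 15: the stored first data is no longer needed
    return processed, promoted
```

In this sketch, a standby device whose pending records are `a`, `b` and `c` would skip `a` and `b` if the main device had already stored them, process only `c`, and then take over once its water level time has passed that of the main device.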

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211305396.1A CN115629918B (en) 2022-10-24 2022-10-24 Data processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211305396.1A CN115629918B (en) 2022-10-24 2022-10-24 Data processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115629918A (en) 2023-01-20
CN115629918B (en) 2023-06-27

Family

ID=84906343

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202211305396.1A Active CN115629918B (en) 2022-10-24 2022-10-24 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115629918B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106031130A (en) * 2014-02-19 2016-10-12 第三雷沃通讯有限责任公司 Content delivery network architecture with edge proxy
CN113377809A (en) * 2021-06-23 2021-09-10 北京百度网讯科技有限公司 Data processing method and apparatus, computing device, and medium
CN113568938A (en) * 2021-08-04 2021-10-29 北京百度网讯科技有限公司 Data stream processing method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003245225A1 (en) * 2002-07-19 2004-02-09 Xelerated Ab Method and apparatus for pipelined processing of data packets
US8214517B2 (en) * 2006-12-01 2012-07-03 Nec Laboratories America, Inc. Methods and systems for quick and efficient data management and/or processing

Also Published As

Publication number Publication date
CN115629918A (en) 2023-01-20

Similar Documents

Publication Publication Date Title
JP6865219B2 (en) Event batch processing, output sequencing, and log-based state storage in continuous query processing
EP3404899B1 (en) Adaptive computation and faster computer operation
US20170024293A1 (en) Automatic serial starting of resource groups on failover
JP6325001B2 (en) Method and system using recursive event listeners in nodes of hierarchical data structures
EP2819015B1 (en) Method, terminal, and server for synchronizing terminal mirror
US20170083419A1 (en) Data management method, node, and system for database cluster
CN113193947B (en) Method, apparatus, medium, and program product for implementing distributed global ordering
CN113364877A (en) Data processing method, device, electronic equipment and medium
CN113778644A (en) Task processing method, device, equipment and storage medium
CN111782341B (en) Method and device for managing clusters
CN111078418B (en) Operation synchronization method, device, electronic equipment and computer readable storage medium
CN115629918B (en) Data processing method, device, electronic equipment and storage medium
US9703646B2 (en) Centralized database system
CN113448770A (en) Method, electronic device and computer program product for recovering data
CN113961641A (en) Database synchronization method, device, equipment and storage medium
CN113821232A (en) Model updating method and device
CN112817701A (en) Timer processing method and device, electronic equipment and computer readable medium
CN117395263B (en) Data synchronization method, device, equipment and storage medium
CN112463514A (en) Monitoring method and device for distributed cache cluster
CN114546705B (en) Operation response method, operation response device, electronic apparatus, and storage medium
CN111258954B (en) Data migration method, device, equipment and storage medium
CN115510036A (en) Data migration method, device, equipment and storage medium
CN115712679A (en) Data processing system, method, electronic equipment and storage medium
CN116841948A (en) Method and device for data transmission between Central Processing Units (CPUs) and electronic equipment
CN115373894A (en) Data recovery method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant