CN115442272B - Method, device, equipment and storage medium for detecting lost data - Google Patents

Publication number
CN115442272B
Authority
CN
China
Prior art keywords
data
compared
pieces
synchronized
time stamp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210974434.6A
Other languages
Chinese (zh)
Other versions
CN115442272A (en)
Inventor
喻捷
韩韬
苗浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202210974434.6A
Publication of CN115442272A
Application granted
Publication of CN115442272B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the application provide a method, a device, equipment and a storage medium for detecting lost data, relating to the technical field of big data. The method comprises the following steps: a data synchronization device acquires M pieces of data to be synchronized within a preset period from a source data center as first data to be compared, and divides the M pieces of first data to be compared into N first data groups according to time stamp information. The data synchronization device acquires P pieces of data to be synchronized within the preset period from a destination data center as second data to be compared, and divides the P pieces of second data to be compared into Q second data groups according to time stamp information. For any time stamp information, the data synchronization device compares the first data group having that time stamp information with the second data group having that time stamp information to determine whether first data to be compared in that first data group has been lost in the destination data center. In this way, whether data is lost during data synchronization can be determined accurately and in real time.

Description

Method, device, equipment and storage medium for detecting lost data
Technical Field
The embodiment of the invention relates to the technical field of big data, in particular to a method, a device, equipment and a storage medium for detecting lost data.
Background
With the rapid development of internet technology, service systems keep growing in scale. To improve the disaster recovery capability of a service system, the service data it requires is generally backed up and stored in different data centers. When data is backed up across different data centers, data synchronization is generally implemented with Kafka. However, during data synchronization, a network failure or hard disk failure in a data center can easily cause data loss, and at present Kafka cannot determine whether data has been lost, so the accuracy of data synchronization cannot be guaranteed.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for detecting lost data, which are used for ensuring the accuracy of data synchronization.
In one aspect, an embodiment of the present application provides a method for detecting lost data, including:
the data synchronization device acquires M pieces of data to be synchronized within a preset period from a source data center as first data to be compared, and divides the M pieces of first data to be compared into N first data groups according to time stamp information, wherein M > 0 and 0 < N <= M;
the data synchronization device acquires P pieces of data to be synchronized within the preset period from a destination data center as second data to be compared, and divides the P pieces of second data to be compared into Q second data groups according to time stamp information, wherein P > 0 and 0 < Q <= P;
for any time stamp information, the data synchronization device compares the first data group having that time stamp information with the second data group having that time stamp information, and determines whether first data to be compared in that first data group has been lost in the destination data center.
Optionally, the method further comprises:
a connector in the data synchronization device acquires a plurality of pieces of data to be synchronized from the source data center and sends the plurality of pieces of data to be synchronized to the destination data center.
Optionally, the connector in the data synchronization device acquires a plurality of data to be synchronized from the source data center, and sends the plurality of data to be synchronized to the destination data center, including:
a consumption thread group in the data synchronization device acquires the plurality of pieces of data to be synchronized from the source data center and sends them to a production thread in the data synchronization device;
the production thread in the data synchronization device sends the plurality of pieces of data to be synchronized to the destination data center.
Optionally, the data synchronization device acquires M pieces of data to be synchronized in a preset period from a source data center, and divides the acquired M pieces of data to be compared into N first data groups according to timestamp information, including:
a first detection module of the data synchronization device pulls, according to the time stamp information contained in the plurality of pieces of data to be synchronized acquired by the consumption thread group, the M pieces of data to be synchronized that fall within the preset period, and uses them as the first data to be compared;
the first detection module of the data synchronization device divides the M pieces of first data to be compared into N first data groups according to their time stamp information, and sends the N first data groups to a monitoring acquisition module of the data synchronization device.
Optionally, the data synchronization device acquires P pieces of data to be synchronized in the preset period from a destination data center, and uses the P pieces of data to be compared as second data to be compared, and divides the acquired P pieces of second data to be compared into Q second data groups according to timestamp information, including:
a second detection module of the data synchronization device pulls, according to the time stamp information contained in the plurality of pieces of data to be synchronized acquired by the production thread, the P pieces of data to be synchronized that fall within the preset period, and uses them as the second data to be compared;
the second detection module of the data synchronization device divides the P pieces of second data to be compared into Q second data groups according to their time stamp information, and sends the Q second data groups to the monitoring acquisition module of the data synchronization device.
Optionally, the data synchronization device compares the first data set with the timestamp information with the second data set with the timestamp information, and determines whether the first data to be compared in the first data set has data loss in the destination data center, including:
the monitoring acquisition module of the data synchronization device determines the number of first data to be compared in the first data group and the number of second data to be compared in the second data group;
if the number of first data to be compared in the first data group is larger than the number of second data to be compared in the second data group, the monitoring acquisition module of the data synchronization device determines that first data to be compared in the first data group has been lost in the destination data center, and sends a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to the first data group.
Optionally, the method further comprises:
for any time stamp information, if a first data group corresponding to the time stamp information exists and no second data group corresponding to the time stamp information exists, the data synchronization device determines that the first data to be compared in that first data group has been lost in the destination data center, and sends a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to that first data group.
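The two loss conditions described above (a smaller count in the second data group, or no second data group at all for a time stamp) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name find_lost_timestamps and the representation of each data group as a mapping from time stamp to record count are hypothetical.

```python
from typing import Dict, List


def find_lost_timestamps(first_groups: Dict[int, int],
                         second_groups: Dict[int, int]) -> List[int]:
    """Return the time stamps whose first data group lost records.

    Both arguments map a time stamp to the number of records in the
    data group for that time stamp (hypothetical representation).
    """
    lost = []
    for ts, first_count in sorted(first_groups.items()):
        # A missing second data group is treated as a count of zero,
        # which covers the "no second data group exists" case.
        second_count = second_groups.get(ts, 0)
        if first_count > second_count:
            lost.append(ts)
    return lost


source_side = {100: 3, 101: 2, 102: 4}
destination_side = {100: 3, 101: 1}
lost = find_lost_timestamps(source_side, destination_side)
```

In this example, time stamp 101 is reported because the destination holds fewer records, and time stamp 102 is reported because the destination has no group for it at all.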
In one aspect, an embodiment of the present application provides a data synchronization device, including:
a first detection module, configured to acquire M pieces of data to be synchronized within a preset period from a source data center as first data to be compared, and to divide the M pieces of first data to be compared into N first data groups according to time stamp information, wherein M > 0 and 0 < N <= M;
a second detection module, configured to acquire P pieces of data to be synchronized within the preset period from a destination data center as second data to be compared, and to divide the P pieces of second data to be compared into Q second data groups according to time stamp information, wherein P > 0 and 0 < Q <= P;
a monitoring acquisition module, configured to compare, for any time stamp information, the first data group having that time stamp information with the second data group having that time stamp information, to determine whether first data to be compared in that first data group has been lost in the destination data center.
Optionally, the device further comprises a synchronization module, wherein the synchronization module is specifically configured to:
acquire, through a connector in the data synchronization device, a plurality of pieces of data to be synchronized from the source data center, and send the plurality of pieces of data to be synchronized to the destination data center.
Optionally, the synchronization module is specifically configured to:
acquire, through a consumption thread group in the data synchronization device, the plurality of pieces of data to be synchronized from the source data center, and send them to a production thread in the data synchronization device;
send, through the production thread in the data synchronization device, the plurality of pieces of data to be synchronized to the destination data center.
Optionally, the first detection module is specifically configured to:
The first detection module of the data synchronization device pulls M pieces of data to be synchronized in the preset period from the plurality of pieces of data to be synchronized according to timestamp information contained in the plurality of pieces of data to be synchronized acquired by the consumption thread group, and the M pieces of data to be synchronized are used as first data to be compared;
The first detection module of the data synchronization device divides the M first data to be compared into N first data groups according to the timestamp information in the M first data to be compared, and sends the N first data groups to the monitoring acquisition module of the data synchronization device.
Optionally, the second detection module is specifically configured to:
the second detection module of the data synchronization device pulls P pieces of data to be synchronized in the preset period from the plurality of pieces of data to be synchronized according to timestamp information contained in the plurality of pieces of data to be synchronized acquired by the production thread, and the P pieces of data to be synchronized are used as second data to be compared;
The second detection module of the data synchronization device divides the P second data to be compared into Q second data groups according to the timestamp information in the P second data to be compared, and sends the Q second data groups to the monitoring acquisition module of the data synchronization device.
Optionally, the monitoring and collecting module is specifically configured to:
determine the number of first data to be compared in the first data group and the number of second data to be compared in the second data group;
if the number of first data to be compared in the first data group is larger than the number of second data to be compared in the second data group, determine that first data to be compared in the first data group has been lost in the destination data center, and send a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to the first data group.
Optionally, the monitoring acquisition module is further configured to:
for any time stamp information, if a first data group corresponding to the time stamp information exists and no second data group corresponding to the time stamp information exists, determine that the first data to be compared in that first data group has been lost in the destination data center, and send a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to that first data group.
In one aspect, an embodiment of the present application provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for detecting lost data described above when the program is executed.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device, causes the computer device to perform the steps of the method of detecting lost data described above.
In the embodiment of the application, a data synchronization device acquires M pieces of data to be synchronized within a preset period from a source data center as first data to be compared, and divides the M pieces of first data to be compared into N first data groups according to time stamp information. The data synchronization device acquires P pieces of data to be synchronized within the preset period from a destination data center as second data to be compared, and divides the P pieces of second data to be compared into Q second data groups according to time stamp information. For any time stamp information, the data synchronization device compares the first data group having that time stamp information with the second data group having that time stamp information to determine whether first data to be compared in that first data group has been lost in the destination data center. Because the first data to be compared in the first data groups and the second data to be compared in the second data groups are counted in real time, whether data is lost during data synchronization can be determined accurately and in real time, which ensures the accuracy of data synchronization.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a source data center according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data synchronization device according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a data synchronization method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a data synchronization device according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a data synchronization method according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for detecting lost data according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a first data group according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a second data group according to an embodiment of the present application;
FIG. 10 is a flowchart of a method for detecting lost data according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an apparatus for detecting lost data according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
FIG. 1 is a system architecture diagram for detecting lost data according to an embodiment of the present application. The system includes at least a terminal device 101 and a data synchronization device 102.
The terminal device 101 is installed with a target application for detecting lost data, which may be a pre-installed client, web page application, or applet embedded in other applications, or the like. The terminal device 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like.
The data synchronization device 102 is the background server of the target application and provides services for it. The data synchronization device 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.
The terminal device 101 and the data synchronization device 102 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
The terminal device 101 sends a lost data detection instruction to the data synchronization device 102 in response to a lost data detection operation by the user. The data synchronization device 102 acquires M pieces of data to be synchronized within a preset period from a source data center as first data to be compared, and divides the M pieces of first data to be compared into N first data groups according to time stamp information, wherein M > 0 and 0 < N <= M. Meanwhile, the data synchronization device 102 acquires P pieces of data to be synchronized within the preset period from a destination data center as second data to be compared, and divides the P pieces of second data to be compared into Q second data groups according to time stamp information, wherein P > 0 and 0 < Q <= P. Finally, for any time stamp information, the data synchronization device 102 compares the first data group having that time stamp information with the second data group having that time stamp information, and determines whether first data to be compared in that first data group has been lost in the destination data center.
Based on the system architecture diagram shown in fig. 1, the data synchronization device 102 in fig. 1 performs data synchronization between the source data center and the destination data center, where the data synchronization device may be Kafka, etc., and is not limited herein.
The source data center stores a plurality of pieces of data to be synchronized. The source data center divides the data to be synchronized into different topics, and then divides the data to be synchronized under each topic into a plurality of partitions, obtaining a plurality of topic partitions. The data to be synchronized is service data.
The source data center also stores a plurality of metadata and a plurality of consumption states. Each topic corresponds to one piece of metadata and one consumption state, wherein the metadata includes the size and the like of each topic partition of the topic, and the consumption state includes the partition position of each topic partition of the topic.
For example, the source data center stores 4 pieces of data to be synchronized: data to be synchronized 1, data to be synchronized 2, data to be synchronized 3, and data to be synchronized 4. As shown in FIG. 2, the 4 pieces of data to be synchronized are divided into 3 topics: topic A, topic B and topic C. Topic A includes data to be synchronized 1 and data to be synchronized 2, topic B includes data to be synchronized 3, and topic C includes data to be synchronized 4.
Topic A is divided into 2 topic partitions, namely topic partition A1 and topic partition A2. Topic partition A1 includes data to be synchronized 1, and topic partition A2 includes data to be synchronized 2.
Topic B and topic C are divided in the same way as topic A, which is not repeated here.
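The layout of topic A in this example can be written down as a small data structure. The sketch below is illustrative only: the nested dictionary layout and the helper names partitions_of and records_of are assumptions made for the example, and only topic A is modeled because the text does not spell out the partitions of the other topics.

```python
from typing import Dict, List

# Topic -> topic partition -> records (layout is an assumption for illustration).
source_data_center: Dict[str, Dict[str, List[str]]] = {
    "A": {
        "A1": ["data to be synchronized 1"],
        "A2": ["data to be synchronized 2"],
    },
}


def partitions_of(topic: str) -> List[str]:
    """List the topic partitions stored for a topic."""
    return sorted(source_data_center[topic])


def records_of(topic: str) -> List[str]:
    """Collect every record of a topic across its topic partitions."""
    return [record
            for partition in partitions_of(topic)
            for record in source_data_center[topic][partition]]
```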
The structure of the data synchronization device is shown in fig. 3, and the data synchronization device comprises a controller, a monitoring collector and a plurality of connectors.
The controller is used for managing topics, such as creating topics, deleting topics and adding topic partitions.
The monitoring collector is used for monitoring the data synchronization device, including its machine load, CPU utilization, memory utilization, disk I/O utilization and the like. An external monitoring platform obtains the monitoring indicators collected by the monitoring collector through JMX (Java Management Extensions).
Connectors are of two types: admin connectors and data connectors. The connectors in FIG. 3 include an admin connector, data connector 1 and data connector 2. Internally, the admin connector performs synchronization in an admin+admin mode. The admin connector acquires metadata and consumption states from the source data center and sends them to the destination data center to realize data synchronization.
The data connector internally adopts a mode of consuming threads and producing threads to synchronize data. The data connector acquires a plurality of data to be synchronized from the source data center and sends the data to be synchronized to the destination data center to realize data synchronization.
Specifically, since the data to be synchronized is divided into different topics and each topic is divided into a plurality of topic partitions, the data connector acquires a plurality of topic partitions from the source data center and sends them to the destination data center. Because the data synchronization device supports multi-task cooperation, the topic partitions belonging to the same topic can be distributed to different connectors for data synchronization, which improves data synchronization efficiency.
As shown in FIG. 4, the source data center includes topic A, which is divided into topic partition A1 and topic partition A2. Data connector 1 acquires topic partition A1 from the source data center and sends it to the destination data center, and data connector 2 acquires topic partition A2 from the source data center and sends it to the destination data center.
The source data center further includes metadata A and consumption state A corresponding to topic A. The admin connector in the data synchronization device acquires metadata A and consumption state A from the source data center and sends them to the destination data center.
Because each data connector internally uses the "consumption thread + production thread" mode, different data connectors contain different consumption threads, the consumption threads of the different data connectors form a consumption thread group, and the different data connectors share the same production thread. The specific structure of data connector 1 and data connector 2 in FIG. 4 is shown in FIG. 5: data connector 1 includes consumption thread 1 and the production thread, and data connector 2 includes consumption thread 2 and the production thread. Consumption thread 1 and consumption thread 2 form the consumption thread group.
The data connector acquires a plurality of data to be synchronized from the source data center and sends the data to be synchronized to the destination data center, and the method specifically comprises the following steps:
A consumption thread group in the data connector acquires a plurality of data to be synchronized from a source data center and sends the data to be synchronized to a production thread in the data connector; and the production thread in the data connector sends the plurality of data to be synchronized to the destination data center.
Specifically, the consumption thread group in the data connector acquires a plurality of topic partitions from the source data center and sends them to the production thread in the data connector. The production thread in the data connector then sends the plurality of topic partitions to the destination data center.
For example, as shown in FIG. 6, the source data center includes topic A, which is divided into topic partition A1 and topic partition A2. Data connector 1 includes consumption thread 1 and the production thread, and data connector 2 includes consumption thread 2 and the production thread. Consumption thread 1 acquires topic partition A1 from the source data center and sends it to the production thread, and the production thread sends topic partition A1 to the destination data center. Consumption thread 2 acquires topic partition A2 from the source data center and sends it to the production thread, and the production thread sends topic partition A2 to the destination data center.
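The hand-off from two consumption threads to one shared production thread can be sketched with standard Python threads and a queue. This is a simplified illustration, not the data connector itself: the in-memory dictionaries standing in for the data centers, the queue used for the hand-off, and the STOP sentinel are all assumptions of the example.

```python
import queue
import threading
from typing import Dict, List

# Hypothetical in-memory stand-ins for the two data centers.
source_partitions: Dict[str, List[str]] = {
    "A1": ["record-1"],
    "A2": ["record-2"],
}
destination: List[str] = []

handoff: "queue.Queue" = queue.Queue()
STOP = object()  # sentinel telling the production thread to finish


def consume(partition: str) -> None:
    """Consumption thread: read one topic partition and hand records over."""
    for record in source_partitions[partition]:
        handoff.put(record)


def produce() -> None:
    """Production thread: forward every handed-over record to the destination."""
    while True:
        record = handoff.get()
        if record is STOP:
            break
        destination.append(record)


producer = threading.Thread(target=produce)
consumers = [threading.Thread(target=consume, args=(p,)) for p in source_partitions]
producer.start()
for t in consumers:
    t.start()
for t in consumers:
    t.join()
handoff.put(STOP)  # all consumers are done, so no record can follow the sentinel
producer.join()
```

The order in which records arrive at the destination depends on thread scheduling, so only the set of delivered records is deterministic in this sketch.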
In the embodiment of the application, the data synchronization device synchronizes data between the source data center and the destination data center based on a plurality of connectors. Because the topic partitions belonging to the same topic can be distributed to different connectors for data synchronization, data synchronization efficiency is improved.
An embodiment of the present application provides a flow of a method for detecting lost data, as shown in fig. 7, where the flow of the method is performed by the data synchronization device 102 shown in fig. 1, and includes the following steps:
Step S701, the data synchronization device acquires M pieces of data to be synchronized within a preset period from a source data center as first data to be compared, and divides the M pieces of first data to be compared into N first data groups according to time stamp information, wherein M > 0 and 0 < N <= M.
Specifically, the first detection module of the data synchronization device pulls, according to the time stamp information contained in the plurality of pieces of data to be synchronized acquired by the consumption thread group, the M pieces of data to be synchronized that fall within the preset period, and uses them as the first data to be compared. The first detection module divides the M pieces of first data to be compared into N first data groups according to their time stamp information, and sends the N first data groups to the monitoring acquisition module of the data synchronization device.
Among the data to be synchronized acquired by the consumption thread group, data acquired earlier carries earlier time stamp information than data acquired later.
The preset period is determined based on the time stamp information rather than the machine time of the data synchronization device, which ensures that the first data to be compared and the second data to be compared cover the same preset period.
The preset period consists of a plurality of successive, increasing time periods, for example 0-30, 30-60 and so on, which ensures that the data synchronization device continuously detects whether data loss has occurred.
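Selecting the records of one preset period and grouping them by time stamp can be sketched as below. The record shape (time stamp, payload), the function name group_by_timestamp, and the choice of a half-open interval [start, end) for each period are assumptions of this example; the text does not state how period boundaries such as 30 are assigned.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, str]  # (time stamp, payload); hypothetical record shape


def group_by_timestamp(records: Iterable[Record],
                       start: int, end: int) -> Dict[int, List[str]]:
    """Group records whose time stamp lies in [start, end) by time stamp."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in records:
        # The period is checked against the record's own time stamp,
        # not against the local clock of the machine running this code.
        if start <= ts < end:
            groups[ts].append(payload)
    return dict(groups)


records = [(5, "a"), (5, "b"), (12, "c"), (31, "d")]
first_window = group_by_timestamp(records, 0, 30)
second_window = group_by_timestamp(records, 30, 60)
```

For the period 0-30 this yields M = 3 records in N = 2 groups, consistent with the constraint 0 < N <= M.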
Step S702, the data synchronization device acquires P pieces of data to be synchronized within the preset period from a destination data center as second data to be compared, and divides the P pieces of second data to be compared into Q second data groups according to time stamp information, wherein P > 0 and 0 < Q <= P.
Specifically, the second detection module of the data synchronization device pulls, according to the time stamp information contained in the plurality of pieces of data to be synchronized acquired by the production thread, the P pieces of data to be synchronized that fall within the preset period, and uses them as the second data to be compared. The second detection module divides the P pieces of second data to be compared into Q second data groups according to their time stamp information, and sends the Q second data groups to the monitoring acquisition module of the data synchronization device.
The above-described step S701 and step S702 may be performed in parallel.
Step S703: for any given time stamp information, the data synchronization device compares the first data group having that time stamp information with the second data group having that time stamp information, and determines whether the first data to be compared in that first data group is lost in the destination data center.
Specifically, for any given time stamp information, the monitoring and collecting module of the data synchronization device compares the number of first data to be compared in the first data group with the number of second data to be compared in the second data group. If the number of first data to be compared in the first data group is larger than the number of second data to be compared in the second data group, the monitoring and collecting module determines that the first data to be compared in the first data group is lost in the destination data center and sends a data retransmission message to the source data center, wherein the data retransmission message includes the time stamp information corresponding to the first data group.
In the application, the data retransmission message enables the source-end data center to retransmit the first data to be compared corresponding to the corresponding time stamp information, thereby improving the accuracy of data synchronization.
If the number of the first data to be compared in the first data group is equal to the number of the second data to be compared in the second data group, the monitoring and collecting module of the data synchronizing device determines that the first data to be compared in the first data group is not lost in the data center of the destination end, and does not perform any processing.
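The decision made for a single piece of time stamp information can be sketched as follows. This is an illustrative sketch only: the function name and the message layout (a dict with "type" and "timestamp" keys) are hypothetical, and the description does not state what happens when the destination count exceeds the source count, so the sketch simply treats that case as "no loss".

```python
def retransmit_message(timestamp, first_count, second_count):
    """Return a retransmission message when the source group is larger, else None.

    first_count  - number of first data to be compared in the first data group
    second_count - number of second data to be compared in the second data group
    """
    if first_count > second_count:
        # The message carries only the time stamp of the affected group.
        return {"type": "retransmit", "timestamp": timestamp}
    # Equal counts: no loss, no processing (a larger destination count is an
    # assumption of this sketch and is also treated as no loss).
    return None
```

Because the message identifies the group by time stamp only, the source data center would retransmit every record carrying that time stamp rather than only the missing ones.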
Optionally, when the monitoring and collecting module of the data synchronization device determines that the first data to be compared in the first data group is lost in the destination data center, it publishes the time stamp information corresponding to the first data group and the number of lost data in the first data group as monitoring data, which facilitates data loss alerting. The monitoring and collecting module of the data synchronization device supports various external publishing modes, such as JMX, which are not limited herein.
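Publishing the loss as monitoring data might look like the sketch below. The description names JMX as one publishing mode; the sketch does not use JMX and instead models a publishing mode as any callable that accepts a metric. The LossMetric type and the publish_loss function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LossMetric:
    timestamp: int   # time stamp information of the affected first data group
    lost_count: int  # number of records missing at the destination

def publish_loss(timestamp, first_count, second_count, publishers):
    """Hand a LossMetric to every publisher if the group lost data; otherwise do nothing."""
    lost = first_count - second_count
    if lost <= 0:
        return None
    metric = LossMetric(timestamp, lost)
    for publish in publishers:
        publish(metric)
    return metric

# A list's append method stands in for a real publishing mode here.
collected = []
publish_loss(1654810, 300, 298, [collected.append])
```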
For example, the first detection module of the data synchronization device obtains 1230 first data to be compared, which includes 4 kinds of timestamp information, 1651810, 1652810, 1653810, and 1654810 respectively. As shown in fig. 8, the first detection module divides 1230 first data to be compared into 4 first data sets based on 4 kinds of time stamp information, and transmits the 4 first data sets to the monitoring acquisition module of the data synchronization device.
Wherein the 4 first data sets are a first data set 1, a first data set 2, a first data set 3 and a first data set 4, respectively. The corresponding time stamp information of the first data set 1 is 1651810, and the number of first data to be compared in the first data set 1 is 300; the corresponding time stamp information of the first data set 2 is 1652810, and the number of first data to be compared in the first data set 2 is 320; the corresponding time stamp information of the first data group 3 is 1653810, and the number of first data to be compared in the first data group 3 is 310; the time stamp information corresponding to the first data set 4 is 1654810, and the number of first data to be compared in the first data set 4 is 300.
The second detection module of the data synchronization device obtains 1228 second data to be compared, where the second data to be compared includes 4 kinds of timestamp information, which are 1651810, 1652810, 1653810, and 1654810, respectively. As shown in fig. 9, the second detection module divides 1228 second data to be compared into 4 second data groups based on 4 kinds of time stamp information, and sends the 4 second data groups to the monitoring acquisition module of the data synchronization device.
Wherein the 4 second data sets are respectively a second data set 1, a second data set 2, a second data set 3 and a second data set 4. The corresponding time stamp information of the second data set 1 is 1651810, and the number of second data to be compared in the second data set 1 is 300; the corresponding time stamp information of the second data set 2 is 1652810, and the number of second data to be compared in the second data set 2 is 320; the corresponding time stamp information of the second data group 3 is 1653810, and the number of second data to be compared in the second data group 3 is 310; the time stamp information corresponding to the second data set 4 is 1654810, and the number of second data to be compared in the second data set 4 is 298.
For the time stamp information 1651810, the number of the first data to be compared in the first data set 1 corresponding to the time stamp information 1651810 is 300, and the number of the second data to be compared in the second data set 1 corresponding to the time stamp information 1651810 is 300, which are equal, so that the first data to be compared in the first data set 1 is not lost in the destination data center, and no processing is performed.
The processing of the timestamp information 1652810 and the timestamp information 1653810 is similar to the processing of the timestamp information 1651810 described above, and will not be described in detail herein.
For the timestamp information 1654810, the number of first data to be compared in the first data set 4 corresponding to the timestamp information 1654810 is 300, and the number of second data to be compared in the second data set 4 corresponding to the timestamp information 1654810 is 298. Since 300 is greater than 298, the first data to be compared in the first data set 4 is lost in the destination data center, and the monitoring acquisition module sends a data retransmission message to the source data center, wherein the data retransmission message includes the time stamp information 1654810.
Optionally, for any given time stamp information, if a first data group corresponding to the time stamp information exists and no second data group corresponding to the time stamp information exists, the data synchronization device determines that the first data to be compared in that first data group is lost in the destination data center, and sends a data retransmission message to the source data center, wherein the data retransmission message includes the time stamp information corresponding to that first data group.
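The worked example above can be replayed with a short Python sketch. The per-group counts are taken from the example; the find_lost function and the dict representation of the groups are assumptions made for illustration. A second data group that does not exist is treated as having zero records, which covers the optional case just described.

```python
# Group sizes from the example: time stamp information -> number of records.
first_counts = {1651810: 300, 1652810: 320, 1653810: 310, 1654810: 300}
second_counts = {1651810: 300, 1652810: 320, 1653810: 310, 1654810: 298}

def find_lost(first_counts, second_counts):
    """Map each time stamp whose source group is larger to the number of lost records."""
    lost = {}
    for timestamp, source_count in first_counts.items():
        # A missing second data group counts as zero records at the destination.
        dest_count = second_counts.get(timestamp, 0)
        if source_count > dest_count:
            lost[timestamp] = source_count - dest_count
    return lost

lost = find_lost(first_counts, second_counts)

# Variant: the second data group for 1652810 does not exist at all.
second_missing = {ts: n for ts, n in second_counts.items() if ts != 1652810}
lost_with_missing_group = find_lost(first_counts, second_missing)
```

In the example only time stamp 1654810 is reported, with 2 records lost, so a single retransmission message carrying 1654810 would be sent.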
In the embodiment of the application, the data synchronization device acquires M pieces of data to be synchronized in a preset period from the source data center as first data to be compared, and divides the M pieces of first data to be compared into N first data groups according to time stamp information. The data synchronization device acquires P pieces of data to be synchronized in the preset period from the destination data center as second data to be compared, and divides the P pieces of second data to be compared into Q second data groups according to time stamp information. For any given time stamp information, the data synchronization device compares the first data group having that time stamp information with the second data group having that time stamp information to determine whether the first data to be compared in that first data group is lost in the destination data center. Because the first data to be compared in the first data group and the second data to be compared in the second data group are counted in real time, whether data is lost during data synchronization can be determined accurately and in real time, which ensures the accuracy of data synchronization.
In order to better explain the embodiment of the present application, taking a specific implementation scenario as an example, a flow of a method for detecting lost data provided by the embodiment of the present application is described, where the method is performed by the data synchronization device 102 in fig. 1, as shown in fig. 10, and includes the following steps:
In step S1001, the first detection module of the data synchronization device pulls M pieces of data to be synchronized in a preset period from the multiple pieces of data to be synchronized according to the timestamp information included in the multiple pieces of data to be synchronized acquired by the consumption thread group, and uses the M pieces of data to be synchronized as the first data to be compared.
In step S1002, the first detection module divides the M first data to be compared into N first data sets according to the timestamp information in the M first data to be compared, and sends the N first data sets to the monitoring acquisition module of the data synchronization device.
In step S1003, the second detection module of the data synchronization device pulls P pieces of data to be synchronized in the preset period from the multiple pieces of data to be synchronized according to the timestamp information included in the multiple pieces of data to be synchronized acquired by the production thread, and the P pieces of data to be synchronized are used as the second data to be compared.
In step S1004, the second detection module divides the P second data to be compared into Q second data sets according to the timestamp information in the P second data to be compared, and sends the Q second data sets to the monitoring acquisition module of the data synchronization device.
Step S1001 and step S1003 may be executed in any order, or in parallel.
Step S1005, for any time stamp information, the monitoring and collecting module of the data synchronization device determines whether the number of the first data to be compared in the first data set is greater than the number of the second data to be compared in the second data set, if yes, step S1006 is executed; otherwise, ending.
In step S1006, the monitoring and collecting module of the data synchronization device determines that the first data to be compared in the first data set is lost in the destination data center, and sends a data retransmission message to the source data center.
In the embodiment of the application, because the first data to be compared in the first data set and the second data to be compared in the second data set are counted in real time, whether the data is lost in the data synchronization process can be accurately determined in real time. The data retransmission message enables the source-side data center to retransmit the first data to be compared corresponding to the corresponding time stamp information, and accuracy of data synchronization is improved.
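Steps S1001 to S1006 can be tied together in one illustrative sketch. It assumes, purely for illustration, that records are dicts with a "timestamp" key, that the two detection steps run on a two-worker thread pool, and that the retransmission message is a small dict passed to a caller-supplied callback. None of these names come from the application.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def detect(records, start, end):
    """S1001/S1002 or S1003/S1004: select by time stamp in [start, end), then count per time stamp."""
    return Counter(r["timestamp"] for r in records if start <= r["timestamp"] < end)

def run_check(source_records, dest_records, start, end, send_retransmit):
    # The two detection steps are independent, so they may run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(detect, source_records, start, end)
        second_future = pool.submit(detect, dest_records, start, end)
        first_counts = first_future.result()
        second_counts = second_future.result()
    # S1005/S1006: compare group sizes per time stamp; a Counter yields 0 for a missing group.
    for timestamp in sorted(first_counts):
        if first_counts[timestamp] > second_counts[timestamp]:
            send_retransmit({"type": "retransmit", "timestamp": timestamp})

source = [{"timestamp": 10}] * 3 + [{"timestamp": 20}] * 2 + [{"timestamp": 40}]
dest = [{"timestamp": 10}] * 3 + [{"timestamp": 20}]
sent = []
run_check(source, dest, 0, 30, sent.append)
```

The record with time stamp 40 lies outside the period 0-30 and is left for a later period, so only the group with time stamp 20 triggers a retransmission message.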
Based on the same technical concept, an embodiment of the present application provides a data synchronization device, as shown in fig. 11, the data synchronization device 1100 includes:
a first detection module 1101, configured to acquire M pieces of data to be synchronized in a preset period from a source data center as first data to be compared, and divide the acquired M pieces of first data to be compared into N first data groups according to time stamp information; wherein M > 0, 0 < N <= M;
a second detection module 1102, configured to acquire P pieces of data to be synchronized in the preset period from the destination data center as second data to be compared, and divide the acquired P pieces of second data to be compared into Q second data groups according to the time stamp information; wherein P > 0, 0 < Q <= P;
for any time stamp information, the monitoring and collecting module 1103 is configured to compare the first data set with the time stamp information with the second data set with the time stamp information, and determine whether the first data to be compared in the first data set with the time stamp information has data loss in the destination data center.
Optionally, a synchronization module 1104 is further included, and the synchronization module 1104 is specifically configured to:
And the connector in the data synchronization device acquires a plurality of data to be synchronized from the source data center and sends the data to be synchronized to the destination data center.
Optionally, the synchronization module 1104 is specifically configured to:
the consumption thread group in the data synchronization device acquires the plurality of data to be synchronized from the source data center and sends the plurality of data to be synchronized to a production thread in the data synchronization device;
And the production thread in the data synchronization device sends the data to be synchronized to the destination data center.
Optionally, the first detection module 1101 is specifically configured to:
The first detection module of the data synchronization device pulls M pieces of data to be synchronized in the preset period from the plurality of pieces of data to be synchronized according to timestamp information contained in the plurality of pieces of data to be synchronized acquired by the consumption thread group, and the M pieces of data to be synchronized are used as first data to be compared;
The first detection module of the data synchronization device divides the M first data to be compared into N first data groups according to the timestamp information in the M first data to be compared, and sends the N first data groups to the monitoring acquisition module of the data synchronization device.
Optionally, the second detection module 1102 is specifically configured to:
the second detection module of the data synchronization device pulls P pieces of data to be synchronized in the preset period from the plurality of pieces of data to be synchronized according to timestamp information contained in the plurality of pieces of data to be synchronized acquired by the production thread, and the P pieces of data to be synchronized are used as second data to be compared;
The second detection module of the data synchronization device divides the P second data to be compared into Q second data groups according to the timestamp information in the P second data to be compared, and sends the Q second data groups to the monitoring acquisition module of the data synchronization device.
Optionally, the monitoring and collecting module 1103 is specifically configured to:
The monitoring acquisition module of the data synchronization device compares the number of first data to be compared in the first data group with the number of second data to be compared in the second data group;
If the number of the first data to be compared in the first data group is larger than the number of the second data to be compared in the second data group, a monitoring acquisition module of the data synchronization device determines that the first data to be compared in the first data group is lost in the destination data center, and sends a data retransmission message to the source data center, wherein the data retransmission message comprises timestamp information corresponding to the first data group.
Optionally, the monitoring and collecting module 1103 is further configured to:
For any time stamp information, if a first data group corresponding to the time stamp information exists and a second data group corresponding to the time stamp information does not exist, the data synchronization device determines that first data to be compared in the first data group with the time stamp information is lost in the destination data center, and sends a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to the first data group with the time stamp information.
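How the synchronization module's consumption thread group and production thread relate to the two detection modules can be sketched as follows. This is a hypothetical sketch, not the device's actual structure: the queue-based hand-off, the None sentinel, the drop predicate used to simulate a loss, and the two lists that stand in for the tap points read by the first and second detection modules are all assumptions.

```python
import queue
import threading

def run_connector(source_partitions, destination, drop=lambda record: False):
    """Consumption threads read the source; one production thread writes the destination."""
    handoff = queue.Queue()
    consumed, produced = [], []  # tap points for the first and second detection modules
    lock = threading.Lock()

    def consume(partition):
        for record in partition:
            with lock:
                consumed.append(record)  # what was pulled from the source data center
            handoff.put(record)

    def produce():
        while True:
            record = handoff.get()
            if record is None:           # sentinel: all consumption threads have finished
                break
            if drop(record):             # simulated loss between source and destination
                continue
            destination.append(record)
            produced.append(record)      # what was actually sent to the destination

    producer = threading.Thread(target=produce)
    producer.start()
    consumers = [threading.Thread(target=consume, args=(p,)) for p in source_partitions]
    for thread in consumers:
        thread.start()
    for thread in consumers:
        thread.join()
    handoff.put(None)
    producer.join()
    return consumed, produced

partitions = [[{"timestamp": 1, "id": 0}, {"timestamp": 1, "id": 1}], [{"timestamp": 2, "id": 2}]]
destination = []
consumed, produced = run_connector(partitions, destination, drop=lambda r: r["id"] == 1)
```

In this run three records are consumed and two are produced, so grouping both lists by time stamp would show a shortfall of one record for time stamp 1.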
Based on the same technical concept, the embodiment of the present application provides a computer device, which may be a terminal or a server, as shown in fig. 12, including at least one processor 1201 and a memory 1202 connected to the at least one processor, where the embodiment of the present application is not limited to a specific connection medium between the processor 1201 and the memory 1202, and in fig. 12, the connection between the processor 1201 and the memory 1202 is exemplified by a bus. The buses may be divided into address buses, data buses, control buses, etc.
In an embodiment of the present application, the memory 1202 stores instructions executable by the at least one processor 1201, and the at least one processor 1201 can perform the steps included in the method for detecting lost data described above by executing the instructions stored in the memory 1202.
The processor 1201 is the control center of the computer device. It may connect the various parts of the computer device by means of various interfaces and lines, and performs lost data detection by running or executing the instructions stored in the memory 1202 and invoking the data stored in the memory 1202. Optionally, the processor 1201 may include one or more processing units, and may integrate an application processor, which mainly handles the operating system, user interfaces, application programs and the like, with a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1201. In some embodiments, the processor 1201 and the memory 1202 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 1201 may be a general purpose processor such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 1202, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1202 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disk, and the like. The memory 1202 may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1202 in embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
Based on the same inventive concept, an embodiment of the present application provides a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device causes the computer device to perform the steps of the above-described method of detecting lost data.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method of detecting lost data, comprising:
The data synchronization device acquires M pieces of data to be synchronized in a preset period from a source-side data center, takes the M pieces of data to be synchronized as first data to be compared, and divides the acquired M pieces of first data to be compared into N first data groups according to time stamp information; wherein M > 0, 0 < N <= M; the preset period includes a plurality of incremental periods, the preset period being determined based on the timestamp information;
The data synchronization device acquires P pieces of data to be synchronized in the preset period from a destination data center, takes the P pieces of data to be synchronized as second data to be compared, and divides the acquired P pieces of second data to be compared into Q second data groups according to time stamp information; wherein P > 0, 0 < Q <= P;
for any time stamp information, the data synchronization device compares the first data set with the time stamp information with the second data set with the time stamp information, determines whether first data to be compared in the first data set with the time stamp information is lost in the destination data center, and if so, sends a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to the first data set with the time stamp information.
2. The method as recited in claim 1, further comprising:
And the connector in the data synchronization device acquires a plurality of data to be synchronized from the source data center and sends the data to be synchronized to the destination data center.
3. The method of claim 2, wherein the connector in the data synchronization device obtains a plurality of data to be synchronized from the source data center and sends the plurality of data to be synchronized to the destination data center, comprising:
the consumption thread group in the data synchronization device acquires the plurality of data to be synchronized from the source data center and sends the plurality of data to be synchronized to a production thread in the data synchronization device;
And the production thread in the data synchronization device sends the data to be synchronized to the destination data center.
4. The method of claim 3, wherein the data synchronization device acquires M pieces of data to be synchronized in a preset period from the source data center as first pieces of data to be compared, and divides the acquired M pieces of first data to be compared into N first data groups according to the timestamp information, including:
The first detection module of the data synchronization device pulls M pieces of data to be synchronized in the preset period from the plurality of pieces of data to be synchronized according to timestamp information contained in the plurality of pieces of data to be synchronized acquired by the consumption thread group, and the M pieces of data to be synchronized are used as first data to be compared;
The first detection module of the data synchronization device divides the M first data to be compared into N first data groups according to the timestamp information in the M first data to be compared, and sends the N first data groups to the monitoring acquisition module of the data synchronization device.
5. The method of claim 3, wherein the data synchronization device obtains P pieces of data to be synchronized in the preset period from a destination data center as second pieces of data to be compared, and divides the obtained P pieces of second pieces of data to be compared into Q second data groups according to timestamp information, including:
the second detection module of the data synchronization device pulls P pieces of data to be synchronized in the preset period from the plurality of pieces of data to be synchronized according to timestamp information contained in the plurality of pieces of data to be synchronized acquired by the production thread, and the P pieces of data to be synchronized are used as second data to be compared;
The second detection module of the data synchronization device divides the P second data to be compared into Q second data groups according to the timestamp information in the P second data to be compared, and sends the Q second data groups to the monitoring acquisition module of the data synchronization device.
6. The method of claim 1, wherein the data synchronization device compares a first data set having the timestamp information with a second data set having the timestamp information to determine whether a first data to be compared within the first data set has a data loss at the destination data center, comprising:
The monitoring acquisition module of the data synchronization device judges the quantity of first data to be compared in the first data set and the quantity of second data to be compared in the second data set;
If the number of the first data to be compared in the first data group is larger than the number of the second data to be compared in the second data group, a monitoring acquisition module of the data synchronization device determines that the first data to be compared in the first data group is lost in the destination data center, and sends a data retransmission message to the source data center, wherein the data retransmission message comprises timestamp information corresponding to the first data group.
7. The method as recited in claim 1, further comprising:
For any time stamp information, if a first data group corresponding to the time stamp information exists and a second data group corresponding to the time stamp information does not exist, the data synchronization device determines that data loss of first data to be compared in the first data group with the time stamp information occurs in the destination data center.
8. A data synchronization device, comprising:
The first detection module is used for acquiring M pieces of data to be synchronized in a preset period from the source-end data center, serving as first data to be compared, and dividing the acquired M pieces of first data to be compared into N first data sets according to time stamp information; wherein M > 0, 0 < N <= M; the preset period includes a plurality of incremental periods, the preset period being determined based on the timestamp information;
The second detection module is used for acquiring P pieces of data to be synchronized in the preset period from the destination data center, serving as second data to be compared, and dividing the acquired P pieces of second data to be compared into Q second data groups according to time stamp information; wherein P > 0, 0 < Q <= P;
For any time stamp information, the monitoring acquisition module is used for comparing a first data set with the time stamp information with a second data set with the time stamp information, determining whether first data to be compared in the first data set with the time stamp information is lost in the destination data center, and if so, sending a data retransmission message to the source data center, wherein the data retransmission message comprises the time stamp information corresponding to the first data set with the time stamp information.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1-7 when the program is executed.
10. A computer readable storage medium, characterized in that it stores a computer program executable by a computer device, which program, when run on the computer device, causes the computer device to perform the steps of the method according to any one of claims 1-7.
CN202210974434.6A 2022-08-15 2022-08-15 Method, device, equipment and storage medium for detecting lost data Active CN115442272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210974434.6A CN115442272B (en) 2022-08-15 2022-08-15 Method, device, equipment and storage medium for detecting lost data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210974434.6A CN115442272B (en) 2022-08-15 2022-08-15 Method, device, equipment and storage medium for detecting lost data

Publications (2)

Publication Number Publication Date
CN115442272A CN115442272A (en) 2022-12-06
CN115442272B CN115442272B (en) 2024-05-14

Family

ID=84242211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210974434.6A Active CN115442272B (en) 2022-08-15 2022-08-15 Method, device, equipment and storage medium for detecting lost data

Country Status (1)

Country Link
CN (1) CN115442272B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108667680A (en) * 2017-10-30 2018-10-16 上海幻电信息科技有限公司 A kind of monitoring system and method for multilink real time data steaming transfer
CN109766195A (en) * 2018-12-13 2019-05-17 平安普惠企业管理有限公司 The method and Related product of loss of data in supervisory messages queue
CN110674146A (en) * 2019-08-22 2020-01-10 视联动力信息技术股份有限公司 Data synchronization method, synchronization end, end to be synchronized, equipment and storage medium
CN112291093A (en) * 2020-10-29 2021-01-29 迈普通信技术股份有限公司 Network detection method, device, network equipment and network system
CN113067740A (en) * 2020-01-02 2021-07-02 中国移动通信有限公司研究院 Channel associated performance detection method, device, equipment and computer readable storage medium
CN113656503A (en) * 2021-08-20 2021-11-16 北京健康之家科技有限公司 Data synchronization method, device and system and computer readable storage medium
CN114567541A (en) * 2020-11-27 2022-05-31 中兴通讯股份有限公司 Pre-activation detection method, electronic equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10212120B2 (en) * 2016-04-21 2019-02-19 Confluent, Inc. Distributed message queue stream verification

Also Published As

Publication number Publication date
CN115442272A (en) 2022-12-06

Similar Documents

Publication Publication Date Title
CN111124277B (en) Deep learning data set caching method, system, terminal and storage medium
US20220239602A1 (en) Scalable leadership election in a multi-processing computing environment
CN112507029B (en) Data processing system and data real-time processing method
CN106982356B (en) Distributed large-scale video stream processing system
CN101997823A (en) Distributed file system and data access method thereof
CN112559475B (en) Data real-time capturing and transmitting method and system
CN110784498B (en) Personalized data disaster tolerance method and device
CN103716384A (en) Method and device for realizing cloud storage data synchronization in cross-data-center manner
CN111552701B (en) Method for determining data consistency in distributed cluster and distributed data system
CN111338834B (en) Data storage method and device
CN114064217B (en) OpenStack-based node virtual machine migration method and device
CN111651631A (en) High-concurrency video data processing method, electronic equipment, storage medium and system
CN107203437B (en) Method, device and system for preventing memory data from being lost
CN111680104A (en) Data synchronization method and device, computer equipment and readable storage medium
Friedman et al. Fast replicated state machines over partitionable networks
CN115442272B (en) Method, device, equipment and storage medium for detecting lost data
EP2902909A1 (en) Distributed storage apparatus, storage node, data provision method and program
CN114500289B (en) Control plane recovery method, device, control node and storage medium
CN115269519A (en) Log detection method and device and electronic equipment
CN113407629A (en) Data synchronization method and device, electronic equipment and storage medium
CN114490881A (en) Synchronous data processing method, device, equipment and storage medium
CN115238006A (en) Retrieval data synchronization method, device, equipment and computer storage medium
CN112948382A (en) Information processing method and device based on big data and related equipment
CN116743589B (en) Cloud host migration method and device and electronic equipment
CN111654410B (en) Gateway request monitoring method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant