CN116628056A - Data checking method and device, electronic equipment and storage medium - Google Patents

Data checking method and device, electronic equipment and storage medium

Publication number
CN116628056A
Authority
CN
China
Prior art keywords
message
offset value
service data
file
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310495829.2A
Other languages
Chinese (zh)
Inventor
周鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd filed Critical Hangzhou Dt Dream Technology Co Ltd
Priority to CN202310495829.2A priority Critical patent/CN116628056A/en
Publication of CN116628056A publication Critical patent/CN116628056A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data reconciliation method and device, an electronic device, and a storage medium, and relates to the technical field of data processing. The method comprises: in response to detecting a target exchange task, reading a first offset value of a first message queue; in response to detecting the end of the target exchange task, reading a second offset value from the first message queue; and reconciling, according to the first offset value and the second offset value, the objects in the source end and the destination end that match the target exchange task, to obtain a reconciliation result. A central control node can thus store the offset values of the first message queue before and after the object exchange (such as a data exchange or a file exchange) and perform data reconciliation according to those offset values. No additional batch field needs to be stored in the destination end, which reduces the intrusion on the destination end.

Description

Data checking method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data reconciliation method, a data reconciliation device, an electronic device, and a storage medium.
Background
With the continuous development of big data technologies, the need for data exchange between databases and for file exchange between file servers is becoming increasingly urgent, and ETL (Extract-Transform-Load) tools are being widely used. ETL is a data warehouse technology that describes the process of extracting, transforming, and loading data from a source end to a destination end.
In the data exchange scenario, different departments (such as departments in different provinces) are often isolated from one another at the network level, so data is generally exchanged between them by way of a central database. The accuracy and consistency of data exchange is a necessary condition for the normal operation of subsequent services. After a data exchange, data therefore needs to be reconciled across gateways; that is, the service data synchronized between the source database and the destination database is reconciled across gateways.
In the related art, after the data exchange is completed, the service data of a certain batch is extracted from the destination database and reconciled against the service data of the corresponding batch in the central database.
However, this reconciliation method is intrusive to the destination database: it requires additional batch fields to be stored in the destination database, and these fields are often unrelated to the actual business.
Disclosure of Invention
The object of the present application is to solve at least to some extent one of the above technical problems.
To this end, the application provides a data reconciliation method and device, an electronic device, and a storage medium, in which a central control node stores the offset values of a first message queue before and after an object exchange (such as a data exchange or a file exchange) and performs data reconciliation according to those offset values. No additional batch field needs to be stored in the destination end, which reduces the intrusion on the destination end.
An embodiment of a first aspect of the present application provides a data reconciliation method, including:
in response to detecting a target exchange task, reading a first offset value of a first message queue; the first offset value is used for indicating the storage position, in the first message queue, of the last written object; the target exchange task is used for exchanging objects between a source end and a destination end through the first message queue;
in response to detecting that the target exchange task has ended, reading a second offset value from the first message queue; the second offset value is used for indicating the storage position, in the first message queue, of the end object of the target exchange task;
and reconciling, according to the first offset value and the second offset value, the objects in the source end and the destination end that match the target exchange task, to obtain a reconciliation result.
An embodiment of a second aspect of the present application provides a data reconciliation apparatus, including:
a first reading module, configured to read a first offset value of a first message queue in response to detecting a target exchange task; the first offset value is used for indicating the storage position, in the first message queue, of the last written object; the target exchange task is used for exchanging objects between a source end and a destination end through the first message queue;
a second reading module, configured to read a second offset value from the first message queue in response to detecting that the target exchange task has ended; the second offset value is used for indicating the storage position, in the first message queue, of the end object of the target exchange task;
and a reconciliation module, configured to reconcile, according to the first offset value and the second offset value, the objects in the source end and the destination end that match the target exchange task, to obtain a reconciliation result.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the data reconciliation method as defined in the first aspect when the program is executed.
An embodiment of a fourth aspect of the present application proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, implements a data reconciliation method as defined in the first aspect.
An embodiment of a fifth aspect of the present application proposes a computer program product comprising a computer program which, when executed by a processor, implements the data reconciliation method of the first aspect of the application as described above.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
A first offset value of a first message queue is read when a target exchange task is detected, and a second offset value is read from the first message queue when the target exchange task is detected to have ended; the objects in the source end and the destination end that match the target exchange task are then reconciled according to the first offset value and the second offset value to obtain a reconciliation result. The central control node can thus store the offset values of the first message queue before and after the object exchange (such as a data exchange or a file exchange) and perform data reconciliation according to those offset values. No additional batch field needs to be stored in the destination end, which reduces the intrusion on the destination end.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a data reconciliation method provided by an embodiment of the present application;
Fig. 2 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 3 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 4 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 5 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 6 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of the architecture of a data reconciliation system provided by an embodiment of the present application;
Fig. 8 is a schematic diagram of a service data flow provided by an embodiment of the present application;
Fig. 9 is a schematic diagram of a storage format of a file provided by the present application;
Fig. 10 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 11 is a flow chart of another data reconciliation method provided by an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a data reconciliation apparatus provided by an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
Currently, in a data exchange scenario, data reconciliation across gateways may be achieved in the following ways:
First, after the data exchange is completed, the synchronized data amount is acquired and compared with the data amount of the corresponding batch in the central database.
Second, the service data in the source database and the service data in the destination database are each extracted into a reconciliation database, where data reconciliation is performed.
Third, the service data of a certain batch is extracted from the destination database and reconciled against the service data of the corresponding batch in the central database.
However, the first way only compares data amounts to judge whether the service data exchanged between the source database and the destination database is consistent; it cannot determine whether service data has been lost or is redundant in the destination database.
In the second way, when the data amount is large or there are many reconciliation tasks, the disk and performance of the reconciliation database come under considerable pressure. If the central database used by the exchange tasks is used for this purpose in practice, execution of the exchange tasks may be affected. In addition, in an incremental exchange scenario, it is impossible to determine which service data relates to the current data exchange task, so the full service data in the source database must be extracted and compared with the full service data in the destination database, which puts pressure on the source database, the destination database, or the reconciliation database.
In the third way, the destination database is intruded upon, and business departments are often unwilling to add such an intrusive field, which has nothing to do with the business.
To address at least one of the above problems, the embodiments of the application provide a data reconciliation method and device and an electronic device. Before the embodiments are described in detail, the general technical terms are first introduced for ease of understanding:
The service timestamp is a timestamp carried in service data. It may be the update time of the service data, its warehouse-in time (i.e. the time at which it was stored in a database), or the occurrence time of the service event to which the service data belongs.
For example, when the service data is updated, the service timestamp may be the update time (i.e., the latest or last update time) of the service data, and when the service data is not updated, the service timestamp may be the creation time or the warehouse entry time of the service data.
The file timestamp may be an update time or a creation time of the file.
The data reconciliation method provided by the application is described in detail below with reference to fig. 1.
Fig. 1 is a flow chart of a data reconciliation method according to an embodiment of the application.
The data reconciliation method of the embodiment of the application can be applied to a central control node or a management node.
As shown in fig. 1, the data reconciliation method includes the steps of:
Step S101, in response to detecting a target exchange task, reading a first offset value of a first message queue; the first offset value is used for indicating the storage position, in the first message queue, of the last written object.
In the embodiment of the application, the target exchange task is used for exchanging objects between the source end and the destination end through the first message queue. That is, the target exchange task is used to write the object matched with the target exchange task in the source end into the first message queue, and write the object matched with the target exchange task in the first message queue into the destination end.
Wherein the object may include, but is not limited to, business data, files, and the like.
In the embodiment of the present application, the first message queue may be, for example, a Kafka (a high throughput distributed publish-subscribe message system) message queue.
In the embodiment of the present application, the offset value (offset) of the first message queue, referred to as the first offset value in the present application, is used to indicate the storage position in the first message queue of the object most recently written to the first message queue. For example, the first offset value is denoted offset_1.
It should be noted that, after each object is stored in or written into the first message queue, the object has a corresponding offset value offset in the first message queue, where the offset value is used to indicate a storage location of the object in the first message queue.
For example, taking service data as the object: if the offset of service data 1 is 0, the offset of service data 2 is 1, and the offset of service data 3 is 2, then service data 1 is the first service data written to the first message queue, service data 2 is the second, and service data 3 is the third.
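The offset numbering in this example can be illustrated with a minimal sketch. The `InMemoryQueue` class below is a hypothetical in-memory stand-in for the first message queue, introduced only for illustration; it is not the Kafka client API, and the convention of returning -1 for an empty queue is an assumption.

```python
class InMemoryQueue:
    """Hypothetical stand-in for the first message queue (not a Kafka client)."""

    def __init__(self):
        self._log = []  # append-only list; the list index plays the role of the offset

    def append(self, obj):
        """Append an object and return the offset assigned to it."""
        self._log.append(obj)
        return len(self._log) - 1

    def last_offset(self):
        """Offset of the most recently written object (-1 if empty, by assumption)."""
        return len(self._log) - 1


queue = InMemoryQueue()
offsets = [
    queue.append(name)
    for name in ("service data 1", "service data 2", "service data 3")
]
```

Here `offsets` holds the offset assigned to each of the three objects in write order, and `queue.last_offset()` corresponds to the value that would be read as a first offset value at this point.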
In the embodiment of the application, the target exchange task may be generated in several ways. When an object (such as service data or a file) in the source end is detected to have been updated, a target exchange task may be generated that indicates the updated object in the source end. Alternatively, the target exchange task may be triggered manually; for example, it may carry a query period and indicate the objects in the source end whose timestamps (such as service timestamps or file timestamps) fall within the query period, or it may indicate all objects in the source end. The application does not limit the manner in which the target exchange task is generated.
In the embodiment of the present application, when a target exchange task is detected, the central control node may send the target exchange task to the exchange node corresponding to the source end (denoted as the first exchange node in the present application). After receiving the target exchange task, the first exchange node may, in response, obtain the objects matching the target exchange task from the source end and sort them by timestamp (such as the service timestamp or the file timestamp), for example in ascending order of timestamp value, and then write the sorted objects into the first message queue in sequence.
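The sort-then-write behaviour of the first exchange node can be sketched as follows. The `Record` type, the `write_sorted` helper, and the use of a plain Python list as the queue are all hypothetical names and simplifications introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical object exchanged by the task; timestamp is a service or file timestamp."""
    key: str
    timestamp: int


def write_sorted(records, queue):
    """Sort matching objects by ascending timestamp and append them to the queue in order."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    queue.extend(ordered)  # a plain list stands in for the first message queue
    return ordered


queue = []
matched = [Record("b", 30), Record("a", 10), Record("c", 20)]
write_sorted(matched, queue)
written_keys = [r.key for r in queue]
```

Because the objects are written in ascending timestamp order, their offsets in the queue follow the same order as their timestamps.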
In the embodiment of the application, the central control node can also send the target exchange task to the second exchange node corresponding to the destination end, and correspondingly, the second exchange node can respond to the target exchange task after receiving the target exchange task, read the object matched with the target exchange task from the first message queue and write the object into the destination end, thereby realizing the exchange of the object between the source end and the destination end through the first message queue.
In the embodiment of the application, when the target exchange task is detected, the central control node can also acquire or read the first offset value of the first message queue; the first offset value is used for indicating a storage position of a last written object in the first message queue.
Step S102, in response to detecting the end of the target exchange task, reading a second offset value from the first message queue; the second offset value is used for indicating the storage position of the end object in the target exchange task in the first message queue.
In the embodiment of the application, when the central control node detects that the target exchange task has ended, the second offset value can be read from the first message queue, where the second offset value is used to indicate the storage position in the first message queue of the end object of the target exchange task. That is, the second offset value indicates the storage position in the first message queue of the last object, among the objects matching the target exchange task, that was written to the first message queue. For example, the second offset value is denoted offset_2.
Step S103, according to the first offset value and the second offset value, checking the object matched with the target exchange task in the source end and the destination end to obtain a checking result.
In the embodiment of the application, the objects matched with the target exchange task in the source end and the destination end can be checked according to the first offset value and the second offset value to obtain a checking result.
According to the data reconciliation method of the embodiment of the application, a first offset value of a first message queue is read in response to detecting a target exchange task, and a second offset value is read from the first message queue when the target exchange task is detected to have ended; the objects in the source end and the destination end that match the target exchange task are then reconciled according to the first offset value and the second offset value to obtain a reconciliation result. The central control node can thus store the offset values of the first message queue before and after the object exchange (such as a data exchange or a file exchange) and perform data reconciliation according to those offset values. No additional batch field needs to be stored in the destination end, which reduces the intrusion on the destination end.
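As a rough sketch of steps S101 and S102, the function below records the queue offset before and after an exchange task writes its objects. The function name `run_exchange_with_offsets` is hypothetical, a plain list stands in for the first message queue, and the write loop is a simplification of the work done by the exchange nodes.

```python
def run_exchange_with_offsets(queue, matched_objects):
    """Record the queue offset before and after an exchange task (simplified sketch)."""
    # S101: task detected, so read the offset of the last object already in the queue.
    first_offset = len(queue) - 1
    # The exchange task writes its matching objects into the queue.
    for obj in matched_objects:
        queue.append(obj)
    # S102: task ended, so read the offset of the end object of this task.
    second_offset = len(queue) - 1
    return first_offset, second_offset


queue = ["old-1", "old-2"]  # objects left over from earlier tasks
first_offset, second_offset = run_exchange_with_offsets(queue, ["a", "b", "c"])
```

The two values returned are all the central control node needs to keep in order to identify which queue entries belong to this task; nothing is added to the destination end.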
In a possible implementation manner of the embodiment of the present application, when the object is service data, the source end may be a source database and the destination end may be a destination database. The following embodiment illustrates how, in this case, the service data in the source database and the destination database that matches the target exchange task is reconciled according to the first offset value and the second offset value.
Fig. 2 is a flow chart of another data reconciliation method provided in an embodiment of the application.
As shown in fig. 2, the data reconciliation method may include the steps of:
Step S201, in response to detecting the target exchange task, a first offset value of the first message queue is read.
The first offset value is used for indicating the storage position, in the first message queue, of the last written service data, and the target exchange task is used for exchanging data between the source database and the destination database through the first message queue.
Step S202, in response to detecting that the target exchange task has ended, a second offset value is read from the first message queue.
The second offset value is used for indicating the storage position of the last service data in the target exchange task in the first message queue.
The explanation of steps S201 to S202 may be referred to the related description in any embodiment of the present application, and will not be repeated here.
Step S203, in response to the task type of the target exchange task being full-volume exchange, the first service data stream is extracted from the first message queue according to the first offset value and the second offset value.
In the embodiment of the present application, the task types of the target exchange task may include full-volume exchange (i.e., the service data matching the target exchange task is all the service data in the source database) and incremental exchange (i.e., the service data matching the target exchange task is part of the service data in the source database, such as service data newly added to the source database).
The offset value in the first message queue of each first service data in the first service data stream lies between the first offset value (offset_1) and the second offset value (offset_2). For example, offset_1 + 1 is the offset value of the first service data in the target exchange task, so the service data whose offset values are offset_1 + 1, offset_1 + 2, offset_1 + 3, …, offset_2 - 1, offset_2 can be extracted.
In the embodiment of the present application, when the task type of the target exchange task is full-volume exchange, the first message queue contains all the service data in the source database, so the first service data stream can be extracted from the first message queue according to the first offset value and the second offset value. Specifically, the offset value in the first message queue of each first service data in the first service data stream lies within [first offset value + 1, second offset value].
Wherein the offset value is used to indicate a storage location of the first service data in the first message queue.
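The extraction range described above, from the first offset value plus one up to and including the second offset value, can be sketched as a simple slice. The function name `extract_first_stream` is hypothetical, and a plain list again stands in for the first message queue.

```python
def extract_first_stream(queue, first_offset, second_offset):
    """Return the queue entries with offsets in [first_offset + 1, second_offset]."""
    # Python slices exclude the upper bound, so add 1 to include second_offset itself.
    return queue[first_offset + 1 : second_offset + 1]


queue = ["old-1", "old-2", "a", "b", "c", "later"]
first_stream = extract_first_stream(queue, first_offset=1, second_offset=4)
```

Entries written before the task (offsets up to the first offset value) and after it (offsets beyond the second offset value) are excluded from the extracted stream.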
Step S204, extracting a second service data stream from the destination database, wherein the second service data stream comprises all second service data in the destination database.
In the embodiment of the present application, the second service data stream may be extracted from the destination database, where the second service data stream includes all service data in the destination database (referred to as second service data in the present application). That is, the full amount of service data may be extracted from the destination database to form the second service data stream.
In step S205, data reconciliation is performed on the first service data stream and the second service data stream to obtain a reconciliation result.
In the embodiment of the application, the first service data stream and the second service data stream can be subjected to data reconciliation to obtain a reconciliation result.
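One possible way to compare the two streams is sketched below. The comparison rule is an assumption made for illustration and is not stated at this point in the description: records are treated as hashable values, and the result reports which records are lost (present in the first stream but not the second) and which are redundant (present in the second but not the first). The function name `reconcile` and the shape of the result are hypothetical.

```python
from collections import Counter


def reconcile(first_stream, second_stream):
    """Compare two streams as multisets and report lost and redundant records."""
    source_counts = Counter(first_stream)
    dest_counts = Counter(second_stream)
    # Counter subtraction keeps only positive counts, i.e. the surplus on the left side.
    lost = sorted((source_counts - dest_counts).elements())
    redundant = sorted((dest_counts - source_counts).elements())
    return {
        "consistent": not lost and not redundant,
        "lost": lost,
        "redundant": redundant,
    }


result = reconcile(["a", "b", "c"], ["a", "c", "d"])
```

Using multisets rather than plain sets means a record that was written to the destination twice is also reported as redundant.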
It should be noted that, in the full-volume exchange scenario, the first service data stream and the second service data stream both contain a large amount of service data, and comparing all of it at once places a heavy processing burden on the device. In one possible implementation manner of the embodiment of the present application, in order to reduce the processing burden and pressure on the device, a sliding window may therefore be used to intercept the first service data stream and the second service data stream, and data reconciliation is performed only on the service data within one sliding window at a time.
As an example, each time a set period is reached (e.g., every ten seconds), a first sliding window of a first set time length may be used to intercept the first service data stream and the second service data stream, where the first sliding window slides by a second set time length each time; the first service data and second service data intercepted in the first sliding window are then reconciled to obtain a reconciliation result.
The magnitude relation between the first set time length and the second set time length is not limited, for example, the first set time length may be equal to the second set time length, or the first set time length may be less than the second set time length, or the first set time length may be greater than the second set time length.
For example, suppose the first set time length and the second set time length are both 10 seconds. The first interception takes the first service data and the second service data in the time period [a, a+10] (i.e. those whose service timestamps fall within [a, a+10]) for data reconciliation, the second interception takes the first service data and the second service data in the time period [a+10, a+20], the third takes those in the time period [a+20, a+30], and so on.
Where a is the initial timestamp.
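The window arithmetic in this example can be sketched as follows. The helper names `window_bounds` and `slice_window` are hypothetical, records are modelled as `(key, timestamp)` tuples, and the sketch treats each window as half-open, [start, end), so that a record on a boundary falls into exactly one window; that boundary rule is an assumption, since the text writes the periods as closed intervals.

```python
def window_bounds(initial_ts, window_len, step, count):
    """Return (start, end) pairs for the first `count` positions of the sliding window."""
    return [
        (initial_ts + i * step, initial_ts + i * step + window_len)
        for i in range(count)
    ]


def slice_window(stream, start, end):
    """Select (key, timestamp) records whose timestamp lies in [start, end)."""
    return [record for record in stream if start <= record[1] < end]


bounds = window_bounds(initial_ts=100, window_len=10, step=10, count=3)
stream = [("x", 100), ("y", 110), ("z", 125)]
second_window = slice_window(stream, *bounds[1])
```

With `window_len` equal to `step` the windows tile the timeline without overlap; a `step` smaller than `window_len` would make consecutive windows overlap, and a larger one would leave gaps.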
It should be noted that sliding the first sliding window periodically avoids inaccurate reconciliation results caused by reconciling the two streams when one of the first service data stream and the second service data stream has arrived and the other has not. For example, if the first sliding window slides once every 10 seconds, then when one data stream arrives first, the other data stream is waited for, for up to 10 seconds, before the data is reconciled, which improves the accuracy of the reconciliation result.
The data reconciliation method provided by the embodiment of the application can realize reconciliation of the business data in the full-volume exchange scene, and meets the actual data reconciliation requirement.
In order to clearly illustrate how the data reconciliation is performed on the first service data stream and the second service data stream in the above embodiment of the present application, the present application also provides a data reconciliation method.
Fig. 3 is a flowchart of another data reconciliation method provided in an embodiment of the application.
As shown in fig. 3, the data reconciliation method may include the steps of:
In step S301, in response to detecting the target exchange task, a first offset value of the first message queue is read.
The first offset value is used for indicating the storage position of the last written business data in the first message queue, and the target exchange task is used for exchanging data between the source database and the target database through the first message queue.
In step S302, in response to detecting that the target exchange task is over, a second offset value is read from the first message queue.
The second offset value is used for indicating the storage position of the last service data in the target exchange task in the first message queue.
In step S303, in response to the task type of the target exchange task being full-scale exchange, the first service data stream is extracted from the first message queue according to the first offset value and the second offset value.
Wherein an offset value of each first service data in the first service data stream in the first message queue is between the first offset value and the second offset value.
Step S304, extracting a second service data stream from the destination database, wherein the second service data stream comprises all second service data in the destination database.
In step S305, each time the set period is reached, a first sliding window with a first set time length is adopted to intercept the first service data stream and the second service data stream, where the first sliding window slides with a second set time length.
The explanation of steps S301 to S305 may be referred to the related description in any embodiment of the present application, and will not be repeated here.
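Steps S301 to S303 can be sketched with an in-memory list standing in for the first message queue, where a message's list index plays the role of its offset value. The helper names `last_offset` and `extract_first_stream` are hypothetical, and treating the range as (first offset, second offset] is an assumption based on the first offset pointing at the last message written before the task started.

```python
from typing import List, Tuple

Message = Tuple[str, float]  # hypothetical layout: (data identifier, service timestamp)


def last_offset(queue: List[Message]) -> int:
    """Offset of the last written message, or -1 for an empty queue."""
    return len(queue) - 1


def extract_first_stream(queue: List[Message], first_offset: int,
                         second_offset: int) -> List[Message]:
    """Return the messages with first_offset < offset <= second_offset."""
    return queue[first_offset + 1: second_offset + 1]


queue = [("old-1", 1.0), ("old-2", 2.0)]      # written before the target exchange task
first_offset = last_offset(queue)              # S301: read when the task is detected
queue += [("new-1", 3.0), ("new-2", 4.0)]      # the exchange task writes its data
second_offset = last_offset(queue)             # S302: read when the task ends
first_stream = extract_first_stream(queue, first_offset, second_offset)  # S303
print(first_stream)
```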
Step S306, establishing an association relationship between the intercepted first service data and second service data in the first sliding window whose data identifiers match.
The data identifier is used for uniquely identifying the service data; for example, the data identifier may be a primary key, a unique key, an ID, etc. of the service data.
In the embodiment of the application, the data identifier of each first service data in the first sliding window can be matched against the data identifier of each second service data, and an association relationship can be established between the first service data and the second service data in the first sliding window whose data identifiers match.
Step S307, the service fields of the first service data and the second service data with the association relationship are compared, and the service time stamps of the first service data and the second service data with the association relationship are compared.
In the embodiment of the present application, the service fields of the first service data and the second service data having the association relationship may be compared to determine whether they match. If they match, the service fields of the associated second service data synchronized to the destination database are correct; if they do not match, those service fields are wrong, and step S308 may be executed.
Likewise, the service timestamps of the first service data and the second service data having the association relationship may be compared to determine whether they match. If they match, the service timestamp of the associated second service data synchronized to the destination database is correct; if they do not match, that service timestamp is wrong, and step S308 may be executed.
Step S308, when the service fields of the first service data and the second service data with the association relationship are not matched, and/or the service time stamps of the first service data and the second service data with the association relationship are not matched, generating a first reconciliation result according to the first service data and the second service data with the association relationship.
In the embodiment of the application, when the service fields of the first service data and the second service data having the association relationship do not match, and/or their service timestamps do not match, the associated second service data synchronized to the destination database is wrong, and a first reconciliation result can then be generated according to the associated first service data and second service data. The first reconciliation result is used for indicating that the associated second service data in the destination database does not match the associated first service data in the source database.
In a possible implementation manner of the embodiment of the present application, when first service data for which no association relationship has been established exists in the first sliding window, this indicates that the first service data was missed during synchronization to the destination database, and a second reconciliation result may then be generated according to that first service data; the second reconciliation result is used for indicating that the first service data without an association relationship is missing from the destination database.
In a possible implementation manner of the embodiment of the present application, when second service data for which no association relationship has been established exists in the first sliding window, this indicates that the second service data is redundant service data in the destination database, that is, it does not exist in the source database, and a third reconciliation result may then be generated according to that second service data; the third reconciliation result is used for indicating that the second service data without an association relationship is surplus in the destination database.
Therefore, not only can service data that was synchronized incorrectly into the destination database be found, but service data that is missing from or surplus in the destination database can also be found, which improves the comprehensiveness and completeness of data reconciliation.
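The association and comparison of steps S306 to S308, together with the second and third reconciliation results, amount to a full outer join on the data identifier. The sketch below is an illustration under assumptions: the `Record` type, the `reconcile` function and the result keys `mismatched`, `missing` and `surplus` are hypothetical names, and records are assumed to have unique identifiers within a window.

```python
from typing import Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    key: str          # data identifier (primary key, unique key, ID, ...)
    fields: Tuple     # service fields
    timestamp: float  # service timestamp


def reconcile(first: List[Record], second: List[Record]) -> Dict[str, list]:
    """Compare two windows of records and return the three kinds of results."""
    first_by_key = {r.key: r for r in first}
    second_by_key = {r.key: r for r in second}
    # First result: associated records whose fields or timestamps differ.
    mismatched = [
        (first_by_key[k], second_by_key[k])
        for k in sorted(first_by_key.keys() & second_by_key.keys())
        if first_by_key[k].fields != second_by_key[k].fields
        or first_by_key[k].timestamp != second_by_key[k].timestamp
    ]
    # Second result: first service data with no counterpart (missing in destination).
    missing = [r for r in first if r.key not in second_by_key]
    # Third result: second service data with no counterpart (surplus in destination).
    surplus = [r for r in second if r.key not in first_by_key]
    return {"mismatched": mismatched, "missing": missing, "surplus": surplus}


first = [Record("1", ("a",), 1.0), Record("2", ("b",), 2.0), Record("3", ("c",), 3.0)]
second = [Record("1", ("a",), 1.0), Record("2", ("B",), 2.0), Record("4", ("d",), 4.0)]
result = reconcile(first, second)
print(result)
```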
In a possible implementation manner of the embodiment of the present application, at least one of the following may be further executed according to the reconciliation result:
First, based on the first reconciliation result, updating the associated second service data in the destination database according to the associated first service data.
Second, based on the second reconciliation result, writing the first service data without an association relationship into the destination database.
Third, based on the third reconciliation result, deleting the second service data without an association relationship from the destination database.
Therefore, consistency and integrity of the synchronous or exchanged business data in the source database and the target database can be improved.
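The three follow-up actions can be sketched against a dictionary standing in for the destination database. The function name `repair`, the `Row` layout and the shape of the three inputs are assumptions made only for this illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, float]  # hypothetical layout: (service field, service timestamp)


def repair(destination: Dict[str, Row], mismatched: Dict[str, Row],
           missing: Dict[str, Row], surplus: Iterable[str]) -> Dict[str, Row]:
    """Return a repaired copy of the destination table.

    mismatched and missing map a data identifier to the source-side row;
    surplus lists identifiers that exist only in the destination.
    """
    repaired = dict(destination)
    repaired.update(mismatched)   # first item: overwrite wrong rows with the source rows
    repaired.update(missing)      # second item: write rows absent from the destination
    for key in surplus:           # third item: delete rows absent from the source
        repaired.pop(key, None)
    return repaired


destination = {"1": ("a", 1.0), "2": ("B", 2.0), "4": ("d", 4.0)}
repaired = repair(destination,
                  mismatched={"2": ("b", 2.0)},
                  missing={"3": ("c", 3.0)},
                  surplus=["4"])
print(repaired)
```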
It should be noted that data loss or data errors may also occur while the service data in the source database is being written into the first message queue. In that case, even if reconciling the first service data stream against the second service data stream reveals no discrepancy, this does not show that the service data synchronized or exchanged between the source database and the destination database is fully consistent. Therefore, in one possible implementation manner of the embodiment of the present application, in order to improve the accuracy of the reconciliation result and the consistency and integrity of the service data synchronized or exchanged between the source database and the destination database, when the task type of the target exchange task is full-volume exchange, a third service data stream may also be extracted from the source database, where the third service data stream comprises all third service data in the source database, and data reconciliation is performed on the first service data stream and the third service data stream to obtain a reconciliation result.
It should be noted that the data reconciliation method for the first service data stream and the third service data stream is similar to that for the first service data stream and the second service data stream, and the implementation principle is similar, so it is not described again here.
The data reconciliation method provided by the embodiment of the application can find the service data that was synchronized incorrectly into the destination database, and meets the actual data reconciliation requirement.
In a possible implementation manner of the embodiment of the present application, when the object is service data, the source end may be a source database and the destination end may be a destination database. In order to clearly illustrate how, in the above embodiment of the present application, data reconciliation is performed on the service data matching the target exchange task in the source database and the destination database according to the first offset value and the second offset value, the present application also provides a data reconciliation method.
Fig. 4 is a flowchart of another data reconciliation method provided in an embodiment of the application.
As shown in fig. 4, the data reconciliation method may include the steps of:
In step S401, in response to detecting the target exchange task, a first offset value of the first message queue is read.
The first offset value is used for indicating a storage position of the last written business data currently in the first message queue; the target exchange task is for exchanging data between the source database and the destination database via the first message queue.
In step S402, in response to detecting that the target exchange task is over, a second offset value is read from the first message queue.
The second offset value is used for indicating the storage position of the last service data in the target exchange task in the first message queue.
In step S403, in response to the task type of the target exchange task being incremental exchange, a first service timestamp of service data matching the first offset value is queried from the first message queue according to the first offset value.
In the embodiment of the application, when the task type of the target exchange task is incremental exchange, the first service timestamp of the service data matching the first offset value can be queried from the first message queue according to the first offset value. For example, denote the first offset value as offset_1; then offset_1 + 1 is the offset value of the first piece of service data in the target exchange task, and the first service timestamp may be the service timestamp of that first piece of service data. That is, the offset value of the service data matching the first offset value is offset_1 + 1, and the first service timestamp may be denoted as the (offset_1 + 1) timestamp. The service data streams to be reconciled can then be extracted according to the first service timestamp.
Wherein the offset value is used to indicate a storage location of the traffic data in the first message queue.
Step S404, according to the second offset value, inquiring the second service time stamp of the service data matched with the second offset value from the first message queue.
Likewise, a second service timestamp of the service data matching the second offset value may be queried from the first message queue according to the second offset value. For example, denote the second offset value as offset_2; offset_2 is the offset value of the last piece of service data in the target exchange task, and the second service timestamp may be the service timestamp of that last piece of service data. That is, the offset value of the service data matching the second offset value is offset_2, and the second service timestamp may be denoted as the offset_2 timestamp.
Step S405, according to the first service timestamp and the second service timestamp, extracts the fourth service data stream from the source database, and extracts the fifth service data stream from the destination database.
It should be noted that, when the task type of the target exchange task is incremental exchange, the source database may generate multiple pieces of service data at the same moment (for example, N pieces of service data are newly added), and these pieces of service data may be split across multiple exchange tasks for exchange to the destination database; for example, the N pieces of service data generated at the same moment are split across 2 exchange tasks (exchange task 1 and exchange task 2), each of which exchanges N/2 pieces of service data to the destination database. For exchange task 1, when it ends, the N/2 pieces of service data matching exchange task 1 may be extracted from the first message queue according to the first offset value and the second offset value corresponding to exchange task 1, and N/2 pieces of service data may be extracted from the destination database according to the service timestamps of the extracted N/2 pieces of service data. For exchange task 2, when it ends, the N/2 pieces of service data matching exchange task 2 may likewise be extracted from the first message queue according to its first offset value and second offset value; however, according to the service timestamps of those N/2 pieces of service data, N pieces of service data (the N/2 pieces exchanged by exchange task 1 plus the N/2 pieces exchanged by exchange task 2) are extracted from the destination database. That is, when the first exchange task has been executed, only N/2 pieces of service data for that moment have been imported into the destination database, whereas when the second exchange task has been executed, N pieces of service data for that moment have been imported. In that case, if data reconciliation for exchange task 2 is performed between the service data extracted from the first message queue and the service data extracted from the destination database, the N/2 pieces of service data exchanged by exchange task 1 are mistakenly considered redundant service data.
In view of the above, in the present application, in order to improve the comprehensiveness, integrity and accuracy of the reconciliation result, a fourth service data stream may be extracted from the source database according to the first service timestamp and the second service timestamp, where the service timestamp of each fourth service data in the fourth service data stream is located between the first service timestamp and the second service timestamp, i.e. the service timestamp of each fourth service data in the fourth service data stream is located within [ first service timestamp, second service timestamp ].
Likewise, a fifth service data stream may be extracted from the destination database according to the first service timestamp and the second service timestamp, where the service timestamp of each fifth service data in the fifth service data stream is between the first service timestamp and the second service timestamp, i.e. within [first service timestamp, second service timestamp].
Step S406, data reconciliation is performed on the fourth service data stream and the fifth service data stream to obtain a reconciliation result.
In the embodiment of the application, the fourth service data stream and the fifth service data stream can be subjected to data reconciliation to obtain a reconciliation result. It should be noted that, the data reconciliation method of the fourth service data stream and the fifth service data stream is similar to the data reconciliation method of the first service data stream and the second service data stream, and the implementation principle is similar, and will not be described here.
The data reconciliation method provided by the embodiment of the application not only can realize reconciliation of the business data in the full-volume exchange scene, but also can realize reconciliation of the business data in the incremental exchange scene, and can improve the flexibility and applicability of the method.
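Steps S403 to S405 can be sketched as follows, again with a list standing in for the first message queue and lists of rows standing in for the two databases. The function names `timestamp_range` and `extract_by_timestamp` and the row layout are hypothetical. The example reproduces the case discussed above in which rows generated at the same moment are split across two exchange tasks: because both sides are queried by the same timestamp interval, the rows written by the earlier task do not appear as surplus.

```python
from typing import List, Tuple

Row = Tuple[str, float]  # hypothetical layout: (data identifier, service timestamp)


def timestamp_range(queue: List[Row], first_offset: int,
                    second_offset: int) -> Tuple[float, float]:
    """Timestamps of the messages at offsets first_offset + 1 and second_offset."""
    return queue[first_offset + 1][1], queue[second_offset][1]


def extract_by_timestamp(rows: List[Row], low: float, high: float) -> List[Row]:
    """Rows whose service timestamp lies in the closed interval [low, high]."""
    return [r for r in rows if low <= r[1] <= high]


# Offsets 1-2 were written by exchange task 1, offsets 3-4 by exchange task 2.
queue = [("old", 90.0), ("a", 100.0), ("b", 100.0), ("c", 100.0), ("d", 100.0)]
source_db = list(queue)
destination_db = list(queue)

low, high = timestamp_range(queue, first_offset=2, second_offset=4)  # exchange task 2
fourth_stream = extract_by_timestamp(source_db, low, high)
fifth_stream = extract_by_timestamp(destination_db, low, high)
print(len(fourth_stream), len(fifth_stream))
```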
In one possible implementation manner of the embodiment of the application, when the object is service data, the source end may be a source database and the destination end may be a destination database. In order to clearly illustrate how the service data matching the target exchange task in the source database and the destination database is reconciled to obtain a reconciliation result, the present application also provides a reconciliation method.
Fig. 5 is a flow chart of another reconciliation method provided in an embodiment of the application.
As shown in fig. 5, the reconciliation method may include the steps of:
In step S501, in response to detecting the target exchange task, a first offset value of the first message queue is read.
The first offset value is used for indicating a storage position of the last written business data currently in the first message queue; the target exchange task is for exchanging data between the source database and the destination database via the first message queue.
The explanation of step S501 may be referred to the related description in any embodiment of the present application, and will not be repeated here.
Step S502, in response to the target exchange task not having ended, extracting a sixth service data stream from the first message queue according to the first offset value; wherein the offset value corresponding to each sixth service data in the sixth service data stream is greater than the first offset value.
It should be noted that, for a real-time reconciliation scenario, when incremental business data always exists in the source database, the target exchange task may not stop, and at this time, business data to be reconciled having an offset value greater than the first offset value in the first message queue needs to be extracted.
In the application, when the target exchange task has not ended, the sixth service data stream can be extracted from the first message queue according to the first offset value, where the offset value corresponding to each sixth service data in the sixth service data stream is greater than the first offset value (offset_1). That is, the service data whose offset values in the first message queue are offset_1 + 1, offset_1 + 2, offset_1 + 3, … may be extracted.
Step S503, extracting the seventh business data stream from the second message queue according to the third offset value; the third offset value is used for indicating a storage position of last written service data in the second message queue before the target exchange task is started, and each seventh service data in the seventh service data flow is updated service data obtained from the target database in response to the detection of the target exchange task.
In the embodiment of the application, the service data written into the destination database can be recorded through the second message queue. For example, before the target exchange task is started, a third offset value of the second message queue may be read, where the third offset value indicates the storage position of the service data last written into the second message queue at that time. Once the target exchange task is detected, newly added service data in the destination database can be sensed, and whenever newly added service data is sensed in the destination database, it can be written into the second message queue. That is, the second message queue holds the service data updated in the destination database.
In the embodiment of the present application, a seventh service data stream may be extracted from the second message queue according to the third offset value, where each seventh service data in the seventh service data stream is updated service data obtained from the destination database in response to detecting the target exchange task. For example, denote the third offset value as offset_3; the service data whose offset values in the second message queue are offset_3 + 1, offset_3 + 2, offset_3 + 3, … may be extracted.
Step S504, data reconciliation is performed on the sixth service data stream and the seventh service data stream to obtain a reconciliation result.
In the embodiment of the application, the sixth service data stream and the seventh service data stream can be subjected to data reconciliation to obtain a reconciliation result.
It should be noted that, the data reconciliation method of the sixth service data stream and the seventh service data stream is similar to the data reconciliation method of the first service data stream and the second service data stream, and the implementation principle is similar, and will not be described here.
The reconciliation method provided by the embodiment of the application can reconcile service data not only in the full-volume exchange scenario but also in the incremental exchange scenario, which further improves the flexibility and applicability of the method.
In any of the embodiments of the present application, for a real-time reconciliation scenario, the service data in service database A and service database B has a primary key or unique key and a timestamp field.
In the real-time reconciliation scenario, if incremental service data always exists in the source database, the exchange task may never stop, and changes to the service data then need to be sensed in real time. Before the exchange task is started, the start position offset_1 in the first message queue (such as message queue 1) is recorded. If the real-time reconciliation function is enabled, the incremental task "destination database -> second message queue (such as message queue 2)" is started first; this task writes the incremental service data in the destination database into message queue 2. At the time of data reconciliation, service data stream 2 may be extracted from message queue 2.
Only after the incremental task "destination database -> message queue 2" has been confirmed to be running is the incremental task "source database -> message queue 1" started. At the time of data reconciliation, consuming message queue 1 from the start position offset_1 yields service data stream 1.
By comparing service data stream 1 with service data stream 2 using state programming, service data that is lost or has not arrived in time in the destination database can be sensed in real time, and an alarm is raised to the user.
For clarity of explanation of any of the above embodiments of the present application, the present application also proposes a data reconciliation method.
Fig. 6 is a flowchart of another data reconciliation method provided in an embodiment of the application.
As shown in fig. 6, the data reconciliation method may further comprise the following steps when a target exchange task is detected:
Step S601, in response to detecting the target exchange task, sending a first switching task to the first switching node corresponding to the source end.
The first switching task instructs the first switching node to acquire the objects matching the first switching task from the source end, sort those objects by their timestamps, and write the sorted objects into the first message queue in order.
In the embodiment of the application, when the central control node detects that the object in the source end is updated, the first exchange task may be generated, where the first exchange task is used to indicate the updated object in the source database, or the first exchange task may also be triggered manually, for example, the first exchange task may carry a query period, where the first exchange task is used to indicate the object in the source end with a timestamp in the query period, or the first exchange task may also be used to indicate all the objects in the source end, etc., where the generating manner of the first exchange task is not limited by the application.
In the embodiment of the application, the central control node may send the first switching task to the first switching node corresponding to the source end, and correspondingly, after receiving the first switching task, the first switching node may respond to the first switching task to obtain the object matched with the first switching task from the source end, sort the objects matched with the first switching task according to the timestamp of the object matched with the first switching task, for example, sort the objects from small to large according to the value of the timestamp, and sequentially write the sorted objects into the first message queue.
As a possible implementation manner, taking an object as service data as an example, a source end may be a source database, a destination end may be a destination database, a central control node may send a first switching task to a first switching node corresponding to the source database, and correspondingly, after receiving the first switching task, the first switching node may respond to the first switching task to obtain service data matched with the first switching task from the source database, sort the service data matched with the first switching task according to a service timestamp of the service data matched with the first switching task, for example, sort the service data according to a value of the service timestamp from small to large, and sequentially write the sorted service data into the first message queue.
Step S602, a second switching task is sent to a second switching node corresponding to the destination, where the second switching task is used for the second switching node to read an object matched with the first switching task from the first message queue and write the object into the destination database.
As an example, taking an object as service data for illustration, a source end may be a source database, a destination end may be a destination database, the central control node may further send a second switching task to a second switching node corresponding to the destination database, and correspondingly, after receiving the second switching task, the second switching node may respond to the second switching task to read service data matched with the second switching task from the first message queue and write the service data into the destination database. Wherein the traffic data matching the second switching task is the same as the traffic data matching the first switching task.
In the embodiment of the present application, when the end of the target exchange task is detected, the second offset value is read from the first message queue, which may specifically be: upon detecting the end of the first exchange task and/or the second exchange task, a second offset value is read from the first message queue.
In the embodiment of the application, when the central control node detects that the first switching task and/or the second switching task is finished, the second offset value can be read from the first message queue, wherein the second offset value is used for indicating the storage position of the last written business data in the first message queue.
According to the data reconciliation method provided by the embodiment of the application, service data can be exchanged effectively by having the central control node issue exchange tasks to the different exchange nodes.
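The division of work among the central control node and the two exchange nodes can be sketched in a single process. The functions `exchange_node_a` and `exchange_node_b`, the list used as the first message queue and the dictionary used as the destination database are all stand-ins assumed for illustration; a real deployment would run these roles on separate nodes.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical layout: (data identifier, service timestamp)


def exchange_node_a(source_rows: List[Row], queue: List[Row]) -> None:
    """First exchange task: sort by service timestamp, then append in order."""
    queue.extend(sorted(source_rows, key=lambda row: row[1]))


def exchange_node_b(queue: List[Row], first_offset: int,
                    destination: Dict[str, float]) -> None:
    """Second exchange task: read the task's messages and write them to the destination."""
    for key, timestamp in queue[first_offset + 1:]:
        destination[key] = timestamp


# Central control node: record offsets around the two exchange tasks.
queue: List[Row] = [("old", 1.0)]
destination: Dict[str, float] = {}
first_offset = len(queue) - 1                  # offset of the last message before the task
exchange_node_a([("c", 30.0), ("a", 10.0), ("b", 20.0)], queue)
exchange_node_b(queue, first_offset, destination)
second_offset = len(queue) - 1                 # offset of the last message of the task
print(queue, destination, first_offset, second_offset)
```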
In any embodiment of the application, full or incremental data reconciliation can be completed and lost business data can be accurately found without additionally occupying reconciliation library resources and without adding batch fields in a destination database.
Specifically, Kafka (a message queue) can be used as the relay for cross-gateway data exchange instead of a central library mode. The central control node records the topic offset value offset_1 before the "source database -> Kafka" exchange synchronization starts and the topic offset value offset_2 after the exchange synchronization completes. The reconciliation node extracts the service data between offset_1 and offset_2 from Kafka and uses stream computing to compare it with the corresponding service data in the destination database to obtain the reconciliation result.
As an example, the structure of the data reconciliation system may be as shown in fig. 7, where the source end may be a source database and the destination end may be a destination database. Fig. 7 takes as an example the case in which the source database is service database A, the first switching node is switching node A, the destination database is service database B, and the second switching node is switching node B; the service data in service database A and service database B has a primary key or a unique key and a timestamp field.
1. The central control node (i.e. the management node in fig. 7) issues the exchange task from service database A to Kafka to switching node A, and issues the exchange task from Kafka to service database B to switching node B. The exchange task from service database A to Kafka needs to sort the service data by its timestamp field and write the sorted service data into the Kafka message queue, so that the service data is written into the Kafka message queue in order from the earliest timestamp to the latest.
2. According to the characteristics of the Kafka message queue, each written piece of service data has a corresponding index (offset), and the index value increases sequentially as data is written. Thus, before starting the exchange task, the central control node can read the current offset of the corresponding topic from the Kafka message queue, denoted offset_1.
3. After the exchange tasks of gateway A and gateway B have ended, the central control node reads the offset from the Kafka message queue again, denoted offset_2.
4. After the central control node senses that the exchange task is finished, the central control node issues a reconciliation task to a reconciliation service (namely, a reconciliation node).
In a full exchange scenario, the data between offset_1 and offset_2 in the Kafka message queue is all of the service data exchanged this time. The service data between offset_1 and offset_2 in the Kafka message queue can therefore be used directly as service data stream 1, the full service data is extracted from service database B to form service data stream 2, and service data stream 1 is compared with service data stream 2.
In the incremental exchange scenario, the reconciliation service determines the batch range of the current exchange from the timestamp fields corresponding to offset_1 and offset_2. It queries service database A for the service data within the interval [timestamp of (offset_1 + 1), timestamp of offset_2] to form service data stream 1, queries service database B for the service data within the same interval to form service data stream 2, and compares service data stream 1 with service data stream 2.
5. Service data stream 1 and service data stream 2 are associated by primary key or unique key, with the timestamp field providing the watermark semantics. Using state programming (value state), the service data that arrives first is stored in a state (for example, an arrived state) and a timer is established (for example, a timer with a duration of 10 seconds). If the corresponding service data of the other stream arrives within the time limit, the pair is output and the timer is deleted; otherwise the timer fires and the data goes to the side output stream. Because the timer sends all service data whose counterpart has not arrived to the side output stream, a full outer join is realized: output for data that arrived in A but not in B gives the effect of a left join, and the converse gives the effect of a right join.
6. The service data that is missing or changed in service database B can be determined from the left join output stream, and the surplus service data in service database B can be determined from the right join output stream.
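The state-and-timer association of steps 5 and 6 can be sketched in plain Python as follows. This is a minimal simulation under stated assumptions, not the stream-processing implementation of the application: the event tuple layout, the function name `reconcile_with_timers`, and the rule that two records with the same key but different timestamps are both reported as unmatched are illustrative choices.

```python
from typing import List, Optional, Tuple

# An event is (arrival_time_s, side, key, timestamp); side is "A" or "B".
# This layout is a hypothetical stand-in for records of the two streams.
Event = Tuple[float, str, int, str]
Pair = Tuple[Optional[int], Optional[int]]


def reconcile_with_timers(events: List[Event], timeout_s: float = 10.0):
    """Simulate keyed value state plus a per-key timer over two streams."""
    state = {}                  # key -> (side, timestamp, deadline) of first arrival
    matched: List[int] = []     # keys whose records agree on both sides
    side_out: List[Pair] = []   # (key, None): only in A; (None, key): only in B

    def emit_unmatched(key: int, side: str) -> None:
        side_out.append((key, None) if side == "A" else (None, key))

    def fire_timers(now: float) -> None:
        # Flush every record whose counterpart did not arrive before the deadline.
        for key in [k for k, (_, _, d) in state.items() if d <= now]:
            side, _, _ = state.pop(key)
            emit_unmatched(key, side)

    for arrival, side, key, ts in sorted(events):
        fire_timers(arrival)
        if key in state and state[key][0] != side:
            other_side, other_ts, _ = state.pop(key)  # counterpart arrived: drop timer
            if other_ts == ts:
                matched.append(key)
            else:
                # Assumption: same key, different timestamp -> both are unmatched.
                emit_unmatched(key, other_side)
                emit_unmatched(key, side)
        else:
            state[key] = (side, ts, arrival + timeout_s)
    fire_timers(float("inf"))   # end of input: all remaining timers expire
    return matched, side_out
```

Entries of the form `(key, None)` correspond to the left join output and entries of the form `(None, key)` to the right join output; together they give the full outer join effect described above.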
Taking service data stream 1 (stream1) and service data stream 2 (stream2) shown in fig. 8 as an example, reconciling stream1 and stream2 gives the following reconciliation result: (2, null) (i.e., the data with primary key 2 arrived in stream1 but not in stream2), (null, 3) (i.e., the data with primary key 3 arrived in stream2 but not in stream1), (5, null) (i.e., the data with primary key 5 arrived in stream1 but not in stream2), (null, 2) (i.e., the data with primary key 2 arrived in stream2 but not in stream1).
Since the service data with primary key 2 in stream2 is actually changed data, that is, its timestamp (time t2) is inconsistent with the timestamp (time t1) of the service data with primary key 2 in stream1, service database B can be considered to lack the service data with primary key 2 at time t1 and to have surplus service data with primary key 2 at time t2.
Based on the reconciliation result, the data reconciliation system can choose a correction operation: delete the service data with primary key or unique key 2 and 3 from service database B, and add the service data with primary key or unique key 2 and 5 from stream1, thereby ensuring data consistency between service database A and service database B.
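The correction operation can be illustrated with a short Python sketch. It assumes, purely for illustration, that each database is modelled as a dictionary keyed by primary key and that the reconciliation result is a list of `(key_in_A, key_in_B)` pairs with `None` for the missing side; `apply_corrections` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def apply_corrections(db_a: Dict[int, dict], db_b: Dict[int, dict],
                      results: List[Pair]) -> None:
    """Bring db_b in line with db_a using left-only and right-only results."""
    missing_in_b = [a for a, b in results if b is None]   # entries (key, None)
    surplus_in_b = [b for a, b in results if a is None]   # entries (None, key)
    # Delete first, so a changed row that appears in both lists is replaced
    # rather than removed after it has been rewritten.
    for key in surplus_in_b:
        db_b.pop(key, None)
    for key in missing_in_b:
        db_b[key] = dict(db_a[key])
```

With the result of the example above, keys 2 and 3 are deleted from the model of service database B and keys 2 and 5 are then copied from the model of service database A.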
In summary, the reconciliation result can be obtained by performing streaming computation on the service data between the extracted topic offsets. No additional reconciliation database resources are occupied, no batch field needs to be added to the destination database, reconciliation of full or incremental data can be completed, and lost service data can be accurately identified.
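How the two recorded offsets delimit a batch can be sketched as follows. The sketch models the message queue as an in-memory list rather than calling a real Kafka client, and the `Record` type and the three function names are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    offset: int      # position assigned by the queue, increasing with each write
    key: int         # primary or unique key
    timestamp: int   # service timestamp; records are written in timestamp order


def batch_from_queue(log: List[Record], offset_1: int, offset_2: int) -> List[Record]:
    """Full exchange: the batch is every record with offset in (offset_1, offset_2]."""
    return [r for r in log if offset_1 < r.offset <= offset_2]


def batch_time_range(log: List[Record], offset_1: int, offset_2: int) -> Tuple[int, int]:
    """Incremental exchange: timestamps at offset_1 + 1 and offset_2 bound the batch."""
    by_offset = {r.offset: r for r in log}
    return by_offset[offset_1 + 1].timestamp, by_offset[offset_2].timestamp


def query_by_time(table: List[Record], start: int, end: int) -> List[Record]:
    """Stand-in for a database query over the closed interval [start, end]."""
    return [r for r in table if start <= r.timestamp <= end]
```

In the full exchange case the result of `batch_from_queue` plays the role of service data stream 1; in the incremental case the interval returned by `batch_time_range` would be applied to both databases through a query such as `query_by_time`.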
In a possible implementation manner of the embodiment of the present application, when the object is a file, the source end may be a source file server and the destination end may be a destination file server. The file may be written into the first message queue as a plurality of messages; for example, the file may be split into a plurality of binary arrays, each of which is one message. For example, the file may be written to the first message queue in the format shown in FIG. 9, where each cell in FIG. 9 is a message. To explain how the files matched with the target exchange task in the source file server and the destination file server are reconciled according to the first offset value and the second offset value, the application also provides a data reconciliation method.
Fig. 10 is a flowchart of another data reconciliation method provided in an embodiment of the application.
As shown in fig. 10, the data reconciliation method may include the steps of:
in step S1001, in response to detecting the target exchange task, a first offset value of the first message queue is read.
The first offset value is used for indicating a storage position of a file which is written last in the first message queue. It will be appreciated that since the file is written to the first message queue by a plurality of messages, the first offset value may be used to indicate the storage location of the last message in the file currently written to the first message queue.
Wherein the target exchange task is for exchanging files between the source file server and the destination file server via the first message queue.
In step S1002, in response to detecting the end of the target exchange task, a second offset value is read from the first message queue.
The second offset value is used for indicating the storage position of the end file in the target exchange task in the first message queue. It will be appreciated that since the file is written to the first message queue by a plurality of messages, the second offset value may be used to indicate the storage location of the last message in the end file in the first message queue.
The explanation of steps S1001 to S1002 may refer to the related description in any embodiment of the present application, and will not be repeated here.
Step S1003, extracting the first message flow from the first message queue according to the first offset value and the second offset value; wherein the offset value corresponding to each message in the first message stream is between the first offset value and the second offset value.
In the embodiment of the application, the first message flow can be extracted from the first message queue according to the first offset value and the second offset value; wherein the offset value corresponding to each message in the first message stream is between the first offset value and the second offset value. For example, the offset value of each message in the first message stream in the first message queue is within [ first offset value +1, second offset value ].
Step S1004, generating a second message stream according to the file written in the destination file server.
In the embodiment of the application, the second message stream can be generated according to the file written in the destination file server.
In a possible implementation manner of the embodiment of the present application, the second message stream is generated, for example, as follows: all files (denoted as target files in the application) are extracted from the destination file server; a hash operation is performed on each target file to obtain its hash value (denoted as the second hash value in the application); for each target file, a second message is generated according to the file name, second hash value and file timestamp corresponding to that target file; and the second messages are sorted according to their file timestamps to obtain the second message stream. For example, the second messages may be sorted in ascending order of file timestamp to obtain the second message stream.
Therefore, the checking of the message stream can be realized according to the hash value and the file timestamp of each message in the message stream, the checking of the message content in the message stream is not needed, and the checking efficiency can be improved.
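As a minimal sketch of this generation manner, assuming the files of the destination file server are available as an in-memory mapping from file name to content and timestamp (the `FileMessage` type and `build_second_stream` name are hypothetical, and MD5 is used only because the application mentions it as an example hash algorithm):

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class FileMessage(NamedTuple):
    file_name: str
    hash_value: str
    file_timestamp: int


def build_second_stream(files: Dict[str, Tuple[bytes, int]]) -> List[FileMessage]:
    """Map {file name: (content, timestamp)} to digest messages sorted by timestamp."""
    messages = [
        FileMessage(name, hashlib.md5(content).hexdigest(), ts)
        for name, (content, ts) in files.items()
    ]
    # Ascending order of file timestamp, as in the example above.
    return sorted(messages, key=lambda m: m.file_timestamp)
```

Each resulting message carries only the file name, digest and timestamp, so later comparison does not need to touch the file content again.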
In another possible implementation manner of the embodiment of the present application, the second message stream is generated, for example, as follows. Before the target exchange task is started, the central node may read and store a fourth offset value of the third message queue, where the fourth offset value is used to indicate the storage location of the last message written in the third message queue before the target exchange task is started. When or after the target exchange task ends, the central node may further read and store a fifth offset value of the third message queue, where the fifth offset value is used to indicate the storage location of the last message written in the third message queue at that time. An initial message stream may then be extracted from the third message queue according to the fourth offset value and the fifth offset value, where each message in the initial message stream is generated according to a file written in the destination file server. In the application, when the target exchange task is detected, the file can be extracted from the destination file server and written into the third message queue; when or after the target exchange task ends, the initial message stream can be extracted from the third message queue, where the offset value of each message in the initial message stream is between the fourth offset value and the fifth offset value.
In the application, after combining a plurality of messages belonging to the same file in the initial message stream, carrying out hash operation to obtain a second hash value of the same file, and generating second messages according to the file name, the second hash value and the file timestamp corresponding to the same file, thereby sequencing the second messages according to the file timestamp corresponding to each second message to obtain a second message stream. For example, the second messages may be ordered from small to large according to the value of the file timestamp, resulting in a second message stream.
Therefore, the second message flow can be generated in different modes, and the flexibility and applicability of the method can be improved.
In step S1005, the first message flow and the second message flow are checked out to obtain a checking result.
In the embodiment of the application, the first message stream and the second message stream can be checked to obtain a checking result.
The data reconciliation method provided by the embodiment of the application not only can realize the reconciliation of the business data in the full-quantity exchange scene, the reconciliation of the business data in the increment exchange scene, and the reconciliation of the business data in the real-time reconciliation scene, but also can realize the reconciliation of the files in the file exchange scene, and can further improve the flexibility and applicability of the method.
In order to clearly explain how to reconcile the first message stream and the second message stream in the embodiment of the application to obtain a reconciliation result, the application also provides a reconciliation method.
Fig. 11 is a flow chart of another reconciliation method provided in an embodiment of the application.
As shown in fig. 11, step S1005 may include the steps of:
step S1101, after merging the plurality of messages belonging to the same file in the first message stream, performing a hash operation to obtain a first hash value of the same file.
In the embodiment of the application, a plurality of messages belonging to the same file in the first message stream can be combined to obtain the combined message corresponding to the same file, and the combined message of the same file is subjected to hash operation to obtain the first hash value of the same file. For example, the first hash value of the same file may be obtained by performing a hash operation on the combined Message of the same file through a hash Algorithm (e.g., MD5 (Message-Digest Algorithm), etc.).
In step S1102, a first message is generated according to the file name, the first hash value and the file timestamp corresponding to the same file.
In the embodiment of the application, the first message can be generated according to the file name, the first hash value and the file timestamp corresponding to the same file.
Step S1103, according to the file time stamp corresponding to each first message, ordering each first message to obtain an updated first message stream.
In the embodiment of the application, the first messages can be sequenced according to the file time stamp corresponding to the first messages so as to obtain the updated first message stream. For example, the first messages may be ordered from small to large according to the value of the file timestamp, to obtain the first message stream.
Step S1104, checking the updated first message stream and second message stream to obtain a checking result.
In the embodiment of the application, the updated first message flow and the updated second message flow can be checked to obtain a checking result.
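Steps S1101 to S1103 can be sketched as follows. The sketch assumes, as a simplification, that every chunk message already carries its file name and file timestamp; the tuple layout and the name `build_first_stream` are illustrative and not taken from the application.

```python
import hashlib
from typing import Dict, List, Tuple

# A chunk message is (file_name, file_timestamp, chunk_bytes), in queue order.
Chunk = Tuple[str, int, bytes]


def build_first_stream(chunks: List[Chunk]) -> List[Tuple[str, str, int]]:
    """Merge chunks per file, hash the merged content, and sort by timestamp."""
    merged: Dict[str, bytearray] = {}
    timestamps: Dict[str, int] = {}
    for name, ts, data in chunks:                       # S1101: merge per file
        merged.setdefault(name, bytearray()).extend(data)
        timestamps[name] = ts
    first_messages = [                                  # S1101 hash, S1102 message
        (name, hashlib.md5(bytes(content)).hexdigest(), timestamps[name])
        for name, content in merged.items()
    ]
    return sorted(first_messages, key=lambda m: m[2])   # S1103: sort by timestamp
```

The returned list plays the role of the updated first message stream that step S1104 reconciles against the second message stream.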
It should be noted that, in the file exchange scenario, the first message flow and the second message flow both include a large number of messages, if a large number of messages are compared at the same time, the processing burden on the device is greater, so in one possible implementation manner of the embodiment of the present application, in order to reduce the processing burden and processing pressure of the device, a sliding window may be used to slidably intercept the updated first message flow and second message flow, and only the messages in one sliding window are checked at a time.
As an example, whenever a set period is reached (e.g., every ten seconds), the updated first and second message streams are intercepted using a second sliding window of a third set length of time; the second sliding window slides each time for a fourth set time length; establishing an association relationship between the first message and the second message with the file names matched in the second sliding window; and comparing the first hash value of the first message with the association relationship with the second hash value of the second message, and comparing the file timestamp of the first message with the association relationship with the file timestamp of the second message.
The magnitude relation between the third set time length and the fourth set time length is not limited, for example, the third set time length may be equal to the fourth set time length, or the third set time length may be less than the fourth set time length, or the third set time length may be greater than the fourth set time length.
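The interception by a sliding window can be sketched as follows, assuming messages are `(file_name, hash_value, file_timestamp)` tuples and that windows are half-open time intervals; the function name and parameters are hypothetical, with `length` standing for the third set time length and `slide` for the fourth.

```python
from typing import Iterator, List, Tuple

Message = Tuple[str, str, int]   # (file_name, hash_value, file_timestamp)


def sliding_windows(messages: List[Message], start: int, end: int,
                    length: int, slide: int) -> Iterator[Tuple[int, List[Message]]]:
    """Yield (window_start, messages with window_start <= timestamp < window_start + length)."""
    if length <= 0 or slide <= 0:
        raise ValueError("length and slide must be positive")
    window_start = start
    while window_start < end:
        yield window_start, [m for m in messages
                             if window_start <= m[2] < window_start + length]
        window_start += slide
```

When `slide` equals `length` the windows tile the time axis; when `slide` is smaller the windows overlap, and when it is larger this sketch leaves gaps in which messages are not compared.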
In the case that the first hash value of the first message does not match the second hash value of the second message with which it is associated (indicating that the content of the file corresponding to the file name in the associated second message was synchronized to the destination file server incorrectly), and/or the file timestamp of the first message does not match the file timestamp of the associated second message (indicating that the timestamp of the file corresponding to the file name in the associated second message was synchronized to the destination file server incorrectly), a fourth reconciliation result may be generated according to the associated first message and second message. The fourth reconciliation result is used for indicating that the file corresponding to the file name in the associated second message in the destination file server does not match the file corresponding to the file name in the associated first message in the source file server.
In a possible implementation manner of the embodiment of the present application, when a first message for which no association relationship is established exists in the second sliding window, this indicates that the file corresponding to the file name in the first message was missed during synchronization to the destination file server. In this case, a fifth reconciliation result may be generated according to that first message; the fifth reconciliation result is used for indicating that the file corresponding to the file name in the first message for which no association relationship is established is missing from the destination file server.
In a possible implementation manner of the embodiment of the present application, when a second message for which no association relationship is established exists in the second sliding window, this indicates that the file corresponding to the file name in the second message is a surplus file in the destination file server, that is, the file corresponding to the file name in the second message does not exist in the source file server. In this case, a sixth reconciliation result may be generated according to that second message; the sixth reconciliation result is used for indicating that the file corresponding to the file name in the second message for which no association relationship is established is surplus in the destination file server.
Therefore, not only the files that were synchronized incorrectly to the destination file server can be found, but also the files that are missing from, or surplus in, the destination file server, which improves the comprehensiveness and completeness of file reconciliation.
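The three kinds of result within one window can be sketched as follows, again assuming `(file_name, hash_value, file_timestamp)` tuples; the function name and the dictionary keys `mismatched`, `missing` and `surplus` are illustrative labels for the fourth, fifth and sixth reconciliation results.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, int]   # (file_name, hash_value, file_timestamp)


def reconcile_window(first: List[Message], second: List[Message]) -> Dict[str, List[str]]:
    """Associate messages by file name and classify the differences."""
    src = {m[0]: m for m in first}    # first messages (source side)
    dst = {m[0]: m for m in second}   # second messages (destination side)
    return {
        # Associated, but hash value and/or file timestamp differ.
        "mismatched": sorted(n for n in src.keys() & dst.keys()
                             if src[n][1:] != dst[n][1:]),
        # First messages with no associated second message.
        "missing": sorted(src.keys() - dst.keys()),
        # Second messages with no associated first message.
        "surplus": sorted(dst.keys() - src.keys()),
    }
```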
In a possible implementation manner of the embodiment of the present application, at least one of the following may be further executed according to the reconciliation result:
First, based on the fourth reconciliation result, updating the file corresponding to the file name in the associated second message in the destination file server according to the file corresponding to the file name in the associated first message.
Second, based on the fifth reconciliation result, writing the file corresponding to the file name in the first message for which no association relationship is established into the destination file server.
Third, based on the sixth reconciliation result, deleting the file corresponding to the file name in the second message for which no association relationship is established from the destination file server.
Thus, the consistency and integrity of the files synchronized or exchanged in the source file server and the target file server can be improved.
In any one embodiment of the present application, for a file exchange scenario, a source end may be a source file server, a destination end may be a destination file server, and a structure of the data reconciliation system may be as shown in fig. 7, where fig. 7 illustrates a source file server as a file server a, a first switching node as a switching node a, a destination file server as a file server B, and a second switching node as a switching node B.
1. The central control node (i.e. the management node in fig. 7) issues the exchange task from file server A to the first message queue (i.e. Kafka in fig. 7) to switching node A, and issues the exchange task from Kafka to file server B to switching node B.
2. Before starting the exchange task, the central control node can read the current offset of the corresponding topic, offset_1, from Kafka.
3. Exchange task A reads the files in file server A and writes them to Kafka in the format shown in fig. 9, one message per cell. A file is composed of a plurality of messages.
For example, when a file is stored in Kafka in the format shown in fig. 9, the following information may be contained:
file start flag (startflag): contains a start mark, the relative path of the file and the file update time;
file binary content: the file is split into a plurality of binary arrays, each of which is a message;
end of file flag (endflag): representing the end of the file stream.
In the application, each message belonging to the same file can be positioned according to the file start mark and the file end mark.
4. Exchange task B parses the messages in Kafka and writes the data into the destination file server B.
5. After the exchange tasks of gateway A and gateway B are finished, the central control node reads offset_2 from Kafka.
6. After the central control node senses that the exchange task is finished, it issues a reconciliation task to the reconciliation service. File reconciliation supports only the full exchange scenario; in this case, the data between offset_1 and offset_2 in Kafka is all of the data exchanged this time.
7. The messages between offset_1 and offset_2 in Kafka are taken as message stream 1.
8. The file is extracted from the file server B and written to the message queue 3 in the format shown in fig. 9, and the message stream 2 is extracted from the message queue 3.
9. Before formal reconciliation, message stream 1 and message stream 2 are preprocessed to obtain message stream 3 and message stream 4. The preprocessing comprises:
1) Combining a plurality of messages belonging to the same file in the message stream 1 or the message stream 2, and performing hash calculation on binary contents of the file through a hash algorithm (such as MD5 algorithm), so as to finally form a message with the following format:
2) And ordering the messages formatted in the step 1) according to the service time stamps to obtain a message stream 3 or a message stream 4.
10. Message stream 3 is compared with message stream 4 using time windows and state programming; for file names whose hash values (e.g., MD5 values) are inconsistent, synchronization of the corresponding file is considered to have failed. Finally, the difference files between the source file server and the destination file server are obtained, and the reconciliation result is generated.
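The message layout of step 3 (start flag, binary content, end flag) and the way messages of one file are located again can be sketched as follows. The concrete encoding as Python tuples, the chunk size and both function names are assumptions for illustration, since fig. 9 is not reproduced here; the sketch also assumes that the messages of one file are contiguous in the queue.

```python
from typing import Iterator, List, Tuple

# Hypothetical encoding: ("start", path, mtime), ("data", bytes), ("end",)
Message = tuple


def file_to_messages(path: str, mtime: int, content: bytes,
                     chunk_size: int = 4) -> List[Message]:
    """Frame one file as a start flag, binary chunks, and an end flag."""
    messages: List[Message] = [("start", path, mtime)]
    for i in range(0, len(content), chunk_size):
        messages.append(("data", content[i:i + chunk_size]))
    messages.append(("end",))
    return messages


def messages_to_files(messages: List[Message]) -> Iterator[Tuple[str, int, bytes]]:
    """Regroup messages into files using the start and end flags."""
    path, mtime, buffer = None, None, bytearray()
    for msg in messages:
        if msg[0] == "start":
            path, mtime, buffer = msg[1], msg[2], bytearray()
        elif msg[0] == "data":
            buffer.extend(msg[1])
        elif msg[0] == "end":
            yield path, mtime, bytes(buffer)
```

The regrouped content is what the preprocessing in step 9 would hash before the comparison in step 10.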
Corresponding to the data reconciliation method provided in the above embodiments, an embodiment of the present application further provides a data reconciliation device. Since the data reconciliation device provided by the embodiment of the present application corresponds to the data reconciliation method provided by the above-described several embodiments, the implementation of the data reconciliation method is also applicable to the data reconciliation device provided by the embodiment, and will not be described in detail in the embodiment.
Fig. 12 is a schematic structural view of a data reconciliation device according to an embodiment of the application.
As shown in fig. 12, the data reconciliation apparatus 1200 may include: first reading module 1201, second reading module 1202, and reconciliation module 1203.
Wherein, the first reading module 1201 is configured to read a first offset value of the first message queue in response to detecting the target exchange task; the first offset value is used for indicating the storage position of the last written object in the first message queue; the target exchange task is used for exchanging objects between the source end and the destination end through the first message queue.
A second read module 1202 for reading a second offset value from the first message queue in response to detecting the end of the target swap task; the second offset value is used for indicating the storage position of the end object in the target exchange task in the first message queue.
And the reconciliation module 1203 is configured to reconcile the objects matched with the target exchange task in the source end and the destination end according to the first offset value and the second offset value, so as to obtain a reconciliation result.
As a possible implementation manner of the embodiment of the application, the object is service data, the source end is a source database, and the destination end is a destination database; the reconciliation module 1203 is specifically configured to: responding to the task type of the target exchange task as full-volume exchange, and extracting a first business data stream from a first message queue according to a first offset value and a second offset value; wherein, the offset value corresponding to each first service data in the first service data stream is between the first offset value and the second offset value; extracting a second service data stream from the destination database; the second service data flow comprises all second service data in a target database; and performing data reconciliation on the first service data stream and the second service data stream to obtain a reconciliation result.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is further configured to: extracting a third service data stream from the source database; the third service data flow comprises all third service data in a source database; and performing data reconciliation on the first service data stream and the third service data stream to obtain a reconciliation result.
As a possible implementation manner of the embodiment of the application, the object is service data, the source end is a source database, and the destination end is a destination database; the reconciliation module 1203 is specifically configured to: responding to the task type of the target exchange task as increment exchange, and inquiring a first service time stamp of service data matched with the first offset value from a first message queue according to the first offset value; inquiring a second service time stamp of the service data matched with the second offset value from the first message queue according to the second offset value; according to the first service time stamp and the second service time stamp, extracting a fourth service data stream from the source database and extracting a fifth service data stream from the destination database; performing data reconciliation on the fourth service data stream and the fifth service data stream to obtain a reconciliation result; wherein the service time stamp of each fourth service data in the fourth service data stream is between the first service time stamp and the second service time stamp, and the service time stamp of each fifth service data in the fifth service data stream is between the first service time stamp and the second service time stamp.
As a possible implementation manner of the embodiment of the application, the object is service data, the source end is a source database, and the destination end is a destination database; the data reconciliation apparatus 1200 may further comprise:
The first extraction module is used for extracting a sixth service data stream from the first message queue according to the first offset value in response to the target exchange task not ending; wherein, the offset value corresponding to each sixth service data in the sixth service data stream is greater than the first offset value.
A second extracting module, configured to extract a seventh service data flow from the second message queue according to the third offset value; the third offset value is used for indicating a storage position of last written service data in the second message queue before the target exchange task is started, and each seventh service data in the seventh service data flow is updated service data obtained from the target database in response to the detection of the target exchange task.
The reconciliation module 1203 is further configured to perform data reconciliation on the sixth service data flow and the seventh service data flow to obtain a reconciliation result.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is specifically configured to: intercepting the first service data stream and the second service data stream with a first sliding window of a first set time length each time a set period is reached, where the first sliding window slides by a second set time length each time; and reconciling the first service data and second service data intercepted in the first sliding window to obtain a reconciliation result.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is specifically configured to: establishing an association relationship between first service data and second service data matched with data identifiers in a first sliding window; comparing the service fields of the first service data and the second service data with the association relationship, and comparing the service time stamps of the first service data and the second service data with the association relationship; generating a first reconciliation result according to the first service data and the second service data with the association relationship under the condition that the service fields of the first service data and the second service data with the association relationship are not matched and/or the service time stamps of the first service data and the second service data with the association relationship are not matched; the first reconciliation result is used for indicating that the second business data with the association relationship in the target database is not matched with the first business data with the association relationship in the source database.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is further configured to: in response to first service data for which no association relationship is established existing in the first sliding window, generating a second reconciliation result according to that first service data; in response to second service data for which no association relationship is established existing in the first sliding window, generating a third reconciliation result according to that second service data; the second reconciliation result is used for indicating that the first service data for which no association relationship is established is missing from the target database; and the third reconciliation result is used for indicating that the second service data for which no association relationship is established is surplus in the target database.
As a possible implementation manner of the embodiment of the present application, the data reconciliation device 1200 may further include:
an updating module for performing at least one of:
updating second business data with an association relationship in a target database according to the first business data with the association relationship based on the first account checking result;
based on the second reconciliation result, writing the first business data which does not establish the association relationship into a target database;
and deleting the second business data which does not establish the association relation in the target database based on the third reconciliation result.
As a possible implementation manner of the embodiment of the present application, the target exchange task includes a first exchange task and a second exchange task, and the data reconciliation device 1200 may further include:
the first sending module is used for sending, in response to detecting the target exchange task, the first exchange task to a first exchange node corresponding to the source end, wherein the first exchange task is used for the first exchange node to acquire objects matched with the first exchange task from the source end, sort the objects according to their timestamps, and sequentially write the sorted objects into the first message queue;
and the second sending module is used for sending the second exchange task to a second exchange node corresponding to the destination end, wherein the second exchange task is used for the second exchange node to read the objects matched with the first exchange task from the first message queue and write the objects into the destination end.
The second reading module 1202 is specifically configured to: in response to detecting the end of the first exchange task and/or the second exchange task, a second offset value is read from the first message queue.
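The cooperation of the two exchange tasks and the two offset reads may be sketched, purely as an assumption-laden illustration, with a Python list standing in for the first message queue. The function `run_exchange`, the `"ts"` key, and the single-process execution are hypothetical; in the embodiments the two tasks run on separate exchange nodes.

```python
def run_exchange(source_objects, queue, destination):
    """Simulate both exchange tasks over a list-backed queue.

    Returns (first_offset, second_offset) as list indices.
    """
    # Position of the last object written before the task starts (-1 if empty).
    first_offset = len(queue) - 1
    # First exchange task: sort by timestamp, then write in order.
    for obj in sorted(source_objects, key=lambda o: o["ts"]):
        queue.append(obj)
    # Second exchange task: read what the first task wrote, write to destination.
    for obj in queue[first_offset + 1:]:
        destination.append(obj)
    # Position of the end object of the task.
    second_offset = len(queue) - 1
    return first_offset, second_offset
```

A usage example under the same assumptions: with one pre-existing entry in the queue and two source objects, the function returns the offsets `(0, 2)` and the destination receives the two objects in timestamp order.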
As a possible implementation manner of the embodiment of the application, the object is a file, the file is written into the first message queue through a plurality of messages, the source end is a source file server, and the destination end is a destination file server; the reconciliation module 1203 is specifically configured to: extracting a first message stream from the first message queue according to the first offset value and the second offset value; wherein the offset value corresponding to each message in the first message stream is between the first offset value and the second offset value; generating a second message stream according to the files written in the destination file server; and reconciling the first message stream and the second message stream to obtain a reconciliation result.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is specifically configured to: combining a plurality of messages belonging to the same file in a first message stream, and performing hash operation to obtain a first hash value of the same file; generating a first message according to a file name, a first hash value and a file timestamp corresponding to the same file; sequencing the first messages according to the file time stamp corresponding to the first messages to obtain updated first message streams; and checking the updated first message stream and the updated second message stream to obtain a checking result.
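As a minimal sketch of the merging and hashing step, the following assumes that each message carries a sequence number `seq` so that the pieces of one file can be joined in order, and that SHA-256 is used as the hash operation. Neither the sequence number nor the choice of hash algorithm is stated in the embodiments; the dictionary keys and the function name are likewise hypothetical.

```python
import hashlib
from collections import defaultdict


def summarize_file_messages(messages):
    """Merge the messages of each file, hash the content, and sort by file timestamp.

    messages: iterable of dicts with keys 'file_name', 'seq', 'payload' (bytes)
    and 'file_ts'.
    """
    chunks = defaultdict(list)
    timestamps = {}
    for m in messages:
        chunks[m["file_name"]].append((m["seq"], m["payload"]))
        timestamps[m["file_name"]] = m["file_ts"]
    summaries = []
    for name, parts in chunks.items():
        # Join the pieces in sequence order before hashing.
        content = b"".join(p for _, p in sorted(parts, key=lambda part: part[0]))
        summaries.append({
            "file_name": name,
            "hash": hashlib.sha256(content).hexdigest(),
            "file_ts": timestamps[name],
        })
    return sorted(summaries, key=lambda s: s["file_ts"])
```

Each returned dictionary corresponds to one first message (file name, first hash value, file timestamp), and the sorted list corresponds to the updated first message stream.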
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is specifically configured to: extracting all destination files from the destination file server, and performing a hash operation on each destination file to obtain a second hash value of each destination file; for any destination file, generating a second message according to the file name, the second hash value and the file timestamp corresponding to that destination file; and ordering the second messages according to the file timestamps corresponding to the second messages to obtain the second message stream.
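For illustration only, the generation of the second message stream may be sketched with a local directory standing in for the destination file server. The use of the modification time as the file timestamp, the SHA-256 hash, and the function name `build_second_message_stream` are assumptions of this sketch and are not taken from the embodiments.

```python
import hashlib
from pathlib import Path


def build_second_message_stream(root):
    """Hash every regular file under root and order the summaries by timestamp."""
    messages = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            messages.append({
                "file_name": path.relative_to(root).as_posix(),
                "hash": hashlib.sha256(path.read_bytes()).hexdigest(),
                # Modification time in nanoseconds, used here as the file timestamp.
                "file_ts": path.stat().st_mtime_ns,
            })
    # The file name breaks ties so that the ordering is deterministic.
    return sorted(messages, key=lambda m: (m["file_ts"], m["file_name"]))
```

Reading whole files into memory keeps the sketch short; a large file would more plausibly be hashed in chunks.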
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is specifically configured to: acquiring a fourth offset value and a fifth offset value, wherein the fourth offset value is used for indicating the storage position of the last written message in the third message queue before the target exchange task is started, and the fifth offset value is used for indicating the storage position of the last written message in the third message queue when the target exchange task is ended; extracting an initial message stream from the third message queue according to the fourth offset value and the fifth offset value, wherein each message in the initial message stream is generated according to a file written in the destination file server; combining a plurality of messages belonging to the same file in the initial message stream, and performing hash operation to obtain a second hash value of the same file; generating a second message according to the file name, the second hash value and the file timestamp corresponding to the same file; and ordering the second messages according to the file time stamps corresponding to the second messages to obtain second message streams.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is specifically configured to: each time a set period is reached, a second sliding window with a third set time length is adopted to intercept the updated first message stream and the updated second message stream; the second sliding window slides in a fourth set time length; establishing an association relationship between the first message and the second message with the file names matched in the second sliding window; comparing the first hash value of the first message with the association relationship with the second hash value of the second message, and comparing the file timestamp of the first message with the association relationship with the file timestamp of the second message; generating a fourth reconciliation result according to the first message and the second message with the association relationship when the first hash value of the first message and the second hash value of the second message with the association relationship are not matched and/or the file timestamp of the first message and the file timestamp of the second message with the association relationship are not matched; and the fourth reconciliation result is used for indicating that the file corresponding to the file name in the second message with the association relationship in the destination file server is not matched with the file corresponding to the file name in the first message with the association relationship in the source file server.
As a possible implementation manner of the embodiment of the present application, the reconciliation module 1203 is further configured to: generating a fifth reconciliation result according to a first message for which no association relationship has been established, in response to such a first message existing in the second sliding window; generating a sixth reconciliation result according to a second message for which no association relationship has been established, in response to such a second message existing in the second sliding window; the fifth reconciliation result is used for indicating that the file corresponding to the file name in the first message for which no association relationship has been established is missing from the destination file server; and the sixth reconciliation result is used for indicating that the file corresponding to the file name in the second message for which no association relationship has been established is redundant in the destination file server.
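The windowed file reconciliation may be sketched as follows, as a non-limiting illustration. The window is taken to be the half-open interval from `start` to `start + length`, which is an assumption, as are the function names and dictionary keys; the embodiments specify only a window length (the third set time length) and a sliding step (the fourth set time length).

```python
def _in_window(stream, start, length):
    """Messages whose file timestamp falls in [start, start + length)."""
    return [m for m in stream if start <= m["file_ts"] < start + length]


def sliding_windows(first, second, length, step):
    """Yield (start, first_slice, second_slice) for each position of the window."""
    stamps = [m["file_ts"] for m in first + second]
    if not stamps:
        return
    start, end = min(stamps), max(stamps)
    while start <= end:
        yield start, _in_window(first, start, length), _in_window(second, start, length)
        start += step


def reconcile_files(first, second):
    """Pair messages by file name and classify the differences."""
    src = {m["file_name"]: m for m in first}
    dst = {m["file_name"]: m for m in second}
    return {
        # Fourth result: hash and/or timestamp differ for a paired file.
        "mismatched": sorted(
            n for n in src.keys() & dst.keys()
            if (src[n]["hash"], src[n]["file_ts"]) != (dst[n]["hash"], dst[n]["file_ts"])
        ),
        # Fifth result: file missing from the destination file server.
        "missing": sorted(src.keys() - dst.keys()),
        # Sixth result: file redundant in the destination file server.
        "extra": sorted(dst.keys() - src.keys()),
    }
```

When the step is smaller than the window length the windows overlap, so the same file may be reported in more than one window; how such repeats are merged is not addressed by this sketch.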
The data reconciliation device in the embodiment of the application reads the first offset value of the first message queue in response to detecting the target exchange task, and reads the second offset value from the first message queue in response to detecting that the target exchange task has ended; it then reconciles the objects matched with the target exchange task in the source end and the destination end according to the first offset value and the second offset value to obtain a reconciliation result. In this way, the central control node can store the offset values of the first message queue before and after the object exchange (such as data exchange or file exchange) and can reconcile data according to those offset values, without storing additional batch fields in the destination end, which reduces the invasiveness to the destination end.
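The reason no batch field is needed at the destination end can be illustrated with a short sketch: the pair of offsets alone identifies which queue entries belong to the task. Here a Python list stands in for the first message queue and list indices stand in for offset values. Treating the range as exclusive of the first offset and inclusive of the second is an assumption of this sketch, since the embodiments state only that the offsets lie "between" the two values.

```python
def extract_between(queue, first_offset, second_offset):
    """Return the entries whose offset lies in (first_offset, second_offset]."""
    return [
        entry
        for offset, entry in enumerate(queue)
        if first_offset < offset <= second_offset
    ]
```

For example, with entries at offsets 0 to 5 and recorded offsets 1 and 4, the entries at offsets 2, 3, and 4 are attributed to the task, while earlier and later entries are left out.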
In order to implement the above embodiment, the present application further provides an electronic device, and fig. 13 is a schematic structural diagram of an electronic device provided in the embodiment of the present application. The electronic device includes:
memory 1301, processor 1302, and computer programs stored on memory 1301 and executable on processor 1302.
The processor 1302, when executing the program, implements the data reconciliation method provided in any of the embodiments described above.
Further, the electronic device further includes:
a communication interface 1303 for communication between the memory 1301 and the processor 1302.
Memory 1301 is used to store a computer program that can run on processor 1302.
Memory 1301 may comprise high-speed RAM and may also comprise non-volatile memory, such as at least one disk memory.
The processor 1302 is configured to implement the data reconciliation method described in any of the foregoing embodiments when executing the program.
If the memory 1301, the processor 1302, and the communication interface 1303 are implemented independently, they may be connected to each other through a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 13, but this does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the memory 1301, the processor 1302 and the communication interface 1303 are integrated on a chip, the memory 1301, the processor 1302 and the communication interface 1303 may complete communication with each other through internal interfaces.
The processor 1302 may be a central processing unit (Central Processing Unit, abbreviated as CPU) or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC) or one or more integrated circuits configured to implement embodiments of the present application.
In order to implement the above embodiments, the embodiments of the present application also propose a non-transitory computer-readable storage medium on which a computer program is stored, which when executed by a processor implements a data reconciliation method as provided in any of the embodiments described above.
In order to implement the above embodiments, the embodiments of the present application also provide a computer program product; when instructions in the computer program product are executed by a processor, the data reconciliation method provided in any of the embodiments described above is implemented.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict each other.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (19)

1. A data reconciliation method, the method comprising:
in response to detecting the target exchange task, reading a first offset value of the first message queue; the first offset value is used for indicating a storage position of a last written object in the first message queue; the target exchange task is used for exchanging objects between a source end and a destination end through the first message queue;
in response to detecting that the target exchange task is over, reading a second offset value from the first message queue; the second offset value is used for indicating the storage position of the end object in the target exchange task in the first message queue;
and checking the objects matched with the target exchange task in the source end and the destination end according to the first offset value and the second offset value to obtain a checking result.
2. The method of claim 1, wherein the object is business data, the source is a source database, and the destination is a destination database;
and reconciling the objects matched with the target exchange task in the source end and the destination end according to the first offset value and the second offset value, wherein the reconciliation comprises:
responding to the task type of the target exchange task as full-scale exchange, and extracting a first business data stream from the first message queue according to the first offset value and the second offset value; wherein, the offset value corresponding to each first service data in the first service data stream is between the first offset value and the second offset value;
extracting a second service data stream from the destination database; wherein the second service data stream comprises all second service data in the target database;
and carrying out data reconciliation on the first service data stream and the second service data stream to obtain a reconciliation result.
3. The method of claim 2, wherein reconciling the objects in the source and destination that match the target swap task based on the first offset value and the second offset value further comprises:
Extracting a third service data stream from the source database; wherein the third service data stream comprises all third service data in the source database;
and carrying out data reconciliation on the first service data stream and the third service data stream to obtain a reconciliation result.
4. The method of claim 1, wherein the object is business data, the source is a source database, and the destination is a destination database;
and reconciling the objects matched with the target exchange task in the source end and the destination end according to the first offset value and the second offset value, wherein the reconciliation comprises:
responding to the task type of the target exchange task as increment exchange, and inquiring a first service time stamp of service data matched with the first offset value from the first message queue according to the first offset value;
inquiring a second service time stamp of service data matched with the second offset value from the first message queue according to the second offset value;
extracting a fourth service data stream from the source database and a fifth service data stream from the destination database according to the first service time stamp and the second service time stamp;
Performing data reconciliation on the fourth service data stream and the fifth service data stream to obtain a reconciliation result;
wherein the service time stamp of each fourth service data in the fourth service data stream is between the first service time stamp and the second service time stamp, and the service time stamp of each fifth service data in the fifth service data stream is between the first service time stamp and the second service time stamp.
5. The method of claim 1, wherein the object is business data, the source is a source database, and the destination is a destination database;
the method further comprises the steps of:
in response to the target exchange task not having ended, extracting a sixth service data stream from the first message queue according to the first offset value; wherein the offset value corresponding to each sixth service data in the sixth service data stream is greater than the first offset value;
extracting a seventh service data stream from a second message queue according to a third offset value; the third offset value is used for indicating the storage position of the last written service data in the second message queue before the target exchange task is started, and each seventh service data in the seventh service data stream is updated service data obtained from the destination database in response to detecting the target exchange task;
and performing data reconciliation on the sixth service data stream and the seventh service data stream to obtain a reconciliation result.
6. The method according to any one of claims 2-5, wherein the performing data reconciliation on the first service data stream and the second service data stream to obtain a reconciliation result comprises:
intercepting the first service data stream and the second service data stream by adopting a first sliding window with a first set time length every time a set period is reached; the first sliding window slides in a second set time length;
and checking the intercepted first business data and second business data in the first sliding window to obtain a checking result.
7. The method of claim 6, wherein the reconciling the intercepted first service data and second service data within the first sliding window to obtain a reconciliation result comprises:
establishing an association relationship between first service data and second service data matched with the data identification in the first sliding window;
comparing the service fields of the first service data and the second service data with the association relationship, and comparing the service time stamps of the first service data and the second service data with the association relationship;
Generating a first reconciliation result according to the first service data and the second service data with the association relationship when the service fields of the first service data and the second service data with the association relationship are not matched and/or the service time stamps of the first service data and the second service data with the association relationship are not matched;
the first reconciliation result is used for indicating that the second business data with the association relationship in the destination database is not matched with the first business data with the association relationship in the source database.
8. The method of claim 7, wherein the reconciling the intercepted first service data and second service data within the first sliding window to obtain a reconciliation result further comprises:
generating a second reconciliation result according to the first business data which does not establish the association relationship in response to the first business data which does not establish the association relationship in the first sliding window;
responding to second business data which does not establish an association relationship in the first sliding window, and generating a third reconciliation result according to the second business data which does not establish the association relationship;
the second reconciliation result is used for indicating that the first service data for which no association relationship has been established is missing from the destination database;
and the third reconciliation result is used for indicating that the second service data for which no association relationship has been established is redundant in the destination database.
9. The method of claim 8, wherein the method further comprises:
updating the second service data having the association relationship in the destination database according to the first service data having the association relationship, based on the first reconciliation result;
and/or,
writing the first service data for which no association relationship has been established into the destination database, based on the second reconciliation result;
and/or,
deleting the second service data for which no association relationship has been established from the destination database, based on the third reconciliation result.
10. The method of any of claims 1-5, wherein the target exchange task comprises a first exchange task and a second exchange task;
in response to detecting the target swap task, the method further comprises:
the first exchange task is sent to a first exchange node corresponding to the source end, wherein the first exchange task is used for the first exchange node to acquire an object matched with the first exchange task from the source end, sort the objects matched with the first exchange task according to the timestamp of the object matched with the first exchange task, and sequentially write the sorted objects into the first message queue;
sending the second exchange task to a second exchange node corresponding to the destination end, wherein the second exchange task is used for the second exchange node to read the objects matched with the first exchange task from the first message queue and write the objects into the destination end;
the reading a second offset value from the first message queue in response to detecting that the target exchange task is over comprises:
reading the second offset value from the first message queue in response to detecting that the first exchange task and/or the second exchange task has ended.
11. The method of claim 1, wherein the object is a file, the file is written into the first message queue through a plurality of messages, the source terminal is a source file server, and the destination terminal is a destination file server;
and reconciling the objects matched with the target exchange task in the source end and the destination end according to the first offset value and the second offset value, wherein the reconciliation comprises:
extracting a first message stream from the first message queue according to the first offset value and the second offset value; wherein, the offset value corresponding to each message in the first message stream is between the first offset value and the second offset value;
Generating a second message stream according to the file written in the destination file server;
and checking the first message stream and the second message stream to obtain a checking result.
12. The method of claim 11, wherein the reconciling the first message flow and the second message flow to obtain a reconciliation result comprises:
combining a plurality of messages belonging to the same file in the first message stream, and performing hash operation to obtain a first hash value of the same file;
generating a first message according to the file name, the first hash value and the file timestamp corresponding to the same file;
ordering the first messages according to the file time stamps corresponding to the first messages to obtain updated first message streams;
and checking the updated first message stream and the updated second message stream to obtain a checking result.
13. The method of claim 11, wherein generating a second message stream from the file written in the destination file server comprises:
extracting all destination files from the destination file server, and performing a hash operation on each destination file to obtain a second hash value of each destination file;
for any destination file, generating a second message according to the file name, the second hash value and the file timestamp corresponding to that destination file;
and ordering the second messages according to the file time stamp corresponding to the second messages so as to obtain the second message stream.
14. The method of claim 11, wherein generating a second message stream from the file written in the destination file server comprises:
acquiring a fourth offset value and a fifth offset value, wherein the fourth offset value is used for indicating the storage position of the last written message in a third message queue before the target exchange task is started, and the fifth offset value is used for indicating the storage position of the last written message in the third message queue when the target exchange task is ended;
extracting an initial message stream from the third message queue according to the fourth offset value and the fifth offset value, wherein each message in the initial message stream is generated according to a file written in the destination file server;
combining a plurality of messages belonging to the same file in the initial message stream, and performing hash operation to obtain a second hash value of the same file;
Generating a second message according to the file name, the second hash value and the file timestamp corresponding to the same file;
and ordering the second messages according to the file time stamp corresponding to the second messages so as to obtain the second message stream.
15. The method of any of claims 12-14, wherein reconciling the updated first message flow and the second message flow to obtain a reconciliation result comprises:
intercepting the updated first message stream and the updated second message stream by adopting a second sliding window with a third set time length every time a set period is reached; the second sliding window slides in a fourth set time length;
establishing an association relationship between the first message and the second message with the matched file names in the second sliding window;
comparing the first hash value of the first message with the association relationship with the second hash value of the second message, and comparing the file timestamp of the first message with the association relationship with the file timestamp of the second message;
generating a fourth reconciliation result according to the first message and the second message with the association relationship when the first hash value of the first message and the second hash value of the second message with the association relationship are not matched and/or the file timestamp of the first message with the association relationship and the file timestamp of the second message are not matched;
And the fourth reconciliation result is used for indicating that the file corresponding to the file name in the second message with the association relationship in the destination file server is not matched with the file corresponding to the file name in the first message with the association relationship in the source file server.
16. The method of claim 15, wherein reconciling the updated first message stream and the second message stream to obtain a reconciliation result further comprises:
in response to a first message for which no association relationship has been established existing in the second sliding window, generating a fifth reconciliation result according to that first message;
in response to a second message for which no association relationship has been established existing in the second sliding window, generating a sixth reconciliation result according to that second message;
wherein the fifth reconciliation result indicates that the file corresponding to the file name in the first message without an association relationship is missing from the destination file server;
and the sixth reconciliation result indicates that the destination file server holds an extra file corresponding to the file name in the second message without an association relationship.
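The handling of unassociated messages amounts to two set differences over the file names seen in the window. The following Python sketch illustrates that reading; the function name `unmatched_results` and the dictionary keys are hypothetical and do not come from the application.

```python
from typing import Dict, List, Set


def unmatched_results(source_names: Set[str], destination_names: Set[str]) -> Dict[str, List[str]]:
    """Classify file names that found no partner in the window.

    "fifth": present in the source messages only (file missing at destination).
    "sixth": present in the destination messages only (extra file at destination).
    """
    return {
        "fifth": sorted(source_names - destination_names),
        "sixth": sorted(destination_names - source_names),
    }


result = unmatched_results({"a.txt", "b.txt"}, {"b.txt", "c.txt"})
```

Here `a.txt` appears only in the first message stream and `c.txt` only in the second, so they fall under the fifth and sixth results respectively, while `b.txt` would go through the hash and timestamp comparison of claim 15.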
17. A data reconciliation apparatus, the apparatus comprising:
a first reading module, configured to read a first offset value of a first message queue in response to detecting a target exchange task; the first offset value indicates the storage position of the last written object in the first message queue; the target exchange task is used for exchanging objects between a source end and a destination end through the first message queue;
a second reading module, configured to read a second offset value from the first message queue in response to detecting that the target exchange task has ended; the second offset value indicates the storage position, in the first message queue, of the end object of the target exchange task;
and a reconciliation module, configured to reconcile the objects in the source end and the destination end that match the target exchange task according to the first offset value and the second offset value, so as to obtain a reconciliation result.
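The cooperation of the three modules can be sketched in Python as follows. This is a simplified illustration only: `InMemoryQueue` is a hypothetical stand-in for a real message queue, and the sketch assumes that the objects of the target exchange task occupy the positions after the first offset value up to and including the second offset value, which is one plausible reading of the claim.

```python
from typing import Dict, Iterable, List


class InMemoryQueue:
    """Hypothetical stand-in for a message queue with integer offsets."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def append(self, obj: str) -> int:
        self._items.append(obj)
        return len(self._items) - 1

    def last_offset(self) -> int:
        # Position of the last written object; -1 if the queue is empty.
        return len(self._items) - 1

    def read_range(self, after: int, upto: int) -> List[str]:
        # Objects stored at positions after `after`, up to and including `upto`.
        return self._items[after + 1 : upto + 1]


def reconcile(queue: InMemoryQueue, first_offset: int, second_offset: int,
              destination_objects: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the objects written during the task with those at the destination."""
    exchanged = queue.read_range(first_offset, second_offset)
    dest = set(destination_objects)
    return {
        "missing": [o for o in exchanged if o not in dest],
        "extra": sorted(dest - set(exchanged)),
    }


queue = InMemoryQueue()
queue.append("old-object")           # written before the task starts
first_offset = queue.last_offset()   # read when the task is detected
queue.append("x")
queue.append("y")
second_offset = queue.last_offset()  # read when the task ends
outcome = reconcile(queue, first_offset, second_offset, ["x", "z"])
```

Bounding the comparison by the two offsets keeps objects written before the task, such as `old-object` here, out of the reconciliation.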
18. An electronic device, comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the data reconciliation method of any one of claims 1-16 when executing the program.
19. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the data reconciliation method of any of claims 1-16.
CN202310495829.2A 2023-04-27 2023-04-27 Data checking method and device, electronic equipment and storage medium Pending CN116628056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310495829.2A CN116628056A (en) 2023-04-27 2023-04-27 Data checking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310495829.2A CN116628056A (en) 2023-04-27 2023-04-27 Data checking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116628056A (en) 2023-08-22

Family

ID=87620367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310495829.2A Pending CN116628056A (en) 2023-04-27 2023-04-27 Data checking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116628056A (en)

Similar Documents

Publication Publication Date Title
CN106878473B (en) Message processing method, server cluster and system
CN110321387B (en) Data synchronization method, equipment and terminal equipment
CN109284073B (en) Data storage method, device, system, server, control node and medium
CN110377577B (en) Data synchronization method, device, system and computer readable storage medium
CN111125260A (en) Data synchronization method and system based on SQL Server
CN109542330B (en) Data storage method, data query method and device
CN113360456B (en) Data archiving method, device, equipment and storage medium
CN113326165B (en) Data processing method and device based on block chain and computer readable storage medium
JP4111881B2 (en) Data synchronization control device, data synchronization control method, and data synchronization control program
CN110390082B (en) Communication matrix comparison method and system
CN112486915A (en) Data storage method and device
CN111831954B (en) Content data updating method, device, computer equipment and storage medium
CN116628056A (en) Data checking method and device, electronic equipment and storage medium
CN114531450B (en) Height-based blockchain peer-to-peer network data synchronization method
CN105204776A (en) Data processing method and device
CN114936095A (en) Partition expansion and reduction method and system
CN112258184B (en) Method, apparatus, electronic device and readable storage medium for freezing blockchain network
CN114579416A (en) Index determination method, device, server and medium
CN113672776A (en) Fault analysis method and device
CN112053150A (en) Data processing method, device and storage medium
CN109325057B (en) Middleware management method, device, computer equipment and storage medium
CN113965489B (en) Link timeout detection method, device, computer equipment and storage medium
CN111552667B (en) Data deleting method and device and electronic equipment
CN116662603B (en) Time shaft control method and system based on kafka, electronic equipment and storage medium
CN114979179B (en) Message processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination