CN117951144A - Data synchronization verification method and device, electronic equipment and storage medium - Google Patents

Data synchronization verification method and device, electronic equipment and storage medium

Info

Publication number
CN117951144A
Authority
CN
China
Prior art keywords
data
characteristic information
current batch
synchronization
consistent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311733894.0A
Other languages
Chinese (zh)
Inventor
崔崧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd filed Critical Mashang Xiaofei Finance Co Ltd
Priority to CN202311733894.0A priority Critical patent/CN117951144A/en
Publication of CN117951144A publication Critical patent/CN117951144A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data synchronization verification method and device, electronic equipment, and a storage medium. The method comprises: receiving first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end, wherein the first data is the data sent by the production end to the consumption end during a data synchronization process, the second data is the data received by the consumption end during the data synchronization process, and the data synchronization process is used for synchronizing data in the production end to the consumption end; determining, according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch, whether the first data and the second data of the current batch are consistent; and, when the first data and the second data of the current batch are consistent, determining that the data synchronization verification result of the current batch is a pass. The embodiments of the application can improve the efficiency of data synchronization verification.

Description

Data synchronization verification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for data synchronization verification, a system, an electronic device, and a storage medium.
Background
With the rapid development of the internet, a large amount of data from heterogeneous data sources is generated every day, and thousands of data synchronization tasks follow. If data is lost while these synchronization tasks run, the data analysis performed after synchronization is affected, and so is any downstream reprocessing of the production data. How to accurately verify the quality of data synchronization with fewer resources has become a problem to be solved.
Disclosure of Invention
The application provides a data synchronization verification method and device, a data synchronization system, electronic equipment and a computer readable storage medium, which can realize accurate verification of data synchronization with fewer resources.
In a first aspect, the present application provides a method for checking data synchronization, the method comprising:
Receiving first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end, wherein the first data is data sent by the production end to the consumption end in a data synchronization process, the second data is data received by the consumption end in the data synchronization process, and the data synchronization process is used for synchronizing the data in the production end to the consumption end;
Determining, according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch, whether the first data and the second data of the current batch are consistent;
And, when the first data and the second data of the current batch are consistent, determining that the data synchronization verification result of the current batch is a pass.
In a second aspect, the present application provides a data synchronization verification apparatus, the apparatus comprising:
A receiving module, configured to receive first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end, wherein the first data is data sent by the production end to the consumption end in a data synchronization process, the second data is data received by the consumption end in the data synchronization process, and the data synchronization process is used for synchronizing the data in the production end to the consumption end;
A consistency determining module, configured to determine, according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch, whether the first data and the second data of the current batch are consistent;
And a verification result determining module, configured to determine that the data synchronization verification result of the current batch is a pass when the first data and the second data of the current batch are consistent.
In a third aspect, the present application provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, one or more of the computer programs being executable by the at least one processor to enable the at least one processor to perform the data synchronization verification method described above.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the data synchronization verification method described above.
According to the embodiments provided by the application, the characteristic information of the first data sent by the production end and the characteristic information of the second data received by the consumption end can be obtained during data synchronization between the production end and the consumption end. Data synchronization verification is then performed according to the two sets of characteristic information: by determining whether the first data and the second data are consistent, it is verified whether the first data in the production end has been completely synchronized as the second data of the consumption end. The characteristic information is only part of the information in the data, yet it represents the important characteristics of the data, so the complexity of the synchronization verification and the resources occupied by the verification process can both be reduced. In addition, the data synchronization verification is performed per data batch: the first data sent by the production end and the second data received by the consumption end are verified batch by batch, which reduces the amount of data processed in each verification and improves verification efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, serve to explain the application. The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
FIG. 1 is a flowchart of a data synchronization verification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of a data synchronization verification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario of a data synchronization verification method according to an embodiment of the present application;
FIG. 4 is a block diagram of a data synchronization verification device according to an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions of the present application, the following description of exemplary embodiments of the present application is made with reference to the accompanying drawings, in which various details of embodiments of the present application are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiments of the application and features of the embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Terms such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present application and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Data may be lost during data synchronization, which adversely affects the data analysis performed after synchronization. In the related art, some synchronization tasks have no task for checking the quality of data synchronization, while others either are configured with a synchronization verification task that determines whether data loss has occurred or are checked for data loss manually. For example, the synchronization verification task may verify synchronization at the level of individual records; performing such verification on a large amount of data from heterogeneous data sources is time-consuming, occupies excessive resources, introduces a large delay, and has low verification efficiency. Manual verification likewise incurs high labor cost and a large delay, and because data synchronization runs every day, it involves a great deal of repetitive work. Whether no synchronization verification is performed after data synchronization or the data loss is discovered late, incomplete data is used for data analysis during a period of uncertain length, which affects the accuracy of the data analysis result.
In order to solve the above technical problems, the embodiments of the application provide a data synchronization verification method and device, electronic equipment, and a computer readable storage medium. They are characterized by short verification time, low resource occupation, small delay, high verification efficiency, and reduced labor cost, so that the quality of data synchronization can be accurately verified with fewer resources.
The data synchronization verification method according to the embodiment of the application can be applied to a verification end; that is, the method is executed by an electronic device, such as a terminal device or a server, corresponding to the verification end. The terminal device may be User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like. The server may be an independent physical server, a server cluster formed by a plurality of physical servers, or a cloud server capable of performing cloud computing. The method may be implemented by a processor invoking computer readable program instructions stored in a memory. Alternatively, the method may be performed by a server.
FIG. 1 is a flowchart of a data synchronization verification method according to an embodiment of the present application. The method is applied to a verification end, and the verification end is used for implementing each step of the data synchronization verification method. Referring to FIG. 1, the method includes:
In step S11, first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end are received, wherein the first data is data sent by the production end to the consumption end in a data synchronization process, the second data is data received by the consumption end in the data synchronization process, and the data synchronization process is used for synchronizing the data in the production end to the consumption end.
For example, the data synchronization process may be the execution of a data synchronization task that synchronizes data in the production end to the consumption end. The data synchronization process may be a quasi-real-time synchronization process. For example, a synchronization task that is executed once per period may be split into a plurality of sub-synchronization tasks, where the time interval covered by each sub-synchronization task is shorter than the period of the original synchronization task, and the sum of the time intervals of the sub-synchronization tasks equals that period. For example, a synchronization task executed once a day may be split into a plurality of sub-synchronization tasks covering short time periods, each of which may be, for example, 30 minutes or 1 hour long. This not only brings forward the completion time of data synchronization, but also lets the business side analyze the quasi-real-time synchronized data quickly and reduces the peak resource usage of the synchronization task. However, the quasi-real-time data synchronization scenario also places high demands on how to guarantee the synchronization quality of the data.
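As a minimal sketch of the splitting described above, the following Python snippet divides one daily synchronization interval into consecutive sub-intervals. The function name `split_sync_window` and the concrete date are illustrative assumptions and do not come from the application.

```python
from datetime import datetime, timedelta


def split_sync_window(start, end, step):
    """Split [start, end) into consecutive sub-intervals of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)  # the last sub-interval may be shorter
        windows.append((cursor, nxt))
        cursor = nxt
    return windows


# Hypothetical example: one daily task split into 30-minute sub-tasks.
day_start = datetime(2023, 12, 1)
windows = split_sync_window(
    day_start, day_start + timedelta(days=1), timedelta(minutes=30)
)
total = sum((end - start for start, end in windows), timedelta(0))
```

The sub-intervals are contiguous and non-overlapping, so their lengths add up to the original period, which matches the condition stated in the text.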
For example, the data synchronization process may synchronize data from a database management system to a data lake, for instance by means of a computing engine. The embodiment of the application can be applied to a data synchronization scenario in which Apache Spark (Spark) synchronizes MySQL data and writes it to Hudi. Spark is a fast, general-purpose computing engine designed for large-scale data processing. MySQL is a relational database management system. Hudi is a scan-optimized data storage abstraction for analytical workloads; it allows a data set to absorb changes with minute-level latency, supports incremental processing of the data set by downstream systems, and can be used in both offline and quasi-real-time scenarios. When the embodiment of the application is applied to this scenario, the synchronization verification of each batch is completed in time, which ensures the synchronization quality of the data and the stability of Hudi in large-scale use. It should be noted that the embodiment of the present application may be applied to any data synchronization scenario and is not limited to the quasi-real-time MySQL-to-Hudi synchronization scenario described above.
In the embodiment of the application, the production end may include a system for collecting and sending data in a data synchronization task. For example, the production end may provide a service that collects MySQL data and synchronizes the collected first data to the consumption end. The consumption end may include a system for receiving and storing data in a data synchronization task. For example, the consumption end may provide a service that stores data to Hive, a data warehouse tool that can be used for data extraction, transformation, loading, and the like. In the data synchronization, the production end may send the collected first data directly to the consumption end, which receives and stores it as second data; alternatively, the production end may send the collected first data to an intermediate end, which synchronizes the first data to the consumption end. For example, a Canal-based production end may collect MySQL data and send the collected first data to a Kafka-based intermediate end; the intermediate end sends the data from the production end to a Spark-based consumption end, which may store the received data to Hudi. Canal is middleware that provides incremental data subscription and consumption based on parsing database incremental logs; it supports parsing MySQL logical logs, and the data obtained after parsing can then be processed. Kafka is a high-throughput distributed publish-subscribe messaging system that can be used to process activity stream data.
It should be noted that, the production end and the consumption end of the embodiment of the present application may be any type of two ends for performing the data synchronization task, and the present application is not limited to the data synchronization manner between the production end and the consumption end. For ease of understanding, the following description will take the production side to read MySQL and synchronize the read first data to the consumption side via the intermediate side Kafka, and the consumption side stores the received second data to Hudi as an example.
In the embodiment of the application, the first characteristic information of the first data is information that the production end collects by statistics, within its own service scope, during data synchronization, and that characterizes the first data sent to the consumption end. The first characteristic information can be used for synchronization verification. For example, the production end may, periodically or aperiodically, compute statistics over various kinds of information about the first data it has currently read; for instance, it may count the messages generated for each synchronization table, determine the first generation time of each piece of first data, and so on. The production end may send the first characteristic information obtained by statistics to the verification end at regular intervals, or whenever the statistics for one synchronization table have been obtained.
Wherein the first characteristic information may include at least one of a first synchronization type, a first data amount, first primary key information, and a first generation time of each piece of first data. It should be understood that, the person skilled in the art may set the first feature information according to the actual situation, as long as the first feature information can represent various types of information of the first data sent to the consumer, and the present application does not limit the manner in which the production end determines and sends the first feature information, the expression form and content of the first feature information, and so on.
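A minimal sketch of how a production end might summarize one batch of first data into first characteristic information is shown below. The class `FeatureInfo`, the function `build_feature_info`, the record layout (`pk`, `generated_at`), the table name, and the synchronization type label `"incremental"` are all hypothetical; the application does not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureInfo:
    """Characteristic information for one table in one batch (assumed layout)."""
    table: str
    sync_type: str            # opaque label; its possible values are an assumption
    data_amount: int          # number of records
    primary_keys: frozenset   # primary key of every record
    generation_times: tuple   # generation time of every record, sorted


def build_feature_info(table, sync_type, records):
    """Summarize records (dicts with 'pk' and 'generated_at') into FeatureInfo."""
    keys = frozenset(r["pk"] for r in records)
    times = tuple(sorted(r["generated_at"] for r in records))
    return FeatureInfo(table, sync_type, len(records), keys, times)


# Hypothetical records read by the production end (times as Unix seconds).
records = [
    {"pk": 101, "generated_at": 1701392700},
    {"pk": 102, "generated_at": 1701392760},
]
info = build_feature_info("orders", "incremental", records)
```

Only the summary, not the records themselves, would be sent to the verification end, which is what keeps the verification lightweight.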
In the embodiment of the application, the production end may print any log information related to data sending. The log information can be used to check how each piece of first data was sent. When the data synchronization verification result of the current batch is determined to be a failure, the log information can also be used to perform anomaly analysis on the missing data and to locate the cause of the data synchronization error, for example, the operation that caused it. Illustratively, the log information may include the first primary key information of each piece of first data, any point-in-time information related to the first data (e.g., the first generation time of the first data), various kinds of position (offset) information generated in interactions with the intermediate end, and the like.
In the embodiment of the application, the second characteristic information of the second data is information that the consumption end collects by statistics, within its own service scope, during data synchronization, and that characterizes the second data received from the production end. The second characteristic information can be used for synchronization verification. For example, after successfully writing to the data lake, the consumption end may, periodically or aperiodically, compute statistics over various kinds of information about the received second data, determine the second characteristic information of the second data, and send it to the verification end. The second characteristic information may include at least one of a second synchronization type, a second data amount, second primary key information, and a second generation time of each piece of second data. For any piece of second data, its second generation time at the consumption end may be the first generation time of that piece of data at the production end. It should be understood that a person skilled in the art may set the second characteristic information according to the actual situation; the present application does not limit the manner in which the consumption end determines and sends the second characteristic information, or the representation form and content of the second characteristic information.
In the embodiment of the application, the consumption end can print the corresponding log information, and the log information can be used for carrying out exception analysis on the missing data and positioning the reason of the data synchronization error. The log information may include at least second primary key information of each piece of second data, for example.
In the embodiment of the application, while data synchronization is carried out, the production end and the consumption end may each compute, during the synchronization process, the first characteristic information of the first data and the second characteristic information of the second data, respectively, and send it to the verification end. The verification end aggregates the received first characteristic information from the production end and second characteristic information from the consumption end, for example per batch, and performs synchronization verification once aggregation has yielded both the first characteristic information of the first data and the second characteristic information of the second data of the current batch.
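The per-batch aggregation at the verification end could be sketched as follows. This is an illustration under assumptions: the class `BatchCollector`, the side labels `"producer"` and `"consumer"`, and the string batch identifiers are hypothetical names, and the characteristic information is passed as a plain dictionary.

```python
from collections import defaultdict

SIDES = ("producer", "consumer")


class BatchCollector:
    """Collects characteristic information per batch from both ends."""

    def __init__(self):
        self._reports = defaultdict(dict)

    def receive(self, batch_id, side, feature_info):
        if side not in SIDES:
            raise ValueError(f"unknown side: {side!r}")
        self._reports[batch_id][side] = feature_info

    def ready_batches(self):
        """Batches for which both ends have reported, in sorted order."""
        return sorted(
            batch_id
            for batch_id, report in self._reports.items()
            if all(side in report for side in SIDES)
        )


collector = BatchCollector()
collector.receive("2023-12-01T01:00", "producer", {"data_amount": 3})
before = collector.ready_batches()   # consumer has not reported yet
collector.receive("2023-12-01T01:00", "consumer", {"data_amount": 3})
after = collector.ready_batches()
```

A batch becomes eligible for verification only after both reports have arrived, which corresponds to the condition that both sets of characteristic information have been obtained through aggregation.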
In step S12, it is determined whether the first data of the current lot is consistent with the second data according to the first characteristic information of the first data of the current lot and the second characteristic information of the second data of the current lot.
The first data of the current batch is part of the data in all the first data sent by the production end, and optionally, the first data sent by the production end can be divided into a plurality of batches according to the information such as the production time, the data quantity and the like of the first data, so that the data synchronous verification can be performed on the first data and the second data of each batch. The second data of the current batch may be understood as data synchronized to the consumer side in the first data of the current batch.
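One possible way to divide data into batches by generation time is sketched below. Fixed-length batches, the 30-minute length, and the helper names `batch_start` and `group_by_batch` are assumptions made for illustration; the text also allows batching by data amount or other information.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def batch_start(generated_at, batch_length):
    """Start time of the fixed-length batch that contains `generated_at`."""
    index = (generated_at - EPOCH) // batch_length  # integer batch index
    return EPOCH + index * batch_length


def group_by_batch(records, batch_length):
    """Group (primary_key, generated_at) pairs by batch start time."""
    batches = defaultdict(list)
    for pk, generated_at in records:
        batches[batch_start(generated_at, batch_length)].append(pk)
    return dict(batches)


# Hypothetical records: (primary key, generation time).
records = [
    (1, datetime(2023, 12, 1, 1, 5)),
    (2, datetime(2023, 12, 1, 1, 29)),
    (3, datetime(2023, 12, 1, 1, 31)),
]
batches = group_by_batch(records, timedelta(minutes=30))
```

Because the second generation time at the consumption end may equal the first generation time at the production end, both ends could apply the same rule and arrive at the same batch for a given record.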
Whether the first data and the second data of the current batch are consistent can be determined according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch. For example, if the first characteristic information is the data amount and primary key information of the first data, and the second characteristic information is the data amount and primary key information of the second data, then whether the first data and the second data of the current batch are consistent is determined according to whether the data amount of the first data equals the data amount of the second data and whether the primary key information of the first data matches the primary key information of the second data.
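The comparison in this example can be sketched as follows, assuming the characteristic information is represented as a dictionary with the hypothetical fields `data_amount` and `primary_keys`.

```python
def is_consistent(first_info, second_info):
    """Compare data amount and primary key information of one batch."""
    same_amount = first_info["data_amount"] == second_info["data_amount"]
    same_keys = set(first_info["primary_keys"]) == set(second_info["primary_keys"])
    return same_amount and same_keys


first_info = {"data_amount": 3, "primary_keys": [1, 2, 3]}
second_ok = {"data_amount": 3, "primary_keys": [3, 1, 2]}
second_lost = {"data_amount": 2, "primary_keys": [1, 3]}

ok_result = is_consistent(first_info, second_ok)
lost_result = is_consistent(first_info, second_lost)
```

Primary keys are compared as sets, so the order in which the consumption end received the records does not affect the outcome.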
In step S13, in the case where the first data and the second data of the current lot are consistent, it is determined that the data synchronization verification result of the current lot is verification passing.
If the first data and the second data of the current batch are consistent, the data synchronization verification result of the current batch can be determined to be a pass. Otherwise, if the first data and the second data of the current batch are inconsistent, the verification end can determine that the data synchronization verification result of the current batch is a failure; the missing data can then be identified and subjected to anomaly analysis.
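As an illustration of how the missing data might be identified when verification fails, the sketch below compares the primary keys reported by the two ends. The function name `verify_batch` and the shape of the returned result are assumptions, not part of the application.

```python
def verify_batch(first_keys, second_keys):
    """Return the verification result and the keys that did not arrive."""
    first, second = set(first_keys), set(second_keys)
    missing = sorted(first - second)      # sent but never received
    unexpected = sorted(second - first)   # received but never reported as sent
    return {
        "passed": not missing and not unexpected,
        "missing": missing,
        "unexpected": unexpected,
    }


passed_result = verify_batch([101, 102, 103], [103, 102, 101])
failed_result = verify_batch([101, 102, 103], [101, 103])
```

The `missing` keys can then be looked up in the log information printed by the production end and the consumption end to locate where each record was lost.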
According to the embodiment of the application, the characteristic information of the first data sent by the production end and the characteristic information of the second data received by the consumption end can be obtained during data synchronization between the production end and the consumption end. Data synchronization verification is then performed according to the two sets of characteristic information: by determining whether the first data and the second data are consistent, it is verified whether the first data in the production end has been completely synchronized as the second data of the consumption end. The characteristic information is only part of the information in the data, yet it represents the important characteristics of the data, so the complexity of the synchronization verification and the resources occupied by the verification process can both be reduced. In addition, the data synchronization verification is performed per data batch: the first data sent by the production end and the second data received by the consumption end are verified batch by batch, which reduces the amount of data processed in each verification and improves verification efficiency.
According to the embodiment of the application, the characteristic information of the first data and the second data of the current batch can be determined more timely through the received first characteristic information of the first data and the second characteristic information of the second data; and according to whether the characteristic information is consistent, the synchronous verification is carried out, and the lightweight synchronous verification is carried out according to batches, so that the heavy-weight verification task of a single data level is not required to be executed, and the resource occupation amount can be reduced. And moreover, the lightweight synchronous verification is carried out according to batches, so that the data synchronization condition can be judged in real time, and compared with a synchronous verification mode of spot check, the accuracy of the obtained synchronous verification result is higher, and the method is more convincing. According to the embodiment of the application, the time consumption of verification can be reduced, the verification delay is reduced, the verification efficiency is effectively improved, and the accuracy of the verification result is improved, so that the accurate verification of the data synchronization can be realized in time with fewer resources.
The data synchronization verification method according to the embodiment of the application is explained below.
In some possible implementations, the verification terminal may perform synchronization verification according to the first characteristic information of the first data and the second characteristic information of the second data of the batch when it is determined that there is one batch without synchronization delay and the first characteristic information of the first data and the second characteristic information of the second data of the batch are already collected completely.
For example, the data generation speed of the production end fluctuates. In some time periods the data generation speed of the production end is low, and the consumption end can receive and store the data in time, i.e., the synchronization delay of the consumption end is small. In other time periods the data generation speed of the production end is high, and the synchronization delay of the consumption end is large because the consumption end cannot receive and store a large amount of data in time. Illustratively, if the production end generates a large amount of first data at 1 o'clock, the consumption end may only finish processing it at 3 o'clock; in this case it may be determined at 3 o'clock that there is a batch without synchronization delay, whereas before 3 o'clock it may be determined that no batch without synchronization delay exists. As described above, the first characteristic information of the first data and the second characteristic information of the second data are used for performing the synchronization verification. When it is determined that there is a current batch having no synchronization delay and that the first characteristic information of the first data and the second characteristic information of the second data of the current batch have been completely collected, the synchronization verification may be performed according to the first characteristic information of the first data and the second characteristic information of the second data of the current batch.
In the embodiment of the application, the verification is performed only when it is determined that a batch without synchronization delay exists and that the first characteristic information of the first data and the second characteristic information of the second data used for the verification have been completely collected, so that the data used for verification is accurate and complete and the accuracy of the verification is ensured. It should be noted that the data read by the production end may be generated evenly throughout the day, unevenly throughout the day, or at scheduled times, and the embodiment of the application does not need to spend a large amount of resources to guarantee low delay for every sub-synchronization task.
In the embodiment of the application, the batches can be obtained by time division, and in some optional application scenarios, the batches can also be obtained by division according to service types, for example, the synchronous data comprises data of a plurality of service types, each service type can be a batch, as long as the data range included in each batch can be defined, and the application does not limit the division mode of the batches.
In some possible implementations, the batches may be partitioned by time intervals, for example, time intervals on the order of hours, where the time intervals of different batches may be the same or different in length. For example, every batch may use a one-hour interval so that the data is summarized once every hour, or different batches may be given different interval lengths; the present application does not limit the time intervals of the batches. In this way, the frequency of synchronization verification can be flexibly adjusted by setting the lengths of the time intervals of different batches according to the characteristics of the service scenario, and verification resources can be effectively saved.
In some possible implementations, the first characteristic information includes a first generation time of each piece of first data, and the second characteristic information includes a second generation time of each piece of second data. For the same batch, the first generation times in the first characteristic information and the second generation times in the second characteristic information fall within the same time interval.
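As an illustration of dividing batches by time interval, the following minimal Python sketch maps a generation time to the start of the batch interval that contains it. The function name batch_start and the choice to count intervals from midnight of the same day are assumptions made for this example and are not specified by the application; batches with differing interval lengths are not covered by the sketch.

```python
from datetime import datetime, timedelta


def batch_start(generation_time: datetime,
                interval: timedelta = timedelta(hours=1)) -> datetime:
    """Return the start of the batch interval that contains generation_time.

    Assumption: intervals have a fixed length and are counted from midnight
    of the same day.
    """
    midnight = generation_time.replace(hour=0, minute=0, second=0, microsecond=0)
    # Number of whole intervals that have elapsed since midnight.
    index = (generation_time - midnight) // interval
    return midnight + index * interval
```

With the default one-hour interval, a generation time of 14:37:05 would be assigned to the batch starting at 14:00.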
As previously described, for any piece of data, the second generation time at the consumer side may be the first generation time of the data at the producer side. When the batches are divided by time intervals, the first generation time of the first feature information and the second generation time of the second feature information of the same batch are in the same time interval.
In this way, the verification end can simply and accurately determine the data range and the data consumption status of the same batch, which makes it convenient for the verification end to judge whether the condition for performing the batch-level data verification is met. For example, whether the condition for performing synchronization verification on the data synchronization of the current batch is satisfied may be determined based on the second generation time of the received second characteristic information. For example, whether there is a current batch without synchronization delay, and whether the first characteristic information and the second characteristic information of the current batch have been completely collected, may be determined based on the second generation time of the received second characteristic information, so as to carry out the synchronization verification.
In some possible implementations, the method further includes:
Under the condition that the second generation time of the received second characteristic information exceeds a time interval corresponding to the current batch, determining that the first characteristic information and the second characteristic information of the current batch are collected completely;
And determining the first characteristic information of the first data and the second characteristic information of the second data of the current batch according to the first generation time and the second generation time.
For example, in the data synchronization process, the verification end continuously receives the characteristic information sent by the production end and the consumption end, and may receive the characteristic information of multiple pieces of data at one time. Suppose the time interval of the current batch is 14:00 to 15:00, that is, the generation time of the data at the production end is between 14:00 and 15:00. If the second generation time of the second characteristic information received by the verification end exceeds 15:00, for example, the earliest generation time in the received second characteristic information is 15:01, it can be determined that no data of an earlier time point remains and that the first characteristic information and the second characteristic information of the current batch have been completely collected. Further, the first characteristic information and the second characteristic information of the current batch may be summarized based on the first generation time of each received piece of first characteristic information and the second generation time of each received piece of second characteristic information. For example, each piece of first characteristic information whose first generation time is between 14:00 and 15:00 may be determined as the first characteristic information of the current batch, and each piece of second characteristic information whose second generation time is between 14:00 and 15:00 may be determined as the second characteristic information of the current batch.
In some possible implementations, the data of the same batch may be summarized, updated, and stored in one record. Illustratively, the record may include a plurality of fields, for example: batch information; for the production end, the first data amount of the first data synchronized in the current batch, the synchronization type (including types such as data insertion, data update, and data deletion), the primary key information, the first generation time, and the like; and for the consumption end, the second data amount of the second data in the current batch, the synchronization type (including types such as data insertion, data update, and data deletion), the primary key information, the second generation time, and the like.
In this way, when the second generation time of the second characteristic information exceeds the time interval, the verification end can determine that the first data of the production end in the current batch has been synchronized and has been received and stored at the consumption end. It can therefore accurately judge that the first characteristic information and the second characteristic information of the current batch have been completely collected, and then determine the first characteristic information and the second characteristic information of the current batch for synchronization verification.
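The completeness judgment and the selection of the characteristic information of the current batch described above can be sketched as follows. This is a minimal illustration only: the Feature tuple, the function names, and the treatment of the batch as a half-open interval (start inclusive, end exclusive) are assumptions introduced for the example.

```python
from datetime import datetime
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical layout of one piece of characteristic information."""
    primary_key: str
    sync_type: str
    generation_time: datetime


def batch_is_complete(received_second: List[Feature], batch_end: datetime) -> bool:
    """True when the earliest generation time in the newly received
    consumer-side characteristic information is at or past the batch end."""
    if not received_second:
        return False
    return min(f.generation_time for f in received_second) >= batch_end


def select_batch(features: List[Feature],
                 batch_start: datetime, batch_end: datetime) -> List[Feature]:
    """Keep the characteristic information whose generation time lies in the batch."""
    return [f for f in features if batch_start <= f.generation_time < batch_end]
```

For a batch from 14:00 to 15:00, a newly received group whose earliest generation time is 15:01 would mark the batch as completely collected, and select_batch would then gather the entries generated between 14:00 and 15:00 from both ends.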
In some possible implementations, the second characteristic information of the second data is obtained by periodically or aperiodically summarizing the received second data by the consumer, where the second generation time for determining whether the first characteristic information and the second characteristic information of the current lot have been collected is earlier than other second generation times included in the second characteristic information. For example, the second characteristic information may include a plurality of second generation times, and in a case where an earliest second generation time of the plurality of second generation times has exceeded a time interval, it may be directly determined that the first characteristic information and the second characteristic information of the current lot have been collected completely.
In this way, when the earliest of the second generation times exceeds the time interval, it can be guaranteed that this second characteristic information and any subsequently received second characteristic information no longer contain data of the current batch. It can therefore be quickly and accurately determined that the first data of the production end in the current batch has been synchronized and has been received and stored at the consumption end, which reduces the probability of data omission and improves the accuracy of data verification.
In the embodiment of the present application, in step S12, whether the first data of the current batch is consistent with the second data is determined according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch; the specific manner of judging the consistency of the first data and the second data may be determined based on the content of the first characteristic information and the second characteristic information.
In the case where there are different sync types for data in the data synchronization process, the data consistency judgment may include a consistency judgment of the data amount and a consistency judgment of the sync type.
In some possible implementations, the first characteristic information includes a first synchronization type and a first data amount of each piece of first data, the second characteristic information includes a second synchronization type and a second data amount of each piece of second data, and the step S12 may include:
Determining whether a first data amount of first data of the current batch is consistent with a second data amount of second data;
Determining whether the first synchronization type of each piece of first data is consistent with the second synchronization type of each piece of second data under the condition that the first data amount is consistent with the second data amount;
and under the condition that the first synchronous type of each piece of first data is consistent with the second synchronous type of each piece of second data, determining that the first data of the current batch is consistent with the second data.
That is, the first data amount of each first data and the second data amount of each second data may be compared to determine whether the first data amount of the first data and the second data amount of the second data of the current lot are consistent; if the data quantity is consistent, the comparison of the synchronous types can be carried out, and whether the first synchronous type of each piece of first data is consistent with the second synchronous type of each piece of second data is respectively compared; if the synchronization types are consistent, determining that the first data of the current batch is consistent with the second data; further, in step S13, it is determined that the data synchronization verification result of the current lot is verification passing.
The resources occupied by synchronous verification can be reduced and the accuracy of the verification can be improved by the verification mode of judging the consistency of the total data quantity and the synchronous type.
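The two-stage judgment above (total data amount first, synchronization types second) can be sketched as follows. The record layout, a pair of primary key and synchronization type, and the pairing of records by sorted order are assumptions made for illustration; the application does not specify how the pieces of first data and second data are paired.

```python
from typing import List, Tuple

# Hypothetical layout: (primary key, synchronization type).
Record = Tuple[str, str]


def consistent_total_then_type(first: List[Record], second: List[Record]) -> bool:
    """Compare total data amounts first, then per-record synchronization types."""
    # Stage 1: the total data amounts of the batch must be equal.
    if len(first) != len(second):
        return False
    # Stage 2: pair the records by sorting on (primary key, type) and
    # require every pair to carry the same synchronization type.
    return sorted(first) == sorted(second)
```

The length comparison is a cheap early exit, so the per-record comparison only runs for batches whose totals already agree.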
In some possible implementations, the first characteristic information includes a first synchronization type and a first amount of data for each piece of first data, and the second characteristic information includes a second synchronization type and a second amount of data for each piece of second data; the synchronization type of the data synchronization process includes data insertion, data update, and data deletion. Step S12 may include:
determining the first data quantity of data insertion, the first data quantity of data update and the first data quantity of data deletion according to the first synchronization type of the first data of the current batch;
Determining the second data quantity of data insertion, the second data quantity of data update and the second data quantity of data deletion according to the second synchronous type of the second data of the current batch;
And if the first data amount of data insertion is consistent with the second data amount of data insertion, the first data amount of data update is consistent with the second data amount of data update, and the first data amount of data deletion is consistent with the second data amount of data deletion, determining that the first data and the second data of the current batch are consistent.
That is, the first data amount of the data insertion type, the first data amount of the data update type, and the first data amount of the data deletion type may be determined according to the first synchronization type of each piece of first data in the first characteristic information; similarly, the second data amount of the data insertion type, the second data amount of the data update type, and the second data amount of the data deletion type may be determined according to the second synchronization type of each piece of second data in the second characteristic information. The data amounts of the data insertion type, of the data update type, and of the data deletion type are then compared respectively; if the data amounts of every synchronization type are consistent, it is determined that the first data of the current batch is consistent with the second data. Further, in step S13, it is determined that the data synchronization verification result of the current batch is verification passed.
The accuracy of synchronization verification can be improved by judging the data amounts of the plurality of synchronization types separately. For example, if the synchronization types are not distinguished and only a single comparison of the total data amounts is made, the totals may be identical even though the data amounts of some synchronization types are not. In that case, a single comparison of the total data amounts without distinguishing the synchronization types would cause a misjudgment, whereas the consistency judgment method provided by the embodiment of the application can correctly report that the data amounts are inconsistent, improving the accuracy of the data amount consistency judgment.
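The per-type comparison, and the misjudgment that a single comparison of totals would produce, can be illustrated with the following minimal sketch. The type labels "insert", "update", and "delete" and the function names are assumptions chosen for this example.

```python
from collections import Counter
from typing import Dict, Iterable

SYNC_TYPES = ("insert", "update", "delete")  # hypothetical labels


def amounts_by_type(sync_types: Iterable[str]) -> Dict[str, int]:
    """Count how many pieces of data fall under each synchronization type."""
    counts = Counter(sync_types)
    return {t: counts.get(t, 0) for t in SYNC_TYPES}


def consistent_by_type(first_types: Iterable[str],
                       second_types: Iterable[str]) -> bool:
    """Batch is consistent only if every per-type data amount matches."""
    return amounts_by_type(first_types) == amounts_by_type(second_types)


# Totals agree (3 and 3), but the per-type amounts do not.
producer = ["insert", "insert", "update"]
consumer = ["insert", "update", "update"]
totals_equal = len(producer) == len(consumer)
per_type_equal = consistent_by_type(producer, consumer)
```

In this example totals_equal is True while per_type_equal is False, which is the situation in which comparing only the totals would wrongly pass the batch.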
In the embodiment of the present application, in the case where the content of the feature information includes the primary key information, the consistency judgment of the first data and the second data may further include judgment of the primary key information. The primary key information is a primary key (PRIMARY KEY) of the data, is one or more fields in the data, and is used for uniquely identifying a piece of data.
In some possible implementations, the first characteristic information further includes first primary key information of each piece of first data, and the second characteristic information further includes second primary key information of each piece of second data. Step S12 further includes:
Matching the first main key information of each piece of first data with the second main key information of each piece of second data;
If the matching is successful, the first data of the current batch is determined to be consistent with the second data.
That is, the first primary key information of each piece of first data and the second primary key information of each piece of second data may be matched; if the first primary key information of each piece of first data can be matched with the same second primary key information, the matching is determined to be successful, and the first data of the current batch can be determined to be consistent with the second data.
The accuracy of synchronous verification can be further improved through a verification mode of primary key matching.
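A minimal sketch of the primary key matching is given below. It treats the keys as multisets so that several operations on the same key within one batch are counted separately; this treatment, and the function name, are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable


def primary_keys_match(first_keys: Iterable[str],
                       second_keys: Iterable[str]) -> bool:
    """True when every first primary key finds an identical second primary key."""
    # Counter subtraction keeps only keys with a positive remaining count,
    # i.e. first-side keys that have no counterpart on the second side.
    unmatched = Counter(first_keys) - Counter(second_keys)
    return not unmatched
```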
In the embodiment of the present application, in the case where the content of the feature information includes the generation time of the data, the consistency judgment of the first data and the second data may further include judgment of the generation time.
In some possible implementations, the first characteristic information further includes a first generation time of each piece of first data, and the second characteristic information further includes a second generation time of each piece of second data. Step S12 further includes:
matching the first generation time of each piece of first data with the second generation time of each piece of second data;
If the matching is successful, the first data of the current batch is determined to be consistent with the second data.
That is, the first generation time of each piece of first data and the second generation time of each piece of second data may be matched; if the first generation time of each piece of first data can be matched with the same second generation time, the matching is determined to be successful, and the first data and the second data of the current batch can be determined to be consistent. Further, in step S13, it is determined that the data synchronization verification result of the current lot is verification passing.
The accuracy of synchronization verification can be further improved through the verification mode of generation time matching.
The synchronization verification modes of judging the consistency of the total data amount and the synchronization type, judging the data amounts of the plurality of synchronization types separately, and matching the generation times have been described above respectively. It should be understood, however, that a person skilled in the art may use any of the above synchronization verification modes alone or in combination according to the actual situation, which is not limited in this application.
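The generation time matching can be sketched as below, assuming for illustration that each side's characteristic information can be indexed by primary key; the application itself does not state how the pieces of data are paired for this comparison, and the function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def generation_time_mismatches(first: Dict[str, datetime],
                               second: Dict[str, datetime]) -> List[str]:
    """Return the primary keys whose first generation time has no identical
    second generation time (missing on the second side counts as a mismatch)."""
    return sorted(key for key, t in first.items() if second.get(key) != t)
```

An empty result corresponds to a successful match, in which case the first data and the second data of the batch may be determined to be consistent.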
In some possible implementations, the method further includes:
Under the condition that the first data and the second data of the current batch are inconsistent, determining that the data synchronous verification result of the current batch is that verification is not passed;
Determining third primary key information of missing data of the current batch according to the first primary key information of the first data of the current batch and the second primary key information of the second data of the current batch;
According to third primary key information of missing data of the current batch, acquiring log information of the missing data, wherein the log information is used for carrying out exception analysis on the missing data.
For example, if the first data and the second data of the current batch are inconsistent, the verification end may determine that the data synchronization verification result of the current batch is verification failed. When the verification fails, the verification end may raise an alarm and further analyze the failed batch by a full comparison across the data sources. For example, the first primary key information of each piece of first data of the current batch is matched against the second primary key information of each piece of second data, and the third primary key information of the data missing from the current batch is determined. For example, a synchronization verification task may be started automatically, which reads the MySQL and Hive tables for the batch that failed verification and determines the third primary key information of the missing data within the batch.
In this way, a comparison at the level of single pieces of data is not required for all synchronized data; it is performed only for a batch that failed verification. This effectively reduces the amount of data to be compared, ensures the accuracy of synchronization verification, improves the comparison efficiency, and allows the primary key information of the missing data in the batch to be determined in time.
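The idea of running the record-level comparison only for batches that failed the lightweight verification can be sketched as follows. The in-memory dictionaries stand in for the production-side and consumption-side tables, and all names are hypothetical; reading the actual MySQL and Hive tables is outside the scope of this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def missing_primary_keys(first_keys: Iterable[str],
                         second_keys: Iterable[str]) -> List[str]:
    """Primary keys sent by the production end but absent at the consumption end."""
    received = set(second_keys)
    return [key for key in first_keys if key not in received]


def diagnose_failed_batches(
    batches: Dict[str, Tuple[List[str], List[str], bool]]
) -> Dict[str, List[str]]:
    """Run the record-level comparison only for batches that failed verification.

    batches maps a batch id to (first keys, second keys, passed flag).
    """
    return {
        batch_id: missing_primary_keys(first, second)
        for batch_id, (first, second, passed) in batches.items()
        if not passed
    }
```

Batches that passed the lightweight verification are skipped entirely, so the heavier comparison touches only the data of the failed batches.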
According to the third primary key information of the missing data of the current batch, the second data missing at the consumer end can be determined, and specifically, the first data corresponding to the second data at the production end is determined, so that the reason for occurrence of synchronization abnormality of the missing data is determined by acquiring log information of the missing data.
In some possible implementations, in a scenario where the first data of the production end is synchronized to the consumption end through an intermediate end, the log information of the missing data at the production end, the consumption end, and the intermediate end respectively can be obtained based on the third primary key information of the missing data in the current batch, so as to help locate the cause of the data loss in the data synchronization process. The intermediate end is used for synchronizing the first data of the production end to the consumption end, and the log information is used for performing exception analysis on the missing data.
A corresponding anomaly analysis algorithm may be designed to determine, according to the log information, the stage at which the error leading to the missing data occurred; alternatively, the log information may be sent to the relevant personnel responsible for exception analysis (e.g., operation and maintenance personnel), who carry out the exception analysis of the missing data. The present application is not limited in this regard.
For example, in the scenario where the first data of the production end is synchronized to the consumption end through the intermediate end, the error stage that causes missing data may be any stage from the production end reading the data to the consumption end storing the data, for example, the stage in which the production end reads the data, the stage in which the production end sends the first data to the intermediate end, the stage in which the intermediate end processes the data, the stage in which the consumption end obtains the data from the intermediate end, the stage in which the consumption end stores the data, and so on. The log information may record any data processing performed at each end, e.g., the various operations on the data. When data is missing, an anomaly analysis may be performed based on the log information within the batch to determine the particular error stage that caused the data to go missing. For example, based on the log information of the missing data, the stage at which the abnormality occurred is determined, and the operations related to the missing data in that error stage are identified to locate the cause of the error. The application is not limited to a specific way of locating the cause of the error based on the log information.
According to the embodiment of the application, the collection of the information used for lightweight synchronization verification is instrumented into the data synchronization process, and the synchronized data can be verified timely and efficiently through collection, statistics, and alarming at the verification end. For a batch whose data amounts are inconsistent and which is therefore determined to have failed synchronization verification, the embodiment of the application supports a heavyweight comparison at the level of single pieces of data, and can help locate the root cause of the data loss in the data synchronization process according to the primary key information of the missing data and the log information of the batch. The stored data can also be used as a report data source for producing a synchronization task execution report that is sent to the users of the synchronization task.
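One possible form of such an analysis, shown purely as an illustrative sketch, is to walk the stages in order and report the first stage for which the logs hold no record of the missing primary key. The stage names, the log entry layout (primary key, stage), and the assumption that every stage logs each piece of data it handles are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical stage names, in the order the data passes through them.
STAGES = (
    "producer_read",
    "producer_send",
    "intermediate_process",
    "consumer_fetch",
    "consumer_store",
)


def first_unlogged_stage(log_entries: Iterable[Tuple[str, str]],
                         primary_key: str) -> Optional[str]:
    """Return the earliest stage with no log entry for primary_key,
    or None if every stage logged it."""
    seen = {stage for key, stage in log_entries if key == primary_key}
    for stage in STAGES:
        if stage not in seen:
            return stage
    return None
```

Under these assumptions, a missing key that was logged by the first two stages only would point to the intermediate end as the place to start investigating.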
Fig. 2 is a schematic diagram of an application scenario of a data synchronization verification method according to an embodiment of the present application. Fig. 3 is a schematic diagram of an application scenario of a data synchronization verification method according to an embodiment of the present application. For ease of understanding, the data synchronization verification process of the embodiment of the present application will be described in the following with reference to fig. 2 and 3.
As shown in fig. 2, the production end reads the first data (i.e., MySQL data), appends Hudi-related parameters, and synchronizes the read first data to the consumption end through the intermediate end, where the intermediate end Kafka creates a data information queue topic for synchronizing the data to the consumption end. An instrumentation point (tracking point) at the production end collects the first characteristic information used for synchronization verification, for example, the synchronization type, the first generation time of the first data, and the like. The consumption end parses the MySQL-related parameters and stores the data to Hudi, thereby consuming the received second data; the consumption end also counts and summarizes the synchronization type of the second data, the second generation time of the second data, point location information, and the like. The intermediate end Kafka further creates a characteristic information queue topic for synchronization verification, which is used for sending the first characteristic information of the first data of the production end and the second characteristic information of the second data of the consumption end to the verification end. The verification end parses the received information, namely the first characteristic information of the first data and the second characteristic information of the second data. When the lightweight synchronization verification condition of a batch (complete data collection) is met, it analyzes the related data, summarizes the first data of the production end and the second data of the consumption end of the batch into one record, and performs synchronization verification, namely judging whether the first data and the second data of the batch are consistent.
Under the condition that the first data and the second data are consistent, determining that the data synchronization verification result of the batch is verification passing; and under the condition that the first data and the second data are inconsistent, determining that the data synchronization verification result of the batch is that the verification is not passed and alarming.
As shown in fig. 3, take a time interval of one hour per batch as an example: the interval from 13:00 to 14:00 is one batch, and the interval from 14:00 to 15:00 is another batch. The verification end can determine the completion time of the current task synchronization queue and round it down to the whole hour, that is, the rounded time is the latest whole hour that is not later than the completion time. For example, a completion time of 15:03 is rounded down to 15:00, and a completion time of 14:59 is rounded down to 14:00. Based on the whole hour obtained by rounding, the verification end can judge whether a batch without synchronization delay exists, so as to carry out the detection for one batch. In this application scenario, the judgment is whether an undelayed hourly interval exists. For example, if rounding yields 14:00, it can be determined that some data in the batch from 14:00 to 15:00 has not yet been synchronized; no batch without synchronization delay currently exists, i.e., there is no undelayed hourly interval, so synchronization verification is not performed and this detection is skipped. If rounding yields 15:00, it can be determined that all data in the time interval from 14:00 to 15:00 has been synchronized, and a batch without synchronization delay, i.e., an undelayed time interval, currently exists. The verification end performs the synchronization verification on that undelayed time interval; when the data are consistent, it determines that the data synchronization verification result of the batch is verification passed, raises no alarm, and ends the detection.
Under the condition of inconsistent data, determining that the data synchronous verification result of the batch is that verification fails, giving an alarm, further starting a heavyweight synchronous verification task to determine the primary key information of the missing data in the batch, and storing corresponding log information to further check the reason of the data synchronous error.
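The delay judgment in the hourly scenario of fig. 3 can be sketched as follows: the completion time of the synchronization queue is truncated to the whole hour, and the batch is treated as free of synchronization delay once that hour reaches the end of the batch interval. The function name is an assumption for this example.

```python
from datetime import datetime


def batch_has_no_delay(completion_time: datetime, batch_end: datetime) -> bool:
    """True when the whole hour at or before completion_time has reached batch_end."""
    # Truncate to the hour, e.g. 15:03 becomes 15:00 and 14:59 becomes 14:00.
    whole_hour = completion_time.replace(minute=0, second=0, microsecond=0)
    return whole_hour >= batch_end
```

For the batch from 14:00 to 15:00, a completion time of 15:03 would allow the verification to proceed, while a completion time of 14:59 would cause the detection to be skipped.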
It will be appreciated that the above-mentioned method embodiments of the present application can be combined with each other to form combined embodiments without departing from their principles and logic; for brevity, the combinations are not described again herein. It will be appreciated by those skilled in the art that, in the above-described methods of the embodiments, the particular order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the application also provides a data synchronization verification device, a data synchronization system, an electronic device, and a computer readable storage medium, each of which can be used for implementing any one of the data synchronization verification methods provided by the application. For the corresponding technical solutions and descriptions, refer to the corresponding description of the method part, which is not repeated here.
Fig. 4 is a block diagram of a data synchronization verification device according to an embodiment of the present application.
Referring to fig. 4, an embodiment of the present application provides a data synchronization verification apparatus, including:
a receiving module 41, configured to receive first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end, where the first data is the data sent by the production end to the consumption end during a data synchronization process, the second data is the data received by the consumption end during the data synchronization process, and the data synchronization process synchronizes data from the production end to the consumption end;
a consistency determining module 42, configured to determine, according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch, whether the first data and the second data of the current batch are consistent;
a verification result determining module 43, configured to determine that the data synchronization verification result of the current batch is a pass when the first data and the second data of the current batch are consistent.
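The three modules above can be sketched as a small class. This is a minimal illustration and not the applicant's implementation: the names `Feature`, `SyncVerifier`, `receive`, `verify_current_batch` and `same_count`, the field layout, and the `"producer"`/`"consumer"` labels are all hypothetical, and the consistency rule is passed in as a function because the text describes several alternatives.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Feature:
    """Characteristic information for one piece of data (hypothetical layout)."""
    primary_key: str
    sync_type: str       # e.g. "insert", "update", "delete"
    generated_at: float  # generation time, seconds since an arbitrary epoch


class SyncVerifier:
    """Hypothetical stand-in for modules 41, 42 and 43."""

    def __init__(self, is_consistent: Callable[[List[Feature], List[Feature]], bool]):
        self._is_consistent = is_consistent
        self._first: List[Feature] = []   # reported by the production end
        self._second: List[Feature] = []  # reported by the consumption end

    def receive(self, source: str, feature: Feature) -> None:
        """Receiving module: store a feature from 'producer' or 'consumer'."""
        if source == "producer":
            self._first.append(feature)
        elif source == "consumer":
            self._second.append(feature)
        else:
            raise ValueError(f"unknown source: {source}")

    def verify_current_batch(self) -> str:
        """Consistency and result modules: return 'pass' or 'fail'."""
        return "pass" if self._is_consistent(self._first, self._second) else "fail"


def same_count(first: List[Feature], second: List[Feature]) -> bool:
    """Simplest possible rule, used only for illustration: equal data amounts."""
    return len(first) == len(second)
```

Passing the rule as a parameter keeps the receiving and result logic unchanged when a stricter consistency check is substituted.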
In some possible implementations, the first characteristic information includes a first synchronization type and a first amount of data for each piece of first data, and the second characteristic information includes a second synchronization type and a second amount of data for each piece of second data;
The consistency determining module 42 is configured to: determine whether the first data amount of the first data of the current batch is consistent with the second data amount of the second data; when the first data amount is consistent with the second data amount, determine whether the first synchronization type of each piece of first data is consistent with the second synchronization type of each piece of second data; and when the first synchronization type of each piece of first data is consistent with the second synchronization type of each piece of second data, determine that the first data and the second data of the current batch are consistent.
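The two-stage check described above (data amount first, then synchronization type) might look like the following sketch. The text does not say how a piece of first data is paired with a piece of second data; pairing by position, which assumes both ends report their pieces in the same order, is an assumption made here for illustration, as is the function name `batch_consistent`.

```python
from typing import Sequence


def batch_consistent(first_types: Sequence[str], second_types: Sequence[str]) -> bool:
    """Return True when the amounts match and every paired sync type matches."""
    # Stage 1: compare the data amounts of the two sides.
    if len(first_types) != len(second_types):
        return False
    # Stage 2: compare synchronization types piece by piece.
    # Assumption: both sequences are in the same order, so position i pairs up.
    return all(a == b for a, b in zip(first_types, second_types))
```

Checking the amount first lets the cheaper comparison reject a batch before any per-piece work is done.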
In some possible implementations, the first characteristic information includes a first synchronization type and a first data amount of each piece of first data, the second characteristic information includes a second synchronization type and a second data amount of each piece of second data, and the synchronization type of the data synchronization process includes data insertion, data update, and data deletion;
The consistency determining module 42 is configured to: determine, according to the first synchronization type of the first data of the current batch, the first data amount for data insertion, the first data amount for data update, and the first data amount for data deletion; determine, according to the second synchronization type of the second data of the current batch, the second data amount for data insertion, the second data amount for data update, and the second data amount for data deletion; and when the first data amount for data insertion is consistent with the second data amount for data insertion, the first data amount for data update is consistent with the second data amount for data update, and the first data amount for data deletion is consistent with the second data amount for data deletion, determine that the first data and the second data of the current batch are consistent.
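The per-type counting described above can be sketched with `collections.Counter`. The type labels `"insert"`, `"update"` and `"delete"` and the function names are hypothetical; the text only names the three synchronization types. Labels outside these three are ignored in this sketch.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical labels for the three synchronization types named in the text.
SYNC_TYPES = ("insert", "update", "delete")


def count_by_type(sync_types: Iterable[str]) -> Dict[str, int]:
    """Count the pieces of data for each synchronization type."""
    counts = Counter(sync_types)
    # Counter returns 0 for absent keys, so every type is always present.
    return {t: counts[t] for t in SYNC_TYPES}


def per_type_consistent(first_types: Iterable[str], second_types: Iterable[str]) -> bool:
    """True when insert, update and delete counts all agree between the two ends."""
    return count_by_type(first_types) == count_by_type(second_types)
```

Because only counts are compared, the result does not depend on the order in which the two ends report their data.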
In some possible implementations, the first characteristic information further includes first primary key information of each piece of first data, and the second characteristic information further includes second primary key information of each piece of second data, the primary key information being used to uniquely identify one piece of data;
The consistency determining module 42 is configured to: match the first primary key information of each piece of first data against the second primary key information of each piece of second data; and if the matching succeeds, determine that the first data and the second data of the current batch are consistent.
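One way to read the primary key matching above is as a multiset comparison, sketched below. Treating the keys as a multiset (so that the same key appearing twice in a batch, for example an insert followed by an update, must appear twice on both sides) is an assumption; the text only says the keys are matched. The name `primary_keys_match` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def primary_keys_match(first_keys: Iterable[Hashable], second_keys: Iterable[Hashable]) -> bool:
    """True when both ends report exactly the same primary keys, with multiplicity."""
    # Counter equality ignores order but keeps track of repeated keys.
    return Counter(first_keys) == Counter(second_keys)
```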
In some possible implementations, the first characteristic information further includes a first generation time of each piece of first data, and the second characteristic information further includes a second generation time of each piece of second data;
The consistency determining module 42 is configured to: match the first generation time of each piece of first data against the second generation time of each piece of second data; and if the matching succeeds, determine that the first data and the second data of the current batch are consistent.
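The generation time matching above could be sketched as follows. The text does not define what counts as a successful match; this sketch assumes the generation time travels with each piece of data, sorts both sides, and compares them pairwise within an optional tolerance. The function name and the `tolerance` parameter are hypothetical.

```python
from typing import Sequence


def generation_times_match(
    first_times: Sequence[float],
    second_times: Sequence[float],
    tolerance: float = 0.0,
) -> bool:
    """True when the two sides report the same generation times, within tolerance."""
    if len(first_times) != len(second_times):
        return False
    # Assumption: sorting aligns corresponding pieces of data on both sides.
    pairs = zip(sorted(first_times), sorted(second_times))
    return all(abs(a - b) <= tolerance for a, b in pairs)
```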
In some possible implementations, the first characteristic information includes a first generation time of each piece of first data, and the second characteristic information further includes a second generation time of each piece of second data; the apparatus further comprises:
a collection determining module, configured to determine that the first characteristic information and the second characteristic information of the current batch have been fully collected when the second generation time of received second characteristic information falls beyond the time interval corresponding to the current batch;
a characteristic information determining module, configured to determine, once the first characteristic information and the second characteristic information of the current batch have been fully collected, the first characteristic information of the first data and the second characteristic information of the second data of the current batch according to the first generation time and the second generation time.
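The two modules above can be sketched as a collector that closes a batch once the consumption end reports a generation time past the batch's time interval, and then selects the features whose generation time falls inside the interval. The class name `BatchCollector`, the half-open interval `[start, end)`, and the `(primary key, generation time)` tuple layout are assumptions for illustration; the sketch also assumes the consumption end reports in roughly increasing time order, so that a later timestamp signals that the batch is complete.

```python
from typing import List, Optional, Tuple

Feature = Tuple[str, float]  # (primary key, generation time); hypothetical layout


class BatchCollector:
    """Collects characteristic information for one batch time interval."""

    def __init__(self, start: float, length: float) -> None:
        self.start = start
        self.end = start + length  # the batch interval is [start, end)
        self.first: List[Feature] = []
        self.second: List[Feature] = []
        self.complete = False

    def add_first(self, feature: Feature) -> None:
        self.first.append(feature)

    def add_second(self, feature: Feature) -> None:
        self.second.append(feature)
        # A second generation time beyond the interval closes the batch.
        if feature[1] >= self.end:
            self.complete = True

    def _in_interval(self, feature: Feature) -> bool:
        return self.start <= feature[1] < self.end

    def current_batch(self) -> Optional[Tuple[List[Feature], List[Feature]]]:
        """Return (first, second) features of the batch, or None if still collecting."""
        if not self.complete:
            return None
        first = [f for f in self.first if self._in_interval(f)]
        second = [f for f in self.second if self._in_interval(f)]
        return first, second
```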
In some possible implementations, the apparatus further includes:
the verification result determining module, further configured to determine that the data synchronization verification result of the current batch is a failure when the first data and the second data of the current batch are inconsistent;
a primary key determining module, configured to determine third primary key information of the missing data of the current batch according to the first primary key information of the first data of the current batch and the second primary key information of the second data of the current batch;
a log acquisition module, configured to acquire log information of the missing data according to the third primary key information of the missing data of the current batch, where the log information is used for exception analysis of the missing data.
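The failure path above can be sketched as a set difference followed by a log lookup. Interpreting "missing data" as primary keys sent by the production end but not reported by the consumption end is an assumption, as are the function names and the in-memory `log_index` dictionary, which stands in for whatever log store a real deployment would query.

```python
from typing import Dict, Iterable, List


def missing_primary_keys(first_keys: Iterable[str], second_keys: Iterable[str]) -> List[str]:
    """Primary keys sent by the production end but absent at the consumption end."""
    received = set(second_keys)
    # Sorted only to make the result deterministic.
    return sorted({k for k in first_keys if k not in received})


def logs_for_missing(missing: Iterable[str], log_index: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up stored log lines for each missing key; unknown keys map to []."""
    return {k: log_index.get(k, []) for k in missing}
```

For example, with first keys `["a", "b", "c"]` and second keys `["a", "c"]`, the missing key is `"b"`, and its log lines can then be inspected to find where the synchronization broke.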
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Referring to fig. 5, an embodiment of the present application provides an electronic device including: at least one processor 701; at least one memory 702, and one or more I/O interfaces 703 connected between the processor 701 and the memory 702; the memory 702 stores one or more computer programs executable by the at least one processor 701, and the one or more computer programs are executed by the at least one processor 701 to enable the at least one processor 701 to perform the data synchronization verification method described above.
An embodiment of the application also provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the data synchronization verification method described above. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present application also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, performs the above-described data synchronization verification method.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions to implement aspects of the present application.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will therefore be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the present application as set forth in the following claims.

Claims (10)

1. A method for data synchronization verification, the method comprising:
receiving first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end, wherein the first data is data sent by the production end to the consumption end in a data synchronization process, the second data is data received by the consumption end in the data synchronization process, and the data synchronization process is used for synchronizing data in the production end to the consumption end;
determining, according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch, whether the first data and the second data of the current batch are consistent; and
when the first data and the second data of the current batch are consistent, determining that the data synchronization verification result of the current batch is a pass.
2. The method of claim 1, wherein the first characteristic information comprises a first synchronization type and a first amount of data for each piece of first data, and the second characteristic information comprises a second synchronization type and a second amount of data for each piece of second data; the determining whether the first data and the second data of the current batch are consistent according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch comprises the following steps:
determining whether the first data amount of the first data of the current batch is consistent with the second data amount of the second data;
when the first data amount is consistent with the second data amount, determining whether the first synchronization type of each piece of first data is consistent with the second synchronization type of each piece of second data; and
when the first synchronization type of each piece of first data is consistent with the second synchronization type of each piece of second data, determining that the first data and the second data of the current batch are consistent.
3. The method of claim 1, wherein the first characteristic information includes a first synchronization type and a first data amount of each piece of first data, the second characteristic information includes a second synchronization type and a second data amount of each piece of second data, and the synchronization type of the data synchronization process includes data insertion, data update, and data deletion;
the determining whether the first data and the second data of the current batch are consistent according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch comprises:
determining, according to the first synchronization type of the first data of the current batch, the first data amount for data insertion, the first data amount for data update, and the first data amount for data deletion;
determining, according to the second synchronization type of the second data of the current batch, the second data amount for data insertion, the second data amount for data update, and the second data amount for data deletion; and
when the first data amount for data insertion is consistent with the second data amount for data insertion, the first data amount for data update is consistent with the second data amount for data update, and the first data amount for data deletion is consistent with the second data amount for data deletion, determining that the first data and the second data of the current batch are consistent.
4. A method according to claim 2 or 3, wherein the first characteristic information further comprises first primary key information of each piece of first data, the second characteristic information further comprises second primary key information of each piece of second data, the primary key information being for uniquely identifying one piece of data; the determining that the first data and the second data of the current batch are consistent includes:
matching the first primary key information of each piece of first data against the second primary key information of each piece of second data; and
if the matching succeeds, determining that the first data and the second data of the current batch are consistent.
5. A method according to claim 2 or 3, wherein the first characteristic information further comprises a first generation time of each piece of first data, and the second characteristic information further comprises a second generation time of each piece of second data; the determining that the first data and the second data of the current batch are consistent includes:
matching the first generation time of each piece of first data against the second generation time of each piece of second data; and
if the matching succeeds, determining that the first data and the second data of the current batch are consistent.
6. The method of claim 1, wherein the first characteristic information includes a first generation time of each piece of first data, and the second characteristic information further includes a second generation time of each piece of second data; the method further comprises the steps of:
when the second generation time of received second characteristic information falls beyond the time interval corresponding to the current batch, determining that the first characteristic information and the second characteristic information of the current batch have been fully collected; and
determining first data whose first generation time is within the time interval as the first data of the current batch, and determining second data whose second generation time is within the time interval as the second data of the current batch.
7. The method according to claim 1, wherein the method further comprises:
when the first data and the second data of the current batch are inconsistent, determining that the data synchronization verification result of the current batch is a failure;
determining third primary key information of the missing data of the current batch according to the first primary key information of the first data of the current batch and the second primary key information of the second data of the current batch; and
acquiring log information of the missing data according to the third primary key information of the missing data of the current batch, wherein the log information is used for exception analysis of the missing data.
8. A data synchronization verification apparatus, the apparatus comprising:
a receiving module, configured to receive first characteristic information of first data sent by a production end and second characteristic information of second data sent by a consumption end, wherein the first data is data sent by the production end to the consumption end in a data synchronization process, the second data is data received by the consumption end in the data synchronization process, and the data synchronization process is used for synchronizing data in the production end to the consumption end;
a consistency determining module, configured to determine, according to the first characteristic information of the first data of the current batch and the second characteristic information of the second data of the current batch, whether the first data and the second data of the current batch are consistent; and
a verification result determining module, configured to determine that the data synchronization verification result of the current batch is a pass when the first data and the second data of the current batch are consistent.
9. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the data synchronization verification method of any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data synchronization verification method according to any one of claims 1 to 7.
CN202311733894.0A 2023-12-15 2023-12-15 Data synchronization verification method and device, electronic equipment and storage medium Pending CN117951144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311733894.0A CN117951144A (en) 2023-12-15 2023-12-15 Data synchronization verification method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117951144A true CN117951144A (en) 2024-04-30

Family

ID=90802272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311733894.0A Pending CN117951144A (en) 2023-12-15 2023-12-15 Data synchronization verification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117951144A (en)

Similar Documents

Publication Publication Date Title
CN110321387B (en) Data synchronization method, equipment and terminal equipment
CN107832196B (en) Monitoring device and monitoring method for abnormal content of real-time log
CN111008246B (en) Database log synchronization method, device, computer equipment and readable storage medium
CN105446706B (en) Method and device for evaluating form page use effect and providing original data
CN111563041B (en) Test case on-demand accurate execution method
CN112559475B (en) Data real-time capturing and transmitting method and system
CN110677211B (en) Time synchronization method and device and power acquisition terminal
CN109284331B (en) Certificate making information acquisition method based on service data resources, terminal equipment and medium
US20220129483A1 (en) Data processing method and device, computing device and medium
CN111782546A (en) Automatic interface testing method and device based on machine learning
US20200201650A1 (en) Automatic anomaly detection in computer processing pipelines
CN109582504A (en) A kind of data reconstruction method and device for apple equipment
WO2017007981A1 (en) Action correlation framework
CN116737482A (en) Method and device for collecting chip test data in real time and electronic equipment
CN109861843B (en) Method, device and equipment for completely collecting and confirming log files
CN104794013A (en) Method and device for positioning system operation state and method and device for building system operation state model
CN113468196A (en) Method, apparatus, system, server and medium for processing data
CN111124650A (en) Streaming data processing method and device
CN117951144A (en) Data synchronization verification method and device, electronic equipment and storage medium
CN115344633A (en) Data processing method, device, equipment and storage medium
CN112860527A (en) Fault monitoring method and device of application server
CN111737242A (en) Method for monitoring mass data processing process
CN116431366B (en) Behavior path analysis method, system, storage terminal, server terminal and client terminal
CN116384956B (en) Message batch sending method, device, equipment and storage medium
CN114201541A (en) Data extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination