CN114461724A - Data synchronization comparison method, device and system based on random sampling - Google Patents


Info

Publication number
CN114461724A
Authority
CN
China
Prior art keywords
sampling
comparison
data
configuration
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111587837.7A
Other languages
Chinese (zh)
Inventor
杨连群
张研
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Aisino Corp
Original Assignee
Anhui Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-23
Filing date
2021-12-23
Publication date
2022-05-10
Application filed by Anhui Aisino Corp
Priority to CN202111587837.7A
Publication of CN114461724A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity

Abstract

The invention provides a data synchronization comparison method, device and system based on random sampling, and relates to the technical field of data inspection. The data synchronization comparison method based on random sampling comprises the following steps: generating a sampling comparison configuration table according to a random sampling rule configuration; generating a sampling data table according to the sampling comparison configuration table; and generating a comparison verification result table according to the sampling data table and the target end data table, and judging whether the comparison verification passes according to the comparison verification result table. In this technical scheme, the data consistency of the source end and the target end of data synchronization is checked based on the random sampling rule configuration, and the sampling data table is directly matched and compared with the target end data table through the primary key. This realizes comparison verification of the synchronized data records and ensures, accurately and efficiently, that the data records of the source end and the target end are consistent.

Description

Data synchronization comparison method, device and system based on random sampling
Technical Field
The invention relates to the technical field of data inspection, in particular to a data synchronization comparison method, device and system based on random sampling.
Background
With the continuous development of big data, various data sources are increasing, and in order to enhance the integration and utilization of data, data synchronization has been generally applied to each big data platform, but data errors may occur in the data synchronization process, so it is important to ensure the accuracy and consistency of data synchronization.
At present, accuracy checks for data synchronization commonly verify only whether the total amount of data is the same before and after synchronization, while checks of data consistency are often neglected. Existing data consistency check methods are also cumbersome and poorly suited to checking the synchronization of a large number of data tables.
Disclosure of Invention
The invention solves the problem of how to realize the accurate and efficient consistency check of data.
In order to solve the above problems, the present invention provides a data synchronization comparison method based on random sampling, which comprises: generating a sampling comparison configuration table according to random sampling rule configuration; generating a sampling data table according to the sampling comparison configuration table; and generating a comparison verification result table according to the sampling data table and the target end data table, and judging whether comparison verification passes according to the comparison verification result table.
The data synchronization comparison method based on random sampling checks the data consistency of the source end and the target end of data synchronization based on the random sampling rule configuration, and directly matches and compares the sampling data table with the target end data table through the primary key. This realizes comparison verification of the synchronized data records and ensures, accurately and efficiently, that the data records of the source end and the target end are consistent.
Optionally, the generating a sampling comparison configuration table according to a random sampling rule configuration includes: determining the number of data records to sample, configuring the primary key of the data table and configuring the data type fields according to a random sampling rule configuration item, and generating the sampling comparison configuration table according to the number of data records to sample, the primary key of the data table and the data type fields.
The data synchronization comparison method based on random sampling determines the number of data records to sample, configures the primary key of the data table and configures the data type fields according to the random sampling rule configuration item in order to generate the sampling comparison configuration table. This realizes comparison verification of the synchronized data records and ensures, accurately and efficiently, that the data records of the source end and the target end are consistent.
Optionally, the data type field includes a string field, a numeric type field, and a time type field, and the configuration data type field includes: and selecting one representative field from the character string field, the numerical type field and the time type field as a sampling comparison field.
According to the data synchronization comparison method based on random sampling, the difference possibly generated by different data types is covered by respectively selecting one representative field from the character string field, the numerical value type field and the time type field as the sampling comparison field, the coverage rate of the sampling field type is increased, and the accuracy of comparison verification is improved.
Optionally, the generating a sampling comparison configuration table according to a random sampling rule configuration further includes: and when a plurality of data tables needing to be compared exist, writing the configuration of the plurality of data tables needing to be compared into the sampling comparison configuration table.
According to the data synchronous comparison method based on random sampling, when a plurality of data tables needing to be compared exist, the configuration of the data tables needing to be compared is written into the sampling comparison configuration table, and batch comparison verification of the data tables can be achieved.
Optionally, the generating a comparison verification result table according to the sampling data table and the target end data table includes: matching and comparing the sampling data table with the target end data table, and generating the comparison verification result table according to the field type and value differences determined by the matching and comparing.
The data synchronization comparison method based on random sampling matches and compares the sampling data table with the target end data table and generates the comparison verification result table according to the field type and value differences determined by the matching and comparing, so that whether the comparison verification passes can be judged from the comparison verification result table.
Optionally, the determining whether the comparison verification passes according to the comparison verification result table includes: if the comparison verification result table does not have a difference set result, the comparison verification is passed, and if the comparison verification result table has the difference set result, an abnormal result alarm is given.
The data synchronization comparison method based on random sampling judges whether the comparison verification is passed or abnormal result warning is carried out according to the difference set result of the comparison verification result table, and the consistency of the data records of the data source end and the target end is accurately and efficiently ensured.
Optionally, the difference set result includes a primary key association mismatch, a numeric field check not passed, a time field check not passed, and a character field check not passed.
The data synchronization comparison method based on random sampling realizes the high-efficiency check of the data records of the data source end and the target end by setting the data sampling consistency check content.
The invention also provides a data synchronization comparison device based on random sampling, which comprises: the configuration module is used for generating a sampling comparison configuration table according to random sampling rule configuration; the data table module is used for generating a sampling data table according to the sampling comparison configuration table; and the comparison module is used for generating a comparison verification result table according to the sampling data table and the target end data table and judging whether the comparison verification passes according to the comparison verification result table. Compared with the prior art, the random sampling-based data synchronous comparison device and the random sampling-based data synchronous comparison method have the same advantages, and are not described herein again.
The invention also provides a data synchronization comparison system based on random sampling, which comprises a computer readable storage medium and a processor, wherein the computer readable storage medium is used for storing a computer program, and the computer program is read by the processor and runs to realize the data synchronization comparison method based on random sampling. Compared with the prior art, the data synchronization comparison system based on random sampling and the data synchronization comparison method based on random sampling have the same advantages, and are not described herein again.
The invention also provides a computer readable storage medium, which stores a computer program, and when the computer program is read and executed by a processor, the data synchronization comparison method based on random sampling is realized. The advantages of the computer-readable storage medium and the random sampling-based data synchronization comparison method over the prior art are the same, and are not described herein again.
Drawings
Fig. 1 is a schematic diagram of a random sampling-based data synchronization comparison method according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
As shown in fig. 1, an embodiment of the present invention provides a data synchronization comparison method based on random sampling, including: generating a sampling comparison configuration table according to random sampling rule configuration; generating a sampling data table according to the sampling comparison configuration table; and generating a comparison verification result table according to the sampling data table and the target end data table, and judging whether the comparison verification passes according to the comparison verification result table.
Specifically, in this embodiment, the data consistency of the source end and the target end of data synchronization is checked based on the random sampling rule configuration: a sampling data table is generated at the source end according to the configured rules and is then directly matched and compared with the target end data table through the primary key. This realizes comparison verification of the synchronized data records and ensures, accurately and efficiently, that the data of the source end and the target end is consistent. Taking financial big data as an example, consistency checks can reduce or even eliminate data errors, thereby improving financial security.
In this embodiment, the data consistency of the source end and the target end of data synchronization is checked based on the random sampling rule configuration, and the sampling data table is directly matched and compared with the target end data table through the primary key, so that comparison verification of the synchronized data records is realized and the consistency of the data records of the source end and the target end is accurately and efficiently ensured.
Optionally, the generating a sampling comparison configuration table according to a random sampling rule configuration includes: determining the number of data records to sample, configuring the primary key of the data table and configuring the data type fields according to a random sampling rule configuration item, and generating the sampling comparison configuration table according to the number of data records to sample, the primary key of the data table and the data type fields.
Specifically, in this embodiment, generating the sampling comparison configuration table according to the random sampling rule configuration includes:
(1) determining the number of data records to sample, that is, the random sampling number N, according to the random sampling rule configuration item, and placing it under configuration management. N can be set to, for example, 20, 50 or 100, and the sampled data records are the corresponding N records. Randomly extracting representative records reduces the resource consumption of comparison verification to a certain extent;
(2) configuring the primary key of the data table according to the random sampling rule configuration item. The sampling data table and the target end data table are matched and compared through the primary key, so the purpose of configuring the primary key is to associate and match records with the target end data table later. If the data table is an incrementally synchronized table, the incremental partition data can be randomly sampled, and a partition field is additionally added to the configuration item;
(3) configuring the data type fields according to the random sampling rule configuration item.
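The three configuration items above can be pictured as one row of the sampling comparison configuration table. The following is a minimal sketch, assuming a Python dataclass as the row representation; the names SamplingConfig, sample_size, compare_fields and partition_field are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SamplingConfig:
    """One row of a sampling comparison configuration table (hypothetical layout)."""
    table_name: str                         # data table to be checked
    sample_size: int                        # N, the number of records to sample
    primary_key: str                        # column used to match source and target rows
    compare_fields: Tuple[str, ...]         # sampling comparison fields
    partition_field: Optional[str] = None   # only set for incrementally synchronized tables

    def __post_init__(self) -> None:
        # Basic sanity checks on the configuration item.
        if self.sample_size <= 0:
            raise ValueError("sample_size must be positive")
        if self.primary_key in self.compare_fields:
            raise ValueError("primary key must not be listed as a comparison field")


# Example row, using the table and field names from the worked example in the text.
config = SamplingConfig(
    table_name="tab01",
    sample_size=50,
    primary_key="uuid",
    compare_fields=("tzje", "djxh", "lrrq"),
)
```

Keeping the partition field optional mirrors the text: it is only added when the table is synchronized incrementally.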
In this embodiment, the number of data records to sample is determined, the primary key of the data table is configured, and the data type fields are configured according to the random sampling rule configuration item in order to generate the sampling comparison configuration table, so that comparison verification of the synchronized data records can be realized and the consistency of the data records of the source end and the target end can be accurately and efficiently ensured.
Optionally, the data type field includes a string field, a numeric type field, and a time type field, and the configuration data type field includes: and selecting one representative field from the character string field, the numerical type field and the time type field as a sampling comparison field.
Specifically, in this embodiment, the data type fields include a character string field, a numeric type field and a time type field. To configure the table primary key and the custom data type fields, the data table needs to be analyzed in advance so that its primary key and sampling fields can be configured. Comparison verification is not performed on all fields of the data table; instead, one representative field is selected from each of the three field types of each table, that is, three fields other than the primary key are selected.
In the embodiment, one representative field is selected from the character string field, the numerical field and the time type field respectively to serve as the sampling comparison field, so that the difference possibly generated by different data types is covered, the coverage rate of the sampling field type is increased, and the accuracy of comparison verification is improved.
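As an illustration of choosing one field per type, the sketch below picks the first non-key column of each category from a table schema. This is a simplification: the text says the table is analyzed in advance to choose representative fields, and "first column found" is only a stand-in for that analysis. The TYPE_CATEGORY mapping and the function name are assumptions made for this example.

```python
from typing import Dict, List

# Assumed mapping from column type names to the three categories named in the text.
TYPE_CATEGORY = {
    "string": "string", "varchar": "string",
    "double": "numeric", "int": "numeric", "decimal": "numeric",
    "timestamp": "time", "date": "time",
}


def pick_representative_fields(schema: Dict[str, str], primary_key: str) -> List[str]:
    """Return the first non-key column found for each of the three categories."""
    chosen: Dict[str, str] = {}
    for column, column_type in schema.items():
        if column == primary_key:
            continue  # the primary key is configured separately
        category = TYPE_CATEGORY.get(column_type.lower())
        if category is not None and category not in chosen:
            chosen[category] = column
    # Fixed output order: string, numeric, time; missing categories are skipped.
    return [chosen[c] for c in ("string", "numeric", "time") if c in chosen]


schema = {"uuid": "string", "tzje": "double", "djxh": "string",
          "lrrq": "timestamp", "bz": "string"}
fields = pick_representative_fields(schema, primary_key="uuid")
```

In practice the choice of representative field would likely depend on which columns are most sensitive to synchronization errors, which this sketch does not attempt to judge.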
Optionally, the generating a sampling comparison configuration table according to a random sampling rule configuration further includes: and when a plurality of data tables needing to be compared exist, writing the configuration of the plurality of data tables needing to be compared into the sampling comparison configuration table.
Specifically, in this embodiment, for the sampling comparison configuration table, when there are a plurality of data tables that need to be compared, the configuration of the plurality of data tables that need to be compared is written into the sampling comparison configuration table, so that batch comparison verification of the plurality of data tables can be realized.
In this embodiment, when there are a plurality of data tables to be compared, the configuration of the plurality of data tables to be compared is written into the sampling comparison configuration table, so that batch comparison and verification of the plurality of data tables can be realized.
Optionally, the generating a comparison verification result table according to the sampling data table and the target end data table includes: and matching and comparing the sampling data table with the target end data table, and generating the comparison verification result table according to the field type and the value difference determined by matching and comparing.
Specifically, in this embodiment, the sampling data table is matched and compared with the target end data table to obtain the difference set between them, it is determined whether field types and values differ, and the corresponding comparison verification result is generated. Whether the comparison verification passes can then be determined from the comparison verification result table.
In this embodiment, the sampling data table and the target-end data table are matched and compared, a comparison verification result table is generated according to the field type and the value difference determined by matching and comparing, and whether comparison verification passes or not can be determined according to the comparison verification result table.
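The matching and comparing step can be sketched as a lookup of each sampled record in the target end data by primary key, recording every type or value difference as a row of the result table. This is a minimal in-memory sketch, assuming rows are plain dictionaries; the function name and the layout of the result rows are hypothetical.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def build_result_table(
    sample_rows: Sequence[Row],
    target_rows: Sequence[Row],
    primary_key: str,
    compare_fields: Sequence[str],
) -> List[Row]:
    """Return one result row per difference; an empty list means no difference."""
    target_by_key = {row[primary_key]: row for row in target_rows}
    differences: List[Row] = []
    for sample in sample_rows:
        key = sample[primary_key]
        target = target_by_key.get(key)
        if target is None:
            # The sampled record has no counterpart at the target end.
            differences.append({"key": key, "field": primary_key,
                                "reason": "primary key not matched"})
            continue
        for field in compare_fields:
            source_value, target_value = sample[field], target.get(field)
            if type(source_value) is not type(target_value):
                reason = "field type differs"
            elif source_value != target_value:
                reason = "field value differs"
            else:
                continue  # identical type and value, nothing to record
            differences.append({"key": key, "field": field, "reason": reason,
                                "source": source_value, "target": target_value})
    return differences


sample = [{"uuid": "a", "tzje": 1.5}, {"uuid": "b", "tzje": 2.0}, {"uuid": "c", "tzje": 3.0}]
target = [{"uuid": "a", "tzje": 1.5}, {"uuid": "b", "tzje": "2.0"}]
result_table = build_result_table(sample, target, "uuid", ["tzje"])
```

Here record "b" is reported because its value arrived as a string, and record "c" because it is missing at the target end.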
Optionally, the determining whether the comparison verification passes according to the comparison verification result table includes: if the comparison verification result table does not have a difference set result, the comparison verification is passed, and if the comparison verification result table has the difference set result, an abnormal result alarm is given.
Specifically, in this embodiment, the determining whether the comparison verification passes according to the comparison verification result table includes: if the comparison verification result table does not have the difference set result, the comparison verification is passed, and if the comparison verification result table has the difference set result, an abnormal inspection result exists, and an abnormal result alarm needs to be performed.
In this embodiment, whether the comparison verification passes or an abnormal result alarm is performed is determined according to whether a difference set result occurs in the comparison verification result table, so that the consistency of the data records of the data source end and the target end is accurately and efficiently ensured.
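The pass or alarm decision reduces to checking whether the result table is empty. The sketch below assumes the result table is a sequence of difference rows and uses Python's standard logging module as a stand-in for the alarm channel, which the text does not specify; the function and logger names are hypothetical.

```python
import logging
from typing import Sequence

logger = logging.getLogger("sync_check")


def verification_passes(result_table: Sequence[dict], table_name: str) -> bool:
    """Pass when the result table is empty; otherwise log an alarm and fail."""
    if not result_table:
        logger.info("comparison verification passed for %s", table_name)
        return True
    # Any row in the result table is a difference set result and triggers the alarm.
    logger.error("comparison verification failed for %s: %d difference(s)",
                 table_name, len(result_table))
    return False
```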
Optionally, the difference set result includes a primary key association mismatch, a numeric field check not passed, a time field check not passed, and a character field check not passed.
Specifically, in this embodiment, once the sampling data table has been generated, comparison verification does not need to check every record or every field of the original table one by one; it only compares whether the sampling data table differs from the corresponding field records of the target table. The data sampling consistency check mainly covers:
(1) primary key: whether the primary keys are consistent. If a primary key cannot be associated and matched, the check fails; if it matches normally, the check succeeds;
(2) numeric field: whether the numeric field values are legal and whether they differ. If a difference exists, the task fails; otherwise it passes;
(3) time field: whether the time field values are legal, and whether the compared time field values differ. If a difference exists, the task fails; otherwise it passes;
(4) character field: whether the field values are consistent. If a difference exists, the task fails; otherwise it passes.
In this embodiment, by setting the data sample consistency check content, efficient checking of data records of the data source end and the target end is achieved.
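The per-type checks for numeric, time and character fields can be sketched as three small functions that return a failure reason or None. What counts as a "legal" value is not defined in the text, so the rules used here (finite int or float for numeric fields, a fixed "%Y-%m-%d %H:%M:%S" string format for time fields) are assumptions, as are the function names.

```python
import math
from datetime import datetime
from typing import Any, Optional


def check_numeric(source: Any, target: Any) -> Optional[str]:
    """Return a failure reason, or None when the numeric check passes."""
    for value in (source, target):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or (isinstance(value, float) and not math.isfinite(value)):
            return "numeric field not legal"
    return None if source == target else "numeric field differs"


def check_time(source: Any, target: Any, fmt: str = "%Y-%m-%d %H:%M:%S") -> Optional[str]:
    """Parse both values with the assumed format, then compare them."""
    try:
        parsed = [datetime.strptime(value, fmt) for value in (source, target)]
    except (TypeError, ValueError):
        return "time field not legal"
    return None if parsed[0] == parsed[1] else "time field differs"


def check_string(source: Any, target: Any) -> Optional[str]:
    """Character fields only need to be identical."""
    return None if source == target else "character field differs"
```

Returning a reason string rather than a boolean makes it straightforward to write the outcome into a result table row.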
An example is given below.
For example, table tab01 has the primary key uuid and the sampling fields tzje, djxh and lrrq, with the mapping uuid -> pkey, tzje -> col_double, djxh -> col_string, lrrq -> timestamp. (Common data field types are double, string and timestamp; three representative fields are selected and written into the configuration table separated by commas.)
N records are then randomly sampled and stored in the sampling data table. The sampling data table is matched and compared with the target end data table by associating records through the primary key and comparing whether the field values of the three data types are consistent. If they are consistent, the check passes; otherwise it fails.
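The tab01 example can be reproduced end to end with an in-memory SQLite database. This is a sketch under stated assumptions, not the patent's implementation: the table names src_tab01, tgt_tab01 and sample_tab01, the column types, the generated test rows and the function run_check are all hypothetical, and ORDER BY RANDOM() is simply one convenient way to draw N random records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_tab01 (uuid TEXT PRIMARY KEY, tzje REAL, djxh TEXT, lrrq TEXT);
    CREATE TABLE tgt_tab01 (uuid TEXT PRIMARY KEY, tzje REAL, djxh TEXT, lrrq TEXT);
""")
# Hypothetical data: source and target start out identical.
rows = [(f"id{i}", i * 1.5, f"D{i:04d}", "2021-12-23 00:00:00") for i in range(100)]
conn.executemany("INSERT INTO src_tab01 VALUES (?, ?, ?, ?)", rows)
conn.executemany("INSERT INTO tgt_tab01 VALUES (?, ?, ?, ?)", rows)


def run_check(conn: sqlite3.Connection, n: int) -> list:
    """Sample n source records, then return the keys that differ at the target."""
    conn.execute("DROP TABLE IF EXISTS sample_tab01")
    conn.execute("CREATE TABLE sample_tab01 (uuid TEXT, tzje REAL, djxh TEXT, lrrq TEXT)")
    # Randomly sample n records from the source into the sampling data table.
    conn.execute(
        "INSERT INTO sample_tab01 "
        "SELECT uuid, tzje, djxh, lrrq FROM src_tab01 ORDER BY RANDOM() LIMIT ?",
        (n,),
    )
    # Associate through the primary key; IS NOT is SQLite's NULL-safe inequality.
    return conn.execute("""
        SELECT s.uuid FROM sample_tab01 AS s
        LEFT JOIN tgt_tab01 AS t ON s.uuid = t.uuid
        WHERE t.uuid IS NULL
           OR s.tzje IS NOT t.tzje
           OR s.djxh IS NOT t.djxh
           OR s.lrrq IS NOT t.lrrq
    """).fetchall()


differences = run_check(conn, 20)   # empty while source and target agree
```

An empty result corresponds to a passed check; any returned key is a difference set result. Because only N records are inspected, a passed check indicates that no difference was found in the sample rather than proving that the full tables are identical.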
Another embodiment of the present invention provides a data synchronization comparing apparatus based on random sampling, including: the configuration module is used for generating a sampling comparison configuration table according to random sampling rule configuration; the data table module is used for generating a sampling data table according to the sampling comparison configuration table; and the comparison module is used for generating a comparison verification result table according to the sampling data table and the target end data table and judging whether the comparison verification passes according to the comparison verification result table.
Another embodiment of the present invention provides a random sampling-based data synchronization comparison system, which includes a computer-readable storage medium storing a computer program and a processor, where the computer program is read by the processor and executed to implement the above random sampling-based data synchronization comparison method.
Another embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is read and executed by a processor, the computer program implements the random sampling-based data synchronization comparison method as described above.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present disclosure, and these changes and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A data synchronization comparison method based on random sampling is characterized by comprising the following steps:
generating a sampling comparison configuration table according to random sampling rule configuration;
generating a sampling data table according to the sampling comparison configuration table;
and generating a comparison verification result table according to the sampling data table and the target end data table, and judging whether comparison verification passes according to the comparison verification result table.
2. The random sampling-based data synchronization comparison method according to claim 1, wherein said generating a sampling comparison configuration table according to a random sampling rule configuration comprises:
determining the number of data records to sample, configuring the primary key of the data table and configuring the data type fields according to a random sampling rule configuration item, and generating the sampling comparison configuration table according to the number of data records to sample, the primary key of the data table and the data type fields.
3. The random sampling-based data synchronization comparison method according to claim 2, wherein the data type fields comprise a string field, a numeric type field, and a time type field, and the configuring the data type fields comprises:
and selecting one representative field from the character string field, the numerical type field and the time type field as a sampling comparison field.
4. The random sampling-based data synchronization comparison method according to claim 2, wherein said generating a sampling comparison configuration table according to a random sampling rule configuration further comprises:
and when a plurality of data tables needing to be compared exist, writing the configuration of the plurality of data tables needing to be compared into the sampling comparison configuration table.
5. The random sampling based data synchronization comparison method as claimed in claim 1, wherein said generating a comparison verification result table according to the sampling data table and the target end data table comprises:
and matching and comparing the sampling data table with the target end data table, and generating the comparison verification result table according to the field type and the value difference determined by matching and comparing.
6. The random sampling based data synchronization comparison method as claimed in claim 5, wherein said determining whether the comparison verification passes according to the comparison verification result table comprises:
if the comparison verification result table does not have a difference set result, the comparison verification is passed, and if the comparison verification result table has the difference set result, an abnormal result alarm is given.
7. The random sampling-based data synchronization comparison method according to claim 6, wherein the difference set result comprises a primary key association mismatch, a failed numeric field check, a failed time field check, and a failed character field check.
8. A random sampling based data synchronization comparing apparatus, comprising:
the configuration module is used for generating a sampling comparison configuration table according to random sampling rule configuration;
the data table module is used for generating a sampling data table according to the sampling comparison configuration table;
and the comparison module is used for generating a comparison verification result table according to the sampling data table and the target end data table and judging whether the comparison verification passes according to the comparison verification result table.
9. A random sampling-based data synchronization comparison system, comprising a computer-readable storage medium storing a computer program and a processor, wherein the computer program is read by the processor and executed to implement the random sampling-based data synchronization comparison method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when read and executed by a processor, implements the random sampling-based data synchronization comparison method according to any one of claims 1 to 7.
CN202111587837.7A 2021-12-23 2021-12-23 Data synchronization comparison method, device and system based on random sampling Pending CN114461724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111587837.7A CN114461724A (en) 2021-12-23 2021-12-23 Data synchronization comparison method, device and system based on random sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111587837.7A CN114461724A (en) 2021-12-23 2021-12-23 Data synchronization comparison method, device and system based on random sampling

Publications (1)

Publication Number Publication Date
CN114461724A true CN114461724A (en) 2022-05-10

Family

ID=81405686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111587837.7A Pending CN114461724A (en) 2021-12-23 2021-12-23 Data synchronization comparison method, device and system based on random sampling

Country Status (1)

Country Link
CN (1) CN114461724A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination