CN118193514A - Data verification method, device, equipment and storage medium - Google Patents

Publication number
CN118193514A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410375035.7A
Other languages
Chinese (zh)
Inventor
姜昱君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data verification method, device, equipment and storage medium. The method comprises: splitting source data and target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items; reading a first preset number of source data items and target data items and calculating a primary source data digest value and a primary target data digest value; if the primary source data digest value and the primary target data digest value differ, reading a second preset number of source data items and target data items and calculating a secondary source data digest value and a secondary target data digest value; and, if the secondary source data digest value and the secondary target data digest value differ, determining that the verification result of the second preset number of target data items is a verification error. Because the source data and the target data are verified by comparing data digests, the comparison is simple and requires neither special equipment nor access to a data processing platform, which reduces verification cost and enables fast, lightweight data verification.

Description

Data verification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to a data verification method, apparatus, device, and storage medium.
Background
With the development of computer technology, more and more data is being generated. Here, data refers to the service information that a service system stores while executing service steps; back-end systems usually store such data in a database or in files.
When a database is replaced, or an old system is replaced by a new one, the data must be migrated, and the integrity and correctness of the migrated data must then be verified. Such data verification is a key means of guaranteeing data quality after the target data has been generated from the source data by migration.
However, current data verification methods require special equipment or access to a data processing platform, so they consume dedicated verification resources and are costly.
Disclosure of Invention
In view of the above, the present application aims to provide a data verification method, device, apparatus and storage medium, which can reduce verification cost and realize fast and lightweight data verification.
The application provides a data verification method, which comprises the following steps:
Acquiring source data and target data, and splitting the source data and the target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively, wherein the target data is obtained by performing data migration on the source data;
Reading a first preset number of the source data items and of the target data items, constructing a primary temporary source data file and a primary temporary target file respectively, calculating a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file, and, if the primary source data digest value is the same as the primary target data digest value, determining that the verification result of the first preset number of target data items is verification success;
If the primary source data digest value and the primary target data digest value differ, reading a second preset number of source data items and target data items from the primary temporary source data file and the primary temporary target file respectively, constructing a secondary temporary source data file and a secondary temporary target file respectively, calculating a secondary source data digest value of the secondary temporary source data file and a secondary target data digest value of the secondary temporary target file, and, if the secondary source data digest value and the secondary target data digest value differ, determining that the verification result of the second preset number of target data items is a verification error, wherein the second preset number is smaller than the first preset number;
Comparing the second preset number of target data items with the second preset number of source data items line by line to obtain error data items.
Optionally, comparing the second preset number of target data items with the second preset number of source data items line by line to obtain error data items comprises:
Identifying the position of a source data item by a source offset value and the position of a target data item by a target offset value; comparing the character strings of the second preset number of target data items and the second preset number of source data items line by line; if the compared character strings are the same, incrementing the source offset value and the target offset value step by step; and, if the compared character strings differ, identifying the target data item corresponding to the target offset value as an error data item.
Optionally, the source offset value includes a primary source offset value and a secondary source offset value, and the target offset value includes a primary target offset value and a secondary target offset value. The primary source offset value identifies positions among all the source data items sorted in a preset order, the primary target offset value identifies positions among all the target data items sorted in the preset order, the secondary source offset value identifies positions among the second preset number of source data items sorted in the preset order, and the secondary target offset value identifies positions among the second preset number of target data items sorted in the preset order;
In this case, comparing the character strings line by line, incrementing the source offset value and the target offset value if the compared character strings are the same, and identifying the target data item corresponding to the target offset value as an error data item if they differ comprises:
Comparing the character strings of the second preset number of target data items and the second preset number of source data items line by line, and, if the compared character strings are the same, incrementing the secondary source offset value and the secondary target offset value step by step;
If the compared character strings differ, identifying the target data item corresponding to the secondary target offset value as an error data item and comparing the character string of the source data item with that of the target data item: if the character string of the source data item is greater than that of the target data item, the primary target offset value of the error data item is increased by one position; if the character string of the source data item is smaller than that of the target data item, the primary source offset value of the error data item is increased by one position.
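As a non-limiting illustration, the offset-based comparison described above can be sketched in Python as a walk over two sorted lists. The sketch keeps a single offset per side rather than separate primary and secondary offsets, and the function name and the labels `missing_in_target` and `unexpected_in_target` are hypothetical names introduced only for this example.

```python
def compare_sorted_items(source_items, target_items):
    """Walk two sorted lists with one offset each and collect mismatches."""
    src_off = 0  # source offset value
    tgt_off = 0  # target offset value
    errors = []
    while src_off < len(source_items) and tgt_off < len(target_items):
        src = source_items[src_off]
        tgt = target_items[tgt_off]
        if src == tgt:
            # Same string: advance both offsets.
            src_off += 1
            tgt_off += 1
        elif src > tgt:
            # Source string is greater: advance the target offset only.
            errors.append(("unexpected_in_target", tgt_off, tgt))
            tgt_off += 1
        else:
            # Source string is smaller: advance the source offset only.
            errors.append(("missing_in_target", src_off, src))
            src_off += 1
    # Anything left on either side has no counterpart.
    for i in range(src_off, len(source_items)):
        errors.append(("missing_in_target", i, source_items[i]))
    for j in range(tgt_off, len(target_items)):
        errors.append(("unexpected_in_target", j, target_items[j]))
    return errors


example_errors = compare_sorted_items(["a", "b", "c", "e"], ["a", "c", "d", "e"])
```

Advancing only one side after a mismatch lets the two lists realign, so a single missing or extra item does not cause every later item to be reported as an error.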
Optionally, splitting the source data and the target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively comprises:
Splitting the source data and the target data according to the preset data amount and sorting them in splitting order to obtain a plurality of sorted source data items and a plurality of sorted target data items.
Optionally, splitting the source data and the target data according to the preset data amount and sorting them in splitting order comprises:
Splitting the source data and the target data according to the preset data amount, and sorting the resulting source data items and target data items by merge sort to obtain a plurality of sorted source data items and a plurality of sorted target data items.
Optionally, the method further comprises:
Performing data mapping on the source data to obtain source data that conforms to the data format of the target data.
Optionally, the method further comprises:
Acquiring initial data and original data, extracting important data from the initial data and splicing it to obtain the source data, and extracting important data from the original data and splicing it to obtain the target data.
The application provides a data verification device, which comprises:
A splitting unit, configured to acquire source data and target data and to split the source data and the target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively, wherein the target data is obtained by performing data migration on the source data;
A first computing unit, configured to read a first preset number of the source data items and of the target data items, construct a primary temporary source data file and a primary temporary target file respectively, and calculate a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file, wherein, if the primary source data digest value is the same as the primary target data digest value, the verification result of the first preset number of target data items is verification success;
A second computing unit, configured to, if the primary source data digest value and the primary target data digest value differ, read a second preset number of source data items and target data items from the primary temporary source data file and the primary temporary target file respectively, construct a secondary temporary source data file and a secondary temporary target file respectively, and calculate a secondary source data digest value of the secondary temporary source data file and a secondary target data digest value of the secondary temporary target file, wherein, if the secondary source data digest value and the secondary target data digest value differ, the verification result of the second preset number of target data items is a verification error, and the second preset number is smaller than the first preset number;
A comparison unit, configured to compare the second preset number of target data items with the second preset number of source data items line by line to obtain error data items.
The present application provides a data verification device, the device comprising: a processor and a memory;
The memory is used for storing instructions;
The processor is configured to execute the instructions in the memory and perform the method according to any one of the above embodiments.
The present application provides a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform a method as in any of the above embodiments.
The application provides a data verification method. Source data and target data are acquired, the target data being obtained by performing data migration on the source data. The source data and the target data are split according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively. A first preset number of source data items and target data items are read, a primary temporary source data file and a primary temporary target file are constructed respectively, and a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file are calculated. If the two primary digest values are the same, the verification result of the first preset number of target data items is verification success. If they differ, a second preset number of source data items and target data items are read from the primary temporary source data file and the primary temporary target file respectively, a secondary temporary source data file and a secondary temporary target file are constructed respectively, and a secondary source data digest value of the secondary temporary source data file and a secondary target data digest value of the secondary temporary target file are calculated. If the two secondary digest values differ, the verification result of the second preset number of target data items is a verification error. The second preset number is smaller than the first preset number; that is, when verification of a first preset number of target data items fails, the range containing the error is narrowed to a second preset number of target data items, which are then compared line by line with the corresponding second preset number of source data items to obtain the error data items. Verifying the source data and the target data by means of data digests keeps the comparison simple. The data verification method provided by the application therefore requires neither special equipment nor access to a data processing platform, reduces verification cost, and achieves fast, lightweight data verification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow chart of a data verification method according to an embodiment of the present application;
Fig. 2 shows a schematic structural diagram of a data verification device according to an embodiment of the present application.
Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
At present, data verification is typically performed with special equipment or by accessing a data processing platform such as a blockchain. Such schemes are heavyweight: to verify data, the user must embed data acquisition logic to collect the data and must access the data processing platform, which entails considerable intermediate work. These schemes focus on data verification across distributed, interconnected terminals, and rely on the decentralized, tamper-proof characteristics of blockchain technology and on special equipment to guarantee data authenticity. They are not designed for simple data verification, yet they consume storage and computing resources. Take system reconstruction as an example: most such scenarios arise in in-house enterprise software development, where data migration places no high demand on verifying the authenticity of the original data, so these schemes are excessive.
In addition, verifying data with special equipment or a data processing platform is difficult for users, who must understand technologies such as distributed systems and blockchain. Without that knowledge, problems encountered in use are hard to troubleshoot and the scheme is hard to adapt to different requirements, which increases the difficulty of operation and maintenance.
Based on this, an embodiment of the application provides a data verification method. Source data and target data are acquired, the target data being obtained by performing data migration on the source data. The source data and the target data are split according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively. A first preset number of source data items and target data items are read, a primary temporary source data file and a primary temporary target file are constructed respectively, and a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file are calculated. If the two primary digest values are the same, the verification result of the first preset number of target data items is verification success. If they differ, a second preset number of source data items and target data items are read from the primary temporary source data file and the primary temporary target file respectively, a secondary temporary source data file and a secondary temporary target file are constructed respectively, and a secondary source data digest value of the secondary temporary source data file and a secondary target data digest value of the secondary temporary target file are calculated. If the two secondary digest values differ, the verification result of the second preset number of target data items is a verification error. The second preset number is smaller than the first preset number; that is, when verification of a first preset number of target data items fails, the range containing the error is narrowed to a second preset number of target data items, which are then compared line by line with the corresponding second preset number of source data items to obtain the error data items. Verifying the source data and the target data by means of data digests keeps the comparison simple. The data verification method provided by the application therefore requires neither special equipment nor access to a data processing platform, reduces verification cost, and achieves fast, lightweight data verification.
For a better understanding of the technical solutions and technical effects of the present application, specific embodiments will be described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, a flow chart of a data verification method according to an embodiment of the present application is shown.
The data verification method provided by the embodiment comprises the following steps:
S101, acquiring source data and target data, and splitting the source data and the target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively.
In the embodiment of the application, the source data is basic data for data migration, and the target data is result data obtained after the source data is subjected to data migration. In order to verify the integrity and the consistency of the target data after data migration, the source data and the target data can be respectively acquired, the target data is verified by taking the source data as a basis, and the consistency and the integrity of the source data and the target data are ensured.
Specifically, the file may be read or the database may be queried to obtain the source data and the target data.
In some possible implementations, data verification may only verify important data, and other non-important data may not need to be verified, which may reduce verification time. The method can read the file or query the database to obtain the initial data and the original data respectively, extract the important data in the initial data to splice to obtain the source data, and extract the important data in the original data to splice to obtain the target data.
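As a non-limiting illustration, extracting and splicing important data might look like the following Python sketch. It assumes that each record is available as a dictionary; the function name, the field names and the `|` separator are hypothetical choices that do not come from the application.

```python
def splice_important_fields(records, important_fields, separator="|"):
    """Keep only the important fields of each record and join them into one line."""
    lines = []
    for record in records:
        values = [str(record[field]) for field in important_fields]
        lines.append(separator.join(values))
    return lines


# Hypothetical records: only "id" and "name" are treated as important.
initial_data = [
    {"id": 1, "name": "A", "memo": "ignored"},
    {"id": 2, "name": "B", "memo": "also ignored"},
]
source_lines = splice_important_fields(initial_data, ["id", "name"])
```

Applying the same function with the same field list to the original data on the target side yields target lines that can be compared directly with the source lines.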
In practical applications, a row parameter assembly interface capable of assembling data may be provided to the user. The user inputs the initial data or the original data through this interface, and the spliced source data or target data is returned, so that the user can control which data items make up the source data and the target data.
In some possible implementations, data verification generally requires the source data and the target data to be in the same data format, so the source data may be converted into the data format of the target data, or the target data may be converted into the data format of the source data.
Data mapping may be performed on the source data to obtain source data that conforms to the data format of the target data. In practical applications, a row parameter escape interface may be used to map the source data to the data format of the target data.
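As a non-limiting illustration, data mapping of one source row into the target format could be sketched in Python as follows. The application does not specify the mapping interface, so the function name, the field names and the date conversion are all hypothetical.

```python
def map_source_row(row, field_map, value_converters=None):
    """Rename source fields to target field names and convert values where needed."""
    converters = value_converters or {}
    mapped = {}
    for source_field, target_field in field_map.items():
        value = row[source_field]
        convert = converters.get(source_field)
        mapped[target_field] = convert(value) if convert else value
    return mapped


# Hypothetical mapping: rename two fields and normalise the date separator.
mapped_row = map_source_row(
    {"cust_no": "001", "open_dt": "2024/03/29"},
    {"cust_no": "customer_id", "open_dt": "open_date"},
    {"open_dt": lambda text: text.replace("/", "-")},
)
```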
In practical applications, to obtain the source data, the generated file type parameter (sourceGenerate) in the configuration file may be read and used to determine whether the source data needs to be spliced and converted in data format. If the user directly provides a file in the specified format that already contains the source data, for example source data generated with a database export tool, splicing and data format conversion are skipped by setting sourceGenerate to 0. Correspondingly, to obtain the target data, the generated file type parameter (TARGETGENERATE) in the configuration file may be read and used to determine whether the target data needs to be spliced and converted in data format. If the user directly provides a file in the specified format that already contains the target data, for example target data generated with a database export tool, splicing and data format conversion are skipped by setting TARGETGENERATE to 0.
In the embodiment of the application, after the source data and the target data are obtained, the source data and the target data can be split according to the preset data quantity to respectively obtain a plurality of source data items and a plurality of target data items. That is, the user may set a preset data amount in advance, so that the source data and the target data can be split according to the preset data amount. The predetermined amount of data is also referred to as granularity, and may be 100MB, for example.
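As a non-limiting illustration, splitting by a preset data amount can be sketched in Python as follows. The sketch assumes that the data consists of newline-delimited records that should not be cut in the middle, and it works on an in-memory list instead of files; the function name and the tiny byte limit in the example are hypothetical.

```python
def split_by_size(lines, max_bytes):
    """Group whole lines into chunks whose encoded size stays within max_bytes."""
    chunks = []
    current = []
    current_size = 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and current_size + line_size > max_bytes:
            # The chunk is full: close it and start a new one.
            chunks.append(current)
            current = []
            current_size = 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append(current)
    return chunks


# A 10-byte limit stands in for a realistic granularity such as 100 MB.
example_chunks = split_by_size(["aaaa", "bbbb", "cccc", "dd"], max_bytes=10)
```

A line that is larger than the limit on its own still gets a chunk to itself, so no record is ever dropped or truncated.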
When splitting, the source data and the target data may be split according to the preset data amount and the resulting pieces named sequentially; the plurality of source data items and the plurality of target data items can then be ordered by name to obtain sorted source data items and sorted target data items. If the file corresponding to the source data or the target data is smaller than the preset data amount, it is placed first in the sequence, which facilitates the subsequent traversal.
For the sorting itself, the source data items and target data items obtained by splitting according to the preset data amount may be read into memory in turn and sorted. The split source data items and target data items may be sorted by merge sort, which provides the basis for the subsequent data verification.
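As a non-limiting illustration, sorting each split piece in memory and then merging the sorted pieces can be sketched in Python with the standard library's `heapq.merge`. The sketch keeps every piece in memory, whereas an implementation handling pieces of the stated granularity would presumably write each sorted piece back to disk before merging; the function name is hypothetical.

```python
import heapq


def merge_sort_chunks(chunks):
    """Sort each chunk on its own, then merge the sorted chunks into one sequence."""
    sorted_chunks = [sorted(chunk) for chunk in chunks]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_chunks))


merged_items = merge_sort_chunks([["c", "a"], ["d", "b"]])
```

Sorting the source side and the target side with the same rule puts matching records at matching positions, which is what the later block-wise digest comparison relies on.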
In practical applications, to obtain the plurality of sorted source data items, the sorting requirement parameter (sourceFileSort) in the configuration file may be read and used to determine whether the source data needs to be split and sorted. If the user directly provides a file that is already ordered, for example source data that is in time sequence, splitting and sorting of the source data are skipped by setting sourceFileSort to 0. Correspondingly, to obtain the plurality of sorted target data items, the sorting requirement parameter (targetFileSort) in the configuration file may be read and used to determine whether the target data needs to be split and sorted. If the user directly provides a file that is already ordered, for example target data that is in time sequence, splitting and sorting of the target data are skipped by setting targetFileSort to 0.
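As a non-limiting illustration, the switches described above could be read as in the following Python sketch. The application does not state the format of the configuration file, so JSON is an assumption, as are the function name, the returned keys and the choice to treat a missing parameter as "do the step". The parameter names are passed in exactly as they appear in the description.

```python
import json


def plan_preparation(config_text, generate_key, sort_key):
    """Decide which preparation steps to run; a value of 0 means skip the step."""
    config = json.loads(config_text)
    return {
        "splice_and_convert": int(config.get(generate_key, 1)) != 0,
        "split_and_sort": int(config.get(sort_key, 1)) != 0,
    }


# Hypothetical configuration content.
CONFIG_TEXT = (
    '{"sourceGenerate": 0, "sourceFileSort": 1,'
    ' "TARGETGENERATE": 1, "targetFileSort": 0}'
)
source_plan = plan_preparation(CONFIG_TEXT, "sourceGenerate", "sourceFileSort")
target_plan = plan_preparation(CONFIG_TEXT, "TARGETGENERATE", "targetFileSort")
```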
S102, reading a first preset number of source data items and target data items, constructing a primary temporary source data file and a primary temporary target file respectively, calculating a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file, and, if the primary source data digest value is the same as the primary target data digest value, determining that the verification result of the first preset number of target data items is verification success.
In the embodiment of the application, to achieve fast data verification, a first preset number of source data items and a first preset number of target data items may be read and used to construct a primary temporary source data file and a primary temporary target file respectively. The plurality of source data items and the plurality of target data items are thereby divided into a plurality of primary temporary source data files and a plurality of primary temporary target files, so that each primary temporary source data file can subsequently be compared directly with its corresponding primary temporary target file to quickly determine whether a data error exists and which primary temporary target file contains it.
The primary source data digest value of the primary temporary source data file and the primary target data digest value of the primary temporary target file may be calculated and compared; whether they are the same determines whether the first preset number of target data items is verified successfully, that is, whether a data error exists among them. A data digest is a fixed-length hash value computed from a file of any size by a given algorithm. It is fast to compute and, for practical purposes, unique to the file content, so verifying data by means of data digests improves verification efficiency while ensuring accuracy.
If the primary source data digest value is the same as the primary target data digest value, the verification result of the first preset number of target data items is verification success. If the primary source data digest value and the primary target data digest value differ, the verification result of the first preset number of target data items is verification failure. The comparison of primary source data digest values and primary target data digest values is repeated until all primary temporary source data files and primary temporary target files have been traversed.
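As a non-limiting illustration, the primary digest comparison can be sketched in Python with the standard `hashlib` module. The application does not name a digest algorithm, so SHA-256 is an assumption; the sketch also hashes in-memory blocks of items instead of building temporary files, and the function names are hypothetical.

```python
import hashlib


def block_digest(items):
    """Compute one digest over a block of data items, one item per line."""
    hasher = hashlib.sha256()
    for item in items:
        hasher.update(item.encode("utf-8"))
        hasher.update(b"\n")
    return hasher.hexdigest()


def find_mismatched_blocks(source_items, target_items, block_size):
    """Return the start positions of blocks whose source and target digests differ."""
    mismatched = []
    total = max(len(source_items), len(target_items))
    for start in range(0, total, block_size):
        src_block = source_items[start:start + block_size]
        tgt_block = target_items[start:start + block_size]
        if block_digest(src_block) != block_digest(tgt_block):
            mismatched.append(start)
    return mismatched


source_rows = [f"row{i}" for i in range(10)]
target_rows = list(source_rows)
target_rows[7] = "rowX"  # simulate one migration error
bad_blocks = find_mismatched_blocks(source_rows, target_rows, block_size=4)
```

Only the blocks reported here need any further inspection; every other block has been verified by a single digest comparison.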
S103, if the primary source data abstract value and the primary target data abstract value are different, respectively reading a second preset number of source data items and target data items from the primary temporary source data file and the primary temporary target file, respectively constructing a secondary temporary source data file and a secondary temporary target file, respectively calculating the secondary source data abstract value of the secondary temporary source data file and the secondary target data abstract value of the secondary temporary target file, and if the secondary source data abstract value and the secondary target data abstract value are different, determining that the verification result of the second preset number of target data items is verification error.
In the embodiment of the application, if the primary temporary source data file and the primary temporary target file with data errors are determined by comparing the difference of the primary source data abstract value and the primary target data abstract value, the data abstract can be continuously calculated for a plurality of source data items and a plurality of target data items in the primary temporary source data file and the primary temporary target file, so that the data error range with smaller granularity is determined.
In order to realize quick data verification, a second preset number of source data items and a second preset number of target data items are respectively read from a first-stage temporary source data file and a first-stage temporary target file which are different in first-stage source data abstract value and first-stage target data abstract value, and a second-stage temporary source data file and a second-stage temporary target file are respectively constructed, so that a plurality of source data items included in the first-stage temporary source data file and a plurality of target data items included in the first-stage temporary target file can be divided into a plurality of second-stage temporary source data files and a plurality of second-stage temporary target files, whether each second-stage temporary source data file is identical with the corresponding second-stage temporary target file or not is directly compared, and whether data errors exist or not and the second-stage temporary target file where the data errors exist can be quickly determined.
The second preset number is smaller than the first preset number; for example, the first preset number may be 100000 and the second preset number may be 10000. The first preset number may be obtained by reading the control data field (numthreshod) parameter in the configuration file, and the second preset number may be obtained by reading the sub-control data field (subnumthreshod) parameter in the configuration file. The user may configure both the first preset number and the second preset number in the configuration file.
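As an illustration only, the two thresholds could be read as shown below. The parameter names numthreshod and subnumthreshod come from the description; the INI layout, the section name `verify`, the function name `load_thresholds` and the fallback values are assumptions made for this sketch, since the format of the configuration file is not specified.

```python
import configparser

# Hypothetical configuration text; the real file format is not given in the description.
SAMPLE_CONFIG = """
[verify]
numthreshod = 100000
subnumthreshod = 10000
"""


def load_thresholds(config_text):
    """Return (first_preset_number, second_preset_number) from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["verify"]  # assumed section name
    first = section.getint("numthreshod", fallback=100000)
    second = section.getint("subnumthreshod", fallback=10000)
    # The description requires the second preset number to be the smaller one.
    if not 0 < second < first:
        raise ValueError("subnumthreshod must be positive and smaller than numthreshod")
    return first, second


print(load_thresholds(SAMPLE_CONFIG))
```

Rejecting a configuration in which the second preset number is not smaller than the first is a design choice of the sketch, not a behaviour stated in the description.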
The secondary source data digest value of the secondary temporary source data file and the secondary target data digest value of the secondary temporary target file are calculated. By judging whether the secondary source data digest value and the secondary target data digest value are the same, it is determined whether the second preset number of target data items are verified successfully or contain a data error.
If the secondary source data digest value is the same as the secondary target data digest value, the verification result of the second preset number of target data items is verification success. If the secondary source data digest value and the secondary target data digest value are different, the verification result of the second preset number of target data items is a verification error. The process of comparing the secondary source data digest value with the secondary target data digest value is repeated until all the secondary temporary source data files and secondary temporary target files have been traversed.
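The two-level digest comparison of steps S102 and S103 can be sketched as follows. This is a simplified illustration, not the embodiment itself: it keeps the data items in Python lists instead of writing temporary files, it uses SHA-256 although the description does not name a digest algorithm, and the function names `digest` and `find_suspect_ranges` are hypothetical.

```python
import hashlib


def digest(items):
    """Digest of a group of data items (SHA-256 is an assumed choice)."""
    h = hashlib.sha256()
    for item in items:
        h.update(item.encode("utf-8"))
        h.update(b"\n")  # separator so that item boundaries affect the digest
    return h.hexdigest()


def find_suspect_ranges(source, target, first_n, second_n):
    """Return (start, end) index ranges whose secondary digests differ."""
    suspects = []
    total = max(len(source), len(target))
    for p in range(0, total, first_n):
        src_primary = source[p:p + first_n]
        tgt_primary = target[p:p + first_n]
        if digest(src_primary) == digest(tgt_primary):
            continue  # primary digests match: this group is verified
        span = max(len(src_primary), len(tgt_primary))
        for s in range(0, span, second_n):
            src_secondary = src_primary[s:s + second_n]
            tgt_secondary = tgt_primary[s:s + second_n]
            if digest(src_secondary) != digest(tgt_secondary):
                width = max(len(src_secondary), len(tgt_secondary))
                suspects.append((p + s, p + s + width))
    return suspects


source_items = [f"row{i}" for i in range(20)]
target_items = list(source_items)
target_items[13] = "rowX"  # simulate one migration error
print(find_suspect_ranges(source_items, target_items, first_n=10, second_n=5))
```

Only the ranges returned here would need the line-by-line comparison of step S104; every other group is accepted on the strength of its digest alone.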
S104, comparing the second preset number of target data items with the second preset number of source data items line by line to obtain error data items.
In the embodiment of the present application, if the secondary source data digest value and the secondary target data digest value are different, the verification result of the second preset number of target data items is a verification error. In this case, the second preset number of target data items and the second preset number of source data items may be compared line by line, that is, each target data item is compared with its corresponding source data item, so as to obtain the error data items.
As a possible implementation manner, the character strings of the target data item and the source data item may be compared to each other, so as to obtain an error data item.
In the specific comparison process, when an error data item is found, its position among all target data items also needs to be known, and this position can be identified with an offset value. For example, the location of a source data item is identified with a source offset value and the location of a target data item is identified with a target offset value. The initial values of the source offset value and the target offset value may be 0. As the comparison proceeds, the source offset value and the target offset value are incremented in turn, so that the maximum source offset value gives the number of source data items and the maximum target offset value gives the number of target data items. That is, the second preset number of target data items and the second preset number of source data items are compared as character strings row by row. If the strings are the same, the source offset value and the target offset value are incremented. If the strings are different, that row of target data contains a data error: the target data item corresponding to the target offset value is confirmed as an error data item, and the target offset value of the error data item is recorded to indicate the position of the data error.
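A minimal sketch of this single-level comparison is given below. The function name `compare_rows` is hypothetical, and the sketch assumes that both offset values also advance after a mismatch so that the remaining rows are still checked, which the description does not state explicitly.

```python
def compare_rows(source_rows, target_rows):
    """Compare rows as strings and return the target offsets of error data items."""
    source_offset = 0  # position of the current source data item
    target_offset = 0  # position of the current target data item
    error_offsets = []
    while source_offset < len(source_rows) and target_offset < len(target_rows):
        if source_rows[source_offset] != target_rows[target_offset]:
            # Record where the erroneous target data item sits.
            error_offsets.append(target_offset)
        # Assumption: move on to the next pair whether or not the rows matched.
        source_offset += 1
        target_offset += 1
    return error_offsets


print(compare_rows(["a", "b", "c"], ["a", "x", "c"]))
```

This form only works when the two sides stay aligned row for row; the two-level offsets described next handle extra or missing rows.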
In practical applications, the number of target data items may not match the number of source data items. In that case, the source offset value and the target offset value used to identify the items being compared may fall out of step: when the source offset value and the target offset value are equal, the target data item at that offset was not obtained by migrating the source data item at that offset. For example, repeated migration of source data during the data migration process may make the number of source data items smaller than the number of target data items, and missed migration of source data may make the number of source data items greater than the number of target data items. Even if the two numbers match, the offset values may still be wrong because individual target data items and source data items do not correspond. In these cases, two levels of offset values may be used to correct the offset values.
Specifically, the source offset value includes a primary source offset value and a secondary source offset value, and the target offset value includes a primary target offset value and a secondary target offset value. The first level source offset value is used for identifying positions of all source data items ordered according to a preset sequence, and the first level target offset value is used for identifying positions of all target data items ordered according to the preset sequence. The second-level source offset value is used for identifying the positions of the second preset number of source data items ordered according to the preset sequence, and the second-level target offset value is used for identifying the positions of the second preset number of target data items ordered according to the preset sequence. The initial values of the primary source offset value, the secondary source offset value, the primary target offset value, and the secondary target offset value are all 0, that is, the secondary source offset value and the secondary target offset value are only used to identify the locations of the source data item and the target data item in the secondary temporary source data file and the secondary temporary target file.
When the second preset number of target data items and the second preset number of source data items are compared as character strings row by row, if the strings are the same, the secondary source offset value and the secondary target offset value are incremented, and the primary source offset value and the primary target offset value are not changed. If the strings are different, the target data item corresponding to the secondary target offset value is confirmed as the error data item. At this point, the string of the source data item and the string of the target data item can be compared in detail; their ordering can be determined directly with a default method or with a comparison method overridden by the user. If the string of the source data item is larger than the string of the target data item, the target data may contain one extra data item; the secondary target offset value and the primary target offset value of the error data item may then be incremented by one, and the error data item may be added to a first file, where the first file records the data items that the target data has in excess of the source data. If the string of the source data item is smaller than the string of the target data item, source data may have been missed during migration; the secondary source offset value and the primary source offset value of the error data item may then be incremented by one, and the error data item may be added to a second file, where the second file records the data items that the target data lacks relative to the source data.
Therefore, by adjusting the primary source offset value or the primary target offset value, the source data items and the target data items in different primary temporary source data files and primary temporary target files can be always matched, namely, the target data items obtained by migration of the source data items are always checked, so that accurate data verification is realized.
As an example, suppose the second preset number of target data items are [a b d e] and the second preset number of source data items are [a c d e]. When the secondary source offset value and the secondary target offset value are both 2, the strings of source data item c and target data item b are compared; they differ, so target data item b, at secondary target offset value 2, is confirmed as an error data item. The ordering of source data item c and target data item b is then compared in detail: because c > b, the target data is considered to contain one extra data item b, and the secondary target offset value is incremented by 1 to point to target data item d. Next, source data item c is compared with target data item d: because c < d, the target data is considered to be missing data item c, and the secondary source offset value is incremented by 1 to point to source data item d. When the secondary source offset value and the secondary target offset value are both 3, source data item d and target data item d are compared; the strings are the same, so both offset values are incremented by 1 to point to source data item e and target data item e. The comparison continues in this way until all source data items and all target data items have been compared.
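The two-level offset walk can be sketched as follows, using the [a c d e] and [a b d e] example. Several points are assumptions of this sketch rather than statements of the description: the function name `walk_sorted_rows` is hypothetical, the offsets start at 0 as the stated initial value suggests (so they are one lower than the counts quoted in the example), the second file records the source data item that is missing from the target, and rows left over when one side is exhausted are treated as extra or missing.

```python
def walk_sorted_rows(source_rows, target_rows):
    """Walk two sorted row lists and separate extra rows from missing rows."""
    sec_src = sec_tgt = 0  # secondary offsets inside the current group
    pri_src = pri_tgt = 0  # primary offsets, advanced only on a mismatch
    first_file = []        # target rows in excess of the source
    second_file = []       # source rows missing from the target
    while sec_src < len(source_rows) and sec_tgt < len(target_rows):
        src, tgt = source_rows[sec_src], target_rows[sec_tgt]
        if src == tgt:
            sec_src += 1
            sec_tgt += 1
        elif src > tgt:
            # The target holds a row the source does not: skip it on the target side.
            first_file.append(tgt)
            sec_tgt += 1
            pri_tgt += 1
        else:
            # The source holds a row the target lacks: skip it on the source side.
            second_file.append(src)
            sec_src += 1
            pri_src += 1
    # Assumption: leftovers on either side are also reported.
    for tgt in target_rows[sec_tgt:]:
        first_file.append(tgt)
        pri_tgt += 1
    for src in source_rows[sec_src:]:
        second_file.append(src)
        pri_src += 1
    return first_file, second_file, pri_src, pri_tgt


print(walk_sorted_rows(["a", "c", "d", "e"], ["a", "b", "d", "e"]))
```

The walk relies on both inputs being sorted in the same preset order; Python's default string ordering stands in here for the default or user-overridden comparison method mentioned above.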
After the first file and the second file are obtained, the first file and the second file may be returned to the user so that the user can learn the data verification result and the error data.
Therefore, the data verification method provided by the embodiment of the application requires neither special equipment nor access to a data processing platform, which reduces the user's intermediate workload; the scheme is lightweight and easy to integrate. The data verification flow can be completed locally. Because data migration verification usually involves a large amount of service data, running locally avoids data loss and data security problems during network transmission, so the method offers high security. The data sorting and data digest techniques adopted by the embodiment of the application are mature and widely used software technologies, so they are easy for users to understand and make it convenient to troubleshoot service problems during use; this greatly reduces the risk of use and makes the method easy to operate and maintain. In addition, the method decouples data verification from the actual business process and is applicable to any migrated data that has a comparable order. Ordering is a basic characteristic of database data: because primary keys are unique, service data with this characteristic is suitable for the data verification method provided by the embodiment of the application, and users can quickly build on the characteristics of their own databases, so the method is also general.
Based on the data verification method provided by the above embodiment, the embodiment of the application also provides a data verification apparatus. Fig. 2 is a schematic structural diagram of the data verification apparatus provided by the embodiment of the application.
The data verification apparatus 200 provided in this embodiment includes:
A splitting unit 210, configured to obtain source data and target data, split the source data and the target data according to a preset data amount, and obtain a plurality of source data items and a plurality of target data items respectively; the source data is subjected to data migration to obtain the target data;
A first calculating unit 220, configured to read a first preset number of the source data items and the target data items, respectively construct a primary temporary source data file and a primary temporary target file, respectively calculate a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file, and if the primary source data digest value is the same as the primary target data digest value, determine that the verification result of the first preset number of the target data items is verification success;
A second calculating unit 230, configured to, if the primary source data digest value and the primary target data digest value are different, read a second preset number of the source data items and the target data items from the primary temporary source data file and the primary temporary target file, respectively, construct a secondary temporary source data file and a secondary temporary target file, respectively, and calculate a secondary source data digest value of the secondary temporary source data file and a secondary target data digest value of the secondary temporary target file, where if the secondary source data digest value and the secondary target data digest value are different, the verification result of the second preset number of the target data items is a verification error; the second preset number is smaller than the first preset number;
And a comparing unit 240, configured to compare the second preset number of the target data items and the second preset number of the source data items line by line, so as to obtain an error data item.
Optionally, the comparing unit 240 is configured to:
And identifying the position of the source data item by using a source offset value, identifying the position of the target data item by using a target offset value, comparing the character strings of the second preset number of the target data items and the second preset number of the source data items row by row, gradually increasing the source offset value and the target offset value if the character string comparison is the same, and identifying the target data item corresponding to the target offset value as an error data item if the character string comparison is different.
Optionally, the source offset value includes a primary source offset value and a secondary source offset value, and the target offset value includes a primary target offset value and a secondary target offset value; the first-level source offset value is used for identifying positions of all the source data items ordered according to a preset sequence, the first-level target offset value is used for identifying positions of all the target data items ordered according to the preset sequence, the second-level source offset value is used for identifying positions of a second preset number of the source data items ordered according to the preset sequence, and the second-level target offset value is used for identifying positions of the second preset number of the target data items ordered according to the preset sequence;
the comparing unit 240 is configured to:
Comparing character strings of the second preset number of the target data items with the second preset number of the source data items row by row, and gradually increasing the secondary source offset value and the secondary target offset value if the character string comparison is the same;
If the character string comparison is different, confirming the target data item corresponding to the secondary target offset value as an error data item, comparing the character string of the source data item with the character string of the target data item, and if the character string of the source data item is larger than the character string of the target data item, incrementing the primary target offset value of the error data item by one; if the character string of the source data item is smaller than that of the target data item, incrementing the primary source offset value of the error data item by one.
Optionally, the splitting unit 210 is configured to:
Splitting the source data and the target data according to a preset data amount and sequencing the source data and the target data according to a splitting sequence to obtain a plurality of sequenced source data items and a plurality of sequenced target data items.
Optionally, the splitting unit 210 is configured to:
Splitting the source data and the target data according to a preset data amount, and sorting the split multiple source data items and multiple target data items by merge sort to obtain sorted multiple source data items and multiple target data items.
Optionally, the apparatus further comprises a conversion unit for:
And carrying out data mapping on the source data to obtain source data which accords with the data format of the target data.
Optionally, the device further comprises a splicing unit, and the splicing unit is used for:
And acquiring initial data and original data, extracting important data in the initial data, splicing the important data to obtain the source data, and extracting the important data in the original data, splicing the important data to obtain the target data.
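For the optional merge-sort variant of the splitting unit described above, one possible shape is to sort each split run independently and then merge the sorted runs. The sketch below is an in-memory illustration under that assumption; the function name `split_and_sort` is hypothetical, and a real deployment handling large migration data would more likely keep the runs in files.

```python
import heapq


def split_and_sort(rows, chunk_size):
    """Split rows into runs of chunk_size, sort each run, then merge the runs."""
    runs = [
        sorted(rows[start:start + chunk_size])
        for start in range(0, len(rows), chunk_size)
    ]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*runs))


print(split_and_sort(["d", "a", "c", "b", "e"], chunk_size=2))
```

Applying the same function to the source data and to the target data gives both sides the same preset order, which the digest comparison and the offset walk depend on.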
Based on the data verification method provided by the above embodiment, the embodiment of the present application further provides a data verification device, where the data verification device includes:
a processor and a memory, where there may be one or more processors. In some embodiments of the application, the processor and the memory may be connected by a bus or by other means.
The memory may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include NVRAM. The memory stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various underlying services and handling hardware-based tasks.
The processor, which may also be referred to as a CPU, controls the operation of the terminal device.
The method disclosed by the embodiment of the application can be applied to a processor or realized by the processor. The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor described above may be a general purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The embodiments of the present application also provide a computer readable storage medium storing a program code for executing any one of the methods of the foregoing embodiments.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
When introducing elements of various embodiments of the present application, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
It should be noted that, it will be understood by those skilled in the art that all or part of the above-mentioned method embodiments may be implemented by a computer program to instruct related hardware, where the program may be stored in a computer readable storage medium, and the program may include the above-mentioned method embodiments when executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, wherein the units and modules illustrated as separate components may or may not be physically separate. In addition, some or all of the units and modules can be selected according to actual needs to achieve the purpose of the embodiment scheme. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing is merely a preferred embodiment of the present application, and the present application has been disclosed in the above description of the preferred embodiment, but is not limited thereto. Any person skilled in the art can make many possible variations and modifications to the technical solution of the present application or modifications to equivalent embodiments using the methods and technical contents disclosed above, without departing from the scope of the technical solution of the present application. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present application still fall within the scope of the technical solution of the present application.

Claims (10)

1. A method of data validation, the method comprising:
Acquiring source data and target data, splitting the source data and the target data according to a preset data amount to respectively obtain a plurality of source data items and a plurality of target data items; the source data is subjected to data migration to obtain the target data;
Reading a first preset number of source data items and target data items, respectively constructing a primary temporary source data file and a primary temporary target file, respectively calculating a primary source data digest value of the primary temporary source data file and a primary target data digest value of the primary temporary target file, and if the primary source data digest value is the same as the primary target data digest value, determining that the verification result of the first preset number of target data items is verification success;
If the primary source data digest value and the primary target data digest value are different, respectively reading a second preset number of source data items and target data items from the primary temporary source data file and the primary temporary target file, respectively constructing a secondary temporary source data file and a secondary temporary target file, respectively calculating a secondary source data digest value of the secondary temporary source data file and a secondary target data digest value of the secondary temporary target file, and if the secondary source data digest value and the secondary target data digest value are different, determining that the verification result of the second preset number of target data items is a verification error; the second preset number is smaller than the first preset number;
and comparing the target data items of the second preset number with the source data items of the second preset number line by line to obtain error data items.
2. The method of claim 1, wherein the comparing the second predetermined number of the target data items with the second predetermined number of the source data items row by row, obtaining the error data items comprises:
And identifying the position of the source data item by using a source offset value, identifying the position of the target data item by using a target offset value, comparing the character strings of the second preset number of the target data items and the second preset number of the source data items row by row, gradually increasing the source offset value and the target offset value if the character string comparison is the same, and identifying the target data item corresponding to the target offset value as an error data item if the character string comparison is different.
3. The method of claim 2, wherein the source offset value comprises a primary source offset value and a secondary source offset value, and the target offset value comprises a primary target offset value and a secondary target offset value; the first-level source offset value is used for identifying positions of all the source data items ordered according to a preset sequence, the first-level target offset value is used for identifying positions of all the target data items ordered according to the preset sequence, the second-level source offset value is used for identifying positions of a second preset number of the source data items ordered according to the preset sequence, and the second-level target offset value is used for identifying positions of the second preset number of the target data items ordered according to the preset sequence;
the comparing the second preset number of the target data items with the second preset number of the source data items row by row in character strings gradually increases the source offset value and the target offset value if the character string comparison is the same, and if the character string comparison is different, confirming the target data item corresponding to the target offset value as an error data item includes:
Comparing character strings of the second preset number of the target data items with the second preset number of the source data items row by row, and gradually increasing the secondary source offset value and the secondary target offset value if the character string comparison is the same;
If the character string comparison is different, confirming the target data item corresponding to the secondary target offset value as an error data item, comparing the character string of the source data item with the character string of the target data item, and if the character string of the source data item is larger than the character string of the target data item, incrementing the primary target offset value of the error data item by one; if the character string of the source data item is smaller than that of the target data item, incrementing the primary source offset value of the error data item by one.
4. The method of claim 1, wherein splitting the source data and the target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items respectively comprises:
Splitting the source data and the target data according to a preset data amount and ordering the resulting items in split order, to obtain a plurality of sorted source data items and a plurality of sorted target data items.
5. The method of claim 4, wherein splitting the source data and the target data by a preset amount of data and ordering by split order comprises:
Splitting the source data and the target data according to a preset data amount, and sorting the resulting source data items and target data items using merge sort, to obtain the sorted plurality of source data items and the sorted plurality of target data items.
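As an illustration of claims 4 and 5, the sketch below splits a string into fixed-size items and sorts them with a hand-written merge sort. It is a minimal example under stated assumptions: the data is treated as an in-memory string, the preset data amount is a character count, and the names `split_items` and `merge_sort` are hypothetical.

```python
from typing import List


def split_items(data: str, item_size: int) -> List[str]:
    """Split data into consecutive items of item_size characters each."""
    if item_size <= 0:
        raise ValueError("item_size must be positive")
    return [data[i:i + item_size] for i in range(0, len(data), item_size)]


def merge_sort(items: List[str]) -> List[str]:
    """Return a new list with the items in ascending string order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])

    merged: List[str] = []
    i = j = 0
    # Merge the two sorted halves; "<=" keeps the sort stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    items = split_items("ccaabb", 2)
    print(items)
    print(merge_sort(items))
```

Applying the same split size and the same sort to both the source data and the target data puts the two item lists in a common order, which the later row-by-row comparison relies on.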
6. The method according to claim 1, wherein the method further comprises:
Performing data mapping on the source data to obtain source data that conforms to the data format of the target data.
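The claim does not say what the mapping consists of. A minimal sketch, assuming that mapping means renaming source fields to the target's field names and trimming surrounding whitespace, might look like this; the function `map_to_target_format` and every field name used are hypothetical.

```python
from typing import Dict


def map_to_target_format(
    record: Dict[str, str], field_map: Dict[str, str]
) -> Dict[str, str]:
    """Rename mapped source fields to target names and strip whitespace.

    Fields without an entry in field_map are dropped (an assumption).
    """
    return {
        field_map[name]: value.strip()
        for name, value in record.items()
        if name in field_map
    }


if __name__ == "__main__":
    source_record = {"CUST_NO": " 001 ", "AMT": "12.50", "tmp": "x"}
    mapping = {"CUST_NO": "customer_id", "AMT": "amount"}
    print(map_to_target_format(source_record, mapping))
```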
7. The method according to any one of claims 1-6, further comprising:
Acquiring initial data and original data; extracting important data from the initial data and concatenating it to obtain the source data; and extracting important data from the original data and concatenating it to obtain the target data.
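One way to picture the extraction and concatenation step is to select a fixed list of fields from each record and join them into a single comparable string. The sketch below assumes records are dictionaries and that a delimiter separates the joined values; `build_items`, `key_fields` and the separator are hypothetical choices, not taken from the patent.

```python
from typing import Dict, Iterable, List


def build_items(
    records: Iterable[Dict[str, str]], key_fields: List[str], sep: str = "|"
) -> List[str]:
    """Join the selected fields of each record into one string per record."""
    return [sep.join(record[field] for field in key_fields) for record in records]


if __name__ == "__main__":
    initial_data = [
        {"id": "1", "name": "a", "note": "ignored"},
        {"id": "2", "name": "b", "note": "ignored"},
    ]
    print(build_items(initial_data, ["id", "name"]))
```

Running the same function with the same field list over the original data would yield target items that can be compared with the source items string by string.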
8. A data verification device, the device comprising:
The splitting unit is used for acquiring source data and target data, and splitting the source data and the target data according to a preset data amount to obtain a plurality of source data items and a plurality of target data items, respectively; the target data is obtained by performing data migration on the source data;
The first computing unit is used for reading a first preset number of the source data items and of the target data items, constructing a first-level temporary source data file and a first-level temporary target data file respectively, and calculating a first-level source data digest value of the first-level temporary source data file and a first-level target data digest value of the first-level temporary target data file; if the first-level source data digest value is the same as the first-level target data digest value, the verification result for the first preset number of target data items is verification success;
The second computing unit is used for, if the first-level source data digest value and the first-level target data digest value are different, reading a second preset number of the source data items and of the target data items from the first-level temporary source data file and the first-level temporary target data file respectively, constructing a second-level temporary source data file and a second-level temporary target data file respectively, and calculating a second-level source data digest value of the second-level temporary source data file and a second-level target data digest value of the second-level temporary target data file; if the second-level source data digest value and the second-level target data digest value are different, the verification result for the second preset number of target data items is verification error; the second preset number is smaller than the first preset number;
and the comparison unit is used for comparing the second preset number of target data items with the second preset number of source data items row by row to obtain error data items.
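The two-level digest check performed by the first and second computing units can be sketched as follows. This is a simplified illustration, not the claimed device: it works on in-memory lists instead of temporary files, uses SHA-256 as the digest (the claims do not name an algorithm), and keeps blocks at fixed, aligned positions. The names `digest`, `find_mismatched_blocks`, `first_n` and `second_n` are hypothetical.

```python
import hashlib
from typing import List


def digest(items: List[str]) -> str:
    """Digest of a block of items, one item per line."""
    h = hashlib.sha256()
    for item in items:
        h.update(item.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def find_mismatched_blocks(
    source_items: List[str],
    target_items: List[str],
    first_n: int,
    second_n: int,
) -> List[int]:
    """Return start indices of second-level blocks whose digests differ."""
    mismatched: List[int] = []
    total = max(len(source_items), len(target_items))
    for start in range(0, total, first_n):
        src_block = source_items[start:start + first_n]
        tgt_block = target_items[start:start + first_n]
        if digest(src_block) == digest(tgt_block):
            continue  # first-level digests match: block verified
        # First-level digests differ: narrow down with smaller blocks.
        block_len = max(len(src_block), len(tgt_block))
        for sub in range(0, block_len, second_n):
            src_sub = src_block[sub:sub + second_n]
            tgt_sub = tgt_block[sub:sub + second_n]
            if digest(src_sub) != digest(tgt_sub):
                mismatched.append(start + sub)
    return mismatched


if __name__ == "__main__":
    source = [str(i) for i in range(8)]
    target = list(source)
    target[5] = "x"  # simulate one row corrupted during migration
    print(find_mismatched_blocks(source, target, first_n=4, second_n=2))
```

Only the small blocks reported here would then need the row-by-row string comparison, which is why the second preset number is chosen smaller than the first.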
9. A data verification device, the device comprising: a processor and a memory;
The memory is used for storing instructions;
The processor is configured to execute the instructions in the memory to perform the method of any one of claims 1-7.
10. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410375035.7A CN118193514A (en) 2024-03-29 2024-03-29 Data verification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410375035.7A CN118193514A (en) 2024-03-29 2024-03-29 Data verification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118193514A true CN118193514A (en) 2024-06-14

Family

ID=91392638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410375035.7A Pending CN118193514A (en) 2024-03-29 2024-03-29 Data verification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118193514A (en)

Similar Documents

Publication Publication Date Title
US10713589B1 (en) Consistent sort-based record-level shuffling of machine learning data
CN108108127B (en) File reading method and system
CN103888254A (en) Network information verification method and apparatus
CN108108260B (en) Resource file verification method and device
US9348832B2 (en) Method and device for reassembling a data file
CN112307374A (en) Jumping method, device and equipment based on backlog and storage medium
CN113157651B (en) Method, system, equipment and medium for renaming resource files of android project in batches
CN111290998A (en) Method, device and equipment for calibrating migration data and storage medium
CN113377740A (en) Railway metadata management method, application method and device
CN104378397A (en) Method and system for issuing incremental updating of program package
CN115237444A (en) Concurrent control method, device and equipment based on version number and storage medium
CN113568604B (en) Method and device for updating wind control strategy and computer readable storage medium
CN113515303A (en) Project transformation method, device and equipment
CN111274202A (en) Electronic contract generating method and device, computer equipment and storage medium
CN117036115A (en) Contract data verification method, device and server
CN118193514A (en) Data verification method, device, equipment and storage medium
CN110069455A (en) A kind of file mergences method and device
CN111737349A (en) Data consistency checking method and device
CN113704123B (en) Interface testing method, device, equipment and storage medium
CN116401229A (en) Database data verification method, device and equipment
CN106326310B (en) Resource encryption updating method for mobile phone client software
CN113342647A (en) Test data generation method and device
CN115002079B (en) Short address generation method, device, equipment and storage medium
CN116662622B (en) Power grid data consistency comparison method and device, medium and electronic equipment
CN112433743B (en) File updating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination