CN111680004B - Method and device for checking migration accuracy of unstructured image file - Google Patents

Info

Publication number
CN111680004B
Authority
CN
China
Prior art keywords
migration
list
image file
unstructured image
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010513532.0A
Other languages
Chinese (zh)
Other versions
CN111680004A (en
Inventor
牛安宇
单亚冰
刘朝晨
李慧
郝炎
秦荣倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202010513532.0A
Publication of CN111680004A
Application granted
Publication of CN111680004B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/119Details of migration of file systems

Abstract

The invention provides a method and a device for checking the migration accuracy of unstructured image files. The method comprises the following steps: reading an unstructured image file from a source system, obtaining an export MD5 value and export attribute information of the unstructured image file, writing them into an export list, and sending the unstructured image file and the export list from the source system to a target system; receiving the unstructured image file and the export list sent by the source system, calculating an import MD5 value and import attribute information of the received unstructured image file, and writing them into an import list; storing the received unstructured image file into the target system according to the export MD5 value and the import MD5 value; and determining the migration accuracy of the unstructured image file according to the export list and the import list. The invention can migrate unstructured image files accurately and efficiently, shortens the time needed to move them from the source system to the target system, and ensures the continuity of banking business.

Description

Method and device for checking migration accuracy of unstructured image file
Technical Field
The present invention relates to the field of data migration technologies, and in particular, to a method and an apparatus for verifying migration accuracy of unstructured image files.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In the construction and development of a bank IT system, multiple system upgrades and large-scale migrations of historical data are common, and ensuring the accuracy of data migration has become a key problem for bank IT engineers. This is especially true for unstructured data: comparing unstructured data is more complex than comparing structured data, the comparison period is longer, and with large data volumes the comparison cannot be completed within a short production window.
Conventional bank data migration checking generally combines automatic checking with service verification: first, the integrity of the migration is ensured by comparing key data items in the databases of the source system and the target system; then a small sample is extracted from the full data set for a service continuity test to ensure the accuracy of the migration.
The existing method checks key data items of the full data set in the database after the migration is completed. When image files are involved, it cannot guarantee that a file was migrated from the source system to the target system without its content being damaged or modified. Moreover, although this approach ensures the integrity of the migration, it takes a long time when the data scale is large, so the check cannot be completed within the production window of a banking system. The time is mainly spent on two steps: reading or exporting database records, and comparing full data sets, where the amount of computation is the Cartesian product of the data scale.
Therefore, providing a new solution to the above technical problems is an issue to be solved in the art.
Disclosure of Invention
The embodiment of the invention provides a method for checking the migration accuracy of unstructured image files, which realizes the efficient and accurate migration of unstructured image files and comprises the following steps:
reading an unstructured image file from a source system, obtaining an export MD5 value and export attribute information of the unstructured image file, writing them into an export list, and sending the unstructured image file and the export list from the source system to a target system;
receiving the unstructured image file and the export list sent by the source system, calculating an import MD5 value and import attribute information of the received unstructured image file, and writing them into an import list; storing the received unstructured image file into the target system according to the export MD5 value and the import MD5 value;
and determining the migration accuracy of the unstructured image file according to the export list and the import list.
The embodiment of the invention also provides a device for checking the migration accuracy of unstructured image files, which comprises:
a data export module, used for reading the unstructured image file from the source system, obtaining the export MD5 value and the export attribute information of the unstructured image file, writing them into the export list, and sending the unstructured image file and the export list from the source system to the target system;
a data import module, used for receiving the unstructured image file and the export list sent by the source system, calculating the import MD5 value and the import attribute information of the received unstructured image file, and writing them into the import list; and storing the received unstructured image file into the target system according to the export MD5 value and the import MD5 value;
and a data checking module, used for determining the migration accuracy of the unstructured image file according to the export list and the import list.
The embodiment of the invention also provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the method for checking the migration accuracy of unstructured image files when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the method for checking the migration accuracy of the unstructured image files.
According to the method and the device for checking the migration accuracy of unstructured image files, the export MD5 value and the export attribute information of the unstructured image file are first obtained in the source system and written into the export list, and the unstructured image file and the export list are sent from the source system to the target system. When the target system receives the unstructured image file and the export list, it calculates the import MD5 value and the import attribute information of the received file, writes them into the import list, and stores the received file into the target system according to the export MD5 value and the import MD5 value. Checking the export MD5 value against the import MD5 value during the migration ensures the accuracy of each unstructured image file while it is being migrated and avoids a full-data comparison of the image files after the migration is completed. This greatly shortens both the transfer time from the source system to the target system and the data checking time after the migration, so that the entire migration check can be completed within a banking production time window and the continuity of banking business is ensured. The migration accuracy of the unstructured image files is then determined according to the export list and the import list; checking the two lists after the migration is completed further ensures the accuracy of the data migration.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a schematic diagram of an apparatus for verifying the migration accuracy of unstructured image files according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the internal operation of the data export module and the data import module of the apparatus for verifying the migration accuracy of unstructured image files according to an embodiment of the present invention.
FIG. 3 is a flow chart illustrating the operation of the data checking module of the apparatus for checking the migration accuracy of unstructured image files according to the embodiment of the invention.
Fig. 4 is a schematic diagram illustrating a distributed comparison of an apparatus for verifying migration accuracy of unstructured image files according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a method for verifying the migration accuracy of unstructured image files according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a computer device running a method for verifying the migration accuracy of unstructured image files according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
As shown in the schematic diagram of FIG. 1, an embodiment of the present invention provides an apparatus for checking the migration accuracy of unstructured image files, which realizes efficient and accurate migration of unstructured image files and includes:
the data export module 101, configured to read an unstructured image file from the source system, obtain an export MD5 value and export attribute information of the unstructured image file, write them into an export list, and send the unstructured image file and the export list from the source system to the target system;
the data import module 102, configured to receive the unstructured image file and the export list sent by the source system, calculate an import MD5 value and import attribute information of the received unstructured image file, and write them into an import list; and to store the received unstructured image file into the target system according to the export MD5 value and the import MD5 value;
the data checking module 103, configured to determine the migration accuracy of the unstructured image file according to the export list and the import list.
According to the device for checking the migration accuracy of unstructured image files, the export MD5 value and the export attribute information of the unstructured image file are first obtained in the source system and written into the export list, and the unstructured image file and the export list are sent from the source system to the target system. When the target system receives the unstructured image file and the export list, it calculates the import MD5 value and the import attribute information of the received file, writes them into the import list, and stores the received file into the target system according to the export MD5 value and the import MD5 value. Checking the export MD5 value against the import MD5 value during the migration ensures the accuracy of each unstructured image file while it is being migrated and avoids a full-data comparison of the image files after the migration is completed. This greatly shortens the comparison time for migrating the files from the source system to the target system, so that the entire migration check can be completed within a banking production time window and the continuity of banking business is ensured. The migration accuracy of the unstructured image files is then determined according to the export list and the import list; checking the two lists after the migration is completed further ensures the accuracy of the data migration.
Unstructured image files occupy a large amount of storage space and are slow to read and write, so checking the full set of unstructured image files after migration is difficult and takes a long time. This cannot meet the requirements of a banking system, which demands highly stable production operation and has only a short production window; the data check must be completed quickly after the migration to ensure the accuracy of the migrated data. If the prior-art full data set comparison scheme is used, the amount of computation is the Cartesian product of the data size (assuming that n records are migrated, the comparison requires $n \times n = n^{2}$ operations). A more accurate migration flow and a more efficient checking method are therefore required to ensure the accuracy of unstructured data migration in banking.
The embodiment of the invention provides a device for checking the migration accuracy of unstructured image files, which comprises the following components:
the data export module, used for reading the unstructured image file from the source system, obtaining the export MD5 value and the export attribute information of the unstructured image file, writing them into the export list, and sending the unstructured image file and the export list from the source system to the target system;
the data import module, used for receiving the unstructured image file and the export list sent by the source system, calculating the import MD5 value and the import attribute information of the received unstructured image file, and writing them into the import list; and storing the received unstructured image file into the target system according to the export MD5 value and the import MD5 value;
and the data checking module, used for determining the migration accuracy of the unstructured image file according to the export list and the import list.
By comparing the export MD5 value with the import MD5 value, the embodiment of the invention can store the received unstructured image file into the target system accurately.
The MD5 algorithm used by the above device is a widely used cryptographic hash function that generates a 128-bit (16-byte) hash value, which serves here to verify that information is transmitted completely and consistently. The embodiment of the invention also uses distributed computation: the overall computation is broken down into many small parts and distributed to multiple computing nodes, which may comprise one or more computers or servers. This reduces the overall calculation time and greatly improves calculation efficiency. However, the decomposition must be chosen to suit the scenario, so that the distributed result is guaranteed to be the same as the result of a centralized calculation.
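As a minimal illustration of the MD5 property relied on above, the following Python sketch computes digests with the standard `hashlib` module. The byte strings are hypothetical stand-ins for image content and do not come from the patent.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


# Hypothetical file contents: the second differs from the first by one byte.
original = b"scanned-voucher-bytes"
damaged = b"scanned-voucher-bytez"

digest_size = len(hashlib.md5(original).digest())  # size in bytes (128 bits)
same_content_matches = md5_hex(original) == md5_hex(original)
damaged_content_matches = md5_hex(original) == md5_hex(damaged)

print(digest_size, same_content_matches, damaged_content_matches)
```

Identical content always yields the same digest, while the altered content yields a different one, which is what allows a per-file check during migration.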
FIG. 2 is an internal workflow diagram of the data export module and the data import module of the apparatus for verifying the migration accuracy of unstructured image files according to an embodiment of the present invention. As shown in FIG. 2, the data export module runs in the source system and the data import module runs in the target system. Taking the data storage characteristics of the source system into account, the data export module generates the unstructured image files and a list file in a specified format; when a file is exported, it calculates the MD5 value and the export attribute information of the file and writes them into the export list. A copy of the export list is kept locally, and the export list and the unstructured image files are transmitted to the target system through FTP.
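The export step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the CSV layout, the column names `file_id`, `size_bytes` and `md5`, and the use of the file name stem as the image file ID are all hypothetical, and the FTP transfer is omitted.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are never fully held in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_export_list(image_dir: Path, list_path: Path) -> int:
    """Write one row per image (assumed columns) and return the row count."""
    rows = 0
    with list_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file_id", "size_bytes", "md5"])
        for image in sorted(image_dir.iterdir()):
            if image.is_file():
                writer.writerow([image.stem, image.stat().st_size, file_md5(image)])
                rows += 1
    return rows


# Usage with two hypothetical image files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    (images / "a1.tif").write_bytes(b"first image")
    (images / "b2.tif").write_bytes(b"second image")
    export_list = Path(tmp) / "export_list.csv"
    count = write_export_list(images, export_list)
    lines = export_list.read_text(encoding="utf-8").splitlines()

print(count, lines[0])
```

The list file is written outside the image directory so that it is not itself picked up as an image.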
The data import module reads the unstructured image files and the export list generated by the data export module, calculates the import MD5 value and the import attribute information of each received unstructured image file, and compares the import MD5 value with the export MD5 value recorded in the export list file. After the check passes, it stores the received unstructured image file into the target system and writes the import MD5 value and the import attribute information of the file into the import list. The process uses distributed transactions to keep the result of the file import and the import list consistent.
In an embodiment of the apparatus for verifying the migration accuracy of unstructured image files, the aforementioned data import module is specifically configured to:
compare the export MD5 value with the import MD5 value;
and, when the export MD5 value is the same as the import MD5 value, store the received unstructured image file into the target system.
In this embodiment, the export MD5 value is compared with the import MD5 value, and the received unstructured image file is stored in the target system only when the two values are the same. This ensures the accuracy of the unstructured image file during the migration and avoids a full-data comparison of the image files after the migration is completed. The data verification time after the migration is thereby greatly shortened, the whole migration check can be completed within a banking production time window, and the continuity of banking business is ensured.
In an embodiment of the apparatus for verifying the migration accuracy of unstructured image files, the aforementioned data import module is further configured to: when the export MD5 value is different from the import MD5 value, treat the unstructured image file corresponding to the import MD5 value as a migration failure file and record it in a migration failure list; and, after all unstructured image files other than the migration failure files have been stored in the target system, migrate the failed files again.
In this embodiment, the target system records files that failed to migrate in a migration failure list and later attempts the migration again. These steps are an auxiliary flow; when no migration fails, no migration failure list is generated.
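The import-side check and the failure list can be sketched as follows. The in-memory dictionaries stand in for the received files, the export list and the target system storage; these structures and the function name `import_files` are assumptions made for illustration, and the retry of failed files is left to the caller.

```python
import hashlib


def import_files(received: dict[str, bytes], export_list: dict[str, str]):
    """Store files whose import MD5 equals the export MD5; list the rest as failures."""
    stored: dict[str, bytes] = {}      # stand-in for the target system storage
    import_list: dict[str, str] = {}   # file ID -> import MD5 of stored files
    failures: list[str] = []           # migration failure list (file IDs)
    for file_id, content in received.items():
        import_md5 = hashlib.md5(content).hexdigest()
        if export_list.get(file_id) == import_md5:
            stored[file_id] = content
            import_list[file_id] = import_md5
        else:
            failures.append(file_id)
    return stored, import_list, failures


# Usage: "id-2" is corrupted in transit, so it lands on the failure list.
good = b"intact image"
export_list = {
    "id-1": hashlib.md5(good).hexdigest(),
    "id-2": hashlib.md5(b"other image").hexdigest(),
}
received = {"id-1": good, "id-2": b"other imagX"}
stored, import_list, failures = import_files(received, export_list)
print(sorted(stored), failures)
```

When every file passes, `failures` stays empty, which corresponds to the case where no migration failure list is generated.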
FIG. 3 is a flow chart illustrating the operation of the data checking module of the apparatus for checking the migration accuracy of unstructured image files according to the embodiment of the invention. As shown in FIG. 3, the data checking module runs after the data migration is completed and is mainly responsible for checking whether the export list is consistent with the import list, to ensure that no file is lost during the migration. Because the comparison after migration involves a large amount of data, and the check must finish within the limited production time window of a banking system, the module uses distributed computing: it groups the contents of each list file by hash value, so that entries whose hash values leave the same remainder when divided by a given positive integer are assigned to the same group; it then uses multiple servers to compare the export list and the import list of each group, and finally gathers the difference data from all groups.
In an embodiment of the present invention, when implementing the apparatus for verifying migration accuracy of unstructured image files, the foregoing data verification module is specifically configured to:
grouping the export list and the import list respectively, and distributing them to a plurality of computing nodes for distributed comparison;
and summarizing and analyzing the distributed comparison result to determine the migration accuracy of the unstructured image file.
In this embodiment, comparing by groups and processing on multiple nodes in parallel improves the efficiency of comparing the migration lists, effectively shortens the checking time after the migration is finished, and ensures the continuity of banking business.
In an embodiment, after the unstructured image files have been migrated from the source system to the target system, the data checking module checks the export list against the import list to verify that the two are consistent and that no file was lost during the migration. Because the comparison after migration involves a large data volume, and the check must finish within the limited production time window of a banking system, the module uses distributed computing: with a reasonable data grouping scheme and a distributed computing method, a large-scale data check can be completed within the specified production time window, and the time for checking the lists is greatly shortened.
In an embodiment of the present invention, when implementing the apparatus for verifying migration accuracy of unstructured image files, the aforementioned data verification module is further configured to:
calculating a hash value of the export attribute information of each exported unstructured image file in the export list;
calculating a hash value of the import attribute information of each received unstructured image file in the import list;
forming, according to a set splitting rule, a plurality of groups of export lists and import lists to be compared, based on the hash values of the export attribute information in the export list and the hash values of the import attribute information in the import list;
and distributing the groups of export lists and import lists to be compared to a plurality of computing nodes for simultaneous computation, thereby carrying out the distributed comparison.
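The distribution and summary steps can be sketched as follows, assuming the lists have already been split into matching groups. A thread pool stands in for the computing nodes, which in the embodiment are separate servers; the function names and the three difference categories are illustrative choices, not terms from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_group(pair):
    """Compare one export group with the import group of the same index."""
    export_group, import_group = pair
    export_ids, import_ids = set(export_group), set(import_group)
    missing = sorted(export_ids - import_ids)      # exported but never imported
    unexpected = sorted(import_ids - export_ids)   # imported but never exported
    changed = sorted(
        file_id
        for file_id in export_ids & import_ids
        if export_group[file_id] != import_group[file_id]
    )
    return missing, unexpected, changed


def distributed_compare(group_pairs, workers: int = 4):
    """Run every group comparison in parallel and merge the differences."""
    summary = {"missing": [], "unexpected": [], "changed": []}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for missing, unexpected, changed in pool.map(compare_group, group_pairs):
            summary["missing"] += missing
            summary["unexpected"] += unexpected
            summary["changed"] += changed
    return summary


# Usage: two hypothetical groups, each a pair (export entries, import entries).
group_pairs = [
    ({"a": "m1", "b": "m2"}, {"a": "m1"}),
    ({"c": "m3"}, {"c": "m9", "d": "m4"}),
]
result = distributed_compare(group_pairs)
print(result)
```

An empty summary would indicate that the two lists agree for every group.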
In an embodiment of the present invention, when implementing the apparatus for verifying migration accuracy of unstructured image files, the aforementioned data verification module is further configured to:
setting the grouping number;
dividing the hash value of the export attribute information of each exported unstructured image file in the export list by the set number of groups to obtain an export list remainder;
dividing the hash value of the import attribute information of each received unstructured image file in the import list by the set number of groups to obtain an import list remainder;
and assigning unstructured image file attribute information whose export list remainder and import list remainder are equal to the same comparison group, to form a plurality of groups of export lists and import lists to be compared.
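The remainder-based grouping can be sketched as follows. The patent does not name a hash function for this step; `zlib.crc32` is used here as an assumption because it returns the same value on every machine, whereas Python's built-in `hash()` for strings varies between processes and would send the same ID to different groups on different nodes.

```python
import zlib
from collections import defaultdict


def group_index(file_id: str, group_count: int) -> int:
    """Remainder of a stable hash of the file ID divided by the number of groups."""
    return zlib.crc32(file_id.encode("utf-8")) % group_count


def split_list(entries: dict[str, str], group_count: int) -> dict[int, dict[str, str]]:
    """Split a list (file ID -> MD5) into groups keyed by the hash remainder."""
    groups: dict[int, dict[str, str]] = defaultdict(dict)
    for file_id, md5_value in entries.items():
        groups[group_index(file_id, group_count)][file_id] = md5_value
    return dict(groups)


# Usage: hypothetical lists with 1000 entries split into 100 groups.
export_list = {f"id-{i}": f"md5-{i}" for i in range(1000)}
import_list = dict(export_list)
export_groups = split_list(export_list, 100)
import_groups = split_list(import_list, 100)
print(len(export_groups), export_groups == import_groups)
```

Because the group depends only on the file ID, an entry present in both lists always falls into the same group on both sides, so each group can be compared independently.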
FIG. 4 is a schematic diagram illustrating a distributed comparison of an apparatus for verifying migration accuracy of unstructured image files according to an embodiment of the present invention. As shown in FIG. 4, in an embodiment, the export attribute information of the exported unstructured image files in the export list and the import attribute information of the received unstructured image files in the import list may include an unstructured image file ID (hereinafter, ID), for example the 32-character UUID "b75c823182f543c695f71f6d0527009f". If two lists of $10^{8}$ detail records each are compared directly to find the differences, the amount of computation is $10^{8} \times 10^{8} = 10^{16}$. Moreover, the computation can only run on one machine, takes a long time, and cannot be completed within the specified production time window.
The export list and the import list are each divided into 100 groups according to a set splitting rule. The rule takes the hash value of the image file ID and its remainder after division by 100: if the result is 0, the entry is assigned to group 1; if the result is 1, to group 2; and so on, until a result of 99 is assigned to group 100. The data is thus divided into 100 groups of small files with about $10^{6}$ records each, and any entry that appears in both the export list and the import list is guaranteed to be assigned to the same group. The export small file and the import small file of each group are compared, and the 100 comparison results are finally summarized to obtain the final differences.
The amount of computation with this method is $10^{6} \times 10^{6} \times 100 = 10^{14}$, and the work can be distributed to 100 nodes and executed simultaneously, so the elapsed time is $1/10^{4}$ of that of direct comparison and the comparison efficiency is greatly improved. In addition, if enough computing resources are available, the number of groups can be increased further to improve efficiency even more. For example, if the lists are divided into 1000 groups, each group holds $10^{5}$ records, the amount of computation becomes $10^{5} \times 10^{5} \times 1000 = 10^{13}$, and the work can be distributed to 1000 nodes and executed simultaneously, so the elapsed time is $1/10^{6}$ of that of direct comparison. In general, if the full data is divided into n groups and distributed to n nodes for simultaneous computation, the elapsed time is $1/n^{2}$ of that of direct comparison. With this checking method, the overall checking efficiency therefore improves with the square of the number of groups.
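The figures in this example can be reproduced with a few lines of integer arithmetic. The sketch assumes, as the text does, that comparing two lists costs the product of their sizes and that each node handles exactly one group.

```python
def comparison_cost(records: int, groups: int) -> tuple[int, int]:
    """Return (total pairwise comparisons, comparisons handled by one node)."""
    per_group = records // groups
    per_node = per_group * per_group   # one group pair per node
    total = per_node * groups
    return total, per_node


direct_total, _ = comparison_cost(10**8, 1)
total_100, per_node_100 = comparison_cost(10**8, 100)
total_1000, per_node_1000 = comparison_cost(10**8, 1000)

# Speed-up in elapsed time when all nodes run simultaneously.
speedup_100 = direct_total // per_node_100
speedup_1000 = direct_total // per_node_1000
print(direct_total, total_100, speedup_100, total_1000, speedup_1000)
```

With n groups on n nodes the per-node work is (records / n) squared, which is where the factor of n squared in elapsed time comes from.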
The embodiment of the invention also provides a method for checking the migration accuracy of the unstructured image file, which is described in the following embodiment. Because the principle of solving the problem of the method is similar to that of a device for checking the migration accuracy of the unstructured image file, the implementation of the method can refer to the implementation of the device for checking the migration accuracy of the unstructured image file, and the repetition is omitted.
FIG. 5 is a schematic diagram of a method for verifying the migration accuracy of unstructured image files according to an embodiment of the present invention. As shown in fig. 5, an embodiment of the present invention further provides a method for checking migration accuracy of an unstructured image file, including:
step 501: reading an unstructured image file from a source system, obtaining an export MD5 value and export attribute information of the unstructured image file, writing them into an export list, and sending the unstructured image file and the export list from the source system to a target system;
step 502: receiving the unstructured image file and the export list sent by the source system, calculating an import MD5 value and import attribute information of the received unstructured image file, and writing them into an import list; storing the received unstructured image file into the target system according to the export MD5 value and the import MD5 value;
step 503: determining the migration accuracy of the unstructured image file according to the export list and the import list.
In an embodiment of the method for verifying the migration accuracy of unstructured image files, storing the received unstructured image file in the target system according to the export MD5 value and the import MD5 value includes:
comparing the export MD5 value with the import MD5 value;
and, when the export MD5 value is the same as the import MD5 value, storing the received unstructured image file into the target system.
In an embodiment of the method for verifying the migration accuracy of unstructured image files, when the export MD5 value is different from the import MD5 value, the unstructured image file corresponding to the import MD5 value is treated as a migration failure file and recorded in a migration failure list, and it is migrated again after all unstructured image files other than the migration failure files have been stored in the target system.
In an embodiment of the method for verifying the migration accuracy of unstructured image files, determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list includes:
grouping the migrate-out list and the migrate-in list respectively, and distributing the groups to a plurality of computing nodes for distributed comparison;
and summarizing and analyzing the distributed comparison results to determine the migration accuracy of the unstructured image file.
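A rough sketch of the distribute-and-summarize step is given below. It assumes the two lists have already been split into matching groups, each group being a pair of sets of hashable records. A thread pool stands in for the plurality of computing nodes, and the names `compare_group` and `distributed_compare` are hypothetical; a real deployment would dispatch the groups to separate servers.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_group(group):
    """Compare one group's sender-side and receiver-side records."""
    out_records, in_records = group
    return {
        "missing": out_records - in_records,      # sent but not recorded on arrival
        "unexpected": in_records - out_records,   # recorded on arrival but never sent
    }


def distributed_compare(groups, workers=4):
    """Run compare_group on every group concurrently and summarize the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(compare_group, groups))
    missing = set().union(*(r["missing"] for r in results))
    unexpected = set().union(*(r["unexpected"] for r in results))
    return {
        "accurate": not missing and not unexpected,
        "missing": missing,
        "unexpected": unexpected,
    }
```

Because each group is compared independently, the summary only needs to merge the per-group differences; the migration is judged accurate when both merged sets are empty.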
In an embodiment of the method for verifying the migration accuracy of unstructured image files, grouping the migrate-out list and the migrate-in list respectively and distributing them to a plurality of computing nodes for distributed comparison includes:
calculating hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list;
calculating hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list;
forming, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared, based on the hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list and the hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list;
and distributing the plurality of groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous computation, and carrying out distributed comparison.
In an embodiment of the method for verifying the migration accuracy of unstructured image files, forming, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared, based on the hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list and the hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list, includes:
setting the number of groups;
dividing the hash value of the migrate-out attribute information of each unstructured image file migrated out in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
dividing the hash value of the migrate-in attribute information of each unstructured image file received in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assigning the unstructured image file attribute information whose migrate-out list remainder and migrate-in list remainder are the same to the same comparison group, to form the plurality of groups of migrate-out lists and migrate-in lists to be compared.
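The remainder-based splitting rule can be illustrated with the short sketch below. The text does not name a hash function, so CRC32 from the standard library is used here purely as an assumption; it is chosen because Python's built-in `hash()` for strings is randomized per process by default and would not give the same remainder on different nodes. The function names and the string format of the attribute records are likewise hypothetical.

```python
import zlib


def group_index(attribute_info, group_count):
    """Remainder of a stable hash of the attribute information divided by the group count."""
    return zlib.crc32(attribute_info.encode("utf-8")) % group_count


def split_into_groups(records, group_count):
    """Partition attribute records into group_count buckets keyed by hash remainder."""
    groups = [set() for _ in range(group_count)]
    for record in records:
        groups[group_index(record, group_count)].add(record)
    return groups


def pair_groups(sender_list, receiver_list, group_count):
    """Pair buckets with equal remainders so each pair can be compared independently."""
    out_groups = split_into_groups(sender_list, group_count)
    in_groups = split_into_groups(receiver_list, group_count)
    return list(zip(out_groups, in_groups))
```

Since identical attribute information always yields the same remainder, a record present in both lists lands in the same comparison group on both sides, so no cross-group comparison is needed.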
As shown in fig. 6, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the above method for checking the migration accuracy of the unstructured image file when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the method for checking the migration accuracy of the unstructured image files.
The data migration program of the embodiment of the invention is divided into two parts: a migrate-out program running in the migrate-out system and a migrate-in program running in the migrate-in system. The consistency between the migrate-out list generated by the migrate-out program and the migrated-out files is a key constraint on the accuracy of the subsequent data check. When the migrate-out program exports data, verifying the migrate-out list against the corresponding image files ensures that the list and the migrated-out image files are consistent; when the migrate-in program writes files to the storage device and writes the migrate-in list to the database, the use of distributed transactions ensures that the migrated-in files and the migrate-in list are consistent. An MD5 value verification step is introduced into the file migration process to ensure that files are not damaged during migration. The data checking step uses a distributed computing method to divide the data into a plurality of groups; comparing within groups not only greatly reduces the amount of computation but also allows different groups to be compared in parallel, thereby greatly shortening the checking time.
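To make the claimed reduction in computation concrete, consider a back-of-the-envelope estimate. It assumes, purely for illustration, that each list has N entries, that entries are compared pairwise, and that the G groups are of roughly equal size; the text does not specify the comparison algorithm, so these are assumptions rather than properties of the method.

```latex
C_{\text{ungrouped}} = N^{2}, \qquad
C_{\text{grouped}} = G \cdot \left(\frac{N}{G}\right)^{2} = \frac{N^{2}}{G}
```

Under these assumptions, with N = 10^6 and G = 100 the comparison count falls from 10^12 to 10^10, and spreading the G groups over P nodes divides the elapsed time by roughly a further factor of P.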
In actual production, if files must remain undamaged and their content unchanged during migration, this must be ensured by checking the MD5 value of each file during migration; if the data must be migrated without duplication or omission, the data entries of the migrate-out system and the migrate-in system must be checked against each other after the migration; and if a large number of list entries must be checked within a limited production time window, the data checking algorithm must be optimized and multiple servers used for parallel computation to improve comparison efficiency.
The embodiment of the invention adopts a double check: the MD5 value of each file is checked during migration, and the lists are checked after the migration is completed, thereby ensuring the accuracy of data migration. The list check uses a reasonable data grouping scheme and a distributed computing method, so that large-scale data checking can be completed within the specified production time window. Together, these two measures ensure the accuracy of large-scale image data migration.
In summary, the method and the device for verifying the migration accuracy of unstructured image files provided by the embodiments of the present invention first obtain the migrate-out MD5 value and migrate-out attribute information of the unstructured image file in the migrate-out system, write them to the migrate-out list, and send the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system. When the migrate-in system receives the unstructured image file and the migrate-out list, it calculates the migrate-in MD5 value and migrate-in attribute information of the received file, writes them to the migrate-in list, and stores the received file in the migrate-in system only after comparing the migrate-out MD5 value with the migrate-in MD5 value. Checking the two MD5 values during migration ensures the accuracy of each unstructured image file as it is migrated and avoids a full comparison of file contents after the migration is completed. This greatly shortens both the time needed to migrate the unstructured image files from the migrate-out system to the migrate-in system and the time needed to check the data, so that the whole migration and checking process can be completed within a banking production time window and the continuity of banking business is ensured. The migration accuracy of the unstructured image files is then determined according to the migrate-out list and the migrate-in list; checking the two lists after the data migration is completed further ensures the accuracy of the migration. Because the comparison after migration involves a large volume of data, a distributed computing approach is adopted so that the check can be finished within the limited production time window of a banking system. The migrate-out list and the migrate-in list are checked using a reasonable data grouping scheme and a distributed computing method, so that large-scale data checking can be completed within the specified production time window and the time for checking the lists is greatly shortened.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing embodiments are provided to illustrate the general principles of the invention. They are merely specific embodiments and are not intended to limit the scope of the invention; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A method for verifying the migration accuracy of unstructured image files, comprising:
reading an unstructured image file from a migrate-out system, obtaining a migrate-out MD5 value and migrate-out attribute information of the unstructured image file, writing them to a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to a migrate-in system;
receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating a migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and writing them to a migrate-in list; storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list;
wherein determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list comprises:
grouping the migrate-out list and the migrate-in list respectively, and distributing the groups to a plurality of computing nodes for distributed comparison;
summarizing and analyzing the distributed comparison results to determine the migration accuracy of the unstructured image file;
wherein grouping the migrate-out list and the migrate-in list respectively and distributing them to a plurality of computing nodes for distributed comparison comprises:
calculating hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list;
calculating hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list;
forming, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared, based on the hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list and the hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list;
distributing the plurality of groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous computation, and carrying out distributed comparison;
wherein forming, according to the set splitting rule, the plurality of groups of migrate-out lists and migrate-in lists to be compared comprises:
setting the number of groups;
dividing the hash value of the migrate-out attribute information of each unstructured image file migrated out in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
dividing the hash value of the migrate-in attribute information of each unstructured image file received in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assigning the unstructured image file attribute information whose migrate-out list remainder and migrate-in list remainder are the same to the same comparison group, to form the plurality of groups of migrate-out lists and migrate-in lists to be compared.
2. The method of claim 1, wherein storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value comprises:
comparing the migrate-out MD5 value with the migrate-in MD5 value;
and, when the migrate-out MD5 value is the same as the migrate-in MD5 value, storing the received unstructured image file in the migrate-in system.
3. The method of claim 2, wherein, when the migrate-out MD5 value and the migrate-in MD5 value are different, the unstructured image file corresponding to the migrate-out MD5 value is treated as a migration failure file and recorded in a migration failure list, and is migrated again after all unstructured image files other than the migration failure files have been stored in the migrate-in system.
4. An apparatus for verifying the migration accuracy of unstructured image files, comprising:
a data migrate-out module, configured to read an unstructured image file from a migrate-out system, obtain a migrate-out MD5 value and migrate-out attribute information of the unstructured image file, write them to a migrate-out list, and send the unstructured image file and the migrate-out list from the migrate-out system to a migrate-in system;
a data migrate-in module, configured to receive the unstructured image file and the migrate-out list sent by the migrate-out system, calculate a migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and write them to a migrate-in list; and to store the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
a data checking module, configured to determine the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list;
wherein the data checking module is specifically configured to:
group the migrate-out list and the migrate-in list respectively, and distribute the groups to a plurality of computing nodes for distributed comparison;
summarize and analyze the distributed comparison results to determine the migration accuracy of the unstructured image file;
wherein the data checking module is further configured to:
calculate hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list;
calculate hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list;
form, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared, based on the hash values of the migrate-out attribute information of the unstructured image files migrated out in the migrate-out list and the hash values of the migrate-in attribute information of the unstructured image files received in the migrate-in list;
distribute the plurality of groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous computation, and carry out distributed comparison;
wherein the data checking module is further configured to:
set the number of groups;
divide the hash value of the migrate-out attribute information of each unstructured image file migrated out in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
divide the hash value of the migrate-in attribute information of each unstructured image file received in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assign the unstructured image file attribute information whose migrate-out list remainder and migrate-in list remainder are the same to the same comparison group, to form the plurality of groups of migrate-out lists and migrate-in lists to be compared.
5. The apparatus of claim 4, wherein the data migrate-in module is specifically configured to:
compare the migrate-out MD5 value with the migrate-in MD5 value;
and, when the migrate-out MD5 value is the same as the migrate-in MD5 value, store the received unstructured image file in the migrate-in system.
6. The apparatus of claim 5, wherein the data migrate-in module is further configured to: when the migrate-out MD5 value is different from the migrate-in MD5 value, treat the unstructured image file corresponding to the migrate-out MD5 value as a migration failure file and record it in a migration failure list, and migrate it again after all unstructured image files other than the migration failure files have been stored in the migrate-in system.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of checking for migration accuracy of unstructured image files according to any of claims 1 to 3 when the computer program is executed by the processor.
8. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for executing a method of achieving the verification of migration accuracy of unstructured image files according to any one of claims 1 to 3.
CN202010513532.0A 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file Active CN111680004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010513532.0A CN111680004B (en) 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file

Publications (2)

Publication Number Publication Date
CN111680004A CN111680004A (en) 2020-09-18
CN111680004B true CN111680004B (en) 2023-09-22

Family

ID=72454042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010513532.0A Active CN111680004B (en) 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file

Country Status (1)

Country Link
CN (1) CN111680004B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345430A (en) * 2013-07-03 2013-10-09 中国科学院高能物理研究所 Distributed type storage pool fuzzy balancing method and system
CN106484690A (en) * 2015-08-24 2017-03-08 阿里巴巴集团控股有限公司 A kind of verification method of Data Migration and device
CN107037978A (en) * 2016-10-31 2017-08-11 福建亿榕信息技术有限公司 Data Migration bearing calibration and system
CN110032339A (en) * 2019-04-12 2019-07-19 北京旷视科技有限公司 Data migration method, device, system, equipment and storage medium
CN111125063A (en) * 2019-12-20 2020-05-08 无线生活(杭州)信息科技有限公司 Method and device for rapidly verifying data migration among clusters
CN111158900A (en) * 2019-12-09 2020-05-15 中国船舶重工集团公司第七一六研究所 Lightweight distributed parallel computing system and method

Also Published As

Publication number Publication date
CN111680004A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
EP3678346A1 (en) Blockchain smart contract verification method and apparatus, and storage medium
US11138031B2 (en) Framework for authoring data loaders and data savers
US20210049715A1 (en) Blockchain-based data procesing method, apparatus, and electronic device
CN111444196B (en) Method, device and equipment for generating Hash of global state in block chain type account book
CN104252481A (en) Dynamic check method and device for consistency of main and salve databases
CN102667734B (en) System and method for checking consistency of pointers in hierarchical database
TWI730690B (en) Method and device for simultaneously executing transactions in block chain, computer readable storage medium and computing equipment
CN110555770B (en) Block chain world state checking and recovering method based on incremental hash
WO2024021362A1 (en) Data verification method and apparatus for traffic replay
CN106897342A (en) A kind of data verification method and equipment
CN109446211A (en) A kind of consistency desired result method and device
CN108009223B (en) Method and device for detecting consistency of transaction data
CN114564499A (en) Lightweight financial data query, quantitative strategy development and retest method and device
CN107329966B (en) Machine data storage method and system
CN112948473A (en) Data processing method, device and system of data warehouse and storage medium
CN109857806B (en) Synchronous verification method and device for database table
CN110706108B (en) Method and apparatus for concurrently executing transactions in a blockchain
CN111680004B (en) Method and device for checking migration accuracy of unstructured image file
CN115373889A (en) Method and device for data comparison verification and data repair in data synchronization
CN115114297A (en) Data lightweight storage and search method and device, electronic equipment and storage medium
CN114356768A (en) Method and device for reducing transaction read-write conflict through placeholder
US20200204348A1 (en) Verifying a blockchain-type ledger
JP2023507688A (en) edge table representation of the process
US11768855B1 (en) Replicating data across databases by utilizing validation functions for data completeness and sequencing
CN109542900B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant