CN111680004A - Method and device for checking migration accuracy of unstructured image file - Google Patents

Method and device for checking migration accuracy of unstructured image file

Info

Publication number
CN111680004A
Authority
CN
China
Prior art keywords
migration
list
unstructured image
value
image file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010513532.0A
Other languages
Chinese (zh)
Inventor
牛安宇
单亚冰
刘朝晨
李慧
郝炎
秦荣倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202010513532.0A
Publication of CN111680004A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/11: File system administration, e.g. details of archiving or snapshots
    • G06F16/119: Details of migration of file systems

Abstract

The invention provides a method and a device for checking the migration accuracy of an unstructured image file. The method comprises: reading the unstructured image file from the migrate-out (source) system, obtaining the migrate-out MD5 value and migrate-out attribute information of the file, writing them into a migrate-out list, and sending the file and the migrate-out list from the migrate-out system to the migrate-in (target) system; receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and migrate-in attribute information of the received file, and writing them into a migrate-in list; storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value; and determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list. The invention allows unstructured image files to be migrated accurately and efficiently, shortens the time needed to move them from the migrate-out system to the migrate-in system, and ensures the continuity of banking business.

Description

Method and device for checking migration accuracy of unstructured image file
Technical Field
The invention relates to the technical field of data migration, in particular to a method and a device for checking migration accuracy of an unstructured image file.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
During the construction and development of a bank IT system, system upgrades and large-scale migrations of historical data are carried out many times, and ensuring the accuracy of data migration has become a key problem for bank IT engineers. This is especially true for unstructured data: comparing unstructured data is more complex and takes longer than comparing structured data, so with large data volumes the comparison cannot be completed within a short production window.
The existing bank data migration check mode usually adopts a mode of combining automatic check and business verification: firstly, ensuring the integrity of data migration by comparing key data items in a database of a data migration source system and a database of a target system; and then, extracting a small part of data from the full amount of data to perform a service continuity test so as to ensure the accuracy of data migration.
The existing method checks key data items of the full data set in the database after data migration is finished. When image files are involved, it cannot guarantee that a file is migrated from the source system to the target system without damage or content modification. In addition, although checking key data items of the full data set ensures the completeness of the migration, the data scale is large and the check is slow, so the migrated data cannot be checked within the production window of a bank system. The time is mainly spent in two places: reading or exporting database records; and the comparison itself, which is performed over the full data set, so that the amount of computation is the Cartesian product of the data scale.
Therefore, how to provide a new solution, which can solve the above technical problems, is a technical problem to be solved in the art.
Disclosure of Invention
The embodiment of the invention provides a method for checking the migration accuracy of an unstructured image file, which realizes the efficient and accurate migration of the unstructured image file and comprises the following steps:
reading the unstructured image file from the migrate-out system, obtaining the migrate-out MD5 value and migrate-out attribute information of the unstructured image file, writing them into a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and writing them into a migrate-in list; storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
and determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list.
The embodiment of the invention also provides a device for checking the migration accuracy of the unstructured image file, which comprises:
the data migrate-out module, used for reading the unstructured image file from the migrate-out system, obtaining the migrate-out MD5 value and migrate-out attribute information of the unstructured image file, writing them into a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
the data migrate-in module, used for receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and writing them into a migrate-in list; and storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
and the data checking module, used for determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list.
The embodiment of the invention also provides computer equipment comprising a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor executes the computer program to implement the method for checking the migration accuracy of the unstructured image file.
An embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program for implementing the method for checking the migration accuracy of an unstructured image file.
In the method and device for checking the migration accuracy of unstructured image files provided by the embodiments of the invention, the migrate-out MD5 value and migrate-out attribute information of each unstructured image file are first obtained in the migrate-out system and written into a migrate-out list, and the unstructured image files and the migrate-out list are sent from the migrate-out system to the migrate-in system. When the migrate-in system receives them, it calculates the migrate-in MD5 value and migrate-in attribute information of each received file, writes them into a migrate-in list, and stores the received file according to the migrate-out MD5 value and the migrate-in MD5 value. Checking the migrate-out MD5 value against the migrate-in MD5 value during migration ensures the accuracy of the unstructured image files in transit and avoids a full-data comparison of the files after migration is complete. This greatly shortens both the time needed to move the files from the migrate-out system to the migrate-in system and the time needed for post-migration checking, so that the whole migration check can be completed within the production time window of the banking industry and the continuity of banking business is ensured. The migration accuracy of the unstructured image files is then determined according to the migrate-out list and the migrate-in list; this check between the two lists after migration is complete further ensures the accuracy of the data migration.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is a schematic diagram of an apparatus for checking migration accuracy of an unstructured image file according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating the internal workflow of the data migrate-out module and the data migrate-in module of a device for checking migration accuracy of an unstructured image file according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an internal operation of a data checking module of an apparatus for checking migration accuracy of an unstructured image file according to an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating a distributed comparison of an apparatus for checking migration accuracy of an unstructured image file according to an embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating a method for checking migration accuracy of an unstructured image file according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a computer device for performing a method for verifying migration accuracy of an unstructured image file according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Fig. 1 is a schematic diagram of an apparatus for checking migration accuracy of an unstructured image file according to an embodiment of the present invention. As shown in fig. 1, the apparatus realizes efficient and accurate migration of unstructured image files and includes:
the data migrate-out module 101, configured to read an unstructured image file from the migrate-out system, obtain the migrate-out MD5 value and migrate-out attribute information of the unstructured image file, write them into a migrate-out list, and send the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
the data migrate-in module 102, configured to receive the unstructured image file and the migrate-out list sent by the migrate-out system, calculate the migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and write them into a migrate-in list; and to store the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
and the data checking module 103, configured to determine the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list.
In the device provided by the embodiment of the invention, the migrate-out MD5 value and migrate-out attribute information of each unstructured image file are first obtained in the migrate-out system and written into a migrate-out list, and the unstructured image files and the migrate-out list are sent from the migrate-out system to the migrate-in system. When the migrate-in system receives them, it calculates the migrate-in MD5 value and migrate-in attribute information of each received file, writes them into a migrate-in list, and stores the received file according to the migrate-out MD5 value and the migrate-in MD5 value. Checking the migrate-out MD5 value against the migrate-in MD5 value during migration ensures the accuracy of the unstructured image files in transit, avoids a full-data comparison of the files after migration is complete, and greatly shortens the comparison time for files moved from the migrate-out system to the migrate-in system, so that the whole migration check can be completed within the production time window of the banking industry and the continuity of banking business is ensured. The migration accuracy of the unstructured image files is then determined according to the migrate-out list and the migrate-in list; this check between the two lists after migration is complete further ensures the accuracy of the data migration.
Unstructured image files occupy a large amount of storage space and are slow to read and write, so checking the full set of unstructured image files after migration is difficult and slow. This cannot meet the requirements of a bank system, such as high stability of production operation and a short production window, which demand that data checking be completed quickly after the unstructured image files are migrated so as to ensure the accuracy of the migrated data. If the prior art is adopted, the full data set is compared centrally and the amount of computation is the Cartesian product of the data scale (assuming the migrated data amounts to $n$ records, the comparison requires $n \times n$ operations). A more accurate migration flow and a more efficient checking approach are therefore needed to ensure the accuracy of unstructured data migration in banking.
The embodiment of the invention provides a device for checking the migration accuracy of an unstructured image file, which comprises:
the data migrate-out module, used for reading the unstructured image file from the migrate-out system, obtaining the migrate-out MD5 value and migrate-out attribute information of the unstructured image file, writing them into a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
the data migrate-in module, used for receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and writing them into a migrate-in list; and storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
and the data checking module, used for determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list.
According to the embodiment of the invention, the received unstructured image file can be accurately stored in the migrate-in system by comparing the migrate-out MD5 value with the migrate-in MD5 value.
In the apparatus of the above embodiment, the MD5 algorithm is a widely used cryptographic hash function that generates a 128-bit (16-byte) hash value and is used to ensure that information is transmitted completely and consistently. The algorithm generates a "digital fingerprint" for any file (regardless of size, format, or number); checking whether a file's MD5 value is the same before and after transmission confirms that the content of the file was not changed in transit. The embodiment of the invention also uses distributed computation: the overall computation is broken down into many small parts, which are distributed for processing to multiple compute nodes, each of which may comprise one or more computers or servers. This reduces the overall calculation time and greatly improves efficiency. The decomposition must, however, be chosen to suit each scenario, and the result of the distributed computation must be the same as that of a centralized computation.
Fig. 2 is a flowchart illustrating the internal workflow of the data migrate-out module and the data migrate-in module of a device for checking migration accuracy of an unstructured image file according to an embodiment of the present invention. As shown in fig. 2, the data migrate-out module generates the unstructured image files and a list file in a specified format according to the data storage characteristics of the migrate-out system, calculates the migrate-out MD5 value and migrate-out attribute information of each file as it is migrated out, and writes this information into the migrate-out list. It keeps the migrate-out list locally and at the same time transmits the migrate-out list and the unstructured image files to the migrate-in system through FTP.
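As a minimal sketch of the migrate-out step described above, the following Python code computes a file's MD5 value and builds one migrate-out list record. The record fields (`file_id`, `size_bytes`, `md5`) and the function names are illustrative assumptions; the patent does not specify the list format or which attributes are recorded.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_migrate_out_entry(file_id, path):
    """Build one migrate-out list record (field names are hypothetical)."""
    return {
        "file_id": file_id,                    # identifier of the image file
        "size_bytes": os.path.getsize(path),   # example attribute information
        "md5": md5_of_file(path),              # migrate-out MD5 value
    }
```

Reading in chunks matters here because image files can be large; the digest is the same as hashing the whole file at once.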
The data migrate-in module reads the unstructured image files and the migrate-out list generated by the data migrate-out module, calculates the migrate-in MD5 value and migrate-in attribute information of each received file, and compares the migrate-in MD5 value with the migrate-out MD5 value recorded in the migrate-out list file. Once the check passes, it stores the received file in the migrate-in system and writes the file's migrate-in MD5 value and migrate-in attribute information into the migrate-in list. The process uses distributed transactions to ensure that the file migrate-in results and the migrate-in list stay consistent.
In an embodiment of the invention, when the apparatus for checking migration accuracy of an unstructured image file is implemented, the data migrate-in module is specifically configured to:
compare the migrate-out MD5 value with the migrate-in MD5 value;
and store the received unstructured image file in the migrate-in system when the migrate-out MD5 value is the same as the migrate-in MD5 value.
In this embodiment, the received unstructured image file is stored in the migrate-in system only when its migrate-out MD5 value and migrate-in MD5 value are the same. This ensures the accuracy of the unstructured image files during migration, avoids a full-data comparison of the files after migration is complete, and greatly shortens the post-migration checking time, so that the whole migration check can be completed within the production time window of the banking industry and the continuity of banking business is ensured.
In an embodiment of the invention, when the apparatus for checking migration accuracy of an unstructured image file is implemented, the data migrate-in module is further configured to: when the migrate-out MD5 value differs from the migrate-in MD5 value, treat the unstructured image file corresponding to the migrate-out MD5 value as a migration failure file and record it in a migration failure list; and, after all unstructured image files other than the migration failure files have been stored in the migrate-in system, migrate the failed files again.
In this embodiment, the migrate-in system records files that failed to migrate in the migration failure list and then retries their migration. These steps are an auxiliary flow: when no migration failure occurs, no migration failure list is generated.
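The receive-side check and the failure list can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the migrate-in system is modelled as a plain dictionary (`storage`), the lists as Python lists, and the record fields are hypothetical.

```python
import hashlib


def receive_file(entry, payload, storage, migrate_in_list, failure_list):
    """Verify one received file against its migrate-out record before storing it.

    entry   -- migrate-out list record with "file_id" and "md5" (assumed fields)
    payload -- bytes of the file as received by the migrate-in system
    Returns True if the file was stored, False if it was recorded as a failure.
    """
    migrate_in_md5 = hashlib.md5(payload).hexdigest()
    if migrate_in_md5 == entry["md5"]:
        # MD5 values match: store the file and record it in the migrate-in list.
        storage[entry["file_id"]] = payload
        migrate_in_list.append({
            "file_id": entry["file_id"],
            "size_bytes": len(payload),
            "md5": migrate_in_md5,
        })
        return True
    # MD5 values differ: do not store; record the file so it can be migrated again.
    failure_list.append(entry["file_id"])
    return False
```

After a full pass, the IDs left in `failure_list` are the files to resend; if it is empty, no failure list needs to be produced.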
FIG. 3 is a flowchart illustrating the internal operation of the data checking module of an apparatus for checking migration accuracy of an unstructured image file according to an embodiment of the present invention. As shown in fig. 3, the data checking module runs after the data migration is completed and is mainly responsible for checking whether the migrate-out list is consistent with the migrate-in list, so as to ensure that no file is omitted during migration. Because the post-migration comparison involves a large amount of data, and the check must finish within the limited production time window of a bank system, the module uses distributed computing: it groups the contents of each list file by hash value, so that records whose hash values leave the same remainder when divided by a given positive integer fall into the same group, uses several servers to compare the migrate-out and migrate-in lists of each group, and finally collects the difference data from all groups.
In an embodiment of the invention, when the apparatus for checking migration accuracy of an unstructured image file is implemented, the data checking module is specifically configured to:
group the migrate-out list and the migrate-in list respectively, and distribute the groups to a plurality of computing nodes for distributed comparison;
and summarize and analyze the results of the distributed comparison to determine the migration accuracy of the unstructured image files.
In this embodiment, comparing by groups with multi-node parallel processing improves the efficiency of comparing the migration lists, effectively shortens the checking time after migration is finished, and ensures the continuity of banking business.
In this embodiment, after the unstructured image files have been migrated from the migrate-out system to the migrate-in system, the data checking module checks the migrate-out list against the migrate-in list, mainly to confirm that the two lists are consistent and that no file was omitted during migration. Because the post-migration comparison involves a large amount of data, and the check must finish within the limited production time window of a bank system, the module uses distributed computing. Checking the migrate-out list against the migrate-in list with a suitable grouping of the data and a distributed calculation method ensures that large-scale data checking can be completed within the specified production time window and greatly shortens the list-checking time.
In an embodiment of the invention, when the apparatus for checking migration accuracy of an unstructured image file is implemented, the data checking module is further configured to:
calculate the hash value of the migrate-out attribute information of each unstructured image file recorded in the migrate-out list;
calculate the hash value of the migrate-in attribute information of each received unstructured image file recorded in the migrate-in list;
form, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared from the hash values of the migrate-out attribute information and the hash values of the migrate-in attribute information;
and distribute the groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous calculation, thereby performing the distributed comparison.
In an embodiment of the invention, when the apparatus for checking migration accuracy of an unstructured image file is implemented, the data checking module is further configured to:
setting the number of groups;
divide the hash value of the migrate-out attribute information of each unstructured image file in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
divide the hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assign unstructured image file attribute information whose migrate-out list remainder equals the migrate-in list remainder to the same comparison group, forming a plurality of groups of migrate-out lists and migrate-in lists to be compared.
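The grouping rule above can be sketched in Python as follows. Several points are assumptions for illustration: the attribute hashed is the file ID, the hash is `zlib.crc32` (chosen because it gives the same value on every node, unlike Python's built-in `hash()` for strings), each list is a sequence of `(file_id, md5)` pairs, and the groups are compared sequentially in one process, whereas the text assigns each group to a separate computing node.

```python
import zlib
from collections import defaultdict


def group_index(file_id, group_count):
    """Remainder of the ID's hash divided by the number of groups."""
    return zlib.crc32(file_id.encode("utf-8")) % group_count


def split_into_groups(records, group_count):
    """Split (file_id, md5) records into groups keyed by hash remainder."""
    groups = defaultdict(dict)
    for file_id, md5 in records:
        groups[group_index(file_id, group_count)][file_id] = md5
    return groups


def compare_group(out_group, in_group):
    """Compare one migrate-out group with the migrate-in group of the same remainder."""
    out_ids, in_ids = set(out_group), set(in_group)
    return {
        "missing": sorted(out_ids - in_ids),      # migrated out but never received
        "unexpected": sorted(in_ids - out_ids),   # received but not in the migrate-out list
        "mismatched": sorted(i for i in out_ids & in_ids if out_group[i] != in_group[i]),
    }


def check_lists(out_records, in_records, group_count):
    """Group both lists, compare group by group, and summarize the differences."""
    out_groups = split_into_groups(out_records, group_count)
    in_groups = split_into_groups(in_records, group_count)
    summary = {"missing": [], "unexpected": [], "mismatched": []}
    for index in range(group_count):
        result = compare_group(out_groups.get(index, {}), in_groups.get(index, {}))
        for key in summary:
            summary[key].extend(result[key])
    return summary
```

Because the same ID always yields the same remainder, a record in the migrate-out list can only match a record in the migrate-in group with the same index, so no cross-group comparison is needed. Note that this sketch compares by dictionary lookup rather than the pairwise comparison the text assumes when estimating cost.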
Fig. 4 is a schematic diagram illustrating the distributed comparison performed by an apparatus for checking migration accuracy of an unstructured image file according to an embodiment of the present invention. As shown in fig. 4, in an embodiment, the migrate-out attribute information in the migrate-out list and the migrate-in attribute information in the migrate-in list may include the unstructured image file ID, for example a 32-character UUID such as "b75c823182f543c695f71f6d0527009f". If two detailed lists of $10^{8}$ records each are compared directly to find the differences, the amount of computation is $10^{16}$, and the calculation can only be carried out on a single machine; it takes a long time and cannot be finished within the specified production time window.
The migrate-out list and the migrate-in list are each divided into 100 groups according to a set splitting rule: the hash value of the image file ID is taken and divided by 100; if the remainder is 0 the record is assigned to group 1, if the remainder is 1 it is assigned to group 2, and so on, until a remainder of 99 assigns it to group 100. The data is thus divided into 100 groups of about $10^{6}$ records each. This guarantees that the same ID in the migrate-out list and in the migrate-in list is always assigned to the same group of small files. The migrate-out and migrate-in small files of the same group are then compared, and finally the 100 comparison results are summarized to obtain the final difference.
The amount of computation of this method is $10^{6} \times 10^{6} \times 100 = 10^{14}$, and it can be distributed to 100 nodes that execute simultaneously, so the time taken is $1/10^{4}$ of that of a direct comparison and the comparison efficiency is greatly improved. In addition, if computing resources allow, the number of groups can be increased further to improve efficiency even more. For example, if the list is divided into 1000 groups of $10^{5}$ records each, the amount of computation becomes $10^{5} \times 10^{5} \times 1000 = 10^{13}$, and it can be distributed to 1000 nodes that execute simultaneously, so the time taken is $1/10^{6}$ of that of a direct comparison. In general, if the full data is divided into $n$ groups and distributed to $n$ nodes for simultaneous computation, the time taken is $1/n^{2}$ of that of a direct comparison. With this checking method, the overall checking time therefore falls with the square of the number of groups.
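The figures in this example can be reproduced with a few lines of Python. The sketch assumes the cost model used in the text (pairwise comparison within each group), records that divide evenly among the groups, and one node per group; the function names are illustrative.

```python
def pairwise_cost(total_records, group_count):
    """Return (comparisons per node, total comparisons) for a grouped pairwise check."""
    per_group = total_records // group_count   # records in each group of each list
    per_node = per_group * per_group           # pairwise comparisons inside one group
    return per_node, per_node * group_count


def speedup_over_direct(total_records, group_count):
    """Ratio of direct-comparison time to grouped time with one node per group."""
    per_node, _ = pairwise_cost(total_records, group_count)
    return (total_records * total_records) // per_node


# 10^8 records in 100 groups: 10^12 comparisons per node, 10^14 in total.
print(pairwise_cost(10**8, 100))
# Each node finishes in 1/10^4 of the direct-comparison time.
print(speedup_over_direct(10**8, 100))
```

With `group_count` nodes working in parallel, elapsed time is governed by the per-node figure, which is why the ratio equals the square of the number of groups.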
The embodiment of the invention also provides a method for checking the migration accuracy of an unstructured image file, which is described in the following embodiments. Because the principle of solving the problem of the method is similar to that of a device for checking the migration accuracy of the unstructured image files, the implementation of the method can refer to the implementation of the device for checking the migration accuracy of the unstructured image files, and repeated details are omitted.
FIG. 5 is a schematic diagram illustrating a method for checking migration accuracy of an unstructured image file according to an embodiment of the present invention. As shown in fig. 5, an embodiment of the present invention further provides a method for checking migration accuracy of an unstructured image file, including:
step 501: reading the unstructured image file from the migrate-out system, obtaining the migrate-out MD5 value and migrate-out attribute information of the unstructured image file, writing them into a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
step 502: receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and migrate-in attribute information of the received unstructured image file, and writing them into a migrate-in list; storing the received unstructured image file in the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
step 503: determining the migration accuracy of the unstructured image file according to the migrate-out list and the migrate-in list.
In one embodiment of the method for checking migration accuracy of an unstructured image file, storing the received unstructured image file to the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value includes:
comparing the migrate-out MD5 value with the migrate-in MD5 value;
and storing the received unstructured image file to the migrate-in system when the migrate-out MD5 value is the same as the migrate-in MD5 value.
In one embodiment of the method for checking migration accuracy of an unstructured image file, when the migrate-out MD5 value differs from the migrate-in MD5 value, the corresponding unstructured image file is treated as a migration failure file and recorded in a migration failure list; after all unstructured image files other than the migration failure files have been stored in the migrate-in system, the migration failure files are migrated again.
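The compare-then-store behaviour and the failure list can be sketched as below. The function `receive_file` and the in-memory `store` and `failed` containers are hypothetical stand-ins for the receiving system's storage and its failure list; the text does not describe their actual form.

```python
import hashlib

def receive_file(name, data, sender_md5, store, failed):
    """Store the file only if the MD5 computed on receipt equals the sender's MD5."""
    received_md5 = hashlib.md5(data).hexdigest()
    if received_md5 == sender_md5:
        store[name] = data
        return True
    # Mismatch: record the file for a later re-migration pass.
    failed.append(name)
    return False

store, failed = {}, []  # stand-ins for target storage and the failure list
good = b"image-bytes"
receive_file("a.img", good, hashlib.md5(good).hexdigest(), store, failed)
# Simulate corruption in transit: the payload no longer matches the sender's MD5.
receive_file("b.img", b"corrupted", hashlib.md5(b"original").hexdigest(), store, failed)
```

After the first pass, the names left in `failed` would drive the second migration attempt described above.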
In one embodiment of the method for checking migration accuracy of an unstructured image file, determining the migration accuracy of the unstructured image files according to the migrate-out list and the migrate-in list includes:
grouping the migrate-out list and the migrate-in list respectively, and distributing the groups to a plurality of computing nodes for distributed comparison;
and summarizing and analyzing the results of the distributed comparison to determine the migration accuracy of the unstructured image files.
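The summarizing step can be sketched as a simple aggregation over per-group results. The result fields (`matched`, `missing`, `extra`, `mismatched`) and the rule that migration counts as accurate only when no discrepancy is found are assumptions made for illustration; the text does not define the report format.

```python
def summarize(group_results):
    """Add up per-group counters and derive an overall verdict."""
    keys = ("matched", "missing", "extra", "mismatched")
    report = {k: sum(r.get(k, 0) for r in group_results) for k in keys}
    # Assumed rule: accurate only if no group reported any discrepancy.
    report["accurate"] = not (
        report["missing"] or report["extra"] or report["mismatched"]
    )
    return report

# Hypothetical results returned by two computing nodes.
group_results = [
    {"matched": 3, "missing": 0, "extra": 0, "mismatched": 0},
    {"matched": 2, "missing": 1, "extra": 0, "mismatched": 0},
]
report = summarize(group_results)
```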
In one embodiment of the method for checking migration accuracy of an unstructured image file, grouping the migrate-out list and the migrate-in list respectively and distributing the groups to a plurality of computing nodes for distributed comparison includes:
calculating a hash value of the migrate-out attribute information of each migrated-out unstructured image file in the migrate-out list;
calculating a hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list;
forming, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared from the hash values of the migrate-out attribute information in the migrate-out list and the hash values of the migrate-in attribute information in the migrate-in list;
and distributing the plurality of groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous computation, thereby performing the distributed comparison.
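The parallel comparison of already-formed groups can be sketched as follows. A thread pool stands in for the plurality of computing nodes, which is an assumption for the sake of a self-contained example; in production these would presumably be separate servers. The function `compare_group` and the representation of each list as a mapping from an attribute key to an MD5 value are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def compare_group(pair):
    """Compare one group; each side maps an attribute key to an MD5 value."""
    sent, received = pair
    missing = sorted(k for k in sent if k not in received)
    extra = sorted(k for k in received if k not in sent)
    mismatched = sorted(
        k for k in sent if k in received and sent[k] != received[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}

# Two groups of (sent, received) lists; the second contains discrepancies.
groups = [
    ({"a.img": "m1", "b.img": "m2"}, {"a.img": "m1", "b.img": "m2"}),
    ({"c.img": "m3", "d.img": "m4"}, {"c.img": "XX"}),
]

# Each worker handles one group independently of the others.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(compare_group, groups))
```

Because entries in different groups are never compared with each other, the groups need no coordination while they run, which is what allows the comparison to proceed in parallel.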
In one embodiment of the method for checking migration accuracy of an unstructured image file, forming, according to the set splitting rule, the plurality of groups of migrate-out lists and migrate-in lists to be compared from the hash values of the migrate-out attribute information and the hash values of the migrate-in attribute information includes:
setting the number of groups;
dividing the hash value of the migrate-out attribute information of each migrated-out unstructured image file in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
dividing the hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assigning the attribute information of unstructured image files whose migrate-out list remainder equals the migrate-in list remainder to the same comparison group, thereby forming the plurality of groups of migrate-out lists and migrate-in lists to be compared.
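The remainder-based splitting rule can be sketched as below. The text does not name the hash function, so CRC32 from `zlib` is used here purely as an assumption; Python's built-in `hash()` is avoided because it is salted per process for strings and would not give the same group on different nodes. The names `group_index` and `split_list` are hypothetical.

```python
import zlib

def group_index(attr_key, group_count):
    """Hash the attribute key and take the remainder modulo the group count."""
    return zlib.crc32(attr_key.encode("utf-8")) % group_count

def split_list(entries, group_count):
    """Split a list (attribute key -> MD5 value) into group_count groups."""
    groups = [dict() for _ in range(group_count)]
    for key, md5 in entries.items():
        groups[group_index(key, group_count)][key] = md5
    return groups

# The two lists hold the same files in different order.
sent_list = {"a.img": "m1", "b.img": "m2", "c.img": "m3"}
received_list = {"c.img": "m3", "a.img": "m1", "b.img": "m2"}

sent_groups = split_list(sent_list, 4)
received_groups = split_list(received_list, 4)
```

Since the same key always yields the same remainder, a file's entries from both lists land in the group with the same index, so group i of one list only ever needs to be compared with group i of the other.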
As shown in fig. 6, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for checking the migration accuracy of an unstructured image file when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program for implementing the method for checking the migration accuracy of an unstructured image file.
The data migration program of the embodiment of the invention is divided into two parts: a migrate-out program running in the migrate-out system and a migrate-in program running in the migrate-in system. Consistency between the migrate-out list generated by the migrate-out program and the migrated-out files is a key constraint on the accuracy of the subsequent data check. When the migrate-out program migrates files out, it verifies the migrate-out list against the corresponding image files, ensuring that the migrate-out list is consistent with the migrated-out image files; when the migrate-in program writes files to the storage device and writes the migrate-in list to the database, it uses distributed transactions to ensure that the migrated-in files are consistent with the migrate-in list. An MD5 verification step is introduced into the file migration process to ensure that files are not damaged during migration. The data checking step uses distributed computing: the data is divided into a plurality of groups and compared group by group, which greatly reduces the amount of computation, allows different groups to be compared in parallel, and greatly shortens the checking time.
In actual production, if files must be guaranteed to be undamaged and their content unmodified during migration, the MD5 value of each file must be verified during migration; if it must be guaranteed that no data is omitted during migration, the data entries of the migrate-out system and the migrate-in system must be checked against each other after the migration; if a large number of entries in a data list must be checked within a limited production time window, the data checking algorithm needs to be optimized and multiple servers used for parallel computation to improve comparison efficiency.
The embodiment of the invention adopts a double check: the MD5 value of each file is verified during migration, and the lists are checked after the migration is finished, thereby ensuring the accuracy of data migration. The list check uses a reasonable data grouping scheme and distributed computing, ensuring that large-scale data checking can be completed within a specified production time window. Together, these two points ensure the accuracy of large-scale image data migration.
To sum up, the method and device for checking migration accuracy of an unstructured image file according to the embodiments of the present invention first obtain, in the migrate-out system, the migrate-out MD5 value and migrate-out attribute information of each unstructured image file, write them into the migrate-out list, and send the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system. When the migrate-in system receives the unstructured image file and the migrate-out list, it calculates the migrate-in MD5 value and migrate-in attribute information of the received file, writes them into the migrate-in list, and accurately stores the received file to the migrate-in system by comparing the migrate-out MD5 value with the migrate-in MD5 value. Checking the two MD5 values during migration ensures the accuracy of the unstructured image files during data migration and avoids a full data comparison of the files after the migration is complete. This greatly shortens both the time needed to move the files from the migrate-out system to the migrate-in system and the time needed to check the data between the two systems, so that the whole migration and checking process can be completed within a production time window of the banking industry and the continuity of banking business is ensured. The migration accuracy of the unstructured image files is then determined according to the migrate-out list and the migrate-in list; this check between the two lists after the data migration is complete further ensures the accuracy of the data migration. Because the comparison after migration involves a large amount of data, distributed computing is used so that the checking process can be finished within the limited production time window of a bank system.
The check of the migrate-out list and the migrate-in list uses a reasonable data grouping scheme and distributed computing, which ensures that large-scale data checking can be completed within a specified production time window and greatly shortens the list checking time.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method for checking migration accuracy of an unstructured image file, comprising:
reading the unstructured image file from the migrate-out system, acquiring the migrate-out MD5 value and the migrate-out attribute information of the unstructured image file, writing them into a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and the migrate-in attribute information of the received unstructured image file, and writing them into a migrate-in list; storing the received unstructured image file to the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
and determining the migration accuracy of the unstructured image files according to the migrate-out list and the migrate-in list.
2. The method of claim 1, wherein storing the received unstructured image file to the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value comprises:
comparing the migrate-out MD5 value with the migrate-in MD5 value;
and storing the received unstructured image file to the migrate-in system when the migrate-out MD5 value is the same as the migrate-in MD5 value.
3. The method according to claim 2, wherein when the migrate-out MD5 value is different from the migrate-in MD5 value, the corresponding unstructured image file is recorded as a migration failure file in a migration failure list, and after all the unstructured image files except the migration failure files are stored in the migrate-in system, the migration failure files are migrated again.
4. The method of claim 1, wherein determining the migration accuracy of the unstructured image files according to the migrate-out list and the migrate-in list comprises:
grouping the migrate-out list and the migrate-in list respectively, and distributing the groups to a plurality of computing nodes for distributed comparison;
and summarizing and analyzing the results of the distributed comparison to determine the migration accuracy of the unstructured image files.
5. The method of claim 4, wherein grouping the migrate-out list and the migrate-in list respectively and distributing the groups to a plurality of computing nodes for distributed comparison comprises:
calculating a hash value of the migrate-out attribute information of each migrated-out unstructured image file in the migrate-out list;
calculating a hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list;
forming, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared from the hash values of the migrate-out attribute information in the migrate-out list and the hash values of the migrate-in attribute information in the migrate-in list;
and distributing the plurality of groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous computation, thereby performing the distributed comparison.
6. The method of claim 5, wherein forming, according to the set splitting rule, the plurality of groups of migrate-out lists and migrate-in lists to be compared from the hash values of the migrate-out attribute information and the hash values of the migrate-in attribute information comprises:
setting the number of groups;
dividing the hash value of the migrate-out attribute information of each migrated-out unstructured image file in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
dividing the hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assigning the attribute information of unstructured image files whose migrate-out list remainder equals the migrate-in list remainder to the same comparison group, thereby forming the plurality of groups of migrate-out lists and migrate-in lists to be compared.
7. An apparatus for verifying migration accuracy of an unstructured image file, comprising:
a data migrate-out module, used for reading the unstructured image file from the migrate-out system, acquiring the migrate-out MD5 value and the migrate-out attribute information of the unstructured image file, writing them into a migrate-out list, and sending the unstructured image file and the migrate-out list from the migrate-out system to the migrate-in system;
a data migrate-in module, used for receiving the unstructured image file and the migrate-out list sent by the migrate-out system, calculating the migrate-in MD5 value and the migrate-in attribute information of the received unstructured image file, and writing them into a migrate-in list; and storing the received unstructured image file to the migrate-in system according to the migrate-out MD5 value and the migrate-in MD5 value;
and a data checking module, used for determining the migration accuracy of the unstructured image files according to the migrate-out list and the migrate-in list.
8. The apparatus of claim 7, wherein the data migrate-in module is specifically configured to:
compare the migrate-out MD5 value with the migrate-in MD5 value;
and store the received unstructured image file to the migrate-in system when the migrate-out MD5 value is the same as the migrate-in MD5 value.
9. The apparatus of claim 8, wherein the data migrate-in module is further configured to: when the migrate-out MD5 value is different from the migrate-in MD5 value, take the corresponding unstructured image file as a migration failure file, record the migration failure file in a migration failure list, and, after all the unstructured image files except the migration failure files are stored in the migrate-in system, migrate the migration failure files again.
10. The apparatus of claim 7, wherein the data checking module is specifically configured to:
group the migrate-out list and the migrate-in list respectively, and distribute the groups to a plurality of computing nodes for distributed comparison;
and summarize and analyze the results of the distributed comparison to determine the migration accuracy of the unstructured image files.
11. The apparatus of claim 10, wherein the data checking module is further configured to:
calculate a hash value of the migrate-out attribute information of each migrated-out unstructured image file in the migrate-out list;
calculate a hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list;
form, according to a set splitting rule, a plurality of groups of migrate-out lists and migrate-in lists to be compared from the hash values of the migrate-out attribute information in the migrate-out list and the hash values of the migrate-in attribute information in the migrate-in list;
and distribute the plurality of groups of migrate-out lists and migrate-in lists to be compared to a plurality of computing nodes for simultaneous computation, thereby performing the distributed comparison.
12. The apparatus of claim 11, wherein the data checking module is further configured to:
set the number of groups;
divide the hash value of the migrate-out attribute information of each migrated-out unstructured image file in the migrate-out list by the set number of groups to obtain a migrate-out list remainder;
divide the hash value of the migrate-in attribute information of each received unstructured image file in the migrate-in list by the set number of groups to obtain a migrate-in list remainder;
and assign the attribute information of unstructured image files whose migrate-out list remainder equals the migrate-in list remainder to the same comparison group, thereby forming the plurality of groups of migrate-out lists and migrate-in lists to be compared.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for checking the migration accuracy of an unstructured image file according to any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium storing a computer program for implementing the method for checking migration accuracy of unstructured image files according to any one of claims 1 to 6.
CN202010513532.0A 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file Pending CN111680004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010513532.0A CN111680004A (en) 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010513532.0A CN111680004A (en) 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file

Publications (1)

Publication Number Publication Date
CN111680004A true CN111680004A (en) 2020-09-18

Family

ID=72454042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010513532.0A Pending CN111680004A (en) 2020-06-08 2020-06-08 Method and device for checking migration accuracy of unstructured image file

Country Status (1)

Country Link
CN (1) CN111680004A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination