CN109977082A - Method and computer-readable storage medium for automatic comparison of high-volume data - Google Patents

Method and computer-readable storage medium for automatic comparison of high-volume data

Info

Publication number
CN109977082A
Authority
CN
China
Prior art keywords
file
comparison
compared
collection
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910184724.9A
Other languages
Chinese (zh)
Inventor
蔡卓明
蔡伟杰
郭超年
王桐森
陈金德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FUJIAN RURAL CREDIT YONHAP
Original Assignee
FUJIAN RURAL CREDIT YONHAP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIAN RURAL CREDIT YONHAP
Priority to CN201910184724.9A
Publication of CN109977082A
Current legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and a computer-readable storage medium for automatic comparison of high-volume data. The method comprises: Step 10, automatically acquiring the database files before migration and the database files after migration, and splitting and exporting each of them; Step 20, preprocessing the split and exported pre-migration database files to obtain a reference file set, and preprocessing the split and exported post-migration database files to obtain a comparison file set; Step 30, scanning the files of the reference file set to generate a comparison file directory; Step 40, according to the comparison file directory, comparing the MD5 value of each file in the reference file set that needs comparison with that of the corresponding file in the comparison file set, one by one, to obtain the comparison results. The method and storage medium use a program to automatically preprocess and compare the data files before and after migration, which reduces manual operations and improves data comparison efficiency.

Description

Method and computer-readable storage medium for automatic comparison of high-volume data
Technical field
The present invention relates to the field of data processing, and in particular to a method and a computer-readable storage medium for automatic comparison of high-volume data.
Background art
In recent years, competition in informatization has intensified and big data has become a new battleground for enterprises. With the rise of the "smart economy", acquiring, mastering, and using data have become core competitive strengths of an enterprise. Traditional Oracle databases can no longer meet the demands of parallel big-data processing, so moving away from Oracle ("de-Oracle") has gradually been put on the agenda.
Data migration is a means by which enterprises improve the efficiency of data use, but it also raises the question of how to guarantee the completeness and accuracy of the migrated data. In the traditional way of comparing data before and after migration, every table must be checked and analyzed manually, which is time-consuming and labor-intensive. With the development of computer technology, algorithm-based comparison methods have emerged one after another: the data from the primary and standby sides is loaded into memory and compared using various lookup algorithms.
Chinese invention patent CN104239301B, "Data comparison method and apparatus", published on February 13, 2018, proposes a data comparison method comprising: determining a first data set and a second data set to be compared, each comparison object in a data set including one or more comparison items; determining the type of each comparison item, the types including at least first-type comparison items and non-first-type comparison items; and comparing the comparison objects in the first data set with those in the second data set, wherein, if the first-type comparison items of a first comparison object in the first data set are identical to the corresponding first-type comparison items of a second comparison object in the second data set, and the differences between the non-first-type comparison items of the first comparison object and the corresponding non-first-type comparison items of the second comparison object satisfy a preset condition, the first comparison object and the second comparison object are judged to be consistent.
Chinese invention patent publication CN107679104A, "Large-table streaming parallel high-speed data comparison method", published on February 9, 2018, proposes a method characterized by the following steps: (1-1) a comparison application configures, through a database link, the information of the primary database table and of the standby database table to be compared; if the structures of the two tables are inconsistent, it returns that comparison is impossible; (1-2) it compares the index fields of the primary and standby tables to obtain a minimum value min and a maximum value max, used as the start and end marks of the comparison, and sets the number N of parallel comparison channels, dynamically generating N channels for parallel processing; (1-3) it sorts the records of the primary and standby tables in ascending order by the index field, divides the sorted result into blocks according to the N channels, and each channel reads its data in streaming fashion into a cache; (1-4) each channel compares its data in parallel and records the comparison results.
The above prior-art methods have at least the following problems:
1. They target the comparison of a single file or of small batches of data and cannot meet the requirements of comparing large volumes of data;
2. They inspect file contents, which is time-consuming and inefficient, and they fail to effectively shorten the comparison time;
3. The actual data must be touched during comparison and data security is not considered, so there may be a risk of data leakage.
Summary of the invention
One technical problem to be solved by the present invention is to provide a method for automatic comparison of high-volume data, in which a program automatically preprocesses and compares the data files before and after migration, reducing manual operations and improving comparison efficiency.
This first technical problem is solved in an embodiment of the present invention as follows:
A method for automatic comparison of high-volume data comprises the following steps:
Step 10: configure the database configuration items, and automatically split and export the pre-migration database files and the post-migration database files respectively;
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison file directory, compare the MD5 value of each file in the reference file set that needs comparison with that of the corresponding file in the comparison file set, one by one; if the MD5 values are identical, the comparison is judged to have passed; if they differ, an error record is generated for subsequent data analysis and tracking.
Preferably, the method further comprises:
setting a first access permission for the database configuration items and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
Preferably, the method further comprises:
Step 50: providing a storage-space release switch; when the switch is on, the files in the reference file set and the comparison file set that have passed comparison are deleted; when the switch is off, no files are deleted.
Preferably, the preprocessing in Step 20 specifically comprises:
normalizing the data in the files, including removing volatile fields and eliminating differences between heterogeneous databases, to ensure that the structures on both sides are consistent;
sorting the normalized data to ensure that the row order on both sides is consistent.
Preferably, Step 30 is specifically: periodically scanning the data files of the reference file set to generate the comparison file directory; when a file is not in the comparison file directory, or is in the directory but its creation time is later than the creation time recorded there, the file is judged to be a new file and its information is added to the comparison file directory. The comparison file directory information includes the file path, the file creation time, "Compared?", and "Comparison status".
Preferably, Step 40 specifically comprises:
Step 41: taking the files whose "Compared?" field in the comparison file directory is "No" as comparison tasks;
Step 42: according to the comparison tasks, comparing each file in the reference file set that needs comparison with the corresponding file in the comparison file set, one by one. If the MD5 values are identical, the comparison is judged to have passed, "Compared?" is set to "Yes", and "Comparison status" is set to "MD5 consistent". If the MD5 values differ, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "Compared?" is set to "Yes", "Comparison status" is set to "File comparison inconsistent", and an error record is generated for subsequent data analysis and tracking.
A second technical problem to be solved by the present invention is to provide a computer-readable storage medium whose program automatically preprocesses and compares the data files before and after migration, reducing manual operations and improving data comparison efficiency.
This second technical problem is solved in an embodiment of the present invention as follows:
A computer-readable storage medium stores a computer program (instructions) which, when executed by a processor, performs the following steps:
Step 10: configure the database configuration items, and automatically split and export the pre-migration database files and the post-migration database files respectively;
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison tasks, compare the MD5 value of each file in the comparison file set that needs comparison with that of the corresponding file in the reference file set, one by one; if the MD5 values are identical, the comparison is judged to have passed; if they differ, an error record is generated for subsequent data analysis and tracking.
Preferably, the program further performs:
setting a first access permission for the database configuration items and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
Preferably, the program further performs:
Step 50: providing a storage-space release switch; when the switch is on, the files in the reference file set and the comparison file set that have passed comparison are deleted; when the switch is off, no files are deleted.
Preferably, the program further performs:
normalizing the data in the files, including removing volatile fields and eliminating differences between heterogeneous databases, to ensure that the structures on both sides are consistent;
sorting the normalized data to ensure that the row order on both sides is consistent.
Preferably, Step 30 is specifically: periodically scanning the data files of the reference file set to generate the comparison file directory; when a file is not in the comparison file directory, or is in the directory but its creation time is later than the creation time recorded there, the file is judged to be a new file and its information is added to the comparison file directory. The comparison file directory information includes the file path, the file creation time, "Compared?", and "Comparison status".
Preferably, Step 40 specifically comprises:
Step 41: taking the files whose "Compared?" field in the comparison file directory is "No" as comparison tasks;
Step 42: according to the comparison tasks, comparing the MD5 value of each file in the comparison file set that needs comparison with that of the corresponding file in the reference file set, one by one. If the MD5 values are identical, the comparison is judged to have passed, "Compared?" is set to "Yes", and "Comparison status" is set to "MD5 consistent". If the MD5 values differ, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "Compared?" is set to "Yes", "Comparison status" is set to "File comparison inconsistent", and an error record is generated for subsequent data analysis and tracking.
The present invention has the following advantages:
1. A program automatically preprocesses and compares the data files before and after migration, which overcomes the inability of the prior art to handle large-volume data comparison and achieves automated comparison of high-volume data;
2. Files are compared by their MD5 values, which overcomes the problems of current content-based inspection (time-consuming, inefficient, and unable to shorten comparison time effectively); the comparison is fast and efficient;
3. Comparing MD5 values, and setting different permissions for the program, the compared files, and the results, overcomes the problem that the actual data must be touched during comparison with a possible risk of data leakage; the solution offers high security and makes the comparison status easy to manage.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a data processing flowchart of an embodiment of the present invention.
Fig. 3 is a data comparison flowchart of an embodiment of the present invention.
Detailed description of the embodiments
Referring to Figs. 1 to 3, a method for automatic comparison of high-volume data comprises the following steps:
Step 10: configure the database configuration items, and automatically split and export the pre-migration database files and the post-migration database files respectively. After the database configuration items are configured, a data splitting and export program splits the large single-table files of the database so that the size of each data file stays under control and can be handled by the program. Disk space usage is checked before data is exported; when disk space is insufficient, the export pauses and resumes once disk space has been released.
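The splitting and disk-space check of Step 10 could look roughly like the Python sketch below. It assumes the rows of one table are already available as an iterable (for example, fetched from a database cursor) and that chunks are bounded by row count; the function names, the CSV chunk format, and both thresholds are illustrative assumptions, not details from the patent. `shutil.disk_usage` is the standard-library call used to read free space.

```python
import csv
import os
import shutil
import time
from typing import Callable, Iterable, List, Sequence


def free_bytes(path: str) -> int:
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def split_export(
    rows: Iterable[Sequence[str]],
    out_dir: str,
    table: str,
    rows_per_file: int = 1_000_000,
    min_free_bytes: int = 1 << 30,
    get_free: Callable[[str], int] = free_bytes,
    poll_seconds: float = 30.0,
) -> List[str]:
    """Write one table's rows into numbered chunk files of bounded size.

    Before each chunk file is opened, free disk space is checked; while it is
    below `min_free_bytes`, the export pauses and polls until space is freed.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    handle = None
    writer = None
    try:
        for i, row in enumerate(rows):
            if i % rows_per_file == 0:  # start a new chunk file
                if handle is not None:
                    handle.close()
                while get_free(out_dir) < min_free_bytes:
                    time.sleep(poll_seconds)  # pause until space is released
                path = os.path.join(out_dir, f"{table}_{len(paths):05d}.csv")
                handle = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(handle)
                paths.append(path)
            writer.writerow(row)
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Passing `get_free` as a parameter keeps the pause-and-resume logic testable without actually filling a disk.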
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory (alternatively, the comparison file set may be scanned to generate the comparison file directory);
Step 40: according to the comparison file directory, compare the MD5 value of each file in the reference file set that needs comparison with that of the corresponding file in the comparison file set, one by one (if the comparison file directory was generated by scanning the comparison file set, each file to be compared is instead checked against the corresponding file in the reference file set); if the MD5 values are identical, the comparison is judged to have passed; if they differ, an error record is generated for subsequent data analysis and tracking.
MD5 treats an entire file as one large piece of text and transforms it, with an irreversible algorithm, into a practically unique 128-bit MD5 message digest. A typical use of MD5 is to generate a digest of a piece of information to guard against tampering: if anyone changes a file in any way, its MD5 value changes. Therefore, whether two files are completely identical can be judged by comparing the MD5 values of each pair of corresponding files in the reference file set and the comparison file set.
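As a minimal sketch of the digest check, assuming Python and its standard `hashlib` module (the patent names no language), each file can be streamed through MD5 in fixed-size chunks so that large exported files are never loaded into memory at once. The function names are illustrative. MD5 reliably exposes accidental differences such as a changed field, but it is not collision-resistant against deliberate manipulation; `hashlib.sha256` can be swapped in the same way if that matters.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(reference: Path, comparison: Path) -> bool:
    """True when a reference file and its comparison file have equal MD5s."""
    return md5_of_file(reference) == md5_of_file(comparison)
```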
In a preferred embodiment, the method further comprises:
setting a first access permission for the database configuration items (which include the database access address, the database user name, the database password, and the tables and fields to be exported) and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
For example, program maintainers may be given the first access permission, allowing them to maintain the comparison program and to configure important parameters such as the database connection information and data processing rules; ordinary testers may be given the third access permission, allowing them to access the comparison results and error records, track comparison status and progress, and perform preliminary analysis of, and give feedback on, inconsistent data; senior testers may be given both the second and the third access permissions, allowing them to access the reference file set, the comparison file set, the comparison results, and the error records, so that when an inconsistency occurs they have the permissions needed to analyze the error records in depth.
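The three-permission scheme and the example roles can be modelled as bit flags, as in the Python sketch below. The role and resource names are hypothetical labels for the groups described above; a real deployment would enforce these grants through operating-system file permissions or database privileges, which this model does not do.

```python
from enum import Flag


class Access(Flag):
    """The three access permissions described above (names are illustrative)."""
    CONFIG_AND_PROGRAM = 1   # first permission: config items, program files
    DATA_FILE_SETS = 2       # second permission: reference and comparison sets
    RESULTS_AND_ERRORS = 4   # third permission: comparison directory, error records


# Hypothetical role names, granted as in the example above.
ROLE_GRANTS = {
    "maintainer": Access.CONFIG_AND_PROGRAM,
    "tester": Access.RESULTS_AND_ERRORS,
    "senior_tester": Access.DATA_FILE_SETS | Access.RESULTS_AND_ERRORS,
}

# Permission each (hypothetical) resource kind requires.
RESOURCE_REQUIREMENT = {
    "db_config": Access.CONFIG_AND_PROGRAM,
    "program_file": Access.CONFIG_AND_PROGRAM,
    "reference_set": Access.DATA_FILE_SETS,
    "comparison_set": Access.DATA_FILE_SETS,
    "comparison_directory": Access.RESULTS_AND_ERRORS,
    "error_log": Access.RESULTS_AND_ERRORS,
}


def can_access(role: str, resource: str) -> bool:
    """True if the role's grants include the permission the resource needs."""
    granted = ROLE_GRANTS.get(role, Access(0))  # unknown roles get nothing
    return RESOURCE_REQUIREMENT[resource] in granted
```

Note that no role in this example holds both the first and the second permission, so the people who configure database credentials are not the same people who can open the exported data files.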
The method further comprises:
Step 50: providing a storage-space release switch; when the switch is on, the files in the reference file set and the comparison file set that have passed comparison are deleted; when the switch is off, no files are deleted.
When the volume of data to be compared is huge and storage space is relatively limited, the files in the reference file set and the comparison file set that have already passed comparison can be deleted so that large-volume comparison remains possible. A storage-space release switch is provided and turned on when storage space needs to be released; the program then automatically deletes the reference and comparison files that have passed comparison, freeing space for subsequent files awaiting comparison.
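A minimal Python sketch of the release switch follows. It assumes the comparison file directory is held as a dictionary keyed by each file's path relative to the set roots, and that a record counts as "passed" when its status is "MD5 consistent"; this record layout is a hypothetical choice for illustration. Files whose comparison was inconsistent are kept so they remain available for analysis.

```python
from pathlib import Path
from typing import Dict, List


def release_space(
    directory: Dict[str, dict],
    reference_root: Path,
    comparison_root: Path,
    release_switch_on: bool,
) -> List[Path]:
    """Delete reference/comparison file pairs that already passed comparison.

    `directory` maps a relative file path to its catalogue record
    (hypothetical schema: {"compared": bool, "status": str}).
    Returns the paths actually deleted.
    """
    deleted: List[Path] = []
    if not release_switch_on:
        return deleted  # switch off: keep every file
    for rel_path, record in directory.items():
        if record.get("compared") and record.get("status") == "MD5 consistent":
            for root in (reference_root, comparison_root):
                target = root / rel_path
                if target.exists():
                    target.unlink()
                    deleted.append(target)
    return deleted
```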
The preprocessing in Step 20 specifically comprises:
normalizing the data in the files, including removing volatile fields (such as timestamps) and eliminating differences between heterogeneous databases, to ensure that the structures on both sides are consistent;
sorting the normalized data to ensure that the row order on both sides is consistent.
After the files of the reference file set and the comparison file set have been preprocessed, factors other than the data itself cannot affect the comparison result.
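The two preprocessing steps can be illustrated with the Python sketch below. The concrete normalization rules (which columns count as volatile, trimming whitespace, mapping several NULL spellings to one token, ordering columns alphabetically by lower-cased name) are assumptions chosen for illustration; the patent only states that volatile fields are removed and heterogeneous-database differences are eliminated. Fields are joined with the ASCII unit separator `\x1f` so that commas inside values cannot make two different rows look alike.

```python
from typing import Iterable, List, Sequence

# Hypothetical normalization rules; real rules depend on the two databases.
VOLATILE_COLUMNS = frozenset({"updated_at", "etl_timestamp"})
NULL_TOKENS = frozenset({"", "NULL", "\\N"})
FIELD_SEP = "\x1f"  # ASCII unit separator, unlikely to occur inside values


def preprocess(
    header: Sequence[str],
    rows: Iterable[Sequence[str]],
    volatile: frozenset = VOLATILE_COLUMNS,
) -> List[str]:
    """Return normalized, sorted lines ready for hashing.

    1. Drop volatile columns such as timestamps.
    2. Put the remaining columns in a canonical (alphabetical) order so both
       sides share one structure, regardless of column case or order.
    3. Trim whitespace and map the various NULL spellings to one token.
    4. Sort the lines so both sides share one row order.
    """
    columns = sorted(
        (name.lower(), i)
        for i, name in enumerate(header)
        if name.lower() not in volatile
    )
    lines = []
    for row in rows:
        fields = []
        for _, i in columns:
            value = row[i].strip()
            fields.append("" if value in NULL_TOKENS else value)
        lines.append(FIELD_SEP.join(fields))
    lines.sort()
    return lines
```

For example, an Oracle export with upper-case column names and a re-ordered export from the target database normalize to the same lines, so their files then hash to the same MD5 value.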
Step 30 is specifically: periodically scanning the data files of the reference file set to generate the comparison file directory (the comparison file set may be scanned instead). When a file is not in the comparison file directory, or is in the directory but its creation time is later than the creation time recorded there (the recorded creation time being the file's creation time as of the previous scan), the file is judged to be a new file and its information is added to the comparison file directory. The comparison file directory information includes the file path, the file creation time, "Compared?" (default "No"), and "Comparison status" (default "To be compared").
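One scan pass of Step 30 might be sketched in Python as follows, assuming the comparison file directory is an in-memory dictionary keyed by relative path (a hypothetical layout; a real system might persist it in a table). Because `os.stat` does not expose a true creation time on every platform, this sketch uses the file's modification time as a portable stand-in. The periodic scheduling (for example cron, or a loop around this call) is left out.

```python
from pathlib import Path
from typing import Dict, List


def scan_reference_set(reference_root: Path, directory: Dict[str, dict]) -> List[str]:
    """One scan pass of Step 30; returns the relative paths added as new.

    A file is "new" if it is missing from `directory`, or if its timestamp is
    later than the one recorded at the previous scan. The modification time is
    used as a portable stand-in for the creation time named in the text.
    """
    added: List[str] = []
    for path in sorted(reference_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(reference_root).as_posix()
        stamp = path.stat().st_mtime
        record = directory.get(rel)
        if record is None or stamp > record["created"]:
            directory[rel] = {
                "path": rel,
                "created": stamp,
                "compared": False,           # "Compared?" defaults to "No"
                "status": "to be compared",  # "Comparison status" default
            }
            added.append(rel)
    return added
```

Re-exported files thus re-enter the queue automatically with their "Compared?" flag reset.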
Step 40 specifically comprises:
Step 41: taking the files whose "Compared?" field in the comparison file directory is "No" as comparison tasks;
Step 42: according to the comparison tasks, comparing the MD5 value of each file in the comparison file set that needs comparison with that of the corresponding file in the reference file set, one by one. If the MD5 values are identical, the comparison is judged to have passed, "Compared?" is set to "Yes", and "Comparison status" is set to "MD5 consistent". If the MD5 values differ, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "Compared?" is set to "Yes", "Comparison status" is set to "File comparison inconsistent", and an error record is generated for subsequent data analysis and tracking. If the comparison file set contains no file corresponding to the file that needs comparison, nothing is done and the next comparison task is executed.
In a preferred embodiment, the content of the inconsistent lines is recorded during the line-by-line comparison; when there are too many inconsistent lines, only the content of the first 10 may be recorded in the generated error record. Testers can view the specific content by accessing the error records, analyze the inconsistencies, and act on the database files as actually required.
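Steps 41 and 42, including the cap of 10 recorded mismatches, can be combined in one Python sketch. It assumes the directory layout used for Step 30 (a dictionary keyed by relative path), pairs files by that relative path, and appends error records to a plain list; these structures and all names are illustrative assumptions. A task whose counterpart file is missing is skipped and stays "not compared", so a later pass can pick it up once the file arrives.

```python
import hashlib
from itertools import zip_longest
from pathlib import Path
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file, streamed in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def line_differences(ref: Path, cmp_file: Path, limit: int = 10) -> List[Mismatch]:
    """Up to `limit` (line number, reference line, comparison line) mismatches."""
    diffs: List[Mismatch] = []
    with open(ref, encoding="utf-8") as a, open(cmp_file, encoding="utf-8") as b:
        # zip_longest pads the shorter file with None, so extra lines count too.
        for n, (x, y) in enumerate(zip_longest(a, b), start=1):
            if x != y:
                diffs.append((n, x, y))
                if len(diffs) >= limit:
                    break
    return diffs


def run_compare_tasks(
    directory: Dict[str, dict],
    reference_root: Path,
    comparison_root: Path,
    error_log: List[dict],
) -> None:
    # Step 41: every entry whose "Compared?" flag is still False is a task.
    tasks = [rel for rel, rec in directory.items() if not rec["compared"]]
    for rel in tasks:
        ref, cmp_file = reference_root / rel, comparison_root / rel
        if not cmp_file.exists():
            continue  # no counterpart yet: skip and move to the next task
        record = directory[rel]
        record["compared"] = True
        if md5_of(ref) == md5_of(cmp_file):
            record["status"] = "MD5 consistent"
        else:
            record["status"] = "file comparison inconsistent"
            error_log.append({"file": rel, "diffs": line_differences(ref, cmp_file)})
```

Because the line-by-line pass only runs after an MD5 mismatch, the slower content comparison is paid for just the files that actually differ.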
Referring again to Figs. 1 to 3, a computer-readable storage medium stores a computer program (instructions) which, when executed by a processor, performs the following steps:
Step 10: configure the database configuration items, and automatically split and export the pre-migration database files and the post-migration database files respectively. After the database configuration items are configured, a data splitting and export program splits the large single-table files of the database so that the size of each data file stays under control and can be handled by the program. Disk space usage is checked before data is exported; when disk space is insufficient, the export pauses and resumes once disk space has been released;
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison tasks, compare the MD5 value of each file in the comparison file set that needs comparison with that of the corresponding file in the reference file set, one by one; if the MD5 values are identical, the comparison is judged to have passed; if they differ, an error record is generated for subsequent data analysis and tracking.
MD5 treats an entire file as one large piece of text and transforms it, with an irreversible algorithm, into a practically unique 128-bit MD5 message digest. A typical use of MD5 is to generate a digest of a piece of information to guard against tampering: if anyone changes a file in any way, its MD5 value changes. Therefore, whether two files are completely identical can be judged by comparing the MD5 values of each pair of corresponding files in the reference file set and the comparison file set.
In a preferred embodiment, the program further performs:
setting a first access permission for the database configuration items (which include the database access address, the database user name, the database password, and the tables and fields to be exported) and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
For example, program maintainers may be given the first access permission, allowing them to maintain the comparison program and to configure important parameters such as the database connection information and data processing rules; ordinary testers may be given the third access permission, allowing them to access the comparison results and error records, track comparison status and progress, and perform preliminary analysis of, and give feedback on, inconsistent data; senior testers may be given both the second and the third access permissions, allowing them to access the reference file set, the comparison file set, the comparison results, and the error records, so that when an inconsistency occurs they have the permissions needed to analyze the error records in depth.
The program further performs:
Step 50: providing a storage-space release switch; when the switch is on, the files in the reference file set and the comparison file set that have passed comparison are deleted; when the switch is off, no files are deleted.
When the volume of data to be compared is huge and storage space is relatively limited, the files in the reference file set and the comparison file set that have already passed comparison can be deleted so that large-volume comparison remains possible. A storage-space release switch is provided and turned on when storage space needs to be released; the program then automatically deletes the reference and comparison files that have passed comparison, freeing space for subsequent files awaiting comparison.
The program further performs:
normalizing the data in the files, including removing volatile fields (such as timestamps) and eliminating differences between heterogeneous databases, to ensure that the structures on both sides are consistent;
sorting the normalized data to ensure that the row order on both sides is consistent.
After the files of the reference file set and the comparison file set have been preprocessed, factors other than the data itself cannot affect the comparison result.
Step 30 is specifically: periodically scanning the data files of the reference file set to generate the comparison file directory. When a file is not in the comparison file directory, or is in the directory but its creation time is later than the creation time recorded there (the recorded creation time being the file's creation time as of the previous scan), the file is judged to be a new file and its information is added to the comparison file directory. The comparison file directory information includes the file path, the file creation time, "Compared?" (default "No"), and "Comparison status" (default "To be compared").
Step 40 specifically comprises:
Step 41: taking the files whose "Compared?" field in the comparison file directory is "No" as comparison tasks;
Step 42: according to the comparison tasks, comparing the MD5 value of each file in the comparison file set that needs comparison with that of the corresponding file in the reference file set, one by one. If the MD5 values are identical, the comparison is judged to have passed, "Compared?" is set to "Yes", and "Comparison status" is set to "MD5 consistent". If the MD5 values differ, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "Compared?" is set to "Yes", "Comparison status" is set to "File comparison inconsistent", and an error record is generated for subsequent data analysis and tracking. If the comparison file set contains no file corresponding to the file that needs comparison, nothing is done and the next comparison task is executed.
In a preferred embodiment, the content of the inconsistent lines is recorded during the line-by-line comparison; when there are too many inconsistent lines, only the content of the first 10 may be recorded in the generated error record. Testers can view the specific content by accessing the error records, analyze the inconsistencies, and act on the database files as actually required.
In the present invention, a program automatically preprocesses and compares the data files before and after migration, which overcomes the inability of the prior art to handle large-volume data comparison and achieves automated comparison of high-volume data. Files are compared by their MD5 values, which overcomes the problems of current content-based inspection (time-consuming, inefficient, and unable to shorten comparison time effectively), so the comparison is fast and efficient. Comparing MD5 values, and setting different permissions for the program, the compared files, and the results, overcomes the problem that the actual data must be touched during comparison with a possible risk of data leakage; the solution offers high security and makes the comparison status easy to manage.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the described embodiments are merely illustrative and are not intended to limit the scope of the present invention. Equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the present invention shall fall within the scope of protection claimed by the present invention.

Claims (12)

1. A method for automatic comparison of high-volume data, characterized by comprising the following steps:
Step 10: configuring the configuration items of the databases to be compared; automatically acquiring, according to the database configuration items, the database files before migration and the database files after migration; and splitting and exporting each of them;
Step 20: preprocessing the split and exported pre-migration database files to obtain a reference file set, and preprocessing the split and exported post-migration database files to obtain a comparison file set;
Step 30: scanning the files of the reference file set to generate a comparison file directory;
Step 40: according to the comparison file directory, comparing, one by one, the MD5 value of each file in the reference file set that needs to be compared with that of the corresponding file in the comparison file set; if the MD5 values are identical, determining that the comparison has passed; if they differ, generating an error record for subsequent data analysis and tracking.
2. The method for automatic comparison of high-volume data according to claim 1, characterized in that the method further comprises:
setting a first access permission for the database configuration items and the program files used, setting a second access permission for the reference file set and the comparison file set, and setting a third access permission for the comparison file directory and the error records.
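The claim does not say how the three access levels are enforced. One possible sketch, assuming a POSIX file system, maps the tiers to file modes. The specific modes are illustrative assumptions, not taken from the source: owner-only for configuration and programs, no group access for the data file sets, and group-readable for the directory and error records so that testers can consult them.

```python
import os
import stat

# Hypothetical mapping of the three permission tiers onto POSIX file modes.
FIRST_TIER = stat.S_IRWXU                                # 0o700: config items and program files, owner only
SECOND_TIER = stat.S_IRUSR | stat.S_IWUSR                # 0o600: reference/comparison data files, owner only
THIRD_TIER = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP  # 0o640: directory and error records, group-readable


def apply_permissions(config_and_programs, data_files, directory_and_logs):
    """Set one tier's mode on every path in each of the three groups."""
    for paths, mode in (
        (config_and_programs, FIRST_TIER),
        (data_files, SECOND_TIER),
        (directory_and_logs, THIRD_TIER),
    ):
        for path in paths:
            os.chmod(path, mode)
```

Under this mapping, the account that views results never needs read access to the exported data itself. This matches the stated aim of not touching the actual data during comparison.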
3. The method for automatic comparison of high-volume data according to claim 1, characterized in that the method further comprises:
Step 50: providing a storage-space release switch; when the switch is on, deleting the files that have passed comparison from both the reference file set and the comparison file set; when the switch is off, deleting no files.
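A sketch of the Step 50 release switch follows. It assumes the same hypothetical directory entries (`path`, `compared`, `status`) and treats a file as having passed when its status is "MD5 consistent". Only passed files are deleted. Files with inconsistencies stay on both sides for analysis.

```python
from pathlib import Path


def release_storage(directory, reference_root, comparison_root, switch_on):
    """Delete files that passed comparison from both sets when the switch is on.

    Returns the relative paths whose files were removed.
    """
    if not switch_on:
        return []  # switch off: keep everything
    removed = []
    for entry in directory:
        passed = entry["compared"] == "Yes" and entry["status"] == "MD5 consistent"
        if not passed:
            continue
        for root in (reference_root, comparison_root):
            (Path(root) / entry["path"]).unlink(missing_ok=True)  # Python 3.8+
        removed.append(entry["path"])
    return removed
```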
4. The method for automatic comparison of high-volume data according to claim 1, characterized in that the preprocessing in Step 20 specifically comprises:
performing data normalization on the files, including removing hash fields and eliminating differences between heterogeneous databases, to ensure that the structures on both sides are consistent;
sorting the normalized data to ensure that the ordering on both sides is consistent.
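The claim names two preprocessing goals, consistent structure and consistent order, without fixing a file format. The minimal sketch below assumes each exported file has already been parsed into a header and rows of strings. It drops the named hash columns, trims whitespace, maps several NULL spellings to one, and sorts the rows. The column names and the `null_tokens` set are hypothetical examples of heterogeneous-database differences.

```python
def preprocess(header, rows, drop_columns=frozenset(),
               null_tokens=frozenset({"", "NULL", "null", "\\N"})):
    """Normalize one exported table so both sides share structure and order.

    header: list of column names; rows: iterable of lists of strings.
    drop_columns: names of columns to remove (for example, hash columns).
    Returns (kept_header, sorted list of row tuples).
    """
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    normalized = []
    for row in rows:
        values = []
        for i in keep:
            value = row[i].strip()                                # trim padding differences
            values.append("" if value in null_tokens else value)  # one spelling for NULL
        normalized.append(tuple(values))
    normalized.sort()  # identical row order on both sides
    return [header[i] for i in keep], normalized
```

Sorting matters because the later MD5 comparison is byte-exact. Two exports holding the same rows in a different order would otherwise hash differently. The sort here is lexicographic on strings, which is acceptable as long as both sides use the same rule.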
5. The method for automatic comparison of high-volume data according to claim 1, characterized in that Step 30 is specifically: periodically scanning the data files of the reference file set to generate the comparison file directory; when a file does not exist in the comparison file directory, or exists in it but has a creation time later than the creation time recorded in the directory, determining that the file is a new file and adding its information to the comparison file directory, the comparison file directory information comprising the file path, the file creation time, "Compared" and "Comparison status".
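A sketch of the Step 30 scan follows. It assumes the directory is held as an in-memory dict keyed by relative path; a real deployment would presumably persist it. The field names `path`, `created`, `compared` and `status` are hypothetical stand-ins for file path, creation time, "Compared" and "Comparison status". Most Linux file systems do not expose a creation time through `os.stat`, so the sketch substitutes the modification time `st_mtime`. This substitution is an assumption.

```python
from pathlib import Path


def scan_reference_set(reference_root, directory):
    """Register new or re-created reference files in the comparison directory.

    Returns the relative paths that were (re-)registered, in sorted order.
    """
    root = Path(reference_root)
    files = {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}
    added = []
    for rel in sorted(files):
        created = files[rel].stat().st_mtime  # stand-in for creation time
        entry = directory.get(rel)
        if entry is None or created > entry["created"]:
            # New file, or a newer version: (re)queue it with "compared" reset to "No".
            directory[rel] = {"path": rel, "created": created, "compared": "No", "status": ""}
            added.append(rel)
    return added
```

Running this function on a timer, for example from a scheduler, gives the periodic scan the claim describes. Re-registered entries have "compared" reset to "No", so Step 41 picks them up again.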
6. The method for automatic comparison of high-volume data according to claim 1, characterized in that Step 40 specifically comprises:
Step 41: taking the files whose "Compared" state in the comparison file directory is "No" as comparison file tasks;
Step 42: according to the comparison file tasks, comparing, one by one, each file in the reference file set that needs to be compared with the corresponding file in the comparison file set; if the MD5 values are identical, determining that the comparison has passed, marking "Compared" as "Yes" and marking "Comparison status" as "MD5 consistent"; if the MD5 values differ, comparing the file in the comparison file set line by line with the corresponding file in the reference file set, marking "Compared" as "Yes", marking "Comparison status" as "File comparison inconsistent", and generating an error record for subsequent data analysis and tracking.
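Step 41 reduces to a filter over the directory. The sketch below assumes the same hypothetical entry layout (`path`, `compared`, `status`). It selects the pending tasks and passes each one to a caller-supplied `compare_one` function. That function is expected to return `(passed, status)`, or `None` when the counterpart file is not yet available. The names and the return convention are illustrative assumptions.

```python
def pending_tasks(directory):
    """Step 41: entries whose "compared" flag is still "No" become comparison tasks."""
    return [entry for entry in directory.values() if entry["compared"] == "No"]


def run_pending(directory, compare_one):
    """Run each pending task and mark its entry; return paths that failed comparison."""
    failed = []
    for entry in pending_tasks(directory):
        outcome = compare_one(entry)
        if outcome is None:
            continue  # counterpart missing: leave as "No" for the next round
        passed, status = outcome
        entry["compared"] = "Yes"
        entry["status"] = status
        if not passed:
            failed.append(entry["path"])
    return failed
```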
7. A computer-readable storage medium having stored thereon a computer program (instructions), characterized in that the program (instructions), when executed by a processor, performs the following steps:
Step 10: configuring the configuration items of the databases to be compared; automatically acquiring, according to the database configuration items, the database files before migration and the database files after migration; and splitting and exporting each of them;
Step 20: preprocessing the split and exported pre-migration database files to obtain a reference file set, and preprocessing the split and exported post-migration database files to obtain a comparison file set;
Step 30: scanning the files of the reference file set to generate a comparison file directory;
Step 40: according to the comparison file directory, comparing, one by one, the MD5 value of each file in the reference file set that needs to be compared with that of the corresponding file in the comparison file set; if the MD5 values are identical, determining that the comparison has passed; if they differ, generating an error record for subsequent data analysis and tracking.
8. The computer-readable storage medium according to claim 7, characterized in that the program further performs:
setting a first access permission for the database configuration items and the program files used, setting a second access permission for the reference file set and the comparison file set, and setting a third access permission for the comparison file directory and the error records.
9. The computer-readable storage medium according to claim 7, characterized in that the program further performs:
Step 50: providing a storage-space release switch; when the switch is on, deleting the files that have passed comparison from both the reference file set and the comparison file set; when the switch is off, deleting no files.
10. The computer-readable storage medium according to claim 7, characterized in that the program further performs:
performing data normalization on the files, including removing hash fields and eliminating differences between heterogeneous databases, to ensure that the structures on both sides are consistent;
sorting the normalized data to ensure that the ordering on both sides is consistent.
11. The computer-readable storage medium according to claim 7, characterized in that Step 30 is specifically: periodically scanning the data files of the reference file set to generate the comparison file directory; when a file does not exist in the comparison file directory, or exists in it but has a creation time later than the creation time recorded in the directory, determining that the file is a new file and adding its information to the comparison file directory, the comparison file directory information comprising the file path, the file creation time, "Compared" and "Comparison status".
12. The computer-readable storage medium according to claim 7, characterized in that Step 40 specifically comprises:
Step 41: taking the files whose "Compared" state in the comparison file directory is "No" as comparison file tasks;
Step 42: according to the comparison file tasks, comparing, one by one, each file in the reference file set that needs to be compared with the corresponding file in the comparison file set; if the MD5 values are identical, determining that the comparison has passed, marking "Compared" as "Yes" and marking "Comparison status" as "MD5 consistent"; if the MD5 values differ, comparing the file in the comparison file set line by line with the corresponding file in the reference file set, marking "Compared" as "Yes", marking "Comparison status" as "File comparison inconsistent", and generating an error record for subsequent data analysis and tracking.
CN201910184724.9A 2019-03-12 2019-03-12 Method for automatic comparison of high-volume data, and computer-readable storage medium Pending CN109977082A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910184724.9A CN109977082A (en) 2019-03-12 2019-03-12 Method for automatic comparison of high-volume data, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN109977082A true CN109977082A (en) 2019-07-05

Family

ID=67078595

Country Status (1)

Country Link
CN (1) CN109977082A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150234885A1 (en) * 2014-02-18 2015-08-20 Black Duck Software, Inc. Methods and systems for efficient comparison of file sets
CN105989089A (en) * 2015-02-12 2016-10-05 阿里巴巴集团控股有限公司 Data comparison method and device
CN106682534A (en) * 2017-01-23 2017-05-17 郑州云海信息技术有限公司 Method and device for verifying data integrity in data migration process
CN108256034A (en) * 2018-01-11 2018-07-06 北京潘达互娱科技有限公司 Data migration method and equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459916A (en) * 2020-04-16 2020-07-28 中国银行股份有限公司 GBASE and ORAC L E database table comparison method and system
CN111459916B (en) * 2020-04-16 2023-05-23 中国银行股份有限公司 GBASE and ORACLE database table comparison method and system
CN112948389A (en) * 2021-03-05 2021-06-11 上海上讯信息技术股份有限公司 MD 5-based database table data comparison method and equipment
CN112948389B (en) * 2021-03-05 2023-07-25 上海上讯信息技术股份有限公司 MD 5-based database table data comparison method and device
CN115632877A (en) * 2022-12-01 2023-01-20 成都九洲电子信息系统股份有限公司 Large-scale PCAP data correctness verification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190705