CN109977082A - Method and computer-readable storage medium for automatic comparison of large batches of data
- Publication number
- CN109977082A (CN201910184724.9A)
- Authority
- CN
- China
- Prior art keywords
- file
- comparison
- compared
- collection
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method and a computer-readable storage medium for automatic comparison of large batches of data, comprising: step 10, automatically acquiring the database files before migration and the database files after migration, and splitting and exporting each of them; step 20, preprocessing the split and exported pre-migration database files to obtain a reference file set, and preprocessing the split and exported post-migration database files to obtain a comparison file set; step 30, scanning the files of the reference file set to generate a comparison file directory; step 40, according to the comparison file directory, comparing the MD5 value of each file to be compared in the reference file set, one by one, with that of the corresponding file in the comparison file set, and obtaining a comparison result. In the method and storage medium provided by the invention, a program automatically preprocesses and compares the data files from before and after migration, which reduces manual steps and improves the efficiency of data comparison.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a method and a computer-readable storage medium for automatic comparison of large batches of data.
Background
In recent years, competition in informatization has intensified and big data has become a new frontier on which enterprises compete. With the rise of the "smart economy", acquiring, mastering, and using data has become a core competitive strength of enterprises. The traditional Oracle database can no longer meet the demand for parallel processing of big data, and moving away from Oracle is gradually being put on the agenda.
Data migration, as a means for enterprises to improve the efficiency of data use, also raises the question of how to guarantee the integrity and accuracy of the migrated data. In the traditional approach to comparing data before and after migration, every table has to be checked and analyzed manually, which is time-consuming and labor-intensive. With the development of computer technology, algorithm-based comparison methods have emerged one after another, in which the data of the primary and standby ends are loaded into memory and compared using various search algorithms.
The existing invention patent with publication number CN104239301B, "A data comparison method and apparatus", published on February 13, 2018, proposes a data comparison method comprising: determining a first data set and a second data set to be compared, each comparison object in a data set including one or more comparison items; determining the type of each comparison item, the types including at least first-type comparison items and non-first-type comparison items; and comparing the comparison objects in the first data set with the comparison objects in the second data set, wherein, if the first-type comparison items of a first comparison object in the first data set are identical to the corresponding first-type comparison items of a second comparison object in the second data set, and the difference between the non-first-type comparison items of the first comparison object and the corresponding non-first-type comparison items of the second comparison object satisfies a preset condition, the first comparison object and the second comparison object are judged to be consistent.
The existing invention patent with publication number CN107679104A, "Large-table streaming parallel high-speed data comparison method", published on February 9, 2018, proposes a streaming, parallel, high-speed data comparison method for large tables, characterized by the following steps: (1-1) a comparison application configures, through a database link, the information of the primary database table and of the standby database table to be compared; if the structures of the primary and standby database table information are inconsistent, it returns that no comparison is possible; (1-2) the index fields of the primary and standby database tables are compared to obtain a minimum value min and a maximum value max, which serve as the start and end marks of the comparison; the number N of parallel comparison channels is set, and the N channels are generated dynamically for parallel processing; (1-3) the records of the primary and standby database tables are sorted in ascending order by the index field, the sorted results are divided into blocks according to the number N of parallel comparison channels, and each channel reads its data as a stream and stores it in a cache; (1-4) each channel processes and compares its data in parallel and records the comparison result.
The above methods have at least the following problems:
1. Current methods target the comparison of single files or small batches of data and cannot meet the requirements of large-volume data comparison;
2. Current methods inspect file contents, which is time-consuming and inefficient, and they fail to shorten the comparison time effectively;
3. The actual data must be accessed during comparison and data security is not considered, so there may be a risk of data leakage.
Summary of the invention
The first technical problem to be solved by the present invention is to provide a method for automatic comparison of large batches of data in which a program automatically preprocesses and compares the data files from before and after migration, reducing manual steps and improving the efficiency of data comparison.
The embodiments of the present invention solve the first technical problem as follows:
A method for automatic comparison of large batches of data, comprising the following steps:
Step 10: configure the database configuration items, and automatically split and export the database files before migration and the database files after migration, respectively;
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison file directory, compare the MD5 value of each file to be compared in the reference file set, one by one, with that of the corresponding file in the comparison file set; if the MD5 values are consistent, the comparison is judged to have passed; if the MD5 values are inconsistent, an error record is generated for subsequent data analysis and tracking.
Preferably, the method further comprises:
Setting a first access permission for the database configuration items and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
Preferably, the method further comprises:
Step 50: provide a storage release switch; when the switch is on, the files that have passed comparison are deleted from both the reference file set and the comparison file set; when the switch is off, no files are deleted.
Preferably, the preprocessing in step 20 specifically comprises:
Standardizing the data in the files, including removing extraneous fields and eliminating the differences that exist between heterogeneous databases, so that the structure is consistent at both ends;
Sorting the standardized data, so that the order is consistent at both ends.
Preferably, step 30 is specifically: scan the data files of the reference file set at regular intervals and generate the comparison file directory; when a file is not present in the comparison file directory, or is present in it but has a creation time later than the creation time recorded in the directory, the file is judged to be a new file and its information is added to the comparison file directory. The information in the comparison file directory includes the file path, the file creation time, "compared or not", and "comparison status".
Preferably, step 40 specifically comprises:
Step 41: take the files in the comparison file directory whose "compared or not" state is "No" as comparison file tasks;
Step 42: according to the comparison file tasks, compare each file to be compared in the reference file set, one by one, with the corresponding file in the comparison file set; if the MD5 values are consistent, the comparison is judged to have passed, "compared or not" is marked "Yes", and "comparison status" is marked "MD5 consistent"; if the MD5 values are inconsistent, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "compared or not" is marked "Yes", "comparison status" is marked "file comparison inconsistent", and an error record is generated for subsequent data analysis and tracking.
The second technical problem to be solved by the present invention is to provide a computer-readable storage medium in which a program automatically preprocesses and compares the data files from before and after migration, reducing manual steps and improving the efficiency of data comparison.
The embodiments of the present invention solve the second technical problem as follows:
A computer-readable storage medium on which a computer program (instructions) is stored, the program (instructions) implementing the following steps when executed by a processor:
Step 10: configure the database configuration items, and automatically split and export the database files before migration and the database files after migration, respectively;
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison file tasks, compare the MD5 value of each file to be compared in the comparison file set, one by one, with that of the corresponding file in the reference file set; if the MD5 values are consistent, the comparison is judged to have passed; if the MD5 values are inconsistent, an error record is generated for subsequent data analysis and tracking.
Preferably, the program further performs:
Setting a first access permission for the database configuration items and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
Preferably, the program further performs:
Step 50: provide a storage release switch; when the switch is on, the files that have passed comparison are deleted from both the reference file set and the comparison file set; when the switch is off, no files are deleted.
Preferably, the program further performs:
Standardizing the data in the files, including removing extraneous fields and eliminating the differences that exist between heterogeneous databases, so that the structure is consistent at both ends;
Sorting the standardized data, so that the order is consistent at both ends.
Preferably, step 30 is specifically: scan the data files of the reference file set at regular intervals and generate the comparison file directory; when a file is not present in the comparison file directory, or is present in it but has a creation time later than the creation time recorded in the directory, the file is judged to be a new file and its information is added to the comparison file directory. The information in the comparison file directory includes the file path, the file creation time, "compared or not", and "comparison status".
Preferably, step 40 specifically comprises:
Step 41: take the files in the comparison file directory whose "compared or not" state is "No" as comparison file tasks;
Step 42: according to the comparison file tasks, compare the MD5 value of each file to be compared in the comparison file set, one by one, with that of the corresponding file in the reference file set; if the MD5 values are consistent, the comparison is judged to have passed, "compared or not" is marked "Yes", and "comparison status" is marked "MD5 consistent"; if the MD5 values are inconsistent, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "compared or not" is marked "Yes", "comparison status" is marked "file comparison inconsistent", and an error record is generated for subsequent data analysis and tracking.
The present invention has the following advantages:
1. A program automatically preprocesses and compares the data files from before and after migration, which overcomes the inability of the prior art to handle large-volume data comparison and achieves automated comparison of large batches of data;
2. Files are compared by comparing their MD5 values, which overcomes the long duration and low efficiency of current content-based inspection and its failure to shorten the comparison time effectively, so the method is fast and efficient;
3. By comparing file MD5 values and by setting different permissions for the program, the files being compared, and the results, the invention overcomes the need to access the actual data during comparison and the resulting risk of data leakage; it offers high security and makes the comparison status easy to manage.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flow chart of the execution of the method of the present invention.
Fig. 2 is a data processing flow chart of an embodiment of the present invention.
Fig. 3 is a data comparison flow chart of an embodiment of the present invention.
Detailed description of the embodiments
Referring to Figs. 1 to 3, a method for automatic comparison of large batches of data comprises the following steps:
Step 10: configure the database configuration items, and automatically split and export the database files before migration and the database files after migration, respectively. After the database configuration items have been configured, a data splitting and export program splits the large table files of the database, so that the size of each data file is kept under control and can be handled by the program. Before data is exported, disk usage is checked; when disk space is insufficient, the export is paused and resumes after disk space has been released.
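The splitting and disk-space check described in step 10 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the part-file naming scheme, and the `rows_per_file` parameter are hypothetical, rows are assumed to be already-serialized text lines, and where the patent pauses and later resumes the export, this sketch simply raises an error.

```python
import shutil
from pathlib import Path


def has_enough_space(target_dir: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding target_dir has enough free space."""
    return shutil.disk_usage(target_dir).free >= required_bytes


def split_rows(rows, rows_per_file: int):
    """Yield successive chunks of at most rows_per_file rows."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == rows_per_file:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def export_table(rows, out_dir: str, table: str, rows_per_file: int = 100_000):
    """Write rows to numbered part files, checking free space before each write."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chunk in enumerate(split_rows(rows, rows_per_file)):
        data = "".join(line + "\n" for line in chunk).encode("utf-8")
        if not has_enough_space(str(out), len(data)):
            # Assumption: a real exporter would pause and retry; the sketch stops.
            raise OSError("insufficient disk space; export paused")
        part = out / f"{table}.part{i:05d}.txt"  # hypothetical naming scheme
        part.write_bytes(data)
        written.append(part)
    return written
```

Keeping each part file bounded by a row count is one simple way to keep file sizes under control; a byte-size limit would serve the same purpose.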
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory (alternatively, the comparison file set may be scanned to generate the comparison file directory);
Step 40: according to the comparison file directory, compare the MD5 value of each file to be compared in the reference file set, one by one, with that of the corresponding file in the comparison file set (if the comparison file directory was generated by scanning the comparison file set, the MD5 comparison is made against the corresponding file in the reference file set); if the MD5 values are consistent, the comparison is judged to have passed; if the MD5 values are inconsistent, an error record is generated for subsequent data analysis and tracking.
MD5 treats the entire file as one large block of text and, through an irreversible string transformation algorithm, produces an MD5 message digest that is, in practice, unique to the file's content. A typical application of MD5 is to generate a message digest for a piece of information in order to detect tampering: if anyone makes any change to the file, its MD5 value changes. It is therefore possible to judge whether two files are completely identical by comparing the MD5 values of the one-to-one corresponding files in the reference file set and the comparison file set.
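As an illustration of the digest comparison described above, the following sketch computes a file's MD5 value with Python's standard `hashlib` module, reading the file in chunks so that large exports need not fit in memory. The function names and the chunk size are illustrative choices, not taken from the patent.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Judge two files identical when their MD5 digests match."""
    return file_md5(path_a) == file_md5(path_b)
```

Because only digests are compared, the comparing program never needs to display or log the underlying records, which is the basis of the security advantage the patent claims.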
In a preferred embodiment, the method further comprises:
Setting a first access permission for the database configuration items (which include the database access address, the database user name, the database password, the tables and fields to be exported, and so on) and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
For example, program maintainers are given the first access permission: they can maintain the comparison program and configure its important parameters, such as the database connection information and the data processing rules. Ordinary testers are given the third access permission: they can access the comparison results and error records, which lets them track the status and progress of the comparison and carry out preliminary analysis of and feedback on inconsistent data. Senior testers are given both the second and the third access permissions: they can access the reference file set, the comparison file set, the comparison results, and the error records, which ensures that when a comparison turns out inconsistent they have the permissions needed to analyze the error records in depth.
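The three-level permission scheme in this example can be modeled as a small lookup, sketched below. The role and resource identifiers are hypothetical labels for the personnel and objects named in the text; the patent does not prescribe any particular data structure or enforcement mechanism.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    FIRST = 1   # database configuration items and program files
    SECOND = 2  # reference file set and comparison file set
    THIRD = 4   # comparison file directory and error records


# Permissions held by each role, following the example in the text.
ROLES = {
    "program_maintainer": Access.FIRST,
    "ordinary_tester": Access.THIRD,
    "senior_tester": Access.SECOND | Access.THIRD,
}

# Permission required to reach each resource.
RESOURCES = {
    "db_config": Access.FIRST,
    "program_files": Access.FIRST,
    "reference_set": Access.SECOND,
    "comparison_set": Access.SECOND,
    "comparison_directory": Access.THIRD,
    "error_records": Access.THIRD,
}


def can_access(role: str, resource: str) -> bool:
    """Return True if the role holds the permission the resource requires."""
    return bool(ROLES.get(role, Access.NONE) & RESOURCES[resource])
```

In this model an ordinary tester can read error records but cannot open the exported data files, while a senior tester can reach both.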
The method further comprises:
Step 50: provide a storage release switch; when the switch is on, the files that have passed comparison are deleted from both the reference file set and the comparison file set; when the switch is off, no files are deleted.
Where the volume of data to be compared is very large and storage space is relatively limited, the files in the reference file set and the comparison file set that have passed comparison can be deleted in order to meet the needs of large-volume comparison. A storage release switch is provided and is turned on when storage needs to be freed; the program then automatically deletes the reference files and comparison files that have passed comparison, freeing storage space for the files that are subsequently stored for comparison.
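A minimal sketch of the storage release switch in step 50 follows. The switch is modeled as a boolean argument and the passed files as a list of (reference path, comparison path) pairs; both are assumptions made for illustration only.

```python
import os


def release_storage(passed_pairs, release_enabled: bool) -> int:
    """Delete both files of each passed pair when the switch is on.

    Returns the number of files actually deleted.
    """
    if not release_enabled:
        return 0  # switch off: no files are deleted
    deleted = 0
    for ref_path, cmp_path in passed_pairs:
        for path in (ref_path, cmp_path):
            if os.path.exists(path):
                os.remove(path)
                deleted += 1
    return deleted
```

Only pairs that have already passed comparison should be handed to this function; files with inconsistent results are kept so that the error records can still be analyzed.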
The preprocessing in step 20 specifically comprises:
Standardizing the data in the files, including removing extraneous fields (such as timestamps) and eliminating the differences that exist between heterogeneous databases, so that the structure is consistent at both ends;
Sorting the standardized data, so that the order is consistent at both ends.
Once the files of the reference file set and the comparison file set have been preprocessed, factors other than the data itself will not affect the comparison result.
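The effect of this preprocessing can be shown with a short sketch that assumes the exported files are CSV text with a header row. The dropped field name `updated_at`, the alphabetical column ordering, and the whitespace trimming are all illustrative assumptions standing in for whatever standardization rules a real deployment would configure.

```python
import csv
import io


def preprocess(csv_text: str, drop_fields=("updated_at",)) -> str:
    """Standardize and sort CSV text so that equal data yields equal bytes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Drop extraneous fields and fix the column order (assumed rule: alphabetical).
    keep = sorted(f for f in reader.fieldnames if f not in drop_fields)
    # Trim stray whitespace, one example of a cross-database difference.
    rows = [[(row[f] or "").strip() for f in keep] for row in reader]
    rows.sort()  # same row order at both ends
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keep)
    writer.writerows(rows)
    return out.getvalue()
```

Two exports that hold the same records but differ in column order, row order, padding, or timestamp values produce identical output, so their MD5 values match.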
Step 30 is specifically: scan the data files of the reference file set at regular intervals and generate the comparison file directory (alternatively, the comparison file set may be scanned to generate the comparison file directory); when a file is not present in the comparison file directory, or is present in it but has a creation time later than the creation time recorded in the directory (the recorded creation time being the file's creation time as of the previous scan of that file), the file is judged to be a new file and its information is added to the comparison file directory. The information in the comparison file directory includes the file path, the file creation time, "compared or not" (default "No"), and "comparison status" (default "to be compared").
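One pass of the scan in step 30 might look like the following sketch. The directory is modeled as an in-memory dict keyed by file path, the key names are hypothetical, and the file's modification time is used as a stand-in for its creation time because creation time is not available portably; a scheduler would call the function at regular intervals.

```python
import os


def scan_reference_set(ref_dir: str, catalog: dict) -> list:
    """Register new or re-created files under ref_dir; return the paths added."""
    added = []
    for root, _dirs, files in os.walk(ref_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            created = os.path.getmtime(path)  # assumption: mtime as creation time
            entry = catalog.get(path)
            if entry is None or created > entry["created"]:
                catalog[path] = {
                    "created": created,
                    "compared": False,           # "compared or not", default "No"
                    "status": "to be compared",  # "comparison status" default
                }
                added.append(path)
    return added
```

A file that is exported again after a previous scan gets a later timestamp, so its entry is reset and it is queued for comparison once more.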
Step 40 specifically comprises:
Step 41: take the files in the comparison file directory whose "compared or not" state is "No" as comparison file tasks;
Step 42: according to the comparison file tasks, compare the MD5 value of each file to be compared in the comparison file set, one by one, with that of the corresponding file in the reference file set; if the MD5 values are consistent, the comparison is judged to have passed, "compared or not" is marked "Yes", and "comparison status" is marked "MD5 consistent"; if the MD5 values are inconsistent, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "compared or not" is marked "Yes", "comparison status" is marked "file comparison inconsistent", and an error record is generated for subsequent data analysis and tracking; if no file corresponding one-to-one to the file to be compared exists, the task is not processed and the next comparison task is executed.
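Steps 41 and 42 can be sketched as a single loop over the pending entries. In this illustration the directory is a dict keyed by a file name shared by both sets, which is an assumption about how "corresponding" files are matched; the line-by-line comparison is reduced to recording the file name in the error list, and whole files are read at once for brevity.

```python
import hashlib
import os


def md5_of(path: str) -> str:
    """Return the MD5 hex digest of a file (whole-file read, for brevity)."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def run_comparison(catalog: dict, ref_dir: str, cmp_dir: str) -> list:
    """Process every entry not yet compared; return the error records."""
    errors = []
    # Step 41: entries whose "compared" state is still False become tasks.
    pending = [name for name, e in catalog.items() if not e["compared"]]
    # Step 42: compare each task's pair of files by MD5 value.
    for name in pending:
        ref_path = os.path.join(ref_dir, name)
        cmp_path = os.path.join(cmp_dir, name)
        if not (os.path.exists(ref_path) and os.path.exists(cmp_path)):
            continue  # no counterpart yet: leave the task for a later run
        entry = catalog[name]
        entry["compared"] = True
        if md5_of(ref_path) == md5_of(cmp_path):
            entry["status"] = "MD5 consistent"
        else:
            entry["status"] = "file comparison inconsistent"
            errors.append({"file": name})
    return errors
```

Leaving unmatched tasks untouched means they are picked up again on the next run, once the missing counterpart has been exported.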
In a preferred embodiment, the content of the inconsistent rows is recorded during the line-by-line comparison; when there are too many inconsistent rows, only the content of the first 10 rows may be recorded. An error record is generated, and testers can review the details of the error record, analyze the inconsistency, and operate on the database files as actually required.
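The capped line-by-line comparison can be sketched as follows. The record layout (line number plus both versions of the row) is an assumption; the limit of 10 recorded rows follows the text.

```python
from itertools import zip_longest


def diff_lines(ref_lines, cmp_lines, max_records: int = 10):
    """Return (total mismatching rows, details of the first max_records of them)."""
    total = 0
    records = []
    # zip_longest pads the shorter file with None, so extra rows count as mismatches.
    pairs = zip_longest(ref_lines, cmp_lines)
    for number, (ref, cmp_) in enumerate(pairs, start=1):
        if ref != cmp_:
            total += 1
            if len(records) < max_records:
                records.append({"line": number, "reference": ref, "comparison": cmp_})
    return total, records
```

Counting every mismatch while storing only the first few keeps the error record small even when a whole file differs.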
Referring again to Figs. 1 to 3, a computer-readable storage medium has a computer program (instructions) stored on it, and the program (instructions) implements the following steps when executed by a processor:
Step 10: configure the database configuration items, and automatically split and export the database files before migration and the database files after migration, respectively. After the database configuration items have been configured, a data splitting and export program splits the large table files of the database, so that the size of each data file is kept under control and can be handled by the program. Before data is exported, disk usage is checked; when disk space is insufficient, the export is paused and resumes after disk space has been released;
Step 20: preprocess the split and exported pre-migration database files to obtain a reference file set, and preprocess the split and exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison file tasks, compare the MD5 value of each file to be compared in the comparison file set, one by one, with that of the corresponding file in the reference file set; if the MD5 values are consistent, the comparison is judged to have passed; if the MD5 values are inconsistent, an error record is generated for subsequent data analysis and tracking.
MD5 treats the entire file as one large block of text and, through an irreversible string transformation algorithm, produces an MD5 message digest that is, in practice, unique to the file's content. A typical application of MD5 is to generate a message digest for a piece of information in order to detect tampering: if anyone makes any change to the file, its MD5 value changes. It is therefore possible to judge whether two files are completely identical by comparing the MD5 values of the one-to-one corresponding files in the reference file set and the comparison file set.
In a preferred embodiment, the program further performs:
Setting a first access permission for the database configuration items (which include the database access address, the database user name, the database password, the tables and fields to be exported, and so on) and the program files used, a second access permission for the reference file set and the comparison file set, and a third access permission for the comparison file directory and the error records.
For example, program maintainers are given the first access permission: they can maintain the comparison program and configure its important parameters, such as the database connection information and the data processing rules. Ordinary testers are given the third access permission: they can access the comparison results and error records, which lets them track the status and progress of the comparison and carry out preliminary analysis of and feedback on inconsistent data. Senior testers are given both the second and the third access permissions: they can access the reference file set, the comparison file set, the comparison results, and the error records, which ensures that when a comparison turns out inconsistent they have the permissions needed to analyze the error records in depth.
The program further performs:
Step 50: provide a storage release switch; when the switch is on, the files that have passed comparison are deleted from both the reference file set and the comparison file set; when the switch is off, no files are deleted.
Where the volume of data to be compared is very large and storage space is relatively limited, the files in the reference file set and the comparison file set that have passed comparison can be deleted in order to meet the needs of large-volume comparison. A storage release switch is provided and is turned on when storage needs to be freed; the program then automatically deletes the reference files and comparison files that have passed comparison, freeing storage space for the files that are subsequently stored for comparison.
The program further performs:
Standardizing the data in the files, including removing extraneous fields (such as timestamps) and eliminating the differences that exist between heterogeneous databases, so that the structure is consistent at both ends;
Sorting the standardized data, so that the order is consistent at both ends.
Once the files of the reference file set and the comparison file set have been preprocessed, factors other than the data itself will not affect the comparison result.
Step 30 is specifically: scan the data files of the reference file set at regular intervals and generate the comparison file directory; when a file is not present in the comparison file directory, or is present in it but has a creation time later than the creation time recorded in the directory (the recorded creation time being the file's creation time as of the previous scan of that file), the file is judged to be a new file and its information is added to the comparison file directory. The information in the comparison file directory includes the file path, the file creation time, "compared or not" (default "No"), and "comparison status" (default "to be compared").
Step 40 specifically comprises:
Step 41: take the files in the comparison file directory whose "compared or not" state is "No" as comparison file tasks;
Step 42: according to the comparison file tasks, compare the MD5 value of each file to be compared in the comparison file set, one by one, with that of the corresponding file in the reference file set; if the MD5 values are consistent, the comparison is judged to have passed, "compared or not" is marked "Yes", and "comparison status" is marked "MD5 consistent"; if the MD5 values are inconsistent, the file in the comparison file set is compared line by line with the corresponding file in the reference file set, "compared or not" is marked "Yes", "comparison status" is marked "file comparison inconsistent", and an error record is generated for subsequent data analysis and tracking; if no file corresponding one-to-one to the file to be compared exists, the task is not processed and the next comparison task is executed.
In a preferred embodiment, the content of the inconsistent rows is recorded during the line-by-line comparison; when there are too many inconsistent rows, only the content of the first 10 rows may be recorded. An error record is generated, and testers can review the details of the error record, analyze the inconsistency, and operate on the database files as actually required.
In the present invention, a program automatically preprocesses and compares the data files from before and after migration, which overcomes the inability of the prior art to handle large-volume data comparison and achieves automated comparison of large batches of data. Files are compared by comparing their MD5 values, which overcomes the long duration and low efficiency of current content-based inspection and its failure to shorten the comparison time effectively, so the method is fast and efficient. By comparing file MD5 values and by setting different permissions for the program, the files being compared, and the results, the invention overcomes the need to access the actual data during comparison and the resulting risk of data leakage; it offers high security and makes the comparison status easy to manage.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are merely illustrative and are not intended to limit the scope of the invention; equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the invention shall fall within the scope protected by the claims of the invention.
Claims (12)
1. A method for automatic comparison of high-volume data, characterized by comprising the following steps:
Step 10: configure the configuration items of the databases to be compared, automatically acquire the pre-migration database file and the post-migration database file according to the database configuration items, and split and export each of them;
Step 20: preprocess the split-and-exported pre-migration database files to obtain a reference file set, and preprocess the split-and-exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison file directory, compare the MD5 value of each file in the reference file set that needs to be compared, one by one, with the MD5 value of the corresponding file in the comparison file set; if the MD5 values are consistent, determine that the comparison has passed; if the MD5 values are inconsistent, generate an error record for subsequent data analysis and tracking.
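As an illustration only, step 40 of claim 1 could be sketched in Python as follows. The function names, the pairing of files by identical relative path, and the shape of the error record are assumptions made for this sketch; the claim does not specify them.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_file_sets(reference_dir: Path, comparison_dir: Path) -> List[Dict[str, str]]:
    """Compare every reference file with the same-named comparison file.

    Returns one error record per missing or mismatching file; an empty
    list means every file passed. (Record layout is hypothetical.)
    """
    errors = []
    for ref_file in sorted(p for p in reference_dir.rglob("*") if p.is_file()):
        relative = ref_file.relative_to(reference_dir)
        cmp_file = comparison_dir / relative
        if not cmp_file.is_file():
            errors.append({"file": relative.as_posix(),
                           "reason": "missing in comparison set"})
        elif md5_of_file(ref_file) != md5_of_file(cmp_file):
            errors.append({"file": relative.as_posix(),
                           "reason": "MD5 mismatch"})
    return errors
```

Because only digests are compared, the comparing process never needs to interpret the file contents, which is consistent with the security advantage stated in the description.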
2. The method for automatic comparison of high-volume data according to claim 1, characterized in that the method further comprises:
setting a first access permission on the database configuration items and the program files used, setting a second access permission on the reference file set and the comparison file set, and setting a third access permission on the comparison file directory and the error records.
3. The method for automatic comparison of high-volume data according to claim 1, characterized in that the method further comprises:
Step 50: provide a storage-space release switch; when the switch is on, delete the files that have passed comparison from both the reference file set and the comparison file set; when the switch is off, delete no files.
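A minimal sketch of the release switch in step 50 might look like the following. The function name, the `release_enabled` flag, and the idea of passing the list of files that passed comparison as relative paths are all assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def release_storage(passed_files: Iterable[str], reference_dir: Path,
                    comparison_dir: Path, release_enabled: bool) -> List[Path]:
    """Delete files that passed comparison from both sets when the switch is on.

    Returns the paths that were actually deleted; with the switch off,
    nothing is deleted and the list is empty.
    """
    deleted: List[Path] = []
    if not release_enabled:
        return deleted
    for relative in passed_files:
        # Remove the file from the reference set and the comparison set together.
        for root in (reference_dir, comparison_dir):
            target = root / relative
            if target.is_file():
                target.unlink()
                deleted.append(target)
    return deleted
```

Files that failed comparison are never passed in, so they stay on disk for later analysis.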
4. The method for automatic comparison of high-volume data according to claim 1, characterized in that the preprocessing in step 20 specifically comprises:
performing data standardization on the files, including removing hash fields and eliminating differences between heterogeneous databases, to ensure that the structure is consistent at both ends;
sorting the standardized data to ensure that the order is consistent at both ends.
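The preprocessing of claim 4 can be illustrated with a small sketch on CSV exports. Everything specific here is an assumption: the CSV format, the `row_hash` column used in the usage note, and the particular differences being eliminated (trailing whitespace and differing spellings of NULL) are stand-ins for whatever the heterogeneous databases actually produce.

```python
import csv
import io
from typing import Iterable, List

# Assumed spellings of NULL in different database exports (hypothetical).
NULL_TOKENS = {"", "NULL", "null", "\\N"}


def preprocess_csv(text: str, drop_columns: Iterable[str]) -> str:
    """Standardize and sort a CSV export so both ends share structure and order."""
    drop = set(drop_columns)
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Standardization, part 1: drop the listed fields (e.g. hash columns).
    keep = [i for i, name in enumerate(header) if name not in drop]
    rows: List[List[str]] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        cells = []
        for i in keep:
            value = row[i].strip()
            # Standardization, part 2: one canonical representation of NULL.
            cells.append("" if value in NULL_TOKENS else value)
        rows.append(cells)
    # Sorting: make row order identical regardless of export order.
    rows.sort()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    writer.writerows(rows)
    return out.getvalue()
```

For example, `preprocess_csv("id,name,row_hash\n2,bob ,abc\n1,NULL,def\n", ["row_hash"])` and `preprocess_csv("id,name,row_hash\n1,,zzz\n2,bob,yyy\n", ["row_hash"])` yield the same text, so the two files would then have the same MD5 value.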
5. The method for automatic comparison of high-volume data according to claim 1, characterized in that step 30 is specifically: periodically scan the data files of the reference file set and generate the comparison file directory; when a file is not present in the comparison file directory, or is present in the comparison file directory but its creation time is later than the creation time recorded in the directory, judge the file to be a new file and add the information of the new file to the comparison file directory; the comparison file directory information includes the file path, the file creation time, "whether compared" and "comparison status".
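One pass of the scan in claim 5 could be sketched as below. The `CatalogEntry` class, the in-memory dictionary used as the comparison file directory, and the use of modification time as a stand-in for creation time (true creation time is not portably available) are assumptions of this sketch; the periodic scheduling itself is left to the caller.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict


@dataclass
class CatalogEntry:
    path: str            # file path, relative to the reference file set
    created: float       # recorded file creation time
    compared: bool = False   # the "whether compared" field
    status: str = ""         # the "comparison status" field


def file_timestamp(path: Path) -> float:
    # Assumption: modification time stands in for creation time.
    return path.stat().st_mtime


def scan_reference_set(reference_dir: Path, catalog: Dict[str, CatalogEntry],
                       timestamp: Callable[[Path], float] = file_timestamp) -> int:
    """Add new or re-created files to the catalog; return how many were added."""
    added = 0
    for file in sorted(p for p in reference_dir.rglob("*") if p.is_file()):
        key = file.relative_to(reference_dir).as_posix()
        created = timestamp(file)
        known = catalog.get(key)
        # New file: absent from the catalog, or newer than the recorded entry.
        if known is None or created > known.created:
            catalog[key] = CatalogEntry(path=key, created=created)
            added += 1
    return added
```

A re-created file replaces its old entry, which resets "whether compared" so the file is picked up again by the next comparison run.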
6. The method for automatic comparison of high-volume data according to claim 1, characterized in that step 40 specifically comprises:
Step 41: take the files whose "whether compared" state in the comparison file directory is "No" as the comparison file tasks;
Step 42: according to the comparison file tasks, compare each file in the reference file set that needs to be compared, one by one, with the corresponding file in the comparison file set; if the MD5 values are consistent, determine that the comparison has passed, mark "whether compared" as "Yes" and mark "comparison status" as "MD5 consistent"; if the MD5 values are inconsistent, compare the file in the comparison file set line by line with the corresponding file in the reference file set, mark "whether compared" as "Yes", mark "comparison status" as "file comparison inconsistent", and generate an error record for subsequent data analysis and tracking.
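Steps 41 and 42 can be sketched as follows, under stated assumptions: the comparison file directory is modeled as a dictionary mapping a relative path to a record with `compared` and `status` keys, files are read fully into memory for brevity, and the error record layout is hypothetical.

```python
import hashlib
from itertools import zip_longest
from pathlib import Path
from typing import Dict, List, Optional, Tuple

LineDiff = Tuple[int, Optional[bytes], Optional[bytes]]


def diff_lines(reference: bytes, comparison: bytes) -> List[LineDiff]:
    """Return (line number, reference line, comparison line) for each differing line."""
    pairs = zip_longest(reference.splitlines(), comparison.splitlines())
    return [(number, ref_line, cmp_line)
            for number, (ref_line, cmp_line) in enumerate(pairs, start=1)
            if ref_line != cmp_line]


def run_comparison_tasks(catalog: Dict[str, dict], reference_dir: Path,
                         comparison_dir: Path) -> List[dict]:
    """Compare all not-yet-compared files and update the catalog in place."""
    errors = []
    # Step 41: files whose "whether compared" state is "No" become tasks.
    tasks = [key for key, entry in catalog.items() if not entry["compared"]]
    # Step 42: MD5 first; line-by-line comparison only on mismatch.
    for key in tasks:
        ref_bytes = (reference_dir / key).read_bytes()
        cmp_bytes = (comparison_dir / key).read_bytes()
        entry = catalog[key]
        if hashlib.md5(ref_bytes).digest() == hashlib.md5(cmp_bytes).digest():
            entry["status"] = "MD5 consistent"
        else:
            entry["status"] = "file comparison inconsistent"
            errors.append({"file": key,
                           "differences": diff_lines(ref_bytes, cmp_bytes)})
        entry["compared"] = True
    return errors
```

The expensive line-by-line pass runs only for files whose digests differ, so in the common case of a clean migration each file costs one hash per side.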
7. A computer-readable storage medium on which a computer program (instructions) is stored, characterized in that the program (instructions), when executed by a processor, implements the following steps:
Step 10: configure the configuration items of the databases to be compared, automatically acquire the pre-migration database file and the post-migration database file according to the database configuration items, and split and export each of them;
Step 20: preprocess the split-and-exported pre-migration database files to obtain a reference file set, and preprocess the split-and-exported post-migration database files to obtain a comparison file set;
Step 30: scan the files of the reference file set and generate a comparison file directory;
Step 40: according to the comparison file directory, compare the MD5 value of each file in the reference file set that needs to be compared, one by one, with the MD5 value of the corresponding file in the comparison file set; if the MD5 values are consistent, determine that the comparison has passed; if the MD5 values are inconsistent, generate an error record for subsequent data analysis and tracking.
8. The computer-readable storage medium according to claim 7, characterized in that the program further performs:
setting a first access permission on the database configuration items and the program files used, setting a second access permission on the reference file set and the comparison file set, and setting a third access permission on the comparison file directory and the error records.
9. The computer-readable storage medium according to claim 7, characterized in that the program further performs:
Step 50: provide a storage-space release switch; when the switch is on, delete the files that have passed comparison from both the reference file set and the comparison file set; when the switch is off, delete no files.
10. The computer-readable storage medium according to claim 7, characterized in that the program further performs:
performing data standardization on the files, including removing hash fields and eliminating differences between heterogeneous databases, to ensure that the structure is consistent at both ends;
sorting the standardized data to ensure that the order is consistent at both ends.
11. The computer-readable storage medium according to claim 7, characterized in that step 30 is specifically: periodically scan the data files of the reference file set and generate the comparison file directory; when a file is not present in the comparison file directory, or is present in the comparison file directory but its creation time is later than the creation time recorded in the directory, judge the file to be a new file and add the information of the new file to the comparison file directory; the comparison file directory information includes the file path, the file creation time, "whether compared" and "comparison status".
12. The computer-readable storage medium according to claim 7, characterized in that step 40 specifically comprises:
Step 41: take the files whose "whether compared" state in the comparison file directory is "No" as the comparison file tasks;
Step 42: according to the comparison file tasks, compare each file in the reference file set that needs to be compared, one by one, with the corresponding file in the comparison file set; if the MD5 values are consistent, determine that the comparison has passed, mark "whether compared" as "Yes" and mark "comparison status" as "MD5 consistent"; if the MD5 values are inconsistent, compare the file in the comparison file set line by line with the corresponding file in the reference file set, mark "whether compared" as "Yes", mark "comparison status" as "file comparison inconsistent", and generate an error record for subsequent data analysis and tracking.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184724.9A CN109977082A (en) | 2019-03-12 | 2019-03-12 | A kind of method and computer readable storage medium of high-volume data automatic comparison |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184724.9A CN109977082A (en) | 2019-03-12 | 2019-03-12 | A kind of method and computer readable storage medium of high-volume data automatic comparison |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977082A (en) | 2019-07-05 |
Family
ID=67078595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910184724.9A Pending CN109977082A (en) | 2019-03-12 | 2019-03-12 | A kind of method and computer readable storage medium of high-volume data automatic comparison |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977082A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459916A (en) * | 2020-04-16 | 2020-07-28 | 中国银行股份有限公司 | GBASE and ORACLE database table comparison method and system |
CN112948389A (en) * | 2021-03-05 | 2021-06-11 | 上海上讯信息技术股份有限公司 | MD5-based database table data comparison method and equipment |
CN115632877A (en) * | 2022-12-01 | 2023-01-20 | 成都九洲电子信息系统股份有限公司 | Large-scale PCAP data correctness verification method, system and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150234885A1 (en) * | 2014-02-18 | 2015-08-20 | Black Duck Software, Inc. | Methods and systems for efficient comparison of file sets |
CN105989089A (en) * | 2015-02-12 | 2016-10-05 | 阿里巴巴集团控股有限公司 | Data comparison method and device |
CN106682534A (en) * | 2017-01-23 | 2017-05-17 | 郑州云海信息技术有限公司 | Method and device for verifying data integrity in data migration process |
CN108256034A (en) * | 2018-01-11 | 2018-07-06 | 北京潘达互娱科技有限公司 | Data migration method and equipment |
-
2019
- 2019-03-12 CN CN201910184724.9A patent/CN109977082A/en active Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459916A (en) * | 2020-04-16 | 2020-07-28 | 中国银行股份有限公司 | GBASE and ORACLE database table comparison method and system |
CN111459916B (en) * | 2020-04-16 | 2023-05-23 | 中国银行股份有限公司 | GBASE and ORACLE database table comparison method and system |
CN112948389A (en) * | 2021-03-05 | 2021-06-11 | 上海上讯信息技术股份有限公司 | MD5-based database table data comparison method and equipment |
CN112948389B (en) * | 2021-03-05 | 2023-07-25 | 上海上讯信息技术股份有限公司 | MD5-based database table data comparison method and device |
CN115632877A (en) * | 2022-12-01 | 2023-01-20 | 成都九洲电子信息系统股份有限公司 | Large-scale PCAP data correctness verification method, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109460349B (en) | Test case generation method and device based on log | |
CN109977082A (en) | A kind of method and computer readable storage medium of high-volume data automatic comparison | |
US9235622B2 (en) | System and method for an efficient query sort of a data stream with duplicate key values | |
CN110941621A (en) | Method and device for synchronizing databases between internal network and external network | |
US8498995B1 (en) | Optimizing data retrieval during event data query processing | |
NL2026782B1 (en) | Method and system for determining affiliation of software to software families | |
Rabl et al. | Just can't get enough: Synthesizing Big Data | |
CN112231407A (en) | DDL synchronization method, device, equipment and medium of PostgreSQL database | |
KR101990329B1 (en) | Method and apparatus for improving database recovery speed using log data analysis | |
Araujo et al. | Comparative performance analysis of NoSQL Cassandra and MongoDB databases | |
CN103617122B (en) | A kind of comparison method of source code | |
US8095548B2 (en) | Methods, program product, and system of data management having container approximation indexing | |
Jones et al. | A method and implementation for the empirical study of deleted file persistence in digital devices and media | |
CN111221690B (en) | Model determination method and device for integrated circuit design and terminal | |
CN104933096A (en) | Abnormal key recognition method of database, abnormal key recognition device of database and data system | |
CN106776255A (en) | The log extracting method and device of intelligent television system | |
US9305080B2 (en) | Accelerating queries using delayed value projection of enumerated storage | |
CN111104441A (en) | Data acquisition method and system | |
CN106096804B (en) | Monitoring method for whole maintenance process of intelligent power grid dispatching control system model | |
CN109408525A (en) | A kind of agricultural data library SQL statement safety detection method and system | |
Zhengwei et al. | The application of structure arrays and files in the SCPI parsing system | |
CN108345541A (en) | A kind of program detecting method and system | |
Al Sadi et al. | Improving the efficiency of big forensic data analysis using NoSQL | |
Fernandes et al. | An archiver appliance performance and resources consumption study | |
KR20190067147A (en) | Method and apparatus for improving database recovery speed using log data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190705 |