Method, device and related apparatus for data verification in a distributed database
Technical Field
The present invention relates to the field of communication technology, and in particular to a method, a device and a related apparatus for data verification in a distributed database.
Background
With the wide application of database technology and the continuous accumulation of online business data, and in particular with the rapid development of Internet services, data volumes grow day by day and the performance of a single-machine database increasingly becomes the bottleneck of online business. Because a distributed database can provide a high-performance, large-capacity, highly concurrent database service, it has quickly been adopted in a variety of online business scenarios.
However, during data migration and data initialization, an existing distributed database cannot determine whether the data is consistent before and after the change, which limits the application of distributed databases.
Summary of the Invention
The present invention provides a method, a device and a related apparatus for data verification in a distributed database, to solve the problem in the prior art that a distributed database cannot determine data consistency before and after a data change.
One aspect of the present invention provides a method for data verification in a distributed database, including:
exporting the data to be changed to a data description text, and calculating, from the exported data description text, the check value of the specified rows in the data to be changed;
splitting the data to be changed by row, and importing the split data to be changed into the corresponding database nodes;
after the data import is completed, calculating the check value of the specified rows in the imported data to be changed, and comparing whether the check values of the specified rows in the data to be changed are consistent before and after the import; if they are consistent, determining that the data to be changed is consistent before and after the change.
Further, calculating the check value of the specified rows in the data to be changed specifically includes:
calculating the check value of a certain specified row in the data to be changed, or calculating the sum of the check values of one or more specified runs of N consecutive rows in the data to be changed.
Further, when the specified row is a certain row of data, calculating the check value of the specified rows in the imported data to be changed specifically includes: calculating the check value of the certain specified row in the imported data to be changed; and comparing whether the check values of the specified rows in the data to be changed are consistent specifically includes: comparing the check values of the certain specified row in the data to be changed before and after the import;
when the specified rows are one or more runs of N consecutive rows, calculating the check value of the specified rows in the imported data to be changed specifically includes: calculating the sum of the check values of the one or more specified runs of N consecutive rows in the imported data to be changed; and comparing whether the check values of the specified rows in the data to be changed are consistent specifically includes: comparing the sums of the check values of the one or more specified runs of N consecutive rows in the data to be changed before and after the import.
Further, after the data to be changed is split by row and before the split data to be changed is imported into the corresponding database nodes, the method also includes:
obtaining, according to the distribution rule of the distributed database, the database node on which each part of the split data to be changed should be stored.
Further, importing the split data to be changed into the corresponding database nodes specifically includes:
writing the split data to be changed to the file cache of the corresponding database node, notifying database cluster management of the number and names of the completed files, and triggering, through the database cluster management, the database agent to download the data to be changed stored in the file cache to the database node;
wherein the database agents correspond one-to-one with the database nodes.
Further, the data to be changed includes data to be initialized, data to be migrated and data to be redistributed.
Another aspect of the present invention provides a device for data verification in a distributed database, including:
a first computing unit, configured to export the data to be changed to a data description text and to calculate, from the exported data description text, the check value of the specified rows in the data to be changed;
an import unit, configured to split the data to be changed by row and to import the split data to be changed into the corresponding database nodes;
a second computing unit, configured to calculate, after the data import is completed, the check value of the specified rows in the imported data to be changed;
a comparing unit, configured to compare whether the check values of the specified rows in the data to be changed are consistent before and after the import and, if they are consistent, to determine that the data to be changed is consistent before and after the change.
Further, the first computing unit is also configured to calculate the check value of a certain specified row in the data to be changed, or to calculate the sum of the check values of one or more specified runs of N consecutive rows in the data to be changed.
Further, the second computing unit is also configured to: when the specified row is a certain row of data, calculate the check value of the certain specified row in the imported data to be changed; and when the specified rows are one or more runs of N consecutive rows, calculate the sum of the check values of the one or more specified runs of N consecutive rows in the imported data to be changed;
the comparing unit is also configured to: when the specified row is a certain row of data, compare the check values of the certain specified row in the data to be changed before and after the import; and when the specified rows are one or more runs of N consecutive rows, compare the sums of the check values of the one or more specified runs of N consecutive rows in the data to be changed before and after the import.
Further, the import unit further includes:
a splitting module, configured to split the data to be changed by row;
an acquisition module, configured to obtain, according to the distribution rule of the distributed database, the database node on which each part of the split data to be changed should be stored;
an import module, configured to import the split data to be changed into the corresponding database nodes.
Further, the import unit further includes:
a splitting module, configured to split the data to be changed by row;
an import module, configured to write the split data to be changed to the file cache of the corresponding database node, to notify database cluster management of the number and names of the completed files, and to trigger, through the database cluster management, the database agent to download the data to be changed stored in the file cache to the database node, the database agents corresponding one-to-one with the database nodes.
Further, the data to be changed includes data to be initialized, data to be migrated and data to be redistributed.
A further aspect of the present invention provides a database cluster server provided with any of the above devices for data verification in a distributed database.
The beneficial effects of the present invention are as follows:
By comparing whether the check values of the specified rows in the data to be changed are consistent before and after the import, the present invention determines whether the data to be changed is consistent before and after the change, and thereby effectively solves the problem in the prior art that a distributed database cannot determine data consistency before and after a data change.
Brief Description of the Drawings
Fig. 1 is a schematic flow chart of a method for data verification in a distributed database according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another method for data verification in a distributed database according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for data verification in a distributed database according to an embodiment of the present invention;
Fig. 4 is an architecture diagram of the online data migration system according to an embodiment of the present invention.
Detailed Description
To solve the problem in the prior art that a distributed database cannot determine data consistency before and after a data change, the present invention provides a method, a device and a related apparatus for data verification in a distributed database. The present invention compares whether the data check values of the rows of the data to be changed that need verification are consistent before and after the import, or compares whether the number of rows of the data to be changed that need verification and the sum of the data check values of those rows are consistent before and after the import. In this way the consistency of the data to be changed before and after the change can be determined accurately, which effectively solves the prior-art problem described above. The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.
Method Embodiment
An embodiment of the present invention provides a method for data verification in a distributed database. The method is executed by a database cluster server. Referring to Fig. 1, the method includes:
S101, exporting the data to be changed to a data description text, and calculating, from the exported data description text, the check value of the specified rows in the data to be changed;
S102, splitting the data to be changed by row, and importing the split data to be changed into the corresponding database nodes;
S103, after the data import is completed, calculating the check value of the specified rows in the imported data to be changed;
S104, comparing whether the check values of the specified rows in the data to be changed are consistent before and after the import; if they are consistent, determining that the data to be changed is consistent before and after the change.
That is, by comparing whether the check values of the specified rows in the data to be changed are consistent before and after the import, the present invention determines whether the data to be changed is consistent before and after the change, and thereby effectively solves the problem in the prior art that a distributed database cannot determine data consistency before and after a data change.
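Steps S101 to S104 can be illustrated with a minimal in-memory sketch. Everything named here is an assumption made for illustration: the check value is taken to be the sum of the byte values of a row, the nodes are plain dictionaries, and the distribution rule is a simple row number modulo the node count. None of these details is prescribed by the method itself.

```python
def check_value(row: str) -> int:
    # Assumed check value: sum of the byte values of the row text.
    return sum(row.encode("utf-8"))


def check_before(rows, specified):
    # S101: check value of the specified rows, computed from the exported text.
    return sum(check_value(rows[n]) for n in specified)


def split_and_import(rows, node_count):
    # S102: split by row and place each row on a node.
    # The modulo rule below is a hypothetical distribution rule.
    nodes = [{} for _ in range(node_count)]
    for row_no, row in enumerate(rows):
        nodes[row_no % node_count][row_no] = row
    return nodes


def check_after(nodes, specified):
    # S103: check value of the same rows, read back from the nodes.
    return sum(check_value(nodes[n % len(nodes)][n]) for n in specified)


rows = ["1,alice", "2,bob", "3,carol"]
specified = [0, 2]
before = check_before(rows, specified)
nodes = split_and_import(rows, node_count=2)
# S104: the data is considered consistent if both values match.
consistent = before == check_after(nodes, specified)
```

If a specified row were altered during the import, the two values would normally differ and the comparison in S104 would report an inconsistency.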
In a specific implementation, step S101 of the embodiment of the present invention specifically includes: calculating the check value of a certain specified row in the data to be changed, or calculating the sum of the check values of one or more specified runs of N consecutive rows in the data to be changed.
That is, the present invention can take a simple sample by calculating the check value of a certain specified row in the data to be changed, or take a larger sample by calculating the sum of the check values of one or more specified runs of N consecutive rows, in order to compare whether the data to be changed is consistent before and after the import.
It should be noted that the scheme of calculating the sum of the check values of one or more specified runs of N consecutive rows in the data to be changed includes the case in which all rows of the data to be changed are verified.
Specifically, when the specified row is a certain row of data, calculating the check value of the specified rows in the imported data to be changed specifically includes: calculating the check value of the certain specified row in the imported data to be changed; and comparing whether the check values of the specified rows in the data to be changed are consistent specifically includes: comparing the check values of the certain specified row in the data to be changed before and after the import.
For example, if all even-numbered rows are specified for verification, the present invention calculates the check values of the data of all even-numbered rows of the data to be changed before and after the import, and compares them to determine whether the data to be changed is consistent before and after the import.
When the specified rows are one or more runs of N consecutive rows, calculating the check value of the specified rows in the imported data to be changed specifically includes: calculating the sum of the check values of the one or more specified runs of N consecutive rows in the imported data to be changed; and comparing whether the check values of the specified rows in the data to be changed are consistent specifically includes: comparing the sums of the check values of the one or more specified runs of N consecutive rows in the data to be changed before and after the import.
For example, if all rows are specified for verification, the present invention calculates the data check values of all rows of the data to be changed before and after the import, and compares them to determine whether the data to be changed is consistent before and after the import.
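The two examples above (all even-numbered rows, all rows) and the case of a run of N consecutive rows differ only in which row numbers are selected. A minimal sketch of such a selection follows; the function name, the mode strings and the 1-based row numbering are assumptions for illustration only.

```python
def specified_rows(total_rows: int, mode: str, n: int = 1, start: int = 1):
    """Return the 1-based row numbers to verify (hypothetical helper)."""
    if mode == "all":
        # Full verification: every row of the data to be changed.
        return list(range(1, total_rows + 1))
    if mode == "even":
        # Sampling verification: every even-numbered row.
        return list(range(2, total_rows + 1, 2))
    if mode == "range":
        # One run of n consecutive rows beginning at row `start`,
        # clipped to the end of the data.
        return list(range(start, min(start + n, total_rows + 1)))
    raise ValueError(f"unknown mode: {mode}")
```

The same list of row numbers would be used both before and after the import, so that the two check values cover exactly the same rows.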
In a specific implementation, the embodiment of the present invention stores the rows of the data to be changed from step S101, together with the data check value corresponding to each row, in a preset check table. Of course, the number of rows of the data to be changed and the sum of the data check values of those rows may also be stored in the check table, for use in the data consistency verification after the data import.
That is, without affecting the operation of existing business, the present invention imports the data into the corresponding nodes according to a preset data distribution rule, and uses two means, verification of all rows and verification of sampled rows, to ensure that the data imported into the database is strongly consistent with the original data. The number of rows sampled for verification is configurable.
It should be noted that, when exporting the data to be changed to the data description text, the present invention marks each row of data and distributes it to a database node according to the location determined for that row. After the data import is completed, the data check value of the imported data is obtained according to the rows to be verified that are recorded in the preset check table (for example the even-numbered rows, i.e. verification of sampled rows) or according to a preset number of rows (which may be set arbitrarily and may cover all rows, i.e. verification of all rows). This data check value is compared with the data check value recorded in the check table; if the two are consistent, the data is considered consistent before and after the import.
In the embodiment of the present invention, the data to be changed includes data to be initialized, data to be migrated and data to be redistributed. That is, the present invention can verify data consistency before and after data changes such as data initialization, data migration and data redistribution. The whole verification process for a data change does not need to lock the current database; locating the database node for each row, distributing the data and verifying the data can all be carried out independently, and only a small amount of database server I/O is consumed when data is imported into a single node. The impact on online services is therefore small.
In the embodiment of the present invention, exporting the data to be changed to the data description text specifically includes:
Before data migration or data initialization, the data to be migrated needs to be exported to a data description text. Both cross-database sources and the current distributed database are supported: in the cross-database case, the database tables to be migrated are exported to text description files according to the syntax of the source database; the current distributed database can export its distributed data to text through LoadServer.
The embodiment of the present invention calculates the data check values of the rows of the data to be changed that need verification, or calculates the number of rows of the data to be changed that need verification and the sum of the data check values of those rows. The rows and their data check values, or the number of rows and the sum of the data check values of those rows, are stored in the preset check table for the subsequent consistency verification.
In a specific implementation, the present invention reads the text into memory row by row according to the text description rule, calculates the ASCII value of the current row of data (that is, the data check value mentioned above) and saves it in memory. When multiple consecutive rows are to be verified, the ASCII values of the individual rows are added together to obtain the sum of the ASCII values of those rows.
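The ASCII-based check value can be sketched in a few lines. The sketch assumes that the "ASCII value" of a row means the sum of the byte values of its text; the source does not define it more precisely, and the function names are hypothetical.

```python
def row_check_value(row: str) -> int:
    # Assumed reading of "ASCII value": sum of the byte values of the row.
    return sum(row.encode("utf-8"))


def rows_check_sum(rows) -> int:
    # Sum of the check values of several (for example N consecutive) rows.
    return sum(row_check_value(r) for r in rows)


single = row_check_value("AB")        # 65 + 66
total = rows_check_sum(["AB", "C"])   # 131 + 67
```

Under this assumption the check value does not depend on the order of the characters in a row, so a plain sum of this kind would not detect two swapped characters.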
In the embodiment of the present invention, after the data to be changed is split and before the split data to be changed is imported into the corresponding database nodes, the method also includes:
obtaining, according to the distribution rule of the distributed database, the database node on which each part of the split data to be changed should be stored.
In the embodiment of the present invention, importing the split data to be changed into the corresponding database nodes specifically includes:
writing the split data to be changed to the file cache of the corresponding database node, notifying database cluster management of the number and names of the completed files, and triggering, through the database cluster management, the database agent to download the data to be changed stored in the file cache to the database node;
wherein the database agents correspond one-to-one with the database nodes.
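The source does not specify the distribution rule itself. As a purely hypothetical illustration, the sketch below hashes one field of a delimited row (treated as the distribution key) and takes the result modulo the number of nodes; the field position, the separator and the use of CRC-32 are all assumptions.

```python
import zlib


def home_node(row: str, node_count: int, key_index: int = 0, sep: str = ",") -> int:
    """Return the index of the node on which the row should be stored."""
    # Hypothetical rule: hash the distribution key and map it onto the nodes.
    key = row.split(sep)[key_index]
    return zlib.crc32(key.encode("utf-8")) % node_count


node = home_node("42,alice", node_count=4)
```

Whatever the real rule is, it needs to be deterministic, so that the node chosen when the data is split is also the node queried during verification.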
In a specific implementation, the present invention writes the current data row into the file cache of the corresponding database node; once the cache is full, or the number of data rows per file that the configuration file requires for the current node has been reached, the data is written to a file and a new file to be written is generated;
after a certain number of files have been generated, the database cluster, according to the number and names of the completed files, notifies the database agent to download the corresponding files to the database server and to import them into the corresponding database;
after the data import is completed, the database cluster initiates verification: a verification request is sent to the database agent through database cluster management, and the data check values of the data rows stored on the current database node, as counted by the database agent, are obtained, wherein the database agent is the database agent corresponding to the database node that stores the rows of the data to be changed; or, according to the rows of the data to be changed, a verification request is sent to the database agent through database cluster management, and the number of data rows on the current database node and the sum of the corresponding data check values, as counted by the database agent, are obtained, wherein the database agent is the database agent corresponding to the database node that stores the rows of the data to be changed, and the database agents correspond one-to-one with the database nodes.
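The file-cache behaviour described above (rows are buffered per node, flushed to a file when the configured number of rows is reached, and the completed files are reported by count and name) can be sketched as follows. The class, its method names and the file naming scheme are hypothetical, and the "files" are kept in memory to keep the sketch self-contained.

```python
class NodeFileCache:
    """Hypothetical per-node file cache with rotation by row count."""

    def __init__(self, node_id: int, max_rows_per_file: int):
        self.node_id = node_id
        self.max_rows = max_rows_per_file
        self.buffer = []       # rows of the file currently being written
        self.completed = {}    # file name -> rows of a completed file

    def write_row(self, row: str) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        # Close the current file and start a new one.
        if not self.buffer:
            return
        name = f"node{self.node_id}_{len(self.completed):04d}.txt"
        self.completed[name] = self.buffer
        self.buffer = []

    def completed_files(self):
        # Number and names of completed files, as reported to cluster management.
        return len(self.completed), list(self.completed)


cache = NodeFileCache(node_id=1, max_rows_per_file=2)
for r in ["1,alice", "3,carol", "5,eve"]:
    cache.write_row(r)
count, names = cache.completed_files()
```

A final `flush()` after the last row would turn the partially filled buffer into one more completed file.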
Specifically, in the embodiment of the present invention, after the data import is completed the database cluster initiates the data verification flow and distributes data verification requests to the database agents (DBAgent) of all database nodes of the current check table, so that each DBAgent counts the number of rows of the current table and the ASCII value of the data of the current check table. After the results fed back by the nodes have been received, the numbers of data rows and the data check values are compared; if the number of data rows is the same and the sampled check value is the same, the data consistency verification passes and a successful data migration is reported.
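The final comparison can be sketched as an aggregation of the per-node feedback. The function name and the shape of the feedback, a (row count, check sum) pair per DBAgent, are assumptions for illustration.

```python
def verify_import(expected_rows: int, expected_check_sum: int, agent_reports) -> bool:
    """Compare totals reported by the agents with the values recorded before import."""
    reports = list(agent_reports)  # each item: (row_count, check_sum) from one node
    total_rows = sum(rows for rows, _ in reports)
    total_sum = sum(check for _, check in reports)
    # Verification passes only if both the row count and the check sum match.
    return total_rows == expected_rows and total_sum == expected_check_sum


passed = verify_import(3, 600, [(2, 400), (1, 200)])
```

Comparing the row count in addition to the check sum catches lost or duplicated rows whose check values happen to cancel out.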
Fig. 2 is a schematic flow chart of another method for data verification in a distributed database according to an embodiment of the present invention. The method of the present invention is explained in detail below with reference to Fig. 2:
S201, start;
S202, data export;
that is, the data to be changed is exported to the data description text, and the ASCII value of the current data row, or the sum of the ASCII values of the current rows of data, is calculated;
S203, data import and generation of verification data;
specifically, this step includes: writing the data to be changed to the file cache of the corresponding database node, notifying database cluster management of the number and names of the completed files, and triggering, through the database cluster management, the database agent to download the data to be changed stored in the file cache to the database node; after the data import is completed, sending a verification request to the database agent through database cluster management, and obtaining the data check values of the data rows stored on the current database node, as counted by the database agent;
S204, data verification;
comparing whether the data check values (or the sums of the data check values) of the rows of the data to be changed that need verification are consistent before and after the import; if they are consistent, determining that the data to be changed is consistent before and after the change;
S205, end.
The method of the present invention is explained in further detail below through a specific example. The method includes:
Stage one, data file generation:
Before data migration or data initialization, the data to be migrated needs to be exported to a data description text. Both cross-database sources and the current distributed database are supported: in the cross-database case, the database tables to be migrated are exported to text description files according to the syntax of the source database; the current distributed database can export its distributed data to text files through the database cluster.
Stage two, data migration:
the text is read into memory row by row according to the text description rule, and the ASCII value of the current data row is calculated and saved in memory;
the database node on which the current data row should be stored is obtained according to the distribution rule of the distributed database;
the current data row is written into the file cache of the corresponding database node; once the cache is full, or the number of data rows per file that the configuration file requires for the current node has been reached, the data is written to a file and a new file to be written is generated;
after a certain number of files have been generated, the database cluster notifies DBAgent to download the corresponding files to the database server and to import them into the corresponding database;
the above steps are repeated until all data has been imported into the distributed database.
Stage three, data consistency verification:
After receiving the requests indicating that all data imports are complete, the database cluster initiates the data verification flow.
The database cluster distributes data verification requests to the DBAgents of all database nodes of the current table, so that each DBAgent counts the number of rows of the current table and the ASCII value of the data of the current table.
After the results fed back by the nodes have been received, the numbers of data rows and the data check values are compared; if the number of data rows is the same and the sampled check value is the same, the data consistency verification passes and a successful data migration is reported.
The present invention is described in detail below through a specific example in which a DB2 database is migrated to a MariaDB distributed cluster database:
Export data: the data is exported to an external file using the method provided by DB2;
Generate the check table: the check table is generated according to the configured number of rows to verify and the number of rows in the file (both full verification and sampling verification are supported);
File splitting: the data file is read row by row, the home node of the current data row is calculated according to the distribution rule, and it is determined whether the current data row needs to be verified; if so, the ASCII value of the current row is generated and accumulated into the check result, and the SQL statement by which the database locates the current data row is generated and written into the verification SQL file of the current group; this is repeated until the file has been read to the end, and the number of rows of the current file is then counted;
Data import: the split data files are imported into the databases of the respective nodes by the database agent DBAgent;
Data verification: after all data has been imported, the database cluster initiates the data verification flow and compares whether the number of rows and the total check value of the current file are consistent with the total number of imported data rows and the total data check value in the database; if they are consistent, the data is consistent before and after the migration and the data migration is complete; if they are inconsistent, the migration needs to be performed again.
The present invention is described in detail below through a specific example of data backup and restoration based on a MariaDB distributed cluster:
Obtain the full data: the data of the distributed database is exported to a text file using the tools of the distributed database;
Generate the list of rows to verify: the check table is generated according to the configured number of rows to verify and the number of rows in the file (both full verification and sampling verification are supported);
Original file splitting: the data file is read row by row, the home node of the current data row is calculated according to the distribution rule, and it is determined whether the current data row needs to be verified; if so, the ASCII value of the current row is generated and accumulated into the check result, and the SQL statement by which the database locates the current data row is generated and written into the verification SQL file of the current group; this is repeated until the file has been read to the end, and the number of rows of the current file is then counted;
Data restoration: the split data files are imported into the databases of the respective nodes by the database agent DBAgent;
Data verification: after all data has been imported, the database cluster initiates the data verification flow and compares whether the number of rows and the total check value of the current file are consistent with the total number of data rows and the total data check value on the new nodes; if they are consistent, the data is consistent before and after the backup and restoration, and the full data restoration is complete. If they are inconsistent, the data restoration process needs to be performed again.
Compared with existing distributed database technology in the industry, the present invention has the following beneficial effects:
1. Good performance. The basic data needed for data verification is prepared during the data migration process itself, so no separate preparation step for verification data is needed, which greatly shortens the duration of the data migration;
2. No interference with the operation of online services. The present invention does not need to add virtual rows to the original table being verified and does not need to lock the table, so the impact on online services is minimal;
3. Flexible verification modes. The present invention supports both sampled data verification and full data verification, so the completion time of the current data migration task can be shortened by sensibly assigning different verification levels to different check tables;
4. Support for data verification in cross-database data migration. The entry point of the data migration of the present invention is the data description text; every database supports exporting the database to a text description file, and a distributed database can export its data to text through the database cluster server.
Device Embodiment
An embodiment of the present invention provides a device for data verification in a distributed database. Referring to Fig. 3, the device includes: a first computing unit, configured to export the data to be changed to a data description text and to calculate, from the exported data description text, the check value of the specified rows in the data to be changed; an import unit, configured to split the data to be changed by row and to import the split data to be changed into the corresponding database nodes; a second computing unit, configured to calculate, after the data import is completed, the check value of the specified rows in the imported data to be changed; and a comparing unit, configured to compare whether the check values of the specified rows in the data to be changed are consistent before and after the import and, if they are consistent, to determine that the data to be changed is consistent before and after the change.
That is, by comparing whether the check values of the specified rows in the data to be changed are consistent before and after the import, the present invention determines whether the data to be changed is consistent before and after the change, and thereby effectively solves the problem in the prior art that a distributed database cannot determine data consistency before and after a data change.
Further, the first computing unit of the embodiment of the present invention is also configured to calculate the check value of a certain specified row in the data to be changed, or to calculate the sum of the check values of one or more specified runs of N consecutive rows in the data to be changed.
That is, the present invention can take a simple sample by calculating the check value of a certain specified row in the data to be changed, or take a larger sample by calculating the sum of the check values of one or more specified runs of N consecutive rows, in order to compare whether the data to be changed is consistent before and after the import.
Further, the second computing unit of the embodiment of the present invention is also configured to: when the specified row is a certain row of data, calculate the check value of the certain specified row in the imported data to be changed; and when the specified rows are one or more runs of N consecutive rows, calculate the sum of the check values of the one or more specified runs of N consecutive rows in the imported data to be changed;
the comparing unit is also configured to: when the specified row is a certain row of data, compare the check values of the certain specified row in the data to be changed before and after the import; and when the specified rows are one or more runs of N consecutive rows, compare the sums of the check values of the one or more specified runs of N consecutive rows in the data to be changed before and after the import.
It should be noted that, in the embodiment of the present invention, the data to be changed includes data to be initialized, data to be migrated and data to be redistributed. That is, the present invention can verify data consistency before and after data changes such as data initialization, data migration and data redistribution. The whole verification process for a data change does not need to lock the current database; locating the database node for each row, distributing the data and verifying the data can all be carried out independently, and only a small amount of database server I/O is consumed when data is imported into a single node. The impact on online services is therefore small.
Further, the import unit further comprises: a splitting module, configured to split the data to be changed
by row; an acquisition module, configured to obtain, according to the distributed distribution rule, the
database node on which each part of the split data to be changed should be stored; and an import module,
configured to import the split data to be changed into the corresponding database node.
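The division of work between the splitting module and the acquisition module can be illustrated with a short sketch. The distribution rule itself is not specified in the text, so hashing the first column modulo the node count is an assumption, and `NODES`, `node_for_row`, and `split_by_row` are hypothetical names.

```python
import zlib
from collections import defaultdict

# Hypothetical database node names.
NODES = ["node0", "node1", "node2"]


def node_for_row(row, key_index=0, nodes=NODES):
    """Acquisition step: map a row to the node that should store it.

    Assumed rule: CRC32 of the distribution key modulo the number of nodes.
    """
    key = str(row[key_index]).encode("utf-8")
    return nodes[zlib.crc32(key) % len(nodes)]


def split_by_row(rows, key_index=0, nodes=NODES):
    """Splitting step: group rows into one batch per target node."""
    batches = defaultdict(list)
    for row in rows:
        batches[node_for_row(row, key_index, nodes)].append(row)
    return dict(batches)


rows = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave"), (5, "erin")]
batches = split_by_row(rows)
# Each batch would then be handed to the import module for its node.
```

Because the rule is deterministic, the same function can later be used to find which node holds a given row when that row is sampled for verification.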
That is, without affecting the running of existing services, the present invention imports data into the
corresponding nodes according to a preset data distribution rule, and ensures strong consistency between
the data imported into the database and the original data by two means: verification of the total row
count and sampled verification of row data. The number of rows sampled for verification is configurable.
Further, the import unit further comprises: a splitting module, configured to split the data to be changed
by row; and an import module, configured to write the split data to be changed into the file cache of the
corresponding database node, notify database cluster management of the number of completed files and the
list of file names, and have database cluster management trigger the database agents to download the data
to be changed stored in the file cache to the database nodes, the database agents being in one-to-one
correspondence with the database nodes.
In a specific implementation, the present invention writes the current data row into the file cache of the
corresponding database node; if the cache is full, or reaches the number of data rows per file that the
configuration file requires for the current node, the data is written to a file and a new file to be
written is generated;
After a certain number of files have been generated, database cluster management can, according to the
number of completed files and the list of file names, notify the database agent to download the
corresponding files to the database server and import them into the corresponding database.
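The file cache behaviour described above can be sketched as a small buffered writer that rolls over to a new file once a configured row limit is reached and keeps the list of completed files to report. The class name `NodeFileCache`, the file naming scheme, and the `max_rows` parameter are assumptions made for illustration; the notification to database cluster management is reduced here to returning the count and the name list.

```python
import os
import tempfile


class NodeFileCache:
    """Per-node row buffer that is flushed to a new file when it fills up."""

    def __init__(self, directory, node, max_rows):
        self.directory = directory
        self.node = node
        self.max_rows = max_rows  # assumed to come from the configuration file
        self.buffer = []
        self.completed = []  # names of files that are fully written

    def write_row(self, line):
        self.buffer.append(line)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        """Write buffered rows to a file and start a new, empty buffer."""
        if not self.buffer:
            return
        name = f"{self.node}_{len(self.completed):04d}.txt"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(self.buffer) + "\n")
        self.completed.append(name)
        self.buffer = []

    def report(self):
        """Completed file count and file name list, as reported upstream."""
        return len(self.completed), list(self.completed)


with tempfile.TemporaryDirectory() as tmp:
    cache = NodeFileCache(tmp, "node0", max_rows=2)
    for i in range(5):
        cache.write_row(f"{i},value{i}")
    cache.flush()  # write out the final, partially filled buffer
    file_count, file_names = cache.report()
```

With five rows and a limit of two rows per file, three files are produced; a database agent would then download the files named in the list.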
In a specific implementation, the second computing unit in the embodiment of the present invention sends a
verification request to a database agent through database cluster management, and obtains the data check
value, as calculated by the database agent, of the data rows stored on the current database node, where
the database agent is the one corresponding to the database node that stores the data rows to be changed.
Alternatively, according to the rows of the data to be changed, the second computing unit sends a
verification request to the database agent through database cluster management, and obtains the number of
data rows on the current database node and the sum of the corresponding data check values, as calculated
by the database agent, where the database agent is the one corresponding to the database node that stores
the data rows to be changed, and the database agents are in one-to-one correspondence with the database
nodes.
Fig. 4 is an architecture diagram of the online data migration system of the embodiment of the present
invention. As shown in Fig. 4, after the data import is completed, the comparing unit initiates the data
verification flow and distributes the data verification request to the database agents (DBAgent) of all
database nodes of the table currently being verified, so that each DBAgent counts the rows of the current
table and calculates the ASCII value of the data of the table being verified. After receiving the results
fed back by each node, the comparing unit compares the row counts and the data check values; if the row
counts are identical and the sampled check values are identical, the data consistency verification passes
and a successful data migration is reported.
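The comparison step of this flow can be sketched as follows. The text only says that each DBAgent reports a row count and an "ASCII value"; interpreting that value as the sum of the ASCII byte values of each row's text is an assumption, and `ascii_check_value`, `agent_report`, and `verify_import` are hypothetical names. Communication with the agents is replaced by plain function calls.

```python
def ascii_check_value(line):
    """Assumed check value: sum of the ASCII byte values of the row text."""
    return sum(line.encode("ascii"))


def agent_report(lines):
    """What one database agent would feed back: (row count, check value sum)."""
    return len(lines), sum(ascii_check_value(line) for line in lines)


def verify_import(expected_rows, expected_check_sum, agent_reports):
    """Pass only if total row count and total check value both match."""
    total_rows = sum(count for count, _ in agent_reports)
    total_check = sum(check for _, check in agent_reports)
    return total_rows == expected_rows and total_check == expected_check_sum


# Values computed from the exported data before the import.
source_lines = ["1,alice", "2,bob", "3,carol"]
expected_rows, expected_sum = agent_report(source_lines)

# After the import the rows are spread over two nodes.
reports = [
    agent_report([source_lines[0], source_lines[2]]),  # node holding rows 1 and 3
    agent_report([source_lines[1]]),                   # node holding row 2
]
ok = verify_import(expected_rows, expected_sum, reports)
```

A plain byte sum is insensitive to reordering of characters, so a stronger per-row check value could be substituted without changing the comparison logic.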
For the related content of the apparatus of the present invention, refer to the method embodiment; details
are not repeated here.
Server embodiment
An embodiment of the present invention provides a database cluster management server, which comprises any
one of the apparatuses for distributed database data verification described in the apparatus embodiment.
For the related content of this embodiment, refer to the apparatus embodiment and the method embodiment;
details are not repeated here.
The present invention can achieve at least the following beneficial effects:
The present invention compares whether the data check values of the data rows to be verified are consistent
before and after the import, or compares whether the number of rows to be verified and the sum of the data
check values of the corresponding rows are consistent before and after the import, so as to determine
accurately whether the data to be changed is consistent before and after the change. This effectively
solves the prior-art problem that a distributed database cannot determine data consistency before and
after a data change.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those
skilled in the art will recognize that various improvements, additions, and substitutions are also
possible; therefore, the scope of the present invention should not be limited to the above embodiments.