WO2018121025A1 - System and method for comparing data of a data table
- Publication number
- WO2018121025A1 (application PCT/CN2017/108196)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- server
- database
- data
- target
- range
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/12—Applying verification of the received information
- H04L63/126—Applying verification of the received information the source of the received data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
- H04L9/0866—Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics
Definitions
- the present application relates to the field of databases and, more particularly, to a method and system for comparing data of a data table.
- Key-value databases are the best choice for scenarios with large numbers of random writes and random reads. All data in a key-value database exists in key-value form.
- The key-value form has a strictly defined structure, and all data in the database is stored in the underlying file system as immutable (non-modifiable) files. When new data is written, a new key-value is generated; when old data is rewritten or deleted, a new key-value is also generated to mark the rewrite or deletion.
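The write behavior described above can be illustrated with a minimal sketch. It assumes a toy in-memory log; the names `AppendOnlyStore` and `TOMBSTONE` are hypothetical and do not come from any real database API.

```python
from typing import Optional

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


class AppendOnlyStore:
    """Toy key-value log: existing entries are never modified in place."""

    def __init__(self) -> None:
        self.log: list[tuple[str, object]] = []

    def put(self, key: str, value: str) -> None:
        # A write or a rewrite appends a new key-value entry.
        self.log.append((key, value))

    def delete(self, key: str) -> None:
        # A delete appends a marker entry instead of removing old entries.
        self.log.append((key, TOMBSTONE))

    def get(self, key: str) -> Optional[str]:
        # The newest entry for a key wins.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is TOMBSTONE else v
        return None
```

Because old entries are kept, two databases can hold the same logical data with different physical files, which is one reason a comparison has to work on the logical rows.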
- In the big data field, multi-data-center solutions usually provide off-site data backup. Therefore, verifying the consistency of data before, during, and after backup has become an important feature of big data storage.
- Existing comparison tools are data-based: they compare the table data itself.
- When a comparison tool is used to compare the data of two databases (a working database and a backup database, whose data tables should have the same structure), the tool parallelizes the verification task, for example by submitting a MapReduce (MR) job that is distributed to many nodes for parallel execution. The tool reads data from the data tables of the two databases and compares them to find inconsistent data.
- Because the existing comparison tool compares the data in the data table line by line, the comparison efficiency is low and the tool runs slowly.
- The existing comparison technology requires the mapping framework to communicate with multiple servers of the local database cluster, and possibly also with servers of the remote database cluster, which consumes a large amount of network resources.
- The present application provides a method and system for comparing data of a data table that avoid transmitting and comparing large amounts of data, run fast at low cost, and occupy few network resources.
- A first aspect of the present application provides a method of comparing data of a data table, the method being applied to a system for comparing data of a target data table of a first database and a second database, the system comprising a client and a plurality of servers, wherein the first database corresponds to at least one first server and the second database corresponds to at least one second server, the method comprising: the client acquiring first metadata of the target data table in the first database and second metadata of the target data table in the second database, wherein the first metadata includes a first range corresponding to the data of the target data table in a server of the first database, and the second metadata includes a second range corresponding to the data of the target data table in a server of the second database; the client determining a target range according to at least one of the first range and the second range; the at least one first server signing the data of the target data table in the first database according to the target range to obtain a first signature; the at least one second server according to the
- The client determines the target range according to the distribution of the data of the data table, the servers sign the data according to the target range, and the client compares whether the signatures corresponding to the data of the data table in the two databases are consistent. From this it can be judged whether the data of the two data tables are consistent, which avoids transmitting and comparing large amounts of data, runs fast at low cost, and occupies few network resources.
- Each server of the first database corresponds to a first server, and the first range includes a sub-range of the data of the target data table on each server of the first database.
- Each server of the second database corresponds to a second server, and the second range includes a sub-range of the data of the target data table on each server of the second database.
- According to the sub-range of the data of the target data table on each server of the first database and the sub-range of the data of the target data table on each server of the second database, the client determines the sub-ranges of the target range, such that the data corresponding to each sub-range is distributed on one server in the first database and on one server in the second database.
- In this way, no data transmission across servers (cross-RS) is needed when the data is signed.
- Before the at least one first server signs the data of the target data table in the first database according to the target range to obtain the first signature, the method further includes: at least one of the client, the at least one first server, and the at least one second server performing tree segmentation for each of the sub-ranges.
- The at least one first server signing the data of the target data table in the first database according to the target range to obtain the first signature includes: the at least one first server signing the segments of the data of the target data table in the first database according to the tree segmentation to obtain a tree-type first signature. The at least one second server signing the data of the target data table in the second database according to the target range to obtain the second signature includes: the at least one second server signing the segments of the data of the target data table in the second database according to the tree segmentation to obtain a tree-type second signature.
- The client determining, according to the first signature and the second signature, whether the data of the target data table in the first database is the same as the data of the target data table in the second database includes: the client determining, according to the tree-type first signature and the tree-type second signature, whether the signatures of the same layer of the first-signature tree and the second-signature tree are consistent.
- When the signatures are inconsistent, it is determined that, in the segment corresponding to that layer, the data of the target data table in the first database differs from the data of the target data table in the second database.
- At least one of the client, the at least one first server, and the at least one second server performing tree segmentation for each of the sub-ranges includes: the at least one first server and the at least one second server collecting statistics on the density of the data in the target range; and the at least one first server and the at least one second server performing tree segmentation for each of the sub-ranges according to the statistical results.
- The at least one first server signing the data of the target data table in the first database according to the target range to obtain the first signature includes: the at least one first server signing the data of the target data table in the first database with a hash algorithm according to the target range to obtain the first signature.
- The at least one second server signing the data of the target data table in the second database according to the target range to obtain the second signature includes: the at least one second server signing the data of the target data table in the second database with a hash algorithm according to the target range to obtain the second signature.
- A second aspect of the present application provides a system for comparing data of a data table, wherein the system is configured to compare data of a target data table of a first database and a second database, the system including a computing device running a client and a plurality of servers running server-side programs, wherein the first database comprises at least one first server running a first server-side program, and the second database comprises at least one second server running a second server-side program: the computing device is configured to acquire first metadata of the target data table in the first database and second metadata of the target data table in the second database, where the first metadata includes a first range corresponding to the data of the target data table in a server of the first database, and the second metadata includes a second range corresponding to the data of the target data table in a server of the second database; the computing device is further configured to determine a target range according to at least one of the first range and the second range; the at least one first server is configured to sign, according to the target range, the data of the target data table in the first database to obtain a first signature; the at least one second server is
- Each server in the first database that stores the target data table is a first server running the first server-side program, and the first range includes a sub-range of the data of the target data table on each first server of the first database.
- Each server in the second database that stores the target data table is a second server running the second server-side program, and the second range includes a sub-range of the data of the target data table on each second server of the second database.
- The computing device is specifically configured to: determine the sub-ranges of the target range according to the sub-range of the data of the target data table on each first server of the first database and the sub-range of the data of the target data table on each second server of the second database, where the data corresponding to each sub-range is distributed on one server in the first database and on one server in the second database.
- Before the first server signs, according to the target range, the data of the target data table in the first database to obtain the first signature, and before the second server signs, according to the target range, the data of the target data table in the second database to obtain the second signature, at least one of the computing device, the at least one first server, and the at least one second server is configured to perform tree segmentation for each of the sub-ranges.
- The at least one first server is specifically configured to: sign the segments of the data of the target data table in the first database according to the tree segmentation to obtain a tree-type first signature.
- The at least one second server is specifically configured to: sign the segments of the data of the target data table in the second database according to the tree segmentation to obtain a tree-type second signature.
- The computing device is specifically configured to: determine, according to the tree-type first signature and the tree-type second signature, whether the signatures of the same layer of the first-signature tree and the second-signature tree are consistent. When the signatures are inconsistent, it is determined that the data of the target data table in the first database is different from the data of the target data table in the second database.
- The at least one first server and the at least one second server are configured to collect statistics on the density of the data in the target range, and to perform tree segmentation for each of the sub-ranges according to the statistical results.
- The at least one first server is specifically configured to: sign, according to the target range, the data of the target data table in the first database with a hash algorithm to obtain the first signature. The at least one second server is specifically configured to: sign, according to the target range, the data of the target data table in the second database with a hash algorithm to obtain the second signature.
- A third aspect of the present application provides a storage medium in which a program is stored; when the program is run by a computing device and a server, the computing device and the server perform the method of the foregoing first aspect or of any implementation of the first aspect.
- the storage medium includes, but is not limited to, a read only memory, a random access memory, a flash memory, an HDD, or an SSD.
- A fourth aspect of the present application provides a computer program product comprising program instructions which, when the computer program product is executed by a computing device and a server, perform the method of comparing data of a data table provided by the foregoing first aspect or by any implementation of the first aspect.
- The computer program product can be a software installation package. When the method of comparing data of a data table provided by the foregoing first aspect or by any implementation of the first aspect needs to be used, the computer program product can be downloaded and executed on the computing device and the server.
- FIG. 1 is a schematic diagram of a method of comparing data of a data table using a comparison tool.
- FIG. 2 is a schematic block diagram of a system for comparing data of a data table in accordance with one embodiment of the present invention.
- FIG. 3 is a schematic block diagram of a system for comparing data of a data table in accordance with another embodiment of the present invention.
- FIG. 4 is a schematic flow chart of a method of comparing data of a data table according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of segmenting a target range according to one embodiment of the present invention.
- FIG. 6 is a schematic diagram of segmenting a target range according to another embodiment of the present invention.
- FIG. 7 is a schematic diagram of segmenting a target range according to another embodiment of the present invention.
- FIG. 8 is a schematic diagram of segmenting a target range according to another embodiment of the present invention.
- FIG. 9 is a schematic diagram of segmenting a target range according to another embodiment of the present invention.
- FIG. 10 is a schematic diagram of the result of segmenting the target range according to one embodiment of the present invention.
- FIG. 11 is a schematic diagram of a tree-type signature in accordance with an embodiment of the present invention.
- Figure 12 is a schematic block diagram of a computing device or server in accordance with one embodiment of the present invention.
- the existing comparison tool is a data-based comparison tool.
- the comparison tool parallelizes the verification tasks.
- The following describes the process of comparing the data of data tables in databases, taking the Hadoop database (HBase) and the existing comparison tool as an example.
- FIG. 1 is a schematic diagram of a method 100 in which an existing comparison tool compares data of a data table.
- the method 100 includes:
- The existing comparison tool submits an MR job to the HBase cluster corresponding to the database of data center (DC) 1.
- The resource manager (RM) of the HBase cluster distributes the MR job to many nodes for parallel execution, that is, splits the MR job into multiple map tasks.
- each map task is responsible for comparing a part of the data.
- Each map task reads data from the HBase clusters of the two data centers DC1 and DC2, then compares the data and prints inconsistent data.
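What a single map task does in this existing approach can be sketched as a row-by-row comparison. The sketch assumes each side's slice of the table has already been read into a dictionary keyed by row key; `compare_rows` is a hypothetical name, not part of any HBase or MapReduce API.

```python
def compare_rows(table_a: dict[str, str], table_b: dict[str, str]) -> list[str]:
    """Return the row keys whose values differ or that exist on one side only."""
    return sorted(
        key
        for key in table_a.keys() | table_b.keys()
        if table_a.get(key) != table_b.get(key)
    )
```

Every row from both data centers has to reach the task before it can be compared, which is the transfer cost the later embodiments try to avoid.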
- Each server in the HBase cluster is configured with a RegionServer (RS), which manages the tasks running on that server.
- the existing comparison tool compares the data in the data table line by line, the comparison efficiency is low, and the comparison tool runs slowly.
- The existing comparison tool not only requires the participation of two HBase clusters, but also requires the cluster to provide the nodes on which the MR job runs, so the comparison tool has high resource occupation and operating cost.
- The existing comparison technology requires the mapping framework to communicate with the RSs of multiple servers of the HBase cluster of the local database, and possibly also with the RSs of servers of the HBase cluster of the remote database, which takes up a large amount of network resources.
- an embodiment of the present invention provides a method for comparing data of a data table.
- FIG. 2 shows a schematic block diagram of a system 200 for comparing data of a data table in accordance with an embodiment of the present invention.
- The system 200 illustrated in FIG. 2 is a schematic block diagram from the software perspective.
- From the software perspective, the system 200 includes a client 210 and a plurality of server-side programs, wherein each database corresponds to at least one server-side program: the first database corresponds to at least one first server-side program 221, and the second database corresponds to at least one second server-side program 222.
- FIG. 3 shows a schematic block diagram of a system 300 for comparing data of a data table in accordance with an embodiment of the present invention.
- The system 300 includes a computing device 310 running a client and a plurality of servers, each running a server-side program.
- The client 210 can be deployed on the user's computing device 310.
- The computing device 310 is usually not a server of any database, that is, it is usually not a server in a DC.
- The first server-side program 221 can be deployed on the first server 321 of the first DC corresponding to the first database, and the second server-side program 222 can be deployed on the second server 322 of the second DC corresponding to the second database.
- A first server-side program 221 may be deployed on each server of the first database that stores the data table; a server on which the first server-side program 221 is deployed is considered a first server 321.
- A second server-side program 222 may be deployed on each server of the second database that stores the data table; a server on which the second server-side program 222 is deployed is considered a second server 322.
- A plurality of servers in each database may also share one server-side program, which is not limited in this embodiment of the present invention.
- the number of the first server and the second server shown in FIG. 2, and the number of the first server and the second server shown in FIG. 3 are only schematic, and are not intended to limit the embodiments of the present invention.
- The client acquires metadata. Metadata is generally stored in a meta table, and the meta table is usually stored on a server of the database other than the servers that store the data table.
- FIG. 3 schematically shows the meta table of the first database stored on the third server 323 of the first database, and the meta table of the second database stored on the fourth server 324 of the second database.
- The meta table can also be stored on a server of the database that stores the data table, which is not limited in this embodiment of the present invention.
- A server that stores the data table (for example, the first server 321 or the second server 322) may be regarded as a storage node. The server-side program is deployed on the storage node and may be part of the functionality of the RS, or may exist independently of the RS.
- The server that stores the meta table can be considered a metadata management node.
- The server-side program of the embodiment of the present invention may be a function module of the RS, or may be a separate module or unit, which is not limited by the embodiment of the present invention.
- method 400 includes:
- The client 210 acquires the first metadata of the target data table in the first database and the second metadata of the target data table in the second database, where the first metadata includes a first range corresponding to the data of the target data table in the server of the first database, and the second metadata includes a second range corresponding to the data of the target data table in the server of the second database;
- the client 210 determines a target range according to at least one of the first range and the second range.
- the at least one first server 221 signs the data of the target data table in the first database according to the target range to obtain a first signature
- the at least one second server 222 signs the data of the target data table in the second database according to the target range to obtain a second signature.
- the client 210 determines, according to the first signature and the second signature, whether data of the target data table in the first database is the same as data of the target data table in the second database.
- The client determines the target range according to the data distribution of the data table, the servers sign the data according to the target range, and the client compares whether the signatures corresponding to the data of the data tables in the two databases are consistent, and thereby judges whether the data of the two data tables are consistent. This avoids transmitting and comparing large amounts of data, runs fast at low cost, and occupies few network resources.
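The signing and comparison steps can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes integer row keys, an in-memory dictionary standing in for each database's copy of the table, and SHA-256 as the hash (the source only says "a hash algorithm"). The names `sign_range` and `same_data` are hypothetical.

```python
import hashlib


def sign_range(rows: dict[int, str], start: int, end: int) -> str:
    """Server side: hash all rows whose key lies in [start, end], in key order."""
    digest = hashlib.sha256()
    for key in sorted(k for k in rows if start <= k <= end):
        value = rows[key]
        # Length-prefix the value so that adjacent rows cannot be confused.
        digest.update(f"{key}:{len(value)}:{value};".encode("utf-8"))
    return digest.hexdigest()


def same_data(rows_a: dict[int, str], rows_b: dict[int, str],
              target_range: list[tuple[int, int]]) -> bool:
    """Client side: compare only the per-sub-range signatures, not the rows."""
    return all(
        sign_range(rows_a, start, end) == sign_range(rows_b, start, end)
        for start, end in target_range
    )
```

In a real deployment `sign_range` would run on the server that holds the sub-range, so only the short signature string crosses the network.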
- The first database and the second database in which the target data table to be compared is located are different databases, and the two databases may further belong to server clusters of different data centers.
- The two databases may also belong to the same data center, which is not limited by the embodiment of the present invention.
- Because a data table in the database is large, it generally needs to be split horizontally and stored on multiple servers to improve the speed of concurrent processing.
- The client 210 communicates with the servers of the first database and of the second database that store the target data table, respectively, to obtain the first metadata of the target data table in the first database and the second metadata of the target data table in the second database.
- Metadata is generally stored in a meta table.
- The meta table is usually stored on a server of the database other than the servers that store the data table.
- The meta table can also be stored on a server of the database that stores the data table. This embodiment of the present invention does not limit this.
- The client 210 obtains the two meta tables corresponding to the target data table in the two databases, that is, obtains the first metadata and the second metadata. Assume that each database includes three servers, one RS runs on each server, and each RS corresponds to one region in which part of the target data table is stored. From the first metadata and the second metadata, the range distribution corresponding to each region is obtained, that is, its start key and end key. The first metadata includes the first range corresponding to the data of the target data table in the servers of the first database, and the second metadata includes the second range corresponding to the data of the target data table in the servers of the second database. In a specific example, the distribution of the target data table table1 can be as shown in Table 1.
- The target data table of the first database has a key range of 1-30 on RS1 of the first database, a key range of 31-80 on RS2 of the first database, and a key range of 81-100 on RS3 of the first database.
- The target data table of the second database has a key range of 1-25 on RS1 of the second database, a key range of 26-60 on RS2 of the second database, and a key range of 61-100 on RS3 of the second database.
- the client 210 determines the target range according to at least one of the first range and the second range.
- Each server of the first database corresponds to a first server 221, and the first range includes a sub-range of the data of the target data table on each server of the first database. Each server of the second database corresponds to a second server 222, and the second range includes a sub-range of the data of the target data table on each server of the second database.
- The client 210 determining the target range according to at least one of the first range and the second range may include: the client 210 determining the sub-ranges of the target range according to the sub-range of the data of the target data table on each server of the first database and the sub-range of the data of the target data table on each server of the second database.
- The data corresponding to each sub-range is distributed on one server in the first database and on one server in the second database.
- The client 210 may segment the key space according to the first range and the second range of the two data tables (i.e., the distribution of start keys and end keys), with the goal of maximally matching the overlapping ranges, to obtain the target range.
- The target range includes a plurality of sub-ranges, and the data corresponding to each sub-range is distributed on one server in the first database and on one server in the second database. In this way, when the data is signed, data transmission between servers (cross-RS) is no longer required, which further improves the running speed and reduces the occupation of network resources.
- A scheme for dividing the target range into sub-ranges is described in detail below. The scheme ensures that each sub-range of the target range is distributed on one server in the first database and on one server in the second database, and also that the number of sub-ranges is minimal.
- the specific steps of the segmentation can be as follows.
- Step 1: The client 210 sorts the regions in which the target data table is distributed on the servers of each database by row key, in ascending order, to form two region queues.
- The first range corresponds to the region queue A (A1, A2, ...).
- The second range corresponds to the region queue B (B1, B2, ...).
- The client 210 sequentially selects regions from the two region queues.
- Step 2: The client 210 compares the ranges of the two selected regions (for example, Ax and By) to see whether the two regions overlap.
- If the ranges of the two selected regions do not overlap, the region with the smaller start key is output as an already segmented region (i.e., a sub-range of the target range), the next region is taken from the region queue in which that region was located, and step 2 is repeated to continue the comparison.
- If the ranges of the two selected regions coincide exactly, either one of the regions is output as the already segmented region C1 (i.e., a sub-range of the target range), the next region is taken from each of the two region queues, and step 2 is repeated to continue the comparison.
- In another case, region B1 is segmented by the start key and end key of region A1, yielding C1, C2, and B1- (the remaining portion of region B1). C1 and C2 are saved as segmentation results, B1- and the next region A2 of region queue A are taken as the two regions to be compared, and the comparison of step 2 is performed.
- If the start key of region B1 is smaller than the start key of region A1, and the end key of region B1 is also smaller than the end key of region A1, the start key of region A1 and the end key of region B1 are used as the segmentation criteria, and region A1 and region B1 are segmented. The first two regions obtained, C1 and C2 (each a sub-range of the target range), are output as results, and the remaining portion A1- of region A1 and the next region B2 of region queue B are taken as the two regions to be compared in step 2.
- In another case, the start key of region A1 is used as the segmentation criterion, and region A1 and region B1 are segmented. The two regions obtained, C1 and C2, are output as segmentation results, and the next region A2 of region queue A and the next region B2 of region queue B are taken as the two regions to be compared in step 2.
- Step 3: The client 210 sequentially reads the regions in the first range and the regions in the second range corresponding to the target data table of the two databases until the segmentation is complete.
- In the example above, the target range includes 5 sub-ranges, and each sub-range is distributed on one RS in both the first database and the second database, without spanning RSs.
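The segmentation steps above can be sketched as a walk over the two region queues. This is a minimal sketch under stated assumptions: row keys are integers, ranges are inclusive `(start key, end key)` pairs, and both queues are sorted by ascending start key. The function name `split_target_range` is hypothetical.

```python
Range = tuple[int, int]  # inclusive (start key, end key), an assumed encoding


def split_target_range(queue_a: list[Range], queue_b: list[Range]) -> list[Range]:
    """Emit sub-ranges that each lie inside one region of A and one region of B."""
    a, b = list(queue_a), list(queue_b)
    result: list[Range] = []
    while a and b:
        (a_start, a_end), (b_start, b_end) = a[0], b[0]
        if a_end < b_start:        # no overlap: A's region comes first
            result.append(a.pop(0))
        elif b_end < a_start:      # no overlap: B's region comes first
            result.append(b.pop(0))
        elif a_start != b_start:   # cut off the leading part only one side has
            low, high = min(a_start, b_start), max(a_start, b_start)
            result.append((low, high - 1))
            if a_start < b_start:
                a[0] = (high, a_end)
            else:
                b[0] = (high, b_end)
        else:                      # same start: emit the shared part
            end = min(a_end, b_end)
            result.append((a_start, end))
            # Keep the remainder of the longer region, drop the consumed one.
            a[0:1] = [(end + 1, a_end)] if a_end > end else []
            b[0:1] = [(end + 1, b_end)] if b_end > end else []
    return result + a + b


first_range = [(1, 30), (31, 80), (81, 100)]    # RS1..RS3 of the first database
second_range = [(1, 25), (26, 60), (61, 100)]   # RS1..RS3 of the second database
sub_ranges = split_target_range(first_range, second_range)
# sub_ranges == [(1, 25), (26, 30), (31, 60), (61, 80), (81, 100)]
```

For the key ranges of the example, the walk yields the 5 sub-ranges mentioned in the text, and each cut is made only at a region boundary of one of the two databases.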
- the client 210 may also use one of the first range and the second range as the target range.
- the specific manner of dividing the target range is not limited in the embodiment of the present invention.
- each sub-range of the above target range can be directly used as the finest granularity, and the data of the target data table in the two databases is signed by the server.
- Before S330, in which the at least one first server signs the data of the target data table in the first database according to the target range to obtain the first signature, and S340, in which the at least one second server signs the data of the target data table in the second database according to the target range to obtain the second signature, the method 300 may further include: at least one of the client, the at least one first server, and the at least one second server performs tree segmentation for each sub-range.
- In this case, S330 may include: the at least one first server signs the segments of the data of the target data table in the first database according to the tree segmentation to obtain a tree-type first signature. S340 may include: the at least one second server signs the segments of the data of the target data table in the second database according to the tree segmentation to obtain a tree-type second signature.
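A tree-type signature and its layer-by-layer comparison can be sketched as below. The sketch assumes a binary tree in which each parent signature is the hash of its children's signatures, SHA-256 as the hash, and integer row keys; none of these choices is fixed by the source, and all function names are hypothetical.

```python
import hashlib


def leaf_signature(rows: dict[int, str], start: int, end: int) -> str:
    """Hash the rows of one finest segment, in key order."""
    digest = hashlib.sha256()
    for key in sorted(k for k in rows if start <= k <= end):
        digest.update(f"{key}:{rows[key]!r};".encode("utf-8"))
    return digest.hexdigest()


def build_tree(rows: dict[int, str], segments: list[tuple[int, int]]) -> list[list[str]]:
    """Return signature layers, leaves first (index 0) and the root last."""
    layer = [leaf_signature(rows, s, e) for s, e in segments]
    layers = [layer]
    while len(layer) > 1:
        # Each parent hashes the concatenation of up to two child signatures.
        layer = [
            hashlib.sha256("".join(layer[i:i + 2]).encode("utf-8")).hexdigest()
            for i in range(0, len(layer), 2)
        ]
        layers.append(layer)
    return layers


def differing_segments(tree_a, tree_b, segments):
    """Descend from the root, following only nodes whose signatures differ."""
    suspects = [0]
    for depth in range(len(tree_a) - 1, 0, -1):
        suspects = [i for i in suspects if tree_a[depth][i] != tree_b[depth][i]]
        width = len(tree_a[depth - 1])
        suspects = [c for i in suspects for c in (2 * i, 2 * i + 1) if c < width]
    return [segments[i] for i in suspects if tree_a[0][i] != tree_b[0][i]]
```

If the root signatures match, the client can stop immediately; otherwise it only needs the signatures along the differing branches to locate the inconsistent segments.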
- At least one of the client, the at least one first server, and the at least one second server performing tree segmentation for each sub-range may include: the at least one first server and the at least one second server collect statistics on the density of the data in the target range; and the at least one first server and the at least one second server perform tree segmentation for each sub-range according to the statistical result.
- The client 210 encapsulates the information of the sub-ranges of the segmented target range into statistical-count requests and sends them to the servers of the two databases. Because the data structure of the target data table is the same in the two databases being compared, each sub-range only needs to be counted on a server of one of the two databases.
- A load balancing operation is performed across the servers of the two databases. As shown in Table 2, the sub-range [0-25] is assigned to the second server of the second database (corresponding to RS1) to count the density, and the sub-range [26-30] is assigned to the first server of the first database (corresponding to RS1) to count the density.
- The sub-range [81-100] can be assigned to either the first server of the first database or the second server of the second database. In this way, no RS is idle and no RS is overloaded, which balances the load across the servers.
- Alternatively, load balancing may be disregarded: for each sub-range, the client 210 may pick a server of either database to count the data density, or the client 210 may select one of the two databases and have the servers of the selected database count the data density.
- Table 2 shows the density statistics
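The load balancing described above could be approximated by a simple greedy assignment, sketched below. This is only an illustration under stated assumptions: the greedy "least loaded candidate" rule, the function name `assign_counting`, the server labels, and the candidate layout are all hypothetical and are not taken from Table 2.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive (start_key, end_key)


def assign_counting(candidates: Dict[Segment, List[str]]) -> Dict[Segment, str]:
    """Assign each sub-range to one of its candidate servers.

    Each sub-range lists the servers (one per database) that hold it.
    Assumption: the load is approximated by the number of keys already
    assigned to a server, and the least loaded candidate is chosen.
    """
    load: Dict[str, int] = {}
    assignment: Dict[Segment, str] = {}
    for segment, servers in candidates.items():
        size = segment[1] - segment[0] + 1
        chosen = min(servers, key=lambda s: load.get(s, 0))
        assignment[segment] = chosen
        load[chosen] = load.get(chosen, 0) + size
    return assignment


# Hypothetical layout: both sub-ranges live on RS1 of each database.
layout = {
    (0, 25): ["db2/RS1", "db1/RS1"],
    (26, 30): ["db2/RS1", "db1/RS1"],
}
print(assign_counting(layout))
```

With this hypothetical layout, the two sub-ranges end up on servers of different databases, so neither RS1 counts both.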
- RS2 of the second database counts the density of the data in the sub-range [31-58] and segments the sub-range accordingly: the sub-range [31-58] is divided into a tree with two branches per layer, whose lowest layer (i.e., the finest segments) is [31-37][38-44][45-51][52-58].
- RS2 of the second database encapsulates this information and sends it to RS2 of the first database.
- The format may be "start key, end key, least size, child size"; in this example the value is "31, 58, 7, 2".
- RS2 of the first database thereby obtains the tree segmentation information.
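As a minimal sketch of how a receiving server might rebuild the tree segmentation from the "start key, end key, least size, child size" message, the code below assumes integer keys, that "least size" is the number of keys in each finest segment, and that "child size" is the number of branches per node. The function names are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start_key, end_key)


def parse_tree_spec(spec: str) -> Tuple[int, int, int, int]:
    """Parse "start key, end key, least size, child size" into four integers."""
    start_key, end_key, least_size, child_size = (int(p) for p in spec.split(","))
    return start_key, end_key, least_size, child_size


def leaf_segments(start_key: int, end_key: int, least_size: int) -> List[Segment]:
    """Cut [start_key, end_key] into consecutive segments of least_size keys."""
    if least_size < 1:
        raise ValueError("least_size must be at least 1")
    leaves = []
    lo = start_key
    while lo <= end_key:
        hi = min(lo + least_size - 1, end_key)
        leaves.append((lo, hi))
        lo = hi + 1
    return leaves


def tree_layers(leaves: List[Segment], child_size: int) -> List[List[Segment]]:
    """Group segments child_size at a time, bottom-up, until one root remains.

    The layers are returned from the lowest (finest) to the highest (root).
    """
    if child_size < 2:
        raise ValueError("child_size must be at least 2")
    layers = [leaves]
    while len(layers[-1]) > 1:
        below = layers[-1]
        above = [
            (below[i][0], below[min(i + child_size, len(below)) - 1][1])
            for i in range(0, len(below), child_size)
        ]
        layers.append(above)
    return layers


start, end, least, child = parse_tree_spec("31, 58, 7, 2")
for layer in tree_layers(leaf_segments(start, end, least), child):
    print(layer)
```

For the value "31, 58, 7, 2", the lowest layer produced by this sketch is [31-37][38-44][45-51][52-58], matching the example above, with [31-44][45-58] and [31-58] as the layers above it.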
- The server of the first database corresponding to RS2 reads the data according to the tree segmentation and signs the segments of the data of the target data table in the first database to obtain the tree-type first signature.
- Reading data is a time-consuming step. Therefore, RS2 of the second database can compute its signature while it counts the density of the data in the sub-range of the target range.
- The process by which a server signs the data segments according to the tree segmentation to obtain a tree-type signature can be as follows.
- The server performs a signature operation on each lowest-layer segment of each sub-range's tree, and then builds the tree bottom-up according to the branches of the tree.
- Figure 11 is a diagram showing the creation of a tree-type signature in accordance with one embodiment of the present invention.
- The embodiments of the present invention use a hash algorithm to sign the data.
- For example, the data may be signed using Message Digest Algorithm 5 (MD5).
- Accordingly, S330, in which at least one first server signs the data of the target data table in the first database according to the target range to obtain the first signature, may include: the at least one first server signs the data of the target data table in the first database using the hash algorithm according to the target range to obtain the first signature.
- S340, in which at least one second server signs the data of the target data table in the second database according to the target range to obtain the second signature, may include: the at least one second server signs the data of the target data table in the second database using the hash algorithm according to the target range to obtain the second signature.
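The bottom-up construction of a tree-type signature with MD5 can be sketched as follows. This is an illustration only: the row serialization (`key=value;`), the way child digests are concatenated to form a parent digest, and the function names are assumptions, since the text does not specify them.

```python
import hashlib
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive (start_key, end_key)


def sign_segment(rows: Dict[int, str], segment: Segment) -> str:
    """MD5 over the rows whose key falls in the segment, taken in key order."""
    lo, hi = segment
    digest = hashlib.md5()
    for key in sorted(k for k in rows if lo <= k <= hi):
        # Assumed serialization of one row; the real row format is not given.
        digest.update(f"{key}={rows[key]};".encode("utf-8"))
    return digest.hexdigest()


def build_tree_signature(
    rows: Dict[int, str], leaves: List[Segment], child_size: int
) -> List[List[str]]:
    """Sign every lowest-layer segment, then hash groups of child_size
    digests upward until a single highest-layer signature remains.

    The layers are returned from the lowest to the highest (root last).
    """
    if child_size < 2:
        raise ValueError("child_size must be at least 2")
    layers = [[sign_segment(rows, seg) for seg in leaves]]
    while len(layers[-1]) > 1:
        below = layers[-1]
        above = [
            hashlib.md5("".join(below[i:i + child_size]).encode("ascii")).hexdigest()
            for i in range(0, len(below), child_size)
        ]
        layers.append(above)
    return layers


leaves = [(31, 37), (38, 44), (45, 51), (52, 58)]
first_db = {k: f"value-{k}" for k in range(31, 59)}
second_db = dict(first_db)
second_db[47] = "value-changed"  # one differing row inside [45-51]

first_tree = build_tree_signature(first_db, leaves, 2)
second_tree = build_tree_signature(second_db, leaves, 2)
print(first_tree[-1] == second_tree[-1])
```

Because a parent digest depends on all of its children, a single differing row changes the digest of its lowest-layer segment and of every node above it, while the other segments keep identical digests.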
- After signing, each server can feed back its tree-type first signature or tree-type second signature to the client 210.
- Each sub-range in the embodiments of the present invention corresponds to one tree-type signature, so there may be multiple first signatures and multiple second signatures.
- Each server may also feed back only the highest-layer signature of the tree-type first signature or second signature to the client 210. If the highest-layer signatures are inconsistent, the lower-layer signatures are then sent to the client 210 for comparison. This is not limited by the embodiments of the present invention.
- The client 210 receives the signatures for the sub-ranges of the target range from both databases and compares them. If the highest-layer signatures are equal, the contents of the target data table in the two databases are considered consistent, and the comparison ends.
- If the client 210 finds that the highest-layer signatures are not equal, it compares the signatures of the lower layers in turn until it finds the finest-grained segments whose signatures are inconsistent, which determines which data is inconsistent. Alternatively, if the client 210 finds that the highest-layer signatures are not equal, it asks the servers to return the signatures of the next layer and continues comparing the returned signatures; whenever a signature is found to be inconsistent, the servers are asked to return the layer below it, until the finest-grained segments with inconsistent signatures are found.
- S350, in which the client determines, according to the first signature and the second signature, whether the data of the target data table in the first database is the same as the data of the target data table in the second database, may include: the client determines, according to the tree-type first signature and the tree-type second signature, whether the signatures at the same layer of the two are consistent; when they are inconsistent, it determines that the data of the target data table in the first database differs from the data of the target data table in the second database in the segment corresponding to that layer.
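The layer-by-layer comparison performed by the client can be sketched as a top-down walk over the two signature trees. The sketch assumes that both trees have the same shape, that the layers are ordered from lowest to highest, and that each node groups `child_size` consecutive children; the function name and the toy signature strings are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start_key, end_key)


def find_inconsistent_leaves(
    first: List[List[str]],
    second: List[List[str]],
    leaves: List[Segment],
    child_size: int,
) -> List[Segment]:
    """Return the finest segments whose signatures differ.

    Only the children of nodes that already differ are inspected, so
    matching subtrees are never descended into.
    """
    top = len(first) - 1
    suspects = [
        i for i in range(len(first[top])) if first[top][i] != second[top][i]
    ]
    for layer in range(top - 1, -1, -1):
        next_suspects = []
        for parent in suspects:
            start = parent * child_size
            end = min(start + child_size, len(first[layer]))
            for child in range(start, end):
                if first[layer][child] != second[layer][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return [leaves[i] for i in suspects]


leaves = [(31, 37), (38, 44), (45, 51), (52, 58)]
# Toy signatures standing in for real digests, lowest layer first.
first_tree = [["a", "b", "c", "d"], ["ab", "cd"], ["abcd"]]
second_tree = [["a", "b", "x", "d"], ["ab", "xd"], ["abxd"]]
print(find_inconsistent_leaves(first_tree, second_tree, leaves, 2))
```

In this toy input only the third lowest-layer signature differs, so the walk visits the root, then the right-hand node of the middle layer, and reports the segment [45-51]. When the highest-layer signatures are equal, the function returns an empty list without inspecting any lower layer.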
- The client 210 can run a small-range query on the target data table of the two databases for the finest-grained segments whose signatures are inconsistent and compare the data read by string comparison in the client 210, which yields the detailed differences between the data tables.
- Alternatively, the embodiments of the present invention may skip the detailed comparison and only determine whether the data of the target data table is consistent. This is not limited by the embodiments of the present invention.
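The small-range query followed by string comparison can be illustrated as follows. In this sketch the two tables are stood in for by in-memory dictionaries keyed by integer row key; the function name `diff_segment` and the sample rows are hypothetical, and a real client would read the rows of the segment from each database instead.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int]  # inclusive (start_key, end_key)
RowDiff = Tuple[int, Optional[str], Optional[str]]


def diff_segment(
    first_rows: Dict[int, str], second_rows: Dict[int, str], segment: Segment
) -> List[RowDiff]:
    """Compare the rows of one segment by string equality.

    Returns (key, value_in_first, value_in_second) for every key whose
    values differ; a value of None means the row is missing on that side.
    """
    lo, hi = segment
    keys = sorted(
        {k for k in first_rows if lo <= k <= hi}
        | {k for k in second_rows if lo <= k <= hi}
    )
    diffs = []
    for key in keys:
        a = first_rows.get(key)
        b = second_rows.get(key)
        if a != b:
            diffs.append((key, a, b))
    return diffs


first_db = {45: "alice", 46: "bob", 47: "carol"}
second_db = {45: "alice", 46: "bobby", 48: "dave"}
print(diff_segment(first_db, second_db, (45, 51)))
```

Only the rows inside the inconsistent segment are read and compared, which is what keeps the amount of transferred data small.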
- FIG. 12 shows a schematic block diagram of a device 500 in accordance with an embodiment of the present invention, which may correspond to any of the computing devices or servers referred to in FIG. 3 of an embodiment of the present invention.
- The device 500 can include a processor 510, a memory 520, and a network interface 530.
- The processor 510 can be used to execute the methods of the embodiments of the present invention.
- The memory 520 can be used to store the code executed by the processor 510.
- The network interface 530 is used to communicate with other devices.
- The computing device 310 of FIG. 3 can also include an output device, or an output interface coupled to an output device, for outputting the comparison result.
- Output devices can include displays, printers, and the like.
- The processor, memory, and network interface in the device 500 can communicate with one another via internal connection paths to transfer control and/or data signals.
- The disclosed systems, devices, and methods may be implemented in other manners.
- The system embodiment described above is merely illustrative.
- The division into units is only a division by logical function; an actual implementation may divide them differently. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.
- The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
- The technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
- The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a system and a method for comparing the data of a data table. The system comprises a client and a plurality of servers. A first database corresponds to at least one first server, and a second database corresponds to at least one second server. The client acquires first metadata and second metadata of a target data table in the two databases, the first metadata comprising a first range corresponding to the data of the target data table and the second metadata comprising a second range corresponding to the data of the target data table. The client determines a target range according to the first and/or second ranges. The first server signs, according to the target range, the data of the target data table in the first database to obtain a first signature; likewise, the second server obtains a second signature. The client determines, according to the first signature and the second signature, whether the data of the target data table in the two databases is identical, which avoids excessive data transmission and excessive data comparison. The invention offers the advantages of high operating speed, low cost, and low network resource usage.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611260662.8 | 2016-12-30 | ||
CN201611260662.8A CN107070645B (zh) | 2016-12-30 | 2016-12-30 | 比较数据表的数据的方法和系统 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018121025A1 true WO2018121025A1 (fr) | 2018-07-05 |
Family
ID=59624007
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/108196 WO2018121025A1 (fr) | 2016-12-30 | 2017-10-28 | Système et procédé de comparaison de données de table de données |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107070645B (fr) |
WO (1) | WO2018121025A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960613A (zh) * | 2019-03-11 | 2019-07-02 | 中国银联股份有限公司 | 一种数据批处理的方法及装置 |
CN110287182A (zh) * | 2019-05-05 | 2019-09-27 | 浙江吉利控股集团有限公司 | 一种大数据的数据对比方法、装置、设备及终端 |
CN112395276A (zh) * | 2020-11-13 | 2021-02-23 | 中国人寿保险股份有限公司 | 一种数据比对方法及相关设备 |
CN112613808A (zh) * | 2020-12-15 | 2021-04-06 | 嘉兴蓝匠仓储系统软件有限公司 | 一种使用rfid群读出入库物料的方法 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107070645B (zh) * | 2016-12-30 | 2020-06-16 | 华为技术有限公司 | 比较数据表的数据的方法和系统 |
CN109739831A (zh) * | 2018-11-23 | 2019-05-10 | 网联清算有限公司 | 数据库之间数据校验方法及装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102084416A (zh) * | 2008-02-21 | 2011-06-01 | 史诺有限公司 | 音视频签名、导出签名的方法以及比较音视频数据的方法 |
US8744840B1 (en) * | 2013-10-11 | 2014-06-03 | Realfusion LLC | Method and system for n-dimentional, language agnostic, entity, meaning, place, time, and words mapping |
CN104391894A (zh) * | 2014-11-11 | 2015-03-04 | 广州科腾信息技术有限公司 | 一种重复数据的检查处理方法 |
CN107070645A (zh) * | 2016-12-30 | 2017-08-18 | 华为技术有限公司 | 比较数据表的数据的方法和系统 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9002792B2 (en) * | 2012-11-19 | 2015-04-07 | Compellent Technologies | Confirming data consistency in a data storage environment |
CN104111937A (zh) * | 2013-04-18 | 2014-10-22 | 中兴通讯股份有限公司 | 主、备数据库及其数据一致性检测、修复方法和装置 |
CN103646073A (zh) * | 2013-12-11 | 2014-03-19 | 浪潮电子信息产业股份有限公司 | 一种基于HBase表的条件查询优化方法 |
CN104077373B (zh) * | 2014-06-24 | 2018-12-04 | 北京京东尚科信息技术有限公司 | 一种数据一致性校验方法 |
CN105677645B (zh) * | 2014-11-17 | 2018-12-21 | 阿里巴巴集团控股有限公司 | 一种数据表比对方法和装置 |
CN105988889B (zh) * | 2015-02-11 | 2019-06-14 | 阿里巴巴集团控股有限公司 | 一种数据校验方法及装置 |
CN105989089A (zh) * | 2015-02-12 | 2016-10-05 | 阿里巴巴集团控股有限公司 | 一种数据对比方法及装置 |
US9910906B2 (en) * | 2015-06-25 | 2018-03-06 | International Business Machines Corporation | Data synchronization using redundancy detection |
- 2016-12-30 CN CN201611260662.8A patent/CN107070645B/zh active Active
- 2017-10-28 WO PCT/CN2017/108196 patent/WO2018121025A1/fr active Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960613A (zh) * | 2019-03-11 | 2019-07-02 | 中国银联股份有限公司 | 一种数据批处理的方法及装置 |
CN110287182A (zh) * | 2019-05-05 | 2019-09-27 | 浙江吉利控股集团有限公司 | 一种大数据的数据对比方法、装置、设备及终端 |
CN112395276A (zh) * | 2020-11-13 | 2021-02-23 | 中国人寿保险股份有限公司 | 一种数据比对方法及相关设备 |
CN112395276B (zh) * | 2020-11-13 | 2024-05-28 | 中国人寿保险股份有限公司 | 一种数据比对方法及相关设备 |
CN112613808A (zh) * | 2020-12-15 | 2021-04-06 | 嘉兴蓝匠仓储系统软件有限公司 | 一种使用rfid群读出入库物料的方法 |
Also Published As
Publication number | Publication date |
---|---|
CN107070645A (zh) | 2017-08-18 |
CN107070645B (zh) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018121025A1 (fr) | Système et procédé de comparaison de données de table de données | |
US11422853B2 (en) | Dynamic tree determination for data processing | |
US9996593B1 (en) | Parallel processing framework | |
US8832130B2 (en) | System and method for implementing on demand cloud database | |
US9020802B1 (en) | Worldwide distributed architecture model and management | |
US8417991B2 (en) | Mitigating reduction in availability level during maintenance of nodes in a cluster | |
CN106339254B (zh) | 一种虚拟机快速启动方法、装置及管理节点 | |
CN101957863A (zh) | 数据并行处理方法、装置及系统 | |
US10541936B1 (en) | Method and system for distributed analysis | |
US20060095435A1 (en) | Configuring and deploying portable application containers for improved utilization of server capacity | |
CN109194711B (zh) | 一种组织架构的同步方法、客户端、服务端及介质 | |
CN108874558A (zh) | 分布式事务的消息订阅方法、电子装置及可读存储介质 | |
WO2017028394A1 (fr) | Procédé et appareil de récupération de données distribuées basée sur des exemples | |
US10185743B2 (en) | Method and system for optimizing reduce-side join operation in a map-reduce framework | |
US10558373B1 (en) | Scalable index store | |
US20200073993A1 (en) | Synchronizing in-use source data and an unmodified migrated copy thereof | |
CN107276914B (zh) | 基于cmdb的自助资源分配调度的方法 | |
US8667008B2 (en) | Search request control apparatus and search request control method | |
EP3811227B1 (fr) | Procédés, dispositifs et systèmes pour des mises à niveau non perturbatrices d'un moteur de coordination distribué dans un environnement informatique distribué | |
CN113535673A (zh) | 生成配置文件及数据处理的方法和装置 | |
CN110019057B (zh) | 请求处理方法及装置 | |
US11157454B2 (en) | Event-based synchronization in a file sharing environment | |
Srinivasan et al. | Techniques and Efficiencies from Building a Real-Time DBMS | |
CN106168983B (zh) | 一种混合资源处理方法及装置 | |
CN112487089B (zh) | 基于数据页路由的分布式存储方法及系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17889190 Country of ref document: EP Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 17889190 Country of ref document: EP Kind code of ref document: A1 |