CN112597127A - Cross-cluster access method, device, equipment and storage medium - Google Patents

Cross-cluster access method, device, equipment and storage medium

Info

Publication number
CN112597127A
Authority
CN
China
Prior art keywords
data
cluster
tables
initial
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011483662.0A
Other languages
Chinese (zh)
Inventor
黄李强
熊志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hanyun Technology Co ltd
Original Assignee
Shenzhen Hanyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Hanyun Technology Co ltd filed Critical Shenzhen Hanyun Technology Co ltd
Priority to CN202011483662.0A priority Critical patent/CN112597127A/en
Publication of CN112597127A publication Critical patent/CN112597127A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of data migration, and discloses a cross-cluster access method, device, equipment and storage medium. The method comprises the following steps: receiving a data query instruction for N data initial tables; calculating the migration cost value corresponding to each of the N data initial tables, and, for each of the M data clusters in turn, summing the migration cost values of the data initial tables stored in that cluster to obtain the cost removal values corresponding to the M data clusters; adding the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtracting the cost removal value of each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters; comparing the M cluster migration total values to obtain the minimum cluster migration total value, and determining the data cluster corresponding to the minimum cluster migration total value as the computing cluster; and creating data transfer tables corresponding to the N data initial tables in the computing cluster, and copying the data in the N data initial tables to the data transfer tables.

Description

Cross-cluster access method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data migration, and in particular, to a cross-cluster access method, apparatus, device, and storage medium.
Background
Governments and enterprises rely increasingly on data, and large amounts of data are stored in a variety of databases. In practice, this data is distributed across the databases of individual sites. When data is queried and called, clusters call one another, and data migration involves a large amount of cross-cluster access. When the volume of data access rises suddenly, these calls can easily cause a system crash.
When clusters call one another for data, both the performance and the security of cross-cluster data access suffer: efficiency is low, repeated data migration is costly and seriously degrades performance, and calling a variety of interfaces and protocols directly affects the security of cross-cluster transmission. A technical solution that ensures both performance and transmission security is therefore needed.
Disclosure of Invention
The invention mainly aims to solve the technical problems of poor performance and weak security in the existing cross-cluster data transmission process.
The invention provides a cross-cluster access method, which comprises the following steps:
receiving a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
calculating migration cost values corresponding to the N data initial tables, and sequentially calculating the sum of the migration cost values of the data initial tables stored in each of the M data clusters to obtain cost removal values corresponding to the M data clusters;
adding the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtracting the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters;
comparing the M cluster migration total values to obtain the minimum cluster migration total value, and determining the data cluster corresponding to the minimum cluster migration total value as the computing cluster;
and creating data transfer tables corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster.
Optionally, in a first implementation manner of the first aspect of the present invention, after receiving a data query instruction for N initial data tables, before calculating migration cost values corresponding to the N initial data tables and sequentially calculating a sum of the migration cost values of the initial data tables stored in correspondence to M data clusters to obtain cost removal values corresponding to the M data clusters, the method includes:
judging whether the data query instruction meets a preset SQL language structure;
if not, sending a message indicating that the data query cannot be performed to a preset port;
and if so, parsing the SQL links in the data query instruction, and accessing the N data initial tables according to the SQL links.
Optionally, in a second implementation manner of the first aspect of the present invention, the calculating migration cost values corresponding to N data initial tables, and sequentially calculating a sum of the migration cost values of the data initial tables stored in correspondence to M data clusters, to obtain cost removal values corresponding to M data clusters includes:
reading the valid row count and the average row length corresponding to each data initial table;
calculating the product of the valid row count and the average row length corresponding to each data initial table to obtain the migration cost value corresponding to that data initial table;
and sequentially summing the migration cost values corresponding to the data initial tables stored in each of the M data clusters to obtain the cost removal values corresponding to the M data clusters.
Optionally, in a third implementation manner of the first aspect of the present invention, the creating of the data transfer tables corresponding to the N initial data tables in the computing cluster, the copying of the data in the N initial data tables to the data transfer tables, and the executing of the data query instruction in the computing cluster include:
reading L data initial tables stored in the computing cluster, and marking the L data initial tables to obtain L marked data tables, wherein L is a positive integer smaller than N;
removing the L marked data tables from the N data initial tables to obtain the N-L unmarked data initial tables;
and creating a data transfer table corresponding to the N-L non-marked data initial tables in the computing cluster, copying the data of the N-L non-marked data initial tables into the data transfer table, and executing the data query instruction in the computing cluster.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the creating, in the computing cluster, data transfer tables corresponding to N-L non-marked data initial tables, and copying data of the N-L non-marked data initial tables to the data transfer tables includes:
reading verification character strings corresponding to the N-L non-marking data initial tables;
creating N-L data transfer tables in the computing cluster, and writing the N-L verification character strings into the N-L data transfer tables in a one-to-one correspondence manner;
and copying the data of the N-L non-marked data initial tables into the data transfer table according to the matching relation of the verification character strings.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after the creating data transfer tables corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster, the method further includes:
and receiving a query termination instruction, and deleting all the data transfer tables in the computing cluster.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the comparing the M cluster migration total values to obtain the minimum cluster migration total value, and determining the data cluster corresponding to the minimum cluster migration total value as the computing cluster includes:
selecting two of the M cluster migration total values and comparing them with each other, marking the larger one as a non-target option, and repeating the comparison cyclically until the minimum cluster migration total value is obtained, wherein a value marked as a non-target option takes no part in subsequent comparisons;
and accessing the data cluster corresponding to the minimum cluster migration total value, and marking the data cluster as a computing cluster.
A second aspect of the present invention provides a cross-cluster access device, including:
the receiving module is used for receiving a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
the calculation module is used for calculating migration cost values corresponding to the N data initial tables, and sequentially calculating the sum of the migration cost values of the data initial tables stored in each of the M data clusters to obtain cost removal values corresponding to the M data clusters;
a summing and subtracting module, configured to sum the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtract the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters;
the comparison module is used for comparing the M cluster migration total values to obtain the minimum cluster migration total value, and determining the data cluster corresponding to the minimum cluster migration total value as the computing cluster;
and the copying module is used for creating data transfer tables corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster.
A third aspect of the present invention provides a cross-cluster access device, comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor invokes the instructions in the memory to cause the cross-cluster access device to perform the cross-cluster access method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the above-described cross-cluster access method.
Drawings
FIG. 1 is a diagram of a first embodiment of a cross-cluster access method in an embodiment of the present invention;
FIG. 2 is a diagram of a second embodiment of a cross-cluster access method in an embodiment of the present invention;
FIG. 3 is a diagram of a third embodiment of a cross-cluster access method in an embodiment of the present invention;
FIG. 4 is a diagram of an embodiment of a cross-cluster access device in an embodiment of the invention;
FIG. 5 is a schematic diagram of another embodiment of a cross-cluster access device in an embodiment of the present invention;
FIG. 6 is a schematic diagram of an embodiment of a cross-cluster access device in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a cross-cluster access method, a cross-cluster access device, cross-cluster access equipment and a storage medium.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of a cross-cluster access method in the embodiment of the present invention includes:
101. Receiving a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
In this embodiment, the SQL database language is used together with a dedicated SQL statement, whose content is as follows:
CREATE TABLE [IF NOT EXISTS] tbl_name [(create_definition, ...)]
ENGINE=LINKED
CONNECTION 'user/password@ip:port/db.tbl'
In this SQL statement, tbl_name represents the name of the table to be created, create_definition represents the definition of the table structure fields, user represents the user name used to connect to the peer cluster, password represents the password used to connect to the peer cluster, ip represents the IP address of the peer cluster, port represents the port of the peer cluster, db represents the database name in the peer cluster, and tbl represents the table name in the peer cluster. CREATE TABLE is the table creation statement, and CONNECTION specifies the connection. For example, 3 data initial tables are distributed across 3 clusters, and an instruction to read the 3 data initial tables is received.
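The CONNECTION string format described above can be split into its six fields as in the following minimal Python sketch. The names LinkTarget and parse_connection and the sample address are hypothetical and are not part of the patent; the sketch also assumes that the user name contains no "/" and that the database name contains no ".".

```python
from typing import NamedTuple


class LinkTarget(NamedTuple):
    """Fields of a 'user/password@ip:port/db.tbl' connection string."""
    user: str
    password: str
    ip: str
    port: int
    db: str
    tbl: str


def parse_connection(conn: str) -> LinkTarget:
    """Split the connection string into user, password, ip, port, db and tbl."""
    # Split on the last '@' so that a password containing '@' still parses.
    credentials, _, endpoint = conn.rpartition("@")
    user, _, password = credentials.partition("/")
    address, _, qualified = endpoint.partition("/")
    ip, _, port = address.rpartition(":")
    db, _, tbl = qualified.partition(".")
    fields = (user, password, ip, port, db, tbl)
    if not all(fields) or not port.isdigit():
        raise ValueError(f"malformed CONNECTION string: {conn!r}")
    return LinkTarget(user, password, ip, int(port), db, tbl)


# Hypothetical peer-cluster address used only for illustration.
target = parse_connection("reader/secret@10.0.0.2:3306/sales.orders")
print(target.ip, target.port, target.db, target.tbl)  # 10.0.0.2 3306 sales orders
```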
102. Calculating the migration cost values corresponding to the N data initial tables, and sequentially calculating the sum of the migration cost values of the data initial tables stored corresponding to the M data clusters to obtain the cost removal values corresponding to the M data clusters;
In this embodiment, the valid row count rows and the average row length avg_length of each of the three data initial tables are read, and the product rows × avg_length is calculated as the migration cost value of each table; the results for the three data initial tables, stored in three clusters, are countA, countB, and countC, respectively. With one table in each cluster, the sums of the migration cost values stored in the three data clusters, that is, their cost removal values, are countA, countB, and countC. If instead all three data initial tables were stored in a single cluster, that cluster's cost removal value would be countA + countB + countC and the cost removal value of each cluster holding none of the tables would be 0.
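As an illustration of step 102, the following Python sketch computes the migration cost value of each table and the cost removal value of each cluster. The function names, cluster names, and table statistics are hypothetical; only the formula (row count times average row length, summed per cluster) comes from the description.

```python
from typing import Dict, Iterable, List, Tuple

# (table name, cluster name, rows, avg_length); all values are hypothetical.
TableStat = Tuple[str, str, int, int]


def migration_cost(rows: int, avg_length: int) -> int:
    """Migration cost value of one table: valid row count times average row length."""
    return rows * avg_length


def cost_removal_values(stats: Iterable[TableStat], clusters: List[str]) -> Dict[str, int]:
    """Sum the migration cost values of the tables stored in each cluster."""
    removal = {cluster: 0 for cluster in clusters}  # clusters with no table stay at 0
    for _table, cluster, rows, avg_length in stats:
        removal[cluster] += migration_cost(rows, avg_length)
    return removal


stats = [
    ("A", "cluster1", 1000, 96),
    ("B", "cluster2", 500, 40),
    ("C", "cluster3", 200, 64),
]
removal = cost_removal_values(stats, ["cluster1", "cluster2", "cluster3"])
print(removal)  # {'cluster1': 96000, 'cluster2': 20000, 'cluster3': 12800}
```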
103. Adding the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtracting the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters;
In this embodiment, the cost removal values of the three data clusters are countA, countB, and countC, respectively, and the total migration value is countA + countB + countC. Subtracting countA, countB, and countC in turn from the total migration value gives the cluster migration total values of the three data clusters: countB + countC, countA + countC, and countA + countB.
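Step 103 can be sketched in Python as follows. The function name cluster_migration_totals and the numeric stand-ins for countA, countB, and countC are assumptions made for illustration.

```python
from typing import Dict


def cluster_migration_totals(removal: Dict[str, int]) -> Dict[str, int]:
    """Total migration value minus each cluster's own cost removal value."""
    total_migration = sum(removal.values())
    return {cluster: total_migration - value for cluster, value in removal.items()}


# Hypothetical stand-ins for countA, countB and countC.
removal = {"cluster1": 96000, "cluster2": 20000, "cluster3": 12800}
totals = cluster_migration_totals(removal)
print(totals)  # {'cluster1': 32800, 'cluster2': 108800, 'cluster3': 116000}
```

Each result is the cost of the data that would have to be copied into that cluster, so the cluster that already stores the most data has the smallest cluster migration total value.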
104. Comparing the M cluster migration total values to obtain the minimum cluster migration total value, and determining the data cluster corresponding to the minimum cluster migration total value as the computing cluster;
In this embodiment, countB + countC is first compared with countA + countC; if countA + countC is larger, it is excluded. countB + countC is then compared with countA + countB; if countA + countB is larger, it is also excluded. countB + countC is therefore the minimum cluster migration total value; the cluster it corresponds to, namely the first cluster, is determined as the computing cluster.
105. And creating a data transfer table corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer table, and executing a data query instruction in the computing cluster.
In this embodiment, the cluster migration total values of the three data clusters are countB + countC, countA + countC, and countA + countB. If countB + countC is the minimum cluster migration total value, transfer tables tmp B and tmp C are created in the corresponding cluster, and the data of data initial table B and data initial table C is pulled through connection link B and connection link C and copied into tmp B and tmp C. The computing cluster then holds three data tables, namely tmp B, tmp C, and data initial table A, and the data query instruction is executed in the computing cluster.
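The choice of computing cluster and the resulting copy plan for the three-table example can be sketched as below. The function plan_transfer, the tmp_ naming, and the numbers are hypothetical; the sketch only decides which tables need a transfer table and does not perform any copy.

```python
from typing import Dict


def plan_transfer(locations: Dict[str, str], computing_cluster: str) -> Dict[str, str]:
    """Map each transfer table name to the remote initial table it copies."""
    return {
        f"tmp_{table}": table
        for table, cluster in locations.items()
        if cluster != computing_cluster  # local tables are queried in place
    }


locations = {"A": "cluster1", "B": "cluster2", "C": "cluster3"}
# Hypothetical cluster migration total values (countB+countC, countA+countC, countA+countB).
totals = {"cluster1": 32800, "cluster2": 108800, "cluster3": 116000}

computing_cluster = min(totals, key=totals.get)
plan = plan_transfer(locations, computing_cluster)
print(computing_cluster, plan)  # cluster1 {'tmp_B': 'B', 'tmp_C': 'C'}
```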
In the embodiment of the present invention, the migration cost of each data table and of each cluster is calculated and compared to find the cluster for which the cost of reading the queried data tables is lowest; the data tables not stored in that cluster are copied into it, and the data query is executed in the computing cluster, which minimizes the query cost and improves the performance of cross-cluster data transmission.
Referring to fig. 2, a second embodiment of the cross-cluster access method according to the embodiment of the present invention includes:
201. Receiving a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
This step is similar to the first embodiment; reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.
202. Accessing a preset registration database, and judging whether the tables to be queried in the data query instruction belong to the tables in the registration database;
In this embodiment, newly designed tables are registered in the registration database. A table that is registered there can be processed; a table that is not in the registration database cannot.
203. If not, sending the data query instruction to a preset transfer port;
In this embodiment, the data query instruction is sent to the transfer port and handed over to the other processing logic behind that port.
204. If so, parsing the link modes corresponding to the N data initial tables in the data query instruction, and accessing the N data initial tables according to the link modes;
In this embodiment, the link mode in the data query instruction is valid, and the data initial tables distributed across the M clusters are accessed by matching the IP address, port, user name, password, database name, and table name given by the link mode of each linked table.
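The routing decision of steps 202 to 204 can be sketched as follows. This is a minimal illustration that assumes every queried table must be registered for the cross-cluster path to be taken; the function route_query, the route labels, and the table names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def route_query(requested: Iterable[str], registry: Set[str]) -> Tuple[str, List[str]]:
    """Return the chosen route and the tables that determined it."""
    requested = list(requested)
    unregistered = [table for table in requested if table not in registry]
    if unregistered:
        # Step 203: hand the instruction to the preset transfer port.
        return "transfer_port", unregistered
    # Step 204: all tables are registered, continue with cross-cluster access.
    return "cross_cluster", requested


registry = {"orders", "customers", "payments"}  # hypothetical registration database
print(route_query(["orders", "customers"], registry))  # ('cross_cluster', ['orders', 'customers'])
print(route_query(["orders", "audit_log"], registry))  # ('transfer_port', ['audit_log'])
```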
205. Reading the valid row count and the average row length corresponding to each data initial table;
In this embodiment, the valid row count rows and the average row length avg_length of each data initial table are read.
206. Calculating the product of the valid row count and the average row length corresponding to each data initial table to obtain the migration cost value corresponding to that data initial table;
In this embodiment, the valid row count rows is 1000 and the average row length avg_length is 96, so the migration cost value is 1000 × 96 = 96000.
207. Sequentially summing the migration cost values corresponding to the data initial tables stored by the M data clusters to obtain cost removal values corresponding to the M data clusters;
In this embodiment, 6, 5, 1, 6, 9, 7, and 3 data initial tables are stored in 8 data clusters; summing the migration cost values of the data initial tables stored in each cluster gives 5000, 66600, 2000, 66663, 7778, 999, 4223, and 635, which are the cost removal values corresponding to the 8 data clusters.
208. Adding the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtracting the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters;
209. Comparing the M cluster migration total values to obtain the minimum cluster migration total value, and determining the data cluster corresponding to the minimum cluster migration total value as the computing cluster;
Steps 208 and 209 are similar to the first embodiment; reference may be made to the corresponding processes in the foregoing method embodiment, which are not repeated here.
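As a worked example, steps 208 and 209 can be applied to the eight cost removal values listed for step 207. The Python sketch below only performs that arithmetic; the variable names are hypothetical, and the resulting figures are derived here rather than stated in the original text.

```python
# Cost removal values of the 8 data clusters, as given for step 207.
cost_removal = [5000, 66600, 2000, 66663, 7778, 999, 4223, 635]

# Step 208: total migration value, then one cluster migration total value per cluster.
total_migration = sum(cost_removal)
cluster_totals = [total_migration - value for value in cost_removal]

# Step 209: the cluster with the smallest cluster migration total value.
best_index = min(range(len(cluster_totals)), key=cluster_totals.__getitem__)

print(total_migration)             # 153898
print(cluster_totals[best_index])  # 87235
print(best_index)                  # 3 (the cluster whose cost removal value is 66663)
```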
210. Reading L data initial tables stored in the computing cluster, and marking the L data initial tables to obtain L marked data tables, wherein L is a positive integer smaller than N;
in this embodiment, 3 data initial tables are stored in the computing cluster, and the 3 data initial tables are labeled to obtain 3 labeled data tables.
211. Removing the L marked data tables from the N data initial tables to obtain the N-L unmarked data initial tables;
In this embodiment, the 3 marked data tables are removed from the 44 data initial tables, leaving 41 unmarked data initial tables.
212. Reading the verification character strings corresponding to the N-L unmarked data initial tables;
In this embodiment, the verification character strings corresponding to the 41 data initial tables are read; each verification character string includes the IP address, port, user name, password, database name, and table name.
213. Creating N-L data transfer tables in the computing cluster, and writing the N-L verification character strings into the N-L data transfer tables in a one-to-one correspondence manner;
In this embodiment, data transfer tables corresponding to the 41 data initial tables are created in the computing cluster, and the IP address, port, user name, password, database name, and table name of each data initial table are written into its data transfer table.
214. Copying the data of the N-L unmarked data initial tables into the data transfer tables according to the matching relation of the verification character strings.
In this embodiment, the data of each data initial table is copied into the data transfer table in the computing cluster whose IP address, port, user name, password, database name, and table name match, so that the 41 data transfer tables are filled and the data processing is completed.
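Steps 210 to 214 can be sketched in Python as below. The classes InitialTable and TransferTable, the function build_transfer_tables, and the in-memory rows are hypothetical stand-ins; a real system would copy rows over the link described by each verification character string.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class InitialTable:
    name: str
    cluster: str
    verification: str  # e.g. "user/password@ip:port/db.tbl"
    rows: tuple = ()


@dataclass
class TransferTable:
    verification: str
    rows: list = field(default_factory=list)


def build_transfer_tables(tables: List[InitialTable], computing_cluster: str) -> Dict[str, TransferTable]:
    # Steps 210-211: tables already in the computing cluster are marked and removed.
    unmarked = [t for t in tables if t.cluster != computing_cluster]
    # Steps 212-213: one transfer table per unmarked table, tagged with its string.
    transfers = {t.verification: TransferTable(t.verification) for t in unmarked}
    # Step 214: copy rows into the transfer table whose string matches.
    for t in unmarked:
        transfers[t.verification].rows.extend(t.rows)
    return transfers


tables = [
    InitialTable("A", "cluster1", "u/p@10.0.0.1:3306/db.A", (("a1",),)),
    InitialTable("B", "cluster2", "u/p@10.0.0.2:3306/db.B", (("b1",), ("b2",))),
    InitialTable("C", "cluster3", "u/p@10.0.0.3:3306/db.C", (("c1",),)),
]
transfers = build_transfer_tables(tables, "cluster1")
print(len(transfers))  # 2
```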
In the embodiment of the present invention, the migration cost of each data table and of each cluster is calculated and compared to find the cluster for which the cost of reading the queried data tables is lowest; the data tables not stored in that cluster are copied into it, and the data query is executed in the computing cluster, which minimizes the query cost and improves the performance of cross-cluster data transmission.
Referring to fig. 3, a third embodiment of a cross-cluster access method according to the embodiment of the present invention includes:
301. Receiving a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
302. Calculating the migration cost values corresponding to the N data initial tables, and sequentially calculating the sum of the migration cost values of the data initial tables stored in each of the M data clusters to obtain the cost removal values corresponding to the M data clusters;
303. Adding the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtracting the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters;
Steps 301 to 303 are similar to the first embodiment; reference may be made to the corresponding processes in the foregoing method embodiment, which are not repeated here.
304. Selecting two of the M cluster migration total values and comparing them with each other, marking the larger one as a non-target option, and repeating the comparison cyclically until the minimum cluster migration total value is obtained, wherein a value marked as a non-target option takes no part in subsequent comparisons;
In this embodiment, the 8 cluster migration total values are 6222, 63300, 6633, 411, 88243, 935, 4520, and 99964. 6222 and 63300 are picked; 63300 is the larger, so it is marked as a non-target option. 411 and 6633 are then picked at random; 6633 is the larger, so it is marked as a non-target option. The cycle continues in this way until the minimum cluster migration total value, 411, is obtained.
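The cyclic pairwise comparison of step 304 can be sketched as follows. For determinism this sketch always compares the first two remaining candidates instead of picking a pair at random as the embodiment does; the function name pairwise_minimum is hypothetical.

```python
from typing import List


def pairwise_minimum(values: List[int]) -> int:
    """Return the index of the smallest value by repeated pairwise elimination."""
    if not values:
        raise ValueError("at least one cluster migration total value is required")
    candidates = list(range(len(values)))
    while len(candidates) > 1:
        first, second = candidates[0], candidates[1]
        # The larger of the pair becomes a non-target option.
        loser = first if values[first] > values[second] else second
        candidates.remove(loser)  # non-target options are never compared again
    return candidates[0]


totals = [6222, 63300, 6633, 411, 88243, 935, 4520, 99964]
winner = pairwise_minimum(totals)
print(winner, totals[winner])  # 3 411
```

Because each comparison eliminates exactly one value, M values need M - 1 comparisons, the same as a single linear scan for the minimum.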
305. Accessing a data cluster corresponding to the minimum cluster migration total value, and marking the data cluster as a computing cluster;
in this embodiment, a data cluster corresponding to 411 is accessed, and the data cluster is set as a computing cluster.
306. Creating data transfer tables corresponding to the N data initial tables in a computing cluster, copying data in the N data initial tables into the data transfer tables, and executing a data query instruction in the computing cluster;
This step is similar to the first embodiment; reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.
307. And receiving a query termination instruction, and deleting all data transfer tables in the computing cluster.
In this embodiment, the data transfer tables in the computing cluster corresponding to 411 are deleted according to the query termination instruction, which ends the data query process. The data transfer tables are rebuilt for the next query from the current data tables in the source clusters, so the queried data does not lag behind updates.
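One way to guarantee that the transfer tables of step 307 are always deleted is to scope them to a single query. The Python sketch below models the computing cluster as a plain dictionary and uses a context manager; the function transfer_tables and the table names are hypothetical, and leaving the with block stands in for receiving the query termination instruction.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def transfer_tables(cluster: Dict[str, list], names: List[str]) -> Iterator[Dict[str, list]]:
    """Create empty transfer tables for one query and always delete them afterwards."""
    for name in names:
        cluster[name] = []
    try:
        yield cluster
    finally:
        # Corresponds to receiving the query termination instruction.
        for name in names:
            cluster.pop(name, None)


computing_cluster = {"A": [("a1",)]}  # table A is stored locally
with transfer_tables(computing_cluster, ["tmp_B", "tmp_C"]) as tables:
    tables["tmp_B"].append(("b1",))
    during = sorted(tables)
print(during)                     # ['A', 'tmp_B', 'tmp_C']
print(sorted(computing_cluster))  # ['A']
```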
In the embodiment of the present invention, the migration cost of each data table and of each cluster is calculated and compared to find the cluster for which the cost of reading the queried data tables is lowest; the data tables not stored in that cluster are copied into it, and the data query is executed in the computing cluster, which minimizes the query cost and improves the performance of cross-cluster data transmission.
The cross-cluster access method in the embodiment of the present invention is described above. The cross-cluster access device in the embodiment of the present invention is described below with reference to fig. 4; an embodiment of the cross-cluster access device includes:
a receiving module 401, configured to receive a data query instruction for N initial data tables, where the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
a calculating module 402, configured to calculate migration cost values corresponding to the N data initial tables, and sequentially calculate the sum of the migration cost values of the data initial tables stored in each of the M data clusters, so as to obtain cost removal values corresponding to the M data clusters;
a summing and subtracting module 403, configured to sum the migration cost values corresponding to the N data initial tables to obtain a total migration value, and subtract the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the cluster migration total values corresponding to the M data clusters;
a comparing module 404, configured to compare the M cluster migration total values to obtain the minimum cluster migration total value, and determine the data cluster corresponding to the minimum cluster migration total value as the computing cluster;
a copying module 405, configured to create a data transfer table corresponding to the N initial data tables in the computing cluster, copy data in the N initial data tables to the data transfer table, and execute the data query instruction in the computing cluster.
In the embodiment of the invention, the cluster with the minimum data table reading cost in all the clusters is obtained by calculating, comparing and judging the migration cost of all the clusters and the migration cost of the data tables, other data tables which are not stored in the cluster are copied into the cluster, and the minimum query cost is realized by data query in the calculation cluster, so that the performance of the cross-cluster data transmission process is improved.
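As an illustrative, non-limiting sketch of how modules 402 to 404 could cooperate, the following Python fragment computes the cost removal value of each cluster, the total migration value and the total cluster migration values, and then selects the computing cluster. The function name, table names, cluster names and cost figures are hypothetical. The subtraction is read here as the total migration value minus each cluster's cost removal value, which is an assumption about the intended direction.

```python
from collections import defaultdict


def total_cluster_migration_values(table_costs, table_cluster):
    """table_costs: table -> migration cost; table_cluster: table -> cluster storing it."""
    # Cost removal value: sum of the costs of the tables a cluster already stores.
    removal = defaultdict(int)
    for table, cost in table_costs.items():
        removal[table_cluster[table]] += cost
    # Total migration value: cost of moving every queried table.
    total = sum(table_costs.values())
    # Total cluster migration value: cost of the tables that would still
    # have to be copied into each cluster.
    return {cluster: total - value for cluster, value in removal.items()}


# Hypothetical example: three tables spread over two clusters.
costs = {"orders": 500, "users": 200, "logs": 100}
placement = {"orders": "cluster_a", "users": "cluster_b", "logs": "cluster_b"}

values = total_cluster_migration_values(costs, placement)
computing_cluster = min(values, key=values.get)
print(values, computing_cluster)
```

In this example, cluster_a already stores the most expensive table, so choosing it as the computing cluster leaves the least data to copy.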
Referring to fig. 5, another embodiment of the cross-cluster access device in the embodiment of the present invention includes:
a receiving module 401, configured to receive a data query instruction for N initial data tables, where the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
a calculating module 402, configured to calculate the migration cost values corresponding to the N initial data tables, and, for each of the M data clusters in turn, sum the migration cost values of the initial data tables stored in that data cluster to obtain the cost removal values corresponding to the M data clusters;
a summing and subtracting module 403, configured to sum the migration cost values corresponding to the N initial data tables to obtain a total migration value, and subtract the cost removal value corresponding to each of the M data clusters from the total migration value to obtain the total cluster migration values corresponding to the M data clusters;
a comparing module 404, configured to compare the M total cluster migration values to obtain the minimum total cluster migration value, and determine the data cluster corresponding to the minimum total cluster migration value as the computing cluster;
a copying module 405, configured to create data transfer tables corresponding to the N initial data tables in the computing cluster, copy the data in the N initial data tables to the data transfer tables, and execute the data query instruction in the computing cluster.
The cross-cluster access device further includes a determining module 406, where the determining module 406 is specifically configured to:
judge whether the data query instruction conforms to a preset SQL language structure;
if not, send a message indicating that the data query cannot be performed to a preset port;
if so, parse the SQL links in the data query instruction, and access the N initial data tables according to the SQL links.
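The check performed by the determining module 406 might be sketched as follows. The text does not define the preset SQL language structure or the form of the SQL links, so this sketch assumes a single SELECT ... FROM statement and treats the names following FROM or JOIN as the table references; both patterns and the function name are hypothetical.

```python
import re

# Assumed "preset structure": one SELECT ... FROM ... statement.
SELECT_PATTERN = re.compile(
    r"^\s*SELECT\s+.+?\s+FROM\s+.+$", re.IGNORECASE | re.DOTALL
)
# Assumed table reference form: the name that follows FROM or JOIN.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def check_query(sql):
    """Return the referenced table names, or None if the structure check fails."""
    if not SELECT_PATTERN.match(sql):
        # The caller would report "data query cannot be performed"
        # to the preset port at this point.
        return None
    return TABLE_PATTERN.findall(sql)


tables = check_query(
    "SELECT * FROM a.orders JOIN b.users ON orders.uid = users.id"
)
rejected = check_query("DROP TABLE orders")
```

A production implementation would more likely rely on a real SQL parser than on regular expressions; the patterns above only illustrate the accept-or-reject branch.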
The calculating module 402 is specifically configured to:
read the number of valid rows and the average row length corresponding to each initial data table;
calculate the product of the number of valid rows and the average row length corresponding to each initial data table to obtain the migration cost value corresponding to that initial data table;
for each of the M data clusters in turn, sum the migration cost values corresponding to the initial data tables stored in that data cluster to obtain the cost removal values corresponding to the M data clusters.
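A minimal sketch of this calculation is given below. The `TableStats` structure, its field names and the sample figures are hypothetical, and the average row length is assumed to be expressed in bytes.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    cluster: str          # data cluster that stores the table
    valid_rows: int       # number of valid rows
    avg_row_length: int   # average row length (assumed to be in bytes)


def migration_cost(stats):
    # Migration cost value = number of valid rows x average row length.
    return stats.valid_rows * stats.avg_row_length


def cost_removal_values(tables):
    # Sum the migration cost values of the tables stored in each cluster.
    removal = {}
    for t in tables:
        removal[t.cluster] = removal.get(t.cluster, 0) + migration_cost(t)
    return removal


stats = [
    TableStats("orders", "cluster_a", 1000, 120),
    TableStats("users", "cluster_b", 500, 80),
    TableStats("logs", "cluster_b", 2000, 50),
]
removal = cost_removal_values(stats)
```

Under these figures the product approximates the number of bytes that copying each table would transfer.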
The copying module 405 includes:
a reading unit 4051, configured to read the L initial data tables stored in the computing cluster, and mark the L initial data tables to obtain L marked data tables, where L is a positive integer smaller than N;
a removing unit 4052, configured to remove the L marked data tables from the N initial data tables to obtain the N-L unmarked initial data tables;
a copying unit 4053, configured to create data transfer tables corresponding to the N-L unmarked initial data tables in the computing cluster, copy the data of the N-L unmarked initial data tables to the data transfer tables, and execute the data query instruction in the computing cluster.
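The marking and removing steps can be illustrated with the short sketch below, which separates the L tables already stored in the computing cluster from the N-L tables that still need a data transfer table. The function name, the table and cluster names, and the `transfer_` naming scheme are hypothetical.

```python
def split_tables(table_cluster, computing_cluster):
    """Split the N queried tables into the L marked and the N-L unmarked ones."""
    # Mark the tables that are already stored in the computing cluster.
    marked = [t for t, c in table_cluster.items() if c == computing_cluster]
    # Remove the marked tables; the rest must be copied.
    unmarked = [t for t in table_cluster if t not in marked]
    return marked, unmarked


placement = {"orders": "cluster_a", "users": "cluster_b", "logs": "cluster_b"}
marked, unmarked = split_tables(placement, "cluster_a")

# One data transfer table per unmarked table (naming scheme is an assumption).
transfer_tables = {name: f"transfer_{name}" for name in unmarked}
```

Only the unmarked tables are copied, so the data already held by the computing cluster is never transferred.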
The copying unit 4053 is specifically configured to:
read the verification character strings corresponding to the N-L unmarked initial data tables;
create N-L data transfer tables in the computing cluster, and write the N-L verification character strings into the N-L data transfer tables in one-to-one correspondence;
copy the data of the N-L unmarked initial data tables into the data transfer tables according to the matching relation of the verification character strings.
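The following sketch shows one possible reading of the matching step. The text does not say how a verification character string is produced, so each source table is simply assumed to carry a unique string under a hypothetical `token` key, and tables are modelled as in-memory dictionaries.

```python
def copy_with_verification(sources):
    """sources: list of dicts with hypothetical keys "name", "token" and "rows"."""
    # Create one empty data transfer table per source table and write the
    # source table's verification string into it.
    transfer_tables = [{"token": s["token"], "rows": []} for s in sources]
    # Copy each source table into the transfer table whose verification
    # string matches its own.
    for s in sources:
        target = next(t for t in transfer_tables if t["token"] == s["token"])
        target["rows"].extend(s["rows"])
    return transfer_tables


sources = [
    {"name": "users", "token": "a1", "rows": [(1, "x")]},
    {"name": "logs", "token": "b2", "rows": [(7,), (8,)]},
]
copied = copy_with_verification(sources)
```

Matching on the string rather than on position keeps the copy correct even if the transfer tables are created in a different order than the source tables are read.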
The cross-cluster access device further includes a deleting module 407, where the deleting module 407 is specifically configured to:
receive a query termination instruction, and delete all the data transfer tables in the computing cluster.
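The clean-up performed by the deleting module 407 could look like the sketch below. An in-memory SQLite database stands in for the computing cluster, and the data transfer tables are assumed to be recognisable by a hypothetical `transfer_` name prefix; neither choice comes from the text.

```python
import sqlite3


def drop_transfer_tables(conn, prefix="transfer_"):
    """Drop every table whose name starts with the assumed transfer prefix."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    dropped = [n for n in names if n.startswith(prefix)]
    for name in dropped:
        conn.execute(f'DROP TABLE "{name}"')
    return dropped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")           # table native to the cluster
conn.execute("CREATE TABLE transfer_orders (id INTEGER)")  # copied for the query
conn.execute("CREATE TABLE transfer_logs (id INTEGER)")    # copied for the query

dropped = drop_transfer_tables(conn)
remaining = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

Tables native to the computing cluster are left untouched, so the next query starts from freshly copied data.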
The comparing module 404 is specifically configured to:
select two total cluster migration values from the M total cluster migration values and compare them, mark the larger one as a non-target option, and repeat the comparison cyclically until the minimum total cluster migration value is obtained, where values marked as non-target options do not take part in subsequent comparisons;
access the data cluster corresponding to the minimum total cluster migration value, and mark the data cluster as the computing cluster.
In the embodiment of the invention, the migration cost of each data table and of each cluster is calculated and compared to identify the cluster with the minimum data table reading cost; the data tables not stored in that cluster are copied into it, and the data query is executed in this computing cluster. This minimizes the query cost and improves the performance of cross-cluster data transmission.
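The cyclic comparison can be sketched as an elimination loop. The function name and sample values are hypothetical, and keeping the first of two equal values is an assumption, since the text does not address ties.

```python
def select_computing_cluster(values):
    """values: cluster -> total cluster migration value."""
    candidates = list(values)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[1]
        # The larger value is marked as a non-target option and takes no
        # further part in the comparison (on a tie, the first is kept).
        non_target = a if values[a] > values[b] else b
        candidates.remove(non_target)
    return candidates[0]


chosen = select_computing_cluster(
    {"cluster_a": 300, "cluster_b": 500, "cluster_c": 120}
)
```

Each pass eliminates one candidate, so M clusters require M-1 comparisons.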
Fig. 4 and fig. 5 above describe the cross-cluster access device in the embodiment of the present invention in detail from the perspective of modular functional entities; the cross-cluster access device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 6 is a schematic structural diagram of a cross-cluster access device according to an embodiment of the present invention. The cross-cluster access device 600 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transient or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the cross-cluster access device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute the series of instruction operations in the storage medium 630 on the cross-cluster access device 600.
The cross-cluster access device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the structure shown in fig. 6 does not constitute a limitation on the cross-cluster access device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and which may also be a volatile computer readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the cross-cluster access method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A cross-cluster access method, comprising the steps of:
receiving a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
calculating migration cost values corresponding to the N initial data tables, and, for each of the M data clusters in turn, summing the migration cost values of the initial data tables stored in that data cluster to obtain cost removal values corresponding to the M data clusters;
summing the migration cost values corresponding to the N initial data tables to obtain a total migration value, and subtracting the cost removal value corresponding to each of the M data clusters from the total migration value to obtain total cluster migration values corresponding to the M data clusters;
comparing the M total cluster migration values to obtain a minimum total cluster migration value, and determining the data cluster corresponding to the minimum total cluster migration value as a computing cluster;
and creating data transfer tables corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster.
2. The cross-cluster access method according to claim 1, wherein after the receiving a data query instruction for N initial data tables, and before the calculating migration cost values corresponding to the N initial data tables and summing the migration cost values of the initial data tables stored in each of the M data clusters to obtain cost removal values corresponding to the M data clusters, the method further comprises:
judging whether the data query instruction conforms to a preset SQL language structure;
if not, sending a message indicating that the data query cannot be performed to a preset port;
and if so, parsing SQL links in the data query instruction, and accessing the N initial data tables according to the SQL links.
3. The cross-cluster access method according to claim 1, wherein the calculating migration cost values corresponding to the N initial data tables, and summing the migration cost values of the initial data tables stored in each of the M data clusters to obtain cost removal values corresponding to the M data clusters comprises:
reading the number of valid rows and the average row length corresponding to each initial data table;
calculating the product of the number of valid rows and the average row length corresponding to each initial data table to obtain the migration cost value corresponding to that initial data table;
and, for each of the M data clusters in turn, summing the migration cost values corresponding to the initial data tables stored in that data cluster to obtain the cost removal values corresponding to the M data clusters.
4. The cross-cluster access method according to claim 1, wherein the creating data transfer tables corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster comprises:
reading the L initial data tables stored in the computing cluster, and marking the L initial data tables to obtain L marked data tables, wherein L is a positive integer smaller than N;
removing the L marked data tables from the N initial data tables to obtain the N-L unmarked initial data tables;
and creating data transfer tables corresponding to the N-L unmarked initial data tables in the computing cluster, copying the data of the N-L unmarked initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster.
5. The cross-cluster access method according to claim 4, wherein the creating data transfer tables corresponding to the N-L unmarked initial data tables in the computing cluster, and copying the data of the N-L unmarked initial data tables to the data transfer tables comprises:
reading the verification character strings corresponding to the N-L unmarked initial data tables;
creating N-L data transfer tables in the computing cluster, and writing the N-L verification character strings into the N-L data transfer tables in one-to-one correspondence;
and copying the data of the N-L unmarked initial data tables into the data transfer tables according to the matching relation of the verification character strings.
6. The cross-cluster access method according to claim 1, wherein after the creating data transfer tables corresponding to the N initial data tables in the computing cluster, copying data in the N initial data tables to the data transfer tables, and executing the data query instruction in the computing cluster, the method further comprises:
receiving a query termination instruction, and deleting all the data transfer tables in the computing cluster.
7. The cross-cluster access method according to any one of claims 1 to 6, wherein the comparing the M total cluster migration values to obtain a minimum total cluster migration value, and determining the data cluster corresponding to the minimum total cluster migration value as a computing cluster comprises:
selecting two total cluster migration values from the M total cluster migration values and comparing them, marking the larger one as a non-target option, and repeating the comparison cyclically until the minimum total cluster migration value is obtained, wherein values marked as non-target options do not take part in subsequent comparisons;
and accessing the data cluster corresponding to the minimum total cluster migration value, and marking the data cluster as the computing cluster.
8. A cross-cluster access device, the cross-cluster access device comprising:
a receiving module, configured to receive a data query instruction for N initial data tables, wherein the N initial data tables are distributed and stored in M data clusters, N is not less than M, and N and M are positive integers;
a calculating module, configured to calculate migration cost values corresponding to the N initial data tables, and, for each of the M data clusters in turn, sum the migration cost values of the initial data tables stored in that data cluster to obtain cost removal values corresponding to the M data clusters;
a summing and subtracting module, configured to sum the migration cost values corresponding to the N initial data tables to obtain a total migration value, and subtract the cost removal value corresponding to each of the M data clusters from the total migration value to obtain total cluster migration values corresponding to the M data clusters;
a comparing module, configured to compare the M total cluster migration values to obtain a minimum total cluster migration value, and determine the data cluster corresponding to the minimum total cluster migration value as a computing cluster;
and a copying module, configured to create data transfer tables corresponding to the N initial data tables in the computing cluster, copy data in the N initial data tables to the data transfer tables, and execute the data query instruction in the computing cluster.
9. A cross-cluster access device, the cross-cluster access device comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invoking the instructions in the memory to cause the cross-cluster access device to perform the cross-cluster access method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the cross-cluster access method of any of claims 1-7.
CN202011483662.0A 2020-12-15 2020-12-15 Cross-cluster access method, device, equipment and storage medium Pending CN112597127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011483662.0A CN112597127A (en) 2020-12-15 2020-12-15 Cross-cluster access method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112597127A true CN112597127A (en) 2021-04-02

Family

ID=75196432


Country Status (1)

Country Link
CN (1) CN112597127A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130138902A1 (en) * 2011-11-30 2013-05-30 International Business Machines Corporation Optimizing Migration/Copy of De-Duplicated Data
CN110162517A (en) * 2019-05-30 2019-08-23 深圳前海微众银行股份有限公司 Data migration method, device, equipment and computer readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination