CN112597148A - Data table connection method and device - Google Patents

Data table connection method and device

Info

Publication number
CN112597148A
Authority
CN
China
Prior art keywords
record
key
records
new partitions
data table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011335852.8A
Other languages
Chinese (zh)
Inventor
李栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202011335852.8A
Publication of CN112597148A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data table connection method, comprising: determining a first data table and a second data table to be connected, wherein the first data table comprises a plurality of first records and the second data table comprises a plurality of second records; determining a first key of each first record and a second key of each second record; pulling the first records and second records whose first key and second key match from their original partitions into one or more new partitions; and joining the first records and the second records in the one or more new partitions.

Description

Data table connection method and device
Technical Field
The invention relates to a big data processing technology, in particular to a method and a device for connecting data tables.
Background
Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing. The Shuffle stage of a Spark computation pulls data from one partition to another, which consumes network resources, memory, and disk IO (input/output).
When a computation joins two data tables, the records of both tables must first be distributed into a plurality of partitions according to the key of each record, and the join is then performed within each partition on the records of the two tables held there. Distributing the records to the partitions involves a Shuffle operation that migrates data from one partition to another.
If the keys of the two data tables differ greatly (for example, 50% of the keys in the records of table A do not exist in table B), the join result for the unmatched records is null. The join execution logic nevertheless performs the Shuffle operation on all records of both tables, so the Shuffle operations performed on records whose keys have no match can be regarded as invalid. When there are many invalid Shuffle operations, the overall performance of the computation drops significantly.
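To make the cost described above concrete, the following minimal Python sketch estimates the share of table A records whose shuffle would be invalid. The function name `wasted_shuffle_fraction` and the sample keys are hypothetical and are not part of the patent; the numbers simply mirror the 50% example in the text.

```python
def wasted_shuffle_fraction(keys_a, keys_b):
    """Fraction of table-A records whose key never appears in table B."""
    keys_in_b = set(keys_b)
    unmatched = sum(1 for k in keys_a if k not in keys_in_b)
    return unmatched / len(keys_a) if keys_a else 0.0

# Hypothetical data: table A holds keys 1..100, table B only keys 51..100.
keys_a = list(range(1, 101))
keys_b = list(range(51, 101))
fraction = wasted_shuffle_fraction(keys_a, keys_b)
print(fraction)
```

With these sample keys, half of the records of table A would be shuffled without contributing any join result.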
Disclosure of Invention
The present disclosure provides a method for connecting data tables, so as to solve at least the above technical problems in the prior art.
One aspect of the present disclosure provides a method for connecting data tables, including:
determining a first data table and a second data table to be connected; the first data table comprises a plurality of first records, and the second data table comprises a plurality of second records;
determining a first key of each first record and a second key of each second record;
pulling the first records and second records whose first key and second key match from their original partitions into one or more new partitions;
joining the first records and the second records in the one or more new partitions.
Wherein determining the first key of each first record and the second key of each second record comprises:
generating the first key according to a specific field in the first record;
and generating the second key according to a specific field in the second record.
Wherein pulling the first records and second records whose first key and second key match from their original partitions into one or more new partitions comprises:
establishing one or more new partitions, and setting a corresponding specific condition for the one or more new partitions;
determining the first records whose first key meets the specific condition and the second records whose second key meets the specific condition, wherein a first key and a second key that meet the same specific condition match each other;
pulling the first records and the second records that meet the same specific condition to the corresponding one or more new partitions.
The first records and the second records that meet the same specific condition are pulled to the corresponding one or more new partitions based on a hash algorithm.
Wherein the method further comprises:
if the original partition to which a matched first record belongs includes other first records, pulling the other first records into one or more new partitions corresponding to that original partition;
if the original partition to which a matched second record belongs includes other second records, pulling the other second records into one or more new partitions corresponding to that original partition;
wherein the one or more new partitions holding the matched first records and second records, the one or more new partitions holding the other first records, and the one or more new partitions holding the other second records are different partitions.
Another aspect of the present disclosure provides a data table connection device, including:
the data storage module is configured to determine a first data table and a second data table to be connected; the first data table comprises a plurality of first records, and the second data table comprises a plurality of second records;
the calculation module is configured to determine a first key of each first record and a second key of each second record;
the pulling module is configured to pull the first records and second records whose first key and second key match from their original partitions into one or more new partitions;
and the connecting module is configured to join the first records and the second records in the one or more new partitions.
Wherein the calculation module is configured to generate the first key according to a specific field in the first record, and is further configured to generate the second key according to a specific field in the second record.
Wherein the device further comprises:
the resource partitioning module, configured to establish one or more new partitions and to set a corresponding specific condition for the established one or more new partitions;
the calculation module is further configured to determine the first records whose first key meets the specific condition and the second records whose second key meets the specific condition, wherein a first key and a second key that meet the same specific condition match each other;
the pulling module is further configured to pull the first records and the second records that meet the same specific condition to the corresponding one or more new partitions.
The pulling module is configured to pull the first records and the second records that meet the same specific condition to the corresponding one or more new partitions based on a hash algorithm.
The pulling module is further configured to pull other first records into one or more new partitions corresponding to the original partition when the original partition to which a matched first record belongs includes the other first records;
the pulling module is further configured to pull other second records into one or more new partitions corresponding to the original partition when the original partition to which a matched second record belongs includes the other second records;
wherein the one or more new partitions holding the matched first records and second records, the one or more new partitions holding the other first records, and the one or more new partitions holding the other second records are different partitions.
In the scheme of the present disclosure, new partitions are established, and the first records and second records that meet the specific condition of a new partition are pulled into it. No pull operation is performed on an original partition in which none of the records meets the conditions of the new partitions, so those records stay in the original partition. This reduces Shuffle operations and saves the network resources, memory, and disk IO (input/output) consumed during the Shuffle process. In addition, the join is computed only for the records pulled into the new partitions that hold matched records, which improves computation performance.
Drawings
FIG. 1 is a flow chart of a data table connection method according to one embodiment;
FIG. 2 is a flow chart of a data table connection method according to another embodiment;
FIG. 3 is a structural diagram of a data table connection device according to one embodiment;
FIG. 4 is a structural diagram of a data table connection device according to another embodiment.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present disclosure provides a method for connecting data tables, as shown in fig. 1, including:
step 101, determining a first data table and a second data table to be connected; the first data table comprises a plurality of first records, and the second data table comprises a plurality of second records;
step 102, determining a first key of each first record and a second key of each second record;
step 103, pulling the first records and second records whose first key and second key match from their original partitions into one or more new partitions;
step 104, joining the first records and the second records in the one or more new partitions.
In the embodiments of the present disclosure, for convenience of description, the two data sets to be joined are denoted the first data table and the second data table. Each record in the first data table is called a first record, and each record in the second data table is called a second record; the key of a first record is called a first key, and the key of a second record is called a second key.
In the above example of the present disclosure, the first records and second records whose keys match are pulled into one or more new partitions. For a key that cannot be matched, no pull operation (Shuffle operation) is performed on the record to which the key belongs, so invalid Shuffle operations are reduced.
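The four steps can be sketched in plain Python as follows. This is an illustrative sketch only, not the patent's implementation: tables are modeled as lists of partitions, each partition as a list of `(key, value)` pairs, "matching" is assumed to mean equal keys, and the names `selective_join` and `num_new` are hypothetical. It follows the summary above, in which records with unmatched keys are simply left in place.

```python
from collections import defaultdict

def selective_join(table_a, table_b, num_new=2):
    """Join two partitioned tables while pulling only records with matched keys."""
    # Step 102: determine the keys present in each table.
    keys_a = {k for part in table_a for k, _ in part}
    keys_b = {k for part in table_b for k, _ in part}
    matched = keys_a & keys_b  # assumption: keys match when they are equal

    # Step 103: pull matched records into new partitions chosen by hashing the key.
    new_parts = [{"a": defaultdict(list), "b": defaultdict(list)}
                 for _ in range(num_new)]
    pulled = 0
    for side, table in (("a", table_a), ("b", table_b)):
        for part in table:
            for k, v in part:
                if k in matched:
                    new_parts[hash(k) % num_new][side][k].append(v)
                    pulled += 1

    # Step 104: join the records inside each new partition.
    joined = []
    for p in new_parts:
        for k, values_a in p["a"].items():
            for va in values_a:
                for vb in p["b"].get(k, []):
                    joined.append((k, va, vb))
    return sorted(joined), pulled

# Step 101: two small hypothetical tables, two partitions each.
table_a = [[(1, "a1"), (2, "a2")], [(3, "a3"), (4, "a4")]]
table_b = [[(3, "b3")], [(4, "b4"), (9, "b9")]]
rows, pulled = selective_join(table_a, table_b)
print(rows, pulled)
```

In this sample, only four of the seven records are moved; the records with keys 1, 2 and 9 never leave their original partitions.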
The above-described scheme is explained below by way of specific examples.
Each table is a Resilient Distributed Dataset (RDD) in Spark and is mapped onto a plurality of partitions; the partitions of different tables are independent of each other.
Assume that the first data table contains 100 first records (numbered 1-100), distributed as follows:
Partition 1-1: first records 1-30;
Partition 1-2: first records 31-70;
Partition 1-3: first records 71-100.
The second data table contains 50 second records (numbered 1-50), distributed as follows:
Partition 2-1: second records 1-30;
Partition 2-2: second records 31-50.
Before the pull operation, the matching first records and second records are determined. Assume that the records whose keys match are first records 20-60 and second records 31-40. The pull operation is then performed as follows:
In this embodiment, the purpose of the pull operation is to pull first records 20-60 out of the first data table and second records 31-40 out of the second data table. If an original partition holds some of first records 20-60 together with other first records, the other first records in that partition must also be pulled; if an original partition holds only other first records, its records need not be pulled. Similarly, if an original partition holds some of second records 31-40 together with other second records, the other second records in that partition must also be pulled; if an original partition holds only other second records, its records need not be pulled. Note that each new partition corresponds to one or more original partitions.
Assume that, after the pull operation, the records of the first data table and the second data table are laid out as follows:
Partition 3-1: first records 1-19 (new Partition 3-1 corresponds to original Partition 1-1);
Partition 3-2: first records 61-70 (new Partition 3-2 corresponds to original Partition 1-2);
Partition 3-3: first records 71-100 (Partition 3-3 is not a new partition but the original Partition 1-3; its records need not be pulled and are only re-identified here);
Partition 4-1: second records 1-30 (Partition 4-1 is not a new partition but the original Partition 2-1; its records need not be pulled and are only re-identified here);
Partition 4-2: second records 41-50 (new Partition 4-2 corresponds to original Partition 2-2);
Partition 5: first records 20-60 and second records 31-40 (new Partition 5 corresponds to original Partitions 1-1, 1-2 and 2-2).
It should be noted that the number of the new partitions is only an example, and the number of the new partitions is determined by a preconfigured parameter, so according to the preconfigured parameter, the pulled first records 1-19, 61-70 in the first data table may be distributed in one new partition, or may be distributed in more new partitions, which is not limited in the present invention; similarly, the second records 41-50 pulled in the second data table may be distributed in one new partition or may be distributed in more new partitions. The matching first records 20-60 and second records 31-40 may also be distributed in one new partition or in more new partitions. However, the new partition in which the matching first records 20-60 and second records 31-40 are located is not the same partition as the new partition in which the other first records and other second records are located. The details will be described herein with reference to the following examples.
Then, when the join operation is performed: Partitions 3-1, 3-2 and 3-3 contain no second records, so, depending on the type of join, either an empty join result is output directly or the contents of these partitions (that is, each record) are output as the join result. Likewise, Partitions 4-1 and 4-2 contain no first records, so either an empty join result or the contents of these partitions are output. For the records in Partition 5, the join is computed according to the keys of the first records and second records, and the corresponding result is output.
In this way, a large number of pull operations are avoided while the correctness of the join result is preserved, which improves computation performance.
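The layout of this worked example can be reproduced with a short Python sketch. The helper `relayout` and the dictionary representation of partitions are assumptions made for illustration; record numbers stand in for the records themselves, and the matched sets (first records 20-60, second records 31-40) are taken from the example as given.

```python
def relayout(original, matched):
    """Split original partitions into matched records, co-located leftovers, and untouched partitions.

    original: dict mapping partition name -> list of record numbers
    matched:  set of record numbers whose keys have a match in the other table
    """
    join_bucket, leftovers, untouched = [], {}, {}
    for name, records in original.items():
        hit = [r for r in records if r in matched]
        rest = [r for r in records if r not in matched]
        if hit:
            join_bucket.extend(hit)       # goes to the shared join partition
            if rest:
                leftovers[name] = rest    # pulled into a separate new partition
        else:
            untouched[name] = records     # no pull operation at all
    return join_bucket, leftovers, untouched

first = {"1-1": list(range(1, 31)), "1-2": list(range(31, 71)), "1-3": list(range(71, 101))}
second = {"2-1": list(range(1, 31)), "2-2": list(range(31, 51))}

f_join, f_left, f_keep = relayout(first, set(range(20, 61)))
s_join, s_left, s_keep = relayout(second, set(range(31, 41)))

# Records that are moved at all, versus a full shuffle of both tables.
pulled = len(f_join) + len(s_join) \
    + sum(len(v) for v in f_left.values()) + sum(len(v) for v in s_left.values())
total = sum(len(v) for v in first.values()) + sum(len(v) for v in second.values())
print(pulled, total)
```

Under these assumptions, 90 of the 150 records are pulled, and the 60 records in original Partitions 1-3 and 2-1 stay where they are.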
The implementation of fig. 1 is explained in detail below:
first, the key is generated as follows:
generating a first key according to the specific field in the first record;
and generating a second key according to the specific field in the second record.
The specific field may be one field or a plurality of fields.
The key generation method of the first record and the key generation method of the second record are not limited in the embodiment of the disclosure, as long as the first key and the second key are generated in the same way.
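A minimal sketch of key generation from one or several fields is shown below. Records are modeled as Python dictionaries, and the function name `make_key` and the field names are hypothetical; the only property taken from the text is that both tables build their keys in the same way.

```python
def make_key(record, fields):
    """Build a join key from one or more named fields of a record."""
    values = tuple(record[f] for f in fields)
    # A single field yields a scalar key; several fields yield a composite (tuple) key.
    return values[0] if len(values) == 1 else values

first_record = {"user_id": 7, "region": "cn", "amount": 30}
second_record = {"user_id": 7, "region": "cn", "name": "li"}

single_key = make_key(first_record, ["user_id"])
first_key = make_key(first_record, ["user_id", "region"])
second_key = make_key(second_record, ["user_id", "region"])
print(single_key, first_key, first_key == second_key)
```

Because the same function and field list are applied to both records, equal field values always produce equal keys.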
After the key of each record has been generated, the pull operation can be performed. Step 103, pulling the first records and second records whose first key and second key match from their original partitions into one or more new partitions, comprises, as shown in fig. 2:
step 201, establishing one or more new partitions, and setting corresponding specific conditions for the established one or more new partitions;
step 202, determining a first record of which a first key meets the specific condition, determining a second record of which a second key meets the specific condition, wherein the first key and the second key which meet the same specific condition are matched with each other;
step 203, the first record and the second record meeting the same specific condition are pulled to the corresponding one or more new partitions.
Here, a hash algorithm may be employed to pull the matching first and second records into the corresponding new partition or partitions.
In one example, the specific condition relates to the key. For a numeric key, the specific condition may be configured as a value interval: for example, if the specific condition is 0-1024, the first records and second records whose keys fall within 0-1024 satisfy it. For a character-type key, a hash operation may be applied to each key to obtain a numeric value, and the specific condition is then configured either as a value interval or according to a property of the value, for example that its last digit is odd or that its last digit is even. Alternatively, the hash operation may be omitted for character-type keys, and the specific condition of the new partition may be configured according to characteristics of the characters.
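The two kinds of condition described above can be written as simple predicates. In this sketch the helper names are hypothetical, and `zlib.crc32` is used only as a convenient, deterministic stand-in, since the text does not name a particular hash function.

```python
import zlib

def in_range(low, high):
    """Condition for numeric keys: the key lies in the closed interval [low, high]."""
    return lambda key: low <= key <= high

def stable_hash(text):
    """Map a character-type key to a non-negative integer (crc32 is an assumption)."""
    return zlib.crc32(text.encode("utf-8"))

def hash_is_odd(key):
    """Condition for character-type keys: the hashed value is odd."""
    return stable_hash(key) % 2 == 1

def hash_is_even(key):
    """Complementary condition: the hashed value is even."""
    return stable_hash(key) % 2 == 0

condition = in_range(0, 1024)
print(condition(512), condition(2000), hash_is_odd("order-42"))
```

A first record and a second record are treated as matching candidates when their keys satisfy the same predicate, so both tables must be evaluated with identical conditions.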
Taking the example above, assume that 2 new partitions A and B are created and that both correspond to one specific condition. If first records 20-60 and second records 31-40 satisfy that condition, they are pulled from their original partitions into the 2 new partitions A and B according to a hash algorithm.
It should be noted that the new partitions established in step 201 are used to receive the matched first records and second records; their number is not limited and may be one or more.
In addition, the specific condition may be multiple, such as the value ranges 0-1024 and 1025-2047, then 0-1024 may correspond to one or more partitions, 1025-2047 may correspond to one or more partitions, and the partitions corresponding to 0-1024 and 1025-2047 may be the same or different.
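One possible way to express several conditions, each backed by one or more partitions, is a small lookup table. The layout below (partitions "A", "B", "C"), the name `target_partition`, and the use of a modulo as the hash step are assumptions chosen for illustration.

```python
# Hypothetical configuration: (low, high, partitions serving that interval).
CONDITIONS = [
    (0, 1024, ["A", "B"]),
    (1025, 2047, ["C"]),
]

def target_partition(key):
    """Return the new partition for a numeric key, or None if no condition is met."""
    for low, high, partitions in CONDITIONS:
        if low <= key <= high:
            # Spread the keys of one interval over its partitions with a simple modulo.
            return partitions[key % len(partitions)]
    return None  # the record does not qualify for any new partition

print(target_partition(10), target_partition(11), target_partition(1500), target_partition(5000))
```

Because the same key always maps to the same partition, a first record and a second record with equal keys end up together and can be joined locally.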
It should be noted that, in the process of pulling the matching first record and second record:
if the original partition to which the matched first record belongs comprises other first records, pulling the other first records into one or more new partitions corresponding to the original partition;
if the original partition to which the matched second record belongs comprises other second records, pulling the other second records into one or more new partitions corresponding to the original partition;
the one or more new partitions to which the matched first record and second record belong, the one or more new partitions to which other first records belong, and the one or more new partitions to which other second records belong are different partitions.
And if the matched first record or second record does not exist in the original partition, the first record or second record in the original partition does not execute the pulling operation.
Continuing the example above, for the first data table:
since first records 20-30 (satisfying the specific condition) and first records 1-19 (not satisfying the specific condition) belong to the same original Partition 1-1, a pull operation must also be performed on first records 1-19;
since first records 31-60 (satisfying the specific condition) and first records 61-70 (not satisfying the specific condition) belong to the same original Partition 1-2, a pull operation must also be performed on first records 61-70;
thus, one or more new partitions are established; assuming that 1 new partition C is established, first records 1-19 and 61-70 are pulled into new partition C according to the hash algorithm.
For the second data table:
since second records 31-40 (satisfying the specific condition) and second records 41-50 (not satisfying the specific condition) belong to the same original Partition 2-2, a pull operation must also be performed on second records 41-50;
thus, one or more new partitions are established; assuming that 2 new partitions D and E are established, second records 41-50 are pulled into new partitions D and E according to the hash algorithm.
The original partitions Partition 1-3 and Partition 2-1 (re-identified above as Partition 3-3 and Partition 4-1) contain no record that satisfies the specific condition, so no pull operation is performed on them.
At this point, the Shuffle operation process ends.
After the Shuffle operation process is finished, the join operation is performed on the first records and second records in the one or more new partitions created in step 201, and the result is output. For the unmatched first records or second records in the other new partitions (those not created in step 201, such as new partitions C, D and E above), and for the first records or second records left in the original partitions, either a null join result or the records of the partition can be output directly, because each of these partitions contains only first records or only second records.
In the above example of the present disclosure, new partitions are established, and the first records and second records that meet the specific condition of a new partition are pulled into it. No pull operation is performed on the records of an original partition that contains no record meeting the conditions of the new partitions, so those records stay in the original partition. This reduces Shuffle operations and saves the network resources, memory, and disk IO (input/output) consumed during the Shuffle process. In addition, the join is computed only for the records that meet the conditions of the new partitions, which improves computation performance.
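The per-partition output rule can be sketched as a single function. The name `join_partition` and the flag `keep_unmatched` are hypothetical; the flag models the choice described above between outputting a null result (as an inner join would) and outputting the partition's records directly (as an outer join would).

```python
def join_partition(first, second, keep_unmatched=False):
    """Produce the join output of one partition.

    first / second: lists of (key, value) pairs from the first and second table.
    """
    if first and second:
        # Partition holding matched records: join on equal keys.
        by_key = {}
        for k, v in second:
            by_key.setdefault(k, []).append(v)
        return [(k, a, b) for k, a in first for b in by_key.get(k, [])]
    if not keep_unmatched:
        return []  # one-sided partition, null join result
    # One-sided partition, records emitted as they are with the missing side empty.
    return [(k, v, None) for k, v in first] + [(k, None, v) for k, v in second]

print(join_partition([(1, "a")], [(1, "b")]))
print(join_partition([(5, "a5")], [], keep_unmatched=True))
```

No key comparison is needed for one-sided partitions, which is why they can be answered without any Shuffle.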
As shown in fig. 3, an example of the present disclosure provides a data table connection apparatus 30, including:
a data storage module 31, configured to determine a first data table and a second data table to be connected; the first data table comprises a plurality of first records, and the second data table comprises a plurality of second records;
a calculation module 32, configured to determine a first key of each of the first records and a second key of each of the second records;
the pulling module 33 is configured to pull the first record and the second record belonging to the matched first key and second key from the original partition into one or more new partitions;
a linking module 34, configured to link the first record and the second record of the one or more new partitions.
The calculation module 32 is configured to generate the first key according to a specific field in the first record, and is further configured to generate the second key according to a specific field in the second record.
As shown in fig. 4, the apparatus 30 further includes:
the resource partitioning module 35 is configured to establish one or more new partitions, and set corresponding specific conditions for the established one or more new partitions;
the calculation module 32 is further configured to determine a first record of which a first key satisfies the specific condition, determine a second record of which a second key satisfies the specific condition, and determine that the first key and the second key satisfying the same specific condition are matched with each other;
the pulling module 33 is further configured to pull the first record and the second record that satisfy the same specific condition into the corresponding one or more new partitions.
The pulling module 33 is further configured to pull the first record and the second record that satisfy the same specific condition to the corresponding one or more new partitions based on a hash algorithm.
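The pulling module 33 is further configured to, when the original partition to which a matched first record belongs includes other first records, pull the other first records to one or more new partitions corresponding to the original partition;
the pulling module 33 is further configured to, when the original partition to which a matched second record belongs includes other second records, pull the other second records to one or more new partitions corresponding to the original partition;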
and the one or more new partitions to which the matched first record and the second record belong, the one or more new partitions to which the other first records belong, and the one or more new partitions to which the other second records belong are different partitions.
Illustratively, the present disclosure also provides an electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instruction from the memory and executing the instruction to realize the data table connection method.
Illustratively, the present invention also provides a computer-readable storage medium storing a computer program for executing the above-described data table linking method.
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to the various embodiments of the present application described in the "exemplary methods" section of this specification, above.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A data table connection method, comprising:
determining a first data table and a second data table to be connected, wherein the first data table comprises a plurality of first records and the second data table comprises a plurality of second records;
determining a first key of each first record and a second key of each second record;
pulling the first record and the second record whose first key and second key match from their original partitions into one or more new partitions; and
connecting the first record and the second record in the one or more new partitions.
2. The data table connection method according to claim 1, wherein determining the first key of each first record and the second key of each second record comprises:
generating the first key according to a specific field in the first record; and
generating the second key according to a specific field in the second record.
3. The data table connection method according to claim 1 or 2, wherein pulling the first record and the second record whose first key and second key match from their original partitions into one or more new partitions comprises:
establishing one or more new partitions, and setting corresponding specific conditions for the one or more new partitions;
determining a first record whose first key meets the specific condition and a second record whose second key meets the specific condition, wherein a first key and a second key that meet the same specific condition match each other; and
pulling the first record and the second record that meet the same specific condition into the corresponding one or more new partitions.
4. The data table connection method according to claim 3, wherein the first record and the second record that meet the same specific condition are pulled into the corresponding one or more new partitions based on a hash algorithm.
5. The data table connection method according to claim 4, further comprising:
if the original partition to which the matched first record belongs comprises other first records, pulling the other first records into one or more new partitions corresponding to the original partition; and
if the original partition to which the matched second record belongs comprises other second records, pulling the other second records into one or more new partitions corresponding to the original partition;
wherein the one or more new partitions to which the matched first record and second record belong, the one or more new partitions to which the other first records belong, and the one or more new partitions to which the other second records belong are different partitions.
6. A data table connection apparatus, comprising:
a data storage module configured to determine a first data table and a second data table to be connected, wherein the first data table comprises a plurality of first records and the second data table comprises a plurality of second records;
a calculation module configured to determine a first key of each first record and a second key of each second record;
a pulling module configured to pull the first record and the second record whose first key and second key match from their original partitions into one or more new partitions; and
a connecting module configured to connect the first record and the second record in the one or more new partitions.
7. The data table connection apparatus according to claim 6, wherein
the calculation module is configured to generate the first key according to a specific field in the first record, and is further configured to generate the second key according to a specific field in the second record.
8. The data table connection apparatus according to claim 6 or 7, further comprising:
a resource partitioning module configured to establish one or more new partitions and to set corresponding specific conditions for the established one or more new partitions;
wherein the calculation module is further configured to determine a first record whose first key meets the specific condition and a second record whose second key meets the specific condition, a first key and a second key that meet the same specific condition matching each other; and
the pulling module is further configured to pull the first record and the second record that meet the same specific condition into the corresponding one or more new partitions.
9. The data table connection apparatus according to claim 8, wherein
the pulling module is configured to pull the first record and the second record that meet the same specific condition into the corresponding one or more new partitions based on a hash algorithm.
10. The data table connection apparatus according to claim 9, wherein
the pulling module is further configured to pull other first records into one or more new partitions corresponding to the original partition when the original partition to which the matched first record belongs includes the other first records;
the pulling module is further configured to pull other second records into one or more new partitions corresponding to the original partition when the original partition to which the matched second record belongs includes the other second records; and
the one or more new partitions to which the matched first record and second record belong, the one or more new partitions to which the other first records belong, and the one or more new partitions to which the other second records belong are different partitions.
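
As an illustration only, the flow recited in claims 1 to 4 might be sketched in Python as follows. This is not the implementation of the present application. The function names (make_key, target_partition, repartition, connect_tables), the use of CRC32 as the hash algorithm, the reading of the "specific condition" as "hash of the key modulo the number of new partitions equals the partition index", and the in-memory lists standing in for partitions are all assumptions made for the example.

```python
from collections import defaultdict
from zlib import crc32


def make_key(record, fields):
    """Generate a key from the specific fields of a record (cf. claim 2)."""
    return tuple(record[f] for f in fields)


def target_partition(key, num_partitions):
    """Assumed 'specific condition': CRC32(key) mod num_partitions (cf. claim 4)."""
    return crc32(repr(key).encode("utf-8")) % num_partitions


def repartition(partitions, fields, num_partitions):
    """Pull every record from its original partition into a new partition by key."""
    new_parts = defaultdict(list)
    for part in partitions:
        for record in part:
            key = make_key(record, fields)
            new_parts[target_partition(key, num_partitions)].append((key, record))
    return new_parts


def connect_tables(first_parts, second_parts, first_fields, second_fields,
                   num_partitions=4):
    """Connect two partitioned tables; equal keys always share a new partition."""
    first_new = repartition(first_parts, first_fields, num_partitions)
    second_new = repartition(second_parts, second_fields, num_partitions)
    result = []
    for pid in sorted(first_new):
        # Index the second records of this new partition by key.
        lookup = defaultdict(list)
        for key, rec in second_new.get(pid, []):
            lookup[key].append(rec)
        # Connect each first record with the second records whose key matches.
        for key, rec in first_new[pid]:
            for other in lookup.get(key, []):
                result.append({**rec, **other})
    return result


# Hypothetical data: each inner list is one original partition.
orders = [[{"order_id": 1, "user_id": 10}],
          [{"order_id": 2, "user_id": 20}, {"order_id": 3, "user_id": 10}]]
users = [[{"uid": 10, "name": "A"}, {"uid": 30, "name": "C"}],
         [{"uid": 20, "name": "B"}]]
joined = connect_tables(orders, users, ["user_id"], ["uid"])
```

Because both tables are routed with the same hash function, records with matching keys meet in the same new partition, so each new partition can be connected independently of the others. The additional handling of the other, unmatched records recited in claims 5 and 10 is not covered by this sketch.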
CN202011335852.8A 2020-11-25 2020-11-25 Data table connection method and device Pending CN112597148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011335852.8A CN112597148A (en) 2020-11-25 2020-11-25 Data table connection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011335852.8A CN112597148A (en) 2020-11-25 2020-11-25 Data table connection method and device

Publications (1)

Publication Number Publication Date
CN112597148A true CN112597148A (en) 2021-04-02

Family

ID=75183804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011335852.8A Pending CN112597148A (en) 2020-11-25 2020-11-25 Data table connection method and device

Country Status (1)

Country Link
CN (1) CN112597148A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095455A (en) * 2015-07-27 2015-11-25 中国联合网络通信集团有限公司 Data connection optimization method and data operation system
CN106874322A (en) * 2016-06-27 2017-06-20 阿里巴巴集团控股有限公司 A kind of data table correlation method and device
CN107153643A (en) * 2016-03-02 2017-09-12 阿里巴巴集团控股有限公司 Tables of data connection method and device
CN111241163A (en) * 2020-01-17 2020-06-05 平安科技(深圳)有限公司 Distributed computing task response method and device
CN111506569A (en) * 2020-03-02 2020-08-07 平安科技(深圳)有限公司 Data storage method and device and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination