CN112115138A

CN112115138A - Method, device and equipment for determining association relation between data tables

Info

Publication number: CN112115138A
Application number: CN202010839661.9A
Authority: CN
Inventors: 姚均霖
Original assignee: 4Paradigm Beijing Technology Co Ltd
Current assignee: 4Paradigm Beijing Technology Co Ltd
Priority date: 2020-08-19
Filing date: 2020-08-19
Publication date: 2020-12-22
Also published as: WO2022037624A1

Abstract

The disclosure provides a method, a device and equipment for determining an association relationship between data tables. The method includes: acquiring data table pairs involved in a target data table set, wherein each data table pair includes a first data table and a second data table; acquiring a splicing key field pair for associating the first data table and the second data table, wherein the splicing key field pair comprises a foreign key field of the first data table and a primary key field of the second data table; and calculating the association degree of the splicing key field pair, and determining the association relationship between the data tables according to the calculated association degree.

Description

Method, device and equipment for determining association relation between data tables

Technical Field

The present invention relates to the field of data processing, and more particularly, to a method of determining an association between data tables, an apparatus for determining an association between data tables, an apparatus including at least one computing device and at least one storage device, and a computer-readable storage medium.

Background

With the emergence of massive data in various industries, data needs to be processed in more and more scenarios. For example, an association relationship between data tables is determined in advance, and the data tables are then spliced according to that association relationship.

In the related art, the association relationship between two data tables and the primary and foreign key information of the two data tables are entered manually by a user. This approach depends on the user supplying the primary and foreign key information and the table association relationship by hand; when the user cannot obtain or provide them, the data tables cannot be spliced.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a new technical solution for determining an association relationship between data tables.

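According to a first aspect of the present disclosure, there is provided a method for determining an association relationship between data tables, comprising:

acquiring data table pairs involved in a target data table set; wherein each data table pair includes a first data table and a second data table;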

acquiring a splicing key field pair for associating the first data table and the second data table; wherein the splicing key field pair comprises a foreign key field of the first data table and a primary key field of the second data table;

and calculating the association degree of the splicing key field pair, and determining the association relation between the data tables according to the calculated association degree.

Optionally, the method further comprises:

and splicing the first data table and the second data table under the condition that the association degree meets a set condition.

Optionally, the method further comprises:

acquiring attribute information of the data table set;

acquiring the computational complexity of the data table set according to the attribute information and a preset complexity calculation function;

comparing the computational complexity with a complexity threshold to obtain a comparison result;

in a case where the comparison result indicates that the computational complexity is less than or equal to the complexity threshold, performing the step of acquiring the data table pairs involved in the target data table set.

Optionally, the method further comprises:

when the comparison result indicates that the computational complexity is greater than the complexity threshold, prompting that the computational complexity is greater than the complexity threshold; or,

when the comparison result indicates that the computational complexity is greater than the complexity threshold, receiving a forced execution instruction and, in response to the forced execution instruction, executing the step of acquiring the data table pairs involved in the target data table set; or,

when the comparison result indicates that the computational complexity is greater than the complexity threshold, providing a selection interface for selecting a complexity calculation function, taking the complexity calculation function selected through the selection interface as the preset complexity calculation function, and executing again the step of acquiring the computational complexity of the data table set according to the attribute information and the preset complexity calculation function.

Optionally, the attribute information at least includes a total number of data tables in the data table set, a maximum number of attribute fields of the data tables in the data table set, and a maximum number of rows of the data tables in the data table set.
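The complexity check described above can be sketched as follows. This is a minimal illustration only: the disclosure requires some preset complexity calculation function of the attribute information but does not fix its form here, so the function `pairwise_field_complexity`, the class `TableSetAttributes`, and the `force` flag standing in for the forced execution instruction are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TableSetAttributes:
    """Attribute information of a data table set (names are illustrative)."""
    total_tables: int  # total number of data tables in the set
    max_fields: int    # maximum number of attribute fields of any table
    max_rows: int      # maximum number of rows of any table


def pairwise_field_complexity(attrs: TableSetAttributes) -> int:
    """Hypothetical complexity function: ordered table pairs x field pairs x rows."""
    ordered_pairs = attrs.total_tables * (attrs.total_tables - 1)
    return ordered_pairs * attrs.max_fields ** 2 * attrs.max_rows


def should_run(
    attrs: TableSetAttributes,
    threshold: int,
    complexity_fn: Callable[[TableSetAttributes], int] = pairwise_field_complexity,
    force: bool = False,
) -> bool:
    """Return True when acquisition of data table pairs should proceed."""
    complexity = complexity_fn(attrs)
    if complexity <= threshold:
        return True
    # Above the threshold: proceed only on an explicit forced execution instruction.
    return force
```

Passing a different `complexity_fn` corresponds to the option of selecting another complexity calculation function and evaluating the complexity again.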

Optionally, the obtaining of the pair of data tables involved in the target data table set includes:

taking each data table in the data table set as the first data table in sequence;

and, for the first data table, sequentially selecting each of the other data tables in the data table set as the second data table to obtain the data table pairs.

Optionally, the obtaining a splicing key field pair for associating the first data table and the second data table includes:

acquiring a foreign key field of the first data table and a primary key field of the second data table;

obtaining the splicing key field pair according to the foreign key field of the first data table and the primary key field of the second data table.

Optionally, the obtaining the foreign key field of the first data table and the primary key field of the second data table includes:

acquiring one or more attribute fields of the first data table and of the second data table specified by a user as the foreign key field and the primary key field, respectively; or,

in the case of no designation by a user, directly taking all attribute fields of the first data table as the foreign key fields and all attribute fields of the second data table as the primary key fields.

Optionally, the obtaining the splicing key field pair according to the foreign key field of the first data table and the primary key field of the second data table includes:

selecting a foreign key field of the first data table;

selecting a primary key field with the same data field type as the foreign key field from the second data table;

and combining the foreign key field and the primary key field to obtain the splicing key field pair.

Optionally, the association degree includes a degree of overlap,

and the calculating the association degree of the splicing key field pair includes:

acquiring a unique value list of the foreign key field in the splicing key field pair;

acquiring a unique value list of the primary key field in the splicing key field pair;

and acquiring the degree of overlap of the foreign key field and the primary key field according to the unique value list of the foreign key field and the unique value list of the primary key field.

Optionally, the obtaining the unique value list of the foreign key field or the obtaining the unique value list of the primary key field includes:

filtering out data that satisfies a set condition in the foreign key field or the primary key field to obtain the unique value list of the foreign key field or the unique value list of the primary key field.

Optionally, the set condition includes a first condition and a second condition, and the filtering out data that satisfies the set condition in the foreign key field or the primary key field includes:

when data included in the foreign key field or the primary key field is a null value, filtering out the null value; and/or,

when the same data occurs more than once in the foreign key field or the primary key field, retaining one occurrence and filtering out the others.
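The two filtering conditions above can be sketched as a single pass over a field's values. This is a minimal illustration that assumes null values are represented by Python's `None` and that values are hashable; the function name `unique_value_list` is chosen for this example only.

```python
from typing import Hashable, Iterable, List


def unique_value_list(values: Iterable[Hashable]) -> List[Hashable]:
    """Drop nulls and keep only the first occurrence of each repeated value."""
    seen = set()
    unique = []
    for value in values:
        if value is None:   # first condition: filter out null values
            continue
        if value in seen:   # second condition: keep one copy of repeated data
            continue
        seen.add(value)
        unique.append(value)
    return unique
```

Keeping the first occurrence preserves the original order of the field, which makes the result deterministic; the disclosure itself does not say which occurrence is retained.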

Optionally, the acquiring the degree of overlap of the foreign key field and the primary key field according to the unique value list of the foreign key field and the unique value list of the primary key field includes:

sampling the unique value list of the foreign key field according to a preset sampling number;

acquiring the number of data included in the sampled unique value list of the foreign key field as a first data number;

acquiring the number of data in the sampled unique value list of the foreign key field that also belong to the unique value list of the primary key field as a second data number;

and acquiring the degree of overlap of the foreign key field and the primary key field according to the first data number and the second data number.

Optionally, the sampling the unique value list of the foreign key field according to a preset sampling number includes:

in the case that the preset sampling number is greater than the number of rows of the first data table, retaining all data in the unique value list of the foreign key field; or,

in the case that the preset sampling number is less than or equal to the number of rows of the first data table, extracting the sampling number of data from the unique value list of the foreign key field.
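The sampling and counting steps above can be sketched as follows. Two points are assumptions of this example rather than statements of the disclosure: the degree of overlap is taken to be the second data number divided by the first data number, and the sample is capped at the length of the unique value list so that sampling cannot fail when the list is shorter than the preset sampling number. The function name and parameters are illustrative.

```python
import random
from typing import Hashable, Optional, Sequence


def overlap_degree(
    fk_unique: Sequence[Hashable],
    pk_unique: Sequence[Hashable],
    first_table_rows: int,
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Share of (sampled) foreign key values that also occur in the primary key."""
    if rng is None:
        rng = random.Random()
    if sample_size > first_table_rows:
        # Sampling number exceeds the row count: keep every unique value.
        sampled = list(fk_unique)
    else:
        # Assumed guard: never ask for more items than the list holds.
        sampled = rng.sample(list(fk_unique), min(sample_size, len(fk_unique)))
    first_count = len(sampled)                             # first data number
    if first_count == 0:
        return 0.0
    pk_set = set(pk_unique)
    second_count = sum(1 for v in sampled if v in pk_set)  # second data number
    return second_count / first_count                      # assumed ratio
```

For instance, with foreign key values `[0, 1, 2, 3, 4]` from a five-row table, primary key values `[3, 1, 2, 7, 5, 10]`, and a sampling number of 10, all five values are kept, three of them occur in the primary key list, and the sketch returns 0.6.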

Optionally, the association degree further includes an editing similarity,

and the calculating the association degree of the splicing key field pair further includes:

acquiring the edit distance for converting the field name of the foreign key field into the field name of the primary key field, the length of the field name of the foreign key field, and the length of the field name of the primary key field;

and obtaining the editing similarity according to the edit distance, the length of the field name of the foreign key field and the length of the field name of the primary key field.
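One possible reading of the editing similarity is sketched below. The passage above names the inputs (the edit distance and the two field-name lengths) but not the formula, so the normalisation used here, one minus the Levenshtein distance divided by the longer of the two name lengths, is an assumption for illustration, as are the function names.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def editing_similarity(fk_name: str, pk_name: str) -> float:
    """Assumed normalisation: 1 - distance / max(name lengths)."""
    longest = max(len(fk_name), len(pk_name))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - edit_distance(fk_name, pk_name) / longest
```

Under this assumed formula, identical field names score 1.0 and names with no characters in common score 0.0, so the value can be sorted and thresholded in the same way as the degree of overlap.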

Optionally, the determining the association relationship between the data tables includes:

in the case that the foreign key field includes data occurring more than once and the primary key field includes data occurring more than once, determining a many-to-many association relationship between the first data table and the second data table; or,

in the case that the foreign key field includes data occurring more than once and the primary key field does not include data occurring more than once, determining a many-to-one association relationship between the first data table and the second data table; or,

in the case that the foreign key field does not include data occurring more than once and the primary key field includes data occurring more than once, determining a one-to-many association relationship between the first data table and the second data table; or,

in the case that the foreign key field does not include data occurring more than once and the primary key field does not include data occurring more than once, determining a one-to-one association relationship between the first data table and the second data table.
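The four cases above reduce to two boolean checks, as the following minimal sketch shows. The function names are illustrative, and the sketch counts values exactly as given; how null values should be treated in this count is not specified in the passage above.

```python
from collections import Counter
from typing import Hashable, Iterable


def has_repeats(values: Iterable[Hashable]) -> bool:
    """True if any value occurs more than once in the field."""
    return any(count > 1 for count in Counter(values).values())


def relation_type(fk_values: Iterable[Hashable], pk_values: Iterable[Hashable]) -> str:
    """Classify the relationship of the first table to the second table."""
    fk_repeats = has_repeats(fk_values)
    pk_repeats = has_repeats(pk_values)
    if fk_repeats and pk_repeats:
        return "many-to-many"
    if fk_repeats:
        return "many-to-one"
    if pk_repeats:
        return "one-to-many"
    return "one-to-one"
```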

Optionally, the association degree includes at least one of a degree of overlap and an editing similarity,

and the splicing the first data table and the second data table under the condition that the association degree meets a set condition includes:

for any data table pair, sorting the obtained degrees of overlap and/or editing similarities of the splicing key field pairs to obtain a sorting result;

acquiring, according to the sorting result of the degrees of overlap and/or the sorting result of the editing similarities, the splicing key field pairs corresponding to a preset number of degrees of overlap and/or editing similarities, and taking them as the target splicing key field pairs corresponding to the data table pair;

and splicing the first data table and the second data table according to the target splicing key field pairs.

Optionally, the splicing the first data table and the second data table when the association degree satisfies a set condition further includes:

acquiring a set degree-of-overlap threshold and/or a set editing similarity threshold;

for any data table pair, comparing the degree of overlap of each splicing key field pair with the degree-of-overlap threshold, and/or comparing the editing similarity of each splicing key field pair with the editing similarity threshold;

taking the splicing key field pairs whose degree of overlap is greater than the degree-of-overlap threshold and/or the splicing key field pairs whose editing similarity is greater than the editing similarity threshold as the target splicing key field pairs corresponding to the data table pair;

for any data table pair, deleting from the sorting result of the degrees of overlap the degrees of overlap that are less than or equal to the degree-of-overlap threshold; and/or,

deleting from the sorting result of the editing similarities the editing similarities that are less than or equal to the editing similarity threshold;

taking the splicing key field pairs corresponding to the degrees of overlap retained after deletion and/or the splicing key field pairs corresponding to the editing similarities retained after deletion as the target splicing key field pairs corresponding to the data table pair;

for any data table pair, acquiring the splicing key field pairs corresponding to a preset number of degrees of overlap and/or editing similarities from the sorting result from which the degrees of overlap less than or equal to the degree-of-overlap threshold have been deleted and/or from the sorting result from which the editing similarities less than or equal to the editing similarity threshold have been deleted, and taking them as the target splicing key field pairs corresponding to the data table pair.
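The selection variants above (a preset number of top-ranked pairs, a threshold, or both) can be sketched with one helper applied to a single score type. The helper name, the descending sort order, and the example scores are assumptions for illustration; the field names `fk_a`, `pk_a` and so on are placeholders, not fields from the disclosure.

```python
from typing import Dict, List, Optional, Tuple

FieldPair = Tuple[str, str]  # (foreign key field, primary key field)


def select_target_pairs(
    scores: Dict[FieldPair, float],
    threshold: Optional[float] = None,
    top_k: Optional[int] = None,
) -> List[FieldPair]:
    """Sort splicing key field pairs by score, then filter and truncate."""
    # Assumed order: highest degree of overlap (or editing similarity) first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        # Scores less than or equal to the threshold are deleted.
        ranked = [(pair, score) for pair, score in ranked if score > threshold]
    if top_k is not None:
        # Keep only the preset number of best-scoring pairs.
        ranked = ranked[:top_k]
    return [pair for pair, _ in ranked]


# Placeholder scores for one data table pair (illustrative values only).
example_scores = {("fk_a", "pk_a"): 0.9, ("fk_b", "pk_b"): 0.5, ("fk_c", "pk_c"): 0.7}
```

Calling the helper once with degrees of overlap and once with editing similarities, then combining the two results, would correspond to the "and/or" wording above; how the two result sets are combined is left open here.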

Optionally, the method further comprises:

in response to a request for obtaining the sorting result of the degrees of overlap and/or the editing similarities, acquiring a set display mode;

and displaying the sorting result of the degrees of overlap and/or the editing similarities according to the display mode.

Optionally, the association degree includes at least one of a degree of overlap and an editing similarity, and the method further includes:

providing a configuration interface for configuring the association degree;

acquiring configuration information input through the configuration interface; wherein the configuration information at least comprises the switch state of the degree of overlap and/or the editing similarity;

and displaying the degree of overlap and/or the editing similarity in the case that the corresponding switch state is on.

According to a second aspect of the present disclosure, there is also provided an apparatus for determining an association relationship between data tables, including:

the first acquisition module is used for acquiring data table pairs related in the target data table set; wherein the pair of data tables includes a first data table and a second data table;

a second obtaining module, configured to obtain a splicing key field pair used for associating the first data table and the second data table; wherein the splicing key field pair comprises a foreign key field of the first data table and a primary key field of the second data table;

and the calculation module is used for calculating the association degree of the splicing key field pair and determining the association relation between the data tables according to the calculated association degree.

Optionally, the apparatus further comprises a splicing module,

and the splicing module is used for splicing the first data table and the second data table under the condition that the association degree meets a set condition.

Optionally, the first obtaining module is specifically configured to:

acquiring attribute information of the data table set;

Optionally, the first obtaining module is specifically configured to:

Optionally, the second obtaining module is specifically configured to:

selecting a foreign key field of the first data table;

Optionally, the association degree includes a degree of overlap, and the calculation module is specifically configured to:

Optionally, the calculation module is specifically configured to:

Optionally, the setting condition includes a first condition and a second condition, and the calculating module is specifically configured to:

Optionally, the calculation module is specifically configured to:

The calculation module is specifically configured to:

Optionally, the calculation module is specifically configured to:

Optionally, the association degree further includes an editing similarity, and the calculation module is specifically configured to:

Optionally, the calculation module is specifically configured to:

determining whether the foreign key field includes data occurring more than once; and

determining whether the primary key field includes data occurring more than once;

determining a many-to-one association relationship between the first data table and the second data table in the case that the foreign key field includes data occurring more than once and the primary key field does not include data occurring more than once; or,

Optionally, the association degree includes at least one of a degree of overlap and an editing similarity, and the splicing module is specifically configured to:

sorting the obtained degrees of overlap and/or editing similarities of the splicing key field pairs to obtain a sorting result;

acquiring, according to the sorting result of the degrees of overlap and/or the sorting result of the editing similarities, the splicing key field pairs corresponding to a preset number of degrees of overlap and/or editing similarities as target splicing key field pairs;

and splicing the first data table and the second data table involved according to the target splicing key field pairs.

Optionally, the splicing module is specifically configured to:

comparing the degree of overlap of each splicing key field pair with the degree-of-overlap threshold, and/or comparing the editing similarity of each splicing key field pair with the editing similarity threshold;

taking the splicing key field pairs whose degree of overlap is greater than the degree-of-overlap threshold and/or the splicing key field pairs whose editing similarity is greater than the editing similarity threshold as target splicing key field pairs;

Optionally, the splicing module is specifically configured to:

deleting from the sorting result of the degrees of overlap the degrees of overlap that are less than or equal to the degree-of-overlap threshold; and/or,

taking the splicing key field pairs corresponding to the degrees of overlap retained after deletion and/or the splicing key field pairs corresponding to the editing similarities retained after deletion as target splicing key field pairs;

Optionally, the splicing module is specifically configured to:

acquiring the splicing key field pairs corresponding to a preset number of degrees of overlap and/or editing similarities from the sorting result from which the degrees of overlap less than or equal to the degree-of-overlap threshold have been deleted and/or from the sorting result from which the editing similarities less than or equal to the editing similarity threshold have been deleted, and taking them as target splicing key field pairs;

Optionally, the apparatus further comprises a first display module, and the first display module is configured to:

Optionally, the apparatus further comprises a second display module, configured to:

providing a configuration interface for configuring the association degree;

According to a third aspect of the present disclosure, there is also provided an apparatus comprising at least one computing device and at least one storage device, wherein the at least one storage device is configured to store instructions for controlling the at least one computing device to perform the method according to the above first aspect.

According to a fourth aspect of the present disclosure, there is also provided a computer readable storage medium, wherein a computer program is stored thereon, which when executed by a processor, implements the method as described above in the first aspect.

The beneficial effect of the present disclosure is that, according to the technical solutions of the embodiments of the present disclosure, the data table pairs involved in the target data table set and the splicing key field pairs for associating the two data tables in each pair can be obtained, the association degree of each splicing key field pair can be calculated, and the association relationship between the data tables can then be identified automatically. That is, the embodiments of the present disclosure do not rely on manually screening and analyzing splicing keys and can automatically identify the association relationships between data tables, which reduces a large amount of manual work and makes data table association more efficient and accurate.

Drawings

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

FIG. 1 is a block diagram showing an example of a hardware configuration of an electronic device that can be used to implement an embodiment of the present disclosure;

FIG. 2 is a flowchart illustrating a method for determining associations between data tables according to an embodiment of the present disclosure;

FIG. 3 is a functional block diagram of an apparatus for determining associations between data tables according to an embodiment of the present disclosure;

FIG. 4 is a schematic block diagram of an apparatus for determining an association relationship between data tables according to another embodiment of the present disclosure.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Various embodiments and examples according to embodiments of the present invention are described below with reference to the accompanying drawings.

< hardware configuration >

The method of the embodiments of the present disclosure may be implemented by at least one electronic device, i.e., the apparatus 3000 for implementing the method may be disposed on the at least one electronic device. Fig. 1 shows a hardware structure of an arbitrary electronic device. The electronic device shown in fig. 1 may be a portable computer, a desktop computer, a workstation, a server, or the like, or may be any other device having a computing device such as a processor and a storage device such as a memory, and is not limited herein.

As shown in FIG. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 is adapted to execute computer programs. The computer program may be written in an instruction set of an architecture such as x86, Arm, RISC, MIPS, SSE, etc. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of wired or wireless communication, for example, and may specifically include Wi-Fi communication, Bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 1500 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 1600 may include, for example, a touch screen, a keyboard, a somatosensory input, and the like. The electronic device 1000 may output voice information through the speaker 1700 and may collect voice information through the microphone 1800.

The electronic device shown in fig. 1 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present disclosure, the memory 1200 of the electronic device 1000 is used for storing instructions for controlling the processor 1100 to operate so as to execute the method for determining the association relationship between the data tables according to the embodiment of the present disclosure. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.

In one embodiment, an apparatus is provided that includes at least one computing device and at least one storage device to store instructions to control the at least one computing device to perform a method according to any embodiment of the present disclosure.

The apparatus may include at least one electronic device 1000 as shown in fig. 1 to provide at least one computing device, such as a processor, and at least one storage device, such as a memory, without limitation.

< method examples >

In the present embodiment, a method for determining an association relationship between data tables is provided, and the method for determining an association relationship between data tables may be implemented by an electronic device, which may be the electronic device 1000 shown in fig. 1.

According to fig. 2, the method for determining the association relationship between the data tables of the present embodiment may include the following steps S2100 to S2300:

In step S2100, the data table pairs involved in the target data table set are obtained.

The target data table set may be a set of data tables selected by a user for processing. The target data table set may be denoted as E = {t1, t2, …, tn}, where n represents the number of data tables in the data table set E and tn represents the nth data table.

Each data table pair includes a first data table and a second data table. For example, in any data table pair <ti, tj> of the data table set E = {t1, t2, …, tn}, ti is the first data table, which may also be referred to as the left data table, and tj is the second data table, which may also be referred to as the right data table.

In this embodiment, the acquiring of the data table pair related to the target data table set in step S2100 may further include the following steps S2110 to S2120:

Step S2110: each data table in the data table set is taken in turn as the first data table.

In step S2110, take the data table set E = {t1, t2, t3} as an example, that is, the data table set E includes data table t1, data table t2 and data table t3. Data table t1, data table t2 and data table t3 may each be taken in turn as the first data table.

Step S2120: for the first data table, each of the other data tables in the data table set is selected in turn as the second data table to obtain the data table pairs.

Continuing with the example of step S2110, data table t1 is first taken as the first data table, and the other data tables t2 and t3 are selected in turn from the data table set E as the second data table, which yields the data table pairs <t1, t2> and <t1, t3> with t1 as the first data table. In the same manner, the data table pairs <t2, t1> and <t2, t3> with t2 as the first data table and the data table pairs <t3, t1> and <t3, t2> with t3 as the first data table can be obtained. Acquiring the pairs in this sequential manner helps ensure that the data table pairs are obtained efficiently and accurately.

It will be appreciated that the user may also select only one master data table, in which case only ordered data table pairs of the form <t, ti> are considered, where t is the master data table selected by the user and ti is any other data table in the data table set.
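The pair enumeration of steps S2110 and S2120, together with the master-table variant, can be sketched as follows. The sketch identifies tables by name only, and the function names are illustrative.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def ordered_table_pairs(tables: Sequence[str]) -> List[Tuple[str, str]]:
    """All ordered (first, second) pairs of distinct tables in the set."""
    return list(permutations(tables, 2))


def master_table_pairs(master: str, tables: Sequence[str]) -> List[Tuple[str, str]]:
    """Pairs whose first data table is the user-selected master table."""
    return [(master, table) for table in tables if table != master]


pairs = ordered_table_pairs(["t1", "t2", "t3"])
# pairs == [("t1", "t2"), ("t1", "t3"), ("t2", "t1"),
#           ("t2", "t3"), ("t3", "t1"), ("t3", "t2")]
```

A set of n tables yields n × (n − 1) ordered pairs, whereas the master-table variant yields only n − 1, which is one reason a user might restrict the search to a single master table.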

In step S2200, a splice key field pair for associating the first data table and the second data table is obtained.

The splicing key field pair includes a foreign key field of the first data table and a primary key field of the second data table.

In this embodiment, the acquiring of the splicing key field pair for associating the first data table and the second data table in step S2200 may further include the following steps S2210 to S2220:

In step S2210, the foreign key field of the first data table and the primary key field of the second data table are obtained.

In one example, the foreign key field of the first data table and the primary key field of the second data table may be specified by a user. For example, the user may specify one or more foreign key fields of the first data table and one or more primary key fields of the second data table, and the electronic device 1000 may take the attribute fields so specified as the foreign key fields and the primary key fields, so that the selected primary key and foreign key fields better match the user's needs.

In one example, when the user does not specify any fields, all attribute fields of the first data table may be directly used as the foreign key fields and all attribute fields of the second data table may be directly used as the primary key fields.

Take the data table set E = {t1, t2} as an example, where the contents of data table t1 and data table t2 are shown in Tables 1 and 2 below. Using steps S2110 and S2120 above, the data table pair <t1, t2> with t1 as the first data table and the data table pair <t2, t1> with t2 as the first data table can be obtained. In step S2210, for the data table pair <t1, t2>, the foreign key fields of the first data table t1 may be index, name, level and flag, and the primary key fields of the second data table t2 may be idx, alias and rank; for the data table pair <t2, t1>, the foreign key fields of the first data table t2 may be idx, alias and rank, and the primary key fields of the second data table t1 may be index, name, level and flag.

Table 1: data table t1

index	name	level	flag
0	Mike	None	1
1	Peter	a	1
2	Mary	b	1
3	Steve	None	1
4	John Doe	a	1

Table 2: data table t2

idx	alias	rank
3	Mike	c
1	Peter	a
2	Mary	b
7	Steve	None
5	John Doe	a
10	Steve	d

Step S2220, a splice key field pair is obtained according to the foreign key field of the first data table and the primary key field of the second data table.

In step S2220, obtaining a concatenation key field pair according to the foreign key field of the first data table and the primary key field of the second data table may further include: selecting a foreign key field of a first data table; selecting a primary key field with the same data field type as the foreign key field from a second data table; and combining the foreign key field and the primary key field to obtain a splicing key field pair.

It is understood that the field index and the field flag in the data table t1 are integer type, the field name and the field level are character type, the field idx in the data table t2 is integer type, and the field alias and the field rank are character type.

Continuing with the example of step S2210, for the data table pair <t1, t2>, the foreign key field index of the first data table t1 is selected. Because the foreign key field index is of the integer type, the integer-type primary key field idx of the second data table t2 is selected, and the foreign key field index and the primary key field idx are combined to obtain the splicing key field pair <index, idx>. The splicing key field pairs <name, alias>, <name, rank>, <level, alias>, <level, rank> and <flag, idx> can be obtained in the same manner.

Similarly, for the data table pair <t2, t1>, the foreign key field idx of the first data table t2 is selected. Because the foreign key field idx is of the integer type, the integer-type primary key fields index and flag of the second data table t1 are selected, and the foreign key field idx is combined with the primary key fields index and flag respectively to obtain the splicing key field pairs <idx, index> and <idx, flag>. The splicing key field pairs <alias, name>, <alias, level>, <rank, name> and <rank, level> can be obtained in the same manner.
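The type-matching rule of step S2220 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name splice_key_pairs and the type labels "int" and "str" are assumptions introduced here, and the field types are those stated for tables t1 and t2 above.

```python
from itertools import product

# Field types of the example tables, as described in the text.
T1_TYPES = {"index": "int", "name": "str", "level": "str", "flag": "int"}
T2_TYPES = {"idx": "int", "alias": "str", "rank": "str"}


def splice_key_pairs(foreign_types, primary_types):
    """Pair each foreign key field with every primary key field of the same type."""
    return [
        (fk, pk)
        for (fk, fk_type), (pk, pk_type) in product(
            foreign_types.items(), primary_types.items()
        )
        if fk_type == pk_type
    ]


pairs_t1_t2 = splice_key_pairs(T1_TYPES, T2_TYPES)  # candidates for <t1, t2>
pairs_t2_t1 = splice_key_pairs(T2_TYPES, T1_TYPES)  # candidates for <t2, t1>
print(pairs_t1_t2)
print(pairs_t2_t1)
```

Each direction yields six candidate pairs, matching the pairs listed in the example.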

Step S2300, calculating the association degree of the splicing key field pair, and determining the association relation between the data tables according to the calculated association degree.

In one example, the association degree includes a coincidence degree, and the calculating of the association degree of the splicing key field pair in step S2300 may further include the following steps S2310a to S2330a:

Step S2310a, a unique value list of the foreign key field in the splicing key field pair is obtained.

In step S2320a, a unique value list of the primary key field in the concatenation key field pair is obtained.

The above step S2310a and the present step S2320a may further include: filtering out data that meets a set condition in the foreign key field or the primary key field, to obtain the unique value list of the foreign key field or the unique value list of the primary key field.

The set condition may include a first condition and a second condition. Obtaining the unique value list from the filtered data not only increases processing speed and reduces data redundancy, but also improves the accuracy of the resulting unique value list.

The above first condition may include: the data included in the foreign key field or the primary key field is null. Filtering out the data meeting the set condition in the foreign key field or the primary key field comprises: when the data included in the foreign key field or the primary key field is null, the null is filtered out.

The above second condition may include: the same data occurs more than once in the foreign key field or the primary key field. Filtering out the data meeting the set condition in the foreign key field or the primary key field comprises: when the same data occurs more than once in the foreign key field or the primary key field, one occurrence is retained and the other occurrences are filtered out.

In this embodiment, data is filtered out as long as it satisfies either the first condition or the second condition.

Illustratively, for the foreign key field level in the splicing key field pair <level, rank>, the null value None is removed; meanwhile, since the data a occurs more than once, one occurrence of a is retained and the others are filtered out, giving the unique value list [a, b] of the foreign key field level. For the primary key field rank in the splicing key field pair <level, rank>, the null value None is likewise removed and the repeated data a is reduced to a single occurrence, giving the unique value list [a, b, c, d] of the primary key field rank.
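The two filtering conditions can be sketched in a few lines. This is a minimal sketch under two assumptions not stated in the source: null values are represented by Python's None, and the first occurrence of a repeated value is the one retained. The function name unique_values is hypothetical.

```python
def unique_values(column):
    """Drop nulls (first condition) and repeats (second condition), keeping first-seen order."""
    seen = set()
    result = []
    for value in column:
        if value is None or value in seen:
            continue
        seen.add(value)
        result.append(value)
    return result


level = [None, "a", "b", None, "a"]     # field level of table t1
rank = ["c", "a", "b", None, "a", "d"]  # field rank of table t2

print(unique_values(level))  # ['a', 'b']
print(unique_values(rank))   # ['c', 'a', 'b', 'd']
```

The result for rank contains the same elements as the list [a, b, c, d] given in the example; only the order differs, which does not affect the later membership count.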

In step S2330a, the coincidence degree of the foreign key field and the primary key field is obtained according to the unique value list of the foreign key field and the unique value list of the primary key field.

In step S2330a, obtaining the coincidence degree of the foreign key field and the primary key field according to the unique value list of the foreign key field and the unique value list of the primary key field may further include the following steps S2331 to S2334:

Step S2331, the unique value list of the foreign key field is sampled according to a preset number of samples.

The preset number of samples may be a value set according to a specific application scenario and a specific application requirement. For example, the preset number of samples may be 100, or may be other values, and this embodiment is not limited herein.

For example, when the preset number of samples is greater than the number of rows of the first data table, all data in the unique value list of the foreign key field may be retained.

Illustratively, for the splicing key field pair <level, rank>, the first data table t1 has 5 rows, which is less than the number of samples 100, so all data in the unique value list [a, b] of the foreign key field level may be retained.

For example, when the preset number of samples is less than or equal to the number of rows of the first data table, data amounting to the number of samples may be extracted from the unique value list of the foreign key field.

In step S2332, the number of data included in the unique value list of the foreign key field after sampling is obtained as the first data number.

Continuing with the example of step S2331, for the splicing key field pair <level, rank>, the unique value list of the foreign key field level after sampling is [a, b], which contains 2 data. To distinguish this count from the count obtained in the subsequent step S2333, it is referred to as the first data number.

In step S2333, the number of data in the unique value list of the foreign key field that belongs to the unique value list of the primary key field after sampling is obtained as the second data number.

Continuing with the example of step S2332, for the splicing key field pair <level, rank>, the unique value list of the primary key field rank is [a, b, c, d] and the unique value list of the sampled foreign key field level is [a, b]. Both data a and b are contained in the unique value list of the primary key field rank, so the second data number is 2.

Step S2334, obtaining the coincidence degree of the foreign key field and the primary key field according to the first data number and the second data number.

In step S2334, the calculation formula of the coincidence degree C is as follows:

C = L₂ / L₁ (1)

wherein L₁ denotes the first data number and L₂ denotes the second data number.

Continuing with the example of step S2333 above, the first data number L₁ = 2 and the second data number L₂ = 2, so for the splicing key field pair <level, rank> the coincidence degree of the foreign key field level and the primary key field rank is C = 2/2 = 1. The coincidence degrees of the other splicing key field pairs can be obtained in the same way. For the data table pair <t1, t2>: C = 0.6 for <index, idx>, C = 1 for <name, alias>, C = 0 for <name, rank>, C = 0 for <level, alias>, and C = 1 for <flag, idx>. For the data table pair <t2, t1>: C = 0.5 for <idx, index>, C = 0.166667 for <idx, flag>, C = 1 for <alias, name>, C = 0 for <alias, level>, C = 0 for <rank, name>, and C = 0.5 for <rank, level>.
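Steps S2310a to S2334 can be put together in a short sketch that reproduces the values of the example. It is only an illustration under stated assumptions: nulls are represented by None, sampling uses Python's random module with a fixed seed, an empty foreign key list is given degree 0, and the names unique_values and coincidence_degree are hypothetical.

```python
import random


def unique_values(column):
    """Non-null values of a column with repeats removed, in first-seen order."""
    seen, result = set(), []
    for value in column:
        if value is not None and value not in seen:
            seen.add(value)
            result.append(value)
    return result


def coincidence_degree(foreign_column, primary_column, n_samples=100, seed=0):
    """Return C = L2 / L1 for one splicing key field pair."""
    foreign_unique = unique_values(foreign_column)
    primary_unique = set(unique_values(primary_column))
    n_rows = len(foreign_column)  # row count of the first data table
    # Sample only when the preset sample count does not exceed the row count;
    # the extra length check avoids asking for more values than exist.
    if n_samples <= n_rows and n_samples < len(foreign_unique):
        foreign_unique = random.Random(seed).sample(foreign_unique, n_samples)
    first_count = len(foreign_unique)  # L1
    second_count = sum(1 for v in foreign_unique if v in primary_unique)  # L2
    # Assumption: a field with no usable values gets degree 0.
    return second_count / first_count if first_count else 0.0


t1 = {
    "index": [0, 1, 2, 3, 4],
    "name": ["Mike", "Peter", "Mary", "Steve", "John Doe"],
    "level": [None, "a", "b", None, "a"],
    "flag": [1, 1, 1, 1, 1],
}
t2 = {
    "idx": [3, 1, 2, 7, 5, 10],
    "alias": ["Mike", "Peter", "Mary", "Steve", "John Doe", "Steve"],
    "rank": ["c", "a", "b", None, "a", "d"],
}

print(coincidence_degree(t1["level"], t2["rank"]))  # 1.0
print(coincidence_degree(t1["index"], t2["idx"]))   # 0.6
print(coincidence_degree(t2["rank"], t1["level"]))  # 0.5
```

Note that the measure is directional: <index, idx> gives 3/5 while <idx, index> gives 3/6, because the denominator is always the number of unique foreign key values.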

In one example, the association degree includes an editing similarity, and the calculating of the association degree of the splicing key field pair in step S2300 may further include the following steps S2310b to S2320b:

Step S2310b, obtaining the edit distance for converting the field name of the foreign key field into the field name of the primary key field, the length of the field name of the foreign key field, and the length of the field name of the primary key field.

The edit distance is the minimum number of operations required to convert one character string into another. For example, converting the character string abc into the character string acb requires at least 2 operations: the first changes b in abc to c, and the second changes c in abc to b.

For example, for the splicing key field pair <level, rank>, the field name of the foreign key field is level and the field name of the primary key field is rank. Since the two names share no characters, every character must be changed or deleted, and the edit distance d is 5.

Step S2320b, according to the editing distance, the length of the field name of the foreign key field and the length of the field name of the primary key field, the editing similarity is obtained.

In step S2320b, the editing similarity sim may be calculated by using the following formula:

sim = 1 - d / max(s, t) (2)

where d denotes the edit distance, s denotes the length of the field name of the foreign key field in the splicing key field pair, t denotes the length of the field name of the primary key field in the splicing key field pair, and max(s, t) denotes the larger of s and t.

Continuing with the example of step S2310b above, for the splicing key field pair <level, rank>, the edit distance d is 5, the length s of the field name of the foreign key field level is 5, the length t of the field name of the primary key field rank is 4, and max(s, t) is 5, so the editing similarity of the splicing key field pair <level, rank> is sim = 1 - 5/5 = 0.

It will be appreciated that other similarity calculation methods, such as Jaccard similarity, may be used.
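The edit distance and editing similarity can be illustrated with a small sketch. The edit distance below is the standard Levenshtein distance (insertions, deletions and substitutions). The normalisation sim = 1 - d / max(s, t) is an assumption that is consistent with the symbols d, s, t and max(s, t) defined above; the function names are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(foreign_name, primary_name):
    """Assumed normalisation: sim = 1 - d / max(s, t)."""
    longest = max(len(foreign_name), len(primary_name))
    if longest == 0:
        return 1.0  # assumption: two empty names count as identical
    return 1 - edit_distance(foreign_name, primary_name) / longest


print(edit_distance("abc", "acb"))        # 2
print(edit_distance("level", "rank"))     # 5
print(edit_similarity("level", "rank"))   # 0.0
```

Under this normalisation the similarity lies between 0 (no characters can be reused) and 1 (identical field names).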

In one example, the association degree may further include whether data having a number of occurrences greater than one is included in the foreign key field and the primary key field.

For example, it may be counted whether data whose number of occurrences is more than one is included in the foreign key field and whether data whose number of occurrences is more than one is included in the primary key field.

Specifically, when the foreign key field includes data occurring more than once and the primary key field includes data occurring more than once, a many-to-many association relation between the first data table and the second data table is obtained; or,

when the foreign key field includes data occurring more than once and the primary key field does not include data occurring more than once, a many-to-one association relation between the first data table and the second data table is obtained; or,

when the foreign key field does not include data occurring more than once and the primary key field includes data occurring more than once, a one-to-many association relation between the first data table and the second data table is obtained; or,

when the foreign key field does not include data occurring more than once and the primary key field does not include data occurring more than once, a one-to-one association relation between the first data table and the second data table is obtained.
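The four cases above amount to a two-flag lookup, sketched below. The labels N:N, N:1, 1:N and 1:1 follow the notation used for the relationship column of the result tables; ignoring null values when counting repeats is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def has_repeats(column):
    """True if any non-null value occurs more than once in the column."""
    counts = Counter(value for value in column if value is not None)
    return any(count > 1 for count in counts.values())


def relationship(foreign_column, primary_column):
    """Map the repeat flags of both key fields to a table relationship label."""
    foreign_repeats = has_repeats(foreign_column)
    primary_repeats = has_repeats(primary_column)
    if foreign_repeats and primary_repeats:
        return "N:N"  # many-to-many
    if foreign_repeats:
        return "N:1"  # many-to-one
    if primary_repeats:
        return "1:N"  # one-to-many
    return "1:1"      # one-to-one


index = [0, 1, 2, 3, 4]
flag = [1, 1, 1, 1, 1]
idx = [3, 1, 2, 7, 5, 10]
name = ["Mike", "Peter", "Mary", "Steve", "John Doe"]
alias = ["Mike", "Peter", "Mary", "Steve", "John Doe", "Steve"]

print(relationship(index, idx))   # 1:1
print(relationship(flag, idx))    # N:1
print(relationship(name, alias))  # 1:N
```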

According to the method disclosed by the embodiment of the invention, the data table pairs involved in the target data table set can be obtained, the splicing key field pairs used for associating the two data tables in each data table pair can be obtained, and the association degree of the splicing key field pairs can be calculated, so that the association relation between the data tables is identified automatically.

In one embodiment, before the step of acquiring the pair of data tables involved in the target data table set in step S2100, the following steps S2010 to S2040 are further included:

step S2010, acquiring attribute information of the data table set.

The attribute information of the data table set at least comprises the total number of the data tables in the data table set, the maximum number of the attribute fields of the data tables in the data table set, and the maximum row number of the data tables in the data table set.

Step S2020, the calculation complexity of the data table set is obtained according to the attribute information and a preset complexity calculation function.

The calculation complexity in step S2020 can be calculated using the following formula:

ceil(log₁₀(n_tables² × max_n_col² × max_n_row)) (3)

wherein n_tables denotes the total number of data tables in the data table set, max_n_col denotes the maximum number of attribute fields of the data tables in the data table set, max_n_row denotes the maximum number of rows of the data tables in the data table set, log₁₀ denotes the base-10 logarithm, and ceil denotes the rounding-up function.
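Formula (3) translates directly into code. The sketch below evaluates it for the example set E = {t1, t2}, which has 2 tables, at most 4 attribute fields (table t1) and at most 6 rows (table t2); the function name computational_complexity is hypothetical.

```python
import math


def computational_complexity(n_tables, max_n_col, max_n_row):
    """ceil(log10(n_tables^2 * max_n_col^2 * max_n_row)), as in formula (3)."""
    return math.ceil(math.log10(n_tables ** 2 * max_n_col ** 2 * max_n_row))


# 2^2 * 4^2 * 6 = 384, log10(384) is about 2.58, rounded up to 3.
print(computational_complexity(2, 4, 6))  # 3
```

The squared terms reflect that both the number of table pairs and the number of field pairs per table pair grow quadratically, while the rows enter linearly; the logarithm turns the result into an order of magnitude.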

Step S2030, comparing the calculation complexity with the complexity threshold to obtain a comparison result.

The complexity threshold may be a numerical value set according to a specific application scenario and a specific application requirement, and this embodiment is not limited herein.

Step S2040, when the comparison result indicates that the calculation complexity is less than or equal to the complexity threshold, performing the step of acquiring the data table pairs involved in the target data table set.

According to the embodiment, before the data table pairs in the target data table set are obtained, the calculation complexity of the target data table set is estimated, then the calculation complexity is compared with the preset complexity threshold, and if the estimated calculation complexity is smaller than or equal to the threshold, the subsequent steps are continued, so that resource waste can be avoided.

In this embodiment, when the comparison result indicates that the calculation complexity is greater than the complexity threshold, a prompt is issued that the calculation complexity is greater than the complexity threshold; or,

when the comparison result indicates that the calculation complexity is greater than the complexity threshold, a forced execution instruction is received and, in response to the forced execution instruction, the step of acquiring the data table pairs involved in the target data table set is executed; or,

when the comparison result indicates that the calculation complexity is greater than the complexity threshold, a selection interface for selecting a complexity calculation function is provided, the complexity calculation function selected through the selection interface is taken as the preset complexity calculation function, and the step of obtaining the calculation complexity of the target data table set according to the attribute information and the preset complexity calculation function is executed again.

As can be seen from the above, in the case that the computation complexity is greater than the complexity threshold, the user will be prompted that the current computation complexity is higher, and relevant suggestions for reducing the computation complexity are provided. Meanwhile, the user may also choose to enforce, or may use other formulas or models that estimate the computational complexity.
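The gating logic of steps S2010 to S2040, together with the forced execution and alternative function options, can be sketched as follows. This is a simplified, non-interactive illustration: the function names, the force flag and the complexity_fn parameter are assumptions standing in for the prompt, the forced execution instruction and the selection interface.

```python
import math


def default_complexity(n_tables, max_n_col, max_n_row):
    """Formula (3): ceil(log10(n_tables^2 * max_n_col^2 * max_n_row))."""
    return math.ceil(math.log10(n_tables ** 2 * max_n_col ** 2 * max_n_row))


def check_complexity(attributes, threshold, force=False,
                     complexity_fn=default_complexity):
    """Return (proceed, message) for the gate that precedes step S2100."""
    complexity = complexity_fn(**attributes)
    if complexity <= threshold:
        return True, f"complexity {complexity} is within threshold {threshold}"
    if force:
        # Stands in for the forced execution instruction.
        return True, f"complexity {complexity} exceeds threshold {threshold}; forced"
    # Stands in for the prompt shown to the user.
    return False, f"complexity {complexity} exceeds threshold {threshold}"


attributes = {"n_tables": 2, "max_n_col": 4, "max_n_row": 6}
print(check_complexity(attributes, threshold=3))
print(check_complexity(attributes, threshold=2))
print(check_complexity(attributes, threshold=2, force=True))
```

Passing a different complexity_fn corresponds to choosing another estimation formula or model through the selection interface and re-running the estimate.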

In one embodiment, since the association degree may include a coincidence degree and an editing similarity, after the association degree of the splicing key field pair is calculated according to step S2300 above, the method further includes step S2400: splicing the first data table and the second data table when the association degree satisfies a set condition. Table 3 shows the relevant information of the target data table set E = {t1, t2} obtained according to the above embodiment:

Table 3: relevant information of the target data table set E = {t1, t2}

In Table 3, left_table denotes the left data table, i.e. the first data table; right_table denotes the right data table, i.e. the second data table; left_key denotes the left key, i.e. the foreign key of the first data table; right_key denotes the right key, i.e. the primary key of the second data table; left_est_join_rto denotes the coincidence degree; and relationship denotes the table association relation, where 1:N means one-to-many, N:N means many-to-many, N:1 means many-to-one, and 1:1 means one-to-one. left_is_primary indicates whether the left key, i.e. the foreign key of the first data table, contains only unique values, and right_is_primary indicates whether the right key, i.e. the primary key of the second data table, contains only unique values; in both cases 1 denotes unique values and 0 denotes non-unique values. The table association relation can be determined from the values of left_is_primary and right_is_primary. For example, in the first row of Table 3, left_is_primary for the foreign key index of the first data table t1 is 1 and right_is_primary for the primary key idx of the second data table t2 is 1, so the corresponding relationship type is one-to-one.

In one example, when the association degree satisfies the set condition in step S2400, the splicing the first data table and the second data table may include the following steps S2410a to S2430 a:

Step S2410a, for any data table pair, sorting the obtained coincidence degrees and/or editing similarities of the splicing key field pairs to obtain a sorting result.

In this step S2410a, for example, for the data table pair <t1, t2> and the data table pair <t2, t1>, the obtained coincidence degrees (left_est_join_rto) of the splicing key field pairs are sorted in descending order, and the sorting results are shown in Table 4:

Table 4: sorted relevant information of the target data table set E = {t1, t2}

Step S2420a, according to the sorting result of the coincidence degrees and/or the sorting result of the editing similarities, taking the splicing key field pairs corresponding to the first predetermined number of coincidence degrees and/or editing similarities as the target splicing key field pairs of the data table pair.

In this step S2420a, for the data table pair <t1, t2> and the data table pair <t2, t1>, the splicing key field pairs with the top 3 coincidence degrees may each be selected as the target splicing key field pairs. For example, the target splicing key field pairs for the data table pair <t1, t2> are <name, alias>, <level, rank> and <flag, idx>, and the target splicing key field pairs for the data table pair <t2, t1> are <alias, name>, <idx, index> and <rank, level>, as shown in Table 5:

Table 5: target splicing key field pairs of the target data table set E = {t1, t2}

Step S2430a, splicing the first data table and the second data table according to the target splicing key field pair.

Continuing with the example of step S2420a above, data table t1 may be left-spliced (left-joined) with data table t2 on <name, alias>, <level, rank> or <flag, idx>, and data table t2 may be left-spliced with data table t1 on <alias, name>, <idx, index> or <rank, level>.
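Steps S2410a to S2430a can be sketched with plain Python data structures. The coincidence degrees are the values given in the example for the data table pair <t1, t2>; the function names top_pairs and left_join and the row-as-dictionary representation are assumptions, and the left join shown is a generic one rather than the specific splicing operation of the disclosure.

```python
def top_pairs(scores, k):
    """Return the k splicing key field pairs with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:k]]


def left_join(left_rows, right_rows, left_key, right_key):
    """Keep every left row; attach matching right rows or None-filled fields.

    Note: None keys are compared like ordinary values here, unlike SQL.
    """
    matches_by_key = {}
    for row in right_rows:
        matches_by_key.setdefault(row[right_key], []).append(row)
    empty_right = dict.fromkeys(right_rows[0]) if right_rows else {}
    joined = []
    for row in left_rows:
        for match in matches_by_key.get(row[left_key], [empty_right]):
            joined.append({**row, **match})
    return joined


t1_rows = [
    {"index": 0, "name": "Mike", "level": None, "flag": 1},
    {"index": 1, "name": "Peter", "level": "a", "flag": 1},
    {"index": 2, "name": "Mary", "level": "b", "flag": 1},
    {"index": 3, "name": "Steve", "level": None, "flag": 1},
    {"index": 4, "name": "John Doe", "level": "a", "flag": 1},
]
t2_rows = [
    {"idx": 3, "alias": "Mike", "rank": "c"},
    {"idx": 1, "alias": "Peter", "rank": "a"},
    {"idx": 2, "alias": "Mary", "rank": "b"},
    {"idx": 7, "alias": "Steve", "rank": None},
    {"idx": 5, "alias": "John Doe", "rank": "a"},
    {"idx": 10, "alias": "Steve", "rank": "d"},
]
scores_t1_t2 = {
    ("index", "idx"): 0.6, ("name", "alias"): 1.0, ("name", "rank"): 0.0,
    ("level", "alias"): 0.0, ("level", "rank"): 1.0, ("flag", "idx"): 1.0,
}

targets = top_pairs(scores_t1_t2, 3)
left_key, right_key = targets[0]  # ("name", "alias")
joined = left_join(t1_rows, t2_rows, left_key, right_key)
print(targets)
print(len(joined))  # 6: Steve matches two rows of t2
```

Because Python's sort is stable, pairs with equal scores keep their original order, so ties among the three pairs with degree 1 are resolved by insertion order in this sketch.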

In one example, when the association degree satisfies the set condition in step S2400, the splicing the first data table and the second data table may include the following steps S2410b to S2440 b:

Step S2410b, the set coincidence degree threshold and/or editing similarity threshold is obtained.

The coincidence degree threshold and the editing similarity threshold may be set according to a specific application scenario and a specific application requirement; for example, the coincidence degree threshold and the editing similarity threshold may both be 0.7.

Step S2420b, for any data table pair, the coincidence degree of each splicing key field pair is compared with the coincidence degree threshold, and/or the editing similarity of each splicing key field pair is compared with the editing similarity threshold.

Step S2430b, the splicing key field pairs whose coincidence degree is greater than the coincidence degree threshold and/or the splicing key field pairs whose editing similarity is greater than the editing similarity threshold are taken as the target splicing key field pairs of the data table pair.

Step S2440b, splicing the first data table and the second data table according to the target splicing key field pair.

In one example, when the association degree satisfies the set condition in step S2400, the splicing the first data table and the second data table may include the following steps S2410c to S2440 c:

Step S2410c, for any data table pair, the coincidence degrees that are less than or equal to the coincidence degree threshold are deleted from the sorting result of the coincidence degrees.

Step S2420c, the editing similarities that are less than or equal to the editing similarity threshold are deleted from the sorting result of the editing similarities.

Step S2430c, taking the splicing key field pairs corresponding to the coincidence degrees retained after deletion and/or the splicing key field pairs corresponding to the editing similarities retained after deletion as the target splicing key field pairs of the data table pair.

Step S2440c, splicing the first data table and the second data table according to the target splicing key field pair.

In one example, when the association degree satisfies the setting condition in step S2400, the splicing the first data table and the second data table may include the following steps S2410d to S2420 d:

Step S2410d, for any data table pair, deleting from the sorting result of the coincidence degrees those that are less than or equal to the coincidence degree threshold and/or deleting from the sorting result of the editing similarities those that are less than or equal to the editing similarity threshold, and taking the splicing key field pairs corresponding to the first predetermined number of coincidence degrees and/or editing similarities in the remaining sorting result as the target splicing key field pairs.

For example, if the coincidence degree threshold is 0.7 and the predetermined number is 2, the candidate field pairs whose coincidence degree is not higher than the threshold 0.7 are removed from the sorting result, and the 2 field pairs with the highest coincidence degree and editing similarity among the remaining candidate field pairs of each data table pair are selected. The target splicing key field pairs for the data table pair <t1, t2> are <name, alias> and <level, rank>, and the target splicing key field pair for the data table pair <t2, t1> is <alias, name>, as shown in Table 6 below:

Table 6: target splicing key field pairs of the target data table set E = {t1, t2}

Step S2420d, concatenate the first data table and the second data table according to the target concatenation key field pair.

Continuing with the example of step S2410d above, data table t1 may be left-spliced (left-joined) with data table t2 on <name, alias> or <level, rank>, and data table t2 may be left-spliced with data table t1 on <alias, name>.
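The combined selection of step S2410d (threshold filter followed by a top-N cut) can be sketched as below, using only the coincidence degrees given in the example. The function name select_target_pairs is hypothetical, and ties are resolved by insertion order, which is an assumption of this sketch.

```python
def select_target_pairs(scores, threshold, top_n):
    """Drop pairs scoring at or below the threshold, then keep the top_n best."""
    kept = [(pair, score) for pair, score in scores.items() if score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # stable descending sort
    return [pair for pair, _ in kept[:top_n]]


scores_t1_t2 = {
    ("index", "idx"): 0.6, ("name", "alias"): 1.0, ("name", "rank"): 0.0,
    ("level", "alias"): 0.0, ("level", "rank"): 1.0, ("flag", "idx"): 1.0,
}
scores_t2_t1 = {
    ("idx", "index"): 0.5, ("idx", "flag"): 0.166667, ("alias", "name"): 1.0,
    ("alias", "level"): 0.0, ("rank", "name"): 0.0, ("rank", "level"): 0.5,
}

print(select_target_pairs(scores_t1_t2, threshold=0.7, top_n=2))
print(select_target_pairs(scores_t2_t1, threshold=0.7, top_n=2))
```

With a threshold of 0.7 and a predetermined number of 2, the first call keeps <name, alias> and <level, rank>, and the second keeps only <alias, name>, since no other pair of <t2, t1> exceeds the threshold.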

In one embodiment, the method for determining the association relationship between the data tables according to the present disclosure may further include: in response to a request for obtaining the sorting result of the coincidence degrees and/or the editing similarities, obtaining a set display mode; and displaying the sorting result of the coincidence degrees and/or the editing similarities according to the display mode.

The display mode may be in the form of a graph.

In this embodiment, in response to a request for obtaining the sorting result of the coincidence degrees and/or the editing similarities, the sorting result is displayed according to the set display mode, which makes the displayed output easier to read.

In one embodiment, the method for determining the association relationship between the data tables according to the present disclosure may further include the following steps S3100 to S3300:

Step S3100, providing a configuration interface for configuring the association degree.

Step S3200, obtaining configuration information input through a configuration interface.

The configuration information includes at least the switch state of the coincidence degree and/or the editing similarity.

Step S3300, displaying the coincidence degree and/or the editing similarity when the corresponding switch state is on.

According to the embodiment of the present disclosure, the user may manually select the items to display. For example, in Tables 3, 4, 5 and 6 the editing similarity is not displayed; the coincidence degree or other items may likewise be hidden, which is not limited herein.
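The switch-based display of steps S3100 to S3300 can be sketched as a simple column filter. The column names left_table, right_table, left_key, right_key and left_est_join_rto follow the result tables described above; the column name edit_similarity, its placeholder value and the function visible_columns are assumptions made for illustration.

```python
def visible_columns(row, switches):
    """Keep only the columns whose switch is on; unlisted columns stay visible."""
    return {name: value for name, value in row.items() if switches.get(name, True)}


row = {
    "left_table": "t1",
    "right_table": "t2",
    "left_key": "level",
    "right_key": "rank",
    "left_est_join_rto": 1.0,
    "edit_similarity": 0.0,  # placeholder value for illustration only
}
# Configuration information: coincidence degree on, editing similarity off.
switches = {"left_est_join_rto": True, "edit_similarity": False}

print(visible_columns(row, switches))
```

Treating unlisted columns as visible is a design choice of this sketch, so that identifying columns such as the table and key names are always shown.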

< apparatus embodiment >

In the present embodiment, an apparatus 3000 for determining an association relationship between data tables is provided. As shown in fig. 3, the apparatus 3000 includes a first obtaining module 3100, a second obtaining module 3200, and a calculating module 3300.

A first obtaining module 3100, configured to obtain pairs of data tables involved in a target set of data tables; wherein the pair of data tables includes a first data table and a second data table.

A second obtaining module 3200, configured to obtain a splicing key field pair associated with the first data table and the second data table; wherein the split key field pair includes a foreign key field of the first data table and a primary key field of the second data table.

A calculating module 3300, configured to calculate a degree of association of the splicing key field pair, and determine an association relationship between the data tables according to the calculated degree of association.

In one embodiment, as shown in fig. 4, the apparatus 3000 further comprises a splicing module 3400.

The splicing module 3400 is configured to splice the first data table and the second data table when the association degree meets a set condition.

In one embodiment, the first acquisition module 3100 is specifically configured to:

acquiring attribute information of the data table set;

In one embodiment, the attribute information at least includes a total number of data tables in the set of data tables, a maximum number of attribute fields of the data tables in the set of data tables, and a maximum number of rows of the data tables in the set of data tables.

In one embodiment, the second acquisition module 3100 is specifically configured to:

In one embodiment, the second obtaining module 3200 is specifically configured to:

selecting a foreign key field of the first data table;

In one embodiment, the association degree includes a coincidence degree, and the calculating module 3300 is specifically configured to:

In one embodiment, the calculation module 3300 is specifically configured to:

In an embodiment, the setting condition includes a first condition and a second condition, and the calculating module 3300 is specifically configured to:

In one embodiment, the calculation module 3300 is specifically configured to:

In one embodiment, the association degree further includes an editing similarity, and the calculating module 3300 is specifically configured to:

In one embodiment, the calculation module 3300 is specifically configured to:

In one embodiment, the association degree includes at least one of a coincidence degree and an editing similarity, and the splicing module 3400 is specifically configured to:

In one embodiment, the splicing module 3400 is specifically configured to:

In one embodiment, the apparatus 3000 further comprises a first display module (not shown), configured to:

In one embodiment, the apparatus 3000 further comprises a second display module configured to:

providing a configuration interface for configuring the association degree;

< storage Medium embodiment >

The present embodiment provides a computer-readable storage medium, wherein a computer program is stored thereon, which computer program, when being executed by a processor, realizes the method according to any one of the above-mentioned method embodiments.

The present invention may be an apparatus, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions so as to implement aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, a segment, or a portion of instructions that comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.

Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A method of determining associations between data tables, comprising:

2. The method of claim 1, wherein the method further comprises:

3. The method of claim 1, wherein the method further comprises:

acquiring attribute information of the target data table set;

acquiring the calculation complexity of the target data table set according to the attribute information and a preset complexity calculation function;

4. The method of claim 3, wherein the method further comprises:

providing a selection interface for selecting a complexity calculation function if the comparison result indicates that the calculation complexity is greater than the complexity threshold; and taking the complexity calculation function selected through the selection interface as the preset complexity calculation function, and executing again the step of acquiring the calculation complexity of the target data table set according to the attribute information and the preset complexity calculation function.

5. The method of claim 3, wherein,

the attribute information at least comprises the total number of the data tables in the data table set, the maximum number of the attribute fields of the data tables in the data table set, and the maximum row number of the data tables in the data table set.

6. The method of claim 1, wherein the obtaining the pair of data tables involved in the target set of data tables comprises:

7. The method of claim 1, wherein the obtaining a splice key field pair for associating the first data table and the second data table comprises:

8. An apparatus for determining associations between data tables, comprising:

9. An apparatus comprising at least one computing device and at least one storage device, wherein the at least one storage device is configured to store instructions that, when executed by the at least one computing device, implement the method of any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1 to 7.
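
The complexity check recited in claims 3 to 5 can be illustrated with a minimal sketch. It is not the claimed implementation. The names `TableSetAttributes`, `attributes_of`, `pairwise_complexity`, `sampled_complexity` and `check_complexity` are hypothetical, both complexity calculation functions are assumed formulas chosen only for illustration, and the selection interface is modelled as a plain callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TableSetAttributes:
    """Attribute information listed in claim 5."""
    table_count: int  # total number of data tables in the set
    max_fields: int   # maximum number of attribute fields of any table
    max_rows: int     # maximum row count of any table


def attributes_of(tables: Dict[str, List[dict]]) -> TableSetAttributes:
    """Collect attribute information from tables given as lists of row dicts."""
    return TableSetAttributes(
        table_count=len(tables),
        max_fields=max((len(rows[0]) if rows else 0 for rows in tables.values()), default=0),
        max_rows=max((len(rows) for rows in tables.values()), default=0),
    )


ComplexityFn = Callable[[TableSetAttributes], int]


def pairwise_complexity(a: TableSetAttributes) -> int:
    """Assumed formula: table pairs x field pairs x rows scanned."""
    return a.table_count ** 2 * a.max_fields ** 2 * a.max_rows


def sampled_complexity(a: TableSetAttributes, sample_rows: int = 1000) -> int:
    """Assumed cheaper formula: same as above, but rows are capped by a sample size."""
    return a.table_count ** 2 * a.max_fields ** 2 * min(a.max_rows, sample_rows)


def check_complexity(
    attrs: TableSetAttributes,
    preset_fn: ComplexityFn,
    threshold: int,
    select_fn: Callable[[], ComplexityFn],
    max_rounds: int = 3,
) -> Tuple[int, bool]:
    """Compute the complexity and compare it with the threshold.

    While the complexity exceeds the threshold, ask the (hypothetical)
    selection callback for another function and recompute, at most
    max_rounds times. Returns (complexity, within_threshold).
    """
    complexity = preset_fn(attrs)
    for _ in range(max_rounds):
        if complexity <= threshold:
            break
        preset_fn = select_fn()         # stands in for the selection interface
        complexity = preset_fn(attrs)   # re-run the complexity step
    return complexity, complexity <= threshold


if __name__ == "__main__":
    tables = {
        "orders": [{"id": 1, "user_id": 7, "amount": 3}, {"id": 2, "user_id": 8, "amount": 5}],
        "users": [{"id": 7, "name": "a"}],
    }
    attrs = attributes_of(tables)
    print(check_complexity(attrs, pairwise_complexity, 50,
                           lambda: (lambda a: sampled_complexity(a, sample_rows=1))))
```

In this sketch the callback returns a function directly. An actual selection interface would present the available complexity calculation functions to the user and return the one chosen.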