WO2021057088A1

WO2021057088A1 - Data connection method and apparatus and electronic device

Info

Publication number: WO2021057088A1
Application number: PCT/CN2020/094583
Authority: WO
Inventors: Chen Mengmeng (陈萌萌)
Original assignee: Ant Financial (Hangzhou) Network Technology Co., Ltd. (蚂蚁金服（杭州）网络技术有限公司)
Priority date: 2019-09-27
Filing date: 2020-06-05
Publication date: 2021-04-01
Also published as: CN110807030A; CN110807030B

Abstract

A data connection method and apparatus, and an electronic device. The method comprises: performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, so as to connect data having the same key value in the first data set and the second data set; during execution of the first connection algorithm, obtaining an execution parameter that represents the execution efficiency of the first connection algorithm; and switching from the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm. The method thereby switches the connection algorithm automatically according to the algorithm's execution state during data connection.

Description

Method, device and electronic equipment for data connection

Technical field

This specification relates to the field of software technology, and in particular to a data connection method, device and electronic equipment.

Background art

A join associates two sets of data with each other; for example, joining data A and data B produces data AB, which can be used for data queries, file association, and so on. The join is one of the basic relational algebra operations of a database and has a wide range of application scenarios in database implementations; in this specification it is referred to as data connection. The implementation of the connection (join) algorithm directly determines the efficiency of the connection operator (Join Operator) and has a vital impact on the performance of the entire database. Common connection algorithms (Join Algorithm) include Nested-Loop Join, Merge Join, Hash Join, and so on. In addition, a variant of the merge join, the zigzag merge join (Zigzag Merge Join), has also appeared.

Merge connection: of the two data sets, the data in one is usually called the left input and the data in the other is called the right input. When both the left and right inputs are ordered by the connection key, the required data connection operation is completed by accessing the records of the left and right inputs sequentially. Because of these stringent requirements, its execution efficiency is low in most scenarios.

Zigzag merge connection: a variant of the merge connection. The zigzag merge connection uses the index structures of the left and right inputs, alternately using the key value of one input to locate data in the other input, so as to avoid accessing invalid data on either side. However, because the zigzag merge connection uses a search operation to access the next row, when most of the data is valid the search cost exceeds the cost of sequential access, which greatly reduces the efficiency of the zigzag merge connection.

That is, whether the merge connection or the zigzag merge connection is used, execution efficiency can be low, and a method that improves the execution efficiency of data connection is urgently needed.

Summary of the invention

The embodiments of this specification provide a data connection method, device, and electronic equipment, which are used to solve the technical problem of low data connection execution efficiency.

In the first aspect, an embodiment of this specification provides a data connection method, and the method includes:

Performing a data connection operation on a first data set and a second data set by using a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

During the execution of the first connection algorithm, acquiring execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Switching the first connection algorithm to a second connection algorithm according to the execution parameters, and continuing to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.

Optionally, obtaining the execution parameters of the first connection algorithm includes:

When the first connection algorithm is a merge connection algorithm, obtaining, as the execution parameter, the number of consecutive data rows in one data set accessed by the merge connection algorithm; or,

When the first connection algorithm is a zigzag merge connection algorithm, obtaining, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.

Optionally, obtaining the execution parameters of the first connection algorithm includes:

When the first connection algorithm is a merge connection algorithm, adding a preset step to the value of the execution parameter each time data rows of one data set are read consecutively, and otherwise subtracting the preset step from the value of the execution parameter;

When the first connection algorithm is a zigzag merge connection algorithm, adding the preset step to the value of the execution parameter each time the data row found by the search is the same as the row immediately following the previously read data row, and otherwise subtracting the preset step from the value of the execution parameter.

Optionally, when the first connection algorithm is a merge connection algorithm, adding the preset step to the value of the execution parameter each time data rows of one data set are read consecutively includes:

When the first connection algorithm is a merge connection algorithm, comparing the key values of the current first data row and the current second data row that have been read, where the first data row belongs to the first data set and the second data row belongs to the second data set;

If the key value of the current first data row is less than or equal to the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row; if the previous comparison also found the key value of the first data row to be less than or equal to the key value of the second data row, adding the preset step to the execution parameter;

If the key value of the current first data row is greater than the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row; if the previous comparison also found the key value of the first data row to be greater than the key value of the second data row, adding the preset step to the execution parameter.

Optionally, the first connection algorithm is a merge connection algorithm and the second connection algorithm is a zigzag merge connection algorithm, or the first connection algorithm is a zigzag merge connection algorithm and the second connection algorithm is a merge connection algorithm.

In the second aspect, this embodiment provides a data connection device, and the device includes:

A connection unit, configured to perform a data connection operation on the first data set and the second data set through a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

A detection unit, configured to obtain execution parameters of the first connection algorithm during the execution of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

A switching unit, configured to switch the first connection algorithm to a second connection algorithm according to the execution parameters, and to continue executing the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.

Optionally, the detection unit is configured to: when the first connection algorithm is a merge connection algorithm, obtain, as the execution parameter, the number of consecutive data rows in one data set accessed by the merge connection algorithm; or, when the first connection algorithm is a zigzag merge connection algorithm, obtain, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.

Optionally, the detection unit is configured to: when the first connection algorithm is a merge connection algorithm, add a preset step to the value of the execution parameter each time data rows of one data set are read consecutively, and otherwise subtract the preset step from the value of the execution parameter; when the first connection algorithm is a zigzag merge connection algorithm, add the preset step to the value of the execution parameter each time the data row found by the search is the same as the row immediately following the previously read data row, and otherwise subtract the preset step from the value of the execution parameter.

Optionally, the detection unit is further configured to: when the first connection algorithm is a merge connection algorithm, compare the key values of the current first data row and the current second data row that have been read, where the first data row belongs to the first data set and the second data row belongs to the second data set; if the key value of the current first data row is less than or equal to the key value of the current second data row, obtain the result of the previous comparison between the key value of the first data row and the key value of the second data row, and if the previous comparison also found the key value of the first data row to be less than or equal to the key value of the second data row, add the preset step to the execution parameter; if the key value of the current first data row is greater than the key value of the current second data row, obtain the result of the previous comparison, and if the previous comparison also found the key value of the first data row to be greater than the key value of the second data row, add the preset step to the execution parameter.

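In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the following steps are implemented:

Performing a data connection operation on a first data set and a second data set by using a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

During the execution of the first connection algorithm, acquiring execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Switching the first connection algorithm to a second connection algorithm according to the execution parameters, and continuing to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.

In a fourth aspect, an electronic device includes a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs contain instructions for the following operations:

Performing a data connection operation on a first data set and a second data set by using a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

During the execution of the first connection algorithm, acquiring execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Switching the first connection algorithm to a second connection algorithm according to the execution parameters, and continuing to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.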

The above one or more technical solutions in the embodiments of this specification have at least the following technical effects:

The embodiments of this specification provide a data connection method, including: performing a data connection operation on a first data set and a second data set through a first connection algorithm to connect data with the same key value in the two data sets; during the execution of the first connection algorithm, acquiring execution parameters of the first connection algorithm that characterize the execution efficiency of the algorithm; and switching from the first connection algorithm to a second connection algorithm according to the acquired execution parameters. Rather than prescribing one particular connection algorithm, the method switches adaptively according to the execution efficiency of the current connection algorithm, so that the algorithm adapts to different data. It can therefore achieve high execution efficiency in scenarios with different data distributions, overcomes the relative inefficiency of any single connection algorithm on complex data, and improves the execution efficiency of data connection.

Description of the drawings

In order to describe the technical solutions in the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of this specification; those of ordinary skill in the art can obtain other drawings from these drawings without creative work.

FIG. 1 is a schematic flowchart of a data connection method provided by an embodiment of this specification;

FIG. 2a is a partial flowchart of the adaptive switching between the zigzag merge connection algorithm and the merge connection algorithm provided by an embodiment of this specification;

FIG. 2b is a partial flowchart of the adaptive switching between the zigzag merge connection algorithm and the merge connection algorithm provided by an embodiment of this specification;

FIG. 3 is a schematic diagram of inefficient execution when only the zigzag merge connection algorithm is used, provided by an embodiment of this specification;

FIG. 4 is a schematic diagram of execution with adaptive switching between the zigzag merge connection algorithm and the merge connection algorithm, provided by an embodiment of this specification;

FIG. 5 is a schematic diagram of a data connection device provided by an embodiment of this specification;

FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of this specification.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of this specification clearer, the technical solutions in the embodiments of this specification are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative work shall fall within the protection scope of this specification.

The embodiments of this specification provide a data connection method that obtains an execution parameter characterizing the execution efficiency of the connection algorithm and switches the connection algorithm based on that parameter, so that the algorithm adapts to different data, thereby improving the execution efficiency of data connection.

The main implementation principles, specific implementation manners and corresponding beneficial effects of the technical solutions of the embodiments of the present specification will be described in detail below in conjunction with the accompanying drawings.

This embodiment provides a data connection method applied to data processing systems that require data connection, such as databases and data tables. The method adaptively uses the merge connection algorithm and the zigzag merge connection algorithm to join the first data set and the second data set to be connected. Referring to FIG. 1, the data connection method includes:

Step 10. Perform a data connection operation on the first data set and the second data set by using the first connection algorithm to connect data with the same key value in the first data set and the second data set;

Step 11. During the execution of the first connection algorithm, obtain execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Step 12. Switch the first connection algorithm to a second connection algorithm according to the execution parameter, and continue to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.

When the data connection is initialized, step 10 may randomly select either the merge connection algorithm or the zigzag merge connection algorithm as the first connection algorithm. Alternatively, in step 10, the first connection algorithm may be set to the algorithm commonly used by each data processing system, according to the usage records of each database. For example, the zigzag merge connection algorithm may be set as the first connection algorithm and the merge connection algorithm as the second connection algorithm, so that the merge connection algorithm serves as a supplement to the zigzag merge connection algorithm in performing the data connection operation.

When performing a data connection operation, the first data set and the second data set need to be loaded into memory. By convention, the data in the first data set is called the left input and the data in the second data set is called the right input. The left input and the right input each contain multiple rows of data, and each row has its own connection key. Before step 10, the left and right inputs can be sorted by the key value (that is, the value) of the connection key, for example in ascending order, so that both inputs are ordered by the connection key. Inputs that are ordered by the connection key not only ensure fast execution of the merge connection algorithm but also effectively improve the key-value search efficiency of the zigzag merge connection algorithm.
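The ordering requirement can be made concrete with a minimal Python sketch. It assumes each row is a hypothetical `(key, payload)` tuple; `sort_by_join_key` and `is_ordered` are illustrative helpers, not functions named in this specification.

```python
from typing import Any, List, Tuple

# Hypothetical row layout: (join-key value, payload)
Row = Tuple[int, Any]


def sort_by_join_key(rows: List[Row]) -> List[Row]:
    """Return the rows in ascending order of join-key value.

    Python's sort is stable, so rows with equal keys keep their relative order.
    """
    return sorted(rows, key=lambda row: row[0])


def is_ordered(rows: List[Row]) -> bool:
    """True if join keys never decrease (duplicate keys are allowed)."""
    return all(a[0] <= b[0] for a, b in zip(rows, rows[1:]))


left = sort_by_join_key([(3, "a3"), (1, "a1"), (2, "a2")])
right = sort_by_join_key([(5, "b5"), (2, "b2"), (2, "b2'")])
print(left)                                  # [(1, 'a1'), (2, 'a2'), (3, 'a3')]
print(is_ordered(left), is_ordered(right))   # True True
```

Ascending order is only one choice; the text allows any order as long as both inputs are ordered the same way by the connection key.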

When the input on the left and right sides is in order for the connection keys, the execution process of the merge connection algorithm is as follows:

Step 1: Read one row from the left input and one row from the right input, and determine whether the key values of their connection keys are equal. If they are equal, go to Step 3; if they are not equal, go to Step 2; if either input is exhausted, go to Step 4;

Step 2: If the left key value is less than the right key value, read the next row of the left input; otherwise, read the next row of the right input; then return to Step 1;

Step 3: Cache the current left row and right row in two temporary buffers respectively, and output the connection result of the two rows. Then read the next right row: if its key value equals the key value of the previous right row, cache it in the buffer as well and output the connection result of this right row and the current left row; keep reading right rows in this way until a right row's key value differs from the previous one. Next, read the next left row: if its key value is the same as that of the previous left row, scan all the right rows cached in the buffer in turn and perform the connection operation, repeating until a left row's key value differs from the previous one; then stop and return to Step 1;

Step 4: The merge connection algorithm ends, and the remaining data of the input that has not been exhausted is discarded.
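Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration, assuming both inputs are lists of hypothetical `(key, payload)` tuples already sorted by key; the right-side "buffer" of Step 3 is simply a slice of the list.

```python
from typing import Any, List, Tuple

Row = Tuple[int, Any]  # hypothetical (join-key value, payload) row


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Row, Row]]:
    """Join two inputs already sorted by key, reading each input sequentially."""
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    # Step 4: finish as soon as either input is exhausted; leftovers are discarded.
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:        # Step 2: advance the side with the smaller key
            i += 1
        elif lk > rk:
            j += 1
        else:              # Step 3: buffer the run of equal right keys ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            buffered_right = right[j:j_end]
            # ... then pair every left row with that key against the buffer
            while i < len(left) and left[i][0] == lk:
                for r_row in buffered_right:
                    out.append((left[i], r_row))
                i += 1
            j = j_end      # back to Step 1 with fresh rows on both sides
    return out


left = [(1, "a1"), (2, "a2"), (2, "a2'"), (4, "a4")]
right = [(2, "b2"), (2, "b2'"), (3, "b3"), (4, "b4")]
for pair in merge_join(left, right):
    print(pair)
```

Every row is visited once in order, which is why the merge connection is cheap when most rows participate but wastes reads when long stretches of one input have no partner.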

The zigzag merge connection algorithm follows essentially the same logic as the merge connection algorithm. The difference lies in Step 2: if the left key value is less than the right key value, a search is performed on the left input using the right key value, and the first left row whose key value is greater than or equal to the right key value becomes the next left row; and vice versa.
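A minimal Python sketch of that difference, under the assumption that a sorted list of keys can stand in for each input's index and that `bisect_left` stands in for the search; only Step 2 changes relative to a sequential merge.

```python
from bisect import bisect_left
from typing import Any, List, Tuple

Row = Tuple[int, Any]  # hypothetical (join-key value, payload) row


def zigzag_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Row, Row]]:
    """Same control flow as a merge join, but Step 2 seeks instead of stepping."""
    # Sorted key lists stand in for the index on each input (an assumption).
    left_keys = [row[0] for row in left]
    right_keys = [row[0] for row in right]
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_keys[i], right_keys[j]
        if lk < rk:
            # Seek the first left row whose key is >= the right key.
            i = bisect_left(left_keys, rk, i + 1)
        elif lk > rk:
            # Seek the first right row whose key is >= the left key.
            j = bisect_left(right_keys, lk, j + 1)
        else:
            # Equal keys: handled as in Step 3 of the merge connection.
            j_end = j
            while j_end < len(right) and right_keys[j_end] == rk:
                j_end += 1
            while i < len(left) and left_keys[i] == lk:
                for r_row in right[j:j_end]:
                    out.append((left[i], r_row))
                i += 1
            j = j_end
    return out


left = [(1, "a1"), (50, "a50"), (99, "a99")]
right = [(k, f"b{k}") for k in range(2, 100)]
print(zigzag_merge_join(left, right))
```

On this sparse example the two seeks on the right input jump straight to keys 50 and 99, skipping the intermediate rows that a sequential merge would have read one by one.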

While step 10 is executed, step 11 is also executed to obtain the execution parameters of the current algorithm, that is, of the first connection algorithm during its execution. Parameters that characterize the execution efficiency of a connection algorithm include connection efficiency, connection output time, and the algorithm's amount of computation. In step 11, any one of these parameters, or any combination of them, can be selected as the execution parameter; when two or more parameters are combined, their weighted sum can be used as the execution parameter. Step 12 is then performed on the acquired execution parameter.
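The weighted-sum option can be sketched as below. The metric names, their units, and the weights are all hypothetical: the specification names the kinds of metric but fixes neither how they are measured nor how they are weighted.

```python
from typing import Dict


def execution_parameter(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the selected efficiency metrics.

    Metrics without a weight are treated as not selected (weight 0). The metric
    names and weights are illustrative only.
    """
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())


metrics = {
    "connection_efficiency": 8.0,   # hypothetical measurements
    "output_time": 12.0,
    "computation_amount": 1.5,
}
weights = {"connection_efficiency": 0.25, "output_time": 0.5, "computation_amount": 2.0}
print(execution_parameter(metrics, weights))  # 11.0
```

Selecting a single parameter is the special case where only one metric has a non-zero weight.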

When step 12 is executed, it can be determined whether the acquired execution parameter is greater than a set threshold. A value greater than the threshold indicates that the algorithm's execution efficiency is low and that the algorithm should be switched; otherwise no switch is needed. The threshold can be set according to the algorithm and the efficiency requirements, as long as Condition 1 and Condition 2 are satisfied; this embodiment does not limit its specific value. Condition 1: in the basic merge algorithm, the fewer the rows of one data set that are accessed consecutively, the more inclined the method is to switch to the zigzag merge connection algorithm; conversely, the less inclined it is to use the zigzag merge connection algorithm. Condition 2: in the zigzag merge algorithm, the fewer the rows skipped by the search, the more inclined the method is to switch to the merge connection algorithm; conversely, the less inclined it is to use the merge connection algorithm.

If the execution parameter is greater than the set threshold, the current connection algorithm is switched from the first connection algorithm to the second connection algorithm: if the merge connection algorithm is currently in use, it is switched to the zigzag merge connection algorithm; otherwise, if the zigzag merge connection algorithm is currently in use, it is switched to the merge connection algorithm. The data connection operation then continues with the zigzag merge connection algorithm or merge connection algorithm switched to.
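The threshold rule of step 12 reduces to a small decision function. This is a minimal sketch: the mode names and the readings in the usage loop are hypothetical, and in a real run the parameter would be re-measured for whichever algorithm is active after a switch.

```python
MERGE = "merge"
ZIGZAG = "zigzag"


def next_algorithm(current: str, execution_parameter: float, threshold: float) -> str:
    """Step 12: swap algorithms when the parameter exceeds the threshold.

    A parameter above the threshold is read as "the current algorithm is running
    inefficiently"; otherwise the current algorithm is kept.
    """
    if execution_parameter > threshold:
        return ZIGZAG if current == MERGE else MERGE
    return current


mode = MERGE
for observed in [1.0, 2.5, 4.0, 0.5, 6.0]:   # hypothetical readings, threshold 3.0
    mode = next_algorithm(mode, observed, 3.0)
    print(observed, mode)
```

Because steps 10 to 12 run in a loop, the same function is simply called again on each new reading.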

Throughout the data connection operation, steps 10 to 12 are executed cyclically, so that the whole data processing process switches in real time according to how the algorithm is actually performing and uses the connection algorithm that is more efficient for the current data. This overcomes the technical problem that any single algorithm executes inefficiently in certain scenarios.

For the merge connection algorithm, step 11 may obtain the execution parameter from the algorithm's gain: the number of consecutive data rows in one data set accessed by the merge connection algorithm is taken as its execution parameter. The more rows of one data set the merge connection algorithm accesses consecutively, the smaller the algorithm's gain and the lower its execution efficiency, and the more inclined the method is to switch to the zigzag merge connection algorithm; using this number of consecutively accessed rows as the execution parameter therefore characterizes execution efficiency. Specifically, the rows of the left and right inputs can be numbered sequentially, and the row number of the current consecutive access minus the row number of the first consecutive access gives the number of consecutively accessed rows.
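The row-numbering method can be sketched as a small counter. The class name and the `"left"`/`"right"` side labels are hypothetical; the value returned is exactly the difference described in the text (current row number minus the row number at which the current run on that input began).

```python
from typing import Optional


class ConsecutiveAccessCounter:
    """Counts consecutive merge-join reads from one input using row numbers.

    Returns the current row number minus the row number at which the current
    run of reads from the same input started.
    """

    def __init__(self) -> None:
        self.side: Optional[str] = None
        self.first_row_number = 0

    def record(self, side: str, row_number: int) -> int:
        if side != self.side:            # a read from the other input starts a new run
            self.side = side
            self.first_row_number = row_number
        return row_number - self.first_row_number


counter = ConsecutiveAccessCounter()
reads = [("left", 0), ("left", 1), ("left", 2), ("right", 0), ("right", 1), ("left", 3)]
print([counter.record(side, n) for side, n in reads])  # [0, 1, 2, 0, 1, 0]
```

Note that this approach requires every input row to carry a number, which is the storage cost that the step-accumulation method described next avoids.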

For the zigzag merge connection algorithm, step 11 can likewise obtain the execution parameter from the algorithm's gain: the number of data rows skipped by the zigzag merge connection algorithm is taken as the execution parameter. The zigzag merge connection algorithm obtains the next input row by searching; the fewer rows it is able to skip, the smaller its gain and the lower its execution efficiency, and the more inclined the method is to switch to the merge connection algorithm. Using the number of skipped data rows as the execution parameter to characterize execution efficiency is likewise simple and effective. Specifically, the rows of the left and right inputs can again be numbered sequentially: record the number A of the row immediately following the row currently accessed by the zigzag merge connection algorithm, and the number B of the next row obtained by the algorithm's search; B minus A is the number of rows skipped by the zigzag merge connection algorithm.
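The B minus A computation looks like this in a minimal Python sketch, assuming row numbers are list indices into a sorted key list and that `bisect_left` stands in for the zigzag search; the function name is hypothetical.

```python
from bisect import bisect_left
from typing import List


def rows_skipped_by_seek(keys: List[int], current: int, target_key: int) -> int:
    """Rows a zigzag seek jumps over on one sorted input, computed as B - A.

    keys:       sorted join keys of the input being searched, indexed by row number
    current:    row number of the row currently accessed on this input
    target_key: key taken from the other input that drives the seek
    """
    a = current + 1                          # A: next row in sequential order
    b = bisect_left(keys, target_key, a)     # B: first row with key >= target_key
    return b - a


keys = [1, 2, 3, 7, 8, 9]
print(rows_skipped_by_seek(keys, 0, 8))  # 3 (rows with keys 2, 3 and 7)
print(rows_skipped_by_seek(keys, 0, 2))  # 0 (the seek lands on the next row)
```

A result of 0 is the "no gain" case: the search returned the same row that a plain sequential read would have.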

Based on the same idea of using the algorithm's gain as the execution parameter, this embodiment can also perform step 11 as follows. When the merge connection algorithm is producing the connection output, each time data rows of one data set are read consecutively, the preset step is added to the value of the execution parameter; otherwise the preset step is subtracted from it. When the zigzag merge connection algorithm is producing the connection output, each time the data row found by the search is the same as the row immediately following the previously read data row, the preset step is added to the value of the execution parameter; otherwise the preset step is subtracted from it. The initial value of the execution parameter can be set to zero. Obtaining the execution parameter by cumulatively adding and subtracting the step size removes the need to number the rows of the left and right inputs, which saves storage space and computation and further improves execution efficiency.

Specifically, when the merge connection algorithm is producing the connection output, the key values of the current first data row and the current second data row that have been read are compared, where the first data row (the left input row) belongs to the first data set and the second data row (the right input row) belongs to the second data set. If the key value of the current first data row is less than or equal to that of the current second data row, the result of the previous comparison between the key value of the first data row and the key value of the second data row is obtained; if the previous comparison also found the key value of the first data row to be less than or equal to that of the second data row, the preset step is added to the execution parameter. If the key value of the current first data row is greater than that of the current second data row, the result of the previous comparison is obtained; if the previous comparison also found the key value of the first data row to be greater than that of the second data row, the preset step is added to the execution parameter.
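A minimal sketch of the step-accumulation variant, assuming a default step of 1 and assuming that the very first merge comparison, which has no predecessor, changes nothing (the text does not say). The class name `UseBasic` and its method names are hypothetical, chosen to echo the use_basic counter of FIG. 2a and FIG. 2b; higher values mean the merge connection algorithm is favored.

```python
from typing import Optional


class UseBasic:
    """Execution parameter maintained by adding or subtracting a preset step.

    No row numbering is needed; only the previous comparison result is kept.
    """

    def __init__(self, step: int = 1) -> None:
        self.step = step
        self.value = 0                              # initial value set to zero
        self.prev_left_le_right: Optional[bool] = None

    def on_merge_compare(self, left_key: int, right_key: int) -> int:
        """Merge mode: +step if this comparison advances the same side as the last one."""
        left_le_right = left_key <= right_key
        if self.prev_left_le_right is not None:     # assumption: first comparison scores nothing
            if left_le_right == self.prev_left_le_right:
                self.value += self.step
            else:
                self.value -= self.step
        self.prev_left_le_right = left_le_right
        return self.value

    def on_zigzag_seek(self, found_row: int, next_sequential_row: int) -> int:
        """Zigzag mode: +step if the seek landed on the row a sequential read would give."""
        if found_row == next_sequential_row:
            self.value += self.step
        else:
            self.value -= self.step
        return self.value


tracker = UseBasic()
print([tracker.on_merge_compare(l, r) for l, r in [(1, 5), (2, 5), (3, 5), (6, 5)]])  # [0, 1, 2, 1]
print(tracker.on_zigzag_seek(4, 4), tracker.on_zigzag_seek(7, 4))                     # 2 1
```

Only a boolean and an integer are stored, which is the saving over row numbering that the text points out.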

Referring to FIG. 2a and FIG. 2b, the complete execution process of this embodiment is as follows:

①. Set l_key and r_key to the theoretical lower bound of the connection key value, where l_key corresponds to the key value of the left input row and r_key corresponds to the key value of the right input row. Normally, the theoretical lower bound of the connection key value is the theoretical lower bound of the current numeric type; for example, for 64-bit signed integer data it is -9,223,372,036,854,775,808 (that is, -2^63). Other data types may have different theoretical lower bounds depending on each database's implementation.

②. Use the lookup function to find the first row whose key value is greater than or equal to l_key, and make l_row point to this row; that is, find the initial first data row.

③. Use the lookup function to find the first row whose key value is greater than or equal to r_key, and make r_row point to this row; that is, find the initial second data row.

④. Determine whether the key value of the row pointed to by l_row is less than or equal to the key value of the row pointed to by r_row; if so, go to step ⑤; if not, go to step …

⑤. Determine whether the key value of the row pointed to by l_row is equal to the key value of the row pointed to by r_row; if so, go to step ⑩; if not, go to step ⑥. Here, the row pointed to by l_row or r_row is also called the current input row, or the new_row row.

⑥. Determine whether begin_left is non-empty; if so, go to step ⑦; if not, go to step ⑨. Here, begin_left is a pointer used to store a previously read row.

⑦. Assign begin_left to l_row.

⑧. Set begin_left to empty.

⑨. Set direction=0; make input point to the left table; assign l_key to old_key; assign r_key to new_key; point l_row to new_row; then go to step …

Here, direction=0 indicates that the previous key-value comparison of the left and right rows found the left table row to be less than or equal to the right table row, and direction=1 indicates that the previous comparison found the left table row to be greater than the right table row.

⑩. Output the connection result of the row pointed to by l_row and r_row.

Determine whether begin_left is empty; if so, go to step …; if not, go to step …

Set begin_left=l_row.

Determine whether current_mode is equal to 1; if so, go to step …; if not, go to step ⑨. Here, current_mode=1 indicates that the merge connection algorithm is currently in use, and current_mode=0 indicates that the zigzag merge connection algorithm is currently in use.

Determine whether direction is equal to 0; if so, go to step …; if not, go to step …

Decrease use_basic by 1 and go to step ⑨, where use_basic is the execution parameter.

Increase use_basic by 1 and go to step ⑨.

Determine whether current_mode is equal to 1; if so, go to step …; if not, go to step …

Determine whether direction is equal to 1; if so, go to step …; if not, go to step …

Increase use_basic by 1 and go to step …

Decrease use_basic by 1 and go to step …

Set direction=1; make input point to the right table; assign r_key to old_key; assign l_key to new_key; point r_row to new_row; then go to step …

Set current_mode=(use_basic>threshold); that is, if use_basic is greater than the threshold, set current_mode=1 so that the merge connection algorithm is used; otherwise, set current_mode=0 so that the zigzag merge connection algorithm is used. In other words, the algorithm is switched according to whether the execution parameter is greater than the threshold.

Determine whether current_mode is equal to 0; if so, go to step …; if not, go to step …

Use the next function to read the row following old_key in the current input, and make next_row point to this row.

Use the lookup function with new_key to find the first row whose key value is greater than or equal to new_key, and make new_row point to this row.

Determine whether next_row and new_row point to the same row; if so, go to step …; if not, go to step …

Increase use_basic by 1 and go to step …

Decrease use_basic by 1 and go to step …

Set current_mode=1. Here, execution may also jump directly to the next step.

Use the next function to read the row following old_key in the current input, and make new_row point to this row.

Determine whether direction is equal to 0; if so, go to step …; if not, go to step …

Make l_row point to the row pointed to by new_row, and go to step …

Make r_row point to the row pointed to by new_row, and go to step …

Determine whether new_row is empty; if so, end; if not, go to step ④.
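The process of FIG. 2a and FIG. 2b can be approximated by the Python sketch below. It is a simplified reading, not a step-by-step transcription: several step numbers above are missing, and the sketch replaces the begin_left bookkeeping with a direct scan of each equal-key group. Assumptions: rows are hypothetical `(key, payload)` tuples, sorted key lists stand in for the indexes, `bisect_left` stands in for the lookup function, `pos + 1` stands in for the next function, and the default threshold of 2 is arbitrary. The use_basic updates follow the text: in zigzag mode +1 when the search lands on the next sequential row, in merge mode +1 when the same input is advanced again.

```python
from bisect import bisect_left
from typing import Any, List, Tuple

Row = Tuple[int, Any]          # hypothetical (join-key value, payload) row
INT64_MIN = -(2 ** 63)         # step ①: theoretical lower bound for 64-bit signed keys


def adaptive_join(left: List[Row], right: List[Row],
                  threshold: int = 2, step: int = 1) -> List[Tuple[Row, Row]]:
    """Join two key-sorted inputs, switching between zigzag and merge modes."""
    left_keys = [row[0] for row in left]
    right_keys = [row[0] for row in right]
    # Steps ② and ③: position each side on the first row with key >= the lower bound.
    l_row = bisect_left(left_keys, INT64_MIN)
    r_row = bisect_left(right_keys, INT64_MIN)
    use_basic = 0          # execution parameter
    direction = None       # 0: last advanced the left input, 1: the right input
    current_mode = 0       # 0: zigzag merge connection (initial), 1: merge connection
    out: List[Tuple[Row, Row]] = []

    while l_row < len(left) and r_row < len(right):
        lk, rk = left_keys[l_row], right_keys[r_row]
        if lk == rk:
            # Simplification: emit the whole equal-key group at once instead of
            # replaying it through begin_left as the flowchart does.
            r_end = r_row
            while r_end < len(right) and right_keys[r_end] == rk:
                r_end += 1
            while l_row < len(left) and left_keys[l_row] == lk:
                for r in right[r_row:r_end]:
                    out.append((left[l_row], r))
                l_row += 1
            r_row = r_end
            continue

        new_direction = 0 if lk < rk else 1
        keys, pos, new_key = ((left_keys, l_row, rk) if new_direction == 0
                              else (right_keys, r_row, lk))
        next_row = pos + 1                       # row a sequential "next" would return
        if current_mode == 1:
            # Merge mode: +step when the same input is advanced again, else -step.
            if direction is not None:
                use_basic += step if new_direction == direction else -step
            new_row = next_row
        else:
            # Zigzag mode: search, then score whether the search skipped anything.
            new_row = bisect_left(keys, new_key, next_row)
            use_basic += step if new_row == next_row else -step
        direction = new_direction
        if direction == 0:
            l_row = new_row
        else:
            r_row = new_row
        # Threshold decision: merge mode while use_basic exceeds the threshold.
        current_mode = 1 if use_basic > threshold else 0
    return out


left = [(k, f"L{k}") for k in range(0, 60, 2)]
right = [(k, f"R{k}") for k in range(0, 60, 3)]
print(len(adaptive_join(left, right)))  # 10
```

Whichever mode is active, both branches only ever skip rows whose keys are smaller than the other side's current key, so switching changes the cost of the join but not its result.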

Figure 2 shows the logic of the data connection method provided in this embodiment. Steps ①-③ are the initialization part. The algorithm initially runs as the zigzag merge connection algorithm; the overall algorithm combines the merge connection algorithm and the zigzag merge connection algorithm, each of which forms its own main body of steps in the flow.

In the zigzag merge connection steps, if the next row found by the lookup is the same row as the next row obtained by direct sequential access, the zigzag merge connection has yielded no benefit, and use_basic is increased by 1 to indicate a preference for the merge connection algorithm. Conversely, if the row found by the lookup is not the same row as the next row obtained by direct sequential access, the zigzag merge connection has yielded a benefit, and use_basic is decreased by 1 to indicate a preference against the merge connection algorithm.

In the merge connection steps, if the row key value obtained from one side is repeatedly smaller than the row key value of the other side, use_basic is increased by 1, indicating a preference against the zigzag merge connection algorithm. Conversely, if the row key value obtained from one side is not repeatedly smaller than that of the other side, use_basic is decreased by 1, indicating a preference for the zigzag merge connection algorithm.

In the threshold step, it is determined whether the current value of use_basic exceeds a preset threshold. If it does, the merge connection algorithm is used (current_mode=1); otherwise, the zigzag merge connection algorithm is used (current_mode=0).

Referring to Figure 3, this is a scenario in which only the zigzag merge connection algorithm is used for connection processing and the processing is inefficient. The data in the first data set is shown in the left table, and the data in the second data set is shown in the right table. When the zigzag merge connection algorithm joins the data in the left and right tables, few rows can be skipped, and the execution cost of a lookup by key value is greater than that of a sequential scan; therefore, the execution cost of the zigzag merge connection algorithm exceeds that of the merge connection algorithm. For the same scenario, Figure 4 shows the execution process of this embodiment. The entire zigzag merge connection process is monitored: when it is detected that the zigzag merge connection algorithm can skip too few rows (the benefit is too small), the algorithm is switched to the merge connection algorithm; in the merge connection state, if data on one side is not accessed consecutively often enough, the algorithm switches back to the zigzag merge connection algorithm. Switching between the merge connection algorithm and the zigzag merge connection algorithm thus happens automatically, based on the execution parameters collected while the algorithm runs.
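The automatic switching described above can be sketched end to end as a single loop. This is a hedged illustration, not the embodiment's implementation, and it rests on simplifying assumptions that are not from the source: both inputs are sorted Python lists of unique keys, the join outputs the matching keys, matching rows contribute no feedback, use_basic starts at 0, and the default threshold of 2 and step of 1 are arbitrary. The function name `adaptive_join` is hypothetical.

```python
from bisect import bisect_left

MERGE, ZIGZAG = 1, 0  # values of current_mode


def adaptive_join(left, right, threshold=2, step=1):
    """Join two sorted lists of unique keys, switching between a plain merge
    (sequential advance) and a zigzag merge (lookup-based skipping).

    Returns (matched_keys, modes), where modes records current_mode at the
    start of every iteration.
    """
    i = j = 0
    use_basic = 0
    mode = ZIGZAG      # the embodiment starts in zigzag mode
    last_side = None   # side advanced on the previous non-matching step
    out, modes = [], []
    while i < len(left) and j < len(right):
        modes.append(mode)
        lk, rk = left[i], right[j]
        if lk == rk:   # equal keys: emit a joined row, no feedback
            out.append(lk)
            i += 1
            j += 1
            continue
        side = "L" if lk < rk else "R"  # the side holding the smaller key
        if mode == MERGE:
            # Merge feedback: reading the same side again favours merge.
            use_basic += step if side == last_side else -step
            if side == "L":
                i += 1
            else:
                j += 1
        else:
            # Zigzag: seek the smaller side to the first key >= the other key.
            if side == "L":
                nxt, found = i + 1, bisect_left(left, rk, lo=i + 1)
                i = found
            else:
                nxt, found = j + 1, bisect_left(right, lk, lo=j + 1)
                j = found
            # Zigzag feedback: landing on the sequential successor means no gain.
            use_basic += step if found == nxt else -step
        last_side = side
        mode = MERGE if use_basic > threshold else ZIGZAG
    return out, modes


out, modes = adaptive_join(list(range(100)), [3, 50, 99])
print(out)             # [3, 50, 99]
print(MERGE in modes)  # False: every lookup skipped rows
```

On sparse matches such as this example, every lookup skips rows, so use_basic keeps falling and the sketch stays in zigzag mode. On tightly interleaved keys (for example even keys against odd keys), the merge-side and zigzag-side feedback pull in opposite directions, and in this sketch the mode alternates between the two algorithms.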

The improved scheme of the above embodiment combines the logic of the merge connection algorithm and the zigzag merge connection algorithm. Each part of the logic uses its own execution feedback (the basic merge algorithm uses the merge-side feedback steps, and the zigzag merge connection algorithm uses the zigzag-side feedback steps), and the switching between the two algorithms is adjusted in real time without manual intervention or hard coding. This lets the algorithm adapt to different data and overcomes the tendency of the merge connection algorithm and the zigzag merge connection algorithm, each used alone, to become inefficient on certain data, so that the overall algorithm achieves good execution results across different data distributions and the robustness of the merge connection algorithm is enhanced.

Referring to FIG. 5, based on the data connection method provided in FIG. 1, this embodiment correspondingly provides a data processing device, which includes:

The connection unit 51 is configured to perform a data connection operation on the first data set and the second data set through a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

The detection unit 52 is configured to obtain execution parameters of the first connection algorithm during the execution of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

The switching unit 53 is configured to switch the first connection algorithm to the second connection algorithm according to the execution parameter, and continue to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.

When obtaining the execution parameter, the detection unit 52 may, when the first connection algorithm is a merge connection algorithm, take as the execution parameter the number of data rows in one data set that the merge connection algorithm accesses consecutively; or, when the first connection algorithm is a zigzag merge connection algorithm, take as the execution parameter the number of data rows skipped by the zigzag merge connection algorithm.

Alternatively, when obtaining the execution parameter, the detection unit 52 may proceed as follows. When the first connection algorithm is a merge connection algorithm, each time a data row in one data set is read consecutively, a preset step size is added to the value of the execution parameter; otherwise, the preset step size is subtracted from it. When the first connection algorithm is a zigzag merge connection algorithm, each time the data row found by lookup is the same as the row following the previously read data row, the preset step size is added to the value of the execution parameter; otherwise, the preset step size is subtracted from it.

Specifically, when the first connection algorithm is a merge connection algorithm, the key values of the current first data row and the current second data row that have been read are compared, where the first data row belongs to the first data set and the second data row belongs to the second data set. If the key value of the current first data row is less than or equal to the key value of the current second data row, the result of the previous comparison between the first data row and the second data row is obtained; if in that previous comparison the key value of the first data row was also less than or equal to the key value of the second data row, the preset step size is added to the execution parameter. If the key value of the current first data row is greater than the key value of the current second data row, the result of the previous comparison is obtained; if in that previous comparison the key value of the first data row was also greater than the key value of the second data row, the preset step size is added to the execution parameter.
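The comparison-based rule for the merge connection algorithm described above can be sketched as a small update function. Assumptions not in the source: keys are comparable Python values, the result of the previous comparison is passed in explicitly (None before the first comparison), and a change of side subtracts the step, following the "otherwise" branch stated for the merge case. The name `merge_feedback` is hypothetical.

```python
from typing import Optional, Tuple


def merge_feedback(l_key, r_key, prev_left_le: Optional[bool],
                   use_basic: int, step: int = 1) -> Tuple[int, bool]:
    """Update the execution parameter for one key comparison in merge mode.

    left_le is True when the current first-row key <= second-row key.
    If it equals the previous comparison result, the same data set is being
    read consecutively, so `step` is added; otherwise `step` is subtracted.
    """
    left_le = l_key <= r_key
    if prev_left_le is not None:  # no previous comparison on the first call
        use_basic += step if left_le == prev_left_le else -step
    return use_basic, left_le


# Three consecutive reads from the first data set, then a switch of side.
use_basic, prev = 0, None
for l_key, r_key in [(1, 5), (2, 5), (3, 5), (6, 5)]:
    use_basic, prev = merge_feedback(l_key, r_key, prev, use_basic)
print(use_basic)  # 1
```

Returning the comparison result alongside the updated value keeps the function stateless, so the caller decides where the "previous comparison" is stored.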

As an optional implementation manner, the first connection algorithm may be a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm may be a zigzag merge connection algorithm or a merge connection algorithm.

Regarding the device in the above-mentioned embodiment, the specific operation mode of each unit therein has been described in detail in the embodiment of the relevant method, and will not be elaborated here.

Please refer to FIG. 6, which is a block diagram of an electronic device 700 for implementing a data query method according to an exemplary embodiment. For example, the electronic device 700 may be a computer, a database console, a tablet device, a personal digital assistant, or the like.

Referring to FIG. 6, the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, an input/output (I/O) interface 710, and a communication component 712.

The processing component 702 generally controls the overall operations of the electronic device 700, such as operations associated with display, data communication, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to complete all or part of the steps of the foregoing method. In addition, the processing component 702 may include one or more modules to facilitate the interaction between the processing component 702 and other components.

The memory 704 is configured to store various types of data to support operations in the electronic device 700. Examples of these data include instructions for any application or method operating on the electronic device 700, contact data, phone book data, messages, pictures, videos, etc. The memory 704 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

The power supply component 706 provides power for various components of the electronic device 700. The power supply component 706 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 700.

The I/O interface 710 provides an interface between the processing component 702 and a peripheral interface module. The above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.

The communication component 712 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices. The electronic device 700 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 712 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 712 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to perform the above methods.

In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 704 including instructions, and the foregoing instructions may be executed by the processor 720 of the electronic device 700 to complete the foregoing method. For example, the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

A non-transitory computer-readable storage medium is also provided; when the instructions in the storage medium are executed by a processor of a mobile terminal, the electronic device is enabled to execute a data query method, the method including:

Performing a data connection operation on the first data set and the second data set through the first connection algorithm to connect data with the same key value in the first data set and the second data set; during the execution of the first connection algorithm, obtaining the execution parameters of the first connection algorithm, the execution parameters being used to characterize the execution efficiency of the algorithm execution process; and switching the first connection algorithm to the second connection algorithm according to the execution parameters, and continuing to perform the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.

It should be understood that the present embodiment is not limited to the precise structure described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of this embodiment is limited only by the appended claims.

The foregoing descriptions are only preferred embodiments of this embodiment and are not intended to limit this embodiment. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this embodiment shall be included within the protection scope of this embodiment.

Claims

A data connection method, the method includes:

Performing a data connection operation on the first data set and the second data set by the first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

In the execution process of the first connection algorithm, acquiring execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
The method according to claim 1, wherein said obtaining the execution parameters of the first connection algorithm comprises:

When the first connection algorithm is a merge connection algorithm, obtaining, as the execution parameter, the number of data rows in one data set that the merge connection algorithm accesses consecutively; or,

When the first connection algorithm is a zigzag merge connection algorithm, obtaining, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.
The method according to claim 1, wherein said obtaining the execution parameters of the first connection algorithm comprises:

When the first connection algorithm is a merge connection algorithm, each time a data row in one data set is read consecutively, adding a preset step size to the value of the execution parameter, and otherwise subtracting the preset step size from the value of the execution parameter;

When the first connection algorithm is a zigzag merge connection algorithm, each time the data row found is the same as the row following the previously read data row, adding the preset step size to the value of the execution parameter, and otherwise subtracting the preset step size from the value of the execution parameter.
The method according to claim 3, wherein, when the first connection algorithm is a merge connection algorithm, adding a preset step size to the value of the execution parameter each time a data row in one data set is read consecutively comprises:

When the first connection algorithm is a merge connection algorithm, comparing the key values of the current first data row and the current second data row that have been read, wherein the first data row belongs to the first data set and the second data row belongs to the second data set;

If the key value of the current first data row is less than or equal to the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in that previous comparison the key value of the first data row was less than or equal to the key value of the second data row, adding the preset step size to the execution parameter;

If the key value of the current first data row is greater than the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in that previous comparison the key value of the first data row was greater than the key value of the second data row, adding the preset step size to the execution parameter.
The method according to any one of claims 1 to 4, wherein the first connection algorithm is a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm is a zigzag merge connection algorithm or a merge connection algorithm.
A data connection device, the device comprising:

The connection unit is configured to perform a data connection operation on the first data set and the second data set through a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

A detection unit, configured to obtain execution parameters of the first connection algorithm during the execution of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

The switching unit is configured to switch the first connection algorithm to the second connection algorithm according to the execution parameter, and continue to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
The device according to claim 6, wherein the detection unit is configured to:

When the first connection algorithm is a merge connection algorithm, obtaining, as the execution parameter, the number of data rows in one data set that the merge connection algorithm accesses consecutively; or,

When the first connection algorithm is a zigzag merge connection algorithm, obtaining, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.
The device according to claim 6, wherein the detection unit is configured to:

When the first connection algorithm is a merge connection algorithm, each time a data row in one data set is read consecutively, adding a preset step size to the value of the execution parameter, and otherwise subtracting the preset step size from the value of the execution parameter;

When the first connection algorithm is a zigzag merge connection algorithm, each time the data row found is the same as the row following the previously read data row, adding the preset step size to the value of the execution parameter, and otherwise subtracting the preset step size from the value of the execution parameter.
The device according to claim 8, wherein the detection unit is further configured to:

When the first connection algorithm is a merge connection algorithm, comparing the key values of the current first data row and the current second data row that have been read, wherein the first data row belongs to the first data set and the second data row belongs to the second data set;

If the key value of the current first data row is less than or equal to the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in that previous comparison the key value of the first data row was less than or equal to the key value of the second data row, adding the preset step size to the execution parameter;

If the key value of the current first data row is greater than the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in that previous comparison the key value of the first data row was greater than the key value of the second data row, adding the preset step size to the execution parameter.
The device according to any one of claims 6 to 9, wherein:

the first connection algorithm is a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm is a zigzag merge connection algorithm or a merge connection algorithm.
A computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the following steps are implemented:

Performing a data connection operation on the first data set and the second data set by the first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

In the execution process of the first connection algorithm, acquiring execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
An electronic device, including a memory and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by one or more processors, and include instructions for performing the following operations:

Performing a data connection operation on the first data set and the second data set by the first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;

In the execution process of the first connection algorithm, acquiring execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;

Switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing to execute the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.