WO2021057088A1 - Data connection method and apparatus and electronic device - Google Patents

Data connection method and apparatus and electronic device

Info

Publication number
WO2021057088A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
connection
algorithm
connection algorithm
row
Prior art date
Application number
PCT/CN2020/094583
Other languages
French (fr)
Chinese (zh)
Inventor
陈萌萌
Original Assignee
蚂蚁金服(杭州)网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 蚂蚁金服(杭州)网络技术有限公司 filed Critical 蚂蚁金服(杭州)网络技术有限公司
Publication of WO2021057088A1 publication Critical patent/WO2021057088A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations

Definitions

  • This specification relates to the field of software technology, and in particular to a data connection method, device and electronic equipment.
  • A join (connection) associates two pieces of data with each other. For example, after data A and data B are connected, data AB is formed, which can be used for data query, file association, and so on. Join is one of the basic relational algebra operations of a database and has a wide range of application scenarios in database implementations. The implementation of the connection algorithm directly affects the efficiency of the connection operator (Join Operator) and has a vital impact on the performance of the entire database. Common connection algorithms (Join Algorithms) include Nested-Loop Join, Merge Join, and Hash Join. Alongside the merge join, the zigzag merge join (Zigzag Merge Join) has also appeared.
  • Merge connection: of the two data sets to be connected, the data in one data set is usually called the left input, and the data in the other data set is called the right input.
  • The required data connection operation is completed by sequentially accessing the records of the left and right inputs. Because of its stringent requirements, the execution efficiency is low in most scenarios.
  • Zigzag merge connection: a variant of the merge connection. The zigzag merge connection uses the index structures of the left and right inputs, and alternately uses the key value of one side to locate the data on the other side, so as to avoid accessing invalid data on either side.
  • However, because the zigzag merge connection uses a search operation to access the next row, when most of the data is valid the search costs more than sequential access, which greatly reduces the efficiency of the zigzag merge connection. That is, both the merge connection and the zigzag merge connection can suffer from low execution efficiency, and a method for improving the execution efficiency of data connections is urgently needed.
  • the embodiments of this specification provide a data connection method, device, and electronic equipment, which are used to solve the technical problem of low data connection execution efficiency.
  • an embodiment of this specification provides a data connection method, and the method includes:
  • Obtaining the execution parameters of the first connection algorithm includes:
  • when the first connection algorithm is a merge connection algorithm, obtaining, as the execution parameter, the number of data rows in one data set that the merge connection algorithm accesses consecutively; or,
  • when the first connection algorithm is a zigzag merge connection algorithm, obtaining, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.
  • Obtaining the execution parameters of the first connection algorithm includes:
  • when the first connection algorithm is a merge connection algorithm, adding a preset step size to the value of the execution parameter each time data rows in one data set are read consecutively, and otherwise subtracting the preset step size from the value of the execution parameter;
  • when the first connection algorithm is a zigzag merge connection algorithm, adding the preset step size to the value of the execution parameter each time the data row found by the search is the same as the row following the previously read data row, and otherwise subtracting the preset step size from the value of the execution parameter.
  • Adding a preset step size to the value of the execution parameter each time data rows in one data set are read consecutively includes:
  • when the first connection algorithm is a merge connection algorithm, comparing the key values of the current first data row and the current second data row that have been read, where the first data row belongs to the first data set and the second data row belongs to the second data set;
  • if the key value of the current first data row is less than or equal to the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row; if, in the previous comparison, the key value of the first data row was less than or equal to the key value of the second data row, adding the preset step size to the execution parameter;
  • if the key value of the current first data row is greater than the key value of the current second data row, obtaining the result of the previous comparison; if, in the previous comparison, the key value of the first data row was greater than the key value of the second data row, adding the preset step size to the execution parameter.
  • The first connection algorithm is a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm is correspondingly a zigzag merge connection algorithm or a merge connection algorithm.
  • this embodiment provides a data connection device, and the device includes:
  • a connection unit, configured to perform a data connection operation on the first data set and the second data set through a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;
  • a detection unit, configured to obtain execution parameters of the first connection algorithm during the execution of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;
  • a switching unit, configured to switch the first connection algorithm to a second connection algorithm according to the execution parameter, and to continue executing the data connection operation through the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
  • The detection unit is configured to: when the first connection algorithm is a merge connection algorithm, obtain, as the execution parameter, the number of data rows in one data set that the merge connection algorithm accesses consecutively; or, when the first connection algorithm is a zigzag merge connection algorithm, obtain, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.
  • The detection unit is configured to: when the first connection algorithm is a merge connection algorithm, add a preset step size to the value of the execution parameter each time data rows in one data set are read consecutively, and otherwise subtract the preset step size from the value of the execution parameter; when the first connection algorithm is a zigzag merge connection algorithm, add the preset step size to the value of the execution parameter each time the data row found by the search is the same as the row following the previously read data row, and otherwise subtract the preset step size from the value of the execution parameter.
  • The detection unit is further configured to: when the first connection algorithm is a merge connection algorithm, compare the key values of the current first data row and the current second data row that have been read, where the first data row belongs to the first data set and the second data row belongs to the second data set; if the key value of the current first data row is less than or equal to the key value of the current second data row, obtain the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was less than or equal to the key value of the second data row, add the preset step size to the execution parameter; if the key value of the current first data row is greater than the key value of the current second data row, obtain the result of the previous comparison, and, if in the previous comparison the key value of the first data row was greater than the key value of the second data row, add the preset step size to the execution parameter.
  • The first connection algorithm is a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm is correspondingly a zigzag merge connection algorithm or a merge connection algorithm.
  • a computer-readable storage medium has a computer program stored thereon, and when the program is executed by a processor, the following steps are implemented:
  • In a fourth aspect, an electronic device includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors;
  • the one or more programs contain instructions for the following operations:
  • The embodiment of this specification provides a data connection method, including: performing a data connection operation on a first data set and a second data set through a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set; during the execution of the first connection algorithm, obtaining execution parameters of the first connection algorithm, where the execution parameters characterize the execution efficiency of the algorithm execution process; and, according to the obtained execution parameters, switching the first connection algorithm to a second connection algorithm.
  • The above method does not prescribe a particular connection algorithm for the data connection. Instead, it switches adaptively according to the execution efficiency of the current connection algorithm, so that the algorithm adjusts itself to different data. It can therefore achieve high execution efficiency in scenarios with different data distributions, overcomes the relatively low efficiency of a single connection algorithm on complex data, and improves the execution efficiency of data connections.
  • FIG. 1 is a schematic flowchart of a data connection method provided by an embodiment of this specification;
  • FIG. 2a is a partial flowchart of the adaptive switching between the zigzag merge connection algorithm and the merge connection algorithm provided by the embodiment of this specification;
  • FIG. 2b is a partial flowchart of the adaptive switching between the zigzag merge connection algorithm and the merge connection algorithm provided by the embodiment of this specification;
  • FIG. 3 is a schematic diagram of the inefficient execution that results when only the zigzag merge connection algorithm is used, provided by the embodiment of this specification;
  • FIG. 4 is a schematic diagram of the execution of the zigzag merge connection algorithm and the merge connection algorithm during adaptive switching, provided by the embodiment of this specification;
  • FIG. 5 is a schematic diagram of a data connection device provided by an embodiment of this specification;
  • FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of this specification.
  • The embodiment of this specification provides a data connection method.
  • The algorithm adapts itself to different data, thereby improving the execution efficiency of the data connection.
  • This embodiment provides a data connection method, which is applied to data processing systems, such as databases, data tables, and other systems that require data connection.
  • The merge connection algorithm and the zigzag merge connection algorithm are used to adaptively connect the first data set and the second data set to be connected. Please refer to FIG. 1.
  • the data connection method includes:
  • Step 10: Perform a data connection operation on the first data set and the second data set by using the first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;
  • Step 11: During the execution of the first connection algorithm, obtain execution parameters of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;
  • Step 12: Switch the first connection algorithm to a second connection algorithm according to the execution parameter, and continue to execute the data connection operation through the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
  • Step 10 may randomly select either the merge connection algorithm or the zigzag merge connection algorithm as the first connection algorithm of the data connection.
  • Alternatively, the first connection algorithm may be set to the algorithm commonly used by each data processing system, according to the usage records of each database.
  • For example, if the zigzag merge connection algorithm is set as the first connection algorithm, the merge connection algorithm is set as the second connection algorithm.
  • The merge connection algorithm then serves as a supplement to the zigzag merge connection algorithm in performing the data connection operation.
  • the left input and the right input respectively contain multiple rows of data, and each row of data corresponds to its own connection key.
  • The left input and the right input can be sorted by the key value of the connection key, for example in ascending order, so that both inputs are ordered on the connection key.
  • Step 1: Obtain one row of data from the left input and one from the right input, and judge whether the key values of their connection keys are equal. If they are equal, go to Step 3; if they are not equal, go to Step 2; if either the left or the right input is exhausted, go to Step 4;
  • Step 2: If the left key value is less than the right key value, read the next row of data on the left; otherwise, read the next row of data on the right. Then go back to Step 1;
  • Step 3: Cache the rows read from the left and right inputs in two temporary buffers respectively, and output the connection result of the two rows. Then read the next row on the right: if its key value is equal to the key value of the previous right row, cache that row in the buffer as well, output the connection result of that right row and the current left row, and continue reading the next right row, looping until the key value of the right row differs from that of the previous row. Then start to read the next row on the left: if its key value is the same as that of the previous left row, scan all the right rows cached in the buffer in turn and perform the connection operation, repeating until the key value of the left row differs from that of the previous row; then return to Step 1;
  • Step 4: The merge connection algorithm ends, and the remaining data on the unfinished side is discarded.
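  • The four steps above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `(key, payload)` row layout and the name `merge_join` are assumptions made for the example, and the two temporary buffers are replaced by list slices over in-memory inputs.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (connection key, payload); hypothetical row layout


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Connect two inputs that are already sorted in ascending key order."""
    out = []
    i = j = 0
    # Step 4: the loop ends as soon as either input is exhausted.
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:        # Step 2: advance the side with the smaller key
            i += 1
        elif lk > rk:
            j += 1
        else:              # Step 3: connect the runs of rows with equal keys
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out
```

  • Every row is visited exactly once, so the cost is linear in the total input size regardless of how many rows actually match.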
  • The execution process of the zigzag merge connection algorithm follows basically the same logic as the merge connection algorithm. The difference lies in the second step: if the left key value is less than the right key value, a search operation is performed on the left input according to the right key value, and the first left row whose key value is greater than or equal to the right key value is taken as the next row of the left input; and vice versa.
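  • A minimal sketch of this variant is given below. It assumes in-memory sorted inputs, and it uses a binary search over the sorted key column (`bisect.bisect_left` from the Python standard library) as a stand-in for the index lookup; a real database would search its own index structure. The name `zigzag_merge_join` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Row = Tuple[int, str]  # (connection key, payload); hypothetical row layout


def zigzag_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Connect two key-sorted inputs, searching instead of scanning on mismatch."""
    lkeys = [k for k, _ in left]    # sorted key columns stand in for the index
    rkeys = [k for k, _ in right]
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = lkeys[i], rkeys[j]
        if lk < rk:
            # Jump to the first left row whose key is >= the right key.
            i = bisect_left(lkeys, rk, i + 1)
        elif lk > rk:
            # Symmetric case: jump on the right input using the left key.
            j = bisect_left(rkeys, lk, j + 1)
        else:
            j_end = j
            while j_end < len(right) and rkeys[j_end] == lk:
                j_end += 1
            while i < len(left) and lkeys[i] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out
```

  • When long stretches of one input have no counterpart on the other side, a single search replaces many sequential reads; when nearly every row matches, each search only finds the adjacent row and costs more than simply reading it.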
  • step 11 is further executed to obtain the execution parameters of the current algorithm, that is, the execution process of the first connection algorithm.
  • the parameters that characterize the execution efficiency of the connection algorithm include connection efficiency, connection output time, and algorithm calculation amount.
  • any of the above parameters or any combination of parameters can be selected as the execution parameters.
  • the weighted sum of each parameter can be used as the execution parameter.
  • In step 12, it can be determined whether the acquired execution parameter is greater than the set threshold. A value greater than the set threshold indicates that the execution efficiency of the algorithm is low and that the algorithm needs to be switched; otherwise, no switch is required.
  • the set threshold can be set according to different algorithms and different efficiency requirements, as long as Condition 1 and Condition 2 are satisfied. This embodiment does not limit the specific value of the set threshold.
  • Condition 1: In the basic merge (merge connection) algorithm, the fewer rows of one data set that are accessed consecutively, the more inclined the method is to switch to the zigzag merge connection algorithm; conversely, the less inclined it is to use the zigzag merge connection algorithm.
  • Condition 2: In the zigzag merge connection algorithm, the fewer rows the search skips, the more inclined the method is to switch to the merge connection algorithm; conversely, the less inclined it is to use the merge connection algorithm.
  • When the algorithm needs to be switched, the first connection algorithm is switched to the second connection algorithm: if the merge connection algorithm is currently used, it is switched to the zigzag merge connection algorithm; conversely, if the zigzag merge connection algorithm is currently used, it is switched to the merge connection algorithm. The data connection operation then continues with the algorithm that was switched to.
  • Step 10 to step 12 are executed cyclically, so that throughout the data processing the algorithm is switched in real time according to its actual execution.
  • Using the connection algorithm that is more efficient for the current data overcomes the technical problem that any single algorithm has low execution efficiency in certain scenarios.
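  • The threshold test of step 12 can be sketched as below. The function name `next_algorithm` and the string labels are hypothetical, and the threshold value itself is left open, as in the embodiment.

```python
MERGE = "merge"
ZIGZAG = "zigzag"


def next_algorithm(current: str, execution_parameter: int, threshold: int) -> str:
    """Return the algorithm to use for the next stretch of the connection."""
    if execution_parameter > threshold:
        # Low efficiency detected: switch to the other algorithm.
        return ZIGZAG if current == MERGE else MERGE
    return current  # efficiency is acceptable, keep the current algorithm


# Usage: a parameter above the threshold flips the algorithm.
print(next_algorithm(ZIGZAG, execution_parameter=5, threshold=3))
```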
  • When the merge connection algorithm is in use, step 11 may obtain the execution parameter by measuring the benefit of the algorithm: the number of data rows in one data set that the merge connection algorithm accesses consecutively is obtained as the execution parameter of the merge connection algorithm.
  • The more rows of one data set the merge connection algorithm accesses consecutively, the smaller the benefit of the algorithm and the lower its execution efficiency, and the more inclined the method is to switch to the zigzag merge connection algorithm. The number of consecutively accessed rows in one data set is therefore used as the execution parameter to characterize execution efficiency.
  • Each row of the left and right inputs can be numbered sequentially; the row number currently reached in a consecutive access minus the row number at which that consecutive access started is the number of consecutively accessed rows.
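  • A small sketch of this bookkeeping follows. The class name `RunTracker` and the side labels are assumptions for illustration; the count is computed exactly as described, as the current row number minus the first row number of the run.

```python
class RunTracker:
    """Tracks how many rows have been accessed consecutively on one side."""

    def __init__(self) -> None:
        self.side = None      # side of the current run ("L" or "R")
        self.first_no = 0     # row number at which the current run started
        self.current_no = 0   # row number accessed most recently

    def access(self, side: str, row_no: int) -> int:
        if side != self.side:
            # The other input was accessed: a new run starts here.
            self.side = side
            self.first_no = row_no
        self.current_no = row_no
        return self.current_no - self.first_no
```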
  • When the zigzag merge connection algorithm is in use, step 11 can likewise obtain the execution parameter by measuring the benefit of the algorithm: the number of data rows skipped by the zigzag merge connection algorithm is obtained as the execution parameter.
  • The zigzag merge connection algorithm obtains the next input row by searching. The fewer rows the zigzag merge connection algorithm can skip, the smaller the benefit of the algorithm and the lower its execution efficiency, and the more inclined the method is to switch to the merge connection algorithm. Using the number of skipped data rows as the execution parameter to characterize execution efficiency is also simple and effective.
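  • The number of rows skipped by one search can be sketched as below, assuming the input keys are held in a sorted Python list and the search is a binary search. The function name `skipped_rows` is hypothetical.

```python
from bisect import bisect_left
from typing import List


def skipped_rows(keys: List[int], pos: int, target: int) -> int:
    """Rows strictly between the current row and the row found by the search.

    keys   -- sorted connection keys of one input
    pos    -- index of the current row in that input
    target -- key value of the current row on the other input
    """
    found = bisect_left(keys, target, pos + 1)  # first row with key >= target
    return found - (pos + 1)
```

  • A result of zero means the search landed on the very next row, so sequential access would have reached the same row without the cost of a search.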
  • This implementation can also use the following method to perform step 11:
  • when the merge connection algorithm is used to produce the connection output, each time data rows in one data set are read consecutively, the preset step size is added to the value of the execution parameter; otherwise, the preset step size is subtracted from the value of the execution parameter;
  • when the zigzag merge connection algorithm is used to produce the connection output, each time the data row found by the search is the same as the row following the previously read data row, the preset step size is added to the value of the execution parameter; otherwise, the preset step size is subtracted from the value of the execution parameter.
  • the initial value of the execution parameter can be set to zero.
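  • The step-based update can be sketched as a small counter. The names `ExecutionParameter` and `zigzag_low_benefit` are hypothetical, and the default step size of 1 is an assumption; the embodiment leaves the preset step size open.

```python
class ExecutionParameter:
    """Feedback counter that starts at zero and moves by a preset step size."""

    def __init__(self, step: int = 1) -> None:
        self.step = step
        self.value = 0  # initial value of the execution parameter

    def update(self, low_benefit: bool) -> int:
        # Add the step when the current algorithm shows no benefit,
        # subtract it otherwise.
        self.value += self.step if low_benefit else -self.step
        return self.value


def zigzag_low_benefit(current_index: int, found_index: int) -> bool:
    """True when the search found the row that sequential access reads next."""
    return found_index == current_index + 1
```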
  • Specifically, the key values of the current first data row and the current second data row that have been read are compared, where the first data row, that is, the row of the left input, belongs to the first data set, and the second data row, that is, the row of the right input, belongs to the second data set.
  • If the key value of the current first data row is less than or equal to the key value of the current second data row, the result of the previous comparison between the key value of the first data row and the key value of the second data row is obtained; if, in the previous comparison, the key value of the first data row was less than or equal to the key value of the second data row, the preset step size is added to the execution parameter. Likewise, if the key value of the current first data row is greater than the key value of the current second data row and the same was true in the previous comparison, the preset step size is added to the execution parameter.
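  • The comparison rule can be sketched as follows. The function name `update_on_comparison` is hypothetical. The sketch only adds the step when two successive comparisons point the same way and leaves the parameter unchanged otherwise, because this passage does not state what happens in the other case.

```python
from typing import Optional, Tuple


def update_on_comparison(
    param: int, step: int, prev_le: Optional[bool], l_key: int, r_key: int
) -> Tuple[int, bool]:
    """Update the execution parameter after one merge-connection comparison.

    prev_le is the result of the previous "l_key <= r_key" comparison,
    or None when there has been no previous comparison.
    Returns the new parameter value and the current comparison result.
    """
    cur_le = l_key <= r_key
    if prev_le is not None and cur_le == prev_le:
        # Same outcome twice in a row: one input is being read consecutively.
        param += step
    return param, cur_le
```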
  • Set l_key and r_key to the theoretical lower bound of the connection key value.
  • l_key corresponds to the key value of the row of the left input.
  • r_key corresponds to the key value of the row of the right input.
  • The theoretical lower bound of the connection key value is the theoretical lower bound of the current numeric type. For example, for 64-bit signed integer data, the theoretical lower bound is -9,223,372,036,854,775,808. Other data types depend on the implementation of each database and may have different theoretical lower bounds.
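  • As a quick check of the figure quoted above, the lower bound of an n-bit two's-complement signed integer is -2^(n-1). The helper name `signed_lower_bound` is hypothetical.

```python
def signed_lower_bound(bits: int) -> int:
    """Smallest value of a two's-complement signed integer with `bits` bits."""
    return -(2 ** (bits - 1))


INT64_MIN = signed_lower_bound(64)

# Both key variables start at the lower bound, so any real key compares
# greater than or equal to them.
l_key = INT64_MIN
r_key = INT64_MIN
```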
  • Step 5: Determine whether the key value of the new_row row pointed to by l_row is equal to the key value of the new_row row pointed to by r_row. If it is, go to step 10; if not, proceed to step 6.
  • the row pointed to by l_row or r_row is also called the current input row or new_row row.
  • begin_left is the pointer used to store the previously read row.
  • direction 0; make input point to the left table; assign l_key to old_key; assign r_key to new_key; point l_row to new_row, and then go to step
  • use_basic is the execution parameter.
  • Steps 1-3 are the initial work part.
  • the initial algorithm logic is the zigzag merge connection algorithm; the overall algorithm includes the merge connection algorithm and the zigzag merge connection algorithm.
  • the main body of the merge connection algorithm is the first Step, the main body of the zigzag merge connection algorithm is the first step.
  • First Step: when the algorithm is a zigzag merge connection, if the next row obtained by the search is the same row as the next row of direct sequential access, the zigzag merge connection brings no benefit. Adding 1 to use_basic indicates that the merge connection algorithm is preferred.
  • Step: in the merge connection algorithm, if the row key value obtained from one side is continuously smaller than the row key value of the other side, use_basic is increased by 1, which means that the method is not inclined to use the zigzag connection algorithm. On the contrary, if the row key value obtained from one side is continuously smaller than the row key value of the other side, use_basic is reduced by 1, indicating that the zigzag connection algorithm is preferred.
  • FIG. 3 shows a scenario in which only the zigzag merge connection algorithm is used for connection processing and the connection processing is inefficient.
  • The data in the first data set is shown in the left table, and the data in the second data set is shown in the right table.
  • When the zigzag merge connection algorithm connects the data in the left and right tables, little data can be skipped.
  • The execution cost of searching by key value is greater than the execution cost of sequential scanning. Therefore, in this scenario the execution cost of the zigzag merge connection algorithm is greater than the execution cost of the merge connection algorithm.
  • For the same scenario, FIG. 4 shows the execution process of this embodiment, in which the execution process of the entire zigzag merge connection is monitored.
  • The improved scheme of the above embodiment combines the logic of the merge connection algorithm with the logic of the zigzag merge connection algorithm.
  • Each algorithm uses its own execution feedback logic (the basic merge algorithm uses the first Step, the zigzag merge connection algorithm uses the first Step) to adjust the switching between the two algorithms in real time. Without manual intervention or hard coding, the algorithm adapts itself to different data, which overcomes the relatively low efficiency that the merge connection algorithm and the zigzag merge connection algorithm are each prone to when used alone.
  • This enables the overall algorithm to achieve better execution results in different data distribution scenarios and enhances the robustness of the merge connection algorithm.
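  • The combined scheme can be sketched as a single loop that carries a feedback counter and flips between the two strategies. This is an illustration under stated assumptions, not the flow of FIG. 2a and FIG. 2b: the name `adaptive_join`, the default threshold and step, the reset of the counter after a switch, and the use of binary search in place of an index lookup are all choices made for the example. The feedback follows the step-size rule described earlier: add the step when the current algorithm shows no benefit, subtract it otherwise, and switch once the counter exceeds the threshold.

```python
from bisect import bisect_left
from typing import List, Tuple

Row = Tuple[int, str]  # (connection key, payload); hypothetical row layout
MERGE, ZIGZAG = "merge", "zigzag"


def adaptive_join(left: List[Row], right: List[Row],
                  threshold: int = 2, step: int = 1, start: str = ZIGZAG):
    """Connect two key-sorted inputs; return (rows, algorithms used in order)."""
    lkeys = [k for k, _ in left]
    rkeys = [k for k, _ in right]
    out, algo, param, history = [], start, 0, [start]
    prev_advance_left = None  # direction of the previous merge comparison
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = lkeys[i], rkeys[j]
        if lk == rk:  # equal keys: connect both runs, as in the merge connection
            j_end = j
            while j_end < len(right) and rkeys[j_end] == lk:
                j_end += 1
            while i < len(left) and lkeys[i] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
            prev_advance_left = None
            continue
        advance_left = lk < rk
        if algo == MERGE:
            # No benefit when the same side is advanced twice in a row:
            # rows of one input are being read consecutively without a match.
            low_benefit = prev_advance_left == advance_left
            prev_advance_left = advance_left
            if advance_left:
                i += 1
            else:
                j += 1
        elif advance_left:
            nxt = bisect_left(lkeys, rk, i + 1)
            low_benefit = nxt == i + 1  # search found the adjacent row
            i = nxt
        else:
            nxt = bisect_left(rkeys, lk, j + 1)
            low_benefit = nxt == j + 1
            j = nxt
        param += step if low_benefit else -step
        if param > threshold:  # efficiency is low: switch algorithms
            algo = ZIGZAG if algo == MERGE else MERGE
            param, prev_advance_left = 0, None
            history.append(algo)
    return out, history
```

  • On tightly interleaved keys the searches keep finding the adjacent row, so the counter climbs and the loop falls back to sequential merging; the counter is not clamped, which is a simplification a real implementation might revisit.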
  • Please refer to FIG. 5.
  • This embodiment is based on the data connection method provided in FIG. 1, and correspondingly provides a data processing device, which includes:
  • the connection unit 51 is configured to perform a data connection operation on the first data set and the second data set through a first connection algorithm, so as to connect data with the same key value in the first data set and the second data set;
  • the detection unit 52 is configured to obtain execution parameters of the first connection algorithm during the execution of the first connection algorithm, where the execution parameters are used to characterize the execution efficiency of the algorithm execution process;
  • the switching unit 53 is configured to switch the first connection algorithm to the second connection algorithm according to the execution parameter, and to continue executing the data connection operation through the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
  • When the detection unit 52 obtains the execution parameter: if the first connection algorithm is a merge connection algorithm, the number of data rows in one data set that the merge connection algorithm accesses consecutively may be obtained as the execution parameter; if the first connection algorithm is a zigzag merge connection algorithm, the number of data rows skipped by the zigzag merge connection algorithm is obtained as the execution parameter.
  • Alternatively, when the detection unit 52 obtains the execution parameter: if the first connection algorithm is a merge connection algorithm, each time data rows in one data set are read consecutively, the preset step size is added to the value of the execution parameter, and otherwise the preset step size is subtracted from the value of the execution parameter; if the first connection algorithm is a zigzag merge connection algorithm, each time the data row found by the search is the same as the row following the previously read data row, the preset step size is added to the value of the execution parameter, and otherwise the preset step size is subtracted from the value of the execution parameter.
  • When the first connection algorithm is a merge connection algorithm, the key values of the current first data row and the current second data row that have been read are compared, where the first data row belongs to the first data set and the second data row belongs to the second data set. If the key value of the current first data row is less than or equal to the key value of the current second data row, the result of the previous comparison between the key value of the first data row and the key value of the second data row is obtained; if, in the previous comparison, the key value of the first data row was less than or equal to the key value of the second data row, the preset step size is added to the execution parameter. If the key value of the current first data row is greater than the key value of the current second data row, the result of the previous comparison is obtained; if, in the previous comparison, the key value of the first data row was greater than the key value of the second data row, the preset step size is added to the execution parameter.
  • The first connection algorithm may be a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm may correspondingly be a zigzag merge connection algorithm or a merge connection algorithm.
  • FIG. 6 is a block diagram of an electronic device 700 for implementing a data query method according to an exemplary embodiment.
  • the electronic device 700 may be a computer, a database console, a tablet device, a personal digital assistant, or the like.
  • the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, an input/output (I/O) interface 710, and a communication component 712.
  • the processing component 702 generally controls the overall operations of the electronic device 700, such as operations associated with display, data communication, and recording operations.
  • the processing component 702 may include one or more processors 720 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 702 may include one or more modules to facilitate the interaction between the processing component 702 and other components.
  • the memory 704 is configured to store various types of data to support operations in the electronic device 700. Examples of these data include instructions for any application or method operating on the electronic device 700, contact data, phone book data, messages, pictures, videos, etc.
  • The memory 704 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 706 provides power for various components of the electronic device 700.
  • the power supply component 706 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 700.
  • the I/O interface 710 provides an interface between the processing component 702 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the communication component 712 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices.
  • the electronic device 700 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 712 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 712 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to implement the above methods.
  • a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 704 including instructions; the instructions may be executed by the processor 720 of the electronic device 700 to complete the foregoing method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • a non-transitory computer-readable storage medium: when instructions in the storage medium are executed by a processor of a mobile terminal, the electronic device is enabled to execute a data query method, the method including:

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephone Function (AREA)

Abstract

A data connection method and apparatus and an electronic device. The method comprises: carrying out a data connection operation on a first data set and a second data set by means of a first connection algorithm, to connect data of the same key value in the first data set and the second data set; in the execution process of the first connection algorithm, obtaining an execution parameter representing the execution efficiency of the algorithm in the execution process of the first connection algorithm; and switching the first connection algorithm into a second connection algorithm according to the execution parameter, and continuing executing the data connection operation by means of the second connection algorithm. According to the method, automatic switching of the connection algorithm is realized according to the algorithm execution state in the data connection process.

Description

Method, apparatus, and electronic device for data connection
Technical field
This specification relates to the field of software technology, and in particular to a data connection method, apparatus, and electronic device.
Background
A join (referred to in this specification as a data connection) associates two pieces of data; for example, joining data A and data B yields data A-B, which can be used for data queries, file association, and so on. Join is one of the basic relational algebra operations of a database and is widely used in database implementations. The implementation of the join algorithm directly determines the efficiency of the join operator and has a critical impact on the performance of the whole database. Common join algorithms include nested-loop join, merge join, and hash join. A variant of merge join, the zigzag merge join, has also been proposed.
Merge join: given two data sets, the data of one is usually called the left input and the data of the other the right input. When both inputs are sorted by the join key, the required data connection operation is completed by sequentially accessing the records of the left and right inputs. Because of this strict requirement, execution efficiency is low in most scenarios. Zigzag merge join: a variant implementation of merge join. It uses the index structures of the left and right inputs and alternately uses the key value from one side to locate data on the other side, which avoids accessing rows on either side that cannot match. However, because the zigzag merge join uses a seek (lookup) operation to reach the next row, when most of the data is valid the seek costs more than sequential access, and the efficiency of the zigzag merge join drops sharply. In other words, both merge join and zigzag merge join can suffer from low execution efficiency, and a method for improving the execution efficiency of data connection is needed.
Summary of the invention
The embodiments of this specification provide a data connection method, apparatus, and electronic device to solve the technical problem of low execution efficiency of data connection.
In a first aspect, an embodiment of this specification provides a data connection method, the method including:
performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, to connect data with equal key values in the first data set and the second data set;
during the execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter characterizing the execution efficiency of the algorithm's execution process; and
switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
Optionally, obtaining the execution parameter of the first connection algorithm includes:
when the first connection algorithm is a merge join algorithm, obtaining the number of data rows that the merge join algorithm accesses consecutively in one data set as the execution parameter; or
when the first connection algorithm is a zigzag merge join algorithm, obtaining the number of data rows skipped by the zigzag merge join algorithm as the execution parameter.
Optionally, obtaining the execution parameter of the first connection algorithm includes:
when the first connection algorithm is a merge join algorithm, adding a preset step size to the value of the execution parameter each time data rows of one data set are read consecutively, and otherwise subtracting the preset step size from the value of the execution parameter;
when the first connection algorithm is a zigzag merge join algorithm, adding the preset step size to the value of the execution parameter each time the data row found by the seek is the row immediately following the previously read data row, and otherwise subtracting the preset step size from the value of the execution parameter.
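The step-based update described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `EfficiencyCounter`, the two helper predicates, and the use of a side label (`"left"`/`"right"`) and row indexes are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class EfficiencyCounter:
    """Running execution parameter (hypothetical helper, not from the source)."""
    step: int = 1    # the "preset step size"
    value: int = 0   # current value of the execution parameter

    def record(self, inefficient: bool) -> int:
        # Add the step when the current algorithm behaved inefficiently,
        # subtract it otherwise, and return the updated value.
        self.value += self.step if inefficient else -self.step
        return self.value


def merge_event_is_inefficient(prev_side, cur_side):
    # Merge join: two reads in a row came from the same data set.
    return prev_side is not None and prev_side == cur_side


def zigzag_event_is_inefficient(found_index, prev_index):
    # Zigzag merge join: the seek landed on the row right after the
    # previously read row, so a sequential read would have been enough.
    return found_index == prev_index + 1
```

A caller would feed one event per row advance into `record` and compare the returned value against a threshold when deciding whether to switch algorithms.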
Optionally, when the first connection algorithm is a merge join algorithm, adding the preset step size to the value of the execution parameter each time data rows of one data set are read consecutively includes:
when the first connection algorithm is a merge join algorithm, comparing the key value of the current first data row with the key value of the current second data row, where the first data row belongs to the first data set and the second data row belongs to the second data set;
if the key value of the current first data row is less than or equal to the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was less than or equal to the key value of the second data row, adding the preset step size to the execution parameter;
if the key value of the current first data row is greater than the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was greater than the key value of the second data row, adding the preset step size to the execution parameter.
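As a rough sketch of the comparison-based update above: the function below keeps the previous comparison result and adds the step when the current result matches it. The function name, the tuple return value, and the handling of the very first comparison (no change) are assumptions for illustration; the subtraction in the non-matching case follows the "otherwise subtract" rule stated earlier in the text.

```python
def update_on_comparison(value, step, left_key, right_key, prev_left_le_right):
    """Return (new_value, current_result) for one key comparison.

    prev_left_le_right is the previous comparison result (True if the left
    key was <= the right key), or None if there was no previous comparison.
    """
    cur = left_key <= right_key
    if prev_left_le_right is None:
        # Assumption: the first comparison has nothing to compare against,
        # so the execution parameter is left unchanged.
        return value, cur
    if cur == prev_left_le_right:
        # Same outcome twice in a row: the same data set is being read
        # consecutively, so add the preset step size.
        return value + step, cur
    # Outcome flipped: subtract the preset step size.
    return value - step, cur
```

Feeding the key pairs (1, 5), (2, 5), (3, 5), (7, 5) through this function with a step of 1 raises the parameter twice and then lowers it once.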
Optionally, the first connection algorithm is a merge join algorithm or a zigzag merge join algorithm, and the second connection algorithm is correspondingly a zigzag merge join algorithm or a merge join algorithm.
In a second aspect, this embodiment provides a data connection apparatus, the apparatus including:
a connection unit, configured to perform a data connection operation on a first data set and a second data set by means of a first connection algorithm, to connect data with equal key values in the first data set and the second data set;
a detection unit, configured to obtain, during the execution of the first connection algorithm, an execution parameter of the first connection algorithm, the execution parameter characterizing the execution efficiency of the algorithm's execution process; and
a switching unit, configured to switch the first connection algorithm to a second connection algorithm according to the execution parameter, and to continue the data connection operation by means of the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
Optionally, the detection unit is configured to: when the first connection algorithm is a merge join algorithm, obtain the number of data rows that the merge join algorithm accesses consecutively in one data set as the execution parameter; or, when the first connection algorithm is a zigzag merge join algorithm, obtain the number of data rows skipped by the zigzag merge join algorithm as the execution parameter.
Optionally, the detection unit is configured to: when the first connection algorithm is a merge join algorithm, add a preset step size to the value of the execution parameter each time data rows of one data set are read consecutively, and otherwise subtract the preset step size from the value of the execution parameter; when the first connection algorithm is a zigzag merge join algorithm, add the preset step size to the value of the execution parameter each time the data row found by the seek is the row immediately following the previously read data row, and otherwise subtract the preset step size from the value of the execution parameter.
Optionally, the detection unit is further configured to: when the first connection algorithm is a merge join algorithm, compare the key value of the current first data row with the key value of the current second data row, where the first data row belongs to the first data set and the second data row belongs to the second data set; if the key value of the current first data row is less than or equal to the key value of the current second data row, obtain the result of the previous comparison, and, if in the previous comparison the key value of the first data row was less than or equal to the key value of the second data row, add the preset step size to the execution parameter; if the key value of the current first data row is greater than the key value of the current second data row, obtain the result of the previous comparison, and, if in the previous comparison the key value of the first data row was greater than the key value of the second data row, add the preset step size to the execution parameter.
Optionally, the first connection algorithm is a merge join algorithm or a zigzag merge join algorithm, and the second connection algorithm is correspondingly a zigzag merge join algorithm or a merge join algorithm.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the following steps:
performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, to connect data with equal key values in the first data set and the second data set;
during the execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter characterizing the execution efficiency of the algorithm's execution process; and
switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
In a fourth aspect, an electronic device is provided, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for:
performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, to connect data with equal key values in the first data set and the second data set;
during the execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter characterizing the execution efficiency of the algorithm's execution process; and
switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
The one or more technical solutions above in the embodiments of this specification have at least the following technical effects:
The embodiments of this specification provide a data connection method, including: performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, to connect data with equal key values in the first data set and the second data set; during the execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter characterizing the execution efficiency of the algorithm's execution process; and choosing, according to the obtained execution parameter, to switch the first connection algorithm to a second connection algorithm. The method does not fix a single connection algorithm for the data connection. Instead, it switches adaptively according to the execution efficiency of the current connection algorithm, so that the algorithm adapts to different data and reaches high execution efficiency in scenarios with different data distributions. This overcomes the inefficiency of a single connection algorithm on complex data and improves the execution efficiency of data connection.
Description of the drawings
To describe the technical solutions in the embodiments of this specification more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data connection method provided by an embodiment of this specification;
FIG. 2a is a partial flowchart of adaptive switching between the zigzag merge join algorithm and the merge join algorithm provided by an embodiment of this specification;
FIG. 2b is a partial flowchart of adaptive switching between the zigzag merge join algorithm and the merge join algorithm provided by an embodiment of this specification;
FIG. 3 is a schematic diagram of inefficient execution when only the zigzag merge join algorithm is used, provided by an embodiment of this specification;
FIG. 4 is a schematic diagram of execution with adaptive switching between the zigzag merge join algorithm and the merge join algorithm, provided by an embodiment of this specification;
FIG. 5 is a schematic diagram of a data connection apparatus provided by an embodiment of this specification;
FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of this specification.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of this specification clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort fall within the protection scope of this specification.
The embodiments of this specification provide a data connection method that obtains an execution parameter characterizing the execution efficiency of the connection algorithm and switches the connection algorithm based on that parameter, so that the algorithm adapts to different data and the execution efficiency of data connection improves.
The main implementation principles, specific implementations, and corresponding beneficial effects of the technical solutions of the embodiments of this specification are described in detail below with reference to the drawings.
This embodiment provides a data connection method, applied to data processing systems that need to perform data connection, such as databases and data tables. It uses the merge join algorithm and the zigzag merge join algorithm to adaptively connect the first data set and the second data set to be connected. Referring to FIG. 1, the data connection method includes:
Step 10: perform a data connection operation on the first data set and the second data set by means of the first connection algorithm, to connect data with equal key values in the first data set and the second data set;
Step 11: during the execution of the first connection algorithm, obtain an execution parameter of the first connection algorithm, the execution parameter characterizing the execution efficiency of the algorithm's execution process;
Step 12: switch the first connection algorithm to a second connection algorithm according to the execution parameter, and continue the data connection operation by means of the second connection algorithm, where the second connection algorithm is different from the first connection algorithm.
When the data connection is initialized, step 10 may randomly select either the merge join algorithm or the zigzag merge join algorithm as the first connection algorithm. Alternatively, step 10 may set the first connection algorithm to the algorithm commonly used by the data processing system, according to the usage records of each database. For example, the zigzag merge join algorithm is set as the first connection algorithm and the merge join algorithm as the second connection algorithm, so that the merge join algorithm supplements the zigzag merge join algorithm in performing the data connection operation.
To perform the data connection operation, the first data set and the second data set are each loaded into memory. By convention, the data of the first data set is called the left input and the data of the second data set the right input. Each input contains multiple rows of data, and each row has its own join key. Before step 10, the left input and the right input can each be sorted by the value of the join key, for example in ascending order, so that both inputs are ordered on the join key. With both inputs ordered on the join key, the merge join algorithm can execute quickly and the key lookups of the zigzag merge join algorithm also become more efficient.
With both inputs ordered on the join key, the merge join algorithm executes as follows:
Step 1: obtain one row of data from the left input and one from the right input and check whether the key values of their join keys are equal. If they are equal, go to step 3; if they are not equal, go to step 2; if either input is exhausted, go to step 4;
Step 2: if the left key value is less than the right key value, read the next row on the left; otherwise, read the next row on the right; then return to step 1;
Step 3: cache the data of the left and right input rows just read in two temporary buffers, and output the join result of these two rows. Then read the next row on the right; if its key value equals that of the previous right row, cache it in the buffer as well, output the join result of this right row and the current left row, and continue with the next right row, repeating until the key value of the right row differs from that of the previous row. Next, read the next row on the left; if its key value equals that of the previous left row, scan all the right rows cached in the buffer in turn and join each with it, repeating until the key value of the left row differs from that of the previous row; then stop and return to step 1;
Step 4: the merge join algorithm ends, and the remaining data on the unfinished side is discarded.
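The four steps above can be sketched in Python as follows. This is an illustrative reading, not the patent's code: rows are assumed to be `(key, value)` tuples held in lists already sorted by key, and the function name `merge_join` is chosen for the example. A slice of the right input stands in for the temporary buffer of step 3.

```python
def merge_join(left, right):
    """Join two lists of (key, value) rows that are sorted by key."""
    out = []
    i = j = 0
    # Step 4: stop as soon as either input is exhausted.
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            # Step 2: the left key is smaller, read the next left row.
            i += 1
        elif lk > rk:
            # Step 2: the right key is smaller, read the next right row.
            j += 1
        else:
            # Step 3: find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # ... and join every left row with the same key against it.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out
```

Every non-matching row is visited one at a time in step 2, which is the cost the zigzag variant tries to avoid.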
The zigzag merge join algorithm follows essentially the same logic as the merge join algorithm. The difference is in step 2: if the left key value is less than the right key value, a seek is performed on the left input using the right key value, and the first left row whose key value is greater than or equal to the right key value becomes the next left row; the opposite case is handled symmetrically.
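A minimal sketch of the seek-based step 2, assuming each input is a sorted list of `(key, value)` tuples and using binary search from the standard `bisect` module as a stand-in for the index lookup (the source does not say what index structure is used). The function name is illustrative.

```python
from bisect import bisect_left


def zigzag_merge_join(left, right):
    """Join two sorted lists of (key, value) rows, seeking past non-matches."""
    lkeys = [k for k, _ in left]
    rkeys = [k for k, _ in right]
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = lkeys[i], rkeys[j]
        if lk < rk:
            # Seek: first left row at or after i + 1 whose key is >= rk.
            i = bisect_left(lkeys, rk, i + 1)
        elif lk > rk:
            # Symmetric seek on the right input.
            j = bisect_left(rkeys, lk, j + 1)
        else:
            # Equal keys: same handling as in the plain merge join.
            j_end = j
            while j_end < len(right) and rkeys[j_end] == lk:
                j_end += 1
            while i < len(left) and lkeys[i] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out
```

When the seek keeps landing on the very next row, each binary search does more work than a plain sequential read would, which is the inefficient case the adaptive method is meant to detect.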
During the execution of step 10, step 11 is further executed to obtain the execution parameter of the execution process of the current algorithm, that is, the first connection algorithm. Parameters that characterize the execution efficiency of a connection algorithm include connection efficiency, connection output duration, the amount of computation of the algorithm, and so on. Step 11 may use any one of these parameters, or a combination of them, as the execution parameter. When two or more parameters are combined, their weighted sum can be used as the execution parameter. Step 12 is then executed on the obtained execution parameter.
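The weighted combination mentioned above could look like the sketch below. The metric names (`output_time_ms`, `comparisons`) and the weights are hypothetical values chosen for the example; the source does not specify which metrics are combined or how they are weighted.

```python
def combined_execution_parameter(metrics, weights):
    """Weighted sum of the metrics named in `weights`.

    Both arguments are dicts keyed by metric name; a metric without a
    weight is ignored.
    """
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical measurements and weights, for illustration only.
metrics = {"output_time_ms": 12.0, "comparisons": 300.0}
weights = {"output_time_ms": 0.5, "comparisons": 0.01}
execution_parameter = combined_execution_parameter(metrics, weights)
```

The weights also serve to bring metrics with different units onto a comparable scale before the result is compared with a threshold.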
执行步骤12时,可以判断获取到的执行参数是否大于设定阈值。大于设定阈值表明算法执行效率偏低,需要进行算法切换,反之,则不需要进行算法切换。该设定阈值可以根据不同的算法和不同的效率需求进行设定,满足条件1和条件2即可,本实施例不限制设定阈值的具体取值。条件1,在基础合并算法时,若出现连续访问一个数据集中的数据行的行越少越倾向于切换使用之字形合并连接算法,反之越不倾向于使用之字形合并连接算法。条件2,在之字形基础合并算法时,若出现查找获得的行跳过的行越少越倾向于切换使用合并连接算法,反之越不倾向于使用合并连接算法。When step 12 is executed, it can be determined whether the acquired execution parameter is greater than the set threshold. Greater than the set threshold indicates that the algorithm execution efficiency is low, and algorithm switching is required. Otherwise, algorithm switching is not required. The set threshold can be set according to different algorithms and different efficiency requirements, as long as Condition 1 and Condition 2 are satisfied. This embodiment does not limit the specific value of the set threshold. Condition 1: In the basic merge algorithm, if there are fewer rows that continuously access data rows in a data set, the more likely it is to switch to the zigzag merge connection algorithm, and vice versa, the less inclined to use the zigzag merge connection algorithm. Condition 2: In the zigzag basic merge algorithm, if there are fewer lines skipped by the search, the more likely it is to switch to the merge connection algorithm, and vice versa, the less inclined to use the merge connection algorithm.
If the execution parameter is greater than the set threshold, the current connection algorithm is switched, that is, the first connection algorithm is switched to the second connection algorithm: if the merge connection algorithm is currently in use, it is switched to the zigzag merge connection algorithm; conversely, if the zigzag merge connection algorithm is currently in use, it is switched to the merge connection algorithm. The data connection operation then continues with the algorithm that was switched to.
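The threshold check of step 12 and the resulting switch can be sketched as below. The constant and function names are hypothetical, introduced only for illustration; the embodiment does not prescribe them.

```python
MERGE = "merge"    # merge connection algorithm
ZIGZAG = "zigzag"  # zigzag merge connection algorithm


def next_algorithm(current, execution_parameter, threshold):
    """Switch to the other algorithm when the parameter exceeds the threshold."""
    if execution_parameter > threshold:
        return ZIGZAG if current == MERGE else MERGE
    return current  # efficiency is acceptable, keep the current algorithm


switched = next_algorithm(MERGE, execution_parameter=5, threshold=3)
unchanged = next_algorithm(ZIGZAG, execution_parameter=3, threshold=3)
```

Note that the comparison is strict: a parameter equal to the threshold does not trigger a switch.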
Throughout the data connection operation, steps 10 to 12 are executed in a loop, so that the algorithm is switched in real time according to how it is actually performing and the connection algorithm that is more efficient for the current data is used. This overcomes the technical problem that any single algorithm executes inefficiently in some scenarios.
For the merge connection algorithm, step 11 may obtain the execution parameter by measuring the benefit of the algorithm: the number of rows of one data set that the merge connection algorithm accesses consecutively is used as its execution parameter. The more rows of one data set the merge connection algorithm accesses consecutively, the smaller its benefit, the lower its execution efficiency, and the more the method tends to switch to the zigzag merge connection algorithm; using this row count as the execution parameter is therefore a simple and effective way to characterize execution efficiency. Specifically, each row of the left and right inputs may first be numbered sequentially; the number of consecutively accessed rows is then the number of the current row in the run minus the number of the first row in the run.
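The row-number bookkeeping described above can be sketched as follows. This is an illustration only: the `RunTracker` class and the "left"/"right" labels are hypothetical names, and rows are assumed to carry sequential numbers starting at 0.

```python
class RunTracker:
    """Track how many rows of the same input have been accessed in a row."""

    def __init__(self):
        self.side = None              # input accessed by the current run
        self.first_row_number = None  # number of the first row in the run

    def observe(self, side, row_number):
        """Record an access and return current row number minus first row number."""
        if side != self.side:
            # A different input is accessed: a new run starts here.
            self.side = side
            self.first_row_number = row_number
        return row_number - self.first_row_number


tracker = RunTracker()
accesses = [("left", 0), ("left", 1), ("left", 2), ("right", 0), ("left", 3)]
runs = [tracker.observe(side, number) for side, number in accesses]
```

Here `runs` grows while the left input is read repeatedly and drops back to zero as soon as the right input is accessed.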
For the zigzag merge connection algorithm, step 11 may likewise obtain the execution parameter by measuring the benefit of the algorithm: the number of data rows skipped by the zigzag merge connection algorithm is used as the execution parameter. The zigzag merge connection algorithm obtains its next input row by lookup. The fewer rows a lookup is found to skip, the smaller the benefit, the lower the execution efficiency, and the more the method tends to switch to the merge connection algorithm; using the number of skipped rows as the execution parameter is likewise simple and effective. Specifically, each row of the left and right inputs may first be numbered sequentially. Let A be the number of the row immediately following the row currently accessed by the zigzag merge connection algorithm, and let B be the number of the next row obtained by its lookup; B minus A is the number of rows skipped.
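A small sketch of the B minus A computation follows. It assumes the input is an in-memory ascending list of keys and uses Python's `bisect_left` as a stand-in for the lookup function; a real database would use an index lookup instead, and the function name `skipped_rows` is hypothetical.

```python
from bisect import bisect_left


def skipped_rows(sorted_keys, current_index, target_key):
    """Return how many rows a lookup for target_key skips.

    A is the number of the row right after the current one; B is the number
    of the first row whose key is greater than or equal to target_key.
    """
    a = current_index + 1
    b = bisect_left(sorted_keys, target_key, a)
    return b - a


keys = [1, 2, 3, 4, 10, 11]
many_skipped = skipped_rows(keys, current_index=0, target_key=10)
none_skipped = skipped_rows(keys, current_index=0, target_key=2)
```

When the lookup lands on the very next row, the result is zero, which is the case where the lookup brings no benefit over a sequential read.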
Based on the same idea of using the benefit of the algorithm as the execution parameter, this embodiment may also execute step 11 as follows. When the merge connection algorithm is producing the connection output, a preset step is added to the value of the execution parameter each time rows of the same data set are read consecutively; otherwise, the preset step is subtracted. When the zigzag merge connection algorithm is producing the connection output, the preset step is added to the value of the execution parameter each time the row found by lookup is the same as the row immediately following the previously read row; otherwise, the preset step is subtracted. The initial value of the execution parameter may be set to zero. Because the execution parameter is obtained by accumulating steps, the left and right input rows do not need to be numbered, which saves storage space and computation and further improves execution efficiency.
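The cumulative add-or-subtract scheme can be sketched as below. The function name and the boolean encoding of observations are assumptions made for illustration; the step size of 1 is also only an example of a preset step.

```python
def accumulate(observations, step=1, initial=0):
    """Return the execution parameter after each observation.

    Each observation is True when the monitored condition occurred
    (the step is added) and False otherwise (the step is subtracted).
    """
    value = initial
    history = []
    for occurred in observations:
        value += step if occurred else -step
        history.append(value)
    return history


# Example: the condition occurs twice, fails once, then occurs again.
history = accumulate([True, True, False, True])
```

Only a single running value is kept, which is why this variant needs no per-row numbering.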
When the merge connection algorithm is producing the connection output, the key values of the current first data row and the current second data row are compared, where the first data row is the row from the left input and belongs to the first data set, and the second data row is the row from the right input and belongs to the second data set. If the key value of the current first data row is less than or equal to that of the current second data row, the result of the previous comparison is retrieved; if in the previous comparison the key value of the first data row was also less than or equal to that of the second data row, the preset step is added to the execution parameter. If the key value of the current first data row is greater than that of the current second data row, the result of the previous comparison is retrieved; if in the previous comparison the key value of the first data row was also greater than that of the second data row, the preset step is added to the execution parameter.
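The comparison rule above can be sketched as follows. The function name is hypothetical, the encoding 0 (left key less than or equal to right key) and 1 (left key greater) is an assumption of this sketch, and leaving the parameter unchanged on the very first comparison is a choice made here because no previous result exists yet.

```python
def update_for_merge(parameter, left_key, right_key, previous_direction, step=1):
    """Update the execution parameter from the current and previous comparison.

    Returns the new parameter and the direction of the current comparison.
    """
    direction = 0 if left_key <= right_key else 1
    if previous_direction is None:
        return parameter, direction      # first comparison: nothing to compare with
    if direction == previous_direction:
        parameter += step                # same outcome as last time
    else:
        parameter -= step                # outcome changed
    return parameter, direction


parameter, direction = 0, None
for left_key, right_key in [(1, 5), (2, 5), (3, 5), (6, 5)]:
    parameter, direction = update_for_merge(parameter, left_key, right_key, direction)
```

In this run the first three comparisons have the same outcome, so the parameter rises to 2, and the fourth comparison flips the outcome and brings it back to 1.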
Please refer to FIG. 2a and FIG. 2b, which show a complete execution flow of this embodiment:
①. Set l_key and r_key to the theoretical lower bound of the connection key. l_key is the key value of the left input row and r_key is the key value of the right input row. The theoretical lower bound of the connection key is normally the theoretical lower bound of its data type; for example, for 64-bit signed integers it is -9,223,372,036,854,775,808. Other data types may have different theoretical lower bounds depending on the database implementation.
②. Use the lookup function to find the first row whose key is greater than or equal to l_key, and make l_row point to it; this is the initial first data row.
③. Use the lookup function to find the first row whose key is greater than or equal to r_key, and make r_row point to it; this is the initial second data row.
④. Determine whether the key value of the row pointed to by l_row is less than or equal to the key value of the row pointed to by r_row; if so, go to step ⑤; if not, go to step [Figure PCTCN2020094583-appb-000001].
⑤. Determine whether the key value of the new_row row pointed to by l_row is equal to the key value of the new_row row pointed to by r_row; if so, go to step ⑩; if not, go to step ⑥. The row pointed to by l_row or r_row is also called the current input row or the new_row row.
⑥. Determine whether begin_left is non-empty; if so, go to step ⑦; if not, go to step ⑨. begin_left is a pointer used to store a previously read row.
⑦. Assign begin_left to l_row.
⑧. Set begin_left to empty.
⑨. Set direction=0; make input point to the left table; assign l_key to old_key; assign r_key to new_key; point l_row to new_row; then go to step [Figure PCTCN2020094583-appb-000002]. Here, direction=0 indicates that in the previous comparison of the left and right row key values the left-table row was less than or equal to the right-table row, and direction=1 indicates that the left-table row was greater than the right-table row.
⑩. Output the connection result of the rows pointed to by l_row and r_row.
[Figure PCTCN2020094583-appb-000003]. Determine whether begin_left is empty; if so, go to step [Figure PCTCN2020094583-appb-000004]; if not, go to step [Figure PCTCN2020094583-appb-000005].
[Figure PCTCN2020094583-appb-000006]. Set begin_left=l_row.
[Figure PCTCN2020094583-appb-000007]. Determine whether current_mode is equal to 1; if so, go to step [Figure PCTCN2020094583-appb-000008]; if not, go to step ⑨. Here, current_mode=1 indicates that the merge connection algorithm is currently in use, and current_mode=0 indicates that the zigzag merge connection algorithm is currently in use.
[Figure PCTCN2020094583-appb-000009]. Determine whether direction is equal to 0; if so, go to step [Figure PCTCN2020094583-appb-000010]; if not, go to step [Figure PCTCN2020094583-appb-000011].
[Figure PCTCN2020094583-appb-000012]. Decrease use_basic by 1, and go to step ⑨. use_basic is the execution parameter.
[Figure PCTCN2020094583-appb-000013]. Increase use_basic by 1, and go to step ⑨.
[Figure PCTCN2020094583-appb-000014]. Determine whether current_mode is equal to 1; if so, continue with step [Figure PCTCN2020094583-appb-000015]; if not, go to step [Figure PCTCN2020094583-appb-000016].
[Figure PCTCN2020094583-appb-000017]. Determine whether direction is equal to 1; if so, continue with step [Figure PCTCN2020094583-appb-000018]; if not, go to step [Figure PCTCN2020094583-appb-000019].
[Figure PCTCN2020094583-appb-000020]. Increase use_basic by 1, and go to step [Figure PCTCN2020094583-appb-000021].
[Figure PCTCN2020094583-appb-000022]. Decrease use_basic by 1, and go to step [Figure PCTCN2020094583-appb-000023].
[Figure PCTCN2020094583-appb-000024]. Set direction=1; make input point to the right table; assign r_key to old_key; assign l_key to new_key; point r_row to new_row; then continue with step [Figure PCTCN2020094583-appb-000025].
[Figure PCTCN2020094583-appb-000026]. Set current_mode=(use_basic>threshold): if use_basic is greater than the threshold, set current_mode=1 so that the merge connection algorithm is used; otherwise, set current_mode=0 so that the zigzag merge connection algorithm is used. That is, the algorithm is switched when the execution parameter is greater than the threshold.
[Figure PCTCN2020094583-appb-000027]. Determine whether current_mode is equal to 0; if so, go to step [Figure PCTCN2020094583-appb-000028]; if not, go to step [Figure PCTCN2020094583-appb-000029].
[Figure PCTCN2020094583-appb-000030]. Use the next function to read the row following old_key in the current input, and make next_row point to it.
[Figure PCTCN2020094583-appb-000031]. Use the lookup function to find the first row whose key is greater than or equal to new_key, and make new_row point to it.
[Figure PCTCN2020094583-appb-000032]. Determine whether next_row and new_row point to the same row; if so, go to step [Figure PCTCN2020094583-appb-000033]; if not, go to step [Figure PCTCN2020094583-appb-000034].
[Figure PCTCN2020094583-appb-000035]. Increase use_basic by 1, and go to step [Figure PCTCN2020094583-appb-000036].
[Figure PCTCN2020094583-appb-000037]. Decrease use_basic by 1, and go to step [Figure PCTCN2020094583-appb-000038].
[Figure PCTCN2020094583-appb-000039]. Set current_mode=1. This step may also be skipped, proceeding directly to the next step [Figure PCTCN2020094583-appb-000040].
[Figure PCTCN2020094583-appb-000041]. Use the next function to read the row following old_key in the current input, and make new_row point to it.
[Figure PCTCN2020094583-appb-000042]. Determine whether direction is equal to 0; if so, go to step [Figure PCTCN2020094583-appb-000043]; if not, go to step [Figure PCTCN2020094583-appb-000044].
[Figure PCTCN2020094583-appb-000045]. Make l_row point to the row pointed to by new_row, and go to step [Figure PCTCN2020094583-appb-000046].
[Figure PCTCN2020094583-appb-000047]. Make r_row point to the row pointed to by new_row, and go to step [Figure PCTCN2020094583-appb-000048].
[Figure PCTCN2020094583-appb-000049]. Determine whether new_row is empty; if so, end; if not, go to step ④.
FIG. 2 shows the logic of the data connection method provided in this embodiment. Steps ① to ③ are the initialization part, and the initial algorithm logic is the zigzag merge connection algorithm. The overall algorithm consists of two parts, the merge connection algorithm and the zigzag merge connection algorithm: the main body of the merge connection algorithm is step [Figure PCTCN2020094583-appb-000050], and the main body of the zigzag merge connection algorithm is step [Figure PCTCN2020094583-appb-000051]. In step [Figure PCTCN2020094583-appb-000052], when the algorithm is in zigzag merge connection mode, if the next row obtained by lookup is the same as the next row obtained by direct sequential access, the zigzag merge connection yields no benefit, and use_basic is increased by 1 to indicate a preference for the merge connection algorithm; conversely, if the next row obtained by lookup is not the same as the next row obtained by direct sequential access, the zigzag merge connection yields a benefit, and use_basic is decreased by 1 to indicate no preference for the merge connection algorithm. In step [Figure PCTCN2020094583-appb-000053], when the algorithm is in merge connection mode, if the row key values obtained from one side are consecutively smaller than the row key value on the other side, use_basic is increased by 1 to indicate no preference for the zigzag merge connection algorithm; conversely, if they are not, use_basic is decreased by 1 to indicate a preference for the zigzag merge connection algorithm. In step [Figure PCTCN2020094583-appb-000054], it is determined whether the current value of use_basic exceeds a preset threshold; if it does, the merge connection algorithm is to be used (current_mode=1).
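The feedback loop described in this paragraph can be sketched as a simplified program. It is not the flow of FIG. 2 itself: it assumes both inputs are in-memory ascending lists of unique integer keys (so the begin_left handling for duplicate keys is omitted), uses `bisect_left` as a stand-in for the lookup function, and applies the mode update after each advance. The function name `adaptive_join`, the default threshold of 2, and the returned mode trace are illustrative choices.

```python
from bisect import bisect_left

MERGE, ZIGZAG = 1, 0  # same encoding as current_mode in the text


def adaptive_join(left, right, threshold=2):
    """Join two ascending lists of unique keys.

    Returns the matching keys and the mode chosen after each advance.
    """
    i = j = 0
    use_basic = 0
    mode = ZIGZAG        # the initial logic is the zigzag variant
    direction = None     # 0: last advance on the left, 1: on the right
    matches, trace = [], []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])   # equal keys: output the joined row
            i += 1
            j += 1
            direction = None          # a match breaks any run of advances
            continue
        new_direction = 0 if left[i] < right[j] else 1
        if new_direction == 0:
            keys, pos, target = left, i, right[j]
        else:
            keys, pos, target = right, j, left[i]
        if mode == MERGE:
            # Merge feedback: repeated advances on the same side raise use_basic.
            if direction is not None:
                use_basic += 1 if new_direction == direction else -1
            new_pos = pos + 1         # sequential read of the next row
        else:
            # Zigzag feedback: a lookup that only reaches the next row has no benefit.
            new_pos = bisect_left(keys, target, pos + 1)
            use_basic += 1 if new_pos == pos + 1 else -1
        if new_direction == 0:
            i = new_pos
        else:
            j = new_pos
        direction = new_direction
        mode = MERGE if use_basic > threshold else ZIGZAG
        trace.append(mode)
    return matches, trace


dense = adaptive_join([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 100])
sparse = adaptive_join(list(range(100)), [50, 99])
```

With the dense input every lookup only reaches the next row, so use_basic climbs past the threshold and the sketch switches to merge mode; with the sparse input each lookup skips many rows and the sketch stays in zigzag mode. The join result is the same in either mode; only the access pattern differs.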
Please refer to FIG. 3, which shows a scenario in which connection processing using only the zigzag merge connection algorithm is inefficient. The data in the first data set is shown in the left table, and the data in the second data set is shown in the right table. When the zigzag merge connection algorithm connects the data in the two tables, few rows can be skipped, and the cost of a lookup by key value is greater than the cost of a sequential scan, so the execution cost of the zigzag merge connection algorithm is in fact greater than that of the merge connection algorithm. For the same scenario, FIG. 4 shows the execution process of this embodiment. The execution of the zigzag merge connection is monitored throughout; when it is detected that the zigzag merge connection algorithm can skip too few rows (the benefit is too small), the algorithm is switched to the merge connection algorithm. When the merge connection algorithm is in use, if consecutive accesses to the data on one side are insufficient, the algorithm is switched back to the zigzag merge connection algorithm. Switching between the two algorithms is automatic and is based on the execution parameter collected while the algorithm runs.
The solution provided by the above embodiment combines the logic of the merge connection algorithm and the zigzag merge connection algorithm. While each part executes, a different execution feedback logic (the basic merge connection algorithm uses step [Figure PCTCN2020094583-appb-000055], and the zigzag merge connection algorithm uses step [Figure PCTCN2020094583-appb-000056]) adjusts the switching between the two algorithms in real time. Without manual intervention or hard coding, the algorithm adapts itself to different data, overcoming the inefficiency that the merge connection algorithm and the zigzag merge connection algorithm are each prone to on their own, so that the overall algorithm performs well across different data distributions and the merge connection algorithm becomes more robust.
Please refer to FIG. 5. Based on the data connection method shown in FIG. 1, this embodiment correspondingly provides a data processing device, which includes:
a connection unit 51, configured to perform a data connection operation on a first data set and a second data set through a first connection algorithm, so as to connect data with equal key values in the first data set and the second data set;
a detection unit 52, configured to obtain, during the execution of the first connection algorithm, an execution parameter of the first connection algorithm, the execution parameter being used to characterize the execution efficiency of the algorithm execution process;
a switching unit 53, configured to switch the first connection algorithm to a second connection algorithm according to the execution parameter, and to continue the data connection operation through the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
When obtaining the execution parameter, the detection unit 52 may, when the first connection algorithm is the merge connection algorithm, obtain the number of rows of one data set that the merge connection algorithm accesses consecutively as the execution parameter; or, when the first connection algorithm is the zigzag merge connection algorithm, obtain the number of data rows skipped by the zigzag merge connection algorithm as the execution parameter.
Alternatively, when obtaining the execution parameter, the detection unit 52 may, when the first connection algorithm is the merge connection algorithm, add a preset step to the value of the execution parameter each time rows of the same data set are read consecutively, and otherwise subtract the preset step; and, when the first connection algorithm is the zigzag merge connection algorithm, add the preset step to the value of the execution parameter each time the row found by lookup is the same as the row immediately following the previously read row, and otherwise subtract the preset step. Specifically, when the first connection algorithm is the merge connection algorithm, the key values of the current first data row and the current second data row are compared, where the first data row belongs to the first data set and the second data row belongs to the second data set. If the key value of the current first data row is less than or equal to that of the current second data row, the result of the previous comparison is retrieved; if in the previous comparison the key value of the first data row was also less than or equal to that of the second data row, the preset step is added to the execution parameter. If the key value of the current first data row is greater than that of the current second data row, the result of the previous comparison is retrieved; if in the previous comparison the key value of the first data row was also greater than that of the second data row, the preset step is added to the execution parameter.
As an optional implementation, the first connection algorithm may be the merge connection algorithm or the zigzag merge connection algorithm, and the second connection algorithm may correspondingly be the zigzag merge connection algorithm or the merge connection algorithm.
Regarding the device in the above embodiment, the specific manner in which each unit performs its operations has been described in detail in the method embodiments and is not repeated here.
Please refer to FIG. 6, which is a block diagram of an electronic device 700 for implementing a data query method according to an exemplary embodiment. For example, the electronic device 700 may be a computer, a database console, a tablet device, a personal digital assistant, or the like.
Referring to FIG. 6, the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, an input/output (I/O) interface 710, and a communication component 712.
The processing component 702 generally controls the overall operation of the electronic device 700, such as operations associated with display, data communication, and recording. The processing component 702 may include one or more processors 720 that execute instructions to complete all or some of the steps of the method described above. In addition, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components.
The memory 704 is configured to store various types of data to support the operation of the electronic device 700. Examples of such data include instructions for any application or method operating on the electronic device 700, contact data, phone book data, messages, pictures, videos, and so on. The memory 704 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 706 provides power to the various components of the electronic device 700. The power supply component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 700.
The I/O interface 710 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The communication component 712 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices. The electronic device 700 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 712 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 712 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 704 including instructions, which can be executed by the processor 720 of the electronic device 700 to complete the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of a mobile terminal, the electronic device is enabled to perform a data query method, the method comprising:
performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, so as to connect the data in the first data set and the second data set that have equal key values; during execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter being used to characterize the execution efficiency of the algorithm's execution process; and switching the first connection algorithm to a second connection algorithm according to the execution parameter and continuing the data connection operation by means of the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
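The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: both inputs are lists of `(key, value)` pairs already sorted by key, keys are unique within each input, the "connection" is read as an equi-join, and the zigzag variant is modelled with a binary search (`bisect_left`). The function name `adaptive_join`, the `threshold` parameter, and the concrete switching rule are hypothetical.

```python
from bisect import bisect_left


def adaptive_join(left, right, threshold=3):
    """Join two key-sorted lists of (key, value) pairs (unique keys per input).

    Starts with a row-by-row merge and switches to a seek-based ("zigzag")
    merge, or back, when a simple execution parameter crosses `threshold`.
    The switching rule is an illustrative assumption.
    """
    left_keys = [k for k, _ in left]
    right_keys = [k for k, _ in right]
    i = j = 0
    mode = "merge"
    streak = 0          # execution parameter observed while the join runs
    last_side = None    # side advanced by the previous merge step
    result = []
    modes_used = [mode]

    while i < len(left) and j < len(right):
        lk, rk = left_keys[i], right_keys[j]
        if lk == rk:
            result.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
            if mode == "merge":
                streak, last_side = 0, None  # a match interrupts the run
            continue
        side = "left" if lk < rk else "right"
        if mode == "merge":
            # Plain merge: advance the smaller side by exactly one row.
            if side == "left":
                i += 1
            else:
                j += 1
            streak = streak + 1 if side == last_side else 1
            last_side = side
            if streak >= threshold:
                # Long run on one input: seeking is likely cheaper.
                mode, streak, last_side = "zigzag", 0, None
                modes_used.append(mode)
        else:
            # Zigzag: seek directly to the first key >= the other side's key.
            if side == "left":
                target = bisect_left(left_keys, rk, i + 1)
                skipped = target - (i + 1)
                i = target
            else:
                target = bisect_left(right_keys, lk, j + 1)
                skipped = target - (j + 1)
                j = target
            streak = streak + 1 if skipped == 0 else 0
            if streak >= threshold:
                # Seeks keep landing on the very next row: plain merge suffices.
                mode, streak = "merge", 0
                modes_used.append(mode)
    return result, modes_used
```

With a dense left input and a sparse right input, the sketch begins in merge mode, observes a run of consecutive reads from one input, and continues the same join in zigzag mode from the current positions rather than restarting.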
It should be understood that the present embodiment is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present embodiment is limited only by the appended claims.
The foregoing describes only preferred embodiments of the present embodiment and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present embodiment shall fall within its scope of protection.

Claims (12)

  1. A data connection method, the method comprising:
    performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, so as to connect the data in the first data set and the second data set that have equal key values;
    during execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter being used to characterize the execution efficiency of the algorithm's execution process;
    switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
  2. The method according to claim 1, wherein obtaining the execution parameter of the first connection algorithm comprises:
    when the first connection algorithm is a merge connection algorithm, obtaining, as the execution parameter, the number of data rows that the merge connection algorithm accesses consecutively in one data set; or,
    when the first connection algorithm is a zigzag merge connection algorithm, obtaining, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.
  3. The method according to claim 1, wherein obtaining the execution parameter of the first connection algorithm comprises:
    when the first connection algorithm is a merge connection algorithm, adding a preset step to the value of the execution parameter each time data rows are read consecutively from one data set, and otherwise subtracting the preset step from the value of the execution parameter;
    when the first connection algorithm is a zigzag merge connection algorithm, adding the preset step to the value of the execution parameter each time the data row found is the same as the row following the previously read data row, and otherwise subtracting the preset step from the value of the execution parameter.
  4. The method according to claim 3, wherein, when the first connection algorithm is a merge connection algorithm, adding the preset step to the value of the execution parameter each time data rows are read consecutively from one data set comprises:
    when the first connection algorithm is a merge connection algorithm, comparing the key value of the current first data row that has been read with the key value of the current second data row, the first data row belonging to the first data set and the second data row belonging to the second data set;
    if the key value of the current first data row is less than or equal to the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was less than or equal to the key value of the second data row, adding the preset step to the execution parameter;
    if the key value of the current first data row is greater than the key value of the current second data row, obtaining the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was greater than the key value of the second data row, adding the preset step to the execution parameter.
  5. The method according to any one of claims 1 to 4, wherein the first connection algorithm is a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm is a zigzag merge connection algorithm or a merge connection algorithm.
  6. A data connection device, the device comprising:
    a connection unit, configured to perform a data connection operation on a first data set and a second data set by means of a first connection algorithm, so as to connect the data in the first data set and the second data set that have equal key values;
    a detection unit, configured to obtain, during execution of the first connection algorithm, an execution parameter of the first connection algorithm, the execution parameter being used to characterize the execution efficiency of the algorithm's execution process;
    a switching unit, configured to switch the first connection algorithm to a second connection algorithm according to the execution parameter, and to continue the data connection operation by means of the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
  7. The device according to claim 6, wherein the detection unit is configured to:
    when the first connection algorithm is a merge connection algorithm, obtain, as the execution parameter, the number of data rows that the merge connection algorithm accesses consecutively in one data set; or,
    when the first connection algorithm is a zigzag merge connection algorithm, obtain, as the execution parameter, the number of data rows skipped by the zigzag merge connection algorithm.
  8. The device according to claim 6, wherein the detection unit is configured to:
    when the first connection algorithm is a merge connection algorithm, add a preset step to the value of the execution parameter each time data rows are read consecutively from one data set, and otherwise subtract the preset step from the value of the execution parameter;
    when the first connection algorithm is a zigzag merge connection algorithm, add the preset step to the value of the execution parameter each time the data row found is the same as the row following the previously read data row, and otherwise subtract the preset step from the value of the execution parameter.
  9. The device according to claim 8, wherein the detection unit is further configured to:
    when the first connection algorithm is a merge connection algorithm, compare the key value of the current first data row that has been read with the key value of the current second data row, the first data row belonging to the first data set and the second data row belonging to the second data set;
    if the key value of the current first data row is less than or equal to the key value of the current second data row, obtain the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was less than or equal to the key value of the second data row, add the preset step to the execution parameter;
    if the key value of the current first data row is greater than the key value of the current second data row, obtain the result of the previous comparison between the key value of the first data row and the key value of the second data row, and, if in the previous comparison the key value of the first data row was greater than the key value of the second data row, add the preset step to the execution parameter.
  10. The device according to any one of claims 6 to 9, the device further comprising:
    the first connection algorithm being a merge connection algorithm or a zigzag merge connection algorithm, and the second connection algorithm being a zigzag merge connection algorithm or a merge connection algorithm.
  11. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps:
    performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, so as to connect the data in the first data set and the second data set that have equal key values;
    during execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter being used to characterize the execution efficiency of the algorithm's execution process;
    switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
  12. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
    performing a data connection operation on a first data set and a second data set by means of a first connection algorithm, so as to connect the data in the first data set and the second data set that have equal key values;
    during execution of the first connection algorithm, obtaining an execution parameter of the first connection algorithm, the execution parameter being used to characterize the execution efficiency of the algorithm's execution process;
    switching the first connection algorithm to a second connection algorithm according to the execution parameter, and continuing the data connection operation by means of the second connection algorithm, wherein the second connection algorithm is different from the first connection algorithm.
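
The execution-parameter bookkeeping recited in claims 3 and 4 can be illustrated with a short Python sketch. It is offered only as one possible reading of the claim language: the class name `MergeRunCounter`, the function `update_zigzag_parameter`, the default step of 1, and the choice to leave the parameter unchanged on the very first comparison are all assumptions, and the threshold at which an implementation would actually switch algorithms is not specified here.

```python
class MergeRunCounter:
    """Execution parameter for the merge case (one reading of claims 3 and 4).

    The value grows by `step` when the current key comparison has the same
    outcome as the previous one (the same data set is being read again),
    and shrinks by `step` otherwise.
    """

    def __init__(self, step=1):
        self.step = step       # the "preset step"; the default is an assumption
        self.value = 0         # the execution parameter
        self._previous = None  # outcome of the previous comparison, if any

    def observe(self, first_key, second_key):
        """Record one comparison of the current rows and return the parameter."""
        current = first_key <= second_key
        if self._previous is not None:  # first comparison: nothing to compare with
            if current == self._previous:
                self.value += self.step
            else:
                self.value -= self.step
        self._previous = current
        return self.value


def update_zigzag_parameter(value, previous_index, found_index, step=1):
    """Execution parameter for the zigzag case (one reading of claim 3).

    Adds `step` when the seek lands on the row right after the previously
    read row (the seek skipped nothing), and subtracts `step` otherwise.
    """
    if found_index == previous_index + 1:
        return value + step
    return value - step
```

In both cases a large value suggests that the currently running algorithm is a poor fit for the data: long one-sided runs favour seeking, while seeks that never skip rows favour a plain merge.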

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910923118.4A CN110807030B (en) 2019-09-27 2019-09-27 Data connection method and device and electronic equipment
CN201910923118.4 2019-09-27

Publications (1)

Publication Number Publication Date
WO2021057088A1 true WO2021057088A1 (en) 2021-04-01

Family

ID=69487837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/094583 WO2021057088A1 (en) 2019-09-27 2020-06-05 Data connection method and apparatus and electronic device

Country Status (2)

Country Link
CN (1) CN110807030B (en)
WO (1) WO2021057088A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807030B (en) * 2019-09-27 2021-03-16 蚂蚁金服(杭州)网络技术有限公司 Data connection method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060190497A1 (en) * 2005-02-18 2006-08-24 International Business Machines Corporation Support for schema evolution in a multi-node peer-to-peer replication environment
CN102609487A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage-oriented Hash joint method for indexes in barrels
CN102799667A (en) * 2012-07-13 2012-11-28 北京工商大学 Hierarchical clustering method based on asymmetric distance
CN110807030A (en) * 2019-09-27 2020-02-18 支付宝(杭州)信息技术有限公司 Data connection method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1986108B1 (en) * 2007-04-27 2009-09-16 Software Ag Method and database system for executing an XML database query
CN108182192A (en) * 2016-12-08 2018-06-19 南京航空航天大学 A kind of half-connection inquiry plan selection algorithm based on distributed data base
CN108536692B (en) * 2017-03-01 2022-03-11 华为技术有限公司 Execution plan generation method and device and database server

Also Published As

Publication number Publication date
CN110807030A (en) 2020-02-18
CN110807030B (en) 2021-03-16

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20868386; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20868386; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1