TWI746511B - Data table connection method and device - Google Patents

Data table connection method and device

Info

Publication number
TWI746511B
Authority
TW
Taiwan
Prior art keywords
data
data table
target
connection
data record
Prior art date
Application number
TW106104646A
Other languages
Chinese (zh)
Other versions
TW201738781A (en)
Inventor
吳煒
Original Assignee
香港商阿里巴巴集團服務有限公司
Priority date
Filing date
Publication date
Application filed by 香港商阿里巴巴集團服務有限公司
Publication of TW201738781A
Application granted
Publication of TWI746511B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Abstract

The invention provides a data table connection (join) method and device. The method includes: receiving a data table join task that instructs a join operation on a first data table and a second data table according to a join condition; loading the data records of the second data table onto at least two nodes of a distributed system according to the join condition; reading a data record from the first data table as the current data record, determining a target node among the at least two nodes according to the join condition corresponding to the current data record, and reading a data record of the second data table stored on the target node as the target data record; and performing the join operation on the current data record and the target data record. The invention can reduce the computing resources consumed by data table join operations.

Description

Data table connection method and device

The present invention relates to the technical field of databases, and in particular to a data table connection (join) method and device.

With the development of the Internet, data volumes have grown explosively, data structures have diversified, and data carries more and more information. Data warehouses play a major role against this background. With the arrival of the big data era, data warehouses have moved to distributed architectures to meet the explosive growth in computing and storage demand. Distributed data warehouses generally use columnar storage and keep data in the form of files, so a distributed data warehouse can improve storage and computing performance for big data.

Queries on a distributed data warehouse often require join (Join) computation between data tables. When processing a Join between data tables, the prior art generally first shuffle-sorts all the tables to be joined by means of MapReduce, and then merges the sorted tables on the Reducer side. Shuffle sorting here refers to the process of partitioning each table to be joined on the Map side according to the Join condition and distributing the partitions to different Reducers.

In a typical "star" Join scenario, suppose the tables to be joined comprise one main table and n auxiliary tables, and the main table contains M data records. When the main table is joined with the n auxiliary tables, the total amount of data that shuffle sorting must process consists of the amount processed when shuffling the main table, namely M*n, plus the amount processed when shuffling the n auxiliary tables. This consumes a large amount of computing resources.

Various aspects of the present invention provide a data table join method and device for reducing the computing resources consumed by data table join operations.

One aspect of the present invention provides a data table join method, including: receiving a data table join task, the data table join task instructing that a join operation be performed on a first data table and a second data table according to a join condition; loading the data records of the second data table onto at least two nodes of a distributed system according to the join condition; reading a data record from the first data table as the current data record, determining a target node among the at least two nodes according to the join condition corresponding to the current data record, and reading a data record of the second data table stored on the target node as the target data record; and performing the join operation on the current data record and the target data record.

Another aspect of the present invention provides a data table join device, including: a receiving module, configured to receive a data table join task, the data table join task instructing that a join operation be performed on a first data table and a second data table according to a join condition; a loading module, configured to load the data records of the second data table onto at least two nodes of a distributed system according to the join condition; a reading module, configured to read a data record from the first data table as the current data record, determine a target node among the at least two nodes according to the join condition corresponding to the current data record, and read a data record of the second data table stored on the target node as the target data record; and a join module, configured to perform the join operation on the current data record and the target data record.

In the present invention, when a data table join task is processed, the data records of the second data table are first loaded onto at least two nodes of the distributed system according to the join condition in the task. The data records of the first data table can then be read directly; for each record read, the required data record of the second data table is read from the corresponding node according to the join condition corresponding to that record, and the join operation is performed on the records read from the two tables. The present invention therefore only needs to distribute the second data table across different nodes according to the join condition and does not need to distribute the first data table, which reduces the amount of data that shuffle sorting must process and helps reduce the computing resources consumed by the join operation.

S101, S102, S103, S104‧‧‧Method steps

21‧‧‧Control node

22‧‧‧Scheduling node

23‧‧‧Computing node

31‧‧‧Receiving module

32‧‧‧Loading module

33‧‧‧Reading module

34‧‧‧Join module

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a data table join method provided by an embodiment of the present invention; FIG. 2 is a schematic architecture diagram of a distributed system provided by another embodiment of the present invention; FIG. 3 is a schematic structural diagram of a data table join device provided by yet another embodiment of the present invention; FIG. 4 is a schematic structural diagram of a data table join device provided by yet another embodiment of the present invention.

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Queries on a distributed data warehouse often require join (Join) computation between data tables. When processing a Join between data tables, because the tables to be joined are relatively large, the prior art generally first shuffle-sorts all the tables to be joined by means of MapReduce, and then merges the sorted tables on the Reducer side. Shuffle sorting here refers to the process of partitioning each table to be joined on the Map side according to the Join condition and distributing the partitions to different Reducers. Because every table to be joined must be shuffle-sorted, a large amount of computing resources is consumed.

To address the above technical problem, the present invention provides a solution: the second data table is distributed across multiple nodes to form a distributed cache, and while the first data table is processed, the data records of the second data table stored on remote nodes are fetched over the network, thereby performing a distributed hash map join (Hash map Join). The main table therefore does not need to be shuffle-sorted, which saves the computing resources that shuffle-sorting the first data table would consume.

FIG. 1 is a schematic flowchart of a data table join method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

101. Receive a data table join task, where the data table join task instructs that a join operation be performed on a first data table and a second data table according to a join condition.

102. According to the join condition, load the data records of the second data table onto at least two nodes of a distributed system.

103. Read a data record from the first data table as the current data record, determine a target node among the at least two nodes according to the join condition corresponding to the current data record, and read a data record of the second data table stored on the target node as the target data record.

104. Perform the join operation on the current data record and the target data record.
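Steps 101 to 104 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the nodes are simulated as in-process dictionaries instead of remote machines, the join is treated as an inner equi-join on a single key, CRC32 is an assumed stable hash, and every name (`node_for_key`, `load_second_table`, `distributed_hash_join`, the sample tables) is hypothetical.

```python
import zlib
from collections import defaultdict


def node_for_key(key, num_nodes):
    """Map a join key to a node index with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def load_second_table(second_table, key_field, num_nodes):
    """Step 102: spread the second table's records over the nodes by join key."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for record in second_table:
        key = record[key_field]
        nodes[node_for_key(key, num_nodes)][key].append(record)
    return nodes


def distributed_hash_join(first_table, second_table, key_field, num_nodes=2):
    """Steps 101-104: the first table is streamed as-is and never shuffled."""
    nodes = load_second_table(second_table, key_field, num_nodes)
    joined = []
    for current in first_table:                       # step 103: current record
        key = current[key_field]
        target_node = nodes[node_for_key(key, num_nodes)]
        for target in target_node.get(key, []):       # target records on that node
            joined.append({**target, **current})      # step 104: join the pair
    return joined


# Hypothetical sample data: "orders" plays the first (main) table,
# "users" plays the second (auxiliary) table.
orders = [
    {"uid": 1, "item": "book"},
    {"uid": 2, "item": "pen"},
    {"uid": 3, "item": "cup"},
]
users = [{"uid": 1, "name": "Ann"}, {"uid": 2, "name": "Bob"}]
result = distributed_hash_join(orders, users, "uid")
```

In this sketch only `users` is partitioned; `orders` is read once in its original order, which mirrors the point that the first data table needs no shuffle sorting.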

This embodiment provides a data table join method that can be executed by a data table join device to perform Join operations between data tables while keeping the computing resources consumed as low as possible. The method is applicable to a distributed system, in which each machine can act as a node. This embodiment does not limit the architecture of the distributed system; it may be, for example, but is not limited to, a MapReduce architecture.

When a Join operation between data tables is required, a data table join task can be sent to the data table join device, which receives it. The data table join task instructs that Join processing be performed on the first data table and the second data table according to the join condition. The first data table and the second data table are the tables to be joined.

In a specific implementation, the data table join task carries information such as the join condition, the identifier of the first data table, the identifier of the second data table, the storage location of the first data table, and the storage location of the second data table. The data table join device can parse the task to obtain this information, determine from the two identifiers which tables need to be joined, and read the first data table and the second data table from their respective storage locations.
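The information carried by a join task could be modelled as a small record type. The sketch below is an assumption about one possible shape: the source lists which pieces of information the task carries but not a wire format, so the class name `JoinTask`, its field names, the dictionary layout, and the example locations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinTask:
    """Hypothetical container for the information a join task carries."""
    join_keys: tuple            # the join condition, as target keys
    first_table_id: str
    second_table_id: str
    first_table_location: str
    second_table_location: str


def parse_join_task(raw):
    """Parse a raw task (assumed here to be a nested dict) into a JoinTask."""
    return JoinTask(
        join_keys=tuple(raw["join_keys"]),
        first_table_id=raw["first_table"]["id"],
        second_table_id=raw["second_table"]["id"],
        first_table_location=raw["first_table"]["location"],
        second_table_location=raw["second_table"]["location"],
    )


# Example task with made-up identifiers and locations.
task = parse_join_task({
    "join_keys": ["uid"],
    "first_table": {"id": "orders", "location": "/warehouse/orders"},
    "second_table": {"id": "users", "location": "/warehouse/users"},
})
```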

In one practical application, the first data table serves as the main table and the second data table serves as an auxiliary table. There may be one or more auxiliary tables.

After receiving the data table join task, the data table join device knows that a Join operation must be performed on the first data table and the second data table according to the join condition. Before performing the Join operation, it first loads the data records of the second data table onto at least two nodes of the distributed system according to the join condition, achieving distributed storage.

Preferably, the amount of second-data-table data located on each of the at least two nodes is smaller than the memory limit of a single node. In other words, the data records of the second data table distributed to each node can all fit into the storage space (preferably the memory) of that node.

In an optional implementation, the join condition includes at least one target key required for the join; the target key here is in fact the key of a key-value pair. On this basis, the data table join device may perform a hash operation on each of the at least one target key to obtain the hash value of each target key; determine the node corresponding to each target key according to that hash value and the number of the at least two nodes used to store the data records of the second data table; and load the data records of the second data table corresponding to each target key onto the node corresponding to that target key.

Further, the data table join device may take the hash value of each target key modulo the number of the at least two nodes used to store the data records of the second data table, and determine the node corresponding to each target key from the result. Specifically, the node represented by the modulo result may be taken as the node corresponding to the target key. Alternatively, the data table join device may divide the target keys evenly among the nodes according to the number of those nodes and the number of target keys; during this division, target keys with similar hash values may be assigned to the same node. Hash values being similar may mean that the difference between them is smaller than a preset threshold, but is not limited to this.
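The two key-to-node assignment strategies described above might look as follows. This is a hedged sketch: CRC32 stands in for the unspecified hash function, and the even-split variant (sort the keys by hash value, then cut the sorted list into contiguous chunks) is only one possible reading of "target keys with similar hash values go to the same node"; it does not use a preset threshold. All function names are hypothetical.

```python
import zlib


def stable_hash(key):
    """Deterministic hash of a target key (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8"))


def assign_by_modulo(keys, num_nodes):
    """Strategy 1: node index = hash(key) mod number of nodes."""
    return {key: stable_hash(key) % num_nodes for key in keys}


def assign_evenly(keys, num_nodes):
    """Strategy 2 (one interpretation): equal-sized groups of keys,
    with keys that are neighbours in hash order placed on the same node."""
    ordered = sorted(keys, key=stable_hash)
    chunk = -(-len(ordered) // num_nodes)   # ceiling division
    return {key: i // chunk for i, key in enumerate(ordered)}


keys = ["k1", "k2", "k3", "k4", "k5", "k6"]
by_mod = assign_by_modulo(keys, 3)
even = assign_evenly(keys, 3)
```

The modulo strategy needs no knowledge of the full key set, so a reader of the first data table can recompute the node for any key on the fly. The even-split strategy balances the number of keys per node, but the resulting mapping has to be stored or shared so that lookups can find the right node.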

Further, when the data records of the second data table are loaded onto the at least two nodes, they may specifically be loaded into the memory of those nodes. Records held in a node's memory can be read at any time and at high speed, which helps improve the efficiency of the Join operation.

It is worth noting that loading the data records of the second data table into the memory of the at least two nodes is preferred but not required; the records may also be stored on a node's solid state drive (SSD) or other storage media.

In an optional implementation, before the data records of the second data table are loaded onto at least two nodes of the distributed system according to the join condition, it may be determined whether the data volume of the second data table is greater than the memory limit of a single node. If so, the data records of the second data table cannot all be placed in the memory of a single node, so they can be loaded onto at least two nodes according to the join condition such that the records distributed to each node all fit into the memory of that node, achieving distributed storage. Put simply, the amount of second-data-table data distributed to each node is smaller than the memory limit of a single node.

If the result is no, that is, the data volume of the second data table is less than or equal to the memory limit of a single node, all data records of the second data table fit into the memory of a single node. In that case it is preferable to place all data records of the second data table in the memory of a single node, which avoids shuffle-sorting the data records of the second data table and saves computing resources.
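The memory check described above reduces to a small decision function. The sketch below is illustrative only: the function name and the byte-based units are hypothetical, and the node count for the distributed case assumes the records split perfectly evenly across nodes, which real hash partitioning does not guarantee.

```python
def choose_node_count(table_bytes, node_memory_limit_bytes):
    """Return how many nodes should hold the second data table.

    A table no larger than the single-node memory limit stays on one node.
    Otherwise, pick the smallest count for which an even split puts
    strictly less than the limit on each node (even split is an assumption).
    """
    if table_bytes <= node_memory_limit_bytes:
        return 1
    return table_bytes // node_memory_limit_bytes + 1


# Example with a hypothetical 40-unit memory limit per node.
single = choose_node_count(40, 40)     # fits on one node
split = choose_node_count(100, 40)     # needs three nodes of ~33.3 each
```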

Loading the data records of the second data table onto at least two nodes of the distributed system is equivalent to turning the second data table into multiple small tables, each of which fits entirely in the memory of its node. This forms a distributed KV store, which makes a distributed Hash map Join possible instead of a sort merge join (Sort Merge Join). With a distributed Hash map Join, the data records of the first data table do not need to be sorted: they can be read directly, the required data records of the second data table can be read from the corresponding nodes according to the join condition corresponding to each record read, and the Join operation is then performed on the records read from the two tables.

The distributed Hash map Join of this embodiment differs from the existing Hash map Join in that, when the first data table is processed, the data records of the second data table are not looked up in local memory but are fetched over the network from the remote nodes that store them.

Specifically, after the data records of the second data table have been loaded onto at least two nodes of the distributed system, the data table join device can read a data record of the first data table from the storage location of the first data table and take it as the current data record. According to the join condition corresponding to the current data record, it determines the target node among the at least two nodes; the target node is the node holding the data record of the second data table that is needed for the Join operation with the current data record. It then reads the data record of the second data table stored on the target node as the target data record; the target data record is the data record of the second data table that is needed for the Join operation with the current data record.

Once the current data record and the target data record needed for the Join operation with it have been read, the Join operation is performed on the two. How the Join operation itself is carried out is not the focus of the present invention and is not described in detail here; reference may be made to the Join processing flow in the prior art.

In an optional implementation, the local cache of the data table join device may already contain the target data record needed for the Join operation with the current data record. Therefore, before determining the target node among the at least two nodes and reading the target data record from it, the device may check, according to the join condition corresponding to the current data record, whether the target data record exists in the local cache. If it does not, the device determines the target node among the at least two nodes according to the join condition corresponding to the current data record and reads the data record of the second data table stored on the target node as the target data record. If it does, the target data record can be obtained from the local cache, which is faster, saves the network resources that fetching the target data record would consume, and improves the efficiency of the Join operation.
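The cache-first lookup can be sketched as below. This is a minimal sketch under stated assumptions: `RemoteNodes` is a hypothetical in-process stand-in for nodes reached over the network, CRC32 is an assumed hash, and writing a fetched record back into the local cache is an added assumption, since the source only says the cache is consulted first.

```python
import zlib


class RemoteNodes:
    """In-process stand-in for remote nodes (hypothetical, no real networking)."""

    def __init__(self, num_nodes):
        self.stores = [dict() for _ in range(num_nodes)]
        self.remote_reads = 0   # counts simulated network round trips

    def node_index(self, key):
        return zlib.crc32(str(key).encode("utf-8")) % len(self.stores)

    def put(self, key, records):
        self.stores[self.node_index(key)][key] = records

    def get(self, key):
        self.remote_reads += 1
        return self.stores[self.node_index(key)].get(key, [])


def fetch_target_records(key, local_cache, remote):
    """Check the local cache first; go to the target node only on a miss."""
    if key in local_cache:
        return local_cache[key]
    records = remote.get(key)
    local_cache[key] = records   # assumption: keep the result for later lookups
    return records


remote = RemoteNodes(num_nodes=2)
remote.put("u1", [{"name": "Ann"}])
cache = {}
first = fetch_target_records("u1", cache, remote)    # miss: remote read
second = fetch_target_records("u1", cache, remote)   # hit: served locally
```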

Further, the join condition corresponding to the current data record may be a target key. One way of determining the target node among the at least two nodes according to that join condition is then: perform a hash operation on the target key corresponding to the current data record to obtain its hash value; and, according to that hash value and the number of the at least two nodes, determine the node corresponding to the target key as the target node.

Furthermore, when target keys are used to determine which node to fetch target data from, multiple target keys can be handled as a batch operation. This makes full use of the strengths of a distributed system and improves processing performance.
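One way to batch lookups is to group the pending target keys by the node that holds them, so that each node receives a single request. The source does not specify how the batching is done, so the sketch below is an assumption; `target_node`, `group_keys_by_node`, and the sample keys are hypothetical, and CRC32 is an assumed hash.

```python
import zlib
from collections import defaultdict


def target_node(key, num_nodes):
    """Hash the target key and take it modulo the node count (CRC32 assumed)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def group_keys_by_node(keys, num_nodes):
    """Group target keys so each node can be queried once per batch."""
    batches = defaultdict(list)
    for key in keys:
        batches[target_node(key, num_nodes)].append(key)
    return dict(batches)


pending_keys = ["u1", "u2", "u3", "u4"]
batches = group_keys_by_node(pending_keys, num_nodes=2)
# Each entry maps a node index to the list of keys to request from that node.
```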

As the above analysis shows, when processing a data table join task, this embodiment first loads the data records of the second data table onto at least two nodes according to the join condition in the task, which effectively creates a distributed KV store (that is, a distributed hash table). A Sort Merge Join is then unnecessary and a distributed Hash map Join can be performed instead: the data records of the first data table do not need to be sorted but can be read directly, the required data records of the second data table are read from the corresponding nodes according to the join condition corresponding to each record read, and the Join operation is performed on the records read from the two tables. This embodiment therefore only needs to distribute the second data table across different nodes according to the join condition and does not need to distribute the first data table, which reduces the amount of data that shuffle sorting must process and helps reduce the computing resources consumed by the join operation.

The computing resources consumed by a Sort Merge Join and by the distributed Hash map Join are compared below to illustrate the advantages of the technical solution of the present invention.

Assume the main table is A with a data size of 100T, and there are two auxiliary tables, B and C; auxiliary table B has a data size of 10G and auxiliary table C has a data size of 100G.

With the existing Sort Merge Join, the shuffle sorting stage must sort main table A together with auxiliary table B once, and main table A together with auxiliary table C once. Each sorting pass reads the tables over network IO and sorts them on the CPU, so the resource consumption of each pass comprises the CPU used for sorting and the network IO used for reading the tables. For ease of description, resource consumption is expressed as the amount of data processed. Since the amount of data sorted by the CPU equals the amount read over network IO, one copy of the data volume is used to represent the resource consumption of each pass. The total resource consumption of the shuffle sorting stage is then: (100T+10G)+(100T+100G) = 2*100T+10G+100G.

With the distributed Hash map Join of the present invention, the shuffle sorting stage must distribute auxiliary table B across different nodes and distribute auxiliary table C across different nodes. Distributing a table across nodes likewise involves reading the table over network IO and sorting it on the CPU, so its resource consumption also comprises the CPU used for sorting and the network IO used for reading the table. For ease of description, resource consumption is again expressed as the amount of data processed, with one copy of the data volume representing the consumption of each pass. The total resource consumption of the shuffle sorting stage is then: 10G+100G.
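The two totals can be checked with a few lines of arithmetic. The sketch assumes 1T = 1024G purely to put both sizes in one unit (the source does not state the conversion), and the variable names are hypothetical.

```python
# Express every size in G; the conversion factor is an assumption.
G = 1
T = 1024 * G

main_a = 100 * T    # main table A
aux_b = 10 * G      # auxiliary table B
aux_c = 100 * G     # auxiliary table C

# Sort Merge Join: A is shuffled once with B and once with C.
sort_merge_shuffle = (main_a + aux_b) + (main_a + aux_c)

# Distributed Hash map Join: only B and C are distributed.
hash_map_shuffle = aux_b + aux_c

# How many times more data the Sort Merge Join shuffles in this example.
ratio = sort_merge_shuffle / hash_map_shuffle
```

Under that conversion the Sort Merge Join shuffles 204910G against 110G for the distributed Hash map Join, so nearly all of the shuffle volume in this example comes from moving the main table twice.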

As can be seen from the above, the technical solution of the present invention only needs to distribute the second data table to different nodes according to the connection condition and does not need to distribute the first data table to different nodes. This reduces the amount of data that shuffle sorting must process and helps reduce the computing resources consumed by the connection operation.
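The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patent: the table sizes are read off the two formulas (A = 100T, B = 10G, C = 100G), the conversion 1T = 1024G is an assumption, and the function names are hypothetical.

```python
# Sizes in gigabytes. The factor 1024 is an assumption; the text only gives 100T, 10G and 100G.
GB_PER_TB = 1024
table_a = 100 * GB_PER_TB  # main table A (100T)
table_b = 10               # auxiliary table B (10G)
table_c = 100              # auxiliary table C (100G)


def sort_merge_shuffle_cost(main, aux_tables):
    """Sort Merge Join: the main table is re-read and re-sorted once per auxiliary table."""
    return sum(main + aux for aux in aux_tables)


def hash_map_shuffle_cost(aux_tables):
    """Distributed Hash map Join: only the auxiliary tables are distributed."""
    return sum(aux_tables)


sort_merge = sort_merge_shuffle_cost(table_a, [table_b, table_c])
hash_map = hash_map_shuffle_cost([table_b, table_c])
print(sort_merge, hash_map)
```

Under these assumptions the difference between the two totals is exactly two passes over main table A, which is the saving the paragraph describes.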

FIG. 2 is a schematic diagram of the architecture of a distributed system provided by another embodiment of the present invention. As shown in FIG. 2, the distributed system includes a control node 21, a scheduling node 22, and at least two computing nodes 23. Further, as shown in FIG. 2, each computing node 23 includes at least a cache module and a processing module.

It should be noted that the distributed system shown in FIG. 2 is only an example, and the invention is not limited to it. For example, the scheduling node 22 in FIG. 2 can be omitted to obtain a simpler distributed system.

The technical solution of the present invention is described in detail below based on the distributed system shown in FIG. 2.

The control node 21 is responsible for receiving the data table connection task and learns from the task that a connection operation must be performed on the first data table and the second data table according to the connection condition.

According to the data table connection task, the control node 21 can send a scheduling instruction to the scheduling node 22 so that the scheduling node 22 schedules the available computing nodes 23 in the distributed system. The scheduling node 22 receives the scheduling instruction from the control node 21 and schedules the computing nodes 23 in the distributed system accordingly.

While the computing nodes 23 in the distributed system are being scheduled, the control node 21 provides the computing nodes 23, through the scheduling node 22, with the configuration file needed to subsequently load the data records of the second data table. The configuration file records the identifier of the second data table, its storage location, identification information of the data records to be loaded, and so on.

A loading process is deployed on each computing node 23 in the distributed system. The loading process loads data records of the second data table into the cache module according to the configuration file. Specifically, the scheduling node 22 starts the loading process on each computing node 23; each loading process reads the corresponding data records of the second data table from the corresponding storage location according to the configuration file and loads the records it has read into the cache module. It should be noted that the second data table may be stored in a space outside the distributed system, but is not limited to this.

When the loading processes on all computing nodes 23 have finished loading, that is, when all of them have entered the port-listening state, a load end instruction is returned to the control node 21 through the scheduling node 22. From this load end instruction, the control node 21 learns that each computing node 23 has loaded the data records of the second data table into its cache module.

The control node 21 sends a start instruction to the scheduling node 22, so that the scheduling node 22 starts the processing process on each computing node 23. A processing process is deployed on each computing node 23. It reads a data record from the first data table as the current data record; determines, from the key corresponding to the current data record, the computing node 23 that holds the data records of the second data table for that key; reads the data records of the second data table stored on that computing node 23 as the target data records; and performs a Join operation on the current data record and the target data records. It should be noted that the first data table may be stored in a space outside the distributed system, but is not limited to this.
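The work of the processing process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: each computing node is modeled as an in-memory dictionary, the node is chosen as CRC32 of the key modulo the node count, unmatched records are dropped (inner-join behavior), and all names and sample records are hypothetical.

```python
import zlib


def node_for_key(key, node_count):
    """Pick a node index from the key (assumed scheme: CRC32 modulo node count)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def hash_map_join(first_table, nodes, key_field):
    """Stream the first table and join each record with matching second-table records."""
    joined = []
    for current in first_table:                       # current data record
        key = current[key_field]
        target_node = nodes[node_for_key(key, len(nodes))]
        for target in target_node.get(key, []):       # target data records
            joined.append({**target, **current})
    return joined


# Second table, already loaded onto two nodes by the same key-to-node rule.
second_table = [
    {"user_id": "u1", "city": "Taipei"},
    {"user_id": "u2", "city": "Hangzhou"},
    {"user_id": "u3", "city": "Hong Kong"},
]
nodes = [{}, {}]
for record in second_table:
    index = node_for_key(record["user_id"], len(nodes))
    nodes[index].setdefault(record["user_id"], []).append(record)

# First table is read record by record; it is never sorted or redistributed.
first_table = [
    {"user_id": "u1", "amount": 30},
    {"user_id": "u3", "amount": 12},
    {"user_id": "u9", "amount": 5},  # no match in the second table
]
result = hash_map_join(first_table, nodes, "user_id")
print(result)
```

The point of the sketch is that the first table is only streamed: the lookup rule used at read time is the same rule used when the second table was loaded, so each current record reaches the one node that can hold its matches.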

Optionally, in a specific implementation, the computing nodes 23 can be implemented in a server/client manner. For example, the cache modules of the computing nodes 23 can be implemented as a cache server (CacheService). The cache server also includes a cache manager (CacheManager), and each cache module corresponds to a cache node (CacheNode). Correspondingly, the processing module of each computing node 23 is implemented as a cache client (CacheClient).

Specifically, the CacheManager coordinates and manages all CacheNodes. A CacheNode is responsible for loading data into memory and serving it. Optionally, the second data table can be stored and managed in the form of shard files. Shard files are used because, on failover, once a CacheNode restarts it only needs to read its shard file again, which keeps recovery relatively simple.

The CacheClient accesses the CacheService by performing a hash calculation on the key and reading data from one of the CacheNodes according to the result. In addition, the CacheClient should keep a local cache, typically using a cache algorithm such as Least Recently Used (LRU) to retain part of the data already read. The CacheClient can then look for the required data in the local cache first; when the data is found there, the network read from the CacheNode is avoided, which improves efficiency and saves resources.
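A small sketch can make the local-cache behavior concrete. It assumes that CacheNodes can be modeled as in-memory dictionaries, that the node is chosen as CRC32 of the key modulo the node count, and that a counter stands in for network reads. The class below only borrows the name CacheClient from the text; its fields, methods, and sample data are hypothetical.

```python
import zlib
from collections import OrderedDict


class CacheClient:
    """Hypothetical client: checks a small local LRU cache before asking a CacheNode."""

    def __init__(self, cache_nodes, local_capacity=2):
        self.cache_nodes = cache_nodes        # list of dicts standing in for CacheNodes
        self.local_capacity = local_capacity
        self.local = OrderedDict()            # key -> value, least recently used first
        self.remote_reads = 0                 # counts simulated network reads

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)       # mark as most recently used
            return self.local[key]
        index = zlib.crc32(key.encode("utf-8")) % len(self.cache_nodes)
        self.remote_reads += 1                # stands in for a read over the network
        value = self.cache_nodes[index].get(key)
        self.local[key] = value
        if len(self.local) > self.local_capacity:
            self.local.popitem(last=False)    # evict the least recently used entry
        return value


# Populate two simulated CacheNodes with the same key-to-node rule.
cache_nodes = [{}, {}]
for k in ["u1", "u2", "u3"]:
    cache_nodes[zlib.crc32(k.encode("utf-8")) % len(cache_nodes)][k] = {"user_id": k}

client = CacheClient(cache_nodes, local_capacity=2)
first = client.get("u1")    # fetched from a CacheNode
second = client.get("u1")   # served from the local cache
print(client.remote_reads)
```

With a capacity of two entries, reading a third distinct key evicts the entry that was used least recently, so a later read of the evicted key goes back to its CacheNode.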

As the above analysis shows, when processing a data table connection task, this embodiment first loads the data records of the second data table onto at least two nodes according to the connection condition, implementing distributed storage. The data records of the first data table can then be read directly; according to the connection condition corresponding to each record read from the first data table, the required data records of the second data table are read from the corresponding node, and a Join operation is performed on the records read from the two tables, implementing a distributed Hash map Join. This embodiment therefore only needs to distribute the second data table to different nodes according to the connection condition and does not need to distribute the first data table to different nodes, which reduces the amount of data that shuffle sorting must process and helps reduce the computing resources consumed by the connection operation.

It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

FIG. 3 is a schematic structural diagram of a data table connection device provided by yet another embodiment of the present invention. As shown in FIG. 3, the device includes a receiving module 31, a loading module 32, a reading module 33, and a connection module 34.

The receiving module 31 is configured to receive a data table connection task, which instructs that a connection operation be performed on the first data table and the second data table according to a connection condition.

The loading module 32 is configured to load the data records of the second data table onto at least two nodes in the distributed system according to the connection condition.

The reading module 33 is configured to read a data record from the first data table as the current data record, determine the target node from the at least two nodes according to the connection condition corresponding to the current data record, and read the data record of the second data table stored on the target node as the target data record.

The connection module 34 is configured to perform the connection operation on the current data record and the target data record.

Preferably, the amount of second-data-table data located on each of the at least two nodes is less than the memory limit of a single node. In other words, the data records of the second data table distributed to each of the at least two nodes can all be placed in the storage space (preferably the memory) of that node.

Further, as shown in FIG. 4, the device also includes a first judgment module 35.

The first judgment module 35 is configured to judge whether the data volume of the second data table is greater than the memory limit of a single node and, when the result is yes, to trigger the loading module 32 to perform the operation of loading the data records of the second data table onto at least two nodes in the distributed system according to the connection condition.
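The judgment made by the first judgment module can be sketched as a small sizing function. This is an assumption-laden illustration: the text only states the comparison against a single node's memory limit, so the node-count formula, the single-node fallback when the table fits, and the name choose_node_count are all hypothetical. The formula also assumes an even split, which a real hash distribution only approximates.

```python
def choose_node_count(table_size_gb, node_memory_limit_gb, available_nodes):
    """Return how many nodes the second data table should be spread over.

    Assumption: when the table fits within one node's memory limit, the
    distributed load is not triggered and a single node is enough.
    """
    if table_size_gb <= node_memory_limit_gb:
        return 1
    # Smallest count whose even share is strictly below the per-node limit.
    # Because table_size_gb > node_memory_limit_gb here, this is always >= 2.
    needed = table_size_gb // node_memory_limit_gb + 1
    if needed > available_nodes:
        raise ValueError("not enough nodes to hold the second data table in memory")
    return needed


print(choose_node_count(10, 64, 4))    # fits on one node
print(choose_node_count(100, 64, 4))   # must be distributed
```

Choosing the count so that each share stays strictly below the limit mirrors the stated preference that the records placed on each node all fit in that node's memory.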

Still further, as shown in FIG. 4, the device also includes a second judgment module 36.

The second judgment module 36 is configured to judge, according to the connection condition corresponding to the current data record, whether the target data record exists in the local cache and, when the result is no, to trigger the reading module 33 to perform the operation of determining the target node from the at least two nodes according to the connection condition corresponding to the current data record and reading the data record of the second data table stored on the target node as the target data record.

In an optional implementation, the connection condition includes at least one target key required for the connection. The target key here is the key of a key-value pair.

Based on the above, the loading module 32 is specifically configured to: perform a hash operation on each target key of the at least one target key to obtain the hash value of each target key; determine the node corresponding to each target key according to the hash value of that target key and the number of the at least two nodes; and load the data records of the second data table corresponding to each target key onto the node corresponding to that target key.
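The three steps of the loading module can be sketched as follows. The text says only that the node is determined from the hash value and the number of nodes; taking the hash value modulo the node count is one common choice and is an assumption here, as are the use of CRC32, the function names, and the sample records.

```python
import zlib
from collections import defaultdict


def node_for_key(key, node_count):
    """Hash the target key, then reduce the hash value by the node count (assumed: modulo)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def load_second_table(records, key_field, node_count):
    """Place each second-table record on the node that corresponds to its target key."""
    nodes = [defaultdict(list) for _ in range(node_count)]
    for record in records:
        key = record[key_field]
        nodes[node_for_key(key, node_count)][key].append(record)
    return nodes


second_table = [
    {"user_id": "u1", "city": "Taipei"},
    {"user_id": "u2", "city": "Hangzhou"},
    {"user_id": "u3", "city": "Hong Kong"},
]
nodes = load_second_table(second_table, "user_id", node_count=2)
for index, node in enumerate(nodes):
    print(index, sorted(node))
```

A deterministic hash such as CRC32 is used instead of Python's built-in hash() because the built-in string hash is randomized per process, and the loader and the reader must agree on the node for every key.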

In an optional implementation, the loading module 32 is specifically configured to load the data records of the second data table into the memory of the at least two nodes according to the connection condition. Data records of the second data table stored in node memory can be read at any time and at high speed, which helps improve the efficiency of the Join operation.

It should be noted that loading the data records of the second data table into the memory of the at least two nodes is preferred but not required; the records may also be loaded onto a node's SSD or other storage media.

When processing a data table connection task, the data table connection device provided by this embodiment first loads the data records of the second data table onto at least two nodes in the distributed system according to the connection condition, which effectively turns the second data table into a distributed KV store. A Sort Merge Join is then unnecessary and a distributed Hash map Join can be performed instead: the data records of the first data table do not need to be sorted but can be read directly; according to the connection condition corresponding to each record read from the first data table, the required data records of the second data table are read from the corresponding node; and the connection operation is performed on the records read from the two tables. With the data table connection device provided by this embodiment, therefore, only the second data table needs to be distributed to different nodes according to the connection condition and the first data table does not, which reduces the amount of data that shuffle sorting must process and helps reduce the computing resources consumed by the connection operation.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method can be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data table connection method for a distributed system, characterized in that the method comprises: during a query of a distributed data warehouse, receiving a data table connection task, the data table connection task instructing that a connection operation be performed on a first data table and a second data table according to a connection condition; loading, according to the connection condition, the data records of the second data table onto at least two nodes in the distributed system to implement distributed storage; reading a data record of the first data table as a current data record, determining a target node from the at least two nodes according to the connection condition corresponding to the current data record, and reading the data record of the second data table stored on the target node as a target data record; and performing the connection operation on the current data record and the target data record.

2. The method according to claim 1, wherein, before the loading, according to the connection condition, of the data records of the second data table onto at least two nodes in the distributed system, the method comprises: judging whether the data volume of the second data table is greater than the memory limit of a single node; and if the result is yes, performing the operation of loading, according to the connection condition, the data records of the second data table onto at least two nodes in the distributed system.

3. The method according to claim 1, wherein, before the determining of the target node from the at least two nodes according to the connection condition corresponding to the current data record and the reading of the data record of the second data table stored on the target node as the target data record, the method comprises: judging, according to the connection condition corresponding to the current data record, whether the target data record exists in a local cache; and if the result is no, performing the operation of determining the target node from the at least two nodes according to the connection condition corresponding to the current data record and reading the data record of the second data table stored on the target node as the target data record.

4. The method according to claim 1, wherein the connection condition includes at least one target key required for the connection, and the loading, according to the connection condition, of the data records of the second data table onto at least two nodes in the distributed system comprises: performing a hash operation on each target key of the at least one target key to obtain the hash value of each target key; determining the node corresponding to each target key according to the hash value of each target key and the number of the at least two nodes; and loading the data records of the second data table corresponding to each target key onto the node corresponding to that target key.

5. The method according to any one of claims 1 to 4, wherein the loading, according to the connection condition, of the data records of the second data table onto at least two nodes in the distributed system comprises: loading, according to the connection condition, the data records of the second data table into the memory of the at least two nodes.

6. A data table connection device for a distributed system, characterized in that the device comprises: a receiving module configured to receive, during a query of a distributed data warehouse, a data table connection task, the data table connection task instructing that a connection operation be performed on a first data table and a second data table according to a connection condition; a loading module configured to load, according to the connection condition, the data records of the second data table onto at least two nodes in the distributed system to implement distributed storage; a reading module configured to read a data record of the first data table as a current data record, determine a target node from the at least two nodes according to the connection condition corresponding to the current data record, and read the data record of the second data table stored on the target node as a target data record; and a connection module configured to perform the connection operation on the current data record and the target data record.

7. The device according to claim 6, further comprising: a first judgment module configured to judge whether the data volume of the second data table is greater than the memory limit of a single node and, when the result is yes, to trigger the loading module to perform the operation of loading, according to the connection condition, the data records of the second data table onto at least two nodes in the distributed system.

8. The device according to claim 6, further comprising: a second judgment module configured to judge, according to the connection condition corresponding to the current data record, whether the target data record exists in a local cache and, when the result is no, to trigger the reading module to perform the operation of determining the target node from the at least two nodes according to the connection condition corresponding to the current data record and reading the data record of the second data table stored on the target node as the target data record.

9. The device according to claim 6, wherein the connection condition includes at least one target key required for the connection, and the loading module is specifically configured to: perform a hash operation on each target key of the at least one target key to obtain the hash value of each target key; determine the node corresponding to each target key according to the hash value of each target key and the number of the at least two nodes; and load the data records of the second data table corresponding to each target key onto the node corresponding to that target key.

10. The device according to any one of claims 6 to 9, wherein the loading module is specifically configured to load, according to the connection condition, the data records of the second data table into the memory of the at least two nodes.
TW106104646A 2016-03-02 2017-02-13 Data table connection method and device TWI746511B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610118167.7 2016-03-02
CN201610118167.7A CN107153643B (en) 2016-03-02 2016-03-02 Data table connection method and device

Publications (2)

Publication Number Publication Date
TW201738781A TW201738781A (en) 2017-11-01
TWI746511B true TWI746511B (en) 2021-11-21

Family

ID=59742547

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106104646A TWI746511B (en) 2016-03-02 2017-02-13 Data table connection method and device

Country Status (3)

Country Link
CN (1) CN107153643B (en)
TW (1) TWI746511B (en)
WO (1) WO2017148297A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710643B (en) * 2018-12-20 2020-11-13 上海达梦数据库有限公司 External connection management method, device, server and storage medium
CN111506670B (en) * 2019-01-31 2023-07-18 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN110413670B (en) * 2019-06-28 2023-07-14 创新先进技术有限公司 Data export method, device and equipment based on MapReduce
US11520738B2 (en) * 2019-09-20 2022-12-06 Samsung Electronics Co., Ltd. Internal key hash directory in table
CN111752972A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Data association query method and system under key-value storage mode based on RocksDB
CN112597148A (en) * 2020-11-25 2021-04-02 联想(北京)有限公司 Data table connection method and device
CN112732715B (en) * 2020-12-31 2023-08-25 星环信息科技(上海)股份有限公司 Data table association method, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177874B2 (en) * 2003-01-16 2007-02-13 Jardin Cary A System and method for generating and processing results data in a distributed system
CN103186651A (en) * 2011-12-31 2013-07-03 中国移动通信集团公司 Distributed relational database as well as method and device for building and querying same
CN104504114A (en) * 2014-12-30 2015-04-08 杭州华为数字技术有限公司 Multi-hash table-based relational operation optimization method, device and system
TWI522827B (en) * 2015-01-09 2016-02-21 Chunghwa Telecom Co Ltd Real-time storage and real-time reading of huge amounts of data for non-related databases

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7085769B1 (en) * 2001-04-26 2006-08-01 Ncr Corporation Method and apparatus for performing hash join
CN102214176B (en) * 2010-04-02 2014-02-05 中国人民解放军国防科学技术大学 Method for splitting and join of huge dimension table
CN102467570B (en) * 2010-11-17 2014-03-12 日电(中国)有限公司 Connection query system and method for distributed data warehouse
CN102323947B (en) * 2011-09-05 2013-07-10 东北大学 Generation method of pre-join table on ring-shaped schema database
CN104424240B (en) * 2013-08-27 2019-06-14 腾讯科技(深圳)有限公司 Multilist correlating method, main service node, calculate node and system
US20160055212A1 (en) * 2014-08-22 2016-02-25 Attivio, Inc. Automatic joining of data sets based on statistics of field values in the data sets
CN104391957A (en) * 2014-12-01 2015-03-04 浪潮电子信息产业股份有限公司 Data interaction analysis method for hybrid big data processing system
CN105045871B (en) * 2015-07-15 2018-09-28 国家超级计算深圳中心(深圳云计算中心) Data aggregate querying method and device
CN105183880A (en) * 2015-09-22 2015-12-23 浪潮集团有限公司 Hash join method and device

Also Published As

Publication number Publication date
CN107153643A (en) 2017-09-12
TW201738781A (en) 2017-11-01
WO2017148297A1 (en) 2017-09-08
CN107153643B (en) 2021-02-19

Similar Documents

Publication Publication Date Title
TWI746511B (en) Data table connection method and device
US10884799B2 (en) Multi-core processor in storage system executing dynamic thread for increased core availability
US20240020038A1 (en) Distributed Storage Method and Device
US11169706B2 (en) Rebalancing storage I/O workloads by storage controller selection and redirection
US9996432B2 (en) Automated local database connection affinity and failover
US9075856B2 (en) Systems and methods for distributing replication tasks within computing clusters
US20150127691A1 (en) Efficient implementations for mapreduce systems
JPH0954754A (en) Customer-information control system and method in loosely-coupled parallel processing environment
US8874626B2 (en) Tracking files and directories related to unsuccessful change operations
KR102043276B1 (en) Apparatus and method for dynamic resource allocation based on interconnect fabric switching
WO2013131443A1 (en) Data storage method and device
US11868333B2 (en) Data read/write method and apparatus for database
US20230136106A1 (en) Space efficient distributed storage systems
US20140089260A1 (en) Workload transitioning in an in-memory data grid
Nogueira et al. Elastic state machine replication
US8621260B1 (en) Site-level sub-cluster dependencies
CN107943615B (en) Data processing method and system based on distributed cluster
US11625503B2 (en) Data integrity procedure
JPH0944461A (en) System and method for control of customer information with api start and cancel transaction function in loosely-coupled parallel processing environment
CN112596669A (en) Data processing method and device based on distributed storage
WO2023155591A1 (en) Progress information management and control method, micro-service apparatus, electronic device, and storage medium
Yu et al. DMooseFS: Design and implementation of distributed files system with distributed metadata server
US11507512B2 (en) Fault tolerant cluster data handling
US11271992B2 (en) Lazy lock queue reduction for cluster group changes
CN112835967B (en) Data processing method, device, equipment and medium based on distributed storage system