TWI752005B - Method, system and distributed system for intermediate data transmission - Google Patents

Method, system and distributed system for intermediate data transmission

Info

Publication number
TWI752005B
Authority
TW
Taiwan
Prior art keywords
data
transmitted
network
intermediate data
message
Prior art date
Application number
TW106103972A
Other languages
Chinese (zh)
Other versions
TW201732632A (en
Inventor
呂志強
陶陽宇
陸一峰
李超
吳永軍
李治
劉耀莉
Original Assignee
香港商阿里巴巴集團服務有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 香港商阿里巴巴集團服務有限公司
Publication of TW201732632A
Application granted
Publication of TWI752005B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications

Abstract

The present invention relates to an intermediate data transmission method and system, and to a distributed system. The intermediate data transmission method includes: determining the intermediate data that the upper-level subtask is to transmit this time, referred to as the data to be transmitted; selecting, from the network ports on which the lower-level subtask listens for data, the network port that needs to receive the data to be transmitted, referred to as the receiving port; and transmitting the data to be transmitted directly to the receiving port over the network. In the embodiments of the present invention, intermediate data is transmitted without passing through a distributed storage system, which removes the effect of the disk I/O rate on the intermediate data transmission rate and so increases that rate.

Description

Method, system and distributed system for intermediate data transmission

The present invention relates to the field of communications, and in particular to an intermediate data transmission method and system, and a distributed system.

In a distributed system, a user task can usually be decomposed into several levels of subtasks. These subtasks depend on one another: the output of some subtasks serves as the input of others. Data therefore has to be passed between subtasks, and the data passed between subtasks is called intermediate data.

In the related art, intermediate data is currently transmitted through a distributed storage system. The upper-level subtask that produces the intermediate data writes it, as files, to the disks of the distributed storage system through that system's interface; the lower-level subtask that needs the intermediate data as input then reads it from those disks for further processing.

Take the most common case, a MapReduce task, as an example. A MapReduce task can be decomposed into two levels of subtasks, MapTask and ReduceTask. When a MapReduce task is used to sort data, MapTask can start several processes simultaneously on different machines; each process reads part of the input data, sorts that part, and outputs the sorted result. This output becomes the input of ReduceTask, which sorts it further to achieve a global order. The data passed between MapTask and ReduceTask in this process is an intermediate result of the sort and does not need to be presented to the user.

FIG. 1 is a schematic diagram of the intermediate data transmission process between MapTask and ReduceTask in the related art. As shown in FIG. 1, the process is as follows: MapTask outputs intermediate data and writes it, as files, into the distributed storage system through that system's user interface; the distributed storage system stores the intermediate data on one or more storage nodes, that is, persists it to the disks of one or more machines; ReduceTask then reads the intermediate data from those storage nodes through the user interface of the distributed storage system.

In the related art, intermediate data transmission in a distributed system has the following problems:

1. Intermediate data transmission relies on disk reads and writes, and a conventional mechanical hard disk averages only about 100 MB/s for reads and writes. This transmission method is therefore severely limited by the disk I/O (input/output) rate; the transmission rate is low, and user tasks execute inefficiently.

2. A single storage node in a distributed storage system may become unavailable. To ensure that no data is lost, a distributed storage system usually creates multiple copies of a file and stores them on different storage nodes. As a result, the same data is transmitted over the network several times, consuming network bandwidth.

The purpose of the present invention is to provide an intermediate data transmission method and system, and a distributed system, that increase the intermediate data transmission rate of a distributed system.

To achieve the above purpose, the present invention proposes an intermediate data transmission method for use in a distributed system. A user task of the distributed system includes multiple levels of subtasks; a subtask that produces intermediate data is called an upper-level subtask, and a subtask that depends on the intermediate data for its processing is called a lower-level subtask. The method includes: determining the intermediate data that the upper-level subtask is to transmit this time, referred to as the data to be transmitted; selecting, from the network ports on which the lower-level subtask listens for data, the network port that needs to receive the data to be transmitted, referred to as the receiving port; and transmitting the data to be transmitted directly to the receiving port over the network.

With the intermediate data transmission method of the embodiments of the present invention, intermediate data is transmitted without passing through a distributed storage system, which removes the effect of the disk I/O rate on the intermediate data transmission rate and so increases that rate.

To achieve the above purpose, the present invention also proposes an intermediate data transmission system for use in a distributed system. A user task of the distributed system includes multiple levels of subtasks; a subtask that produces intermediate data is called an upper-level subtask, and a subtask that depends on the intermediate data for its processing is called a lower-level subtask. The intermediate data transmission system includes: a determination module, configured to determine the intermediate data that the upper-level subtask is to transmit this time, referred to as the data to be transmitted; a selection module, configured to select, from the network ports on which the lower-level subtask listens for data, the network port that needs to receive the data to be transmitted, referred to as the receiving port; and a transmission module, configured to transmit the data to be transmitted determined by the determination module directly over the network to the receiving port selected by the selection module.

With the intermediate data transmission system of the embodiments of the present invention, intermediate data is transmitted without passing through a distributed storage system, which removes the effect of the disk I/O rate on the intermediate data transmission rate and so increases that rate.

To achieve the above purpose, the present invention also proposes a distributed system that includes the intermediate data transmission system described in any of the foregoing.

In the distributed system of the embodiments of the present invention, intermediate data is transmitted without passing through a distributed storage system, which removes the effect of the disk I/O rate on the intermediate data transmission rate and so increases that rate.

800‧‧‧Intermediate data transmission system

810‧‧‧Determination module

820‧‧‧Selection module

830‧‧‧Transmission module

910‧‧‧First startup module

920‧‧‧Notification module

930‧‧‧Second startup module

940‧‧‧Determination module

950‧‧‧Selection module

960‧‧‧Transmission module

1000‧‧‧Transmission module

1010‧‧‧Inter-process transmission unit

1100‧‧‧Transmission module

1110‧‧‧Inter-process transmission unit

1120‧‧‧Response receiving unit

1130‧‧‧Permission unit

1140‧‧‧Retransmission unit

1200‧‧‧Distributed system

FIG. 1 is a schematic diagram of the intermediate data transmission process between MapTask and ReduceTask in the related art.

FIG. 2 is a flowchart of the intermediate data transmission method in Embodiment 1 of the present invention.

FIG. 3 is the first schematic diagram of the intermediate data transmission process between MapTask and ReduceTask when transmission follows the method shown in FIG. 2.

FIG. 4 is the second schematic diagram of the intermediate data transmission process between MapTask and ReduceTask when transmission follows the method shown in FIG. 2.

FIG. 5 is a flowchart of the intermediate data transmission method in Embodiment 2 of the present invention.

FIG. 6 is a flowchart of the intermediate data transmission method in Embodiment 3 of the present invention.

FIG. 7 is a flowchart of the intermediate data transmission method in Embodiment 4 of the present invention.

FIG. 8 is a structural block diagram of the intermediate data transmission system in Embodiment 5 of the present invention.

FIG. 9 is a structural block diagram of the intermediate data transmission system in Embodiment 6 of the present invention.

FIG. 10 is a structural block diagram of the transmission module of the intermediate data transmission system in Embodiment 7 of the present invention.

FIG. 11 is a structural block diagram of the transmission module of the intermediate data transmission system in Embodiment 8 of the present invention.

FIG. 12 is a structural block diagram of the distributed system in Embodiment 9 of the present invention.

The principles and features of the present invention are described below with reference to the accompanying drawings. The embodiments given serve only to explain the present invention and do not limit its scope. All embodiments that a person of ordinary skill in the art obtains in the spirit of the present invention without creative effort fall within the scope of protection of the present invention.

Note that the intermediate data transmission method and the intermediate data transmission system in each embodiment of the present invention are used in a distributed system whose user tasks include multiple levels of subtasks. A subtask that produces intermediate data is called an upper-level subtask, and a subtask that depends on intermediate data for its processing is called a lower-level subtask. The two terms are relative. For example, in a user task with multiple levels of subtasks, a given subtask may be a lower-level subtask with respect to subtask A but an upper-level subtask with respect to subtask B.

The user task may be a MapReduce task, a DAG (Directed Acyclic Graph) task, or the like.

FIG. 2 is a flowchart of the intermediate data transmission method in Embodiment 1 of the present invention. As shown in FIG. 2, the method in this embodiment may include the following steps:

Step S201: determine the intermediate data that the upper-level subtask is to transmit this time, referred to as the data to be transmitted.

When the upper-level subtask has multiple processes, the data to be transmitted may be the intermediate data produced by one process of the upper-level subtask, or by several or all of its processes.

Step S202: select, from the network ports on which the lower-level subtask listens for data, the network port that needs to receive the data to be transmitted, referred to as the receiving port.

The lower-level subtask uses its listening network ports to receive the data sent by the upper-level subtask.

The lower-level subtask can set up network ports as needed. For example, it can set up one network port for each of its processes, or one or more common network ports shared by all of its processes.

Step S203: transmit the data to be transmitted directly to the receiving port over the network.

In other words, the data to be transmitted travels directly over the network from the network where the upper-level subtask is located to the network where the lower-level subtask is located, without the write to and read from a distributed storage system that the related art described in the background requires.
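Steps S201 to S203 can be sketched with ordinary TCP sockets. This is only an illustrative sketch, not the patented implementation: the patent does not name a transport protocol, so TCP on the loopback interface is an assumption, and the function names `start_receiver`, `send_intermediate_data` and `demo` are hypothetical.

```python
import socket
import threading


def start_receiver(results):
    """Lower-level subtask: open a listening port and collect what arrives."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # sender closed the connection
                    break
                chunks.append(chunk)
        server.close()
        results.append(b"".join(chunks))

    thread = threading.Thread(target=serve)
    thread.start()
    return port, thread


def send_intermediate_data(pending, port):
    """Upper-level subtask: send the pending data straight to the receiving port."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(pending)


def demo():
    results = []
    port, thread = start_receiver(results)               # receiving port is listening
    send_intermediate_data(b"sorted-partition-0", port)  # S201 to S203
    thread.join()
    return results[0]
```

Nothing in this path touches a file or a storage service: the bytes go from the sender's memory to the receiver's memory over the connection.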

Take a MapReduce task as an example. FIG. 3 shows the intermediate data transmission process between its two levels of subtasks, MapTask and ReduceTask.

On this basis, the user can define a specific transmission strategy as needed.

For example, the first transmission strategy may be: set up one first network port for each process of the upper-level subtask and one second network port for each process of the lower-level subtask, and transmit the data to be transmitted from the first network ports of the one or more processes that produced it to all of the second network ports. That is, the process that produces the intermediate data sends it directly to all processes of the lower-level subtask. Taking the MapReduce task as an example again, FIG. 4 shows the intermediate data transmission process between MapTask and ReduceTask under this strategy.

The second transmission strategy may be: set up one or more first common network ports for all processes of the upper-level subtask and one or more second common network ports for all processes of the lower-level subtask. The intermediate data produced by each process of the upper-level subtask is transmitted directly over the network from the first common network port to the second common network port, which then distributes it to the processes of the lower-level subtask.
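The difference between the two strategies is easiest to see as a list of hops. The sketch below is an illustration under assumptions: ports and processes are represented by plain strings, both function names are hypothetical, and the second function assumes a single common port on each side and delivery to every receiving process, which the text leaves open.

```python
def per_process_routes(first_ports, second_ports):
    """Strategy one: every sending port transmits to every receiving port."""
    return [(src, dst) for src in first_ports for dst in second_ports]


def common_port_routes(first_common, second_common, receiver_processes):
    """Strategy two: one network hop between the common ports,
    then distribution from the second common port to each process."""
    network_hop = [(first_common, second_common)]
    distribution = [(second_common, proc) for proc in receiver_processes]
    return network_hop + distribution


routes_one = per_process_routes(["map0", "map1"], ["red0", "red1", "red2"])
routes_two = common_port_routes("map-common", "red-common", ["red0", "red1", "red2"])
```

With 2 senders and 3 receivers, strategy one opens 2 x 3 = 6 network paths, while strategy two uses one network path followed by 3 distribution hops on the receiving side.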

With the intermediate data transmission method of this embodiment, the upper-level subtask that produces the intermediate data transmits it directly over the network to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk I/O rate on the intermediate data transmission rate and so increases that rate.

FIG. 5 is a flowchart of the intermediate data transmission method in Embodiment 2 of the present invention. As shown in FIG. 5, the method in this embodiment may include the following steps:

Step S501: start the lower-level subtask, so that the network ports on which it listens for data are in the listening state.

Because in the embodiments of the present invention the upper-level subtask transmits intermediate data directly over the network to the lower-level subtask, the lower-level subtask must be able to receive intermediate data before the upper-level subtask is started and begins producing it. In other words, the lower-level subtask should be started before the upper-level subtask; only then is it certain to receive the intermediate data that the upper-level subtask produces.

Step S502: notify the upper-level subtask of the information about the network ports on which the lower-level subtask listens for data.

The network port information generally includes an IP address and the like. The upper-level subtask can use this information to determine the destination address of the intermediate data.

In a specific application, the lower-level subtask can first report its network port information to a scheduler, and the scheduler then sends that information to the upper-level subtask.

Step S503: after the lower-level subtask has started, start the upper-level subtask, which produces intermediate data.

The upper-level subtask is started only once the lower-level subtask has started and its listening network ports are already in the listening state. That is, the lower-level subtask is started first and the upper-level subtask second.

Step S504: determine the intermediate data that the upper-level subtask is to transmit this time, referred to as the data to be transmitted.

Step S505: select, from the network ports on which the lower-level subtask listens for data, the network port that needs to receive the data to be transmitted, referred to as the receiving port.

Step S506: transmit the data to be transmitted directly to the receiving port over the network.

With the intermediate data transmission method of this embodiment, the upper-level subtask that produces the intermediate data transmits it directly over the network to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk I/O rate on the intermediate data transmission rate and so increases that rate. In addition, the lower-level subtask is started before the upper-level subtask, which ensures that the lower-level subtask can receive the intermediate data that the upper-level subtask produces.
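The start-up order of steps S501 to S503 can be expressed as a small in-memory model. This is a hedged sketch only: the `Scheduler` class, its method names and the `(ip, port)` tuples are hypothetical illustrations of the scheduler role mentioned in the text, not an interface defined by the patent.

```python
class Scheduler:
    """Toy scheduler that enforces 'lower-level first, upper-level second'."""

    def __init__(self):
        self.listening_ports = []  # (ip, port) pairs reported by the lower-level subtask
        self.log = []

    def start_lower(self, addresses):
        # S501: the lower-level subtask starts and reports its listening ports.
        self.listening_ports = list(addresses)
        self.log.append("lower-level started")

    def start_upper(self):
        # S503 must not happen before S501 has completed.
        if not self.listening_ports:
            raise RuntimeError("lower-level subtask is not listening yet")
        self.log.append("upper-level started")
        # S502: hand the port information to the upper-level subtask.
        return list(self.listening_ports)


scheduler = Scheduler()
scheduler.start_lower([("10.0.0.5", 9000), ("10.0.0.6", 9000)])
destinations = scheduler.start_upper()
```

Calling `start_upper()` on a fresh `Scheduler` raises `RuntimeError`, which mirrors the requirement that no intermediate data be produced before a receiving port is listening.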

FIG. 6 is a flowchart of the intermediate data transmission method in Embodiment 3 of the present invention. As shown in FIG. 6, the method in this embodiment may include the following steps:

Step S601: determine the intermediate data that a single process of the upper-level subtask is to transmit this time, referred to as the data to be transmitted, where each process of the upper-level subtask corresponds to one first network port.

In this embodiment, intermediate data is sent process by process. Each process of the upper-level subtask determines the intermediate data it is to transmit and then transmits that data over the network to all processes of the lower-level subtask.

Step S602: each process of the lower-level subtask corresponds to one second network port, and all second network ports are selected as receiving ports for the data to be transmitted.

In this embodiment, every process of the upper-level and lower-level subtasks corresponds to its own network port. The first network port of an upper-level process is the sending port for intermediate data, and the second network port of a lower-level process is the receiving port. Accordingly, a process of the upper-level subtask is called a sending process and a process of the lower-level subtask is called a receiving process. Intermediate data is transmitted directly over the network from the sending port to the receiving port, as in the transmission shown in FIG. 4 above.

Step S603: transmit the data to be transmitted directly over the network from the first network port corresponding to the sending process to all second network ports, where the sending process is the process that produced the data to be transmitted.

As the above steps show, in this embodiment intermediate data travels directly over the network from the sending process to the receiving process. The upper-level subtask does not have to collect the intermediate data of its sending processes and send it in one batch, and the lower-level subtask does not have to receive the intermediate data centrally and then hand it out to its receiving processes, so the intermediate data transmission rate can be increased further. Moreover, a receiving process can process intermediate data as soon as it arrives, rather than waiting until all sending processes have finished producing and transmitting their intermediate data. A receiving process can therefore work on the part of the intermediate data it has already received while the sending processes are still producing more; that is, receiving and sending processes can run in parallel. This significantly improves parallel execution between upper-level and lower-level subtasks and shortens the overall running time of the user task.

Because transmitting data over a network incurs a certain latency, a sending process that sends only a small amount of data each time has to send more often, and the overhead of network transmission becomes large. On the other hand, if the sending process sends a large amount of data each time, its memory use becomes very high when there are many receiving processes. To settle when the sending process should send intermediate data, other embodiments of the present invention may further include the following steps before step S603, that is, before the data to be transmitted is transmitted directly over the network from the first network port corresponding to the sending process to all second network ports:

determining whether the data to be transmitted is greater than or equal to msgSize, where msgSize denotes the length of data sent in each transmission by the sending process;

if the data to be transmitted is greater than or equal to msgSize, starting the transmission of the data to be transmitted and setting its send length to msgSize;

where msgSize = max(min(BuffSize, MaxMsgSize*n), MinMsgSize*n)/n. BuffSize denotes the preset upper limit on the memory available to the sending process, MaxMsgSize denotes the preset maximum data length that the sending process transmits in one send, MinMsgSize denotes the preset minimum data length that the sending process transmits in one send, n denotes the number of second network ports, "*" denotes multiplication, and "/" denotes division. The meaning of n is the number of receiving processes; in this embodiment the number of receiving processes equals the number of receiving ports, which equals the number of second network ports, so n can be taken as the number of second network ports.

The above steps compute the length of data sent in each transmission from the scale of the user task (the number of receiving processes) and the memory available to each sending process, so the send length is adjusted dynamically. This avoids both the excessive overhead caused by sending too little data each time and the excessive memory demand caused by sending too much.
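The msgSize rule is easy to check numerically. The following sketch evaluates the formula as stated; the function names, the byte units, the use of integer division and the sample limits (64 MiB buffer, 4 MiB maximum, 64 KiB minimum) are all assumptions chosen for illustration, not values given in the patent.

```python
MIB = 1024 * 1024
KIB = 1024


def msg_size(buff_size, max_msg_size, min_msg_size, n):
    """msgSize = max(min(BuffSize, MaxMsgSize*n), MinMsgSize*n) / n, in bytes."""
    if n <= 0:
        raise ValueError("n must be a positive number of second network ports")
    total = max(min(buff_size, max_msg_size * n), min_msg_size * n)
    return total // n  # integer division is an assumption; the text only says "divide"


def should_send(pending_bytes, size):
    """Transmission starts once the pending data reaches msgSize."""
    return pending_bytes >= size


few = msg_size(64 * MIB, 4 * MIB, 64 * KIB, 4)      # MaxMsgSize is the binding limit
many = msg_size(64 * MIB, 4 * MIB, 64 * KIB, 100)   # BuffSize is shared by 100 receivers
huge = msg_size(64 * MIB, 4 * MIB, 64 * KIB, 2000)  # MinMsgSize is the floor
```

With these sample limits, 4 receivers give the full 4 MiB per message, 100 receivers give 671088 bytes each (the 64 MiB buffer split 100 ways), and 2000 receivers give the 64 KiB minimum. In the last case n * msgSize exceeds BuffSize, so the formula lets the minimum message length take precedence over the memory limit.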

The data to be transmitted is usually sent in the form of messages. In the embodiments of the present invention, a message carrying data to be transmitted may include a first process identifier and a message number. The first process identifier indicates which process sent the message, and the message number indicates which message it is among those sent by that process. In other words, the first process identifier answers the question of who sent the message, and the message number, which increases monotonically, gives the position (sequence number) of the message among all messages sent by that sending process.

The receiving process may determine, from the first process identifier and the message number carried in the message carrying the data to be transmitted, whether it has already received the message, and handle the message accordingly.

The process corresponding to a second network port may record the largest message number it has received so far from the sending process. If the message number carried in a received message is greater than this largest message number, the receiving process is receiving the message for the first time; if the message number is less than or equal to this largest message number, the receiving process has already received the message.

Accordingly, in this embodiment of the present invention, after the data to be transmitted is transmitted directly over the network from the first network port corresponding to the sending process to all second network ports, the method may further include: after a second network port receives a message carrying the data to be transmitted, if the message number carried in the message is greater than the largest message number that the process corresponding to the second network port has recorded as received so far from the sending process, the process corresponding to the second network port saves the data to be transmitted and updates the recorded largest message number to the message number carried in the message.

Alternatively, in this embodiment of the present invention, after the data to be transmitted is transmitted directly over the network from the first network port corresponding to the sending process to all second network ports, the method may further include: after a second network port receives a message carrying the data to be transmitted, if the message number carried in the message is less than or equal to the largest message number that the process corresponding to the second network port has recorded as received so far from the sending process, the process corresponding to the second network port discards the message and leaves the recorded largest message number unchanged.
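The save-or-discard rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and field names are hypothetical, and it assumes message numbers start at 1 so that an initial recorded maximum of 0 accepts the first message.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender_id: str   # first process identifier: which sending process sent this
    seq: int         # message number: monotonically increasing per sender
    payload: bytes   # the data to be transmitted


@dataclass
class Receiver:
    # Largest message number received so far, tracked per sending process.
    max_seq: dict = field(default_factory=dict)
    stored: list = field(default_factory=list)

    def on_message(self, msg: Message) -> bool:
        """Return True if the message was new and saved, False if discarded."""
        last = self.max_seq.get(msg.sender_id, 0)  # assumption: numbering starts at 1
        if msg.seq > last:
            self.stored.append(msg.payload)        # first delivery: save the data
            self.max_seq[msg.sender_id] = msg.seq  # update the recorded maximum
            return True
        return False                               # duplicate: discard, keep maximum
```

A retransmitted copy of message 1 from the same sender is therefore dropped, while message 1 from a different sender is still accepted, because the recorded maximum is kept per sending process.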

In the intermediate data transmission method of this embodiment of the present invention, the intermediate data in the distributed system is transmitted directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate. Moreover, the intermediate data transmission system of this embodiment significantly improves the ability of upper-level and lower-level subtasks to execute in parallel, and shortens the overall running time of the user task.

When the intermediate data is transmitted directly from the sending process to the receiving process, as in the embodiment shown in FIG. 6, a timeout retransmission mechanism may be used to handle possible message loss (that is, intermediate data sent by the sending process failing to reach the receiving process). The mechanism may work as follows. After the sending process sends intermediate data to a receiving process, the receiving process, if it receives the data, returns an acknowledgement response to the sending process to inform it that the data has been received; once the sending process receives the acknowledgement response, it may send the next intermediate data to the receiving process that returned it. If the receiving process does not receive the intermediate data, it returns no acknowledgement response. When the sending process receives no acknowledgement response within the set time limit, it concludes that the receiving process that did not respond has not received the intermediate data. It therefore retransmits the intermediate data of this transmission to that receiving process, and does not send it the next intermediate data, until it receives the acknowledgement response returned by that receiving process. In this way, the receiving process is guaranteed to receive the intermediate data, and the same intermediate data is not transmitted over the network multiple times, which saves network bandwidth.

Accordingly, the intermediate data transmission method of the present invention may follow the flow shown in FIG. 7.

FIG. 7 is a flowchart of the intermediate data transmission method in Embodiment 4 of the present invention. As shown in FIG. 7, in this embodiment the intermediate data transmission method may include the following steps:
Step S701: determine the intermediate data to be transmitted this time by a single process of the upper-level subtask, denoted as the data to be transmitted, where each process of the upper-level subtask corresponds to one first network port.
Step S702: each process of the lower-level subtask corresponds to one second network port; select all second network ports as receiving ports for the data to be transmitted.
Step S703: transmit the data to be transmitted directly over the network from the first network port corresponding to the sending process to all second network ports, where the sending process is the process that generates the data to be transmitted.
Step S704: after the transmission, determine whether the sending process receives, within the set time, the acknowledgement response returned by the process corresponding to a second network port. If it does, perform step S705; otherwise perform step S706. To distinguish which receiving process an acknowledgement response comes from, the acknowledgement response may carry a second process identifier that indicates the receiving process that issued it; the sending process uses the second process identifier to determine which receiving process issued the acknowledgement response.

Step S705: allow the sending process to transmit the next data over the network to the process that returned the acknowledgement response.
Step S706: the sending process again transmits the data to be transmitted directly over the network to the one or more second network ports that did not return an acknowledgement response, and the flow returns to step S704.
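The loop formed by steps S703 to S706 can be sketched in Python as below. This is a simplified model under stated assumptions: the `transmit` callable is hypothetical and stands for "send to these receivers and return the identifiers of those whose acknowledgement arrived within the set time", and the `max_rounds` cap is an addition of this sketch so that it always terminates; the flow in the text retries until an acknowledgement arrives.

```python
from typing import Callable, Iterable, Set


def send_with_retransmit(
    payload: bytes,
    receivers: Iterable[str],
    transmit: Callable[[bytes, Set[str]], Iterable[str]],
    max_rounds: int = 5,
) -> Set[str]:
    """Send payload to all receivers, retransmitting only to those that did not acknowledge.

    Returns the set of receivers still unacknowledged (empty on full success).
    """
    pending = set(receivers)               # S702: every second network port is a target
    rounds = 0
    while pending and rounds < max_rounds:
        # S703 / S706: (re)send only to receivers that have not acknowledged yet.
        acked = set(transmit(payload, set(pending)))
        # S704 / S705: acknowledged receivers may now be sent the next data.
        pending -= acked
        rounds += 1
    return pending
```

Receivers that acknowledge in the first round are never sent the same payload again, which is the bandwidth-saving property the description relies on.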

In the intermediate data transmission method of this embodiment of the present invention, the intermediate data in the distributed system is transmitted directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate. Moreover, the intermediate data transmission system of this embodiment significantly improves the ability of upper-level and lower-level subtasks to execute in parallel, and shortens the overall running time of the user task. In addition, in this embodiment the same intermediate data is not transmitted over the network multiple times, so the intermediate data transmission system of this embodiment also saves network bandwidth.

The present invention also provides an intermediate data transmission system for implementing the intermediate data transmission methods of the above embodiments. The above descriptions of the intermediate data transmission method apply equally to the corresponding parts of the intermediate data transmission system. The intermediate data transmission systems in the following embodiments are all used in a distributed system whose user task includes multiple levels of subtasks. Among these subtasks, a subtask that generates intermediate data is called an upper-level subtask, and a subtask that relies on the intermediate data for processing is called a lower-level subtask.

FIG. 8 is a structural block diagram of the intermediate data transmission system in Embodiment 5 of the present invention. As shown in FIG. 8, in this embodiment the intermediate data transmission system 800 may include a determination module 810, a selection module 820 and a transmission module 830. The determination module 810 determines the intermediate data to be transmitted by the upper-level subtask this time, denoted as the data to be transmitted. The selection module 820 selects, from the network ports of the lower-level subtask used for listening for data, the network port that needs to receive the data to be transmitted, denoted as the receiving port. The transmission module 830 transmits the data to be transmitted determined by the determination module 810 directly over the network to the receiving port selected by the selection module 820.

When the upper-level subtask has multiple processes, the data to be transmitted determined by the determination module 810 may be intermediate data generated by one process of the upper-level subtask, or intermediate data generated by several or all processes of the upper-level subtask.

The receiving port selected by the selection module 820 may be the network port corresponding to a single process of the lower-level subtask, or one or more common network ports corresponding to all processes of the lower-level subtask.

The transmission module 830 may transmit the data to be transmitted directly over the network to the receiving port using a specific transmission strategy defined by the user as needed, for example the transmission strategies listed in Embodiment 1 of the present invention.

In the intermediate data transmission system of this embodiment of the present invention, the intermediate data in the distributed system is transmitted directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate.

FIG. 9 is a structural block diagram of the intermediate data transmission system in Embodiment 6 of the present invention. As shown in FIG. 9, in this embodiment the intermediate data transmission system may include a first start module 910, a notification module 920, a second start module 930, a determination module 940, a selection module 950 and a transmission module 960. The first start module 910 starts the lower-level subtask, so that the network ports of the lower-level subtask used for listening for data are in the listening state. The notification module 920 is connected to the first start module 910 and notifies the upper-level subtask of the information about the network ports of the lower-level subtask used for listening for data. The second start module 930 is connected to the notification module 920 and to the determination module 940, and starts the upper-level subtask after the lower-level subtask has started, so that the intermediate data is generated. The determination module 940 determines the intermediate data to be transmitted by the upper-level subtask this time, denoted as the data to be transmitted. The selection module 950 selects, from the network ports of the lower-level subtask used for listening for data, the network port that needs to receive the data to be transmitted, denoted as the receiving port. The transmission module 960 transmits the data to be transmitted determined by the determination module 940 directly over the network to the receiving port selected by the selection module 950.

In the intermediate data transmission system of this embodiment of the present invention, the intermediate data in the distributed system is transmitted directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate. Furthermore, in this system the lower-level subtask is started before the upper-level subtask, which ensures that the lower-level subtask can receive the intermediate data generated by the upper-level subtask.
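The start-up order described for this embodiment (start the lower-level subtask so its port is listening, pass the port information to the upper-level subtask, then start the upper-level subtask) can be illustrated with a small Python sketch using TCP sockets on the local machine. All function names are hypothetical, and the use of TCP, threads and the loopback address is an assumption made only so the example is self-contained.

```python
import socket
import threading


def start_downstream(received: list):
    """Start the lower-level side first: bind, listen, and report the chosen port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen()                    # the port is now in the listening state
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:           # sender closed the connection
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    return server, port, worker


def run_upstream(port: int, data: bytes) -> None:
    """Start the upper-level side only after it has been told the listening port."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(data)             # direct network transfer, no storage system


def run_job(data: bytes) -> bytes:
    received = []
    server, port, worker = start_downstream(received)  # 1. start lower-level subtask
    run_upstream(port, data)                           # 2. notify port, 3. start upper-level
    worker.join(timeout=5)
    server.close()
    return b"".join(received)
```

Because the listener exists before the sender connects, the sender never transmits to a port that is not yet ready, which is the guarantee this embodiment aims for.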

FIG. 10 is a structural block diagram of the transmission module of the intermediate data transmission system in Embodiment 7 of the present invention. As shown in FIG. 10, in this embodiment the transmission module 1000 of the intermediate data transmission system may include an inter-process transmission unit 1010. In the case where the data to be transmitted is generated by a single process of the upper-level subtask, each process of the upper-level subtask corresponds to one first network port, each process of the lower-level subtask corresponds to one second network port, and all second network ports are receiving ports, the inter-process transmission unit 1010 transmits the data to be transmitted directly over the network from the first network port corresponding to the sending process to all second network ports, where the sending process is the process that generates the data to be transmitted.

In this embodiment, the intermediate data is transmitted directly over the network from the sending process to the receiving process. The upper-level subtask does not need to collect the intermediate data generated by each sending process and send it in one batch, and the lower-level subtask does not need to receive all the intermediate data and then distribute it to each receiving process, so the intermediate data transmission rate is further increased. In addition, a receiving process can process intermediate data as soon as it receives it, rather than waiting until all sending processes have finished generating intermediate data and transmitted it. The receiving process can therefore process the part of the intermediate data that has already arrived while the sending process is still generating more; that is, the receiving process and the sending process can run in parallel. This significantly improves the ability of upper-level and lower-level subtasks to execute in parallel, and shortens the overall running time of the user task.
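The overlap between producing and consuming intermediate data can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: threads and an in-memory queue stand in for the sending process, the receiving process and the network, and `run_pipeline` and the `None` end marker are hypothetical.

```python
import queue
import threading


def run_pipeline(chunks):
    """Process each chunk as soon as it arrives, while later chunks are still being produced."""
    channel = queue.Queue()   # stands in for the direct network link
    results = []

    def sender():
        for chunk in chunks:
            channel.put(chunk)        # hand over each piece as soon as it is generated
        channel.put(None)             # end-of-data marker (assumption of this sketch)

    def receiver():
        while True:
            chunk = channel.get()
            if chunk is None:
                break
            results.append(chunk.upper())  # process immediately, no wait for the full set

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The receiver starts working on the first chunk without waiting for the sender to finish, which mirrors the parallel execution of upper-level and lower-level subtasks described above.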

On the basis of the embodiment shown in FIG. 10, the transmission module 1000 may further include a judgment unit and a start unit. The judgment unit determines whether the data to be transmitted is greater than or equal to msgSize, where msgSize denotes the length of data sent by the sending process in each transmission. When the judgment unit determines that the data to be transmitted is greater than or equal to msgSize, the start unit starts the inter-process transmission unit 1010 to transmit the data to be transmitted and sets the send length of the data to be transmitted to msgSize. Here, msgSize=max(min(BuffSize,MaxMsgSize*n),MinMsgSize*n)/n, where BuffSize denotes the preset upper limit of memory available to the sending process, MaxMsgSize denotes the preset maximum data length that the sending process transmits in one send, MinMsgSize denotes the preset minimum data length that the sending process transmits in one send, n denotes the number of second network ports, "*" denotes multiplication, and "/" denotes division.

The judgment unit and the start unit calculate the length of data sent by the sending process in each transmission from the scale of the user task (that is, the number of receiving processes) and the memory available to each sending process, so that the length is adjusted dynamically. This avoids both the excessive overhead caused by sending too little data each time and the excessive memory demand caused by sending too much data each time.

In this embodiment of the present invention, a message carrying the data to be transmitted may carry a first process identifier and a message number, where the first process identifier indicates the sending process of the message, and the message number indicates the ordinal position of the message among the messages sent by that sending process.

The receiving process may determine, from the first process identifier and the message number carried in the message carrying the data to be transmitted, whether it has already received the message, and handle the message accordingly.

The process corresponding to a second network port may record the largest message number it has received so far from the sending process. If the message number carried in a received message is greater than this largest message number, the receiving process is receiving the message for the first time; if the message number is less than or equal to this largest message number, the receiving process has already received the message.

Accordingly, in this embodiment of the present invention, the transmission module 1000 may include a saving unit. After a second network port receives a message carrying the data to be transmitted, if the message number carried in the message is greater than the largest message number that the process corresponding to the second network port has recorded as received so far from the sending process (the process that sent the data to be transmitted), the saving unit causes the process corresponding to the second network port to save the data to be transmitted and to update the recorded largest message number to the message number carried in the message.

In this embodiment of the present invention, the transmission module 1000 may further include a discarding unit. After a second network port receives a message carrying the data to be transmitted, if the message number carried in the message is less than or equal to the largest message number that the process corresponding to the second network port has recorded as received so far from the sending process (the process that sent the data to be transmitted), the discarding unit causes the process corresponding to the second network port to discard the message and to leave the recorded largest message number unchanged.

In the intermediate data transmission system of this embodiment of the present invention, the intermediate data in the distributed system is transmitted directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate. Moreover, the intermediate data transmission system of this embodiment significantly improves the ability of upper-level and lower-level subtasks to execute in parallel, and shortens the overall running time of the user task.

When the intermediate data is transmitted by the inter-process transmission unit of the embodiment shown in FIG. 10, the transmission module may use a timeout retransmission mechanism to handle possible message loss (that is, intermediate data sent by the sending process failing to reach the receiving process).

Accordingly, the transmission module of the intermediate data transmission system may have the structure shown in FIG. 11. FIG. 11 is a structural block diagram of the transmission module of the intermediate data transmission system in Embodiment 8 of the present invention. As shown in FIG. 11, in this embodiment the transmission module 1100 may include an inter-process transmission unit 1110 and a response receiving unit 1120. The inter-process transmission unit 1110 has the same function as the inter-process transmission unit 1010 described above, which is not repeated here. After the inter-process transmission unit 1110 performs the transmission, the response receiving unit 1120 causes the sending process to receive the acknowledgement response returned by the process corresponding to a second network port; the acknowledgement response indicates that the process corresponding to the second network port has received the data to be transmitted.

Referring to FIG. 11, the transmission module 1100 may further include a permission unit 1130. After the response receiving unit 1120 receives an acknowledgement response, the permission unit 1130 allows the sending process to transmit the next data over the network to the process that returned the acknowledgement response.

Referring to FIG. 11, the transmission module 1100 may further include a retransmission unit 1140. After the inter-process transmission unit 1110 performs the transmission, if the response receiving unit 1120 has not received, within the set time, the acknowledgement response returned by the processes corresponding to one or more second network ports, the retransmission unit 1140 causes the sending process to transmit the data to be transmitted again, directly over the network, to the one or more second network ports that did not return an acknowledgement response.

To distinguish which receiving process an acknowledgement response comes from, the acknowledgement response may carry a second process identifier that indicates the receiving process that issued it; the sending process uses the second process identifier to determine which receiving process issued the acknowledgement response.

In the intermediate data transmission system of this embodiment of the present invention, the intermediate data in the distributed system is transmitted directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate. Moreover, the intermediate data transmission system of this embodiment significantly improves the ability of upper-level and lower-level subtasks to execute in parallel, and shortens the overall running time of the user task. In addition, in this embodiment the same intermediate data is not transmitted over the network multiple times, so the intermediate data transmission system of this embodiment also saves network bandwidth.

FIG. 12 is a structural block diagram of the distributed system in Embodiment 9 of the present invention. As shown in FIG. 12, in this embodiment the distributed system 1200 may include an intermediate data transmission system, which may be any of the intermediate data transmission systems in the foregoing embodiments of the present invention.

The distributed system of this embodiment of the present invention includes an intermediate data transmission system, which transmits the intermediate data in the distributed system directly over the network from the upper-level subtask that generates it to the lower-level subtask, without passing through a distributed storage system. This removes the effect of the disk IO rate on the intermediate data transmission rate and so increases that rate. Moreover, the distributed system of this embodiment can significantly improve the ability of upper-level and lower-level subtasks to execute in parallel, and shorten the overall running time of the user task. In addition, in this embodiment the same intermediate data is not transmitted over the network multiple times, so the distributed system of this embodiment also saves network bandwidth.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (19)

1. An intermediate data transmission method for a distributed system, wherein a user task of the distributed system comprises multiple levels of subtasks, a subtask that generates the intermediate data is referred to as an upper-level subtask, and a subtask that depends on the intermediate data for processing is referred to as a lower-level subtask, the method comprising: determining the intermediate data that the upper-level subtask is to transmit this time, recorded as the data to be transmitted; selecting, from the network ports of the lower-level subtask that are used for listening for data, the network ports that need to receive the data to be transmitted, recorded as receiving ports; and transmitting the data to be transmitted directly to the receiving ports over the network, which comprises: in the case where the data to be transmitted is generated by a single process of the upper-level subtask, each process of the upper-level subtask corresponds to one first network port, each process of the lower-level subtask corresponds to one second network port, and all second network ports are receiving ports, transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network, the sending process being the process that generated the data to be transmitted; wherein the message carrying the data to be transmitted carries a first process identifier and a message number, the first process identifier indicates the sending process of the message, and the message number indicates the ordinal position of the message among the messages sent by the sending process.

2. The intermediate data transmission method according to claim 1, further comprising, before determining the intermediate data that the upper-level subtask is to transmit this time: starting the lower-level subtask so that the network ports of the lower-level subtask used for listening for data are in a listening state; notifying the upper-level subtask of information about the network ports of the lower-level subtask used for listening for data; and starting the upper-level subtask after the lower-level subtask has been started, to generate the intermediate data.

3. The intermediate data transmission method according to claim 1, further comprising, before transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network: judging whether the data to be transmitted is greater than or equal to msgSize, where msgSize denotes the length of data sent by the sending process in each transmission; and, in the case where the data to be transmitted is greater than or equal to msgSize, starting transmission of the data to be transmitted and setting the length of data sent to msgSize; wherein msgSize = max(min(BuffSize, MaxMsgSize*n), MinMsgSize*n)/n, BuffSize denotes the preset upper limit of memory available to the sending process, MaxMsgSize denotes the preset maximum data length transmitted by the sending process in one send, MinMsgSize denotes the preset minimum data length transmitted by the sending process in one send, n denotes the number of second network ports, "*" denotes multiplication, and "/" denotes division.

4. The intermediate data transmission method according to claim 1, further comprising, after transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network: after the second network port receives the message carrying the data to be transmitted, in the case where the message number carried in the message is greater than the largest message number that the process corresponding to the second network port has recorded as already received from the sending process, the process corresponding to the second network port saves the data to be transmitted and updates the recorded largest message number to the message number carried in the message.
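The msgSize rule stated in the claims above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, all lengths are assumed to be byte counts, and integer division is an assumption, since the text only says "/" denotes division.

```python
def compute_msg_size(buff_size: int, max_msg_size: int,
                     min_msg_size: int, n: int) -> int:
    """msgSize = max(min(BuffSize, MaxMsgSize*n), MinMsgSize*n) / n.

    n is the number of second network ports (receivers).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of receiving ports")
    total = max(min(buff_size, max_msg_size * n), min_msg_size * n)
    return total // n  # integer division is an assumption of this sketch


def should_start_transmission(pending_length: int, msg_size: int) -> bool:
    """Transmission starts once the pending data reaches msgSize."""
    return pending_length >= msg_size


# Buffer share falls between the bounds: each receiver gets BuffSize / n.
mid_case = compute_msg_size(buff_size=1000, max_msg_size=400, min_msg_size=50, n=4)
# Small buffer: the result is raised to MinMsgSize.
low_case = compute_msg_size(buff_size=100, max_msg_size=400, min_msg_size=50, n=4)
# Large buffer: the result is capped at MaxMsgSize.
high_case = compute_msg_size(buff_size=10000, max_msg_size=400, min_msg_size=50, n=4)
```

When MinMsgSize is no larger than MaxMsgSize, the formula amounts to taking the per-receiver share BuffSize/n and clamping it to the range from MinMsgSize to MaxMsgSize.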
5. The intermediate data transmission method according to claim 1, further comprising, after transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network: after the second network port receives the message carrying the data to be transmitted, in the case where the message number carried in the message is less than or equal to the largest message number that the process corresponding to the second network port has recorded as already received from the sending process, the process corresponding to the second network port discards the message and leaves the recorded largest message number unchanged.

6. The intermediate data transmission method according to claim 1, further comprising, after transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network: after the transmission, the sending process receives an acknowledgement response returned by the process corresponding to the second network port, the acknowledgement response indicating that the process corresponding to the second network port has received the data to be transmitted.
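The receiving-side check on message numbers (save the data when the number is larger than the recorded maximum, discard the message otherwise) can be sketched as below. The class and field names are hypothetical, and the sketch assumes message numbers start at 1, so a sender that has not been seen yet has a recorded maximum of 0.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender_id: str   # first process identifier: which process sent the message
    number: int      # ordinal position among that sender's messages
    payload: bytes   # the data to be transmitted


@dataclass
class Receiver:
    # Largest message number already received, tracked per sending process.
    max_seen: dict = field(default_factory=dict)
    stored: list = field(default_factory=list)

    def on_message(self, msg: Message) -> bool:
        """Return True if the payload was saved, False if it was discarded."""
        last = self.max_seen.get(msg.sender_id, 0)  # 0 = nothing received yet (assumption)
        if msg.number > last:
            self.stored.append(msg.payload)
            self.max_seen[msg.sender_id] = msg.number
            return True
        # Duplicate or stale message: drop it and keep the recorded maximum.
        return False


receiver = Receiver()
first = receiver.on_message(Message("p0", 1, b"a"))
duplicate = receiver.on_message(Message("p0", 1, b"a"))
other_sender = receiver.on_message(Message("p1", 1, b"x"))
```

Keeping one counter per sending process is what makes a retransmitted message harmless: it arrives with a number the receiver has already recorded and is dropped.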
7. The intermediate data transmission method according to claim 6, further comprising, after transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network: after the sending process receives the acknowledgement response returned by the process corresponding to the second network port, allowing the sending process to transmit the next data over the network to the process that returned the acknowledgement response.

8. The intermediate data transmission method according to claim 6, further comprising, after transmitting the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network: after the transmission, in the case where the sending process has not received, within a set time, the acknowledgement response returned by the processes corresponding to one or more second network ports, the sending process again transmits the data to be transmitted directly over the network to the one or more second network ports.

9. The intermediate data transmission method according to claim 6, wherein the acknowledgement response carries a second process identifier, the second process identifier indicates the process that issued the acknowledgement response, and the sending process identifies the process that issued the acknowledgement response according to the second process identifier.
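The acknowledgement and retransmission behaviour described above can be sketched with an in-memory sender that records which receivers have not yet acknowledged the current message. All names are hypothetical, `transmit` stands in for the real network send, and time is passed in explicitly so the sketch stays deterministic. As a simplification, `send_all` waits for every acknowledgement before the next broadcast, which is stricter than the text, where permission is granted per acknowledging process.

```python
class Sender:
    def __init__(self, sender_id, receiver_ids, timeout):
        self.sender_id = sender_id            # first process identifier
        self.receiver_ids = list(receiver_ids)
        self.timeout = timeout                # the "set time" before resending
        self.number = 0                       # message number of the current message
        self.payload = None
        self.unacked = {}                     # receiver id -> time of last send

    def can_send_to(self, receiver_id):
        """A receiver may be sent the next data once it has acknowledged."""
        return receiver_id not in self.unacked

    def send_all(self, payload, now, transmit):
        if self.unacked:
            raise RuntimeError("previous message not yet acknowledged by all receivers")
        self.number += 1
        self.payload = payload
        for rid in self.receiver_ids:
            transmit(rid, self.sender_id, self.number, payload)
            self.unacked[rid] = now

    def on_ack(self, receiver_id):
        """receiver_id plays the role of the second process identifier."""
        self.unacked.pop(receiver_id, None)

    def retransmit_overdue(self, now, transmit):
        """Resend, with the same message number, to receivers whose ack is overdue."""
        overdue = [rid for rid, sent in self.unacked.items()
                   if now - sent >= self.timeout]
        for rid in overdue:
            transmit(rid, self.sender_id, self.number, self.payload)
            self.unacked[rid] = now
        return overdue


log = []
record = lambda rid, sid, num, data: log.append((rid, num))
sender = Sender("p0", ["r1", "r2"], timeout=5.0)
sender.send_all(b"x", 0.0, record)
sender.on_ack("r1")
early = sender.retransmit_overdue(3.0, record)
late = sender.retransmit_overdue(5.0, record)
```

Because a resend reuses the original message number, a receiver that did get the first copy (but whose acknowledgement was lost) can recognise the repeat by its number and discard it.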
10. An intermediate data transmission system for a distributed system, wherein a user task of the distributed system comprises multiple levels of subtasks, a subtask that generates the intermediate data is referred to as an upper-level subtask, and a subtask that depends on the intermediate data for processing is referred to as a lower-level subtask, the intermediate data transmission system comprising: a determination module, configured to determine the intermediate data that the upper-level subtask is to transmit this time, recorded as the data to be transmitted; a selection module, configured to select, from the network ports of the lower-level subtask that are used for listening for data, the network ports that need to receive the data to be transmitted, recorded as receiving ports; and a transmission module, configured to transmit the data to be transmitted determined by the determination module directly over the network to the receiving ports selected by the selection module, wherein the transmission module comprises: an inter-process transmission unit, configured to, in the case where the data to be transmitted is generated by a single process of the upper-level subtask, each process of the upper-level subtask corresponds to one first network port, each process of the lower-level subtask corresponds to one second network port, and all second network ports are receiving ports, transmit the data to be transmitted from the first network port corresponding to the sending process directly to all second network ports over the network, the sending process being the process that generated the data to be transmitted; wherein the message carrying the data to be transmitted carries a first process identifier and a message number, the first process identifier indicates the sending process of the message, and the message number indicates the ordinal position of the message among the messages sent by the sending process.
11. The intermediate data transmission system according to claim 10, further comprising: a first startup module, configured to start the lower-level subtask so that the network ports of the lower-level subtask used for listening for data are in a listening state; a notification module, connected to the first startup module and configured to notify the upper-level subtask of information about the network ports of the lower-level subtask used for listening for data; and a second startup module, connected to the notification module and to the determination module and configured to start the upper-level subtask after the lower-level subtask has been started, to generate the intermediate data.

12. The intermediate data transmission system according to claim 10, wherein the transmission module further comprises: a judging unit, configured to judge whether the data to be transmitted is greater than or equal to msgSize, where msgSize denotes the length of data sent by the sending process in each transmission; and a starting unit, configured to, in the case where the judging unit finds that the data to be transmitted is greater than or equal to msgSize, start the inter-process transmission unit to transmit the data to be transmitted and set the length of data sent to msgSize; wherein msgSize = max(min(BuffSize, MaxMsgSize*n), MinMsgSize*n)/n, BuffSize denotes the preset upper limit of memory available to the sending process, MaxMsgSize denotes the preset maximum data length transmitted by the sending process in one send, MinMsgSize denotes the preset minimum data length transmitted by the sending process in one send, n denotes the number of second network ports, "*" denotes multiplication, and "/" denotes division.

13. The intermediate data transmission system according to claim 10, wherein the transmission module further comprises: a saving unit, configured to, after the second network port receives the message carrying the data to be transmitted, in the case where the message number carried in the message is greater than the largest message number that the process corresponding to the second network port has recorded as already received from the sending process, cause the process corresponding to the second network port to save the data to be transmitted and update the recorded largest message number to the message number carried in the message.
14. The intermediate data transmission system according to claim 10, wherein the transmission module further comprises: a discarding unit, configured to, after the second network port receives the message carrying the data to be transmitted, in the case where the message number carried in the message is less than or equal to the largest message number that the process corresponding to the second network port has recorded as already received from the sending process, cause the process corresponding to the second network port to discard the message and leave the recorded largest message number unchanged.

15. The intermediate data transmission system according to claim 10, wherein the transmission module further comprises: a response receiving unit, configured to, after the inter-process transmission unit performs the transmission, cause the sending process to receive an acknowledgement response returned by the process corresponding to the second network port, the acknowledgement response indicating that the process corresponding to the second network port has received the data to be transmitted.

16. The intermediate data transmission system according to claim 15, wherein the transmission module further comprises: a permission unit, configured to, after the response receiving unit receives the acknowledgement response, allow the sending process to transmit the next data over the network to the process that returned the acknowledgement response.
17. The intermediate data transmission system according to claim 15, wherein the transmission module further comprises: a retransmission unit, configured to, after the inter-process transmission unit performs the transmission, in the case where the response receiving unit has not received, within a set time, the acknowledgement response returned by the processes corresponding to one or more second network ports, cause the sending process to again transmit the data to be transmitted directly over the network to the one or more second network ports.

18. The intermediate data transmission system according to claim 15, wherein the acknowledgement response carries a second process identifier, the second process identifier indicates the process that issued the acknowledgement response, and the sending process identifies the process that issued the acknowledgement response according to the second process identifier.

19. A distributed system, comprising the intermediate data transmission system according to any one of claims 10 to 18.
TW106103972A 2016-02-14 2017-02-07 Method, system and distributed system for intermediate data transmission TWI752005B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610084419.9A CN107087010B (en) 2016-02-14 2016-02-14 Intermediate data transmission method and system and distributed system
CN201610084419.9 2016-02-14

Publications (2)

Publication Number Publication Date
TW201732632A TW201732632A (en) 2017-09-16
TWI752005B true TWI752005B (en) 2022-01-11

Family

ID=59562909

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106103972A TWI752005B (en) 2016-02-14 2017-02-07 Method, system and distributed system for intermediate data transmission

Country Status (3)

Country Link
CN (1) CN107087010B (en)
TW (1) TWI752005B (en)
WO (1) WO2017136999A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8077602B2 (en) * 2008-02-01 2011-12-13 International Business Machines Corporation Performing dynamic request routing based on broadcast queue depths
CN103279390A (en) * 2012-08-21 2013-09-04 中国科学院信息工程研究所 Parallel processing system for small operation optimizing
US20150286492A1 (en) * 2014-04-07 2015-10-08 International Business Machines Corporation Optimized resource allocation and management in a virtualized computing environment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621429B (en) * 2009-07-20 2012-05-23 中兴通讯股份有限公司 Method and system for monitoring messages
US8346845B2 (en) * 2010-04-14 2013-01-01 International Business Machines Corporation Distributed solutions for large-scale resource assignment tasks
CN102655480B (en) * 2011-03-03 2015-12-02 腾讯科技(深圳)有限公司 Similar mail treatment system and method
CN102508704A (en) * 2011-11-10 2012-06-20 上海市共进通信技术有限公司 Method for implementing task decomposition and parallel processing in computer software system
US9477731B2 (en) * 2013-10-01 2016-10-25 Cloudera, Inc. Background format optimization for enhanced SQL-like queries in Hadoop
CN104063486B (en) * 2014-07-03 2017-07-11 四川中亚联邦科技有限公司 A kind of big data distributed storage method and system
CN104035817A (en) * 2014-07-08 2014-09-10 领佰思自动化科技(上海)有限公司 Distributed parallel computing method and system for physical implementation of large scale integrated circuit
CN105138679B (en) * 2015-09-14 2018-11-13 桂林电子科技大学 A kind of data processing system and processing method based on distributed caching

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8077602B2 (en) * 2008-02-01 2011-12-13 International Business Machines Corporation Performing dynamic request routing based on broadcast queue depths
CN103279390A (en) * 2012-08-21 2013-09-04 中国科学院信息工程研究所 Parallel processing system for small operation optimizing
US20150286492A1 (en) * 2014-04-07 2015-10-08 International Business Machines Corporation Optimized resource allocation and management in a virtualized computing environment
US9405572B2 (en) * 2014-04-07 2016-08-02 International Business Machines Corporation Optimized resource allocation and management in a virtualized computing environment

Also Published As

Publication number Publication date
CN107087010A (en) 2017-08-22
CN107087010B (en) 2020-10-27
WO2017136999A1 (en) 2017-08-17
TW201732632A (en) 2017-09-16

Similar Documents

Publication Publication Date Title
EP3637733B1 (en) Load balancing engine, client, distributed computing system, and load balancing method
US9264371B2 (en) Router, method for controlling the router, and computer program
US9426099B2 (en) Router, method for controlling router, and program
CN105812287B (en) Efficient circuit in packet switching network
JP5335892B2 (en) High-speed virtual channel for packet-switched on-chip interconnect networks
CN101616083B (en) Message forwarding method and device
WO2017133623A1 (en) Data stream processing method, apparatus, and system
US11496416B2 (en) Enhance communication of network traffic
CN108270676B (en) Network data processing method and device based on Intel DPDK
CN102047619B (en) Methods, systems, and computer readable media for dynamically rate limiting slowpath processing of exception packets
CN110391873B (en) Method, apparatus and computer program product for determining a data transfer mode
CN110708377A (en) Data transmission method, device and storage medium
WO2019056771A1 (en) Distributed storage system upgrade management method and device, and distributed storage system
CN103986585A (en) Message preprocessing method and device
CN113703954A (en) Message backup method and device, electronic equipment and computer storage medium
US20090007133A1 (en) Balancing of Load in a Network Processor
CN108259390B (en) Priority pushing method and device for virtual channels in interconnection bus
WO2014101502A1 (en) Memory access processing method based on memory chip interconnection, memory chip, and system
TWI752005B (en) Method, system and distributed system for intermediate data transmission
WO2020014115A1 (en) System and method for data transmission in distributed computing environments
CN115361332A (en) Processing method and device for fault-tolerant routing, processor and electronic equipment
CN114500544A (en) Method, system, equipment and medium for load balancing among nodes
WO2020235055A1 (en) Virtual machine monitoring device, virtual machine monitoring method, and program
CN103825842A (en) Data flow processing method and device for multi-CPU system
JP7223591B2 (en) Data migration management device, data migration management program, and data migration management method