CN102201992B - Stream processor parallel environment-oriented data stream communication system and method - Google Patents


Info

Publication number
CN102201992B
CN102201992B (application CN2011101357760A / CN201110135776A)
Authority
CN
China
Prior art keywords
cdu
pipeline
communication
calculating unit
computing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2011101357760A
Other languages
Chinese (zh)
Other versions
CN102201992A (en)
Inventor
陈庆奎
那丽春
曹欢欢
郝聚涛
霍欢
赵海燕
庄松林
丁晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN2011101357760A priority Critical patent/CN102201992B/en
Publication of CN102201992A publication Critical patent/CN102201992A/en
Application granted granted Critical
Publication of CN102201992B publication Critical patent/CN102201992B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a stream processor parallel environment-oriented data stream communication system and method, which relate to the technical field of parallel communication and address the technical problem that the inconvenience of developing hybrid CUDA (Compute Unified Device Architecture) and MPI (Message Passing Interface) programs hinders the real-time processing of massive data streams. Each computing node in the system is provided with a CPU (Central Processing Unit) computing component, a GPU (Graphics Processing Unit) computing component, a dynamic pipeline pool, and an MPI receiving component and an MPI sending component for MPI communication; a node resource table, a component resource table and a stream communication pipeline mapping table are arranged in each computing node; communication among the components of the computing nodes is performed through the dynamic pipeline pools; and communication data units are used as the transmission carrier. The communication system and the communication method provided by the invention can effectively support the construction of a large-scale information processing parallel environment from MPI and stream processor computers, ease system program development for programmers, and are suitable for constructing cloud computing nodes and processing large-scale data streams in real time.

Description

Data stream communication system for a stream processor parallel environment and communication method thereof
Technical field
The present invention relates to data communication technology, and in particular to a data stream communication system for a stream processor parallel environment and a communication method thereof.
Background technology
With the rapid development of information technology and the widespread adoption of Internet technology, networks have become an important part of people's daily lives. In recent years, 3G networks and Internet-of-Things technology have gradually extended into people's lives and work, bringing great convenience. However, the wide application of these new technologies raises the problem of processing massive information and the new challenge of improving large-scale real-time processing capability. Novel computing devices represented by stream processors have brought hope for solving these difficulties. "Stream processor" is the general name for GPUs that have moved into the field of general-purpose computing; the name derives from their internal architecture of hundreds of parallel streaming processors, and their performance is outstanding: taking the GTX480 as an example, its floating-point performance exceeds 1 TFLOPS. The world's mainstream graphics-card manufacturers, NVIDIA, AMD (ATI) and Intel, are all actively investing in stream processor research and in general-purpose computing on stream processors. Mature programming models in this field include NVIDIA's CUDA (Compute Unified Device Architecture) and AMD's CTM (Close To Metal), and these technologies have been widely applied to image processing, gaming, 3D processing, scientific computing, data mining and other fields. Following traditional parallel techniques, connecting several computers equipped with stream processors through a computer network to build a more powerful parallel information processing system is therefore a very important task. A stream processor cluster can not only build larger-scale processing systems; more importantly, it can use far fewer computing nodes than a traditional CPU cluster to build a computing system that is hundreds of times more powerful, while greatly reducing energy consumption and greatly improving system stability. Besides the physical network, building a cluster also requires the support of an effective parallel communication protocol before its parallel processing capability can be exploited. MPI has become an important communication system in the field of parallel environment construction and is widely used by academia and industry.
However, in existing stream-processor-based networks the processing environment of each computing node and the communication environment between computing nodes are rather complicated, so that CUDA (Compute Unified Device Architecture) programmers and CTM (Close To Metal) programmers must first learn in detail the processing details of each computing node and the communication details between computing nodes when developing parallel communication programs and parallel stream processor programs. This is unfavorable to the development of system programs, and it is also not conducive to the real-time processing of massive data streams (such as 3G video streams).
Summary of the invention
In view of the defects in the above prior art, the technical problem to be solved by the present invention is to provide a data stream communication system for a stream processor parallel environment, and a communication method thereof, that make system program development convenient and facilitate the real-time processing of massive data streams.
In order to solve the above technical problem, the data stream communication system for a stream processor parallel environment provided by the present invention is characterized by comprising a stream processor parallel computing physical environment, a node resource table, a component resource table and a stream communication pipeline mapping table;
The stream processor parallel computing physical environment is a four-tuple SPPE(PCS, MASTER, SOFS, NS), where SPPE is the stream processor parallel computing physical environment, PCS = { C1, C2, ..., Cc } is the set of computing nodes in the SPPE, MASTER is the master computing node in the SPPE, SOFS is the set of software in the SPPE, and NS is the set of interconnection networks in the SPPE; a message passing interface (MPI) communication environment is deployed on NS;
Each computing node is provided with a CPU computing component, a GPU computing component, a dynamic pipeline pool, and an MPI receiving component and an MPI sending component used for MPI communication; the CPU computing component, the GPU computing component, the MPI receiving component and the MPI sending component are all generalized computing components, and the GPU computing component is a stream processing computing device;
The dynamic pipeline pool is a four-tuple DPP(ID, CPS, PPS, PipeM), where DPP is the dynamic pipeline pool, ID is the identifier of the DPP, CPS = { CP1, CP2, ..., CPm } is the set of public pipelines in the DPP, PPS = { PN1, PN2, ..., PNn } is the set of private pipelines in the DPP, each private pipeline being a dedicated pipeline from which a generalized computing component reads information, and PipeM is the pipeline management component of the DPP; both the public pipelines and the private pipelines are one-way data stream channels; the CPS is divided into two groups, a CPSM pipeline group and a CPSS pipeline group, where the CPSM group is used for receiving messages and the CPSS group is used for sending messages and for pipeline binding in data stream communication; and each generalized computing component is equipped with one private pipeline for receiving messages and data streams;
A generalized computing component is a five-tuple GP(ID, RP, WP, PN, SP), where GP is the generalized computing component, ID is the component identifier of the GP, RP is the pipeline-reading process of the GP, WP is the public-pipeline-writing process of the GP, PN is the private pipeline of the GP, and SP = { P1, P2, P3, ..., Pp } is the set of narrow-sense computing components in the GP;
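As a reading aid, the dynamic pipeline pool and the generalized computing component described above can be pictured as plain record types. The following C++ sketch is illustrative only and is not part of the patent: the type and member names (Pipe, DynamicPipePool, GeneralizedComponent) are invented, and in-process FIFOs stand in for the one-way pipelines.

```cpp
#include <deque>
#include <string>
#include <vector>

struct CDU;  // communication data unit; its ten fields are sketched in a later example

// A one-way pipeline: a FIFO of communication data units.
struct Pipe {
    int id = 0;
    bool busy = false;           // set while the pipe is bound to a stream transfer
    std::deque<CDU*> fifo;       // one-way flow of CDUs (pointers, so CDU may stay incomplete here)
};

// DPP(ID, CPS, PPS, PipeM): public pipes, private pipes, and a pipe manager.
struct DynamicPipePool {
    int id = 0;
    std::vector<Pipe> cpsm;      // CPSM group: public pipes used for receiving messages
    std::vector<Pipe> cpss;      // CPSS group: public pipes used for sending / stream binding
    std::vector<Pipe> pps;       // private pipes, one per generalized computing component
    // PipeM itself is modelled as the routing loop sketched later in the text.
};

// GP(ID, RP, WP, PN, SP): a generalized computing component.
struct GeneralizedComponent {
    int id = 0;
    Pipe* pn = nullptr;                  // PN: this component's private pipe
    std::vector<std::string> sp;         // SP: names of the narrow-sense computing components
    // RP and WP are the read and write processes sketched in the later examples.
};
```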
The node resource table is a two-dimensional table NTA(Nid, Nname, Nip, Ntype) that records the node information of all computing nodes in the SPPE, where NTA is the node resource table, Nid records the node identifier of a computing node, Nname records the node name of the computing node and is used as a communication identifier, Nip records the node IP address of the computing node and is used for configuring the MPI environment, and Ntype records the node type of the computing node and indicates whether the recorded node is an ordinary computing node or the master computing node;
The component resource table is a two-dimensional table PTA(Pid, Pname, Ptype, PN) that records the component information of all generalized computing components in the SPPE, where PTA is the component resource table, Pid records the component identifier of a component, Pname records the component name and is used as a communication identifier, Ptype records the component type and indicates whether the component is a CPU computing component, a GPU computing component or a communication component, and PN records the private pipeline of the component;
The stream communication pipeline mapping table is a two-dimensional table MTA(mPid, group, sno, PipeA, PipeB), where MTA is the stream communication pipeline mapping table, mPid is an identifier, group is the data stream communication group number used as a communication identifier, sno is a sequence number, PipeA is the source pipeline number of the data stream, and PipeB is the target pipeline number of the data stream;
Each computing node keeps a copy of the NTA, PTA and MTA, and the copies are maintained with strong consistency;
Communication between components is carried entirely by communication data units. A communication data unit is a ten-tuple CDU(id, Sno, Segno, SourceN, SourceP, DestNS, DestPS, type, COM, DATA), where CDU is the communication data unit, id is the identifier of the CDU, Sno and Segno are both used for data stream communication, Sno being the stream number of the CDU and Segno being the data segment number of the CDU, SourceN is the source computing node of the communication, SourceP is the component that sends the CDU, DestNS is the set of target computing nodes of the CDU, DestPS is the set of target components of the CDU, i.e. the set of components in the target computing nodes that receive the CDU, type is the category of the CDU, of which there are three kinds (data stream, command message and short message), COM is the communication command, and DATA is the message data.
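The three resource tables and the ten-tuple CDU can likewise be written down as record types. The sketch below is a non-authoritative C++ rendering under the assumption that identifiers are integers and payloads are byte strings; the field names follow the tuples in the text, while the struct names themselves are invented for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// NTA(Nid, Nname, Nip, Ntype): one row per computing node.
struct NodeRow {
    int nid;                // node identifier
    std::string nname;      // node name, used as a communication identifier
    std::string nip;        // node IP address, used to configure the MPI environment
    enum { ORDINARY, MASTER } ntype;
};

// PTA(Pid, Pname, Ptype, PN): one row per generalized computing component.
struct ComponentRow {
    int pid;                // component identifier
    std::string pname;      // component name, used as a communication identifier
    enum { CPU_COMP, GPU_COMP, COMM_COMP } ptype;
    int pn;                 // private pipeline number of the component
};

// MTA(mPid, group, sno, PipeA, PipeB): one row per bound stream channel.
struct MappingRow {
    int mpid;               // identifier
    int group;              // data stream communication group number
    int sno;                // sequence number
    int pipe_a;             // source pipeline number of the data stream
    int pipe_b;             // target pipeline number of the data stream
};

// CDU(id, Sno, Segno, SourceN, SourceP, DestNS, DestPS, type, COM, DATA).
struct CDU {
    int id;                         // identifier of this CDU
    int sno;                        // stream number (data stream communication only)
    int segno;                      // data segment number (data stream communication only)
    int source_n;                   // source computing node
    int source_p;                   // sending component
    std::vector<int> dest_ns;       // set of target computing nodes
    std::vector<int> dest_ps;       // set of target components
    enum { STREAM, COMMAND, SHORT_MSG } type;
    std::string com;                // communication command
    std::vector<uint8_t> data;      // message data
};
```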
The communication method of the data stream communication system for a stream processor parallel environment provided by the present invention is characterized in that:
The SP in each generalized computing component performs local computing tasks and processes the information delivered by the RP of that generalized computing component;
Let the generalized computing component at the sending end be GP1, the generalized computing component at the receiving end be GP2, INFO be a message, and SINFO be a data stream;
When the SP in GP1 is to send an INFO to GP2, it first queries the PTA to obtain the ID of GP2, then submits the ID of GP2 and the INFO to the WP of GP1, and then continues with subsequent computing tasks;
When the SP in GP1 is to send a SINFO to GP2, it first queries the PTA to obtain the ID of GP2, then submits the ID of GP2 and the SINFO to the WP of GP1, waits until the SINFO communication ends, and only then continues with subsequent computing tasks;
After the WP in GP1 receives the ID of GP2 and the communication data submitted by the SP of GP1, it first judges the type of the received communication data. If the communication data received by the WP in GP1 is an INFO, it creates a CDU, obtains the node information of the local computing node and the component information of GP1 from the NTA and PTA according to the ID of GP1 and fills them into the SourceN and SourceP fields of the CDU, writes the communication command into the COM field of the CDU, writes the INFO into the DATA field of the CDU, sets the type of the CDU to the short message category, then obtains a public pipeline of the local dynamic pipeline pool, denoted CP, and writes the CDU into CP. If the communication data received by the WP in GP1 is a SINFO, it obtains the private pipeline of GP2 from the NTA and PTA according to the ID of GP2, denoted PP, obtains a public pipeline of the local dynamic pipeline pool, denoted CP, and sends CP, PP and the ID of GP1 together to the local PipeM; after receiving CP, PP and the ID of GP1, the PipeM creates a mapping tuple mtat in the MTA that binds CP to PP, sets CP to the busy state, and then sends the mapping tuple mtat to the computing node where GP2 resides; the WP then segments the SINFO into a plurality of data segments, creates one CDU for each data segment, writes each data segment into the DATA field of its CDU, assigns a stream number and a data segment number to each data segment in segmentation order and writes them into the Sno and Segno fields of each CDU respectively, and then writes the CDUs into CP one by one in segmentation order until the transmission of the SINFO is finished. After the SINFO communication ends, the WP in GP1 sends a request to the local PipeM so that the PipeM cancels the mapping tuple mtat and releases CP;
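The WP behaviour just described, short messages going straight into a public pipe and data streams being segmented after a pipe binding has been set up, can be sketched as follows. This is a minimal single-node illustration building on the Pipe and CDU sketches above; the helper functions (acquire_public_pipe, lookup_private_pipe, pipem_bind, pipem_unbind) are assumed, not defined by the patent.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helpers, assumed to query the local NTA/PTA and the PipeM.
Pipe* acquire_public_pipe();                       // grab a free public pipe CP
int   lookup_private_pipe(int component_id);       // PTA lookup: component -> private pipe PP
int   pipem_bind(int cp, int pp, int sender_id);   // create mapping tuple mtat in the MTA
void  pipem_unbind(int mtat);                      // cancel mtat and release CP

// Send a short message INFO from GP1 to GP2 through a public pipe.
void wp_send_info(int gp1_id, int gp2_id, const std::vector<uint8_t>& info) {
    CDU cdu{};
    cdu.source_p = gp1_id;                 // SourceN/SourceP are looked up from NTA/PTA
    cdu.dest_ps  = {gp2_id};
    cdu.type     = CDU::SHORT_MSG;
    cdu.data     = info;                   // INFO goes into DATA
    acquire_public_pipe()->fifo.push_back(new CDU(cdu));
}

// Send a data stream SINFO from GP1 to GP2: bind CP to GP2's private pipe,
// then segment SINFO and write one CDU per segment, in order.
void wp_send_stream(int gp1_id, int gp2_id,
                    const std::vector<uint8_t>& sinfo, std::size_t seg_bytes) {
    int   pp   = lookup_private_pipe(gp2_id);
    Pipe* cp   = acquire_public_pipe();
    int   mtat = pipem_bind(cp->id, pp, gp1_id);   // CP is now busy
    int   seg  = 0;
    for (std::size_t off = 0; off < sinfo.size(); off += seg_bytes, ++seg) {
        CDU* cdu     = new CDU{};
        cdu->sno     = mtat;               // stand-in for the stream number assigned in order
        cdu->segno   = seg;                // data segment number, in segmentation order
        cdu->type    = CDU::STREAM;
        cdu->dest_ps = {gp2_id};
        std::size_t end = std::min(off + seg_bytes, sinfo.size());
        cdu->data.assign(sinfo.begin() + off, sinfo.begin() + end);
        cp->fifo.push_back(cdu);
    }
    pipem_unbind(mtat);                    // after the stream ends, cancel mtat and free CP
}
```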
The RP in each generalized computing component scans its own private pipeline. If the RP finds in its private pipeline a CDU of the command message or short message category, it extracts the id and DATA of the CDU and passes them to the SP of its generalized computing component. If the RP finds in its private pipeline a CDU of the data stream category and has obtained the mapping tuple mtat corresponding to that CDU, the RP uses the mapping tuple mtat to identify each received CDU, restores the data stream according to the Sno and Segno of the received CDUs, and passes it piece by piece to the SP of its generalized computing component;
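The receiving side of the RP, delivering messages directly and reordering stream segments by Sno and Segno before handing data to the SP, can be pictured as below. It again reuses the Pipe and CDU sketches; the map-based reordering and the deliver_* callbacks are assumptions for illustration, since the patent only states that the stream is restored from Sno and Segno.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical hand-off points into the SP of this generalized computing component.
void deliver_to_sp(int cdu_id, const std::vector<uint8_t>& data);
void deliver_stream_segment_to_sp(int sno, int segno, const std::vector<uint8_t>& data);

// Drain one private pipe: deliver short/command CDUs directly, reorder stream CDUs.
void rp_scan(Pipe& pn,
             std::map<int, std::map<int, std::vector<uint8_t>>>& streams /* Sno -> Segno -> DATA */) {
    while (!pn.fifo.empty()) {
        CDU* cdu = pn.fifo.front();
        pn.fifo.pop_front();
        if (cdu->type == CDU::SHORT_MSG || cdu->type == CDU::COMMAND) {
            deliver_to_sp(cdu->id, cdu->data);              // hand id and DATA to the SP
        } else {                                            // stream CDU, identified via mtat
            streams[cdu->sno][cdu->segno] = cdu->data;      // restore order by Sno/Segno
            deliver_stream_segment_to_sp(cdu->sno, cdu->segno, cdu->data);
        }
        delete cdu;
    }
}
```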
The PipeM of each dynamic pipeline pool cyclically scans each public pipeline in the local CPS. If there is a CDU in a public pipeline of the local CPS, it first judges whether the target component of that CDU is a local component. If the target component of the CDU is a local component, it obtains the private pipeline corresponding to the target component from the PTA according to the DestPS of the CDU and forwards the CDU to that private pipeline; if the target component of the CDU is a remote component, it writes the CDU into the private pipeline of the local MPI sending component;
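The PipeM routing rule, local targets going to their own private pipe and remote targets going to the MPI sending component's private pipe, is essentially a dispatch loop. A hedged sketch follows; the lookup functions are assumed, not named by the patent.

```cpp
#include <vector>

bool  is_local_component(int pid);        // assumed PTA lookup on this node
Pipe* private_pipe_of(int pid);           // assumed PTA lookup: component -> private pipe
Pipe* mpi_sender_private_pipe();          // private pipe of the local MPI sending component

// One scan round of the PipeM over the public pipes of the local CPS.
void pipem_scan(std::vector<Pipe*>& cps) {
    for (Pipe* cp : cps) {
        while (!cp->fifo.empty()) {
            CDU* cdu = cp->fifo.front();
            cp->fifo.pop_front();
            for (int target : cdu->dest_ps) {
                Pipe* out = is_local_component(target) ? private_pipe_of(target)
                                                       : mpi_sender_private_pipe();
                out->fifo.push_back(new CDU(*cdu));   // one copy per target component
            }
            delete cdu;
        }
    }
}
```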
Each MPI receiving component receives in real time the CDUs coming from the NS of the SPPE, obtains the component information of the target component from the PTA according to the DestPS of each received CDU, obtains its private pipeline, and then writes the received CDU into the private pipeline of the target component;
Each MPI sending component cyclically scans its own private pipeline. If it finds a CDU to be sent in its private pipeline, it obtains the node information of the target computing node and the component information of the target component according to the PTA, the MTA and the DestPS of the CDU, writes the node information of the local computing node into the SourceN field of the CDU, writes the component information of the component that sent the CDU into the SourceP field of the CDU, writes the node information of the target computing node into the DestNS field of the CDU, writes the component information of the target component into the DestPS field of the CDU, fills in the type and COM fields of the CDU, and then sends the CDU to the target computing node.
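On the network side, the MPI sending and receiving components reduce to a serialize-and-send loop and a receive-and-dispatch loop. The sketch below uses only standard MPI calls (MPI_Send, MPI_Probe, MPI_Get_count, MPI_Recv); the serialization helpers, the byte-buffer wire format and the rank lookup are assumptions made for illustration, not details given by the patent.

```cpp
#include <cstdint>
#include <mpi.h>
#include <vector>

std::vector<uint8_t> serialize(const CDU& cdu);      // assumed wire format for a CDU
CDU  deserialize(const std::vector<uint8_t>& buf);   // assumed inverse of serialize
int  node_rank_of(int node_id);                      // assumed NTA lookup: node -> MPI rank
Pipe* private_pipe_of(int pid);                      // assumed PTA lookup (as in the PipeM sketch)

// MPI sending component MS: drain its private pipe and ship each CDU to its target nodes.
void ms_loop(Pipe& own_pn) {
    while (!own_pn.fifo.empty()) {
        CDU* cdu = own_pn.fifo.front();
        own_pn.fifo.pop_front();
        std::vector<uint8_t> buf = serialize(*cdu);          // SourceN/SourceP/DestNS/DestPS filled in
        for (int node : cdu->dest_ns)
            MPI_Send(buf.data(), static_cast<int>(buf.size()), MPI_BYTE,
                     node_rank_of(node), /*tag=*/0, MPI_COMM_WORLD);
        delete cdu;
    }
}

// MPI receiving component MR: receive one CDU from the network and forward it to the
// private pipe of its target component.
void mr_loop_once() {
    MPI_Status st;
    int count = 0;
    MPI_Probe(MPI_ANY_SOURCE, /*tag=*/0, MPI_COMM_WORLD, &st);
    MPI_Get_count(&st, MPI_BYTE, &count);
    std::vector<uint8_t> buf(count);
    MPI_Recv(buf.data(), count, MPI_BYTE, st.MPI_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    CDU cdu = deserialize(buf);
    for (int target : cdu.dest_ps)
        private_pipe_of(target)->fifo.push_back(new CDU(cdu));
}
```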
In the data stream communication system for a stream processor parallel environment and the communication method thereof provided by the invention, each computing node in the system is provided with a CPU computing component, a GPU computing component, a dynamic pipeline pool, and an MPI receiving component and an MPI sending component used for MPI communication; each computing node is equipped with a node resource table, a component resource table and a stream communication pipeline mapping table; each generalized computing component has one private pipeline; and communication between the components of the computing nodes is carried out entirely through the dynamic pipeline pools. This simplifies communication between the computing nodes and facilitates the real-time processing of massive data streams. During system program development, the communication is independent of the concrete stream processor programming environment: CUDA and CTM programmers do not need to understand the processing details of the computing nodes or the communication details between computing nodes, and only need to fill in the communication data packets correctly, which reduces programming complexity; furthermore, stream processor programs and central processing unit programs can be written and developed independently of each other, which also facilitates the development of system programs.
Description of drawings
Fig. 1 is a communication diagram of the communication between two computing nodes in the data stream communication system for a stream processor parallel environment according to an embodiment of the invention.
Embodiment
The embodiments of the invention are described below in further detail with reference to the drawings, but the present invention is not limited to these embodiments; any structure analogous to that of the present invention and similar variations thereof shall fall within the protection scope of the present invention.
As shown in Fig. 1, the data stream communication system for a stream processor parallel environment provided by the embodiment of the invention is characterized by comprising a stream processor parallel computing physical environment, a node resource table, a component resource table and a stream communication pipeline mapping table;
The stream processor parallel computing physical environment is a four-tuple SPPE(PCS, MASTER, SOFS, NS), where SPPE is the stream processor parallel computing physical environment, PCS = { C1, C2, ..., Cc } is the set of computing nodes in the SPPE, MASTER is the master computing node in the SPPE, SOFS is the set of software in the SPPE, and NS is the set of interconnection networks in the SPPE; an MPI communication environment is deployed on NS;
Each computing node is provided with a CPU computing component Np, a GPU computing component SPP, a dynamic pipeline pool, and an MPI receiving component MR and an MPI sending component MS used for MPI communication; the CPU computing component Np, the GPU computing component SPP, the MPI receiving component MR and the MPI sending component MS are all generalized computing components, and the GPU computing component SPP is a stream processing computing device;
The dynamic pipeline pool is a four-tuple DPP(ID, CPS, PPS, PipeM), where DPP is the dynamic pipeline pool, ID is the identifier of the DPP, CPS = { CP1, CP2, ..., CPm } is the set of public pipelines in the DPP, PPS = { PN1, PN2, ..., PNn } is the set of private pipelines in the DPP, each private pipeline being a dedicated pipeline from which a generalized computing component reads information, and PipeM is the pipeline management component of the DPP; both the public pipelines and the private pipelines are one-way data stream channels; the CPS is divided into two groups, a CPSM pipeline group and a CPSS pipeline group, where the CPSM group is used for receiving messages and the CPSS group is used for sending messages and for pipeline binding in data stream communication; and each generalized computing component is equipped with one private pipeline for receiving messages and data streams;
A generalized computing component is a five-tuple GP(ID, RP, WP, PN, SP), where GP is the generalized computing component, ID is the component identifier of the GP, RP is the pipeline-reading process of the GP, WP is the public-pipeline-writing process of the GP, PN is the private pipeline of the GP, and SP = { P1, P2, P3, ..., Pp } is the set of narrow-sense computing components in the GP;
The node resource table is a two-dimensional table NTA(Nid, Nname, Nip, Ntype) that records the node information of all computing nodes in the SPPE, where NTA is the node resource table, Nid records the node identifier of a computing node, Nname records the node name of the computing node and is used as a communication identifier, Nip records the node IP address of the computing node and is used for configuring the MPI environment, and Ntype records the node type of the computing node and indicates whether the recorded node is an ordinary computing node or the master computing node;
The component resource table is a two-dimensional table PTA(Pid, Pname, Ptype, PN) that records the component information of all generalized computing components in the SPPE, where PTA is the component resource table, Pid records the component identifier of a component, Pname records the component name and is used as a communication identifier, Ptype records the component type and indicates whether the component is a CPU computing component Np, a GPU computing component SPP or a communication component, and PN records the private pipeline of the component;
The stream communication pipeline mapping table is a two-dimensional table MTA(mPid, group, sno, PipeA, PipeB), where MTA is the stream communication pipeline mapping table, mPid is an identifier, group is the data stream communication group number used as a communication identifier, sno is a sequence number, PipeA is the source pipeline number of the data stream, and PipeB is the target pipeline number of the data stream;
Each computing node keeps a copy of the NTA, PTA and MTA, and the copies are maintained with strong consistency;
Communication between components is carried entirely by communication data units. A communication data unit is a ten-tuple CDU(id, Sno, Segno, SourceN, SourceP, DestNS, DestPS, type, COM, DATA), where CDU is the communication data unit, id is the identifier of the CDU, Sno and Segno are both used for data stream communication, Sno being the stream number of the CDU and Segno being the data segment number of the CDU, SourceN is the source computing node of the communication, SourceP is the component that sends the CDU, DestNS is the set of target computing nodes of the CDU, DestPS is the set of target components of the CDU, i.e. the set of components in the target computing nodes that receive the CDU, type is the category of the CDU, of which there are three kinds (data stream, command message and short message), COM is the communication command, and DATA is the message data.
Fig. 1 is a communication diagram of the communication between two computing nodes in the data stream communication system for a stream processor parallel environment according to the embodiment of the invention. In the figure, solid directed lines indicate the read direction of private pipelines, dashed directed lines indicate the write direction of public pipelines, and thick directed arrows indicate that communication information is forwarded from public pipelines into the private pipelines of the generalized computing components.
As shown in Fig. 1, the communication method of the data stream communication system for a stream processor parallel environment provided by the embodiment of the invention is characterized in that:
The SP in each generalized computing component performs local computing tasks and processes the information delivered by the RP of that generalized computing component;
Let the generalized computing component at the sending end be GP1, the generalized computing component at the receiving end be GP2, INFO be a message, and SINFO be a data stream;
When the SP in GP1 is to send an INFO to GP2, it first queries the PTA to obtain the ID of GP2, then submits the ID of GP2 and the INFO to the WP of GP1, and then continues with subsequent computing tasks;
When the SP in GP1 is to send a SINFO to GP2, it first queries the PTA to obtain the ID of GP2, then submits the ID of GP2 and the SINFO to the WP of GP1, waits until the SINFO communication ends, and only then continues with subsequent computing tasks;
After the WP in GP1 receives the ID of GP2 and the communication data submitted by the SP of GP1, it first judges the type of the received communication data. If the communication data received by the WP in GP1 is an INFO, it creates a CDU, obtains the node information of the local computing node and the component information of GP1 from the NTA and PTA according to the ID of GP1 and fills them into the SourceN and SourceP fields of the CDU, writes the communication command into the COM field of the CDU, writes the INFO into the DATA field of the CDU, sets the type of the CDU to the short message category, then obtains a public pipeline of the local dynamic pipeline pool, denoted CP, and writes the CDU into CP. If the communication data received by the WP in GP1 is a SINFO, it obtains the private pipeline of GP2 from the NTA and PTA according to the ID of GP2, denoted PP, obtains a public pipeline of the local dynamic pipeline pool, denoted CP, and sends CP, PP and the ID of GP1 together to the local PipeM; after receiving CP, PP and the ID of GP1, the PipeM creates a mapping tuple mtat in the MTA that binds CP to PP, sets CP to the busy state, and then sends the mapping tuple mtat to the computing node where GP2 resides; the WP then segments the SINFO into a plurality of data segments, creates one CDU for each data segment, writes each data segment into the DATA field of its CDU, assigns a stream number and a data segment number to each data segment in segmentation order and writes them into the Sno and Segno fields of each CDU respectively, and then writes the CDUs into CP one by one in segmentation order until the transmission of the SINFO is finished. After the SINFO communication ends, the WP in GP1 sends a request to the local PipeM so that the PipeM cancels the mapping tuple mtat and releases CP;
The RP in each generalized computing component scans its own private pipeline. If the RP finds in its private pipeline a CDU of the command message or short message category, it extracts the id and DATA of the CDU and passes them to the SP of its generalized computing component. If the RP finds in its private pipeline a CDU of the data stream category and has obtained the mapping tuple mtat corresponding to that CDU, the RP uses the mapping tuple mtat to identify each received CDU, restores the data stream according to the Sno and Segno of the received CDUs, and passes it piece by piece to the SP of its generalized computing component;
The PipeM of each dynamic pipeline pool cyclically scans each public pipeline in the local CPS. If there is a CDU in a public pipeline of the local CPS, it first judges whether the target component of that CDU is a local component. If the target component of the CDU is a local component, it obtains the private pipeline corresponding to the target component from the PTA according to the DestPS of the CDU and forwards the CDU to that private pipeline; if the target component of the CDU is a remote component, it writes the CDU into the private pipeline of the local MPI sending component MS;
Each MPI receiving component MR receives in real time the CDUs coming from the NS of the SPPE, obtains the component information of the target component from the PTA according to the DestPS of each received CDU, obtains its private pipeline, and then writes the received CDU into the private pipeline of the target component;
Each MPI sending component MS cyclically scans its own private pipeline. If it finds a CDU to be sent in its private pipeline, it obtains the node information of the target computing node and the component information of the target component according to the PTA, the MTA and the DestPS of the CDU, writes the node information of the local computing node into the SourceN field of the CDU, writes the component information of the component that sent the CDU into the SourceP field of the CDU, writes the node information of the target computing node into the DestNS field of the CDU, writes the component information of the target component into the DestPS field of the CDU, fills in the type and COM fields of the CDU, and then sends the CDU to the target computing node.
The communication procedure of the embodiment of the invention is further illustrated below by the communication between two computing nodes:
Let the computing node at the sending end be computing node 1 and the computing node at the receiving end be computing node 2; the component that sends the CDU in computing node 1 is the GPU computing component SPP1, and the component that receives the CDU in computing node 2 is the GPU computing component SPP2. The CDU sent by the GPU computing component SPP1 of computing node 1 is written by the WP of SPP1 into a public pipeline CP of the local computing node; the local PipeM forwards this CDU to the private pipeline PN of the local MPI sending component MS; the local MPI sending component MS reads the CDU from its own private pipeline PN, fills in the MPI network communication information, and uses an MPI send primitive to transmit the CDU over the MPI network to the MPI receiving component MR of computing node 2; the MPI receiving component MR of computing node 2 extracts the CDU and writes it into a public pipeline CP of computing node 2, and the PipeM of computing node 2 forwards it, according to the CDU header information, to the private pipeline PN of the GPU computing component SPP2; the RP of the GPU computing component SPP2 reads the private pipeline PN of SPP2, extracts the CDU, and passes it to the SP of SPP2.
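Tying the pieces together, the two-node walkthrough above corresponds to the following call sequence on the sketched helpers. This is a usage illustration only, built on the earlier hypothetical sketches; the identifiers and surrounding state are made up.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hop chain for one CDU from SPP1 (node 1) to SPP2 (node 2), following Fig. 1.
void two_node_demo(std::vector<Pipe*>& node1_public_pipes,
                   Pipe& node1_ms_private_pipe,
                   Pipe& spp2_private_pipe,
                   std::map<int, std::map<int, std::vector<uint8_t>>>& spp2_streams) {
    const int spp1 = 11, spp2 = 21;                     // illustrative component identifiers
    std::vector<uint8_t> payload = {0x01, 0x02, 0x03};  // some message INFO
    wp_send_info(spp1, spp2, payload);        // WP of SPP1 writes a CDU into a public pipe CP
    pipem_scan(node1_public_pipes);           // PipeM of node 1 routes it to the MS private pipe
    ms_loop(node1_ms_private_pipe);           // MS fills in MPI info and MPI_Sends it to node 2
    mr_loop_once();                           // MR of node 2 receives and dispatches the CDU
    rp_scan(spp2_private_pipe, spp2_streams); // RP of SPP2 reads its private pipe, hands DATA to SP
}
```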
In a practical application, the embodiment of the invention was used to build a large-scale 3G video quality monitoring system based on the stream processor parallel environment. The system consists of two functional clusters: nine computers form an I/O-intensive CPU cluster and nine stream processor machines form a computation-intensive GPU cluster; the clusters are interconnected by two backbone Gigabit Ethernet switches with 48 Gbps of bandwidth. All computing nodes are interconnected with the MPI communication protocol; each computing node runs the Ubuntu 10 operating system and is provided with a dynamic pipeline pool, the MTA, PTA and NTA, a PipeM, an MPI receiving component MR and an MPI sending component MS; the stream processors are NVIDIA GTX480 cards configured with the CUDA driver and development environment. The system uses the CPU cluster for H.264 parameter extraction from the 3G video streams and then uses the GPU cluster to compute image parameters of 10080 video streams (image restoration to YUV, blurriness, blocking artifacts and smoothness analysis), thereby achieving video quality analysis of 10080 3G video streams. Practice shows that the average time for analyzing and processing the 10080 video images is 1.5 seconds, and that this parallel processing environment satisfies the telecom operators' requirements on the analysis scale and performance for 3G video. As for data stream communication, each CPU cluster node corresponds to one GPU cluster node, and 1120 video streams are transmitted between each such pair of nodes; the Ubuntu performance monitor shows that, at the peak stream transmission stage, the data stream communication bandwidth between a CPU node and a GPU node reaches 80 MB/s, i.e. a communication capability of 640 Mbps, which shows that this communication system satisfies the communication performance requirements of the 3G video quality monitoring system.

Claims (2)

1. A data stream communication system for a stream processor parallel environment, characterized by comprising a stream processor parallel computing physical environment, a node resource table, a component resource table and a stream communication pipeline mapping table;
The stream processor parallel computing physical environment is a four-tuple SPPE(PCS, MASTER, SOFS, NS), where SPPE is the stream processor parallel computing physical environment, PCS = { C1, C2, ..., Cc } is the set of computing nodes in the SPPE, MASTER is the master computing node in the SPPE, SOFS is the set of software in the SPPE, and NS is the set of interconnection networks in the SPPE; a message passing interface (MPI) communication environment is deployed on NS;
Each computing node is provided with a CPU computing component, a GPU computing component, a dynamic pipeline pool, and an MPI receiving component and an MPI sending component used for MPI communication; the CPU computing component, the GPU computing component, the MPI receiving component and the MPI sending component are all generalized computing components, and the GPU computing component is a stream processing computing device;
The dynamic pipeline pool is a four-tuple DPP(ID, CPS, PPS, PipeM), where DPP is the dynamic pipeline pool, ID is the identifier of the DPP, CPS = { CP1, CP2, ..., CPm } is the set of public pipelines in the DPP, PPS = { PN1, PN2, ..., PNn } is the set of private pipelines in the DPP, each private pipeline being a dedicated pipeline from which a generalized computing component reads information, and PipeM is the pipeline management component of the DPP; both the public pipelines and the private pipelines are one-way data stream channels; the CPS is divided into two groups, a CPSM pipeline group and a CPSS pipeline group, where the CPSM group is used for receiving messages and the CPSS group is used for sending messages and for pipeline binding in data stream communication; and each generalized computing component is equipped with one private pipeline for receiving messages and data streams;
A generalized computing component is a five-tuple GP(ID, RP, WP, PN, SP), where GP is the generalized computing component, ID is the component identifier of the GP, RP is the pipeline-reading process of the GP, WP is the public-pipeline-writing process of the GP, PN is the private pipeline of the GP, and SP = { P1, P2, P3, ..., Pp } is the set of narrow-sense computing components in the GP;
The node resource table is a two-dimensional table NTA(Nid, Nname, Nip, Ntype) that records the node information of all computing nodes in the SPPE, where NTA is the node resource table, Nid records the node identifier of a computing node, Nname records the node name of the computing node and is used as a communication identifier, Nip records the node IP address of the computing node and is used for configuring the MPI environment, and Ntype records the node type of the computing node and indicates whether the recorded node is an ordinary computing node or the master computing node;
The component resource table is a two-dimensional table PTA(Pid, Pname, Ptype, PN) that records the component information of all generalized computing components in the SPPE, where PTA is the component resource table, Pid records the component identifier of a component, Pname records the component name and is used as a communication identifier, Ptype records the component type and indicates whether the component is a CPU computing component, a GPU computing component or a communication component, and PN records the private pipeline of the component;
The stream communication pipeline mapping table is a two-dimensional table MTA(mPid, group, sno, PipeA, PipeB), where MTA is the stream communication pipeline mapping table, mPid is an identifier, group is the data stream communication group number used as a communication identifier, sno is a sequence number, PipeA is the source pipeline number of the data stream, and PipeB is the target pipeline number of the data stream;
Each computing node keeps a copy of the NTA, PTA and MTA, and the copies are maintained with strong consistency;
Communication between components is carried entirely by communication data units. A communication data unit is a ten-tuple CDU(id, Sno, Segno, SourceN, SourceP, DestNS, DestPS, type, COM, DATA), where CDU is the communication data unit, id is the identifier of the CDU, Sno and Segno are both used for data stream communication, Sno being the stream number of the CDU and Segno being the data segment number of the CDU, SourceN is the source computing node of the communication, SourceP is the component that sends the CDU, DestNS is the set of target computing nodes of the CDU, DestPS is the set of target components of the CDU, i.e. the set of components in the target computing nodes that receive the CDU, type is the category of the CDU, of which there are three kinds (data stream, command message and short message), COM is the communication command, and DATA is the message data.
2. A communication method of the data stream communication system for a stream processor parallel environment according to claim 1, characterized in that:
The SP in each generalized computing component performs local computing tasks and processes the information delivered by the RP of that generalized computing component;
Let the generalized computing component at the sending end be GP1, the generalized computing component at the receiving end be GP2, INFO be a message, and SINFO be a data stream;
When the SP in GP1 is to send an INFO to GP2, it first queries the PTA to obtain the ID of GP2, then submits the ID of GP2 and the INFO to the WP of GP1, and then continues with subsequent computing tasks;
When the SP in GP1 is to send a SINFO to GP2, it first queries the PTA to obtain the ID of GP2, then submits the ID of GP2 and the SINFO to the WP of GP1, waits until the SINFO communication ends, and only then continues with subsequent computing tasks;
After the WP in GP1 receives the ID of GP2 and the communication data submitted by the SP of GP1, it first judges the type of the received communication data. If the communication data received by the WP in GP1 is an INFO, it creates a CDU, obtains the node information of the local computing node and the component information of GP1 from the NTA and PTA according to the ID of GP1 and fills them into the SourceN and SourceP fields of the CDU, writes the communication command into the COM field of the CDU, writes the INFO into the DATA field of the CDU, sets the type of the CDU to the short message category, then obtains a public pipeline of the local dynamic pipeline pool, denoted CP, and writes the CDU into CP. If the communication data received by the WP in GP1 is a SINFO, it obtains the private pipeline of GP2 from the NTA and PTA according to the ID of GP2, denoted PP, obtains a public pipeline of the local dynamic pipeline pool, denoted CP, and sends CP, PP and the ID of GP1 together to the local PipeM; after receiving CP, PP and the ID of GP1, the PipeM creates a mapping tuple mtat in the MTA that binds CP to PP, sets CP to the busy state, and then sends the mapping tuple mtat to the computing node where GP2 resides; the WP then segments the SINFO into a plurality of data segments, creates one CDU for each data segment, writes each data segment into the DATA field of its CDU, assigns a stream number and a data segment number to each data segment in segmentation order and writes them into the Sno and Segno fields of each CDU respectively, and then writes the CDUs into CP one by one in segmentation order until the transmission of the SINFO is finished. After the SINFO communication ends, the WP in GP1 sends a request to the local PipeM so that the PipeM cancels the mapping tuple mtat and releases CP;
The RP in each generalized computing component scans its own private pipeline. If the RP finds in its private pipeline a CDU of the command message or short message category, it extracts the id and DATA of the CDU and passes them to the SP of its generalized computing component. If the RP finds in its private pipeline a CDU of the data stream category and has obtained the mapping tuple mtat corresponding to that CDU, the RP uses the mapping tuple mtat to identify each received CDU, restores the data stream according to the Sno and Segno of the received CDUs, and passes it piece by piece to the SP of its generalized computing component;
The PipeM of each dynamic pipeline pool cyclically scans each public pipeline in the local CPS. If there is a CDU in a public pipeline of the local CPS, it first judges whether the target component of that CDU is a local component. If the target component of the CDU is a local component, it obtains the private pipeline corresponding to the target component from the PTA according to the DestPS of the CDU and forwards the CDU to that private pipeline; if the target component of the CDU is a remote component, it writes the CDU into the private pipeline of the local MPI sending component;
Each MPI receiving component receives in real time the CDUs coming from the NS of the SPPE, obtains the component information of the target component from the PTA according to the DestPS of each received CDU, obtains its private pipeline, and then writes the received CDU into the private pipeline of the target component;
Each MPI sending component cyclically scans its own private pipeline. If it finds a CDU to be sent in its private pipeline, it obtains the node information of the target computing node and the component information of the target component according to the PTA, the MTA and the DestPS of the CDU, writes the node information of the local computing node into the SourceN field of the CDU, writes the component information of the component that sent the CDU into the SourceP field of the CDU, writes the node information of the target computing node into the DestNS field of the CDU, writes the component information of the target component into the DestPS field of the CDU, fills in the type and COM fields of the CDU, and then sends the CDU to the target computing node.
CN2011101357760A 2011-05-25 2011-05-25 Stream processor parallel environment-oriented data stream communication system and method Expired - Fee Related CN102201992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101357760A CN102201992B (en) 2011-05-25 2011-05-25 Stream processor parallel environment-oriented data stream communication system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011101357760A CN102201992B (en) 2011-05-25 2011-05-25 Stream processor parallel environment-oriented data stream communication system and method

Publications (2)

Publication Number Publication Date
CN102201992A CN102201992A (en) 2011-09-28
CN102201992B true CN102201992B (en) 2013-09-25

Family

ID=44662393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101357760A Expired - Fee Related CN102201992B (en) 2011-05-25 2011-05-25 Stream processor parallel environment-oriented data stream communication system and method

Country Status (1)

Country Link
CN (1) CN102201992B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346135B (en) * 2013-08-08 2018-06-15 腾讯科技(深圳)有限公司 Method, equipment and the system of data streams in parallel processing
CN104090993B (en) * 2014-05-30 2017-01-25 北京遥测技术研究所 Very-long baseline interference measurement relevant processing implementation method
CN104320382B (en) * 2014-09-30 2018-04-20 华为技术有限公司 Distributed current processing device, method and unit in real time
CN105610730B (en) * 2014-11-19 2020-03-13 中兴通讯股份有限公司 Message interaction method and system between CPU and network equipment
JP6859620B2 (en) * 2015-10-14 2021-04-14 株式会社リコー Information processing system, information processing device, information processing method, and information processing program


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1611035A (en) * 2001-04-13 2005-04-27 飞思卡尔半导体公司 Manipulating data streams in data stream processors
US20100042809A1 (en) * 2008-08-18 2010-02-18 International Business Machines Corporation Method and system for implementing a stream processing computer architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chen Qing-kui, Zhang Jia-kang. A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA. Information Science and Engineering (ICISE), 2009 1st International Conference on, 2009-12-28. Full text. *
The MPI Forum. MPI: A message passing interface. Supercomputing '93 Proceedings, 1993-11-19. Full text. *

Also Published As

Publication number Publication date
CN102201992A (en) 2011-09-28


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130925

Termination date: 20160525

CF01 Termination of patent right due to non-payment of annual fee