CN103679789A - Parallel rendering and visualization method and system based on data flow diagram - Google Patents


Info

Publication number
CN103679789A
CN103679789A (application CN201310659788.2A)
Authority
CN
China
Prior art keywords
data
link
parameter
computing unit
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310659788.2A
Other languages
Chinese (zh)
Other versions
CN103679789B (en)
Inventor
徐泽骅
李胜
汪国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing weishiwei Information Technology Co.,Ltd.
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201310659788.2A priority Critical patent/CN103679789B/en
Publication of CN103679789A publication Critical patent/CN103679789A/en
Application granted granted Critical
Publication of CN103679789B publication Critical patent/CN103679789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention relates to a data-flow-graph-based parallel rendering and visualization method and system for three-dimensional graphics. By explicitly constructing a pipeline driven by a data flow graph, the system provides developers with basic computing units and with tools for assembling those units in parallel, so that developers can combine the computing units themselves to build parallel rendering and visualization workflows that meet their task requirements. The system aims to provide computing-unit interfaces with excellent extensibility and efficient data links between computing units; the computing units commonly used in parallel rendering tasks are built in, and complex multi-pass rendering and visualization tasks can be parallelized efficiently simply by assembling these built-in units appropriately. The invention makes it possible to parallelize rendering and visualization algorithms based on multi-pass forward rendering, provides an effective parallelization framework and workflow for doing so, and solves the problem that various current rendering and visualization algorithms requiring multi-pass forward rendering are difficult to parallelize efficiently.

Description

Parallel rendering and visualization method and system based on data flow diagram
Technical field
The present invention relates to a novel parallel rendering method and system implemented on computer clusters (especially GPU clusters organized for rendering tasks), and specifically to a parallel rendering and visualization system and method driven by a data flow graph. It can be used for realistic rendering of three-dimensional model scenes and for rendering complex special effects, for rendering and visualizing various kinds of information in virtual reality systems, for visualizing three-dimensional or two-dimensional geographic information, and for visualizing various volume data. It belongs to the fields of computer graphics processing and visualization.
Background technology
Traditional parallel rendering frameworks such as Equalizer (see Stefan Eilemann, Maxim Makhinya, and Renato Pajarola, "Equalizer: A Scalable Parallel Rendering Framework", IEEE Trans. Vis. Comput. Graph. 15.3 (2009), pp. 436-452) and DRONE/URay (see Michael Repplinger et al., "DRONE: A Flexible Framework for Distributed Rendering and Display", International Symposium on Visual Computing, 2009, pp. 975-986; and Michael Repplinger et al., "A flexible adaptation service for distributed rendering", Proceedings of the 9th Eurographics Conference on Parallel Graphics and Visualization, EG PGV '09, Munich, Germany: Eurographics Association, 2009, pp. 49-56, ISBN 978-3-905674-15-6) typically hide the parallel rendering and visualization computation process and the transfer of intermediate data inside the framework, offering only a few data-parallel rendering modes — model-data decomposition (sort-last), screen-space decomposition (sort-first), sub-pixel decomposition, and a few others — to meet users' needs. The advantage of this design is that, when a developer's parallel rendering task matches one of these modes, the framework realizes the desired function easily. The drawback, however, is equally obvious: the internal behavior of the framework is completely shielded from its users, leaving no room for extension, so developers have almost no ability to build further on the system. Real-time realistic rendering algorithms for three-dimensional scenes are usually complex and variable, and when parallelizing an existing rendering algorithm it is easy to run into situations that cannot be solved by relying on a few fixed parallel rendering modes. In fact, nearly all advanced rendering algorithms that require multi-pass rendering — such as shadow mapping (see Lance Williams, "Casting curved shadows on curved surfaces", SIGGRAPH Comput. Graph. 12.3 (Aug. 1978), pp. 270-274) and deferred lighting (see Michael Deering et al., "The triangle processor and normal vector shader: a VLSI system for high performance graphics", SIGGRAPH Comput. Graph. 22.4 (June 1988), pp. 21-30) — cannot be parallelized by simple data-parallel means.
Once the rendering modes built into a framework cannot meet the user's parallel rendering requirements, the only option left is to modify the framework itself. Since the components inside such frameworks are usually tightly coupled, such modifications are very difficult for the end developer. Frameworks are designed to ease program development, yet in this situation using a framework instead increases the developer's burden, complicates the internal structure of the calling program, and reduces reliability.
Summary of the invention
To solve the above technical problems, the present invention proposes a novel parallel rendering method and system framework that is easy to implement on computer clusters, especially GPU clusters. This framework allows developers to organize a rendering task in the form of a data flow graph containing many basic computing units. These computing units are defined by the developers using the framework and are linked together by data link objects provided by the system framework, composing a fully functional rendering and visualization pipeline driven by the data flow graph. The computing units in the pipeline can be distributed over different compute nodes in the cluster and executed in parallel with threads as the finest-grained parallel unit, while data transfer between computing units is handled entirely and efficiently by the framework. The method and system framework can be used to parallelize many rendering and visualization algorithms, and is particularly suited to real-time realistic special-effect rendering algorithms that perform multi-pass rendering, such as shadow mapping, deferred shading, multi-layer depth peeling, real-time global illumination (e.g. SSAO, SSDO), micro-surface illumination effects, real-time subsurface scattering, and real-time light propagation.
The method and system solve the current difficulty of parallelizing rendering and visualization algorithms that require multi-pass forward rendering, such as shadow mapping, deferred shading, multi-layer depth peeling, approximate global illumination (e.g. SSAO, SSDO), micro-surface illumination effects, and real-time subsurface scattering. These algorithms cannot be completed by a single step that converts a three-dimensional model or scene into a two-dimensional image; they must perform multiple rendering passes. Moreover, the data inputs and outputs of each pass differ in definition and data layout, the form and kind of the parameters required by each pass differ, the computation and rendering operations performed in each pass differ, the data dependence between the current pass and the preceding pass varies, and the execution order among passes may be only partially constrained (some passes can be exchanged with each other while others cannot). All of these complications have so far hindered the parallelization of rendering and visualization algorithms based on multi-pass forward rendering. The data-flow-graph-based parallel rendering and visualization method of the present invention solves exactly these problems, making the parallelization of multi-pass forward rendering and visualization algorithms possible and providing an effective parallelization framework and workflow for them.
One distinction must be made here: this method is not suited to parallelizing reverse rendering methods. Forward rendering refers to following the general pipeline standard of existing 3D image rendering: starting from three-dimensional geometric data, an image is produced by a sequence of operations — vertex processing and primitive assembly (mainly the three-dimensional to two-dimensional projective transformation), rasterization, and per-pixel operations. Reverse rendering refers to methods, typified by ray tracing, that compute the color value of each pixel one by one starting from the pixel itself; such methods do not follow the general forward rendering pipeline at all, hence the name.
One object of the present invention is to propose a data-flow-graph-based system for parallel rendering and visualization of three-dimensional graphics, to solve the problem that the two standard parallel rendering modes, sort-last and sort-first, cannot meet the parallel rendering requirements of various complex rendering and visualization algorithms, making it difficult to parallelize complex or special rendering algorithms.
A data-flow-graph-based parallel rendering and visualization system for three-dimensional graphics, embodied in the present invention as the Render-Streamers framework and system, comprises:
a pipeline for decomposing rendering and/or computation tasks and for parallel programming and parallel execution based on the data flow graph; data links connected to computing units through a parameter-passing interface; a rendering resource module (based on graphics standard APIs such as OpenGL or DirectX); a type management module; and a computing unit management module for internal use by the system.
In the data flow graph, the computing units decompose the parallel rendering and/or visualization algorithm of a three-dimensional scene into rendering and/or computation tasks. The computing units are connected to data links through input/output parameter slots: the output of a computing unit is the input of a data link, and the input of a computing unit is the output of a data link. An output parameter slot performs a push operation on its data link, and an input parameter slot performs a fetch operation on it.
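The push/fetch relationship between parameter slots and data links can be sketched as follows. This is a minimal illustrative sketch only; apart from the names Link, push, and fetch, which the patent itself uses, the class names and structure here are invented for illustration and are not the patented implementation.

```python
import queue

class Link:
    """A data link: an output slot pushes into it, an input slot fetches."""
    def __init__(self, capacity=4):
        self._q = queue.Queue(maxsize=capacity)
    def push(self, value):   # called by an output parameter slot
        self._q.put(value)
    def fetch(self):         # called by an input parameter slot
        return self._q.get()

class OutputSlot:
    """An output parameter slot may feed several links."""
    def __init__(self):
        self.links = []
    def connect(self, link):
        self.links.append(link)
    def push(self, value):
        for link in self.links:
            link.push(value)

class InputSlot:
    """An input parameter slot fetches from exactly one link."""
    def __init__(self):
        self.link = None
    def connect(self, link):
        self.link = link
    def fetch(self):
        return self.link.fetch()

# Wiring: producer's output slot -> link -> consumer's input slot.
link = Link()
out_slot, in_slot = OutputSlot(), InputSlot()
out_slot.connect(link)
in_slot.connect(link)
out_slot.push(42)
print(in_slot.fetch())  # 42
```

The slot objects never talk to each other directly; all data passes through the link, which is what lets the framework swap in different link kinds transparently.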
The pipeline for parallel programming and parallel execution based on the data flow graph executes the graph in cycles; the elementary unit of each execution cycle is the execution of one computing unit. The execution order of the computing units satisfies: synchronization operations are performed in inter-thread data links, and/or the computing units inside a thread block are arranged according to an execution ordering; between thread blocks there is a hierarchy of data dependence.
The rendering resource module, based on standard 3D graphics APIs (such as OpenGL or DirectX), divides each parameter object into a non-local part and a local part so that resources can be shared by reference across threads. The basic resource units in the rendering resource module are the resources that the 3D graphics standard API can provide. The type management module covers the types built into the system and/or provided with it, as well as types defined by developers and inserted into the system in plug-in form, so that all types work together.
The computing unit management module allows developers to define their own computing units and to reference and instantiate them by name.
In the parameter-passing interface, each computing unit has one input parameter block and one output parameter block; each parameter block contains four slots, and each slot is divided into two parts, a SimpleSlot and a FullSlot.
The common interface of data links is the Link class, which contains exactly two operations: push and fetch. Besides transferring data, a data link is also responsible for creating and managing the non-local part of the parameters in its parameter slots. Four kinds of data link are provided: thread-internal links, inter-thread links, inter-process links, and socket links.
In the data flow graph, each node is a relatively independent computing unit that communicates with other computing units only through the data links connected to its input/output parameter slots. Unlike data links, which passively accept method calls, computing units execute actively in the pipeline. On each run, a computing unit requests input data from all of its input links and, after computing its output, passes the output to the data links connected on all of its output slots.
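The active execution cycle just described — fetch from every input link, compute, push to every output link — can be sketched as a small two-stage pipeline. This is a hypothetical sketch; the ComputeUnit class, the run_cycle name, and the example stages are invented for illustration.

```python
class Link:
    """Minimal FIFO link for a single-threaded demonstration."""
    def __init__(self):
        self._items = []
    def push(self, v):
        self._items.append(v)
    def fetch(self):
        return self._items.pop(0)

class ComputeUnit:
    """Active unit: each cycle it fetches from every input link,
    computes, and pushes the result to every output link."""
    def __init__(self, fn, n_inputs):
        self.fn = fn
        self.inputs = [None] * n_inputs
        self.outputs = []
    def run_cycle(self):
        args = [link.fetch() for link in self.inputs]
        result = self.fn(*args)
        for link in self.outputs:
            link.push(result)

# A tiny two-stage pipeline: double the value, then add one.
l_src, l_mid, l_dst = Link(), Link(), Link()
doubler = ComputeUnit(lambda x: 2 * x, 1)
doubler.inputs[0] = l_src
doubler.outputs.append(l_mid)
inc = ComputeUnit(lambda x: x + 1, 1)
inc.inputs[0] = l_mid
inc.outputs.append(l_dst)

l_src.push(10)
doubler.run_cycle()
inc.run_cycle()
print(l_dst.fetch())  # 21
```

In the real framework the two run_cycle calls would execute in different threads or processes, with the link providing the synchronization.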
Another object of the present invention is to propose a data-flow-graph-driven parallel rendering and visualization method for three-dimensional graphics, to solve the problem of performing parallel rendering and visualization efficiently on computer clusters, especially GPU clusters. The computing units of the pipeline can be distributed over different compute nodes of the cluster and executed in parallel with threads as the unit of parallelism, while data transfer between computing units is handled entirely and efficiently by the system framework.
The data-flow-graph-driven parallel rendering and visualization method for three-dimensional graphics runs on a GPU cluster comprising at least one computer with a GPU; each computer is a node of the GPU cluster, the nodes are connected by a network, at least one process runs on each node, and at least one thread runs in each process. The steps are as follows:
1) set up at least one computing unit in each thread to carry out a computation task or a rendering task;
2) connect the computing units with data links performing push and/or fetch operations; within the same thread, the computing units execute according to their data dependences, while across threads they execute in parallel under a synchronization mechanism;
3) decompose the rendering method according to its corresponding parallel rendering mode and workflow, and construct the corresponding parallel data flow graph;
4) extract the rendering-related parameters of the three-dimensional model or scene, and execute the decomposed tasks in parallel through the computing units and data links of the data flow graph to obtain image units;
5) combine the image units in different ways according to the parallel rendering mode, finally obtaining the complete visualized image.
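Steps 4) and 5) — parallel production of partial image units followed by their combination — can be illustrated with a toy sort-last-style composite over a one-dimensional "image". This sketch is purely illustrative: the half-split decomposition, the nonzero-wins merge rule, and all function names are assumptions, not the patented algorithm.

```python
def render_half(width, half, color):
    """Toy image unit: fill one half of a 1-D image, leave the rest empty."""
    img = [0] * width
    lo, hi = (0, width // 2) if half == 0 else (width // 2, width)
    for x in range(lo, hi):
        img[x] = color
    return img

def composite(a, b):
    """Toy sort-last-style merge: take the nonzero contribution per pixel."""
    return [pa if pa else pb for pa, pb in zip(a, b)]

# Two workers render disjoint parts in parallel (step 4) ...
part0 = render_half(8, 0, 1)
part1 = render_half(8, 1, 2)
# ... and a compositor combines the image units (step 5).
final = composite(part0, part1)
print(final)  # [1, 1, 1, 1, 2, 2, 2, 2]
```

A real compositor would resolve overlaps with depth values rather than a nonzero test, but the data flow — independent partial renders feeding one combining unit — is the same shape.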
Further, four kinds of data link are built in: thread-internal links, inter-thread links, shared-memory links (inter-process links), and socket links.
Further, a thread-internal link is a data link whose input unit and output unit are in the same thread, with both sides sharing the same memory address space; data links within a single process are thus divided into thread-internal links and inter-thread links.
Further, the input unit and output unit of a thread-internal link are in the same thread and, under the thread management model of the graphics API, also share the same rendering context. Compared with the other kinds of data link, thread-internal links have notable peculiarities: 1) the computing units within the same thread execute in topological order; 2) all computing units inside a thread share the rendering context, so the input slot and output slot of a thread-internal link can share the local part of the parameter; 3) besides the computing units that perform complex rendering tasks, a thread block may also contain many computing units performing simple tasks, and all of these exchange data through thread-internal links.
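The topological-order execution inside a thread block can be sketched with Kahn's algorithm over the unit dependence graph. The unit names and edges below are invented examples; only the ordering rule — every producer runs before its consumers — comes from the text above.

```python
from collections import deque

def topological_order(units, edges):
    """Kahn's algorithm: order units so that every producer executes
    before its consumers, i.e. the in-thread execution order."""
    indeg = {u: 0 for u in units}
    succ = {u: [] for u in units}
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    ready = deque(u for u in units if indeg[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(units):
        raise ValueError("cycle in data flow graph")
    return order

# Hypothetical multi-pass graph: the shadow pass must precede the main pass.
units = ["load", "shadow_pass", "main_pass", "display"]
edges = [("load", "shadow_pass"), ("load", "main_pass"),
         ("shadow_pass", "main_pass"), ("main_pass", "display")]
print(topological_order(units, edges))
# ['load', 'shadow_pass', 'main_pass', 'display']
```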
Further, the threads connected by an inter-thread link execute in parallel with no ordering between them; to guarantee that the output unit consumes a datum only after the input unit has produced it, an inter-thread synchronization operation is needed. An inter-thread link transfers data through a circular queue: at link initialization a circular queue of fixed size is allocated, and the data type stored in the queue is the non-local part of the parameter. The circular queue uses two semaphores to record the free space and the used space in the queue, thereby synchronizing the threads of the input unit and the output unit.
Further, the circular queue acts as a buffer between the two computing units: even if the output unit has not yet begun to consume the data just produced by the input unit, the input unit can proceed to its next execution cycle, increasing thread concurrency. The exchange of the non-local parameter is completed merely by moving a pointer, which is a very efficient means of data exchange.
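The two-semaphore circular queue can be sketched directly with Python's threading primitives. Here the queue stores whole values rather than non-local parameter parts, and the RingLink name is invented; the synchronization scheme — one semaphore counting free slots, one counting used slots — is exactly the one described above.

```python
import threading

class RingLink:
    """Inter-thread link sketch: a fixed-size circular queue guarded by
    two semaphores that count free and used slots."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # next slot to read
        self.tail = 0   # next slot to write
        self.cap = capacity
        self.empty = threading.Semaphore(capacity)  # free slots
        self.full = threading.Semaphore(0)          # used slots
    def push(self, item):       # producer (input unit) side
        self.empty.acquire()    # blocks while the queue is full
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % self.cap
        self.full.release()
    def fetch(self):            # consumer (output unit) side
        self.full.acquire()     # blocks while the queue is empty
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.cap
        self.empty.release()
        return item

link = RingLink(4)
results = []
consumer = threading.Thread(
    target=lambda: results.extend(link.fetch() for _ in range(8)))
consumer.start()
for i in range(8):      # the producer can run ahead by up to 4 items
    link.push(i)
consumer.join()
print(results)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With one producer and one consumer, head and tail are each touched by only one thread, so the two semaphores are the only synchronization required — matching the blocking behavior shown in Figs. 1 and 2.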
Further, the shared-memory link connects two computing units located in two different processes on the same machine. Because the same segment of shared memory may be mapped at different addresses in each process, a data structure not specifically designed for this cannot work correctly in shared memory directly; therefore a serialization approach is adopted, and the byte stream obtained by serializing the parameter object is stored in shared memory.
Further, the structure of the shared-memory link comprises three parts — a control block, data blocks, and semaphores — and the main structure inside its data area is a circular queue of buffers. Unlike the inter-thread link, the content stored in this queue is no longer the non-local part of the parameter object but the byte sequence obtained by applying the serialization operation specified in the type definition (TypeSpec) to the whole parameter object.
Further, a shared-memory link is identified by a name represented as a string; on the same machine, two identical names always refer to the same shared-memory link.
Further, given the name of a shared-memory link, the handshake procedure of the link is:
1. Both handshake parties attempt to create the handshake semaphore name.init. Owing to the atomicity of the file-system operation, one party succeeds and the other fails. The successful party (denoted A) goes on to create the other resources, while the failing party (denoted B) reopens the handshake semaphore in non-creating mode and waits on it.
2. A creates the other IPC resources and initializes their values according to information such as the supplied type information and the circular queue size.
3. After initialization completes, A performs a post operation on the handshake semaphore name.init to wake B, then waits on the name.full semaphore (a reuse trick adopted to allocate one fewer semaphore during the handshake).
4. After B is woken, it opens these IPC resources and compares the values filled in by A with the information it obtained locally. If an inconsistency is found, an error is reported and the handshake fails; otherwise the handshake succeeds and B performs a post operation on name.full.
5. After A is woken by B, it deletes name.init, and the handshake ends.
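The atomic-creation race in step 1 — whichever party creates name.init first becomes the initializing side A — can be sketched with an exclusive-create file operation, which has the same atomicity property as creating a named semaphore. The file-based stand-in is an assumption for portability; the real link would use named semaphores.

```python
import os
import tempfile

def try_create_init(path):
    """Mimic atomic creation of the name.init handshake object:
    of two racing parties, exactly one succeeds (becomes side A)."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True      # side A: goes on to create and initialize resources
    except FileExistsError:
        return False     # side B: opens the existing object and waits on it

initpath = os.path.join(tempfile.mkdtemp(), "name.init")
first = try_create_init(initpath)    # this party becomes A
second = try_create_init(initpath)   # the other party becomes B
print(first, second)  # True False
```

Because the create-exclusive operation either fully succeeds or fully fails, the two processes need no prior coordination to decide their roles.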
Further, the names, types, and purposes of the IPC resources are: name.init (semaphore) — handshake; name.empty (semaphore) — count of free data blocks; name.full (semaphore) — count of used data blocks; name.ctrl (shared memory) — control block; name.buf.i (shared memory) — the i-th data block.
Further, the transmission procedure of the shared-memory link is as follows. Because it adopts the same circular-queue mechanism, the parameter transmission process of the shared-memory link closely resembles that of the inter-thread link, with only the following differences:
1) the circular queue stores serialized parameters rather than the non-local part of parameter objects; consequently the input slot and the output slot each hold their own parameter object, instead of having a parameter-object pointer point directly into the queue as with the inter-thread link;
2) the push and fetch operations use the serialize/deserialize operations of the type definition (TypeSpec) rather than the context_out/context_in operations.
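A TypeSpec-style type definition with serialize/deserialize operations can be sketched as follows; the distinction between fixed-length and variable-length serialization matters later for the socket link's data header. The class names and the fixed_length attribute are assumptions invented for illustration.

```python
import struct

class Int32Spec:
    """Hypothetical TypeSpec with fixed-length serialization."""
    fixed_length = 4
    def serialize(self, value):
        return struct.pack("<i", value)       # 4 little-endian bytes
    def deserialize(self, data):
        return struct.unpack("<i", data)[0]

class StringSpec:
    """Hypothetical TypeSpec with variable-length serialization."""
    fixed_length = None
    def serialize(self, value):
        return value.encode("utf-8")
    def deserialize(self, data):
        return data.decode("utf-8")

spec = Int32Spec()
blob = spec.serialize(-12345)
assert len(blob) == spec.fixed_length
print(spec.deserialize(blob))  # -12345
```

A shared-memory or socket link only ever sees the byte sequences these operations produce, which is what makes the transported data independent of per-process memory addresses.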
Further, the socket link connects two computing units located on two different physical machines and communicates using the TCP protocol. The socket link is the data link of the highest level.
Further, a socket link connects the output parameter slot of one computing unit to the input parameter slot of another computing unit. A socket link is uniquely identified by the triple (target IP, target port, link ID), where the link ID is a 32-bit unsigned number. In general each computer in the cluster needs to listen on only one port, so from the framework user's point of view the socket links connected to the same computer are usually identified by the link ID alone.
Further, the handshake procedure of the socket link is: to avoid the situation where one side attempts to connect before the other side has started listening on its port, two strategies are used: (1) the output unit of the link is fixed as the listening side and the input unit as the connecting side; (2) when connecting, the input unit keeps retrying the connection until it succeeds.
Further, the transmission procedure of the socket link is: after the handshake, the input unit and output unit of the link can transfer any number of parameter objects (one per execution cycle). When the input unit needs to send a parameter object, it first sends a "data description header" (DataHeader) structure containing the number of bytes of the serialized parameter object. As with the shared-memory link, if the parameter type has fixed-length serialization this byte count is zero and the length declared during the handshake is used; if the type has variable-length serialization, this value is used as the actual transfer length. After verifying that the header is correct, the output unit sends a status code (or an error code). The input unit then sends the serialized data, and after receiving all of it the output unit sends status code 0 as confirmation. The transfer of the next parameter object can then begin.
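The header/status-code exchange above can be sketched over a socket pair. The header layout (a single 4-byte length field) and the one-byte status codes are assumptions for illustration; the real DataHeader structure is not specified in this text.

```python
import socket
import struct
import threading

HEADER = struct.Struct("<I")  # hypothetical DataHeader: payload byte count

def recv_exact(sock, n):
    """Read exactly n bytes or raise."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed")
        data += chunk
    return data

def send_param(sock, payload):
    sock.sendall(HEADER.pack(len(payload)))   # 1. data description header
    if recv_exact(sock, 1) != b"\x00":        # 2. wait for the status code
        raise IOError("receiver rejected header")
    sock.sendall(payload)                     # 3. serialized parameter data
    if recv_exact(sock, 1) != b"\x00":        # 4. final confirmation
        raise IOError("receiver reported an error")

def recv_param(sock):
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    sock.sendall(b"\x00")                     # accept the header
    payload = recv_exact(sock, length)
    sock.sendall(b"\x00")                     # confirm full receipt
    return payload

a, b = socket.socketpair()                    # stands in for a TCP connection
received = []
t = threading.Thread(target=lambda: received.append(recv_param(b)))
t.start()
send_param(a, b"serialized parameter object")
t.join()
print(received[0])  # b'serialized parameter object'
```

The two acknowledgement points keep the sender and receiver in lock-step per parameter object, so a framing error is detected before the next object is transmitted.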
Further, implementation details of the socket link: an independent thread performs the data send/receive operations to achieve asynchronous transfer; at the same time, a circular queue is placed between the data transceiver thread and the producing/consuming thread, storing the serialized parameter objects, and the size of the asynchronous buffer is controlled by setting the number of parameters the circular queue can hold.
Further, the system also comprises a plug-in system for extension. Each plug-in in the plug-in system is a dynamic link library containing: all the data types the plug-in provides, the TypeSpec subclasses corresponding to those data types, all the computing units the plug-in provides, the generators corresponding to those computing units, and an object implementing the plug-in (Plugin) interface.
Further, the plug-in built into the system provides TypeSpec objects for the basic arithmetic types, mainly the various integer and floating-point types. The computing units this plug-in provides include:
the ConstProcessor computing unit template, which provides constant output;
the Dispatcher computing unit template, which copies the input parameter into multiple outputs (the parameter must not carry a local part);
the FakeSink computing unit template, which provides any number of input slots and performs no operation;
the VariableProcessor computing unit template, which maintains an internal variable and can have multiple outputs (the parameter must not carry a local part).
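The behavior of these built-in unit templates can be sketched in a few lines each. Only the template names and their one-line descriptions come from the text above; the run_cycle interface and everything else here is an invented simplification with the slot plumbing omitted.

```python
class ConstProcessor:
    """Sketch of ConstProcessor: emits a constant value every cycle."""
    def __init__(self, value):
        self.value = value
    def run_cycle(self):
        return self.value

class Dispatcher:
    """Sketch of Dispatcher: copies one input into n outputs."""
    def __init__(self, n_outputs):
        self.n = n_outputs
    def run_cycle(self, value):
        return [value] * self.n

class FakeSink:
    """Sketch of FakeSink: accepts any number of inputs, does nothing."""
    def run_cycle(self, *values):
        return None

const = ConstProcessor(3.14)
disp = Dispatcher(3)
sink = FakeSink()
copies = disp.run_cycle(const.run_cycle())
print(copies)                    # [3.14, 3.14, 3.14]
print(sink.run_cycle(*copies))   # None
```

Such trivial units are useful as graph plumbing: constants feed rendering parameters into a pipeline, dispatchers fan data out to parallel branches, and sinks terminate unused outputs.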
Further, the system also comprises a graphics rendering layer plug-in, which provides the data types and computation/rendering operations commonly used in OpenGL and in parallel rendering.
Beneficial effects:
The present invention proposes a Render-Streamers system framework that, through the technique of explicitly constructing pipelines logically driven by data flow graphs, provides developers with basic computing units and with tools for assembling them, allowing developers to combine these computing units themselves to realize parallel rendering and visualization workflows that meet their task requirements. The framework concentrates on providing computing-unit interfaces with good extensibility and efficient data links between computing units. The computing units commonly used in parallel rendering tasks are implemented inside the framework, so simple parallel rendering and visualization tasks can be realized merely by assembling these built-in units appropriately. To realize complex parallel rendering and visualization algorithms, developers using the framework can add new computing units to the system themselves; through the data links provided by the Render-Streamers framework, the newly added units can be assembled with the built-in ones to form parallel rendering pipelines that meet specific needs.
Brief description of the drawings
Fig. 1 is a schematic of synchronization on an inter-thread data link when the link holds no data, blocking the output unit (marked grey);
Fig. 2 is a schematic of synchronization on an inter-thread data link when the link buffer is full, blocking the input unit (marked grey);
Fig. 3 is a schematic of the computing units inside a thread block executing in topological-sort order;
Fig. 4 is a schematic of the data flow graph structure for common sort-first parallel rendering;
Fig. 5 is a schematic of a hierarchical sort-first pipeline comprising two levels of sort-first decomposition;
Fig. 6 is a schematic of the data flow graph structure for common sort-last parallel rendering;
Fig. 7 is a schematic of a hierarchical sort-last pipeline comprising two levels of sort-last decomposition;
Fig. 8 is a schematic of a data flow graph combining the sort-first and sort-last methods;
Fig. 9 is a schematic of a simple data flow graph implementing the shadow-mapping algorithm;
Fig. 10 is a schematic of applying sort-first and sort-last decomposition to the shadow-mapping algorithm;
Fig. 11 is an example of Render-Streamers parameter-block topology;
Fig. 12 is an example pipeline that computes a CRC32 checksum line by line;
Fig. 13 is a schematic of the inter-thread link structure, in which parameter objects travel from A to B and the used entries of the circular queue are marked grey;
Fig. 14 is a schematic of the shared-memory link structure, in which parameter objects travel from A to B and the data blocks already holding data are marked grey;
Fig. 15 is a schematic of a pipeline implementing sort-last rendering, comprising two GLObjectRenderer computing units;
Fig. 16 is a schematic of a Render-Streamers pipeline performing parallel rendering of a single 3D model in sort-first mode;
Fig. 17 is a schematic of the Render-Streamers pipeline implementing the shadow-mapping algorithm.
Embodiment
Render-Streamers provides several auxiliary components that simplify the design of new computing units. Although the design intention of the Render-Streamers system framework is to provide an extensible implementation platform for parallel rendering algorithms, its uses are by no means limited to parallel rendering: any algorithm whose data exhibit periodicity and whose workflow can be described as a data flow graph can be parallelized with Render-Streamers.
The components of the Render-Streamers data flow graph model are described in detail below.
1. The data-flow-graph-driven parallel model and its components
1.1 Parallelism-oriented data flow graphs
The Render-Streamers framework uses a data flow graph to describe the logical process of parallel rendering/computation; the program realization and execution flow of this data flow graph is called the data-flow-graph-based parallel pipeline model.
A data flow graph is in fact an algorithmic description in which computing units represent the individual rendering and computation steps of a parallel rendering and visualization algorithm for a three-dimensional scene, and data links represent the data dependences between those steps. In Render-Streamers, a computing unit is a relatively independent functional module that carries out a specific computation or rendering task. A computing unit usually needs to obtain data from other computing units as the input of its computation or rendering, and outputs the result of its computation or rendering to other computing units. The data types of a unit's inputs and outputs vary considerably with the computation or rendering task it performs; in pipelines that carry out parallel rendering tasks, the most common I/O types are simple parameters (value-type variables and structures composed of them) and two-dimensional images.
A computing unit is the basic unit of Render-Streamers program execution.
By organizing computing units into thread blocks, Render-Streamers achieves parallel execution between computing units, but the execution of a single computing unit cannot be parallelized by Render-Streamers. Therefore the work done by a single computing unit should in general be as simple as possible, with complex work carried out by several computing units cooperating through data links. Computing units designed this way have better modularity, and the pipelines they form are easier to execute in parallel.
1.2 Formal definition of the data flow diagram
The basic building block of a data flow diagram is the computing unit. A computing unit is an entity with active execution capability; it communicates with the outside world through parameter slots (argument slots, or simply slots). A computing unit may have zero or more slots. Some slots are used only to obtain data from other computing units and are called input parameter slots; the others are used to deliver data to other computing units and are called output parameter slots. All input parameter slots of a unit taken together form its input parameter block; the output parameter block is defined analogously. Each parameter slot is associated with a parameter object (also simply called a parameter); the objects associated with a unit's input and output slots are called its input parameters and output parameters respectively. A parameter object has a type attribute, where "type" is essentially the same concept as in a programming language. A parameter slot also has a type attribute: the type of a slot is the type of the parameter objects it may be associated with. A slot's type is fixed when the slot is created and cannot change afterwards.
A data link connects one output parameter slot to one input parameter slot, the two slots belonging to different computing units. The output parameter slot a link connects to is called the link's input slot, and the input parameter slot it connects to is called the link's output slot; the unit owning the link's input slot is the link's input unit, and the unit owning the link's output slot is its output unit (the output of a computing unit is the input of the link, and the input of a computing unit is the output of the link). In a well-formed data flow diagram, every data link has exactly one input slot and one output slot, and every slot is connected by exactly one data link: no slot carries several links, and no link fans out to several input or output slots.
If some output parameter slot of a computing unit A is connected by a data link to some input parameter slot of another computing unit B, we say that B has a direct data dependence on A.
An output parameter slot can perform a push operation on its data link; the parameter object associated with the slot at the moment of the push is said to be the object pushed in that operation. An input parameter slot can perform a fetch operation on its data link; after the fetch completes, the parameter object associated with the fetching slot is equal to some object pushed into the link earlier. We then say that the link's input unit A has passed the parameter object O on its output slot to the link's output unit B via the link L, or that B has obtained an input O from A via L. Parameter objects travel through a link in first-in-first-out order: objects are received in the order they were pushed. Note that for both the push and the fetch operation, the computing unit is the active initiator. A data link passively responds to push and fetch requests; it never notifies a computing unit on its own initiative. The communication medium itself is passive, and data transfer is triggered entirely by the two endpoints, much like the pipe mechanism of an operating system. This mode of operation is one of the fundamental principles of the Render-Streamers data flow diagram design, and the key to understanding how Render-Streamers pipelines execute.
If computing units are viewed as graph nodes and data links as directed edges (pointing from a link's input unit to its output unit), the topology of a data flow diagram in the Render-Streamers framework can be represented as a directed graph. Render-Streamers forbids cycles in the diagram (a cycle would create a circular data dependence), so a data flow diagram is a directed acyclic graph. Since a computing unit may have several input and output parameter slots, there may be several links in the same direction between two units, so the diagram may contain parallel edges.
1.3 The computational resource hierarchy of the Render-Streamers system
The Render-Streamers system organizes the computational resources of a GPU cluster into four levels: cluster, node, process, and thread. A cluster contains one or more computers, each called a node of the cluster. Nodes are generally connected by a network and do not share memory. Each node runs one or more processes; processes on the same node can share the machine's memory but have independent address spaces. Each process runs one or more threads; threads within a process share an address space, but each thread has its own OpenGL context.
In Render-Streamers the thread is the most basic unit of parallelism: any two threads are logically executed in parallel. Each thread hosts one or more computing units of the overall data flow diagram; the subgraph formed by the units located on one thread is called that thread's thread block. Since the data flow diagram is a directed acyclic graph, every subgraph of it is also a directed acyclic graph. It can be proved that every directed acyclic graph admits a topological ordering (see A. B. Kahn, "Topological sorting of large networks", Commun. ACM 5.11 (Nov. 1962), pp. 558-562), which is the precondition that makes the parallel execution scheme of this patent feasible. The sequence of computing units obtained by topologically sorting a thread block is called an execution order of that thread block. Most directed acyclic graphs have more than one topological ordering, so a thread block usually has more than one execution order.
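The topological ordering this relies on can be sketched with Kahn's algorithm, cited above. The following is a minimal Python model, not framework code; the function name and unit names are illustrative only:

```python
from collections import deque

def topological_order(units, edges):
    """Kahn's algorithm: compute one execution order for a thread block.

    units -- list of computing unit names
    edges -- list of (producer, consumer) pairs, i.e. direct data dependences
    Raises ValueError if the graph contains a cycle (forbidden in Render-Streamers).
    """
    indegree = {u: 0 for u in units}
    successors = {u: [] for u in units}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(u for u in units if indegree[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(units):
        raise ValueError("data flow graph contains a cycle")
    return order
```

Any order the algorithm returns is a valid execution order: a producer always precedes its consumers, and different tie-breaking choices in the `ready` queue yield the different execution orders the text mentions.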
Indirect data dependence between computing units is defined as follows:
If C has a direct data dependence on B, and B has a direct data dependence on A, then C has an indirect data dependence on A.
If C has an indirect data dependence on B, and B has a direct data dependence on A, then C has an indirect data dependence on A.
Direct and indirect data dependences are collectively called data dependences.
From the definition of topological sorting it follows that if two computing units A and B lie in the same thread block and B has a data dependence on A, then A precedes B in every execution order of that thread block.
1.4 Parallel execution of the data-flow-diagram-driven pipeline
Render-Streamers organizes the execution of the pipeline driven by a data flow diagram into execution cycles. The basic unit of an execution cycle is one execution of a computing unit, which consists of the following three phases:
Input: every slot in the unit's input parameter block performs a fetch operation on its data link, obtaining the input parameters required by the compute phase.
Compute: based on the input parameter objects obtained in the input phase and its own internal state, the unit performs a predefined sequence of operations (such as changing internal state or performing I/O) and sets the values of its output parameter objects.
Output: every slot in the unit's output parameter block performs a push operation on its data link, passing the output parameter objects generated in the compute phase to other computing units.
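The three phases can be sketched as follows. This is a minimal Python model of one execution of a computing unit, with an unbounded FIFO standing in for a real data link; the class and method names (`Processor`, `FifoLink`, `execute_once`) are our own, not the framework's API:

```python
from collections import deque

class FifoLink:
    """Stand-in for a data link: parameters are received in push order."""
    def __init__(self):
        self._queue = deque()
    def push(self, obj):
        self._queue.append(obj)
    def fetch(self):
        return self._queue.popleft()

class Processor:
    """Minimal sketch of a computing unit's per-cycle execution."""
    def __init__(self, name, inputs, outputs, compute):
        self.name = name
        self.inputs = inputs    # dict: input slot name -> link to fetch from
        self.outputs = outputs  # dict: output slot name -> link to push into
        self.compute = compute  # function: dict of inputs -> dict of outputs

    def execute_once(self):
        # 1. input phase: fetch one parameter object from every input link
        args = {slot: link.fetch() for slot, link in self.inputs.items()}
        # 2. compute phase: run the unit-specific work
        results = self.compute(args)
        # 3. output phase: push every output parameter downstream
        for slot, link in self.outputs.items():
            link.push(results[slot])
```

Repeating `execute_once` for every unit of the pipeline, in a valid execution order, constitutes one execution cycle in this model.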
One execution of every computing unit in the pipeline constitutes one execution cycle of the pipeline. More precisely, the i-th execution cycle of the pipeline is the set formed by the i-th execution of every unit in it. Note that the execution cycles of a pipeline are not mutually exclusive in time: units closer to the entrances of the data flow diagram (units with in-degree zero) enter a new execution cycle earlier. This closely resembles a real assembly line: if the products on the line are numbered in the order they leave the factory, the stations near the front of the line are always working on higher-numbered products.
Because of the data links between computing units, a pipeline tends to contain direct or indirect data dependences, and these dependences constrain the execution order of units within the same cycle. In general, if unit B has a direct data dependence on unit A, then within any one execution cycle A must have finished its output phase before B performs its input phase; otherwise B cannot obtain correct input parameter objects.
Render-Streamers guarantees this ordering requirement by two means:
1) Synchronization in cross-thread data links.
This strategy applies to links whose input and output units lie in different thread blocks. Concretely, if the link cannot yet supply the correct parameter object when its output unit performs a fetch, the link blocks the fetch and returns only once the parameter object becomes available. As shown in Fig. 1, unit B is in the input phase of cycle 1 while unit A is still in the compute phase of cycle 1, so the input that B needs for cycle 1 has not yet been pushed into the link by A; B's fetch operation blocks until A's push succeeds.
Conversely, if too many parameter objects are already buffered in the link when its input unit performs a push, the link may also block the push. This is mainly to prevent parameter objects from piling up in the link, and has little to do with data dependences. As shown in Fig. 2, unit B is in the compute phase of cycle 1 while unit A is in the output phase of cycle 4, and the link's buffer capacity is two parameters; when A pushes, the link has no buffer space left to hold the object, so A's push blocks until B's fetch succeeds.
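Both blocking behaviors can be modeled with a bounded FIFO. This Python sketch (our naming, not the framework's) uses `queue.Queue`, whose blocking `put`/`get` give exactly the two rules above; the default capacity of two matches the Fig. 2 scenario:

```python
import queue
import threading

class ThreadLink:
    """Sketch of a cross-thread data link: a bounded, blocking FIFO.

    fetch() blocks while no parameter object is available (the Fig. 1 case);
    push() blocks while `capacity` objects are already buffered (the Fig. 2 case).
    """
    def __init__(self, capacity=2):
        self._buf = queue.Queue(maxsize=capacity)

    def push(self, obj):
        self._buf.put(obj)      # blocks when the buffer is full

    def fetch(self):
        return self._buf.get()  # blocks until a pushed object is available
```

Because the producing thread can run at most `capacity` cycles ahead before its push blocks, the link both synchronizes the two thread blocks and bounds how far their execution cycles may drift apart.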
2) Ordering execution inside a thread block according to an execution order.
The thread is the most basic parallel unit in Render-Streamers, and the computing units inside one thread block execute serially with respect to each other. Because a thread block's execution order is obtained by topologically sorting its direct data dependences, executing the units of the block in that order guarantees that the execution sequence never conflicts with the data dependences.
Take the data flow diagram in Fig. 3 as an example. It contains four threads, separated by dashed boxes in the figure and labeled A, B, C, D. The number on each computing unit gives its position in the execution order of its thread, obtained by topologically sorting the thread block. It is easy to verify that these intra-thread execution orders satisfy the data dependence requirements.
Because the units inside a thread block always execute repeatedly in the order of the execution sequence, a thread block always completes the operations of one execution cycle for all of its units before entering the next cycle. Consequently, temporal overlap between different execution cycles exists only between thread blocks; inside a thread block, only one execution cycle is active at any given moment.
Executing all computing units of a thread block once, in the order of the execution sequence, is called one execution of the thread block. In Render-Streamers the notion of an execution cycle corresponds to the notion of a "frame" in rendering and visualization algorithms. One execution cycle contains all the steps needed to render one frame and obtain the final image, and the information stored in a unit's input/output parameters during a given cycle all serves the rendering of that frame. When an upstream unit in the pipeline finishes, its output parameters are passed to the downstream unit as inputs, so the downstream unit enters the execution cycle the upstream unit has just left, while the upstream unit enters the next cycle, waits for the input parameters of that cycle, and generates the corresponding outputs, and so on, round after round. Parameter objects thus serve to delimit execution cycles in Render-Streamers.
By contrast, although streaming-media processing also has the concept of a "frame", in most cases the frames exist only inside the encoded video stream: before the byte stream carrying the encoded video is decoded, frames cannot exist or propagate as individual entities. Frameworks oriented toward streaming media therefore generally avoid the concept of a parameter object and work directly on byte streams, with each computing unit encoding and decoding parameters to and from the byte stream itself. Since all inputs and outputs are continuous byte streams, from the framework's point of view every unit can only run continuously, and no execution cycle can be delimited.
1.5 Describing parallel rendering and visualization tasks with data flow diagrams
This section shows how the task decomposition methods commonly used in parallel rendering are expressed as data flow diagrams, and how these decompositions are used to parallelize common rendering algorithms.
1.5.1 Sort-first decomposition
Sort-first decomposition divides the screen into several disjoint regions, with each parallel rendering unit responsible for the content of one region. Its characteristic is that every rendering unit must process the complete model data but produces a smaller image; it suits tasks where per-pixel computation is heavy and per-vertex computation is light. In Render-Streamers, sort-first rendering can be described by the data flow diagram shown in Fig. 4, which contains the following processing steps (step numbers are marked in the figure):
1) Viewport decomposition: the input viewport information is processed according to a rule given in advance (normally a grid-based partition), yielding the viewport information of each sub-viewport.
2) Scene rendering: the computing units that perform the rendering task each render the complete scene model data within their own viewport, producing the image corresponding to each viewport.
3) Image stitching: the images output by the rendering units of the previous step are stitched into the complete image.
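Step 1's grid-based partition can be sketched as follows. This is an illustrative Python helper under our own naming, assuming a viewport given as pixel width and height:

```python
def split_viewport(width, height, cols, rows):
    """Grid-split a viewport into cols x rows disjoint sub-viewports.

    Returns (x, y, w, h) tuples. Edge tiles absorb the division remainder,
    so the tiles exactly cover the full viewport without overlap.
    """
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x = c * (width // cols)
            y = r * (height // rows)
            w = width // cols if c < cols - 1 else width - x
            h = height // rows if r < rows - 1 else height - y
            tiles.append((x, y, w, h))
    return tiles
```

Each tuple would be sent to one scene-rendering unit as its viewport information; the image-stitching step then pastes each returned image back at its tile's (x, y) offset, reversing the split.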
Seen from the outside, this data flow diagram takes viewport information as input and outputs one image, exactly like a single rendering unit. By constructing several such diagrams, treating each as an independent rendering unit, and then using these "rendering units" to build another sort-first data flow diagram, one obtains a hierarchical, cascaded sort-first rendering scheme, as shown in Fig. 5.
In general, hierarchical sort-first decomposition requires more viewport splitting and image stitching operations and performs worse; in most cases single-level sort-first decomposition should be preferred.
1.5.2 Sort-last decomposition
Sort-last decomposition divides the scene model data to be rendered into several parts, with each parallel rendering unit responsible for rendering one part. Its characteristic is that every rendering unit processes only part of the model data but must produce a full-size image; it suits tasks where per-pixel computation is light and per-vertex computation is heavy.
The data flow diagram for a sort-last rendering task is shown in Fig. 6. It has the same topology as the diagram in Fig. 4; the only difference lies in the operations the computing units perform. It likewise contains three steps:
1) Model decomposition: the input scene model information is divided into several parts, each delivered to one of the computing units that perform the rendering task.
2) Scene rendering: each rendering unit renders an image (containing both a color component and a depth component) from the model information it received, in a viewport given in advance. All units use the same viewport information, so the generated images have the same size.
3) Image compositing: the images generated in the previous step are composited according to the per-pixel depth information, yielding the complete rendering result (again containing color and depth components).
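The depth-based compositing of step 3 can be sketched as a per-pixel depth test. This Python model (our own naming; real implementations work on GPU buffers, not nested lists) keeps, for every pixel, the color with the smallest depth:

```python
def depth_composite(layers):
    """Z-composite the full-size images produced by sort-last rendering units.

    Each layer is a (color, depth) pair of equal-sized 2-D lists. For every
    pixel, the color whose depth value is smallest (closest) wins, mirroring
    the per-pixel depth test; the composited depth is kept so the result can
    itself be composited further (as in hierarchical sort-last).
    """
    color = [row[:] for row in layers[0][0]]
    depth = [row[:] for row in layers[0][1]]
    for c_img, d_img in layers[1:]:
        for y in range(len(depth)):
            for x in range(len(depth[0])):
                if d_img[y][x] < depth[y][x]:
                    depth[y][x] = d_img[y][x]
                    color[y][x] = c_img[y][x]
    return color, depth
```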
Serializing model information is usually complex and the data volume is large, so in practice one often transmits only a pair of real numbers recording the start and end positions of a data range in the scene; each rendering unit then loads, on its own, the model data it needs to render according to this pair. In the pipeline of Fig. 6, what is split and transmitted is exactly such a pair of real numbers.
Like sort-first decomposition, sort-last decomposition can also be cascaded over multiple levels, as shown in Fig. 7. As with hierarchical sort-first, hierarchical sort-last requires more decomposition and merging operations, so in most cases only single-level sort-last decomposition should be used.
1.5.3 Combining sort-first and sort-last decomposition
When both vertex processing and pixel processing are computationally heavy, one can use the sort-first method to shrink each unit's viewport and, at the same time, the sort-last method to reduce the amount of model data each unit must render. Fig. 8 shows a data flow diagram combining the two decomposition methods. It introduces a new kind of computing unit, the "parameter copy" unit, whose function is to duplicate its input parameter and deliver one copy to each of several units. The inner layer of this diagram consists of two sub-diagrams performing sort-last decomposition (marked by rounded gray rectangles and circled with dashed lines in the figure). These two sub-diagrams are connected to the outer viewport decomposition unit and image stitching unit, forming a sort-first decomposition. Because sort-first and sort-last are used simultaneously, the scene rendering units in the figure must receive both viewport information and model information, and the whole pipeline likewise has two inputs: viewport information and model information.
1.5.4 Task-based decomposition: shadow mapping
The data flow diagram constructions introduced above all use data-based decomposition: sort-first decomposes the image data, and sort-last decomposes the model data. Data-based decomposition is the main means by which most parallel rendering frameworks achieve parallelism. This section uses the shadow mapping algorithm as an example to show how a Render-Streamers data flow diagram realizes task-based decomposition.
The shadow mapping algorithm generates shadows in two rendering passes: first render the scene from the light source's viewpoint to generate a depth image (the shadow map), then render the scene from the observer's viewpoint to generate the final image. The second pass needs the rendering result (the depth map) of the first. This rendering flow can be expressed as the most concise abstract data flow diagram, shown in Fig. 9.
Consider the effect of the data dependences on the execution of this diagram. It is easy to see from the figure that the unit rendering the scene has a direct data dependence on the unit rendering the shadow map, so within one execution cycle the shadow-map unit always executes first and the scene unit afterwards; within the same cycle the two cannot run in parallel. They can, however, run in parallel across different execution cycles. As long as the two units are in different thread blocks, the synchronization effect of the data link means the scene-rendering unit always runs at least one execution cycle behind the shadow-map unit. Units working in different execution cycles have no data dependence problem, so in this arrangement the two units do run in parallel. This parallel mode differs markedly from the data-parallel modes introduced earlier. In data-parallel strategies such as sort-first and sort-last, there is no data dependence between the rendering units, so they can execute in parallel within the same cycle. A pipeline whose rendering units do depend on one another can only achieve parallelism across different cycles, which introduces an execution-cycle delay between units. Achieving parallelism across different tasks by introducing delay between units is the characteristic feature of pipelined task parallelism: in general, the more stages a pipeline has, the more cycles of delay there are from input to output.
The data flow diagram in Fig. 9 can be further parallelized by data-parallel means. Fig. 10 gives an example that renders the shadow map in sort-first fashion and the scene in sort-last fashion; the sort-first part (shadow map) and the sort-last part (scene) are separated by dashed boxes in the figure.
2. Design of the fundamental Render-Streamers elements
2.1 Parallel management of graphics resources referenced by parameter objects
In most cases a parameter object is represented by an object in a process's memory. Since all threads of a process share one address space, threads within the same process can share a parameter object simply by sharing a pointer to it. OpenGL is the standard API for 3D graphics, and the present invention uses OpenGL to illustrate the management of rendering resources. In a graphics system, once a parameter object references an OpenGL resource, this pointer-sharing method no longer works. Under the OpenGL resource management scheme, every thread of a process that uses OpenGL functions must create its own OpenGL context, and the contents of an OpenGL context are not shared across threads. If a thread stores a reference to an OpenGL resource in a parameter object, such as the name of a texture object or a vertex buffer object, that parameter object cannot be understood by other threads and cannot be shared between threads.
Render-Streamers solves this problem by dividing a parameter object into a non-local part and a local part. The non-local part of a parameter is the portion that expresses complete semantics without depending on any particular computing unit; the local part is the portion whose semantics depend on one specific computing unit. The non-local part is the parameter's primary form of existence: a parameter object of any type must be able to express complete semantics without a local part. The local part exists only inside a specific thread and serves as temporary storage used when the computing units of that thread share the parameter object during execution. Most parameter types that use OpenGL resources have both non-local and local parts, while types that involve only CPU operations normally have only a non-local part and no local part.
The image type is a typical parameter type with both a non-local and a local part. In OpenGL applications an image typically exists in two forms: while participating in rendering, it exists as an OpenGL texture object; when it must be transferred between threads, processes, or machines, it must instead be stored in a memory buffer. For a parameter object of image type, the memory buffer is its non-local part and the texture object is its local part.
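The two-part split can be modeled as follows. This Python sketch is illustrative only (the class and the `upload` callback are our stand-ins; the framework itself is C++ and the local part would be a real OpenGL texture name created in each thread's context):

```python
class ImageParameter:
    """Sketch of a parameter type with a non-local part and a local part.

    The pixel buffer (non-local part) carries the complete value and can
    cross thread/process/machine boundaries. The per-context handle (local
    part) stands in for an OpenGL texture object: it is meaningful only in
    one context and is created lazily from the non-local part.
    """
    def __init__(self, pixels):
        self.pixels = bytes(pixels)  # non-local part: always present
        self._local = {}             # context id -> handle (local part)

    def local_handle(self, context_id, upload):
        # `upload` models creating a texture from the buffer in one context
        if context_id not in self._local:
            self._local[context_id] = upload(self.pixels)
        return self._local[context_id]
```

A thread that only forwards the parameter never touches the local part; a thread that renders with it creates (and caches) its own context-specific handle from the buffer.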
2.2 Parallel data type management
Render-Streamers is designed to work with all kinds of types: built-in system types, types provided by Render-Streamers, and types that developers may later define and insert into the system as plug-ins. Data of different types must be transmitted and type-checked in the four kinds of data link defined by Render-Streamers, which requires abstracting over all the data operations that may be involved. Inside the program, Render-Streamers transmits data of every type uniformly as void*, and describes its behavior uniformly through the TypeSpec class. Every type used in a Render-Streamers pipeline has a corresponding TypeSpec implementation.
The main operations defined by TypeSpec are listed below.
[The table of TypeSpec operations appears only as a figure in the original; the operations referenced in the surrounding text include assign, context_in, context_out, and the serialize_* family.]
The operations defined by TypeSpec induce an equality relation between two parameter objects of the same type, determined by the following rules:
At any given moment, every parameter object equals itself.
If A is assigned to B with the assign operation, then after the assignment both A and B equal the A from before the assignment.
If a context_out operation followed by a context_in operation is applied to a parameter object A0, yielding A1, then A0 equals A1.
If a parameter object A is serialized with serialize_* into a byte sequence, and that byte sequence is deserialized into another parameter object B, then A equals B.
Because parameter objects in Render-Streamers may be of any type, the equality relation can differ greatly from type to type; no concrete definition can be given, only these abstract transfer rules. When implementing the TypeSpec of a particular type, care should be taken that the equality relation defined above matches the intuitive meaning of equality.
Besides using the type abstraction to describe data behavior uniformly, Render-Streamers also manages types centrally. TypeManager maintains a mapping from type ID to TypeSpec. All types, built-in types as well as types defined by developers, must be registered with TypeManager if they are to be transmitted over data links. In general, type registration should take place when a plug-in is loaded.
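The TypeSpec/TypeManager pair can be sketched as follows. This is a Python model of the described interfaces, not the C++ originals; `pickle` is our stand-in for a concrete serialize_* implementation, and all names besides TypeSpec and TypeManager are illustrative:

```python
import pickle

class TypeSpec:
    """Abstract behavior of a parameter type (assumed minimal interface)."""
    def assign(self, src): ...
    def serialize(self, obj): ...
    def deserialize(self, data): ...

class PickleTypeSpec(TypeSpec):
    """A TypeSpec for plain CPU-only values; equality rules require that
    assign and a serialize/deserialize round trip both preserve equality."""
    def assign(self, src):
        return pickle.loads(pickle.dumps(src))
    def serialize(self, obj):
        return pickle.dumps(obj)
    def deserialize(self, data):
        return pickle.loads(data)

class TypeManager:
    """Central registry: type id -> TypeSpec, filled at plug-in load time."""
    def __init__(self):
        self._specs = {}
    def register(self, type_id, spec):
        self._specs[type_id] = spec
    def lookup(self, type_id):
        return self._specs[type_id]
```

A data link would call `lookup` with the type ID recorded in a slot to obtain the operations it needs for transmission and type checking.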
2.3 Computing unit management
The computing unit management mechanism lets developers define their own computing units and instantiate them by name. Computing units are managed by ProcessorManager, which maintains a mapping from names to computing unit generators. A computing unit generator is a function that accepts a parameter list and returns a Processor object; Processor is the base class of all computing units.
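The name-to-generator mapping can be sketched as a factory registry. This is a Python model of the described mechanism; the `ImageStitcher` example unit and the method names are our own illustrations:

```python
class Processor:
    """Base class of all computing units (assumed minimal interface)."""
    def __init__(self, **params):
        self.params = params

class ProcessorManager:
    """Maps a name to a generator function that builds a Processor."""
    def __init__(self):
        self._makers = {}
    def register(self, name, maker):
        self._makers[name] = maker
    def create(self, name, **params):
        # look up the generator by name and hand it the parameter list
        return self._makers[name](**params)

class ImageStitcher(Processor):
    """A hypothetical developer-defined computing unit."""
```

A plug-in would call `register` at load time, after which pipeline descriptions can refer to the unit purely by name.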
2.4 Parameter passing interfaces
Every Render-Streamers computing unit has an input parameter block and an output parameter block. This section describes the internal structure of a parameter block in detail.
Fig. 11 shows the internal structure of a Render-Streamers parameter block containing four slots. Each slot is divided into two parts: a SimpleSlot and a FullSlot. The SimpleSlots are stored contiguously in the parameter block as an array, while the FullSlots are indexed by name and can also be accessed by subscript through an index array.
A SimpleSlot contains only the information necessary to keep a data link running correctly; a FullSlot contains the more complex information such as the slot's name, its ownership, and the non-local parameter.
This separated storage design was chosen to optimize the transfer performance of simple links. A relatively simple data link, such as a thread-internal link, needs only the information recorded in the SimpleSlot to work correctly. Storing the SimpleSlots contiguously increases cache hit rates and memory access locality when thread-internal links are used heavily, improving performance. The FullSlot carries the additional information needed to support more complex data link forms and name-based indexing of parameter slots.
A SimpleSlot records the following:
– arg: the actual storage location of the non-local parameter. This pointer is initialized by the concrete data-link implementation when the link is established.
– link: the data link associated with this slot; like arg, it is initialized when the link is established.
– typespec: the type of the parameter object, determined when the slot is created and modifiable afterwards.
A FullSlot records the following:
– simple_slot: the SimpleSlot associated with this FullSlot.
– typespec: identical to the typespec in the SimpleSlot.
– local_arg: the local part of the parameter; like arg, it is initialized when the data link is established.
– name: the slot's title, used only for indexing and error messages.
– processor: the computing unit this slot belongs to.
– direction: an enumeration whose value is Input or Output. All slots in an input parameter block have direction Input; all slots in an output parameter block have direction Output.
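The two-part slot layout above can be sketched as follows. The field names (arg, link, typespec, local_arg, name, processor, direction) follow the text; the concrete types, and the use of raw pointers, are assumptions for illustration.

```cpp
#include <string>
#include <vector>

struct Link;        // data link (push/fetch interface)
struct TypeSpec;    // runtime type descriptor of a parameter object
struct Processor;   // owning computing unit

enum class Direction { Input, Output };

// Only what a simple (e.g. intra-thread) link needs; stored contiguously.
struct SimpleSlot {
    void*     arg      = nullptr;  // non-local parameter storage, set at link setup
    Link*     link     = nullptr;  // associated data link, set at link setup
    TypeSpec* typespec = nullptr;  // parameter type, set at slot creation
};

// Richer metadata, indexed by name.
struct FullSlot {
    SimpleSlot* simple_slot = nullptr; // associated SimpleSlot
    TypeSpec*   typespec    = nullptr; // same as simple_slot->typespec
    void*       local_arg   = nullptr; // local (context-bound) part
    std::string name;                  // used for indexing and error messages
    Processor*  processor   = nullptr; // owning computing unit
    Direction   direction   = Direction::Input;
};

// SimpleSlots are kept in one contiguous array for cache-friendly traversal;
// FullSlots form the index array on top of which name lookup is built.
struct ParamBlock {
    std::vector<SimpleSlot> simple;
    std::vector<FullSlot>   full;
};
```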
2.5 Data links
The common interface describing data links in Render-Streamers is the Link class. It contains only two methods: push and fetch.
Besides transmitting data, a data link is also responsible for creating and managing the non-local parameters in the parameter slots. Different kinds of data links transmit data in very different ways, so they also create and manage parameter objects very differently. A data-link implementation may modify the non-local parameter pointers in the parameter slots during push or fetch as its design requires, as long as the following conditions hold:
– After link initialization completes, the non-local parameter pointer of the output slot (the link's input end) must be non-null.
– After the first fetch operation completes, the non-local parameter pointer of the input slot (the link's output end) must be non-null.
The Render-Streamers framework provides four kinds of data links: intra-thread links, inter-thread links, inter-process (shared-memory) links and socket links. They respectively cover the cases where the link's input unit and output unit reside in the same thread, in different threads of the same process, in different processes on the same computer, and on different computers. Pipelines built from these four link types satisfy the needs of most parallel rendering tasks, but the possibility remains that a developer needs a custom data link. A data link added to the Render-Streamers framework must satisfy the requirements above when modifying the parameter-object pointers in parameter slots. Since designing a data link is usually laborious and error-prone, implementing custom links is discouraged.
2.6 Computing units
In a Render-Streamers data flow graph, each node is a relatively independent computing unit that communicates with other computing units only through the data links connected to its input/output parameter slots. Unlike data links, which passively receive method calls, computing units play the active role in pipeline execution. On each run, a computing unit requests input data from all of its input links and, after computing its output, passes it to the data links connected to all of its output slots.
The base class of computing units in Render-Streamers is Processor, whose main method is execute. Each call to execute performs one complete "input - compute - output" cycle, as follows:
1) Input: traverse all slots in the input parameter block and call the fetch method on each slot's associated Link object to obtain the parameters.
2) Compute: call the unit's own run method to perform the computation. This method is implemented by the subclass.
3) Output: traverse all slots in the output parameter block and call the push method on each slot's associated Link object.
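The three-step cycle above can be sketched as follows, with a trivial in-memory Link standing in for a real data link. The names Link, push, fetch, Processor, execute and run follow the text; the slot representation and member names are assumptions.

```cpp
#include <vector>

struct Link {
    virtual ~Link() = default;
    virtual void fetch() = 0;   // called on input slots to obtain a parameter
    virtual void push()  = 0;   // called on output slots to hand off a result
};

struct Slot { Link* link = nullptr; };

class Processor {
public:
    virtual ~Processor() = default;
    // One complete "input - compute - output" cycle.
    void execute() {
        for (Slot& s : inputs_)  if (s.link) s.link->fetch();  // 1) input
        run();                                                 // 2) compute
        for (Slot& s : outputs_) if (s.link) s.link->push();   // 3) output
    }
    std::vector<Slot> inputs_, outputs_;
protected:
    virtual void run() = 0;     // implemented by subclasses
};
```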
Figure 12 shows a complete pipeline structure comprising an input unit, a computing unit and an output unit. The connections between computing units and data links are clearly visible in this figure: hollow arrows represent function-call relationships (pointing from caller to callee), and solid dots represent pointer references.
2.7 Computing units with preset parameter lists
To be as general as possible, the Processor class allows the number and types of its parameters to be determined dynamically at runtime (the only restriction being that parameter slots can be added but never removed). In practice, however, many computing units have fixed input and output parameter types and need no such dynamism. To simplify the implementation of these units, Render-Streamers provides FixedProcessor, a computing-unit template with preset parameter lists. It accepts an input parameter type list and an output parameter type list and automatically generates a class with the following methods:
– A constructor that accepts a group of strings as the titles of the input and output parameter slots (the name fields of the FullSlots) and adds slots to the input/output parameter blocks according to the type lists.
– A pure virtual member function do_run, whose parameter list is the unit's input parameter list followed by its output parameter list.
– An implementation of the run function of the Processor base class, which extracts the non-local parameter pointers from the SimpleSlot lists of the input/output parameter blocks, casts them according to the parameter types, and calls do_run.
This template greatly simplifies the implementation of fixed-parameter computing units. To implement a unit with I/O type "(string, int) -> (char, float)", one only needs to inherit from FixedProcessor<TypeList<string, int>, TypeList<char, float>> and implement the function do_run(string &, int &, char &, float &).
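A much-simplified sketch of a FixedProcessor-style template is shown below. The real framework routes arguments through parameter slots and data links; here the "parameter blocks" are plain tuples, so only the typed do_run dispatch is illustrated. TypeList and FixedProcessor follow the text; the tuple-based storage is an assumption.

```cpp
#include <string>
#include <tuple>

template <typename... Ts> struct TypeList {};

template <typename In, typename Out> class FixedProcessor;

template <typename... Ins, typename... Outs>
class FixedProcessor<TypeList<Ins...>, TypeList<Outs...>> {
public:
    virtual ~FixedProcessor() = default;
    std::tuple<Ins...>  inputs;    // stand-in for the input parameter block
    std::tuple<Outs...> outputs;   // stand-in for the output parameter block

    // Unpack both blocks and call the subclass's typed do_run.
    void run() {
        std::apply([this](Ins&... ins) {
            std::apply([&](Outs&... outs) { do_run(ins..., outs...); },
                       outputs);
        }, inputs);
    }
protected:
    virtual void do_run(Ins&..., Outs&...) = 0;
};

// A unit with I/O type "(string, int) -> (char, float)", as in the text.
struct Demo : FixedProcessor<TypeList<std::string, int>,
                             TypeList<char, float>> {
    void do_run(std::string& s, int& n, char& c, float& f) override {
        c = s.empty() ? '?' : s[0];        // first character of the string
        f = static_cast<float>(n) * 0.5f;  // half the integer
    }
};
```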
3. Design and implementation of data links
Support for multiple types of data links is what enables Render-Streamers to handle data transmission across different environments and levels, and is thus the core of completing all kinds of parallel rendering and visualization tasks. The links (or data paths) of the present invention are notable for combining graphics resources with computation. Render-Streamers currently has four kinds of built-in data links: intra-thread links, inter-thread links, shared-memory links and socket links.
3.1 Intra-process links
An intra-process link is a data link whose input unit and output unit reside in the same process and therefore share a memory address space. This class of links is further divided into intra-thread links and inter-thread links.
3.1.1 Intra-thread links
The input unit and output unit of an intra-thread link reside in the same thread and, under the OpenGL threading model, also share the same OpenGL context. Compared with other link types, intra-thread links have notable peculiarities:
1) Computing units within the same thread execute in topological order. Within one execution cycle, by the time the output unit of an intra-thread link starts executing, the input unit has necessarily finished and all the input parameters the output unit needs have already been pushed. The parameter slots at both ends of the link can therefore share the same parameter object without data races or data corruption, and no extra synchronization measures are needed.
2) All computing units inside a thread share the same OpenGL context, so the input and output slots of an intra-thread link can share the local part of a parameter.
3) Besides computing units performing complex rendering tasks, a thread block may also contain many computing units performing simple tasks, all of which exchange data through intra-thread links. Intra-thread data exchange therefore has high performance requirements.
In fact, the first two properties are exactly what satisfies the performance requirement raised by the third. Because the link's input and output units share the same OpenGL context during execution, they can share not only the non-local part of a parameter but also its local part, eliminating the need to transmit parameters at all; and because computing units execute in topological order, synchronization is unnecessary. The only operation an intra-thread link needs to perform is allocating the local and non-local parameters for the slots at both ends during link initialization; its push and fetch operations are both no-ops. High data-exchange efficiency thus comes naturally. Intra-thread links are implemented in Render-Streamers by the InternalSingleLink class.
3.1.2 Inter-thread links
Compared with intra-thread links, inter-thread links are slightly more complex. Threads execute in parallel with no ordering between them, yet the output unit must not consume data before the input unit has produced it, which calls for inter-thread synchronization. When designing the synchronization mechanism, care must be taken that it does not destroy the concurrency of the threads. For example, forcing the input unit's thread and the output unit's thread to execute alternately would solve the data-dependence problem, but it would eliminate inter-thread concurrency entirely and easily lead to deadlock.
Inter-thread links in Render-Streamers transmit data through a circular queue. A fixed-size circular queue is allocated during link initialization; the data type stored in the queue is the non-local part of the parameter. The queue's write pointer corresponds to the arg pointer of the link's input slot, and its read pointer corresponds to the arg pointer of the link's output slot, as shown in Figure 13. A push operation first calls the context_out method of the TypeSpec on the parameter currently pointed to by the write pointer, synchronizing the local parameter's information into the non-local parameter, and then advances the write pointer to the next object in the queue. A fetch operation first advances the read pointer and then calls the context_in method of the TypeSpec, writing the information in the non-local parameter into the local parameter of the output slot. The circular queue uses two semaphores to record its free and used space, implementing the thread synchronization between the input unit and the output unit.
The circular queue acts as a buffer between the two computing units: even if the output unit has not yet begun consuming the data the input unit just produced, the input unit can proceed to its next execution cycle, which increases thread concurrency. Exchanging the non-local part of a parameter requires nothing more than moving a pointer, which is also a very efficient means of data exchange. When local parameters are involved, the context_out and context_in operations introduce some extra overhead, but that is dictated by the essential nature of local parameters. In summary, Render-Streamers' inter-thread links are well suited as a means of inter-thread parameter exchange.
Inter-thread links are implemented in Render-Streamers by the ThreadedLink class.
3.2 Shared-memory links
A shared-memory link connects computing units in two different processes on the same machine. Because the same segment of shared memory may be mapped at different addresses in each process, data structures not specifically designed for it generally cannot work directly in shared memory. The shared-memory link therefore does not let the input/output parameter slots share a parameter object directly, as an inter-thread link does; instead it adopts serialization, storing in shared memory the byte stream obtained by serializing the parameter object.
The shared-memory link largely carries over the structure of the inter-thread link: its main internal structure is again a circular queue. The difference is that the queue no longer holds the non-local part of the parameter object, but the byte sequence obtained by applying the serialization operation specified in the TypeSpec to the whole parameter object.
The structure of the shared-memory link proposed in the present invention is shown in Figure 14 and comprises the following components:
Control block: a piece of shared memory storing the link's meta-information, namely:
– a 32-bit unsigned integer recording the length of the type ID transmitted over the link;
– a 32-bit unsigned integer recording the capacity of the link's circular queue, i.e. the number of data buffers;
– a 32-bit unsigned integer recording the size of each data buffer, or zero if the data type's serialized length is variable;
– a character string recording the type ID transmitted over the link.
Data blocks: a group of shared-memory segments, each representing one buffer of the circular queue. The first four bytes of each buffer give the number of remaining bytes in that buffer. For data types with a fixed serialized length, this value equals the buffer size recorded in the control block; otherwise it is the number of bytes produced by that serialization.
Semaphores: each shared-memory link comprises three semaphores. Two record the current numbers of free and used data blocks; the third is used during the initialization handshake and is deleted once link setup completes.
Shared-memory links are implemented in Render-Streamers by the ProcessLink class.
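The data-block layout described above (a 4-byte length prefix followed by the serialized payload) can be sketched as follows. A plain heap buffer stands in for the POSIX shared-memory segment, and a string stands in for a serialized parameter object; the helper names are assumptions for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Write one serialized parameter into a data-block buffer.
// Layout: [uint32 payload length][payload bytes...]
void writeBlock(std::vector<unsigned char>& block, const std::string& payload) {
    uint32_t len = static_cast<uint32_t>(payload.size());
    block.resize(4 + len);
    std::memcpy(block.data(), &len, 4);                  // length prefix
    std::memcpy(block.data() + 4, payload.data(), len);  // serialized bytes
}

// Read the serialized parameter back out of a data-block buffer.
std::string readBlock(const std::vector<unsigned char>& block) {
    uint32_t len = 0;
    std::memcpy(&len, block.data(), 4);
    return std::string(reinterpret_cast<const char*>(block.data()) + 4, len);
}
```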
3.2.1 Handshake procedure
A shared-memory link is identified by a name represented as a string; on the same machine, two identical names always refer to the same shared-memory link.
Under the POSIX inter-process communication (IPC) system, shared memory and semaphores both have file-like properties: they are accessed by name and support operations such as create, delete and open. Render-Streamers adopts the convention that, given the name of a shared-memory link, the corresponding POSIX IPC resources are named as follows.
Name        Type           Purpose
name.init   semaphore      handshake semaphore
name.empty  semaphore      count of free data blocks
name.full   semaphore      count of used data blocks
name.ctrl   shared memory  control block
name.buf.i  shared memory  the i-th data block
The table above lists the POSIX IPC resources used by a Render-Streamers shared-memory data link.
The handshake procedure of a Render-Streamers shared-memory link is as follows:
1. Both handshake parties attempt to create the handshake semaphore name.init. Owing to the atomicity of file-system operations, one party succeeds and the other fails. The successful party (denoted A) goes on to create the other resources; the failed party (denoted B) reopens the handshake semaphore in non-creating mode and waits on it.
2. A creates the other IPC resources listed in Table 4.1 and initializes their values according to the supplied type information, circular-queue size and related parameters.
3. After initialization completes, A performs a post operation on the handshake semaphore name.init to wake B, and then waits on the name.full semaphore (a reuse trick adopted so that one fewer semaphore needs to be allocated during the handshake).
4. Once woken, B opens these IPC resources and compares the values A filled in against its own local information. On any inconsistency it reports an error and the handshake fails; otherwise the handshake succeeds and B performs a post operation on name.full.
5. After being woken by B, A deletes name.init, and the handshake ends.
3.2.2 Transmission procedure
Since the same circular-queue mechanism is used, the parameter transmission process of a shared-memory link closely resembles that of an inter-thread link, with only the following differences:
– The circular queue stores serialized parameters rather than the non-local parts of parameter objects. The input and output slots therefore each own a separate parameter object, instead of parameter-object pointers pointing directly into the queue as in an inter-thread link.
– The push and fetch operations use the TypeSpec's serialize/deserialize operations rather than context_out/context_in.
– Synchronization uses POSIX IPC semaphores directly rather than the condition variables provided by C++11.
3.3 Socket links
A socket link connects two computing units located on two different physical machines and communicates over TCP. Socket links are the highest-level data links in Render-Streamers.
Like the other link types, a socket link connects an output parameter slot of one computing unit to an input parameter slot of another. A socket link is uniquely identified by a (target IP, target port, link ID) triple, where the link ID is a 32-bit unsigned number. In general a machine in the cluster only needs to listen on a single port, so from the framework user's point of view the socket links connected to one machine are usually identified by link ID alone. Socket links are implemented in Render-Streamers by the SocketLinkInput and SocketLinkOutput classes.
3.3.1 Handshake procedure
Unlike shared memory, the two parties of a TCP handshake are not symmetric, so the connection cannot be initiated freely by either party at link initialization the way a shared-memory link allows. Moreover, Render-Streamers is designed as a decentralized architecture that imposes no unified management on the machines in the cluster, so one side may attempt to connect before the other side has even started listening on its port. Render-Streamers uses two strategies to solve these problems: (1) the link's output unit is fixed as the listening side and its input unit as the connecting side; (2) the input unit keeps retrying the connection until it succeeds.
The socket-link protocol in Render-Streamers is fairly simple. After successfully connecting to the output unit, the link's input unit sends a LinkHeader structure containing the link ID, the length of the type name, the serialized data length and related information. Apart from the link ID, which is specific to socket links, this information is identical to the corresponding fields of the shared-memory link's control block. On receiving this structure, the output unit uses the link ID to locate the locally declared link information and compares the two; if everything matches it sends status code 0 back to the input unit, otherwise it sends the corresponding error code and aborts the handshake.
The input unit then sends the parameter's type ID; the output unit compares the types and sends back either status code 0 (handshake succeeded) or an error code (handshake aborted).
3.3.2 Transmission procedure
Once the handshake completes, the link's input and output units can transmit any number of parameter objects (one per execution cycle). To send a parameter object, the input unit first sends a "data description header" (DataHeader) structure containing the number of bytes of the serialized parameter object. As with shared-memory links, if the parameter type serializes to a fixed length this byte count is zero and the length declared during the handshake is used; if the serialized length is variable, this value gives the actual transmission length. The output unit verifies the header and replies with a status code or an error code. The input unit then sends the serialized data, and once the output unit has received all of it, it sends status code 0 as confirmation. At this point the transmission of the next parameter object can begin. The detailed layouts of the LinkHeader and DataHeader structures are described elsewhere in this specification.
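The per-parameter framing described above can be sketched as follows: a DataHeader carrying the serialized byte count, followed by the payload. The text does not give the exact field layout of the real DataHeader, so a single 32-bit length field is assumed, and the frame is built in a byte vector rather than written to a real TCP socket.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

struct DataHeader {
    uint32_t serializedLength;  // 0 => fixed-length type; use handshake length
};

// Frame one serialized parameter for transmission.
std::vector<unsigned char> frame(const std::string& payload) {
    DataHeader h{static_cast<uint32_t>(payload.size())};
    std::vector<unsigned char> out(sizeof h + payload.size());
    std::memcpy(out.data(), &h, sizeof h);
    std::memcpy(out.data() + sizeof h, payload.data(), payload.size());
    return out;
}

// Parse a frame back into header + payload on the receiving side.
std::pair<DataHeader, std::string> unframe(const std::vector<unsigned char>& in) {
    DataHeader h{};
    std::memcpy(&h, in.data(), sizeof h);
    std::string payload(reinterpret_cast<const char*>(in.data()) + sizeof h,
                        h.serializedLength);
    return {h, payload};
}
```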
3.3.3 Socket link implementation details
Although the socket-link protocol is simple, integrating it into the Render-Streamers data-link framework is not trivial. The main problem is again performance. Typical socket implementations provide only transport-layer flow control and cannot buffer data according to application-layer data characteristics. If the transport-layer buffer is set too small, send/receive operations easily become a synchronization point between the two parties, causing pointless waits for data when the network is not fast; if an oversized buffer is used for asynchronous transmission, the input unit and output unit easily lose synchronization entirely.
Render-Streamers' solution is to use dedicated threads for sending and receiving data, achieving asynchronous transmission, and to place a circular queue between the transceiver threads and the data producer/consumer threads. The queue stores serialized parameter objects, and the asynchronous buffer size is controlled by configuring how many parameters the queue can hold. This realizes application-layer flow control, bounding the maximum latency of asynchronous transmission. In the extreme case the queue size can be set to 1, in which case each parameter object is still transmitted asynchronously but without any buffering effect.
4. Plugin system
4.1 Introduction to the plugin system
The plugin system is the pillar of Render-Streamers' extensibility. By defining plugins, developers using Render-Streamers can add new computing units and data types to the system.
In Render-Streamers each plugin is a dynamic link library, which should contain the following:
– all data types the plugin provides;
– the TypeSpec subclasses corresponding to those data types;
– all computing units the plugin provides;
– the generators corresponding to those computing units;
– an object implementing the Plugin interface (Plugin is explained in detail below).
So that Render-Streamers can access the contents of a plugin, several global symbols must be defined in the plugin's dynamic link library; their names and types are listed below. After loading a plugin's dynamic link library, Render-Streamers first looks up these symbols to obtain the plugin's information before proceeding with the next stage of initialization.
Name                     Type                    Meaning
glstreamer_plugin_name   char const*             plugin name
glstreamer_plugin_major  int                     plugin major version number
glstreamer_plugin_minor  int                     plugin minor version number
getPlugin                glstreamer::Plugin* ()  function returning the plugin object
The table above lists the symbols a Render-Streamers plugin must define.
The most important of these is the getPlugin function. It returns an object implementing the Plugin interface, which comprises three operations:
– void init(): the plugin completes its own initialization in this function.
– void registerTypes(): the plugin calls the registerTypeSpec or registerType method of the TypeManager class in this function, completing the registration of its data types.
– void registerProcessors(): the plugin calls the registerProcessor method of the ProcessorManager class in this function, completing the registration of its computing units.
After successfully finding the plugin's getPlugin function and obtaining the object implementing the Plugin interface, Render-Streamers calls these three functions in the order listed above, at which point the plugin loading process ends. Besides adding data types and computing units to the Render-Streamers framework via registerTypes and registerProcessors, a plugin can also provide computing-unit templates. A computing-unit template is a C++ class template that must be given type arguments before it can be instantiated into a computing-unit class. Because one template may correspond to many computing-unit classes depending on its type arguments, templates cannot be registered through registerProcessors; developers can only use a plugin's computing-unit templates by including the header files the plugin provides.
4.2 Built-in Render-Streamers plugins
4.2.1 Core plugin
This plugin provides the TypeSpec objects corresponding to the basic arithmetic types in C++, mainly the various integer and floating-point types. It also provides the TypeSpecs for the std::vector types of these basic arithmetic types and for the std::string type.
The computing units (and templates) this plugin provides include:
– ConstProcessor: a computing-unit template that provides constant output.
– Dispatcher: a computing-unit template that copies its input parameter to multiple outputs. The parameter may not carry a local part.
– FakeSink: a computing-unit template that provides any number of input slots and performs no operation.
– VariableProcessor: a computing-unit template that maintains an internal variable and can have multiple outputs. The parameter may not carry a local part.
4.2.2 Graphics-rendering-layer plugin
This plugin provides the data types and computation/rendering operations commonly used in parallel rendering at the graphics layer, as represented by OpenGL. The data types it provides are:
– GLFrameData: a data-type template storing an image of a specific pixel type, with two instances, GLFrameData<RGBAFrame> and GLFrameData<DepthFrame>. Parameters of this type contain a local part, whose type is GLTextureData.
– GLViewport: the viewport and projection-plane information of a scene, comprising the position, width and height of the viewport in the final display, the positions of the four edges (top, bottom, left, right) of the projection plane, and the positions of the near and far clipping planes.
– GLDataRange: a range consisting of two double values, commonly used in sort-last rendering mode to denote the data range to be drawn.
– GLObjectState: the model-view transformation information of a scene, comprising the scaling factor, the rotation angles about the three coordinate axes, and the translation of the scene relative to the origin.
– GLMatrix4x4: a 4x4 float matrix, commonly used to hold homogeneous coordinate transformation matrices in OpenGL.
The computing units provided by this plugin are more numerous.
Embodiment 1
Application examples of Render-Streamers in parallel rendering and visualization
Sort-last rendering embodiment
Figure 15 shows a Render-Streamers pipeline structure that renders a single 3D model in parallel using sort-last mode. Boxes represent computing units; solid dots represent single data links, pointing from the link's input node to its output node; solid arrows with feathered tails represent multiple data links; and hollow triangular arrows point at data links to annotate the parameter type carried by that link.
This data flow graph contains three computing units with in-degree zero, which respectively provide the viewport/projection information, data-range information and model-view transformation information of the whole scene. The model-view transformation information is provided by a VariableProcessor, so object coordinates can change over time; the other two pieces of information are provided by ConstProcessors and do not change. The data-range information produced by the ConstProcessor<GLDataRange> unit (normally [0, 1)) is split into two equal intervals by a GLDataRangeSplitter unit and supplied to two GLObjectRenderer units. The viewport/projection information and model-view transformation information are supplied to these two units directly, without splitting.
Each GLObjectRenderer unit produces three kinds of information: an RGBA image, a depth image and the viewport/projection information corresponding to those two images. All three are fed into a GLFrameComposer unit, which combines the RGBA and depth images according to the viewport and depth information; the resulting RGBA image is displayed on screen by a GLFrameDisplayer, while the depth image is discarded by a FakeSink. GLFrameComposer also needs an input parameter determining the viewport size of the final composite image, which is provided by a ConstProcessor<GLViewport>.
From left to right, the first and second windows show the outputs of the two GLObjectRenderers, and the third window shows the output of the GLFrameDisplayer. The three windows belong to different threads of the same process: the two GLObjectRenderer units reside in the threads owning the first and second windows, while all remaining units in the pipeline reside in the thread owning the third window.
In the figure, all white computing units execute in the same thread, and the two units marked grey execute in two independent threads.
Embodiment 2
Sort-first rendering embodiment
Figure 16 shows a pipeline implementing sort-first rendering, comprising two GLObjectRenderer units. All white computing units in the figure execute in the same thread; the grey-marked units, in two groups, execute in two independent threads and are delimited in the figure by dashed boxes.
The structure of this pipeline is essentially identical to the sort-last pipeline of the previous section: parameters are produced by two ConstProcessors and one VariableProcessor, and a GLFrameComposer merges the rendering results of the two GLObjectRenderers. To suit the needs of sort-first rendering, however, the two pipelines differ in a few respects:
1. The viewport/projection information is split by a GLScreenSplitter before being passed to the two GLObjectRenderers.
2. The GLDataRange is passed to the GLObjectRenderers directly, without splitting.
3. Sort-first image stitching needs no depth buffer, so the depth images output by the GLObjectRenderers are sent directly to FakeSinks and discarded; only the viewport/projection information and RGBA images are passed to the GLFrameComposer.
The two GLObjectRenderer units reside in the threads owning the two windows on the left; each FakeSink that receives a GLObjectRenderer's depth-image output resides in the same thread as that GLObjectRenderer. All other units reside in the thread owning the window on the right.
Embodiment 3
Use Render-Streamers to realize the parallelization of multipass rendering algorithm
The streamline feature of Render-Streamers self makes it be good at very much describing multipass rendering algorithm.Below the rendering algorithm of just take based on echo be example, show the realization of rendering algorithm in Render-Streamers that comprises a plurality of plot step.
The Render-Streamers pipeline implementing the shadow mapping algorithm is shown in Figure 17; the symbols in the figure have the same meanings as in Figure 16. The pipeline contains two groups of "GLViewport/GLObjectState" parameters, one group for the light-source viewpoint and
the other for the observer viewpoint. The light-source parameters are output to the GLShadowMapGenerator and GLShadowCoordCalc computing units, and the observer parameters are output to the GLSimpleShadowRenderer computing unit. GLShadowCoordCalc and GLShadowMapGenerator generate, respectively, the model-view-projection transformation matrix and the depth map (that is, the shadow map) of the light-source viewpoint, and both results are likewise output to GLSimpleShadowRenderer. GLSimpleShadowRenderer uses this information to render the shadowed scene from the observer viewpoint, and the rendered result is displayed on screen by GLFrameDisplayer.
This pipeline does not adopt data-parallel rendering methods such as sort-first or sort-last, so the GLDataRange parameters output to GLShadowMapGenerator and GLSimpleShadowRenderer both cover the complete data range [0, 1), and the GLViewport parameter is not split by GLScreenSplitter either. The pipeline is implemented with two threads in the same process: the GLShadowMapGenerator computing unit and the FakeSink connected to it are placed in one thread, and all the other computing units are placed in the other thread.
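As a sketch, the Figure 17 wiring can be written down as an adjacency list and executed in topological order. The parameter-source labels LightViewportState/ObserverViewportState are illustrative names for the two GLViewport/GLObjectState groups, not framework identifiers:

```python
# Edges of the Figure-17 shadow-mapping pipeline: unit -> downstream units.
PIPELINE = {
    "LightViewportState":     ["GLShadowMapGenerator", "GLShadowCoordCalc"],
    "ObserverViewportState":  ["GLSimpleShadowRenderer"],
    "GLShadowCoordCalc":      ["GLSimpleShadowRenderer"],  # light MVP matrix
    "GLShadowMapGenerator":   ["GLSimpleShadowRenderer", "FakeSink"],  # shadow map
    "GLSimpleShadowRenderer": ["GLFrameDisplayer"],        # shaded scene
    "GLFrameDisplayer":       [],
    "FakeSink":               [],
}

def topological_order(graph):
    """Kahn's algorithm: produces a valid execution order for units sharing
    one thread (claim 2 requires topological order within a thread)."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    ready = [u for u in graph if indegree[u] == 0]
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order
```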

Claims (13)

1. A data-flow-graph-driven three-dimensional graphics parallel rendering and visualization method for a GPU cluster comprising at least one computer equipped with a GPU, each computer serving as a node of the GPU cluster, the nodes being connected by a network, each node running at least one process and each process running at least one thread, the method comprising the following steps:
1) placing at least one computing unit in each thread to perform a computation task or a rendering task;
2) connecting the computing units by data links over which data is pushed (push) and/or fetched (fetch); within the same thread, the computing units execute according to the data dependences between them; in different threads, the computing units execute in parallel under a synchronization mechanism; the data links comprise four kinds: intra-thread links, inter-thread links, shared-memory links and socket links;
3) decomposing the rendering method according to its parallel rendering mode and corresponding flow, and constructing the parallel data flow graph corresponding to the rendering method;
4) extracting the rendering-related parameters of the three-dimensional model or scene, and executing the decomposed tasks in parallel through the computing units and data links of the data flow graph to obtain image units;
5) combining the image units in the form determined by the parallel rendering mode to obtain the complete visualization image.
2. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 1, characterized in that the intra-thread link is designed as follows:
the computing units within one thread execute in topological order;
all computing units inside a thread share the rendering environment context, so the input slot and output slot of an intra-thread link can share the local part of a parameter.
3. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 1, characterized in that:
the threads connected by an inter-thread link execute in parallel with no mutual ordering, and an inter-thread synchronization operation guarantees that the output unit consumes data only after the input unit has produced it;
the inter-thread link transmits data through a circular queue: a fixed-size circular queue is allocated when the link is initialized, and this circular queue uses two semaphores to record the free space and the used space in the queue, implementing the thread synchronization between the input unit and the output unit;
the circular queue acts as a double buffer between the computing units: even if the output unit has not yet begun to consume the data just produced by the input unit, the input unit can enter its next execution cycle, thereby increasing the concurrency of thread execution.
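A minimal sketch of this two-semaphore circular queue, for illustration only (Python threads stand in for the rendering threads; the real link transfers parameter objects):

```python
import threading
from collections import deque

class InterThreadLink:
    """Fixed-size circular queue guarded by two semaphores, as in claim 3:
    `free_slots` records the free space, `used_slots` the used space."""
    def __init__(self, capacity=4):
        self.buf = deque()
        self.lock = threading.Lock()
        self.free_slots = threading.Semaphore(capacity)  # free space in queue
        self.used_slots = threading.Semaphore(0)         # used space in queue

    def push(self, param):          # called by the input (producer) unit
        self.free_slots.acquire()   # block only when the queue is full
        with self.lock:
            self.buf.append(param)
        self.used_slots.release()   # signal the consumer

    def fetch(self):                # called by the output (consumer) unit
        self.used_slots.acquire()   # block until data has been produced
        with self.lock:
            param = self.buf.popleft()
        self.free_slots.release()   # hand the slot back to the producer
        return param
```

Because push blocks only when the queue is full, the producer can run ahead of the consumer by up to the queue capacity, which is the double-buffering effect described above.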
4. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 1, characterized in that the shared-memory link connects two computing units located in two different processes on the same machine; it adopts a serialization approach, storing in shared memory the byte stream obtained by serializing the parameter object.
5. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 4, characterized in that the shared-memory link consists of three parts: a control block, a data block and semaphores; the main structure inside the data block is a circular queue of buffers, and the content stored in this queue is the byte sequence obtained by serializing the whole parameter object with the serialization operation specified in its type definition TypeSpec.
6. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 4 or 5, characterized in that a shared-memory link is identified by a name represented as a string; on the same machine, two identical names always refer to the same shared-memory link; given the name of a shared-memory link, its handshake procedure is:
1) both handshake parties attempt to create the handshake semaphore name.init; owing to the atomicity of the file-system operation, one party succeeds and the other fails; the successful party A proceeds to create the other resources, while the failed party B reopens the handshake semaphore in non-creating mode and waits on it;
2) A creates the other IPC resources and initializes their values according to the supplied type information, the circular queue size and related information;
3) after initialization completes, A performs a post operation on the handshake semaphore name.init to wake up B, then waits on the name.full semaphore and the data-block count semaphore;
4) after being woken up, B opens these IPC resources and compares the values filled in by A with the information obtained locally; if any inconsistency is found, an error is reported and the handshake fails; otherwise the handshake succeeds and B performs a post operation on name.full;
5) after being woken up by B, A deletes name.init and the handshake ends.
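The atomic-create race of step 1) can be sketched as follows, for illustration only; a lock file stands in for the named semaphore name.init, since the point is merely that exactly one party wins the create:

```python
import os

def try_create_handshake(path):
    """Step 1 of the claim-6 handshake: both parties attempt an exclusive,
    atomic create; exactly one succeeds.  Returns 'A' for the winner (who
    goes on to create and initialize the other IPC resources) and 'B' for
    the loser (who reopens the existing resources and waits)."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return "A"
    except FileExistsError:
        return "B"
```

The O_CREAT | O_EXCL combination is what makes the race safe: the operating system guarantees that only one concurrent caller can create the file.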
7. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 6, characterized in that the transmission procedure of the shared-memory link differs from that of the inter-thread link in that:
1) the circular queue stores serialized parameters rather than the non-local part of the parameter object, so the input slot and the output slot each own a separate parameter object, unlike the inter-thread link, whose parameter object pointers point directly into the queue;
2) the push and fetch operations use the serialization/deserialization operations of the type definition rather than the context_out or context_in operations.
8. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 1, characterized in that the socket link connects two computing units located on two physical machines and communicates using the TCP protocol; the socket link is the highest-level data link, connecting the output parameter slot of one computing unit to the input parameter slot of another computing unit; a socket link is uniquely identified by a (target IP, target port, link ID) triple.
9. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 1 or 8, characterized in that the handshake procedure of the socket link is:
1) the output unit of the link is fixed as the listening side and the input unit as the connecting side;
2) when connecting, the input unit keeps retrying until the connection succeeds.
10. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization method of claim 1 or 9, characterized in that the transmission procedure of the socket link is:
1) after the handshake, the input unit and the output unit of the link can transmit any number of parameter objects;
2) when the input unit needs to send a parameter object, it first sends a data-description DataHeader structure, which contains the number of bytes of the serialized parameter object; if the parameter type has fixed-length serialization, this byte count is zero and the length declared during the handshake is used; if the parameter type has variable-length serialization, this value is used as the actual transmission length;
3) the output unit confirms the description is correct and replies with a status code or an error code; the input unit then sends the serialized data, and after receiving all the data the output unit sends status code 0 as confirmation;
4) independent threads perform the data send/receive operations to achieve asynchronous transmission; a circular queue is placed between the transceiver thread and the data producing/consuming thread, storing the serialized parameter objects; the asynchronous buffer size is controlled by setting the number of parameters the circular queue can hold.
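The header-then-payload exchange of steps 2)–3) can be sketched over a plain TCP stream, for illustration only. HEADER is a hypothetical four-byte stand-in for the DataHeader structure, and the per-header acknowledgment of step 3 is folded into the single final status byte:

```python
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical DataHeader: just the byte count

def _recv_exactly(sock, n):
    """Read exactly n bytes (TCP may deliver fewer bytes per recv call)."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the socket link")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_param(sock, payload):
    """Input-unit side: send the header, then the serialized parameter."""
    sock.sendall(HEADER.pack(len(payload)))
    sock.sendall(payload)

def recv_param(sock):
    """Output-unit side: read the header, the payload, then reply status 0."""
    size = HEADER.unpack(_recv_exactly(sock, HEADER.size))[0]
    data = _recv_exactly(sock, size)
    sock.sendall(b"\x00")  # status code 0: all data received
    return data

def recv_status(sock):
    """Input-unit side: read the receiver's confirming status code."""
    return _recv_exactly(sock, 1)[0]
```

socket.socketpair() can exercise the round trip in a single process without a listening socket.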
11. A data-flow-graph-based three-dimensional graphics parallel rendering and visualization system, characterized in that the Render-Streamers framework comprises: a data flow graph for decomposing rendering and/or computation tasks; a pipeline for data-flow-graph-based parallel programming and parallel execution; a rendering resource module for internal use by the system; a type management module; a computing unit management module; and data links connected to the computing units through a parameter-passing interface;
the data flow graph decomposes a parallel rendering and/or visualization algorithm over a three-dimensional scene into rendering and/or computation tasks carried out by computing units; the computing units are connected to data links through input/output parameter slots, the output of a computing unit being the input of a data link and the input of a computing unit being the output of a data link; an output parameter slot performs the push operation on a data link, and an input parameter slot performs the fetch operation on a data link;
the pipeline executes the data flow graph in execution cycles, the elementary unit of each execution cycle being one execution of a computing unit; the execution order of the computing units follows a hierarchical structure: inside a thread block, the order between computing units is arranged according to data dependence, and between thread blocks, synchronization operations are performed in the inter-thread data links;
the rendering resource module is based on a standard 3D graphics API; by dividing a parameter object into a non-local part and a local part, resources are shared across threads when referenced; the basic resource units of the rendering resource module are the resources provided by the standard 3D graphics API;
the type management module comprises the types built into and/or provided by the system together with developer-defined types inserted into the system as plug-ins, all of which work together;
the computing unit management module allows developers to define custom computing units and to reference and instantiate a computing unit by name;
in the parameter-passing interface, each computing unit has one input parameter block and one output parameter block; each parameter block comprises four slots, and each slot is divided into two parts, SimpleSlot and FullSlot;
the common interface describing a data link is the Link class, which comprises exactly two operations, push and fetch; besides transmitting data, a data link is responsible for generating and managing the non-local parameters in its parameter slots; the data links likewise comprise four kinds: intra-thread links, inter-thread links, shared-memory links and socket links;
each node of the data flow graph is a relatively independent computing unit that communicates with other computing units only through the data links connected to its input/output parameter slots; on each run, a computing unit requests input data from all its input links and, after computing its output, passes the output to the data links connected to all its output slots.
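The run model of the last paragraph — fetch from every input link, compute, push to every output link — can be sketched as follows, for illustration only (the class names are illustrative; the real Link class also manages non-local parameters):

```python
from collections import deque

class IntraThreadLink:
    """Minimal Link exposing the two claimed operations, push and fetch."""
    def __init__(self):
        self._queue = deque()

    def push(self, param):
        self._queue.append(param)

    def fetch(self):
        return self._queue.popleft()

class ComputingUnit:
    """One node of the data flow graph: it communicates only through the
    data links attached to its input/output parameter slots."""
    def __init__(self, compute, input_links, output_links):
        self.compute = compute
        self.input_links = input_links
        self.output_links = output_links

    def run_once(self):
        inputs = [link.fetch() for link in self.input_links]  # request all inputs
        result = self.compute(*inputs)                        # computation/rendering task
        for link in self.output_links:                        # pass output onward
            link.push(result)
```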
12. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization system of claim 11, characterized by further comprising a plug-in system for extension; each plug-in in the plug-in system is a dynamic link library comprising: all the data types provided by the plug-in, the TypeSpec subclasses corresponding to those data types, all the computing units provided by the plug-in, the generators corresponding to those computing units, and the objects implementing the plug-in interface.
13. The data-flow-graph-based three-dimensional graphics parallel rendering and visualization system of claim 11, characterized in that the system further comprises a built-in plug-in and a graphics rendering layer plug-in; the built-in plug-in provides the TypeSpec objects corresponding to the basic arithmetic types, and the graphics rendering layer plug-in provides the data types and computation/rendering operations commonly used in OpenGL and parallel rendering.
CN201310659788.2A 2013-12-09 2013-12-09 Parallel rendering and visualization method and system based on data flow diagram Active CN103679789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310659788.2A CN103679789B (en) 2013-12-09 2013-12-09 Parallel rendering and visualization method and system based on data flow diagram

Publications (2)

Publication Number Publication Date
CN103679789A true CN103679789A (en) 2014-03-26
CN103679789B CN103679789B (en) 2017-01-18

Family

ID=50317231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310659788.2A Active CN103679789B (en) 2013-12-09 2013-12-09 Parallel rendering and visualization method and system based on data flow diagram

Country Status (1)

Country Link
CN (1) CN103679789B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835110A (en) * 2015-04-15 2015-08-12 华中科技大学 Asynchronous graphic data processing system based on GPU
CN106462938A (en) * 2014-06-26 2017-02-22 英特尔公司 Efficient hardware mechanism to ensure shared resource data coherency across draw calls
CN109343984A (en) * 2018-10-19 2019-02-15 珠海金山网络游戏科技有限公司 Data processing method, calculates equipment and storage medium at system
CN110223361A (en) * 2019-05-10 2019-09-10 杭州安恒信息技术股份有限公司 The method for realizing fly line effect based on web front-end technology
CN111539518A (en) * 2017-04-24 2020-08-14 英特尔公司 Computational optimization mechanism for deep neural networks
CN112162737A (en) * 2020-10-13 2021-01-01 深圳晶泰科技有限公司 Universal description language data system of directed acyclic graph automatic task flow
CN112799603A (en) * 2021-03-02 2021-05-14 王希敏 Task behavior model for multiple data stream driven signal processing system
CN112862245A (en) * 2020-12-30 2021-05-28 北京知因智慧科技有限公司 Data exchange method and device and electronic equipment
CN113253965A (en) * 2021-06-25 2021-08-13 中国空气动力研究与发展中心计算空气动力研究所 Mass data multi-view-port visual interaction method, system, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101527031A (en) * 2008-08-20 2009-09-09 深圳先进技术研究院 Ray-projection polynuclear parallel body drawing method
US20130155080A1 (en) * 2011-12-15 2013-06-20 Qualcomm Incorporated Graphics processing unit with command processor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
STEFAN EILEMANN et al.: "Equalizer: A Scalable Parallel Rendering Framework", IEEE Transactions on Visualization and Computer Graphics *
LIU Zhen et al.: "Survey of parallel graphics rendering systems based on PC clusters", Journal of System Simulation *
MA Zhigang et al.: "Streaming transmission of 3D terrain scenes", Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition) *

Also Published As

Publication number Publication date
CN103679789B (en) 2017-01-18

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200722

Address after: 830-3, 8 / F, No. 8, Sijiqing Road, Haidian District, Beijing 100195

Patentee after: Beijing weishiwei Information Technology Co.,Ltd.

Address before: Peking University, No. 5 Summer Palace Road, Haidian District, Beijing 100871

Patentee before: Peking University

TR01 Transfer of patent right