CN103559357B - FPGA chip for 3D graphics rendering acceleration - Google Patents

Publication number
CN103559357B
CN103559357B CN201310560232.8A CN201310560232A CN103559357B CN 103559357 B CN103559357 B CN 103559357B CN 201310560232 A CN201310560232 A CN 201310560232A CN 103559357 B CN103559357 B CN 103559357B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310560232.8A
Other languages
Chinese (zh)
Other versions
CN103559357A (en
Inventor
陈陵都
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Hua Leiyi Microelectronics Co., Ltd.
Original Assignee
Nanjing Hua Leiyi Microelectronics Co Ltd
Application filed by Nanjing Hua Leiyi Microelectronics Co Ltd
Priority to CN201310560232.8A
Publication of CN103559357A
Application granted
Publication of CN103559357B

Abstract

The invention discloses an FPGA chip for 3D graphics rendering acceleration. The 3D FPGA chip is formed by stacking an upper reconfigurable-layer circuit on a lower logic-layer circuit so that the two form a single integrated body. In data-flow order from input to output, the 3D FPGA chip comprises a PCIe interface module, a 3D loading module, a spatial bisection module, an initial rendering module, a 3D rendering module, and a display module. Compared with ray-tracing rendering based on fixed ASIC logic, the invention offers the high flexibility that comes with programmability, which an ASIC lacks. Compared with mainstream FPGA chips, the FPGA chip of the invention provides an increase in chip logic density on the order of 10^2 in 3D graphics rendering acceleration, and compared with today's common 3D graphics rendering based on CPU or GPU chips it provides a speedup on the order of 10^4.

Description

An FPGA chip for 3D graphics rendering acceleration
Technical field
The present invention relates to the field of 3D graphics rendering, and in particular to a field-programmable gate array (Field-Programmable Gate Array, FPGA) chip for 3D graphics rendering acceleration.
Background art
3D graphics rendering is the most demanding technology in today's 3D computer graphics (3D Computer Graphics) applications. Its highest requirement is to produce realistic visual images in real-time applications. Toward this goal, 3D graphics rendering has so far been implemented in three ways: 1. 3D graphics rendering based on a CPU or a graphics processing unit (Graphic Processing Unit, GPU). 2. 3D graphics rendering based on the fixed logic of an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC) chip. 3. 3D graphics rendering based on the reconfigurable logic of an FPGA chip.
In many large-scale 3D applications, the most time-consuming task is expressing the visual information of a 3D image on a 2D plane. This is especially true when tools for 3D industrial design, 3D animation, or 3D film visual-effects editing are used on large, complex designs (such as aircraft or automobile models), where the ability to quickly complete cutting planes (Cutting Plane), model interrogation (Model Interrogation), and sophisticated shading (Sophisticated Shading) is a basic and critical operation. These different 3D graphics applications share one technical requirement: high-speed 3D digital signal processing. According to Chapter 7, Section 5 ("Render farms") of the book "Blender Authoritative Guide" by Luo Congyi, the well-known film visual-effects studio Weta Digital, while making the film "2012" (note: a 2009 American science-fiction disaster film), spent an average of 20 hours rendering a single frame, and the earthquake scene alone contained more than 7,000 effects frames, so the total render time was about 141,120 hours.
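As a quick sanity check on the figures quoted above, the following sketch derives the frame count implied by the stated total and converts the total into single-machine wall-clock time. The variable names are illustrative; only the two input numbers come from the text.

```python
# Figures quoted in the paragraph above.
hours_per_frame = 20
total_hours = 141_120

frames = total_hours / hours_per_frame          # frame count implied by the total
days_single_machine = total_hours / 24          # wall-clock days on one machine
years_single_machine = days_single_machine / 365

print(f"implied frames: {frames:.0f}")
print(f"single-machine days: {days_single_machine:.0f}")
print(f"single-machine years: {years_single_machine:.1f}")
```

The implied frame count is consistent with "more than 7,000 frames", and the single-machine figure shows why such work is spread across a render farm.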
According to Wikipedia, the 3D graphics tools in today's 3D application market are almost entirely rasterization-based systems (Rasterization-Based System). The software part of such a system consists of the library functions of the OpenGL API (www.opengl.org) and their applications; the hardware part consists entirely of CPUs, GPUs, and a small number of dedicated custom chips. Since the 1970s, with the rapid development of CPU chip design (led by Intel), GPU chip design (led by Nvidia), and ultra-deep submicron chip manufacturing (Ultra-Deep Submicron Processing Technology), the quality and operating speed of rasterization-based 3D graphics tools have improved greatly. The advantages of a rasterization-based 3D graphics system are a short product development cycle and high flexibility. However, because its operation is essentially serial at the instruction level, its speed gains have fallen far behind the progress of ultra-deep submicron processes under Moore's Law (chip logic density doubles every 18 months as process technology advances). To date, after 40 years of effort, the speed of rasterization-based 3D graphics applications still falls short of real-time requirements. In the last 10 years, some rasterization systems that combine multi-core CPUs with custom hardware have reported video rates slightly above 24 frames per second, but their development cycles are long and their costs are high.
The graphics design of a rasterization system is a painter-style drawing technique directed by the designer (Painter's Algorithm). It expresses human 3D vision as a drawing on a 2D canvas by artificial simulation; it is not a simulation of real optical physics.
Since the 1970s, a method based on physical optics, ray tracing, has emerged. It attempts to obtain physically realistic visual effects by computing the physical phenomena of reflection, refraction, perspective, and shadow with the ray methods of geometric optics. Because ray tracing essentially avoids tracing invisible parts, its computational cost is a logarithmic function of object complexity, whereas the cost of rasterization is a linear function of scene complexity; ray tracing therefore has a clear advantage in 3D graphics applications with highly complex scenes. Its computational load is nevertheless a heavy burden at the running speed of today's computer software.
Since 2001, the computer graphics group led by Professor Philipp Slusallek at Saarland University (Saarland University) in Germany has studied 3D graphics modeling and rendering on a parallel hardware computing architecture based on ray tracing. At the IEEE Symposium on Interactive Ray Tracing held in Utah, USA, in September 2006, the group announced a real-time ray-tracing ASIC chip design that can reach 200 high-quality frames per second. According to the German automaker Volkswagen, this chip technology gives the automotive industry a new visual data manager for car models that avoids costly and time-consuming physical prototypes, reducing the industry's design cost and launch cycle by 30%.
To summarize the strengths and weaknesses of the two kinds of 3D graphics application systems, pure software and pure hardware, in modeling, rendering performance, and cost: a pure-hardware ASIC chip can produce pictures of far higher quality and performance than the pure-software CPU or GPU approach, but its cost is very high and its flexibility is far inferior. It cannot provide the kind of programmable, adjustable solution that CPU or GPU software offers for the differing requirements of various 3D graphics applications.
The invention focuses on combining the respective advantages of these two 3D graphics application solutions while avoiding their respective drawbacks. More specifically, it retains the programmable mode of the pure-software approach and thereby inherits its flexibility and adjustability, and at the same time provides embedded ASIC modules with high-speed operating performance to deliver the hardware speed that 3D graphics applications require. The two are interleaved in time and space in the design that executes 3D application tasks, and they work together in harmony. The manner in which they coexist and cooperate, and the form each takes in the software design domain and the hardware design domain, are determined by the cost-performance optimum of the 3D graphics application product.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main object of the present invention is to provide an FPGA chip for 3D graphics rendering acceleration, so as to increase 3D graphics rendering speed and chip logic density.
(2) Technical solution
To achieve the above object, the invention provides an FPGA chip for 3D graphics rendering acceleration. The 3D FPGA chip is formed by stacking an upper reconfigurable-layer circuit on a lower logic-layer circuit so that the two form a single integrated body. In data-flow order from input to output, the 3D FPGA chip comprises a PCIe interface module 1, a 3D loading module 2, a spatial bisection module 3, an initial rendering module 4, a 3D rendering module 5, and a display module 6. The reconfigurable-layer circuit and the logic-layer circuit coexist in the 3D FPGA chip and form a stacked, tightly connected, complete structure.
In the above scheme, the PCIe interface module 1 implements data transfer between the 3D FPGA chip and the external PCIe bus. Its front-end interface connects to a PCIe bus slot, and external signals enter and leave the chip over the PCIe bus as differential signals. 3D data in a 3D data file on the PC hard disk is transferred by high-speed serial transmission, through the PCIe slot on the PC motherboard, to the differential port of the PCIe interface module of the 3D FPGA chip. The back-end interface of the PCIe interface module connects to the 3D loading module 2.
In the above scheme, the 3D loading module 2 caches the 3D primitive data contained in the 3D data that the host computer (PC) supplies through the PCIe interface module 1 in the external primitive SDRAM, and outputs the 3D optical data contained in the 3D data directly to the initial rendering module 4. The 3D optical data includes at least camera, material, and light source information.
In the above scheme, the spatial bisection module 3 converts the 3D primitive data cached in the external primitive SDRAM into binary-space-partition KD-tree data and caches it in the external KD SDRAM.
In the above scheme, the initial rendering module 4 analyzes and processes the 3D optical data input by the 3D loading module, which includes at least camera, material, and light source information, and produces the optical data required by the 3D rendering module. This optical data includes at least ray origin and picture vertex data and is output to the 3D rendering module. The initial rendering module 4 also produces the start signal that the 3D rendering module needs to render each 3D frame, and handles the end signal that the 3D rendering module produces when it finishes rendering each 3D frame.
In the above scheme, the 3D rendering module 5 performs 3D rendering on the synchronously input 3D primitive data cached in the external primitive SDRAM, the binary-space-partition KD-tree data cached in the KD SDRAM, and the optical data from the initial rendering module 4, and outputs the rendering result to the display module 6.
In the above scheme, the display module 6 converts the rendering result into display result information and buffers that information in the external frame buffer SDRAM. It outputs the display result information, together with the RGB color data read from the external frame buffer SDRAM, to a VGA display for local monitoring, and at the same time feeds it back to the PCIe interface module 1 for processing by the host computer.
(3) Beneficial effects
From the above technical solution it can be seen that the invention has the following beneficial effects:
1. Compared with the instruction-level operating speed of mainstream 3D graphics software rendering on a CPU or GPU, the FPGA chip for 3D graphics rendering acceleration provided by the invention raises chip gate-level computing speed by a factor on the order of 10^4. In addition, the invention uses static reconfigurability (similar to the programmability of conventional FPGAs, in which the function of the reconfigurable circuit is determined at programming time) to retain the programmability advantage of a CPU or GPU. Meanwhile, the dynamic reconfigurability of the invention (in which the function of the reconfigurable circuit is determined in real time during operation), with real-time gate-level delays on the order of picoseconds (10^-12 s), surpasses the instruction-level delays of a CPU or GPU, which are on the order of microseconds (10^-6 s).
2. Compared with ray-tracing rendering in fixed ASIC logic, the FPGA chip for 3D graphics rendering acceleration provided by the invention offers the high flexibility that comes with programmability, which an ASIC lacks.
3. Compared with mainstream FPGA chips, the FPGA chip for 3D graphics rendering acceleration provided by the invention not only has similar reconfigurability in 3D graphics rendering acceleration, but also provides embedded floating-point ALU modules and embedded cache (Cache) modules. These meet the common demands of 3D rendering applications for high-precision floating-point arithmetic and high-speed data transfer, and raise chip logic density by a factor on the order of 10^2.
Brief description of the drawings
The above summary is described in detail below with reference to the accompanying drawings, so that the features and advantages of the invention become clearer. The drawings include:
Fig. 1 is a structural diagram of the 3D FPGA chip according to the first embodiment of the invention (framework A).
Fig. 2 is a structural diagram of the 3D FPGA chip according to the second embodiment of the invention (framework B).
Fig. 3 is a structural diagram of the PCIe interface module.
Fig. 4 is a structural diagram of the 3D loading module.
Fig. 5 is a structural diagram of the spatial bisection module.
Fig. 6 is a structural diagram of the initial rendering module.
Fig. 7 is a structural diagram of the 3D rendering module.
Fig. 8 is a structural diagram of the main routing module in Fig. 7.
Fig. 9 is a structural diagram of the traversal cache module in Fig. 7.
Fig. 10 is a structural diagram of the render traversal module in Fig. 7.
Fig. 11 is a structural diagram of the traversal routing module in Fig. 10.
Fig. 12 is a structural diagram of the traversal pipeline module in Fig. 10.
Fig. 13 is a structural diagram of the traversal datapath module in Fig. 12.
Fig. 14 is a structural diagram of the traversal floating-point ALU module in Fig. 10.
Fig. 15 is a structural diagram of the traversal FIFO module in Fig. 7.
Fig. 16 is a structural diagram of the enumeration cache module in Fig. 7.
Fig. 17 is a structural diagram of the render enumeration module in Fig. 7.
Fig. 18 is a structural diagram of the enumeration routing module in Fig. 17.
Fig. 19 is a structural diagram of the enumeration pipeline module in Fig. 17.
Fig. 20 is a structural diagram of the enumeration datapath module in Fig. 19.
Fig. 21 is a structural diagram of the enumeration FIFO module in Fig. 7.
Fig. 22 is a structural diagram of the intersection cache module in Fig. 7.
Fig. 23 is a structural diagram of the render intersection module in Fig. 7.
Fig. 24 is a structural diagram of the intersection routing module in Fig. 23.
Fig. 25 is a structural diagram of the intersection pipeline module in Fig. 23.
Fig. 26 is a structural diagram of the intersection datapath module in Fig. 25.
Fig. 27 is a structural diagram of the intersection floating-point ALU module in Fig. 23.
Fig. 28 is a structural diagram of the shading FIFO module in Fig. 7.
Fig. 29 is a structural diagram of the shading cache module in Fig. 7.
Fig. 30 is a structural diagram of the render shading module in Fig. 7.
Fig. 31 is a structural diagram of the shading routing module in Fig. 30.
Fig. 32 is a structural diagram of the shading pipeline module in Fig. 30.
Fig. 33 is a structural diagram of the shading datapath module in Fig. 32.
Fig. 34 is a structural diagram of the shading floating-point ALU module in Fig. 30.
Fig. 35 is a structural diagram of the floating-point ALU datapath module in Fig. 34.
Fig. 36 is a structural diagram of the floating-point divider module in Fig. 35.
Fig. 37 is a structural diagram of the display module in Fig. 7.
Detailed description of the invention
Hereinafter, examples of the invention are described in detail with reference to the accompanying drawings and tables. The invention may, however, be embodied in many different forms and should not be construed as limited to the examples given here; these examples are provided so that this disclosure is thorough and complete and fully conveys the idea of the invention to those skilled in the art.
The 3D FPGA chip is structured as a reconfigurable, parallel, multi-pipeline processing architecture for high-speed digital imaging. Below, all reconfigurable parts of the structure are marked with (* reconfigurable).
Referring to Fig. 1, framework A is one of the two implementation structures of the invention. The SDRAMs in Fig. 1 connect directly to the framework A 3D FPGA chip through chip pins; this design is for the case where the SDRAMs and the 3D FPGA chip share the same system board. The 3D FPGA chip is formed by stacking an upper reconfigurable-layer circuit on a lower logic-layer circuit so that the two form a single integrated body; the reconfigurable-layer circuit and the logic-layer circuit coexist in the 3D FPGA chip and form a stacked, tightly connected, complete structure.
Each 3D FPGA chip consists of six reconfigurable modules. In data-flow order from input to output, these are the PCIe interface module 1, the 3D loading module 2, the spatial bisection module 3, the initial rendering module 4, the 3D rendering module 5, and the display module 6.
The PCIe interface module 1 implements data transfer between the 3D FPGA chip and the external PCIe bus. Its front-end interface connects to a PCIe bus slot, and external signals enter and leave the chip over the PCIe bus as differential signals. 3D data in a 3D data file on the PC hard disk is transferred by high-speed serial transmission, through the PCIe slot on the PC motherboard, to the differential port (Differential Port) of the PCIe interface module of the 3D FPGA chip. The back-end interface of the PCIe interface module connects to the 3D loading module 2.
The 3D loading module 2 caches the 3D primitive data contained in the 3D data that the host computer (PC) supplies through the PCIe interface module 1 in the external primitive SDRAM, and outputs the 3D optical data contained in the 3D data (such as camera, material, and light source information) directly to the initial rendering module 4.
The spatial bisection module 3 converts the 3D primitive data cached in the external primitive SDRAM into binary-space-partition KD-tree data and caches it in the external KD SDRAM.
The initial rendering module 4 analyzes and processes the 3D optical data input by the 3D loading module, such as camera, material, and light source information, produces the optical data required by the 3D rendering module, such as ray origin and picture vertex data, and outputs it to the 3D rendering module. It also produces the start signal that the 3D rendering module needs to render each 3D frame, and handles the end signal that the 3D rendering module produces when it finishes rendering each 3D frame.
The 3D rendering module 5 performs 3D rendering on the synchronously input 3D primitive data cached in the external primitive SDRAM, the binary-space-partition KD-tree data cached in the KD SDRAM, and the optical data from the initial rendering module 4, and outputs the rendering result to the display module 6.
The display module 6 converts the rendering result into display result information and buffers that information in the external frame buffer SDRAM. It outputs the display result information, together with the RGB color data read from the external frame buffer SDRAM, to a VGA display for local monitoring, and at the same time feeds it back to the PCIe interface module 1 for processing by the host computer (for example, intelligent recognition or Internet communication applications).
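The six-module data flow of framework A described above can be sketched in software as a chain of functions that share three external memories. This is a minimal illustration of the ordering only, not of the hardware: every function name, the scene dictionary layout, and the placeholder bodies for spatial bisection and rendering are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalMemory:
    """Stand-ins for the three external SDRAMs named in the text."""
    primitive_sdram: list = field(default_factory=list)
    kd_sdram: dict = field(default_factory=dict)
    frame_buffer_sdram: list = field(default_factory=list)


def pcie_interface(scene_file):            # module 1: pass host data into the chip
    return scene_file


def load_3d(data, mem):                    # module 2: cache primitives, forward optics
    mem.primitive_sdram.extend(data["primitives"])
    return data["optics"]


def spatial_bisection(mem):                # module 3 (placeholder: one leaf holds everything)
    mem.kd_sdram["root"] = list(range(len(mem.primitive_sdram)))


def initial_rendering(optics):             # module 4: ray origin plus start signal
    return {"ray_origin": optics["camera"], "start": True}


def render_3d(mem, rays):                  # module 5 (placeholder result per primitive)
    return [("pixel", prim_id) for prim_id in mem.kd_sdram["root"]]


def display(result, mem):                  # module 6: buffer the frame for output
    mem.frame_buffer_sdram.extend(result)
    return mem.frame_buffer_sdram


def run_frame(scene_file):
    mem = ExternalMemory()
    data = pcie_interface(scene_file)
    optics = load_3d(data, mem)
    spatial_bisection(mem)
    rays = initial_rendering(optics)
    return display(render_3d(mem, rays), mem)


scene = {"primitives": ["tri0", "tri1"], "optics": {"camera": (0.0, 0.0, 0.0)}}
frame = run_frame(scene)
```

In the chip these stages run concurrently as a pipeline; the sequential calls here only show which module produces the data that the next one consumes.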
Referring to Fig. 2, framework B is the second of the two implementation structures of the invention. The only difference between framework A and framework B is how the 3D FPGA chip of the invention exchanges data with the external SDRAMs (the primitive SDRAM, the KD SDRAM, and the frame buffer SDRAM). In Fig. 2, all SDRAMs connect to the 3D FPGA chip through the external PCIe bus; this design is for the case where the SDRAMs and the 3D FPGA chip sit on different system boards in PCIe slots. The purpose is to improve the extensibility and flexibility of system board designs based on the invention and to reduce the cost of system board design.
The PCIe interface module 1 implements data transfer between the 3D FPGA chip and the external PCIe bus. Its front-end interface connects to a PCIe bus slot, and external signals enter and leave the chip over the PCIe bus as differential signals. 3D primitive data in a 3D data file on the PC hard disk is transferred by high-speed serial transmission, through the PCIe slot on the PC motherboard, to the differential port (Differential Port) of the PCIe interface module of the 3D FPGA chip. The back-end interface of the PCIe interface module connects to the 3D loading module 2.
The 3D loading module 2 caches the 3D primitive data contained in the 3D data that the host computer (PC) supplies through the PCIe interface module 1 in the external primitive SDRAM, by way of the PCIe interface module 1 and the PCIe bus in turn, and outputs the 3D optical data contained in the 3D data (such as camera, material, and light source information) directly to the initial rendering module 4.
The spatial bisection module 3 obtains the 3D primitive data cached in the external primitive SDRAM by way of the PCIe interface module 1 and the PCIe bus in turn, converts it into binary-space-partition KD-tree data, and caches that data in the external KD SDRAM, again by way of the PCIe interface module 1 and the PCIe bus.
The initial rendering module 4 analyzes and processes the 3D optical data input by the 3D loading module, such as camera, material, and light source information, produces the ray origin and picture vertex data required by the 3D rendering module, and outputs it to the 3D rendering module. It also produces the start signal that the 3D rendering module needs to render each 3D frame, and handles the end signal that the 3D rendering module produces when it finishes rendering each 3D frame.
The 3D rendering module 5 performs 3D rendering on the synchronously input 3D primitive data cached in the external primitive SDRAM, the binary-space-partition KD-tree data cached in the KD SDRAM, and the optical data from the initial rendering module 4, and outputs the rendering result to the display module 6.
The display module 6 converts the rendering result into display result information and caches that information in the external frame buffer SDRAM by way of the PCIe interface module 1 and the PCIe bus in turn. It outputs the display result information, together with the RGB color data read from the external frame buffer SDRAM, to a VGA display for local monitoring, and at the same time feeds it back to the PCIe interface module 1 for processing by the host computer (for example, intelligent recognition or Internet communication applications).
In both framework A (Fig. 1) and framework B (Fig. 2), the FPGA chip exchanges data with the external SDRAMs (the primitive SDRAM, the KD SDRAM, and the frame buffer SDRAM). In framework A, the primitive SDRAM connects directly to the 3D loading module 2 and the spatial bisection module 3 in the FPGA chip, the KD SDRAM connects directly to the spatial bisection module 3 and the 3D rendering module 5 in the FPGA chip, and the frame buffer SDRAM connects directly to the display module 6 in the FPGA chip. In framework B, the FPGA chip connects to the primitive SDRAM, the KD SDRAM, and the frame buffer SDRAM through the external PCIe bus.
The PCIe interface module 1, 3D loading module 2, spatial bisection module 3, initial rendering module 4, 3D rendering module 5, and display module 6 that make up the FPGA chip of the invention for 3D graphics rendering acceleration are described in detail below with reference to Figs. 3 to 37.
Referring to Fig. 3, the PCIe interface module consists of two submodules: a PCIe core module and a PCIe application module. The PCIe core module executes the protocol logic of the physical layer, data link layer, and transaction layer for data exchange. The PCIe application module controls data exchange between the 3D loading module, spatial bisection module, 3D rendering module, and display module on the one hand and, on the other, the PC, primitive SDRAM, KD SDRAM, and frame buffer SDRAM, which are either directly external (Fig. 1) or on external PCIe bus slots (Fig. 2).
Referring to Fig. 4, the 3D loading module contains three modules: a PCIe read module, a 3D classification module, and a primitive SDRAM control module. The PCIe read module obtains 3D data from the PCIe interface module 1 and passes it to the 3D classification module. The 3D classification module divides the 3D data into two classes: 3D primitive data, which is output to the external primitive SDRAM, and 3D optical data (camera, material, and light source data), which is output to the initial rendering module 4. The primitive SDRAM control module outputs the 3D primitive data to the external primitive SDRAM and caches it there.
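The classification step described above can be illustrated with a short sketch. The source does not specify the record format of the 3D data file, so the `kind` tag, the tag values, and the function name below are hypothetical.

```python
OPTICAL_KINDS = {"camera", "material", "light"}  # assumed record tags


def classify_3d_data(records):
    """Split incoming records into primitive data and optical data."""
    primitive_sdram, optical_out = [], []
    for record in records:
        if record["kind"] in OPTICAL_KINDS:
            optical_out.append(record)       # goes to the initial rendering module
        else:
            primitive_sdram.append(record)   # goes to the external primitive SDRAM
    return primitive_sdram, optical_out


stream = [
    {"kind": "triangle", "vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)]},
    {"kind": "camera", "position": (0, 0, 5)},
    {"kind": "light", "position": (10, 10, 10)},
]
primitives, optics = classify_3d_data(stream)
```

Here `primitives` holds the single triangle record and `optics` holds the camera and light records, mirroring the two output paths of the module.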
Referring to Fig. 5, the spatial bisection module contains six modules: a KD node stack FILO module, a KD build module, a bisection module, a cost calculation module, a bisection position FIFO module, and a primitive FIFO module. The cost calculation module and the bisection module read the 3D primitive data they need from the external primitive SDRAM and buffer it in the primitive FIFO module and the bisection position FIFO module, respectively. Using the vertices obtained from the bisection position FIFO, the bisection module generates binary space partition planes (Binary Space Partition Plane) for the 3D scene primitives based on axis-aligned bounding boxes (Axis-Aligned Bounding Box, AABB). The cost calculation module then uses the primitive vertex coordinates obtained from the primitive FIFO to calculate the bisection cost of the AABB at each level of the KD tree (KD Tree), and the bisection of the KD node at that level is completed with the minimum-cost partition plane. The spatial bisection module repeatedly bisects the root KD node (Root KD Node), which holds all primitives of the 3D scene, down to the individual KD leaf nodes (KD Leaf Node). This converts the primitive data of the whole 3D scene into a KD-tree data structure, which is then output to the external KD SDRAM either directly (framework A) or through the PCIe interface module (framework B).
The spatial bisection module reads the 3D scene primitive data from the external primitive SDRAM and repeatedly bisects space according to the KD binary space partition algorithm (KD Binary Space Partition Algorithm). As it bisects, it records the information of all KD-tree inner nodes (KD Tree Inner Node) and writes it to the external KD SDRAM, until every bisection path reaches a KD-tree leaf node (KD Tree Leaf Node).
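The repeated minimum-cost bisection described above can be sketched as follows. The source does not say which cost function the cost calculation module uses; this sketch assumes the surface area heuristic (SAH), a common choice for KD trees, and represents each primitive by its axis-aligned bounding box. Recursion stands in for the KD node stack, and all names and the `c_trav` constant are illustrative.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def split_box(lo, hi, axis, pos):
    """Cut the box at `pos` along `axis`; return (left_hi, right_lo)."""
    left_hi = hi[:axis] + (pos,) + hi[axis + 1:]
    right_lo = lo[:axis] + (pos,) + lo[axis + 1:]
    return left_hi, right_lo


def goes_left(prim, axis, pos):
    return prim[0][axis] < pos or prim[1][axis] <= pos


def goes_right(prim, axis, pos):
    return prim[1][axis] > pos


def build_kd(prims, lo=None, hi=None, depth=0, max_depth=16, c_trav=0.2):
    """Recursively bisect `prims`, a list of (min_corner, max_corner) boxes."""
    if lo is None:  # root node: bounding box of the whole scene
        lo = tuple(min(p[0][i] for p in prims) for i in range(3))
        hi = tuple(max(p[1][i] for p in prims) for i in range(3))
    leaf = {"leaf": True, "prims": prims}
    area = surface_area(lo, hi)
    if len(prims) <= 1 or depth >= max_depth or area == 0.0:
        return leaf
    best = None  # (cost, axis, pos)
    for axis in range(3):
        candidates = {p[0][axis] for p in prims} | {p[1][axis] for p in prims}
        for pos in sorted(candidates):
            if not lo[axis] < pos < hi[axis]:
                continue  # a plane on the box boundary splits nothing
            left_hi, right_lo = split_box(lo, hi, axis, pos)
            n_left = sum(goes_left(p, axis, pos) for p in prims)
            n_right = sum(goes_right(p, axis, pos) for p in prims)
            cost = c_trav + (surface_area(lo, left_hi) * n_left
                             + surface_area(right_lo, hi) * n_right) / area
            if best is None or cost < best[0]:
                best = (cost, axis, pos)
    if best is None or best[0] >= len(prims):
        return leaf  # splitting would cost more than testing every primitive
    _, axis, pos = best
    left_hi, right_lo = split_box(lo, hi, axis, pos)
    left = [p for p in prims if goes_left(p, axis, pos)]
    right = [p for p in prims if goes_right(p, axis, pos)]
    return {"leaf": False, "axis": axis, "pos": pos,
            "left": build_kd(left, lo, left_hi, depth + 1, max_depth, c_trav),
            "right": build_kd(right, right_lo, hi, depth + 1, max_depth, c_trav)}


boxes = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)), ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0))]
tree = build_kd(boxes)
```

For the two boxes in the example, which are separated along the x axis, the cheapest plane is perpendicular to x, so the root becomes an inner node with one box in each leaf.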
Referring to Fig. 6, the initial rendering module includes an initial ALU and an initial state machine. The initial ALU analyzes and processes the 3D optical data input by the 3D loading module, produces the ray origin and picture vertex data required by the 3D rendering module, and outputs it to the 3D rendering module; it also handles the end signal that the 3D rendering module produces when it finishes rendering each 3D frame. The initial state machine produces the start signal that the 3D rendering module needs to render each 3D frame.
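One plausible reading of "ray origin and picture vertex data" is the camera position plus the corner points of the image plane, from which a primary ray can be interpolated for each pixel. The source does not give the formulas, so the pinhole camera model, the parameter names (`target`, `up`, `fov_deg`, `aspect`), and the function names below are all assumptions made for this sketch.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(sum(x * x for x in a))
    return scale(a, 1.0 / length)


def picture_vertices(eye, target, up, fov_deg, aspect):
    """Corners of an image plane placed one unit in front of the camera."""
    forward = normalize(sub(target, eye))
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    half_h = math.tan(math.radians(fov_deg) / 2.0)
    half_w = half_h * aspect
    center = add(eye, forward)

    def corner(sx, sy):
        return add(center, add(scale(right, sx * half_w),
                               scale(true_up, sy * half_h)))

    return {"top_left": corner(-1, 1), "top_right": corner(1, 1),
            "bottom_left": corner(-1, -1)}


def primary_ray(eye, verts, u, v):
    """Ray through the picture point at fractions (u, v) from the top-left corner."""
    across = sub(verts["top_right"], verts["top_left"])
    down = sub(verts["bottom_left"], verts["top_left"])
    point = add(verts["top_left"], add(scale(across, u), scale(down, v)))
    return eye, normalize(sub(point, eye))


eye = (0.0, 0.0, 0.0)
verts = picture_vertices(eye, (0.0, 0.0, -1.0), (0.0, 1.0, 0.0), 90.0, 1.0)
origin, direction = primary_ray(eye, verts, 0.5, 0.5)
```

With a 90 degree field of view and a camera looking down the negative z axis, the ray through the middle of the picture points straight along that axis.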
Referring to Fig. 7, the 3D rendering module contains 12 modules: a main routing module; four cache modules (traversal cache module, enumeration cache module, intersection cache module, shading cache module); four render submodules (render traversal module, render enumeration module, render intersection module, render shading module); and three FIFO modules placed between them (traversal FIFO module, enumeration FIFO module, intersection FIFO module).
The data sources of the main routing module, the four cache modules, the four render submodules, and the three FIFO modules in the 3D rendering module are:
1) KD SDRAM
Stores the KD-tree structure data of the 3D primitives after spatial bisection.
2) Primitive SDRAM
Stores the primitive composition information of the 3D scene.
3) 3D optical data
The initial rendering module analyzes and processes the 3D optical data input by the 3D loading module (camera, scene object material, and light source information) and produces the ray origin and picture vertex data required by the 3D rendering module.
KD node data from the external KD SDRAM is input to the traversal cache module. Primitive data from the external primitive SDRAM is input to the main routing module. Optical data from the initial rendering module is input to the render traversal module. The pipeline order of 3D rendering is: render traversal module -> traversal FIFO module -> render enumeration module -> enumeration FIFO module -> render puncture module -> puncture FIFO module -> render coloring module. The pipeline follows the direction of ray tracing from the ray origin to the nearest puncture point (nearest intersect point) and repeats in the order direct light -> reflection -> refraction until rendering is complete.
Referring to Fig. 8, the main routing module of the 3D rendering module comprises a main loop MUX module and a main cache read module. The main loop MUX module selects one of three modules (render enumeration module, render puncture module, render coloring module); the main cache read module inputs from the external primitive SDRAM the primitive data required by the selected module; and the main loop MUX module then outputs that primitive data to the selected module.
Referring to Fig. 9, the traversal cache module of the 3D rendering module comprises a traversal CAM module, a traversal TAG module and a traversal cache RAM module. The traversal CAM module and the traversal TAG module accept and analyze the requests issued by the traversal pipelines of the render traversal module for the KD node data needed in the traversal calculation. The requested KD node data is then supplied from the traversal cache RAM module if the node is held locally (cache hit), or from the external KD SDRAM if it is not (cache miss).
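The hit and miss behaviour described above can be illustrated with a small software model. This Python sketch is an illustration only: the text does not state how node addresses map to cache lines, so a direct-mapped organization is assumed, and the class `NodeCache`, its fields and the dictionary standing in for the external KD SDRAM are hypothetical.

```python
class NodeCache:
    """Direct-mapped cache model: a tag array plus a data RAM."""

    def __init__(self, num_lines, backing):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # plays the role of the TAG module
        self.ram = [None] * num_lines    # plays the role of the cache RAM module
        self.backing = backing           # stands in for the external KD SDRAM
        self.hits = 0
        self.misses = 0

    def read(self, node_id):
        index = node_id % self.num_lines   # cache line selected by the low part
        tag = node_id // self.num_lines    # remaining part identifies the node
        if self.tags[index] == tag:        # cache hit: serve from local RAM
            self.hits += 1
            return self.ram[index]
        self.misses += 1                   # cache miss: fetch and fill the line
        data = self.backing[node_id]
        self.tags[index] = tag
        self.ram[index] = data
        return data
```

Two node IDs that differ by a multiple of `num_lines` share a line in this model, so reading one evicts the other.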
Referring to Fig. 10, the render traversal module of the 3D rendering module consists of a traversal routing module, 64 (* reconfigurable) traversal pipeline modules in 1:1 correspondence with the rays, and a traversal floating-point ALU module. KD node data from the traversal cache module is input through the traversal routing module to the traversal pipeline module selected by the traversal routing module. The pipeline calls the traversal floating-point ALU module to perform the traversal calculation on the ray and the input KD node, and outputs the resulting KD leaf node ID to the traversal FIFO for subsequent output to the render enumeration module.
Referring to Fig. 11, the traversal routing module of the render traversal module comprises a round-robin select module and a traversal cache read module. The round-robin select module selects, in round-robin (Round Robin) fashion, one of the traversal pipelines of the 64 (* reconfigurable) rays and connects it to the traversal cache module; the traversal cache read module inputs the required KD node data from the traversal cache module and passes it to that traversal pipeline.
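The round-robin selection among ray pipelines can be sketched as follows. This Python model is an illustration only. The text says only that pipelines are selected in turn; skipping pipelines that have no pending request is an assumption of this sketch, and the class `RoundRobinSelector` and its method `select` are hypothetical names.

```python
class RoundRobinSelector:
    """Grant the shared cache port to one requesting pipeline at a time."""

    def __init__(self, num_pipelines=64):
        self.num_pipelines = num_pipelines
        self.next_index = 0  # pipeline that gets first chance on the next grant

    def select(self, requests):
        """`requests[i]` is True when pipeline i is waiting for data.

        Returns the index of the granted pipeline, or None if none is waiting.
        """
        for offset in range(self.num_pipelines):
            index = (self.next_index + offset) % self.num_pipelines
            if requests[index]:
                # The search resumes after the granted pipeline next time,
                # so every waiting pipeline is served within one full turn.
                self.next_index = (index + 1) % self.num_pipelines
                return index
        return None
```

With four pipelines of which 0, 2 and 3 are waiting, successive calls grant 0, 2, 3 and then 0 again.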
Referring to Fig. 12, the traversal pipeline module of the render traversal module comprises a traversal state module, a ray generation module, a traversal control module and a traversal datapath module. It performs the traversal (Traverse) calculation of one ray against one KD node, which includes the puncture (Intersect) calculations of the ray with the KD node bounding box (KD Node Bounding Box) and with the splitting plane (Splitting Plane). The calculation traverses the KD nodes of the whole KD tree until a KD leaf node (KD Leaf Node) is reached. It is carried out by the arithmetic logic of the traversal datapath module under the control of the finite state machine (Finite State Machine) in the traversal control module. The traversal datapath modules of all 64 traversal pipelines share the same traversal floating-point ALU module.
Referring to Fig. 13, the traversal datapath module of the traversal pipeline module comprises a traversal stack module and a traversal algorithm module. The core is the traversal algorithm module, which calculates the puncture (Intersect) of the ray with the binary space partition plane (BSP Split Plane) of the KD node bounding box (KD Node Bounding Box). When the ray passes through the KD nodes on both sides of the splitting plane, traversal continues into the next-level KD node on one side, while the KD node still pending on the other side is pushed onto the traversal stack FILO module.
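The stack-based traversal described above corresponds to a well-known software pattern, sketched here in Python as an illustration only. The function `traverse`, the `Node` class and the parametric interval `t_min`..`t_max` along the ray are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    axis: Optional[int] = None     # None marks a leaf node
    position: float = 0.0          # coordinate of the splitting plane
    left: Optional["Node"] = None  # child below the plane
    right: Optional["Node"] = None # child above the plane
    name: str = ""


def traverse(root, origin, direction, t_min, t_max):
    """Return the leaf nodes crossed by the ray, ordered front to back."""
    stack = []    # FILO stack of pending (node, t_min, t_max) entries
    visited = []
    node = root
    while True:
        while node.axis is not None:
            a = node.axis
            o, d = origin[a], direction[a]
            below = o < node.position or (o == node.position and d <= 0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            if d == 0.0:           # ray parallel to the plane: one side only
                node = near
                continue
            t_split = (node.position - o) / d
            if t_split > t_max or t_split <= 0:
                node = near        # plane is beyond the interval or behind the ray
            elif t_split < t_min:
                node = far         # interval lies entirely on the far side
            else:
                # Ray crosses both sides: keep the far child for later.
                stack.append((far, t_split, t_max))
                node, t_max = near, t_split
        visited.append(node)
        if not stack:
            return visited
        node, t_min, t_max = stack.pop()
```

A renderer would normally stop as soon as a primitive in a visited leaf is hit, instead of collecting every leaf as this sketch does.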
Referring to Fig. 14, the traversal floating-point ALU module of the render traversal module comprises a traversal floating-point ALU control module and a floating-point ALU datapath module. It accepts the ALU mode request sent by a traversal datapath module; the traversal floating-point ALU control module then issues the corresponding control signals so that the floating-point ALU datapath module forms the required traversal ALU formula and completes the traversal floating-point calculation.
Referring to Fig. 15, the traversal FIFO module of the 3D rendering module contains 64 (* reconfigurable) traversal FIFO submodules, one for the traversal calculation of each ray. The input of the traversal FIFO module comes from the render traversal module of the 3D rendering module and is steered by the traversal routing module into the traversal FIFO submodule of the corresponding ray.
Referring to Fig. 16, the enumeration cache module of the 3D rendering module comprises an enumeration TAG module and an enumeration cache RAM module. The enumeration TAG module accepts and analyzes the requests issued by the enumeration pipelines of the render enumeration module for the primitive ID data needed in the enumeration calculation. The requested primitive ID data is then supplied from the enumeration cache RAM module if it is held locally (cache hit), or from the external primitive SDRAM if it is not (cache miss).
Referring to Fig. 17, the render enumeration module of the 3D rendering module consists of an enumeration routing module and 64 (* reconfigurable) enumeration pipeline modules in 1:1 correspondence with the rays. Primitive data (i.e. the primitive IDs of KD leaf nodes) from the enumeration cache module is input through the enumeration routing module to the enumeration pipeline module selected by the enumeration routing module. The pipeline performs the enumeration calculation (i.e. it walks a linked list (Linked List) to obtain the primitive IDs in all KD leaf nodes) and outputs the resulting KD leaf node primitive IDs to the enumeration FIFO for input to the render puncture module.
Referring to Fig. 18, the enumeration routing module of the render enumeration module comprises a round-robin select module and an enumeration cache read module. The round-robin select module selects, in round-robin (Round Robin) fashion, one of the enumeration pipelines of the 64 (* reconfigurable) rays and connects it to the enumeration cache module; the enumeration cache read module inputs from the enumeration cache module the primitive data of the required KD leaf node and passes it to that enumeration pipeline.
Referring to Fig. 19, the enumeration pipeline module of the render enumeration module comprises an enumeration control module and an enumeration datapath module. It performs the enumeration calculation, which obtains the primitive IDs in all KD leaf nodes. The calculation is carried out by the arithmetic logic of the enumeration datapath module under the control of the finite state machine (Finite State Machine) in the enumeration control module.
Referring to Fig. 20, the core of the enumeration datapath module in the enumeration pipeline module is the enumeration algorithm module, which obtains the primitive IDs in all KD leaf nodes from the primitive data in the primitive SDRAM.
Referring to Fig. 21, the enumeration FIFO module of the 3D rendering module contains 64 (* reconfigurable) enumeration FIFO submodules, one for the enumeration calculation of each ray. The input of the enumeration FIFO module comes from the render enumeration module of the 3D rendering module and is steered by the enumeration routing module into the enumeration FIFO submodule of the corresponding ray.
Referring to Fig. 22, the puncture cache module of the 3D rendering module comprises a puncture TAG module and a puncture cache RAM module. The puncture TAG module accepts and analyzes the requests issued by the puncture pipelines of the render puncture module for the primitive vertex data needed in the puncture calculation. The requested primitive vertex data is then supplied from the puncture cache RAM module if it is held locally (cache hit), or from the external primitive SDRAM if it is not (cache miss).
Referring to Fig. 23, the render puncture module of the 3D rendering module consists of a puncture routing module, 64 (* reconfigurable) puncture pipeline modules in 1:1 correspondence with the rays, and a puncture floating-point ALU module. Vertex data from the puncture cache module is input through the puncture routing module to the puncture pipeline module selected by the puncture routing module. The pipeline performs the puncture calculation (i.e. it determines whether the ray hits (Hit) or misses (Miss) the primitive) and outputs the resulting primitive ID and puncture point coordinates to the puncture FIFO for input to the render coloring module.
Referring to Fig. 24, the puncture routing module of the render puncture module comprises a round-robin select module and a puncture cache read module. The round-robin select module selects, in round-robin (Round Robin) fashion, one of the puncture pipelines of the 64 (* reconfigurable) rays and connects it to the puncture cache module; the puncture cache read module inputs from the puncture cache module the primitive data of the required KD leaf node and passes it to that puncture pipeline.
Referring to Fig. 25, the puncture pipeline module of the render puncture module comprises a puncture control module and a puncture datapath module. It performs the puncture calculation, which determines whether the ray punctures the primitives in all KD leaf nodes and, if so, where. The calculation is carried out by the arithmetic logic of the puncture datapath module under the control of the finite state machine (Finite State Machine) in the puncture control module.
Referring to Fig. 26, the core of the puncture datapath module in the puncture pipeline module is the puncture algorithm module, which uses the primitive vertex data from the primitive SDRAM to calculate whether the ray punctures a primitive and the position of the puncture point.
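The puncture test and the choice of the nearest puncture point can be illustrated in software. The disclosure does not state the primitive type or the intersection formula; the Python sketch below assumes triangular primitives and uses the Möller-Trumbore ray-triangle test as a stand-in, not as the patented method. The function names `intersect` and `nearest_hit` are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return (t, point) if the ray hits triangle v0-v1-v2, else None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv          # distance along the ray
    if t <= eps:                  # hit lies behind the ray origin
        return None
    point = tuple(origin[i] + t * direction[i] for i in range(3))
    return t, point


def nearest_hit(origin, direction, triangles):
    """Return (primitive_id, t, point) of the closest hit, or None on a miss."""
    best = None
    for prim_id, (v0, v1, v2) in enumerate(triangles):
        hit = intersect(origin, direction, v0, v1, v2)
        if hit is not None and (best is None or hit[0] < best[1]):
            best = (prim_id, hit[0], hit[1])
    return best
```

Here `nearest_hit` plays the role of keeping only the closest punctured primitive ID among the primitives of a leaf node.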
Referring to Fig. 27, the puncture floating-point ALU module of the render puncture module comprises a puncture floating-point ALU control module and a floating-point ALU datapath module. It accepts the ALU mode request sent by a puncture datapath module; the puncture floating-point ALU control module then issues the corresponding control signals so that the floating-point ALU datapath module forms the required puncture ALU formula and completes the puncture floating-point calculation.
Referring to Fig. 28, the puncture FIFO module of the 3D rendering module contains 64 (* reconfigurable) puncture FIFO submodules, one for the puncture calculation of each ray. The input of the puncture FIFO module comes from the render puncture module of the 3D rendering module and is steered by the puncture routing module into the puncture FIFO submodule of the corresponding ray.
Referring to Fig. 29, the coloring cache module of the 3D rendering module comprises a coloring TAG module and a coloring cache RAM module. The coloring TAG module accepts and analyzes the requests issued by the coloring pipelines of the render coloring module for the primitive optical data (i.e. color and material) needed in the coloring calculation. The requested primitive optical data is then supplied from the coloring cache RAM module if it is held locally (cache hit), or from the external primitive SDRAM if it is not (cache miss).
Referring to Fig. 30, the render coloring module of the 3D rendering module consists of a coloring routing module, 64 (* reconfigurable) coloring pipeline modules in 1:1 correspondence with the rays, and a coloring floating-point ALU module. Primitive optical data from the coloring cache module is input through the coloring routing module to the coloring pipeline module selected by the coloring routing module. The pipeline performs the coloring calculation (i.e. it calculates the color of the punctured primitive) and outputs the resulting primitive color to the display module.
Referring to Fig. 31, the coloring routing module of the render coloring module comprises a round-robin select module and a coloring cache read module. The round-robin select module selects, in round-robin (Round Robin) fashion, one of the coloring pipelines of the 64 (* reconfigurable) rays and connects it to the coloring cache module; the coloring cache read module inputs the required primitive optical data from the coloring cache module and passes it to that coloring pipeline.
Referring to Fig. 32, the coloring pipeline module of the render coloring module comprises a coloring control module and a coloring datapath module. It performs the coloring calculation, which is carried out by the arithmetic logic of the coloring datapath module under the control of the finite state machine (Finite State Machine) in the coloring control module.
Referring to Fig. 33, the coloring datapath module of the coloring pipeline module comprises a coloring algorithm module and a coloring stack module. The core is the coloring algorithm module, which uses the primitive optical data input from the primitive SDRAM through the main routing module, the coloring cache module and the coloring routing module to calculate the optical effect on the primitive punctured by each traced ray, i.e. the optical color produced by direct light from the light sources, by reflection and by refraction. The calculation applies ray tracing repeatedly: starting from the primitive hit under direct light from each light source, it repeatedly traces the reflection and refraction produced at each hit point and evaluates their optical color effect on the hit primitive.
Referring to Fig. 34, the coloring floating-point ALU module of the render coloring module comprises a coloring floating-point ALU control module and a floating-point ALU datapath module. It accepts the ALU mode request sent by a coloring datapath module; the coloring floating-point ALU control module then issues the corresponding control signals so that the floating-point ALU datapath module forms the required coloring ALU formula and completes the coloring floating-point calculation.
Referring to Fig. 35, the floating-point ALU datapath module in the coloring floating-point ALU module is the ALU calculation module shared by the pipeline modules of the three rendering submodules of the 3D rendering module that call an ALU (i.e. the render traversal module, render puncture module and render coloring module). Each pipeline module of each rendering submodule provides the ALU mode (ALU Mode) needed for its own ALU calculation cycle, and the respective floating-point ALU control module (i.e. the traversal, puncture or coloring floating-point ALU control module) decodes this ALU mode and sends control signals to the floating-point ALU datapath module. The floating-point ALU datapath module comprises three floating-point unit modules and a floating-point division module. The three floating-point unit modules (floating-point unit 0, floating-point unit 1 and floating-point unit 2) are each controlled by an input opcode (OP Code) and can perform addition (+), subtraction (-) or multiplication (×) of two 32-bit IEEE 754 floating-point numbers. The floating-point division module performs only division (÷) of two 32-bit IEEE 754 floating-point numbers.
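How an ALU mode might decode into one opcode per floating-point unit can be sketched in software. The Python model below is an illustration only. The mode names (`MUL_ADD`, `MUL_SUB`, `SUM4`), the wiring in which unit 2 combines the results of units 0 and 1, and the helper `f32` are assumptions of this sketch; the disclosure specifies only that each unit executes add, subtract or multiply on 32-bit IEEE 754 operands under opcode control.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# One floating-point unit: the opcode selects add, subtract or multiply,
# and the result is rounded back to 32 bits.
OPS = {
    "ADD": lambda a, b: f32(a + b),
    "SUB": lambda a, b: f32(a - b),
    "MUL": lambda a, b: f32(a * b),
}

# Hypothetical ALU modes. Each mode decodes to the opcodes of units 0, 1 and 2.
MODES = {
    "MUL_ADD": ("MUL", "MUL", "ADD"),  # a*b + c*d
    "MUL_SUB": ("MUL", "MUL", "SUB"),  # a*b - c*d
    "SUM4": ("ADD", "ADD", "ADD"),     # (a+b) + (c+d)
}


def alu(mode, a, b, c, d):
    """Decode `mode` and evaluate it on four single-precision operands."""
    op0, op1, op2 = MODES[mode]
    r0 = OPS[op0](f32(a), f32(b))  # floating-point unit 0
    r1 = OPS[op1](f32(c), f32(d))  # floating-point unit 1
    return OPS[op2](r0, r1)        # floating-point unit 2 combines both results
```

For example, `alu("MUL_SUB", ax, by, ay, bx)` evaluates one component of a cross product, a quantity that appears throughout ray-plane and ray-primitive calculations.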
Referring to Fig. 36, the floating-point division module in the floating-point ALU datapath module consists of three floating-point units and performs the division of two 32-bit IEEE 754 floating-point numbers. Its circuit structure is identical to the combination of the three floating-point units in the floating-point ALU datapath module. Its logic structure can be determined by static reconfiguration programming.
The 3D primitive data cached in the external primitive SDRAM, the space-bisection KD tree data cached in the KD SDRAM and the optical data of the initial rendering module are input synchronously, through the main routing module, the traversal cache module and the initial rendering module respectively, into the core pipeline (Core Pipeline) of the 3D rendering module: render traversal module -> traversal FIFO module -> render enumeration module -> enumeration FIFO module -> render puncture module -> puncture FIFO module -> render coloring module. The calculation of the 3D rendering module starts when the initial rendering module obtains the camera origin and the picture pixel positions. The traversal routing module of the render traversal module selects a traversal pipeline module in round-robin (Round Robin) fashion; the pipeline traverses, repeatedly from the top down, the KD nodes of the 3D scene that are input from the KD SDRAM into the traversal cache module until it hits a KD leaf node, and the KD leaf node ID is buffered in the traversal FIFO. The render enumeration module selects an enumeration pipeline module in round-robin fashion through the enumeration routing module; according to the KD leaf node IDs read from the traversal FIFO module, the primitive vertex IDs of all these KD leaf nodes are input from the external primitive SDRAM through the main routing module into the enumeration cache module, and the primitive vertex IDs are then buffered in the enumeration FIFO module. The render puncture module selects a puncture pipeline module in round-robin fashion through the puncture routing module; according to the primitive vertex IDs read from the enumeration FIFO module, all primitive vertices of the KD leaf node are input from the external primitive SDRAM through the main routing module into the puncture cache module, the puncture calculation is performed, and the ID of the nearest punctured primitive is buffered in the puncture FIFO module. The render coloring module selects a coloring pipeline module in round-robin fashion through the coloring routing module; according to the punctured primitive ID read from the puncture FIFO module, the normal and color of the punctured primitive are input from the external primitive SDRAM through the main routing module into the coloring cache module, and the coloring calculation is then repeated along the ray-tracing path of direct, reflected and refracted light in the 3D scene to color the picture pixel hit by the ray. When the initial rendering module detects that one tile (Tile) has been rendered, it starts the 3D rendering module on the next tile. This cycle repeats until the whole picture has been rendered.
Referring to Fig. 37, the display module comprises four modules: a display state machine module, a VGA control module, a frame buffer control module and a display PCIe control module. The display state machine module controls the data input and the operation timing of the VGA control module, the display PCIe control module and the frame buffer control module. The VGA control module delivers the color data and the synchronization signals to the external VGA RAMDAC chip and USB interface. The frame buffer control module delivers the 3D rendering result data to the external frame buffer SDRAM (framework A). The display PCIe control module delivers the 3D rendering result data to the PCIe interface module, so that it is cached in the external frame buffer SDRAM (framework B) and at the same time supplied to the host computer on the external PCIe bus slot for further processing (such as intelligent recognition or Internet communication applications).
The display module inputs pixel color data from the 3D rendering module and either outputs it directly through the frame buffer control module to the external frame buffer SDRAM (framework A), or outputs it through the PCIe output module and the PCIe interface module to the frame buffer SDRAM card on the external PCIe bus slot (framework B).
From the technical scheme above it can be seen that the 3D FPGA chip provided by the present invention has the following features, which give it a price/performance ratio in 3D graphics rendering acceleration that surpasses ASIC chips, CPU chips, GPU chips and conventional FPGA chips:
1) Multi-lane high-speed PCIe serial data input and output (Multiple-Lane High-Speed PCIe Serial Data Input and Output).
2) Multi-ray parallel data pipeline processing (Multiple-Ray Parallel Data Pipelines Processing).
3) Full use of on-chip data storage units, i.e. RAM (Random Access Memory), shift registers (Shift Register) and register files (Register File), organized as first-in first-out (First-In First-Out, FIFO for short) buffers, first-in last-out (First-In Last-Out, FILO for short) buffers and caches (Cache), to optimize the data exchange between pipelines as far as possible.
4) Integrated reconfigurable high-speed floating-point units (any combination of addition, subtraction, multiplication and division). The combination of floating-point operations can be changed by static reconfiguration programming, in which case the reconfiguration is completed during programming, or by dynamic signal modes, in which case the reconfiguration is completed while the circuit is operating in real time. Combining the two reconfiguration modes flexibly makes full use of the high gate-level operating speed brought by advanced ultra-deep submicron process technology.
5) Multi-ray parallel data pipelining (Multiple-Ray Parallel Data Pipelining) logic completes all tasks of 3D rendering while sharing high-speed floating-point units, in order to reach the highest "speed/chip area" price/performance ratio.
All of these features accomplish the task of 3D graphics rendering acceleration by making full use of today's ultra-deep submicron CMOS technology (Ultra-Deep Submicron CMOS Technology), which keeps improving in line with Moore's law.
The specific embodiments described above further explain the purpose, technical scheme and beneficial effects of the present invention. It should be understood that the foregoing are only specific embodiments of the present invention and do not limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (41)

1. An FPGA chip for 3D graphics rendering acceleration, characterized in that the 3D FPGA chip is formed as a single body by stacking a reconfigurable-layer circuit in the upper layer on a logic-layer circuit in the lower layer, and comprises, in order of data flow from input to output, a PCIe interface module (1), a 3D filling module (2), a space bisection module (3), an initial rendering module (4), a 3D rendering module (5) and a display module (6), wherein:
the 3D filling module (2) is used to cache, in the external primitive SDRAM, the 3D primitive data of the 3D data input from the host computer (PC) via the PCIe interface module (1), and to output the 3D optical data of the 3D data directly to the initial rendering module (4), the 3D optical data including at least camera, material and light source information;
the space bisection module (3) is used to convert the 3D primitive data cached in the external primitive SDRAM into space-bisection KD tree data and to cache it in the external KD SDRAM;
the initial rendering module (4) is used to analyze and process the 3D optical data input by the 3D filling module, which includes at least camera, material and light source information, to produce the optical data required by the 3D rendering module, which includes at least ray origin and picture vertex data, and to output it to the 3D rendering module, and at the same time to produce the start signal that the 3D rendering module needs to render each frame of the 3D picture and to process the finish signal that the 3D rendering module produces when it finishes rendering each frame of the 3D picture;
the 3D rendering module (5) is used to perform 3D rendering on the synchronously input 3D primitive data cached in the external primitive SDRAM, space-bisection KD tree data cached in the KD SDRAM and optical data of the initial rendering module (4), and to output the rendering result to the display module (6).
2. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the reconfigurable-layer circuit and the logic-layer circuit coexist in the 3D FPGA chip and form a stacked, tightly connected, complete structure.
3. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the PCIe interface module (1) is used to transfer data between the 3D FPGA chip and the external PCIe bus; its front-end interface is connected to the PCIe bus slot, and external signals enter and leave the chip over the PCIe bus as differential signals; the 3D data of a 3D data file on the PC hard disk is transferred by high-speed serial transmission through the PCIe slot on the PC motherboard to the differential ports of the PCIe interface module of the 3D FPGA chip; and the back-end interface of the PCIe interface module is connected to the 3D filling module (2).
4. The FPGA chip for 3D graphics rendering acceleration according to claim 3, characterized in that the PCIe interface module (1) comprises a PCIe core module and a PCIe application module, wherein the PCIe core module executes the protocol logic of the physical layer, data link layer and transaction layer of the data exchange, and the PCIe application module controls the data exchange between the 3D filling module, space bisection module, 3D rendering module and display module on the one hand and the PC, primitive SDRAM, KD SDRAM and frame buffer SDRAM, located externally or on the external PCIe bus slot, on the other.
5. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the 3D filling module (2) comprises a PCIe read module, a 3D classification module and a primitive SDRAM control module, wherein:
the PCIe read module is used to obtain 3D data from the PCIe interface module (1) and transfer it to the 3D classification module;
the 3D classification module is used to divide the 3D data into two classes: 3D primitive data, which is output to the external primitive SDRAM, and 3D optical data, which includes at least camera, material and light source data and is output to the initial rendering module (4);
the primitive SDRAM control module is used to output the 3D primitive data to the external primitive SDRAM and cache it there.
6. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the space bisection module (3) comprises a KD node stack FILO module, a build-KD module, a bisection module, a cost calculation module, a bisection position FIFO module and a primitive FIFO module, wherein the cost calculation module and the bisection module read the 3D primitive data they each need from the external primitive SDRAM and buffer it in the primitive FIFO module and the bisection position FIFO module, respectively; the bisection module uses the vertices obtained from the bisection position FIFO to generate binary space partition planes for the 3D scene primitives based on axis-aligned bounding boxes (AABBs); the cost calculation module then uses the primitive vertex coordinates obtained from the primitive FIFO to calculate the bisection cost of the AABB at each level of the KD tree (KD Tree); and the KD node at that level is finally bisected with the plane of minimum cost.
7. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the initial rendering module (4) comprises an initial ALU and an initial state machine, wherein:
the initial ALU is used to analyze and process the 3D optical data input by the 3D filling module, to produce the ray origin and picture vertex data required by the 3D rendering module and output them to the 3D rendering module, and at the same time to process the finish signal that the 3D rendering module produces when it finishes rendering each frame of the 3D picture;
the initial state machine is used to produce the start signal that the 3D rendering module needs to render each frame of the 3D picture.
8. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the 3D rendering module (5) comprises a main routing module, a traversal cache module, an enumeration cache module, a puncture cache module, a coloring cache module, a render traversal module, a render enumeration module, a render puncture module and a render coloring module, together with a traversal FIFO module, an enumeration FIFO module and a puncture FIFO module placed between the four rendering modules, wherein:
KD node data from the external KD SDRAM is input to the traversal cache module, primitive data from the external primitive SDRAM is input to the main routing module, and optical data from the initial rendering module is input to the render traversal module;
the pipeline order of 3D rendering is: render traversal module -> traversal FIFO module -> render enumeration module -> enumeration FIFO module -> render puncture module -> puncture FIFO module -> render coloring module; the pipeline follows the direction of ray tracing from the ray origin to the nearest puncture point and repeats in the order direct light -> reflection -> refraction until rendering is complete.
9. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the main routing module comprises a main loop MUX module and a main cache read module, wherein the main loop MUX module selects one of three modules, namely the render enumeration module, the render puncture module and the render coloring module; the main cache read module inputs from the external primitive SDRAM the primitive data required by the selected module; and the main loop MUX module then outputs the input primitive data to the selected module.
10. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the traversal cache module comprises a traversal CAM module, a traversal TAG module and a traversal cache RAM module, wherein the traversal CAM module and the traversal TAG module accept and analyze the requests issued by the traversal pipelines of the render traversal module for the KD node data needed in the traversal calculation, and the required KD node data is then input from the traversal cache RAM module or from the external KD SDRAM.
11. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the render traversal module consists of a traversal routing module, 64 reconfigurable traversal pipeline modules in 1:1 correspondence with the rays, and a traversal floating-point ALU module; KD node data from the traversal cache module is input through the traversal routing module to the traversal pipeline module selected by the traversal routing module, which calls the traversal floating-point ALU module to perform the traversal calculation on the ray and the input KD node and outputs the resulting KD leaf node ID to the traversal FIFO module for subsequent output to the render enumeration module.
12. The FPGA chip for 3D graphics rendering acceleration according to claim 11, characterized in that the pass-through routing module comprises a round-robin selection control module and a pass-through cache read module, wherein the round-robin selection control module selects, in round-robin fashion, one of the 64 reconfigurable per-ray pass-through pipelines to connect to the pass-through cache module, and the pass-through cache read module inputs the required KD node data from the pass-through cache module and delivers it to that pass-through pipeline.
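Round-robin selection among the per-ray pipelines can be sketched as follows. The function name `round_robin_service` and the use of Python lists as per-pipeline request queues are assumptions made for illustration; the claim specifies only that one pipeline at a time is connected to the cache in cyclic order.

```python
def round_robin_service(pending, turns):
    """Serve per-pipeline request queues in cyclic order.

    pending: list of lists, one request queue per pipeline (64 in the claim).
    turns:   number of selection turns to simulate.
    Returns the (pipeline_index, request) pairs in the order they were served.
    """
    served = []
    n = len(pending)
    for turn in range(turns):
        p = turn % n                  # the pipeline connected to the cache this turn
        if pending[p]:                # an idle pipeline simply forfeits its turn
            served.append((p, pending[p].pop(0)))
    return served
```

Fixed cyclic order guarantees that every pipeline is visited once per 64 turns, at the cost of wasted turns when the selected pipeline has nothing to request.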
13. The FPGA chip for 3D graphics rendering acceleration according to claim 11, characterized in that the pass-through pipeline module comprises a pass-through state module, a ray generation module, a pass-through control module, and a pass-through data path module, and performs the pass-through calculation of one ray against one KD node, including the puncture calculation of the ray against the KD node's bounding box and its bisecting plane, continuing through all KD nodes of the KD tree until a KD leaf node is reached; the calculation is carried out by arithmetic logic in the pass-through data path module under the control of a finite state machine in the pass-through control module.
14. The FPGA chip for 3D graphics rendering acceleration according to claim 13, characterized in that the pass-through data path module comprises a pass-through stack module and a pass-through algorithm module, of which the core module is the pass-through algorithm module, which performs the puncture calculation between the ray and the spatial bisecting plane of a KD node's bounding box; when the ray passes through the KD nodes on both sides of the bisecting plane, it continues to the next-level KD node on one side and registers the deferred KD node in the pass-through stack (FILO) module.
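A software analogue of this stack-based pass-through is the standard kd-tree ray traversal. The sketch below is not taken from the patent: the `KDNode` layout, the near-child-first ordering, and the `traverse` signature are assumptions, chosen to show how the deferred node is pushed onto a FILO stack when the ray crosses the bisecting plane.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KDNode:
    axis: int = 0                      # splitting axis: 0 = x, 1 = y, 2 = z
    split: float = 0.0                 # position of the bisecting plane
    left: Optional["KDNode"] = None    # child below the plane
    right: Optional["KDNode"] = None   # child above the plane
    leaf_id: Optional[int] = None      # set only on leaf nodes

def traverse(root, origin, direction, t_min, t_max):
    """Return leaf IDs visited by the ray segment [t_min, t_max], nearest first."""
    leaves = []
    stack = [(root, t_min, t_max)]     # FILO of deferred nodes
    while stack:
        node, t0, t1 = stack.pop()
        while node.leaf_id is None:
            o, d = origin[node.axis], direction[node.axis]
            below = o < node.split or (o == node.split and d <= 0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            if d == 0.0:
                node = near            # ray parallel to the plane: one side only
                continue
            t_split = (node.split - o) / d
            if t_split > t1 or t_split <= 0:
                node = near            # plane not crossed within the segment
            elif t_split < t0:
                node = far             # segment lies entirely beyond the plane
            else:
                stack.append((far, t_split, t1))   # defer the far side
                node, t1 = near, t_split
        leaves.append(node.leaf_id)
    return leaves
```

Descending into the near child first means leaves come out in front-to-back order, which suits a search for the nearest puncture point.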
15. The FPGA chip for 3D graphics rendering acceleration according to claim 13, characterized in that the pass-through data path modules of the pass-through pipeline modules share a single pass-through floating-point ALU module, which comprises a pass-through floating-point ALU control module and a floating-point ALU data path module; it accepts the ALU mode request sent by a pass-through data path module, and the pass-through floating-point ALU control module issues the appropriate control signals so that the floating-point ALU data path module forms the required pass-through ALU formula and completes the pass-through floating-point ALU calculation.
16. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the pass-through FIFO module contains 64 reconfigurable pass-through FIFO submodules, one for the pass-through calculation of each ray; the input of the pass-through FIFO module comes from the render pass-through module of the 3D rendering module and is directed by the pass-through routing module into the pass-through FIFO submodule of the corresponding ray.
17. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the enumeration cache module comprises an enumeration TAG module and an enumeration cache RAM module, wherein the enumeration TAG module accepts and analyzes requests from the render enumeration module for the primitive ID data required by the enumeration pipeline calculation, and the required primitive ID data is then input from the enumeration cache RAM module or from the external primitive SDRAM.
18. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the render enumeration module consists of an enumeration routing module and 64 reconfigurable enumeration pipeline modules in 1:1 correspondence with rays; primitive data from the enumeration cache module is input, via the enumeration routing module, to the enumeration pipeline module selected by the enumeration routing module, which performs the render enumeration calculation and outputs the resulting primitive IDs of the KD leaf node to the enumeration FIFO module, ready for input to the render puncture module.
19. The FPGA chip for 3D graphics rendering acceleration according to claim 18, characterized in that the enumeration routing module comprises a round-robin selection control module and an enumeration cache read module, wherein the round-robin selection control module selects, in round-robin fashion, one of the 64 reconfigurable per-ray enumeration pipelines to connect to the enumeration cache module, and the enumeration cache read module inputs the required primitive data of the KD leaf node from the enumeration cache module and delivers it to that enumeration pipeline.
20. The FPGA chip for 3D graphics rendering acceleration according to claim 18, characterized in that the enumeration pipeline module comprises an enumeration control module and an enumeration data path module, and performs the render enumeration calculation to obtain the primitive IDs in all KD leaf nodes; the calculation is carried out by arithmetic logic in the enumeration data path module under the control of a finite state machine in the enumeration control module.
21. The FPGA chip for 3D graphics rendering acceleration according to claim 20, characterized in that the core module of the enumeration data path module is an enumeration algorithm module, which obtains the primitive IDs in all KD leaf nodes from the primitive data in the primitive SDRAM.
22. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the enumeration FIFO module contains 64 reconfigurable enumeration FIFO submodules, one for the enumeration calculation of each ray; the input of the enumeration FIFO module comes from the render enumeration module of the 3D rendering module and is directed by the enumeration routing module into the enumeration FIFO submodule of the corresponding ray.
23. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the puncture cache module comprises a puncture TAG module and a puncture cache RAM module, wherein the puncture TAG module accepts and analyzes requests from the render puncture module for the primitive vertex data required by the puncture pipeline calculation, and the required primitive vertex data is then input from the puncture cache RAM module or from the external primitive SDRAM.
24. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the render puncture module consists of a puncture routing module, 64 reconfigurable puncture pipeline modules in 1:1 correspondence with rays, and a puncture floating-point ALU module; vertex data from the puncture cache module is input, via the puncture routing module, to the puncture pipeline module selected by the puncture routing module, which performs the render puncture calculation and outputs the resulting primitive ID and puncture point coordinates to the puncture FIFO module, ready for input to the render coloring module.
25. The FPGA chip for 3D graphics rendering acceleration according to claim 24, characterized in that the puncture routing module comprises a round-robin selection control module and a puncture cache read module, wherein the round-robin selection control module selects, in round-robin fashion, one of the 64 reconfigurable per-ray puncture pipelines to connect to the puncture cache module, and the puncture cache read module inputs the required primitive data of the KD leaf node from the puncture cache module and delivers it to that puncture pipeline.
26. The FPGA chip for 3D graphics rendering acceleration according to claim 24, characterized in that the puncture pipeline module comprises a puncture control module and a puncture data path module, and performs the render puncture calculation, which determines, for the ray and the primitives in all KD leaf nodes, whether the ray punctures each primitive and at what position; the calculation is carried out by arithmetic logic in the puncture data path module under the control of a finite state machine in the puncture control module.
27. The FPGA chip for 3D graphics rendering acceleration according to claim 26, characterized in that the core module of the puncture data path module is a puncture algorithm module, which calculates from the primitive vertex data in the primitive SDRAM whether the ray punctures the primitive and the position of the puncture point.
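The claim does not name a particular puncture algorithm or primitive type. As an illustration only, the sketch below assumes triangular primitives and uses the well-known Möller-Trumbore ray-triangle test, which is a substituted technique and not necessarily the one implemented in the chip; all function names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def puncture(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the puncture point of a ray with triangle (v0, v1, v2), or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv                # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv        # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv               # distance along the ray
    if t <= eps:                       # triangle is behind the ray origin
        return None
    return tuple(origin[i] + t * direction[i] for i in range(3))
```

The test needs only additions, subtractions, multiplications, and a single division, which matches the operation set of the floating-point ALU data path described in the later claims.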
28. The FPGA chip for 3D graphics rendering acceleration according to claim 24, characterized in that the puncture floating-point ALU module comprises a puncture floating-point ALU control module and a floating-point ALU data path module; it accepts the ALU mode request sent by the puncture data path module, and the puncture floating-point ALU control module issues the appropriate control signals so that the floating-point ALU data path module forms the required puncture ALU formula and completes the puncture floating-point ALU calculation.
29. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the puncture FIFO module contains 64 reconfigurable puncture FIFO submodules, one for the puncture calculation of each ray; the input of the puncture FIFO module comes from the render puncture module of the 3D rendering module and is directed by the puncture routing module into the puncture FIFO submodule of the corresponding ray.
30. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the coloring cache module comprises a coloring TAG module and a coloring cache RAM module, wherein the coloring TAG module accepts and analyzes requests from the render coloring module for the primitive optical data required by the coloring pipeline calculation, and the required primitive optical data is then input from the coloring cache RAM module or from the external primitive SDRAM.
31. The FPGA chip for 3D graphics rendering acceleration according to claim 8, characterized in that the render coloring module consists of a coloring routing module, 64 reconfigurable coloring pipeline modules in 1:1 correspondence with rays, and a coloring floating-point ALU module; primitive optical data from the coloring cache module is input, via the coloring routing module, to the coloring pipeline module selected by the coloring routing module, which performs the render coloring calculation and outputs the resulting primitive color to the display module.
32. The FPGA chip for 3D graphics rendering acceleration according to claim 31, characterized in that the coloring routing module comprises a round-robin selection control module and a coloring cache read module, wherein the round-robin selection control module selects, in round-robin fashion, one of the 64 reconfigurable per-ray coloring pipelines to connect to the coloring cache module, and the coloring cache read module inputs the required primitive optical data from the coloring cache module and delivers it to that coloring pipeline.
33. The FPGA chip for 3D graphics rendering acceleration according to claim 31, characterized in that the coloring pipeline module comprises a coloring control module and a coloring data path module, and performs the render coloring calculation; the calculation is carried out by arithmetic logic in the coloring data path module under the control of a finite state machine in the coloring control module.
34. The FPGA chip for 3D graphics rendering acceleration according to claim 31, characterized in that the coloring data path module comprises a coloring algorithm module and a coloring stack module, of which the core module is the coloring algorithm module; it uses the primitive optical data input from the primitive SDRAM via the main routing module, the coloring cache module, and the coloring routing module to calculate the optical effect of each primitive hit by a traced ray, that is, the optical color due to direct light, reflection, and refraction; the calculation applies ray tracing repeatedly: starting from the primitive hit under each direct light source, it repeatedly traces the reflection and refraction occurring at each hit point to obtain their contribution to the optical color of the hit primitive.
35. The FPGA chip for 3D graphics rendering acceleration according to claim 31, characterized in that the coloring floating-point ALU module comprises a coloring floating-point ALU control module and a floating-point ALU data path module; it accepts the ALU mode request sent by the coloring data path module, and the coloring floating-point ALU control module issues the appropriate control signals so that the floating-point ALU data path module forms the required coloring ALU formula and completes the coloring floating-point ALU calculation.
36. The FPGA chip for 3D graphics rendering acceleration according to claim 35, characterized in that the floating-point ALU data path module is an ALU computation module shared by the pipeline modules of the three ALU-calling submodules of the 3D rendering module, these three submodules being the render pass-through module, the render puncture module, and the render coloring module; each pipeline module of each rendering submodule supplies the ALU mode required for its individual ALU calculation cycle, and one of the pass-through floating-point ALU control module, the puncture floating-point ALU control module, or the coloring floating-point ALU control module decodes this ALU mode and sends the control signals to the floating-point ALU data path module.
37. The FPGA chip for 3D graphics rendering acceleration according to claim 36, characterized in that the floating-point ALU data path module comprises three floating-point unit modules and a floating-point divider module; the three floating-point unit modules, namely floating-point unit 0, floating-point unit 1, and floating-point unit 2, are each controlled by its own input OP code and perform addition (+), subtraction (-), or multiplication (×) of two 32-bit IEEE 754 floating-point numbers; the floating-point divider module can only perform division (÷) of two 32-bit IEEE 754 floating-point numbers.
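The behaviour of three OP-code-controlled floating-point units can be modelled in a few lines. This is a behavioural sketch only: the numeric OP code encoding and the function names are hypothetical, and single-precision rounding is emulated by round-tripping through a 4-byte IEEE 754 value with `struct`.

```python
import struct

OP_ADD, OP_SUB, OP_MUL = 0, 1, 2   # hypothetical encoding; the claim gives none

def f32(x):
    """Round a Python float to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def floating_point_unit(op, a, b):
    """One unit: add, subtract, or multiply two single-precision operands."""
    a, b = f32(a), f32(b)
    if op == OP_ADD:
        return f32(a + b)
    if op == OP_SUB:
        return f32(a - b)
    if op == OP_MUL:
        return f32(a * b)
    raise ValueError("unsupported OP code: %r" % (op,))

def alu_cycle(ops, operands):
    """One ALU cycle: each of the three units receives its own OP code and operand pair."""
    return [floating_point_unit(op, a, b) for op, (a, b) in zip(ops, operands)]
```

For example, `alu_cycle([OP_MUL, OP_MUL, OP_MUL], ...)` followed by two additions would form a three-component dot product, one plausible instance of the "ALU formula" that the control modules assemble.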
38. The FPGA chip for 3D graphics rendering acceleration according to claim 37, characterized in that the floating-point divider module is composed of three floating-point units and performs division of two 32-bit IEEE 754 floating-point numbers; the circuit structure of these three floating-point units is identical to that of the combination of three floating-point units in the floating-point ALU data path module, and their logical structure is determined by static reconfigurable programming.
39. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the display module (6) converts the data of the rendering result to obtain display object information, caches the display object information in the external frame buffer SDRAM, and outputs the display object information together with the RGB color data input from the external frame buffer SDRAM to a VGA display for local monitoring, while also feeding it back to the PCIe interface module (1) for processing by the host computer.
40. The FPGA chip for 3D graphics rendering acceleration according to claim 39, characterized in that the display module comprises a display state machine module, a VGA control module, a frame buffer control module, and a display PCIe control module, wherein:
the display state machine module controls the data input and behavioral timing of the VGA control module, the display PCIe control module, and the frame buffer control module;
the VGA control module delivers the color data and the USB interface synchronization signal to the external VGA RAMDAC chip and USB interface;
the frame buffer control module delivers the 3D rendering result data to the external frame buffer SDRAM;
the display PCIe control module delivers the 3D rendering result data to the PCIe interface module to be cached in the external frame buffer SDRAM, and at the same time supplies it to the host computer on the external PCIe bus slot for further processing.
41. The FPGA chip for 3D graphics rendering acceleration according to claim 1, characterized in that the FPGA chip exchanges data with an external SDRAM, the external SDRAM comprising a primitive SDRAM, a KD SDRAM, and a frame buffer SDRAM, wherein:
the primitive SDRAM in the external SDRAM is directly connected to the 3D filling module (2) and the space bisection module (3) in the FPGA chip, the KD SDRAM in the external SDRAM is directly connected to the space bisection module (3) and the 3D rendering module (5) in the FPGA chip, and the frame buffer SDRAM in the external SDRAM is directly connected to the display module (6) in the FPGA chip; or
the FPGA chip is connected via an external PCIe bus to the primitive SDRAM, the KD SDRAM, and the frame buffer SDRAM in the external SDRAM.
CN201310560232.8A 2013-11-12 2013-11-12 A kind of fpga chip rendering acceleration for 3D graphics Expired - Fee Related CN103559357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310560232.8A CN103559357B (en) 2013-11-12 2013-11-12 A kind of fpga chip rendering acceleration for 3D graphics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310560232.8A CN103559357B (en) 2013-11-12 2013-11-12 A kind of fpga chip rendering acceleration for 3D graphics

Publications (2)

Publication Number Publication Date
CN103559357A CN103559357A (en) 2014-02-05
CN103559357B true CN103559357B (en) 2016-09-21

Family

ID=50013603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310560232.8A Expired - Fee Related CN103559357B (en) 2013-11-12 2013-11-12 A kind of fpga chip rendering acceleration for 3D graphics

Country Status (1)

Country Link
CN (1) CN103559357B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503950B (en) * 2014-12-09 2017-10-24 中国航空工业集团公司第六三一研究所 A kind of graphics processor towards OpenGL API
CN107967704A (en) * 2016-10-20 2018-04-27 上海复旦微电子集团股份有限公司 A kind of fpga chip domain line display methods
CN107464207B (en) * 2017-07-17 2020-06-02 南京华磊易晶微电子有限公司 3D (three-dimensional) graphics rendering acceleration system based on reconfigurable data stream system chip array
CN113472964B (en) * 2021-06-05 2024-04-16 山东英信计算机技术有限公司 Image processing device and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102246146A (en) * 2008-11-07 2011-11-16 谷歌公司 Hardware-accelerated graphics for web applications using native code modules
CN102835119A (en) * 2010-04-01 2012-12-19 英特尔公司 A multi-core processor supporting real-time 3D image rendering on an autostereoscopic display

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020101425A1 (en) * 2001-01-29 2002-08-01 Hammad Hamid System, method and article of manufacture for increased I/O capabilities in a graphics processing framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
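3D graphics acceleration system based on an ARM+FPGA architecture; Xia Xiaowei, Wu Ning, Liu Jing; 《电子设计应用》; 20091231 (No. 12); p. 49, left column, paragraph 1 to p. 50, middle column, last paragraph, Figs. 1-3 *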
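Kd-tree construction for ray-tracing rendering based on complex scene graphs; Chen Lihua, Wang Yigang; 《计算机应用与软件》 (Computer Applications and Software); 20111031; Vol. 28 (No. 10); p. 235, see abstract and introduction *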

Also Published As

Publication number Publication date
CN103559357A (en) 2014-02-05

Similar Documents

Publication Publication Date Title
CN110176054B (en) Generation of composite images for training neural network models
US10740952B2 (en) Method for handling of out-of-order opaque and alpha ray/primitive intersections
US20200051314A1 (en) Watertight ray triangle intersection
CN103559357B (en) A kind of fpga chip rendering acceleration for 3D graphics
CN104025181B (en) The block based on classification for uncoupling sampling postpones coloring system structure
Liu et al. Multi-layer depth peeling via fragment sort
CN107251098A (en) The true three-dimensional virtual for promoting real object using dynamic 3 D shape is represented
CN105869117A (en) Method for accelerating GPU directed at deep learning super-resolution technology
CN104584082B (en) The stitching of the primitive in graphics process
CN110458905A (en) Device and method for the adaptive tessellation of level
CN101593345A (en) Three-dimensional medical image display method based on the GPU acceleration
CN109923519A (en) For accelerating the mechanism of the graphical Work load in multicore computing architecture
US20210287096A1 (en) Microtraining for iterative few-shot refinement of a neural network
CN101599181A (en) A kind of real-time drawing method of algebra B-spline surface
CN105976345A (en) Visible light remote sensing image synthesis method
CN103500463A (en) Visualization method for multilayer shape feature fusion on GPU (Graphics Processing Unit)
CN109523619A (en) A method of 3D texturing is generated by the picture of multi-angle polishing
CN102915563A (en) Method and system for transparently drawing three-dimensional grid model
Lee et al. Real-time ray tracing on coarse-grained reconfigurable processor
CN107077758A (en) Zero covering rasterisation is rejected
CN107784622A (en) Graphic system and graphics processor
CN106548500A (en) A kind of two-dimension situation image processing method and device based on GPU
Kim et al. A 3D graphics rendering pipeline implementation based on the openCL massively parallel processing
CN103617594B (en) Noise isopleth-surface drawing-oriented multi-GPU (Graphics Processing Unit) rendering parallel-processing device and method thereof
CN103247070A (en) Interactive relighting sense of reality rendering method based on precomputed transfer tensor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: NANJING HUALEI YIJING MICROELECTRONICS CO., LTD.

Free format text: FORMER OWNER: WUXI HUALEI YIJING MICRO-ELECTRICAL CO., LTD.

Effective date: 20150327

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 214043 WUXI, JIANGSU PROVINCE TO: 214106 NANJING, JIANGSU PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20150327

Address after: 701, room 1, building 108, East 214106, Gan Hua Street, Yao street, Qixia District, Jiangsu, Nanjing

Applicant after: Nanjing Hua Leiyi Microelectronics Co., Ltd.

Address before: 214043, No. 401 Xingyuan North Road, Wuxi, Jiangsu, 708A

Applicant before: WUXI HUALEI YIJING MICRO-ELECTRICAL CO., LTD.

C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160921

Termination date: 20171112