CN103995746A - Method of realizing graphic processing in harmonic processor and harmonic processor - Google Patents

Info

Publication number
CN103995746A
Authority
CN
China
Prior art keywords
graphics process
thread
hardware
processor
process controller
Legal status
Pending
Application number
CN201410166054.5A
Other languages
Chinese (zh)
Inventor
丘正前
钟伟
冀谦祥
李晶晶
梅思行
Current Assignee
SHENZHEN ICUBE TECHNOLOGY CORP
Icube Co Ltd
Original Assignee
SHENZHEN ICUBE TECHNOLOGY CORP
Application filed by SHENZHEN ICUBE TECHNOLOGY CORP
Priority to CN201410166054.5A
Publication of CN103995746A
Legal status: Pending (current)

Landscapes

  • Image Generation (AREA)

Abstract

The invention relates to a method for realizing graphics processing in a harmonic processor. The method comprises the following steps: a graphics processing command is executed and software threads for graphics processing are created; each software thread is either allocated the hardware resources needed to execute it or made to wait in a queue until those resources are allocated; a graphics processing pipeline is formed; the software threads in the graphics processing pipeline read the contents of a command register in a graphics processing controller; and, according to the contents obtained, the software threads in the graphics processing pipeline perform the corresponding operations. The invention further relates to the harmonic processor. The method and the harmonic processor have the advantages of high efficiency and resource savings.

Description

Method for realizing graphics processing in a harmonious processor, and harmonious processor
Technical field
The present invention relates to the field of processors, and more particularly to a method for realizing graphics processing in a harmonious processor (UPU, Unified Process Unit) and to a harmonious processor.
Background art
Processors are evolving from single-core to multi-core designs, and their functions are gradually converging. For example, in earlier systems the CPU (central processing unit) and the GPU (graphics processing unit) were separate integrated circuits, physically independent of each other and connected through the north and south bridges. Later came processors that place a CPU and a GPU in one integrated circuit; although they are used in the same way and remain structurally independent, merging the two sets of components into one greatly reduces the circuit-board area they occupy. As multi-core processors became popular, a new kind of processor appeared whose multiple hardware cores can process both the tasks traditionally executed by a CPU and the image-processing tasks traditionally executed by a GPU, without the processor having to distinguish between the two kinds of task in advance. Such a processor is commonly called a UPU, i.e. a harmonious processor.

A common feature of such processors is that they have multiple hardware cores that can run independently. The system or processor software executes instructions and generates software threads. When an idle hardware core is available, a software thread runs on it until the thread completes; when no idle hardware core is available, the software threads wait in a queue. When a software thread finishes, it releases the hardware core it occupied so that a software thread waiting in the queue can use that core. Likewise, these software threads are not classified as belonging to a traditional CPU or a traditional GPU. Graphics processing has special characteristics: its operations are relatively simple and highly repetitive, but its data volume is large, so it takes a long time to process. In addition, some of its fixed functions, such as rasterization and pixel shading, are not suited to the hardware cores and require dedicated hardware; a graphics processing fixed-function module is normally used for them. Because the processor does not distinguish CPU software threads from GPU software threads, when an existing UPU performs such fixed-function graphics processing, the software thread still occupies the hardware core it runs on, even though both the software thread and the hardware core sit idle during that period (for example, during texture processing or rasterization). As a result, processor efficiency is low and resources are wasted.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the above drawbacks of the prior art, namely low efficiency and wasted resources, by providing a high-efficiency, resource-saving method for realizing graphics processing in a harmonious processor, and a harmonious processor.
The technical solution adopted by the present invention to solve this technical problem is to construct a method for realizing graphics processing in a harmonious processor, comprising the following steps:
A) Execute a graphics processing command and create a software thread for graphics processing; allocate to the software thread the hardware resources for executing it, or have it wait in a queue until those resources are allocated; the hardware resources include the hardware core that runs the software thread;
B) Form a graphics processing pipeline comprising the software thread that has obtained the hardware resources and a graphics processing fixed-function module;
C) The software thread in the graphics processing pipeline reads the command register contents in the graphics processing controller (GPUF controller); the graphics processing controller is a hardware structure and is controlled or configured by threads in the harmonious processor;
D) According to the contents obtained, the software thread in the graphics processing pipeline performs the corresponding operation, which is one of: yielding its allocated hardware resources and entering the waiting queue, performing graphics processing, or exiting.
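Steps C) and D) amount to a loop in which the graphics processing thread keeps reading the command register and acting on what it finds. The sketch below models that loop under stated assumptions: the command codes (`Cmd.WAIT`, `Cmd.RENDER_VERTEX`, `Cmd.SHADE_PIXEL`, `Cmd.EXIT`) and the `CommandRegister` class are hypothetical names, and a queue of codes stands in for the hardware register. It illustrates the control flow only and is not the patented implementation.

```python
from collections import deque
from enum import Enum, auto


class Cmd(Enum):
    """Hypothetical command codes; the patent names the operations, not an encoding."""
    WAIT = auto()           # yield the hardware core and wait in the queue
    RENDER_VERTEX = auto()  # perform graphics processing: vertex rendering
    SHADE_PIXEL = auto()    # perform graphics processing: pixel shading
    EXIT = auto()           # release the hardware core and end the thread


class CommandRegister:
    """Toy stand-in for the controller's command register."""

    def __init__(self, codes):
        self._codes = deque(codes)

    def read(self):
        # When the controller has nothing further queued, report EXIT.
        return self._codes.popleft() if self._codes else Cmd.EXIT


def run_graphics_thread(reg):
    """Steps C) and D): read the register, act on the code, repeat until EXIT."""
    actions = []
    while True:
        code = reg.read()                      # step C
        if code is Cmd.EXIT:
            actions.append("exit: release core")
            return actions
        if code is Cmd.WAIT:
            # In hardware the thread leaves its core here and is re-dispatched
            # once woken; this model simply records the yield and continues.
            actions.append("wait: yield core")
        else:
            actions.append(f"process: {code.name}")
```

For example, a register that yields a vertex task, a wait and a pixel task produces four actions, ending with the thread releasing its core.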
Further, performing the graphics processing comprises the following steps:
Rendering the vertices of the graphic and rasterizing them; the graphics processing thread that has finished vertex rendering returns to step C) and reads the command register contents in the graphics processing controller again; or
Shading the pixels of the graphic and processing the shaded pixels with a raster operation unit; the software thread that has finished pixel processing returns to step C) and reads the command register contents in the graphics processing controller again;
Wherein, in the above steps, when texture processing of the graphic is needed, the software thread initiates a texture request to the graphics processing fixed-function module, yields the hardware resources it runs on and enters a waiting state; the graphics processing fixed-function module processes the texture request and wakes the graphics processing thread when the texture data is returned.
Further, the rasterization, raster operations and texture processing are all performed by the graphics processing fixed-function module under the control of the graphics processing controller; the pixel shading is performed by the software thread after it has been woken.
Further, the graphics processing controller is connected to a designated local storage of the hardware core, and the graphics processing controller and the software thread exchange data through the designated local storage; the graphics processing controller exchanges commands with the software thread through a register interface; the graphics processing controller is also connected by a bus to the level-1 (L1) cache of the processor's hardware cores, and the graphics processing controller and the software thread exchange data through the cache system.
Further, the graphics processing controller is also connected to the level-2 (L2) cache in the processor, and the graphics processing controller and the harmonious processor threads exchange instructions or data through the L2 cache and the local storage.
Further, each unit in the graphics processing fixed-function module exchanges instructions or data with the graphics processing thread through the local storage, or through the L2 cache and the local storage.
The invention further relates to a harmonious processor comprising a plurality of hardware cores for running software threads, each hardware core having its own dedicated L1 cache and local storage, and further comprising a graphics processing controller for controlling the graphics processing pipeline formed by the graphics processing thread obtained when the processor executes a graphics processing command and by the graphics processing fixed-function module; the graphics processing controller comprises a command register for storing the commands that control the operation of the graphics processing thread and of the graphics processing fixed-function module; the command register is connected to the local storage of the hardware core.
Further, the graphics processing controller is connected to a designated local storage of the hardware core; the graphics processing controller exchanges commands with the software thread through a register interface; the graphics processing controller is also connected by a bus to the L1 cache of the processor's hardware cores.
Further, the graphics processing fixed-function module is an independent hardware structure comprising a rasterization unit, a texture unit and a raster operation unit; the graphics processing controller reads the command register and drives the rasterization unit, the texture unit and the raster operation unit according to the command contents read.
Further, the graphics processing controller is also connected to the L2 cache of the harmonious processor's hardware cores; the software thread configures a DMA channel of a hardware core to connect that core's L2 cache with its local storage.
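As a minimal sketch of the controller driving the fixed-function units according to the command register, the function below looks up each command code in a table of unit callbacks. The `(code, payload)` entry format, the code names and `drive_units` itself are hypothetical; the claim does not define an encoding, and real units are hard-wired rather than looked up in a dictionary.

```python
from typing import Callable, Dict, List, Tuple


def drive_units(commands: List[Tuple[str, object]],
                units: Dict[str, Callable[[object], object]]) -> List[object]:
    """Read command-register entries in order and drive the unit each one names.

    `units` maps a hypothetical command code (e.g. "RASTERIZE", "TEXTURE",
    "RASTER_OP") to a callback standing in for that fixed-function unit.
    """
    results = []
    for code, payload in commands:
        unit = units.get(code)
        if unit is None:
            raise ValueError(f"unknown command code {code!r}")
        results.append(unit(payload))
    return results
```

A table lookup keeps the controller logic independent of how many units exist; adding a unit means adding one entry.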
Implementing the method for realizing graphics processing in a harmonious processor and the harmonious processor of the present invention has the following beneficial effects. Because a harmonious processor, by its nature, does not distinguish traditional CPU tasks from traditional GPU tasks at the processor level, a graphics processing thread may at times leave its hardware resources idle during graphics processing, since it must wait for the graphics fixed-function processing module to return the processed data. For lack of a suitable control device, in the prior art this situation lowers the utilization of the whole processor's hardware resources, which reduces overall processor efficiency and wastes hardware resources that are very valuable to the processor. In the present invention, a graphics processing controller is added, and the pipeline formed by the graphics processing thread and the hardware resources takes the corresponding action under the control of this controller; specifically, the UPU thread reads what is stored in the command register of the graphics processing controller to determine the concrete operation of the pipeline. When necessary, the graphics processing thread enters a waiting state and releases the hardware resources it occupies so that they can be used to process other threads; after the graphics data has been processed, the graphics processing thread runs again on an idle hardware resource. Hardware resources are thus used as fully as possible, giving higher efficiency and saving resources.
Brief description of the drawings
Fig. 1 is a flowchart of the graphics processing method in an embodiment of the method for realizing graphics processing in a harmonious processor, and of the harmonious processor, according to the present invention;
Fig. 2 is a detailed flowchart of graphics data processing in the embodiment;
Fig. 3 is a structural block diagram of the harmonious processor in the embodiment;
Fig. 4 is a more detailed structural diagram of the processor in Fig. 3;
Fig. 5 is a structural diagram of a processor with 4 hardware cores in the embodiment.
Embodiment
Embodiments of the present invention are further described below with reference to the accompanying drawings.
As shown in Fig. 1, in the embodiment of the method for realizing graphics processing in a harmonious processor, and of the harmonious processor, of the present invention, the graphics processing method comprises the following steps:
Step S11: create a software thread for graphics processing and allocate hardware resources to it, or make it wait for them. In this embodiment, the harmonious processor (UPU, Unified Process Unit) used for illustration comprises 4 hardware cores and can run 4 UPU threads simultaneously. All software (or tasks) of this UPU, for example the operating system, applications and GPU shaders, is executed on these UPU hardware cores. At the processor level, the UPU does not distinguish different types of task (it cannot tell whether a task is one traditionally executed by a CPU or one traditionally executed by a GPU). In this embodiment, software creates a UPU thread by configuring registers in the UPU thread controller. A UPU can have 4 UPU threads running at any time, and another 4 threads can wait in the thread controller's queue. When a running thread exits and frees a hardware core, a waiting thread is immediately dispatched to the idle hardware core for execution. Here, a UPU thread (UPU Thread) is the combination of a software thread and a hardware core, and the software thread may be a traditional CPU thread, a GPU thread or any other program. When the software thread in a UPU thread is a GPU rendering program, the UPU thread is equivalent to a GPU rendering thread; it can also be called a shader thread, because what it runs is a vertex shader or a pixel shader. When a UPU thread runs a CPU program, it is equivalent to a CPU thread. In this step, when software needs to draw with the GPU, it sends drawing commands to the GPU driver, which maintains a command queue. When there are commands in this queue, the GPU driver configures the registers of the UPU thread controller to create a GPU rendering thread (which can be regarded as the combination of a software thread and a hardware core, the core being obtained by allocation or after waiting in the queue), and sends the drawing commands to the graphics processing fixed-function module. In this embodiment, the graphics processing fixed-function module comprises the graphics processing controller, a rasterization unit, a texture unit and a raster operation unit. The graphics processing controller controls the operation of the rasterization unit, the texture unit, the raster operation unit and the software thread of the GPU rendering thread. Specifically, the graphics processing controller comprises a command register; the threads and the units exchange instructions or data by reading and writing the contents of this command register, which is how these functions are controlled.
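The scheduling behaviour described in step S11 (4 cores, up to 4 further threads waiting, and immediate dispatch of a waiting thread when a core is freed) can be modelled as below. `ThreadController`, its method names, FIFO ordering of the queue and the error raised when the queue is full are assumptions made for illustration; the description does not specify them.

```python
from collections import deque


class ThreadController:
    """Toy model of the UPU thread controller in step S11 (illustrative only).

    Assumptions: NUM_CORES cores each run at most one UPU thread, and up to
    QUEUE_DEPTH further threads wait in first-in, first-out order.
    """

    NUM_CORES = 4
    QUEUE_DEPTH = 4

    def __init__(self):
        self.cores = [None] * self.NUM_CORES   # None marks an idle hardware core
        self.waiting = deque()

    def create_thread(self, name):
        """Software configures the controller to create a UPU thread."""
        for i, occupant in enumerate(self.cores):
            if occupant is None:
                self.cores[i] = name           # idle core: run immediately
                return f"core{i}"
        if len(self.waiting) >= self.QUEUE_DEPTH:
            raise RuntimeError("thread queue full")
        self.waiting.append(name)              # no idle core: wait in the queue
        return "queued"

    def exit_thread(self, name):
        """A running thread exits; its core goes to the oldest waiting thread."""
        i = self.cores.index(name)
        self.cores[i] = self.waiting.popleft() if self.waiting else None
        return self.cores[i]
```

Note that the controller never distinguishes CPU threads from shader threads: names such as `"os"` or `"vs0"` are opaque to it, matching the UPU's task-agnostic scheduling.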
Step S12: form the graphics processing pipeline and start processing graphics data. Because the software thread from the previous step has been assigned a hardware core, either directly because an idle core was available or after waiting in the queue, the graphics processing pipeline is formed in this step. It comprises the software thread that has obtained the hardware resource (the idle hardware core), for example a shader or shading program running on a UPU hardware core, and the graphics processing fixed-function module.
Step S13: the software thread reads the command register in the graphics processing controller. In this step, the software thread in the graphics processing pipeline reads the command register contents in the graphics processing controller. In this embodiment, the graphics processing controller is a hardware structure that is controlled or configured by threads in the harmonious processor, and the same applies to the contents of the command register. Because UPU threads are managed by the UPU thread controller (the thread controller at the processor level), the graphics processing controller cannot directly create shaders on a UPU core. As described above, the GPU driver creates the shader; once created, the shader reads the interface register between the UPU core and the graphics processing fixed-function module to learn which task it should perform. This embodiment uses a wait-wakeup mechanism so that the graphics processing controller can schedule shaders. The situations that may arise are as follows: the GPU rendering thread (software thread) queries the command register of the graphics processing controller to learn which task it needs to perform; when there is no task to perform, the graphics processing controller places a command code meaning "wait" in the command register, and when the GPU rendering thread reads this code it exits the UPU hardware core (releasing the core) and waits in the thread controller's queue; when there is vertex or pixel rendering to do, the graphics processing controller places a code representing the vertex or pixel rendering task in the command register and, if any rendering threads are waiting in the thread controller's queue, wakes them; when all rendering tasks are complete, the graphics processing controller places a command code meaning "exit" in the command register, and the GPU rendering thread exits when it reads this command (ending the software thread and releasing the hardware core).
Step S14: the software thread performs the corresponding operation according to the contents obtained. In this step, the corresponding operation is performed according to the command register contents obtained in the previous step. Graphics processing also involves some more specific steps, which are described in detail later.
In this embodiment, while the software thread runs, steps S13 and S14 repeat continuously: after step S14 is executed, the thread returns to step S13, fetches the next code from the command register, executes step S14 again and performs the operation corresponding to the newly fetched code, until the fetched code is "exit", at which point the software thread releases the hardware resources it runs on and exits.
Fig. 2 shows an example of the specific steps of graphics processing in this embodiment. In Fig. 2, the graphics processing comprises:
Step S21: render the vertices. In this embodiment, steps S21 to S24 are the steps that actually process the graphics. In this step, the vertices of the graphic are rendered; the software thread that has finished vertex rendering queries the command register again and decides its next action according to the register contents. If the command register now specifies another operation, in this embodiment the hardware resources on which this software thread runs are used to perform that operation; if the command register now contains "wait", the graphics processing thread that has finished vertex rendering enters the waiting state under the control of the graphics processing controller and yields the hardware resources it runs on. In the prior art, vertices are also rendered, but a graphics processing thread exits after rendering each vertex, and the graphics processing controller must create another rendering thread to execute the next graphics processing task. This causes frequent create-exit operations, and because all creation is centralized in the graphics processing controller, the system overhead becomes large when many rendering threads need to be created. This step is specific to the present embodiment: the graphics processing controller places the task that the software thread should perform next in the command register in advance, and after finishing a task the software thread reads the command register and performs the next task. This combination of software and hardware implements distributed task scheduling and reduces the system overhead of task scheduling. The freed hardware resources (hardware cores) are used by other threads waiting in the queue, which greatly improves processor efficiency. Specifically, when the graphics processing controller makes the software thread wait or exit, the software thread releases its hardware resources, and the released resources can be used to run other software threads; these are not limited to graphics processing threads and may be other threads, such as CPU threads or threads of other applications. In short, in this embodiment, after the software thread finishes a vertex rendering task there are two ways to make use of the freed hardware resources: the thread queries the command register again, and the command register may direct it to perform another task; or, if the command register tells it to wait or exit, the thread releases its hardware resources and other software threads can then use them.
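The overhead argument in step S21 can be made concrete by counting thread-creation operations. The toy function below is a sketch under two assumptions not stated in the description: in the prior-art scheme every task costs exactly one thread creation, and in the persistent scheme tasks are handed to threads in round-robin order. `run_tasks` is a hypothetical helper.

```python
def run_tasks(num_tasks, num_threads, persistent):
    """Return (thread creations, tasks handled by each created thread).

    persistent=False models the prior art: the controller creates a fresh
    rendering thread per task, and that thread exits after one task.
    persistent=True models the embodiment: `num_threads` threads are created
    once and each keeps reading its next task from the command register.
    """
    if not persistent:
        return num_tasks, [1] * num_tasks
    handled = [0] * num_threads
    for task in range(num_tasks):
        handled[task % num_threads] += 1   # assumed round-robin register reads
    return num_threads, handled
```

With 10 tasks and 4 threads, the prior-art scheme performs 10 creations while the persistent scheme performs 4, each thread handling 2 or 3 tasks.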
Step S22: rasterize the rendered vertex data. In this step, the rendered graphics vertices obtained in the previous step are rasterized. Notably, rasterization is performed by the rasterization unit, which is called under the control of the graphics processing controller. This operation does not involve the software thread or the hardware cores; that is, this step involves only the graphics processing fixed-function module.
Step S23: shade the generated pixels. Because rasterization in the previous step continuously produces pixels, in this step the continuously generated pixels of the graphic are shaded; after a pixel has been shaded, the raster operation unit processes the shaded pixel. In this step, pixel shading is performed by the software thread. When pixels need to be shaded, the graphics processing controller places a command meaning "pixel shading" in the command register; the software thread reads the command and performs pixel shading, and after finishing it reads the command register again and decides its next operation according to the contents obtained, as described in step S21.
In steps S21 and S23, besides what is described above, texture requests may also be generated, i.e., the vertices or pixels obtained in these steps require texture processing. In this case, the software thread first forms a texture processing request, sends it to the graphics processing fixed-function module, and then yields its hardware resources and waits; the yielded hardware resources can be used to run other software threads under the control of the UPU thread controller. The graphics processing controller then calls the texture processing unit in the graphics processing fixed-function module to perform texture processing. When the texture data is returned (texture processing is complete), the processed graphics data is returned to the graphics processing thread and the thread is woken. Note that texture requests do not always occur; whether they do depends on the specific situation.
Step S24: perform raster operations on the processed pixels. In this step, raster operations are performed on the pixel data processed above.
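The benefit of yielding the core during a texture fetch can be illustrated with a toy timing model of a single hardware core. The model assumes, for illustration only, abstract cycle counts, one CPU thread already waiting in the queue, and no preemption (a running thread keeps its core until it waits or exits). `core_timeline` is a hypothetical helper, not something defined in the description.

```python
def core_timeline(before, latency, after, cpu, yield_on_texture):
    """Return what one hardware core does in each cycle.

    A shader shades for `before` cycles, issues a texture request whose data
    returns `latency` cycles later, then shades for `after` more cycles.
    A CPU thread needing `cpu` cycles is waiting in the queue.
    """
    timeline = ["shader"] * before
    if yield_on_texture:
        # Embodiment: the shader yields; the queued CPU thread takes the core.
        timeline += ["cpu"] * cpu
        # If the texture data is not back yet, the core idles until the
        # controller wakes the shader; otherwise the woken shader waits for
        # the CPU thread to release the core.
        timeline += ["idle"] * max(0, latency - cpu)
        timeline += ["shader"] * after
    else:
        # Prior art: the shader keeps the core while waiting for texture data.
        timeline += ["stalled"] * latency
        timeline += ["shader"] * after
        timeline += ["cpu"] * cpu
    return timeline
```

With `before=2, latency=10, after=2, cpu=6`, the prior-art schedule takes 20 cycles (10 of them stalled) while the yielding schedule takes 14 (4 idle), because the CPU thread runs inside the texture latency.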
Note that in this embodiment, for ease of description, steps S21 to S24 are described in a particular order. In practice, however, the command obtained from the command register may correspond to any one of these steps. For example, if the command obtained in step S13 is pixel shading, only pixel shading needs to be performed on the current data, so executing step S14 actually means executing step S23, without executing steps S21, S22 or S24. In other words, each execution of step S14 may perform any one of steps S21 to S24, or a combination of several of them.
Overall, taking a shader (shading) thread processing graphics data as an example, the processing in this embodiment roughly comprises several stages. Each stage is carried out by a different hardware or software unit, but all of these operations are performed under the control of the graphics processing controller. When there is vertex rendering to do, the graphics processing controller places a code representing a vertex rendering task in the command register; after the GPU rendering thread fetches this command it starts executing the vertex shading program, and after completing the vertex rendering task it writes an acknowledgement command back to the graphics processing controller (this acknowledgement is likewise written to the command register) to notify the controller that the command has been completed. When all vertices of a geometric primitive (or geometric figure) have been rendered, for example when all three vertices of a triangle have been rendered, the graphics processing controller tells or directs the rasterization unit to start rasterizing the rendered vertex data. When performing these steps in this embodiment, the graphics processing controller calls the rasterization unit, the texture unit and the raster operation unit directly; in the hardware design, the graphics processing controller is directly wired to these three units. The rasterization unit then performs the rasterization task. When pixels are produced, the graphics processing controller places a "pixel shading" command in the command register, and the GPU rendering thread performs the pixel shading task after fetching this command. When the GPU rendering thread completes the pixel shading task, it writes an acknowledgement command back to the graphics processing controller (again written to the command register); the graphics processing controller then notifies the raster operation unit, which starts processing these pixels.
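The acknowledgement handling described above can be sketched as follows: the controller counts vertex acknowledgements per triangle and starts rasterization once all three have arrived, and it forwards each shaded pixel to the raster operation unit. The class, its method names and the use of callbacks in place of the hard-wired units are illustrative assumptions.

```python
from collections import defaultdict


class PipelineController:
    """Toy model of the controller's acknowledgement handling (illustrative).

    Triangles are identified by an id; each has three vertices.
    """

    VERTICES_PER_TRIANGLE = 3

    def __init__(self, rasterize, raster_op):
        self._rasterize = rasterize   # stands in for the rasterization unit
        self._raster_op = raster_op   # stands in for the raster operation unit
        self._acked = defaultdict(set)

    def ack_vertex(self, triangle, vertex):
        """A rendering thread reports that one vertex has been shaded."""
        self._acked[triangle].add(vertex)   # a set ignores duplicate acks
        if len(self._acked[triangle]) == self.VERTICES_PER_TRIANGLE:
            del self._acked[triangle]
            self._rasterize(triangle)       # all vertices done: rasterize

    def ack_pixel(self, pixel):
        """A rendering thread reports that a pixel has been shaded."""
        self._raster_op(pixel)
```

Because acknowledgements are tracked per triangle, vertices of different triangles may complete in any interleaving, which fits the description of several rendering threads working concurrently.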
In addition, both the vertex shader and the pixel shader may need to initiate texture requests: the shader writes the texture request command to the command register and then waits. When the graphics processing controller receives this command, it drives the texture unit to process the texture request. When the texture unit has returned all the texture data for the texture requests issued by a given rendering thread, it sends a signal to the graphics processing controller, and the graphics processing controller wakes the waiting shader. When an idle hardware thread is available, the UPU thread controller dispatches the woken shading thread to run on it.
In this embodiment, CPU programs and GPU shaders can both use the L1 cache; every program running on any UPU core can use the L1 cache. In this embodiment, commands are transmitted between the UPU cores and the graphics processing fixed-function module through the register interface. Besides commands, much other GPU data is also transferred between the UPU cores and the graphics processing fixed-function module. A UPU core can exchange data through the interface that connects it directly to the graphics processing fixed-function module, as follows: 1) the GPU driver stores the information of the drawing commands in local storage, and the graphics processing fixed-function module reads the local storage directly to obtain this data; 2) the shader stores its rendering results in local storage, and the graphics processing fixed-function module reads the local storage directly to obtain this data; 3) the rasterization unit writes information about the generated pixels to local storage, and the shader reads this data directly from local storage; 4) the shader writes texture coordinates to local storage, the graphics processing fixed-function module reads the local storage to obtain the texture coordinates, the texture unit writes the returned texture data to local storage, and the shader reads the local storage to obtain this data, and so on. This direct interface between the UPU cores and the graphics processing fixed-function module is shared by commands and data.
The UPU cores and the graphics processing fixed-function module can also exchange data through the L2 cache. For all of the data described above: 1) if the data is produced by the graphics processing fixed-function module, the module can first write it to the L2 cache; the software thread then initiates a DMA to transfer the data from the L2 cache to local storage and reads it from local storage, or the software thread reads the L1 cache directly to obtain the data; 2) if the data is produced inside a UPU core, the UPU thread can write it to the L1 cache and the graphics processing fixed-function module then reads the L2 cache; the cache system handles storage coherence, retrieving the latest data from the L1 cache and returning it to the graphics processing fixed-function module; 3) the software thread can also write data to local storage and transfer it from local storage to the L2 cache by DMA, after which the graphics processing fixed-function module reads the L2 cache.
In addition, in this embodiment, local storage and the cache system share a unified address space, and which method the UPU cores and the graphics processing fixed-function module use to exchange data depends on which addresses the program uses. Data exchanged between the UPU cores and the graphics processing fixed-function module, as well as data used only by the UPU cores or only by the graphics processing fixed-function module, can all use the L2 cache. For example, the CPU can cache its data in the L1 and L2 caches, and the graphics processing fixed-function module can use the L2 cache to cache data such as textures, depth, stencil and color. Furthermore, whether data is cached can be controlled by software.
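Because local storage and the cache system share one address space, the address alone decides which path an access takes. The following sketch assumes a hypothetical address map (the description gives no concrete addresses) and a per-region, software-set cacheable flag, and returns the storage levels a read consults, in order. `Region`, `ADDRESS_MAP` and `access_path` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    cached: bool  # software decides whether the region goes through the caches

    def contains(self, addr):
        return self.base <= addr < self.base + self.size


# Hypothetical unified address map: the patent gives no concrete addresses.
ADDRESS_MAP = (
    Region("local_storage", 0x0000_0000, 0x0001_0000, cached=False),
    Region("dram_cached", 0x8000_0000, 0x4000_0000, cached=True),
    Region("dram_uncached", 0xC000_0000, 0x4000_0000, cached=False),
)


def access_path(addr, requester):
    """Storage levels a read by `requester` ('core' or 'gpuf') consults, in order."""
    if requester not in ("core", "gpuf"):
        raise ValueError(f"unknown requester {requester!r}")
    region = next((r for r in ADDRESS_MAP if r.contains(addr)), None)
    if region is None:
        raise ValueError(f"unmapped address {addr:#x}")
    if region.name == "local_storage":
        return ["local_storage"]          # direct core/module interface
    if not region.cached:
        return ["memory"]                 # software marked it uncacheable
    if requester == "core":
        return ["L1", "L2", "memory"]
    # The fixed-function module reads through L2; per the description, the
    # cache system fetches the newest copy from L1 if a core has modified it.
    return ["L2", "memory"]
```

In this model the same program code works for either exchange method; only the address a buffer is placed at changes the route, which is the point the description makes about unified addressing.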
Fig. 3 shows the structure of a harmonious processor according to this embodiment. As shown in Fig. 3, the processor includes multiple hardware cores for running software threads, and the L1 cache and local memory are shared by all hardware cores. Physically, in this embodiment the processor's L1 cache and local memory are each a single unit that is not divided into parts; logically, however, configuration or allocation can give each hardware core its own L1 cache and local memory. The processor further includes the graphics threads obtained when it executes graphics processing commands, and a graphics processing fixed-function module. The fixed-function module includes a graphics processing controller that controls the graphics processing pipeline (formed by the graphics threads and the hardware cores). The graphics processing controller includes a command register that holds the commands controlling both the operation of the graphics threads and the operation of the fixed-function module. The command register can be read and written both by the hardware core (the core on which the graphics thread runs) and by the graphics processing controller. Physically, the command register is split into two parts, one located in the hardware core and one in the fixed-function module; the hardware core and the fixed-function module exchange commands over a dedicated, directly connected interface. As described earlier, this dedicated interface carries data as well as commands. In this embodiment, the fixed-function module consists of a graphics processing fixed-function unit and the graphics processing controller. The fixed-function unit is an independent hardware structure comprising a rasterization unit, a texture unit, and a raster operation unit. Each of these units is connected to the command register, receives the commands in it, and performs the corresponding operations on the specified data.
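The split command register and the controller's dispatch to the three fixed-function units can be sketched as follows. The bit layout (two low bits select the unit, the remaining bits carry an argument), the unit IDs and all names are hypothetical; the patent names the units but does not define a command encoding.

```python
from dataclasses import dataclass
from enum import IntEnum


class Unit(IntEnum):
    # Hypothetical unit IDs; the source names the three units but not an encoding.
    RASTER = 0
    TEXTURE = 1
    ROP = 2


@dataclass
class CommandRegister:
    """Split command register: one half sits in the hardware core, the other in the
    fixed-function module, kept in sync over the dedicated interface."""
    core_half: int = 0
    ff_half: int = 0

    def core_write(self, value: int) -> None:
        self.core_half = value
        self.ff_half = value  # models the transfer over the dedicated interface


def encode(unit: Unit, arg: int) -> int:
    # Hypothetical layout: bits [1:0] select the unit, the higher bits carry an argument.
    return (arg << 2) | int(unit)


class GraphicsController:
    def __init__(self, reg: CommandRegister) -> None:
        self.reg = reg
        self.log = []

    def step(self) -> None:
        # Read the fixed-function half of the register and drive the selected unit.
        cmd = self.reg.ff_half
        unit, arg = Unit(cmd & 0b11), cmd >> 2
        handler = {
            Unit.RASTER: self.rasterize,
            Unit.TEXTURE: self.texture,
            Unit.ROP: self.raster_op,
        }[unit]
        handler(arg)

    def rasterize(self, n: int) -> None:
        self.log.append(("raster", n))

    def texture(self, n: int) -> None:
        self.log.append(("texture", n))

    def raster_op(self, n: int) -> None:
        self.log.append(("rop", n))
```

A software thread would call `reg.core_write(encode(Unit.TEXTURE, 5))`, after which `GraphicsController.step()` routes the command to the texture unit.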
In addition, the graphics processing controller is connected to the L2 cache of the harmonious processor's hardware cores and, through the L2 cache, to the cores' local memory.
Fig. 4 refines the structure of Fig. 3 and mainly shows how data and instructions are stored and transferred in the harmonious processor. In Fig. 4, the connection labeled 1 is the general-purpose bus: when the fixed-function module accesses DDRx memory, cacheable data go through the L2 cache, while uncacheable data go directly to DDRx memory. The connection labeled 2 is an internal bus between the UPU core and the fixed-function module that carries data and commands. The register interface modules in the UPU and in the fixed-function module each contain a command register: UPU threads read and write the command register in the UPU, and the control logic of the graphics processing controller reads and writes the one in the fixed-function module. This allows commands to be exchanged between the software threads and the controller of the fixed-function module, and lets the control logic inside the graphics processing controller drive the three fixed-function units (the rasterization unit, the texture unit, and the raster operation unit) according to the contents of the command register.
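The bus-1 behaviour described for Fig. 4 (cacheable accesses go through L2, uncacheable ones go straight to DDRx) can be sketched as below. The per-access `cacheable` flag stands in for the software control of cacheability mentioned earlier; the class name and the DDRx read counter are hypothetical and only serve to make the two paths observable.

```python
class FixedFunctionMemoryPort:
    """Toy model of the fixed-function module's path to DDRx memory: cacheable
    accesses go through the L2 cache, uncacheable ones go straight to DDRx.
    The per-access `cacheable` flag is an assumed stand-in for software control."""

    def __init__(self, ddr: dict) -> None:
        self.ddr = ddr
        self.l2 = {}
        self.ddr_reads = 0

    def read(self, addr: int, cacheable: bool):
        if cacheable:
            if addr not in self.l2:  # L2 miss: fill the line from DDRx once
                self.l2[addr] = self.ddr[addr]
                self.ddr_reads += 1
            return self.l2[addr]
        self.ddr_reads += 1  # bypass: every access goes to DDRx
        return self.ddr[addr]
```

Repeated cacheable reads of the same depth or texture line touch DDRx only once, while uncacheable reads pay the DDRx access every time and leave L2 untouched.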
Note that, for convenience, this embodiment has been described throughout in terms of a single hardware core. This is only for ease of description and does not mean that the processor has only one hardware core. In fact, the processor in this embodiment has 4 hardware cores (see Fig. 5). Each core implements graphics processing in the same way as described in this embodiment, and the cores operate in parallel; that is, every core can carry out the method of this embodiment at the same time. For example, in this embodiment the harmonious processor runs 4 hardware threads but has 8 software threads in total. Of these 8 software threads, 4 are running and 4 are waiting; some of the waiting threads may already be ready to run, and as soon as a running software thread enters the waiting state, a ready thread can run immediately. The GPUF controller manages this pipeline so that when one software thread waits, the hardware resources it releases are immediately taken up by another thread. This is one advantage of the harmonious processor in this embodiment. In some cases the processor may have 1, 2, 4, or 8 cores; this scalability follows from the method and the processor structure of this embodiment. Because the method is self-contained, does not depend on any particular hardware core, and uses shared storage, the number of hardware cores is easy to scale.
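The 4-hardware-thread / 8-software-thread example can be illustrated with a small slot scheduler. This is a sketch only: FIFO refill is an assumed policy, the class and method names are hypothetical, and the patent does not describe the GPUF controller's scheduling algorithm in this detail.

```python
from collections import deque


class HardwareThreadPool:
    """Toy model of the scheduling described for the GPUF controller: a fixed
    number of hardware thread slots shared by a larger set of software threads.
    FIFO refill is an assumed policy; the source does not specify one."""

    def __init__(self, n_hw: int) -> None:
        self.slots = [None] * n_hw  # software thread ID occupying each hardware thread
        self.ready = deque()        # ready to run, waiting for a free slot
        self.waiting = set()        # blocked, e.g. on a texture fetch

    def _refill(self) -> None:
        # Hand every free slot to the oldest ready thread.
        for i, occupant in enumerate(self.slots):
            if occupant is None and self.ready:
                self.slots[i] = self.ready.popleft()

    def submit(self, tid) -> None:
        self.ready.append(tid)
        self._refill()

    def block(self, tid) -> None:
        # A running thread enters the wait state and releases its slot at once.
        self.slots[self.slots.index(tid)] = None
        self.waiting.add(tid)
        self._refill()

    def wake(self, tid) -> None:
        # A woken thread becomes ready and runs as soon as a slot frees up.
        self.waiting.remove(tid)
        self.ready.append(tid)
        self._refill()

    def exit(self, tid) -> None:
        # A finished thread releases its slot and leaves the pool.
        self.slots[self.slots.index(tid)] = None
        self._refill()

    def running(self):
        return [t for t in self.slots if t is not None]


if __name__ == "__main__":
    pool = HardwareThreadPool(n_hw=4)
    for tid in range(8):  # 8 software threads, 4 hardware threads
        pool.submit(tid)
    print(pool.running())  # [0, 1, 2, 3]
    pool.block(1)          # thread 1 waits; thread 4 takes its slot immediately
    print(pool.running())  # [0, 4, 2, 3]
```

The point the sketch isolates is that a slot never stays idle while a ready software thread exists, which is the advantage claimed for the harmonious processor.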
The embodiments above express only several implementations of the invention and are described in relatively specific detail, but they should not therefore be construed as limiting the scope of the claims of the invention. A person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (10)

1. A method for implementing graphics processing in a harmonious processor, characterized by comprising the following steps:
A) executing a graphics processing command, creating a software thread for graphics processing, and either allocating to the software thread the hardware resources for executing it or placing it in a queue to wait for those resources to be allocated; the hardware resources include the hardware core that runs the software thread;
B) forming a graphics processing pipeline comprising the software threads that have obtained the hardware resources and a graphics processing fixed-function module;
C) the software threads in the graphics processing pipeline reading the contents of a command register in a graphics processing controller; the graphics processing controller is a hardware structure and is controlled or configured by threads in the harmonious processor;
D) according to the contents obtained, the software threads in the graphics processing pipeline performing the corresponding operations, which include: giving up the allocated hardware resources and entering the wait queue, performing graphics processing, or releasing their execution resources and exiting.
2. The method for implementing graphics processing in a harmonious processor according to claim 1, characterized in that performing graphics processing comprises the following steps:
rendering the vertices of a graphic and performing rasterization, wherein the graphics thread that has completed vertex rendering returns to step C) and reads the command register contents in the graphics processing controller again; or
shading the pixels of the graphic and processing the shaded pixels with the raster operation unit, wherein the software thread that has completed pixel processing returns to step C) and reads the command register contents in the graphics processing controller again;
wherein, in the above steps, when texture processing of the graphic is required, the software thread issues a texture request to the graphics processing fixed-function module, gives up the hardware resources it is running on, and enters the waiting state; the fixed-function module processes the texture request and wakes the graphics thread when the texture data are returned.
3. The method for implementing graphics processing in a harmonious processor according to claim 2, characterized in that the rasterization, raster operations, and texture processing are all performed by the graphics processing fixed-function module under the control of the graphics processing controller, and pixel shading is performed by the software thread after it has been woken.
4. The method for implementing graphics processing in a harmonious processor according to claim 3, characterized in that the graphics processing controller is connected to a designated local memory of the hardware core, and the graphics processing controller and the software thread exchange data through the designated local memory; the graphics processing controller exchanges commands with the software thread through a register interface; and the graphics processing controller is further connected to the L1 cache of the processor's hardware core through a bus, the graphics processing controller and the software thread exchanging data through the cache system.
5. The method for implementing graphics processing in a harmonious processor according to claim 3, characterized in that the graphics processing controller is further connected to the L2 cache of the processor, and the graphics processing controller and threads of the harmonious processor exchange instructions or data through the L2 cache and local memory.
6. The method for implementing graphics processing in a harmonious processor according to claim 3, characterized in that each unit in the graphics processing fixed-function module exchanges instructions or data with the graphics threads either through the local memory or through the L2 cache and local memory.
7. A harmonious processor comprising a plurality of parallel hardware cores for running software threads, each hardware core including its own dedicated L1 cache and local memory, characterized by further comprising a graphics processing controller for controlling the graphics processing pipeline formed by a graphics processing fixed-function module and the graphics threads obtained when the processor executes graphics processing commands; the graphics processing controller includes a command register for holding the commands that control the operation of the graphics threads and of the graphics processing fixed-function module; and the command register is connected to the local memory of the hardware core.
8. The harmonious processor according to claim 7, characterized in that the graphics processing controller is connected to a designated local memory of the hardware core; the graphics processing controller exchanges commands with the software threads through a register interface; and the graphics processing controller is further connected to the L1 cache of the processor's hardware cores through a bus.
9. The harmonious processor according to claim 8, characterized in that the graphics processing fixed-function module is an independent hardware structure comprising a rasterization unit, a texture unit, and a raster operation unit; and the graphics processing controller reads the command register and drives the rasterization unit, texture unit, and raster operation unit according to the commands read.
10. The harmonious processor according to claim 9, characterized in that the graphics processing controller is further connected to the L2 cache of the harmonious processor's hardware cores; and a DMA channel of a hardware core, configured by the software thread, connects the L2 cache of that core to the local memory of that core.
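The per-thread control flow of claims 1 and 2 (read the command register in step C, then yield, process, or exit in step D, with a texture request putting the thread to sleep until the data return) can be sketched as a Python generator, where each `yield` marks a point at which the thread gives up its hardware core. This is an illustrative model, not the claimed implementation: the `Cmd` encoding, the `read_command` callback and the trace strings are all hypothetical.

```python
from enum import Enum, auto


class Cmd(Enum):
    # Hypothetical command encoding for the command register.
    VERTEX = auto()  # render vertices, then hand off to rasterization
    PIXEL = auto()   # shade pixels (needs a texture fetch), then raster operations
    YIELD = auto()   # give up the hardware core and join the wait queue
    EXIT = auto()    # release execution resources and terminate


def graphics_thread(tid, read_command, trace):
    """Generator modelling one software thread through steps C and D of claim 1.
    Each `yield` is a point where the thread gives up its hardware core;
    a scheduler resumes it later by calling next() again."""
    while True:
        cmd = read_command()  # step C: read the command register
        if cmd is Cmd.VERTEX:
            trace.append((tid, "vertices rendered, sent to rasterizer"))
        elif cmd is Cmd.PIXEL:
            trace.append((tid, "texture request issued"))
            yield "wait_texture"  # claim 2: sleep until the texture data return
            trace.append((tid, "pixels shaded, sent to ROP"))
        elif cmd is Cmd.YIELD:
            yield "wait_queue"  # step D: give up the core, enter the wait queue
        elif cmd is Cmd.EXIT:
            trace.append((tid, "exited"))
            return  # step D: release resources and exit
        # after vertex or pixel work, loop back to step C


if __name__ == "__main__":
    commands = iter([Cmd.VERTEX, Cmd.PIXEL, Cmd.EXIT])
    trace = []
    waits = list(graphics_thread(0, lambda: next(commands), trace))
    print(waits)  # ['wait_texture']
```

Driving the generator with `list()` stands in for a scheduler that resumes the thread immediately; a real controller would resume it only when the texel data or a free hardware slot became available.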
CN201410166054.5A 2014-04-24 2014-04-24 Method of realizing graphic processing in harmonic processor and harmonic processor Pending CN103995746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410166054.5A CN103995746A (en) 2014-04-24 2014-04-24 Method of realizing graphic processing in harmonic processor and harmonic processor

Publications (1)

Publication Number Publication Date
CN103995746A true CN103995746A (en) 2014-08-20

Family

ID=51309920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410166054.5A Pending CN103995746A (en) 2014-04-24 2014-04-24 Method of realizing graphic processing in harmonic processor and harmonic processor

Country Status (1)

Country Link
CN (1) CN103995746A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101802789A (en) * 2007-04-11 2010-08-11 Apple Inc. Parallel runtime execution on multiple processors
CN102147722A (en) * 2011-04-08 2011-08-10 Shenzhen ICube Technology Corp Multithreading processor realizing functions of central processing unit and graphics processor and method
CN102750132A (en) * 2012-06-13 2012-10-24 Shenzhen ICube Technology Corp Thread control and call method for multithreading virtual assembly line processor, and processor
CN103064657A (en) * 2012-12-26 2013-04-24 Shenzhen ICube Technology Corp Method and device for achieving multi-application parallel processing on single processors
US20130321436A1 (en) * 2012-06-04 2013-12-05 Adobe Systems Inc. Method and apparatus for unifying graphics processing unit computation languages
CN103617088A (en) * 2013-11-29 2014-03-05 Shenzhen ICube Technology Corp Method, device and processor of device for distributing core resources in different types of threads of processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20180309