CN115344307A

CN115344307A - Hardware circuit implementation method for rapid multi-target multi-task allocation

Info

Publication number: CN115344307A
Application number: CN202210831288.1A
Authority: CN
Inventors: 钱家祥; 石小刚; 黄光新
Original assignee: Zhihua Microelectronics Technology Nanjing Co ltd
Current assignee: Zhihua Microelectronics Technology Nanjing Co ltd
Priority date: 2022-07-15
Filing date: 2022-07-15
Publication date: 2022-11-15

Abstract

The invention discloses a hardware circuit implementation method for rapid multi-target multi-task allocation. The method comprises: constructing an allocator circuit module that locks the allocation result of the current target task; constructing a data transmission control circuit module that generates write data signals from the acquired allocation result signals and the request signals of the geometric acceleration modules (PCV); constructing a read address generation circuit module that enables a counter according to the acquired read data enable signal and outputs a read data address; constructing a write address generation circuit module; and transmitting the multi-target tasks simultaneously until data transmission is finished, after which the next round of data is scheduled. By rapidly distributing the result data generated by 4 geometric acceleration modules (PCV) to 4 rasterization modules, the hardware circuit can complete the allocation of multiple tasks to multiple targets in one clock cycle; once allocation is finished, data transmission channels are established automatically, and the multiple tasks of the multiple targets can transmit data simultaneously.

Description

Hardware circuit implementation method for rapid multi-target multi-task allocation

Technical Field

The invention relates to the technical field of computer graphic processor design, in particular to a hardware circuit implementation method for rapid multi-target multi-task allocation.

Background

OpenGL-based general-purpose graphics processors mainly serve to accelerate OpenGL. Rendering 3D graphics with OpenGL requires many pipeline stages.

Each pipeline stage must complete a large number of parallel computing tasks and generates a large amount of data, which is sent to the next pipeline stage for further computation. The next stage likewise has many parallel circuits to process the data, so the multi-target multi-task allocation problem arises frequently.

Therefore, how to distribute the large amount of parallel data generated by multiple upstream parallel pipelines quickly and efficiently to multiple downstream parallel pipelines, so that downstream pipelines are not starved of data and computing resources are not wasted because upstream data cannot be distributed, has become a research hotspot in the industry.

Disclosure of Invention

The invention aims to provide a hardware circuit implementation method for rapid multi-target and multi-task allocation, which solves the problems in the prior art by rapidly allocating result data generated by 4 geometric acceleration modules PCV to 4 rasterization modules. In order to achieve the purpose, the invention provides the following technical scheme:

a hardware circuit implementation method for rapid multi-target multi-task allocation comprises the following steps:

constructing an allocator circuit module, which allocates a target task to one rasterizer (raster) according to the request signals of the geometric acceleration modules (PCV) in the hardware circuit; after that allocation is finished, the request already allocated to the current rasterizer is excluded, and the allocation result of the current target task is locked and output;

constructing a data transmission control circuit module, which latches the obtained allocation result of the current target task, synchronously generates allocation result signals, and then generates write data signals according to the allocation result signals and the PCV request signals;

constructing a read address generation circuit module, which latches the response signal to the geometric acceleration module PCV as the read data enable signal of that PCV, enables a counter according to the read data enable signal, and outputs a read data address;

constructing a write address generation circuit module, which latches the response signal issued when the rasterizer is allocated the current target task, generates a write enable signal, and outputs a write data address;

and, once the target task allocation transmission channels have been established, transmitting the multi-target tasks simultaneously until data transmission is finished, and then scheduling the next round of data.

As an improvement of the hardware circuit implementation method for fast multi-target multi-task allocation, the allocator circuit module is composed of four cascaded level-1 allocators, implemented as follows:

firstly, according to the geometric acceleration module PCV request signal req[3:0] and the ready signal rdy[3:0] of the rasterizers (raster) in the hardware circuit, the following are generated: the response signal ack_req[3:0] to the requests, the response signal ack_rdy[3:0] to the ready signals, and the allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre for allocating target tasks to the rasterizers;

secondly, each level-1 allocator priority-encodes the response signal ack_req[3:0] of the request to generate an encoding result pri_code, which encodes the geometric acceleration module PCV to be allocated;

if rdyx is high, the rasterizer currently being allocated is ready, and pri_code is selected; otherwise, the rasterizer is not ready and needs no task, so 4'd0 is selected and this level-1 allocator makes no allocation; finally, a level-1 allocation result arb_o is generated;

thirdly, the level-1 allocation result arb_o is registered for output, and its OR-reduction generates the response signal ack_rdy_o to the data receiving end; arb_o is then inverted bitwise and ANDed bitwise with the input PCV request signal req[3:0], so that the already allocated request is excluded and the result serves as the allocation request for the next level;

finally, the states of the PCV request signal req[3:0] and the rasterizer ready signal rdy[3:0] for the current allocation are detected, and an lk_en signal is generated to latch the allocation result and the related signals.
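As a non-limiting illustration, the cascade described above can be modeled behaviorally in Python. This is a sketch under stated assumptions rather than the circuit itself: the function names priority_encode, level1_allocate and allocate are hypothetical; pri_code and arb_o are assumed to be one-hot 4-bit masks; each level is assumed to encode the request vector passed down from the previous level; the priority encoder is assumed to favor the lowest-numbered PCV; the cascade is assumed to serve raster 0 first; and ack_req is assumed to be the set of requests removed by the cascade. The output registers and the lk_en latch are omitted.

```python
def priority_encode(req: int) -> int:
    """One-hot mask of the lowest set bit of a 4-bit request vector (0 if none).

    Assumption: the lowest-numbered PCV has the highest priority.
    """
    return req & -req & 0xF


def level1_allocate(req_in: int, rdy_x: int):
    """Model of one level-1 allocator serving a single raster.

    Returns (arb_o, ack_rdy_o, req_next).
    """
    pri_code = priority_encode(req_in)
    # Select pri_code when the raster is ready, otherwise 4'd0 (no allocation).
    arb_o = pri_code if rdy_x else 0
    # OR-reduction of arb_o: high when this raster was given a task.
    ack_rdy_o = int(arb_o != 0)
    # Bitwise NOT of arb_o ANDed with the request removes the granted request.
    req_next = req_in & ~arb_o & 0xF
    return arb_o, ack_rdy_o, req_next


def allocate(req: int, rdy: int):
    """Four cascaded level-1 allocators (assumed order: raster 0 first).

    Returns (arb, ack_req, ack_rdy), where arb[x] is the one-hot PCV
    selection for raster x.
    """
    arb = []
    ack_rdy = 0
    remaining = req & 0xF
    for x in range(4):
        arb_o, ack, remaining = level1_allocate(remaining, (rdy >> x) & 1)
        arb.append(arb_o)
        ack_rdy |= ack << x
    # Assumption: a request is acknowledged when the cascade removed it.
    ack_req = req & ~remaining & 0xF
    return arb, ack_req, ack_rdy
```

For example, with req = 4'b1011 and rdy = 4'b0101, this model assigns PCV0 to raster 0 and PCV1 to raster 2, and leaves the request of PCV3 pending for the next round.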

As an improvement of the hardware circuit implementation method for fast multi-target multi-task allocation, the specific implementation mode of generating the write data signal by the data transmission control circuit module is as follows:

firstly, according to the target task allocation result output by the allocator circuit module, the data transmission control circuit module latches the four allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre and generates four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;

secondly, the four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst are used to select among the data signals rdata3-0 of the geometric acceleration modules PCV;

finally, four write data signals wdata3-0 are generated, wherein the write data signals wdata3-0 are the write data buses to the 4 rasterizers (raster).
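As a non-limiting illustration, the selection step can be sketched as an AND-OR multiplexer in Python. The function name select_wdata is hypothetical, each arbX_rlst is assumed to be a one-hot 4-bit mask naming the PCV granted to raster X, and a raster with no grant is assumed to see zero on its write data bus; none of these details is stated in the description.

```python
def select_wdata(arb_rlst, rdata):
    """Route PCV read data to raster write data buses.

    arb_rlst[x] -- assumed one-hot mask of the PCV allocated to raster x
    rdata[i]    -- data word currently returned by PCV i
    Returns wdata, where wdata[x] is the word driven to raster x.
    """
    wdata = []
    for sel in arb_rlst:
        word = 0
        for i in range(4):
            # AND-OR mux: pass rdata[i] only when select bit i is set.
            if (sel >> i) & 1:
                word |= rdata[i]
        wdata.append(word)
    return wdata
```

Because each raster has its own selector, all four buses are driven in the same cycle, which is what allows the allocated pairs to transfer data simultaneously.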

As an improvement of the hardware circuit implementation method for fast multi-target multi-task allocation, the specific implementation mode of the read address generation circuit module for outputting the read data address is as follows:

firstly, the generated request response signal ack_req[3:0] is latched to output the read data enable signals ren3-0, wherein ren3-0 are the read data requests issued in response to the PCV request signal req[3:0]; when an enable is valid, data transmission starts;

secondly, the read data enable signals ren3-0 respectively enable 4 counters; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the read enable signal is deasserted, completing one data transmission, wherein attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;

finally, four read data addresses are output based on the value of each counter.
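As a non-limiting illustration, one of the four read address channels can be modeled cycle by cycle in Python. The class name ReadAddrGen and its tick method are hypothetical. The model assumes that ack_req is a one-cycle pulse, that the counter starts from 0, and that the counter value is used directly as the read address, so addresses 0 through attr_num are issued; whether the actual circuit issues attr_num or attr_num + 1 addresses is not stated in the description.

```python
class ReadAddrGen:
    """Cycle-level model of one read address channel (one PCV)."""

    def __init__(self, attr_num: int):
        self.attr_num = attr_num
        self.ren = 0   # latched read data enable
        self.cnt = 0   # attribute counter, used as the read address

    def tick(self, ack_req: int):
        """Advance one clock edge; return (ren, raddr) after the edge."""
        if self.ren:
            if self.cnt == self.attr_num:
                # Last attribute reached: clear counter, drop the enable.
                self.cnt = 0
                self.ren = 0
            else:
                self.cnt += 1
        elif ack_req:
            # Latch the request response as the read data enable.
            self.ren = 1
        return self.ren, self.cnt
```

With attr_num = 2, an ack_req pulse produces the sequence (ren, raddr) = (1, 0), (1, 1), (1, 2), after which the enable drops and the channel waits for the next allocation.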

As an improvement of the hardware circuit implementation method for fast multi-target multi-task allocation, the specific implementation mode that the write address generating circuit module generates the write enable signal and outputs the write data address is as follows:

firstly, the response signal ack_rdy[3:0] to the ready signal is latched to generate four signals wen0_pre/wen1_pre/wen2_pre/wen3_pre, which are delayed by one beat to generate wen3-0, wherein wen3-0 are the write enables to the rasterizers; when a write enable is valid, the corresponding rasterizer starts to receive data. The response signal ack_rdy[3:0] acknowledges that a target task has been allocated to a rasterizer (raster); when it is valid, the corresponding rasterizer has been allocated a task and can start to receive the allocated data;

secondly, the latched response signal ack_rdy[3:0] of the ready signal is delayed by one clock cycle to generate the write data enable signals wen3-0;

thirdly, the signals wen0_pre/wen1_pre/wen2_pre/wen3_pre generated by latching the response signal respectively enable 4 counters; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the write enable signal is deasserted, completing one data transmission, wherein attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;

and finally, delaying the values of the four counters by one clock cycle and outputting four write data addresses.
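As a non-limiting illustration, one write address channel can be modeled in Python in the same cycle-by-cycle manner, with the one-clock delay applied to both the enable and the address. The class name WriteAddrGen and its tick method are hypothetical, ack_rdy is assumed to be a one-cycle pulse, and the counter is assumed to start from 0. The description does not state why the outputs are delayed; a plausible reason, offered here only as an assumption, is to align the write side with read data that returns one cycle after the read address.

```python
class WriteAddrGen:
    """Cycle-level model of one write address channel (one raster)."""

    def __init__(self, attr_num: int):
        self.attr_num = attr_num
        self.wen_pre = 0  # latched ack_rdy (pre-delay write enable)
        self.cnt = 0      # attribute counter (pre-delay address)
        self.wen = 0      # write enable, one cycle behind wen_pre
        self.waddr = 0    # write address, one cycle behind cnt

    def tick(self, ack_rdy: int):
        """Advance one clock edge; return (wen, waddr) after the edge."""
        # Delay registers capture the pre-delay values of the last cycle.
        self.wen, self.waddr = self.wen_pre, self.cnt
        if self.wen_pre:
            if self.cnt == self.attr_num:
                # Last attribute reached: clear counter, drop the enable.
                self.cnt = 0
                self.wen_pre = 0
            else:
                self.cnt += 1
        elif ack_rdy:
            self.wen_pre = 1
        return self.wen, self.waddr
```

With attr_num = 2, an ack_rdy pulse yields (wen, waddr) = (0, 0), (1, 0), (1, 1), (1, 2), (0, 0), that is, the same address sequence as a read channel shifted by one cycle.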

Compared with the prior art, the invention has the following beneficial effects:

By rapidly distributing the result data generated by the 4 geometric acceleration modules PCV to the 4 rasterization modules, the invention enables the hardware circuit for fast multi-target multi-task allocation in a general-purpose graphics processor to complete the allocation of multiple tasks to multiple targets in a single clock cycle. Once allocation is finished, data transmission channels are established automatically between the tasks and the targets, and the multiple targets and tasks can transmit data simultaneously.

Drawings

FIG. 1 is a schematic diagram of a peripheral operating environment of a hardware circuit for fast multi-object multi-task allocation according to an embodiment of the present invention;

FIG. 2 is an abstract circuit diagram of a hardware circuit for fast distribution of the results of 4 PCV modules to 4 raster modules according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the implementation principle of the allocator circuit module according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the implementation of a level-1 allocator according to an embodiment of the present invention;

FIG. 5 is a schematic block diagram of the data transmission control circuit module according to an embodiment of the present invention;

FIG. 6 is a schematic block diagram of the read address generation circuit module according to an embodiment of the present invention;

FIG. 7 is a schematic block diagram of the write address generation circuit module according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.

In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.

The present invention will be described in further detail with reference to the accompanying drawings, but the present invention is not limited thereto.

As an aid to understanding the present invention, the hardware circuit for fast multi-target multi-task allocation in a general-purpose graphics processor is mainly used to rapidly allocate the 4 rasterization modules (raster) to the 4 geometric acceleration modules (PCV). The signals involved are as follows:

the signal attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;

the signal req indicates the data transmission requests of the 4 modules PCV3-0;

the signal ack_req indicates the response to the corresponding request; once the response signal is valid, the corresponding data transmission request has been allocated and data transmission can proceed;

the signal raddr3-0 is the attribute address for reading the data output by PCV3-0 and is the address bus for data transmission;

the signal ren3-0 is the read data request to PCV3-0 and is the read data enable signal for one data transfer; when it is valid, data transmission starts;

the signal rdata3-0 is the read data returned by PCV3-0 and is the data bus for data transmission;

the signal rdy indicates whether the 4 rasterization modules (raster) can receive data;

the signal ack_rdy is the response to the allocation of tasks to the 4 rasterization modules (raster); once it is valid, the corresponding rasterization module has been allocated a task and may begin to receive the allocated data;

the signal waddr3-0 is the attribute address written to the 4 raster modules and is the data transmission address bus on the raster side;

the signal wen3-0 is the write enable for the 4 rasters; when it is valid, the corresponding raster begins receiving data;

the signal wdata3-0 is the write data bus to the 4 rasters.

Referring to FIG. 1, as an embodiment of the present invention, a hardware circuit implementation method for fast multi-target multi-task allocation is provided, which includes:

Constructing an allocator circuit module, which allocates a target task to one rasterizer (raster) according to the request signals of the geometric acceleration modules (PCV) in the hardware circuit; after that allocation is finished, the request already allocated to the current rasterizer is excluded, and the allocation result of the current target task is locked and output.

As shown in FIG. 2, the main idea of the allocator circuit module is to allocate a task to one raster first and then move on to the next raster. For each raster, priority encoding finds the first pending request; that PCV request is then removed, and the remaining requests serve as the requests for the next allocation, until all tasks have been allocated or all targets have been allocated.

Based on this technical concept, the allocator circuit module is composed of four cascaded level-1 allocators, as shown in FIGS. 3-4, and is implemented as follows:

first, according to the geometric acceleration module PCV request signal req[3:0] and the rasterizer (raster) ready signal rdy[3:0] in the hardware circuit, the following are generated: the request response signal ack_req[3:0] (ack_req indicates the response to the corresponding request; once it is valid, the corresponding data transmission request has been allocated and data transmission can proceed); the response signal ack_rdy[3:0] to the ready signal (ack_rdy acknowledges the allocation of tasks to the 4 rasterization modules; once it is valid, the corresponding rasterization module has been allocated a task and can begin to receive the allocated data); and the allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre for allocating target tasks to the rasterizers, the 4 arb signals being the allocation result of the target tasks;

secondly, each level-1 allocator priority-encodes the response signal ack_req[3:0] of the request to generate an encoding result pri_code, which encodes the geometric acceleration module PCV to be allocated; if rdyx is high, the rasterizer currently being allocated is ready, and pri_code is selected; otherwise, the rasterizer is not ready and needs no task, so 4'd0 is selected; finally, a level-1 allocation result arb_o is generated, whose value is the code of the PCV allocated to the current raster;

thirdly, the level-1 allocation result arb_o is registered for output, and its OR-reduction generates the response signal ack_rdy_o to the data receiving end; arb_o is then inverted bitwise and ANDed bitwise with the input PCV request signal req[3:0], so that the already allocated request is excluded and the result serves as the allocation request for the next level;

finally, the states of the PCV request signal req[3:0] and the rasterizer ready signal rdy[3:0] for the current allocation are detected, and an lk_en signal is generated to latch the allocation result and the related signals. The following is also required:

and constructing a data transmission control circuit module, latching the acquired distribution result of the current target task, synchronously generating a distribution result signal, and generating a write data signal according to the acquired distribution result signal and a geometric acceleration module PCV request signal.

Based on the above technical concept, as shown in fig. 5, it can be understood that the specific implementation manner of the data transmission control circuit module generating the write data signal is as follows:

finally, four write data signals wdata3-0 are generated, wherein the write data signals wdata3-0 are write data buses corresponding to 4 rasters. The following is also required:

constructing a read address generation circuit module, which latches the PCV (geometric acceleration module) request signal to output a read data enable signal, enables a counter according to the acquired read data enable signal, and outputs a read data address.

Based on the above technical concept, as shown in fig. 6, the specific implementation of the read address generation circuit module outputting the read data address is as follows:

first, the generated request response signal ack_req[3:0] is latched (ack_req indicates the response to the corresponding request; once it is valid, the corresponding data transmission request has been allocated and data transmission can proceed) to output the read data enable signals ren3-0 (ren3-0 are the read data requests to PCV3-0 and serve as the read data enable for one data transfer; when an enable is valid, data transmission starts);

secondly, the read data enable signals ren3-0 respectively enable 4 counters; when the value of a counter equals the attribute number signal attr_num (attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes), the counter is cleared and one data transmission is completed;

finally, four read data addresses are output based on the values of the four counters. The following is also required:

and constructing a write address generating circuit module, latching the response signal of the raster receiving the current target task distribution, generating a write enable signal and outputting a write data address.

Based on the above technical concept, as shown in fig. 7, the write address generating circuit block is implemented in the same idea as the read address generating circuit block, that is,

first, the response signal ack_rdy[3:0] to the ready signal is latched (ack_rdy acknowledges the allocation of tasks to the 4 rasterization modules; once it is valid, the corresponding rasterization module has been allocated a task and can begin to receive the allocated data) to generate four signals wen0_pre/wen1_pre/wen2_pre/wen3_pre, which are delayed by one beat to generate wen3-0, wherein wen3-0 are the write enables to the rasterizers; when a write enable is valid, the corresponding rasterization module begins to receive data;

next, the latched results wen0_pre/wen1_pre/wen2_pre/wen3_pre of the response signal ack_rdy[3:0] are delayed by one clock cycle to generate the write data enable signals wen3-0;

thirdly, the latched results wen0_pre/wen1_pre/wen2_pre/wen3_pre respectively enable 4 counters; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the write enable signal is deasserted, completing one data transmission, wherein attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;

And after the establishment of the target task distribution transmission channel is finished, simultaneously transmitting the multi-target tasks until the data transmission is finished, and then scheduling the data in the next round.

Based on the above circuit implementation, the hardware circuit for fast multi-target multi-task allocation in a general-purpose graphics processor can complete the allocation of multiple targets and multiple tasks in one beat. Once allocation is finished, data transmission channels are established automatically between the tasks and the targets, and the multiple targets and tasks can transmit data simultaneously. The large amount of parallel data generated by multiple upstream parallel pipelines is thus distributed quickly and efficiently to multiple downstream parallel pipelines, which avoids the situation in which untimely allocation leaves upstream data undistributed, starves the downstream pipelines of data, and wastes computing resources.

While there have been shown and described the fundamental principles and essential features of the invention and advantages thereof, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof; the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.

Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only. Those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims

1. A hardware circuit implementation method for rapid multi-target multi-task allocation, characterized by comprising the following steps:

constructing an allocator circuit module, which allocates a target task to one rasterizer (raster) according to the request signals of the geometric acceleration modules (PCV) in the hardware circuit; after that allocation is finished, the request already allocated to the current rasterizer is excluded, and the allocation result of the current target task is locked and output;

constructing a data transmission control circuit module, which latches the obtained allocation result of the current target task, synchronously generates allocation result signals, and then generates write data signals according to the allocation result signals and the PCV (geometric acceleration module) request signals;

constructing a read address generation circuit module, which latches the response signal to the geometric acceleration module PCV as the read data enable signal of that PCV, enables a counter according to the read data enable signal, and outputs a read data address;

constructing a write address generation circuit module, which latches the response signal issued when the rasterizer is allocated the current target task, generates a write enable signal, and outputs a write data address;

2. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1, wherein the allocator circuit module is composed of four cascaded level-1 allocators, implemented as follows:

firstly, according to the geometric acceleration module PCV request signal req[3:0] and the ready signal rdy[3:0] of the rasterizers in the hardware circuit, generating the request response signal ack_req[3:0], the response signal ack_rdy[3:0] to the ready signal, and the allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre for allocating target tasks to the rasterizers;

secondly, each level-1 allocator priority-encodes the response signal ack_req[3:0] of the request to generate an encoding result pri_code, which encodes the geometric acceleration module PCV to be allocated;

if rdyx is high, the rasterizer currently being allocated is ready, and pri_code is selected; otherwise, the rasterizer is not ready and needs no task, so 4'd0 is selected and this level-1 allocator makes no allocation; finally, a level-1 allocation result arb_o is generated;

3. The hardware circuit implementation method for fast multi-objective and multi-task allocation according to claim 1 or 2, wherein the specific implementation manner of the data transmission control circuit module generating the write data signal is as follows:

finally, four write data signals wdata3-0 are generated, wherein the write data signals wdata3-0 are write data buses to 4 rasterizers raster.

4. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1 or 2, wherein the specific implementation manner of the read address generation circuit module outputting the read data address is as follows:

firstly, latching the generated request response signal ack_req[3:0] and outputting the read data enable signals ren3-0, wherein ren3-0 are the read data requests issued in response to the request signal of the geometric acceleration module PCV; when an enable is valid, data transmission starts;

secondly, the read data enable signals ren3-0 respectively enable 4 counters; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the read enable signal is deasserted, completing one data transmission, wherein attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;

5. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1 or 2, wherein the specific implementation manner of the write address generation circuit module generating the write enable signal and outputting the write data address is as follows:

firstly, the response signal ack_rdy[3:0] to the ready signal is latched to generate four signals wen0_pre/wen1_pre/wen2_pre/wen3_pre, wherein the signals wen3-0 are the write enables to the rasterizers; when a write enable is valid, the corresponding rasterizer starts to receive data; the response signal ack_rdy[3:0] acknowledges that a target task has been allocated to a rasterizer (raster); when it is valid, the corresponding rasterizer has been allocated a task and can start to receive the allocated data;

secondly, the latched response signal ack_rdy[3:0] of the ready signal is delayed by one clock cycle to generate the write data enable signals wen3-0;

thirdly, the latched results wen0_pre/wen1_pre/wen2_pre/wen3_pre of the response signal respectively enable 4 counters; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the write enable signal is deasserted, completing one data transmission, wherein attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;