CN115344307A - Hardware circuit implementation method for rapid multi-target multi-task allocation - Google Patents


Publication number
CN115344307A
Authority
CN
China
Prior art keywords
signal
data
allocation
request
pcv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210831288.1A
Other languages
Chinese (zh)
Inventor
钱家祥
石小刚
黄光新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhihua Microelectronics Technology Nanjing Co ltd
Original Assignee
Zhihua Microelectronics Technology Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhihua Microelectronics Technology Nanjing Co ltd filed Critical Zhihua Microelectronics Technology Nanjing Co ltd
Priority to CN202210831288.1A priority Critical patent/CN115344307A/en
Publication of CN115344307A publication Critical patent/CN115344307A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a hardware circuit implementation method for rapid multi-target multi-task allocation. The method comprises: constructing an allocator circuit module that locks the allocation result of the current target task; constructing a data transmission control circuit module that generates write data signals from the acquired allocation result signals and the request signals of the geometry acceleration modules (PCV); constructing a read address generation circuit module that enables a counter according to the acquired read data enable signal and outputs a read data address; constructing a write address generation circuit module; and transmitting the multi-target tasks simultaneously until data transmission finishes, then scheduling the next round of data. By rapidly allocating the result data generated by the 4 geometry acceleration modules PCV to the 4 rasterization modules, the hardware circuit completes the allocation of multiple tasks to multiple targets in one clock cycle (one beat). Once allocation finishes, data transmission channels are established automatically between the tasks and the targets, and the multiple targets and tasks can transmit data simultaneously.

Description

Hardware circuit implementation method for rapid multi-target multi-task allocation
Technical Field
The invention relates to the technical field of computer graphic processor design, in particular to a hardware circuit implementation method for rapid multi-target multi-task allocation.
Background
OpenGL-based general-purpose graphics processors mainly serve to accelerate OpenGL. Rendering 3D graphics with OpenGL requires a pipeline of many stages.
Each pipeline stage completes a large number of parallel computing tasks and generates a large amount of data, which is sent to the next stage for further computation. The next stage likewise has many parallel circuits to process that data, so the multi-target multi-task allocation problem arises frequently.
How to allocate the large volume of parallel data produced by several upstream parallel pipelines quickly and efficiently to several downstream parallel pipelines has therefore become a research focus in the industry. The goal is to avoid the situation in which data produced upstream cannot be allocated in time, the downstream pipelines are starved of data, and computing resources are wasted.
Disclosure of Invention
The invention aims to provide a hardware circuit implementation method for rapid multi-target multi-task allocation that solves the problems of the prior art by rapidly allocating the result data generated by 4 geometry acceleration modules PCV to 4 rasterization modules. To achieve this purpose, the invention provides the following technical solution:
A hardware circuit implementation method for rapid multi-target multi-task allocation comprises the following steps:
constructing an allocator circuit module, which allocates a target task to one rasterizer raster according to the request signals of the geometry acceleration modules PCV in the hardware circuit, removes the task already allocated to the current rasterizer raster once that allocation finishes, and locks and outputs the allocation result of the current target task;
constructing a data transmission control circuit module, which latches the acquired allocation result of the current target task, synchronously generates the allocation result signals, and then generates the write data signals according to the allocation result signals and the PCV request signals;
constructing a read address generation circuit module, which latches the response signal to the geometry acceleration module PCV as that module's read data enable signal, enables a counter according to the read data enable signal, and outputs a read data address;
constructing a write address generation circuit module, which latches the response signal that the rasterizer raster receives when the current target task is allocated, generates a write enable signal, and outputs a write data address;
and, once the target task allocation transmission channels have been constructed, transmitting the multi-target tasks simultaneously until data transmission finishes, and then scheduling the next round of data.
As an improvement of the hardware circuit implementation method for rapid multi-target multi-task allocation, the allocator circuit module is a cascade of four level-1 allocators, implemented as follows:
firstly, according to the geometry acceleration module PCV request signals req[3:0] and the ready signals rdy[3:0] that the rasterizers raster present for target task allocation, the circuit generates the request response signals ack_req[3:0], the ready response signals ack_rdy[3:0], and the allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre of the rasterizers;
secondly, each level-1 allocator priority-encodes the response signal ack_req[3:0] of the request to generate an encoding result pri_code, which is the code of the geometry acceleration module PCV to be allocated;
if rdyx is high, the rasterizer raster currently being allocated is ready, and pri_code is selected; otherwise the rasterizer is not ready, no task needs to be allocated, 4'd0 is selected, and the level-1 allocator allocates nothing; this yields the level-1 allocation result arb_o;
thirdly, the level-1 allocation result arb_o is output through a register, and its OR-reduction generates the response signal ack_rdy_o to the data receiving end; arb_o is then inverted bitwise and ANDed bitwise with the input PCV request signal req[3:0], which excludes the already allocated request and forms the allocation request for the next level;
finally, the states of the PCV request signals req[3:0] and the rasterizer ready signals rdy[3:0] at the time of allocation are detected, and an lk_en signal is generated to latch the allocation result and the related signals.
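The cascade of four level-1 allocators can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented circuit: the function names are hypothetical, the grant code is taken to be one-hot (which is what the bitwise invert-and-AND step suggests), the lowest-numbered PCV is assumed to have the highest priority, and each stage is assumed to priority-encode the requests still pending at its input. The lk_en latching is left out.

```python
def priority_encode(req: int) -> int:
    """One-hot mask of the lowest set bit in a 4-bit request vector (0 if none).

    Assumption: PCV0 has the highest priority; the source does not state the order.
    """
    return req & -req & 0xF


def allocate(req: int, rdy: int):
    """Behavioral model of four cascaded level-1 allocators, one per raster.

    req: 4-bit PCV request vector, rdy: 4-bit raster ready vector.
    Returns (arb, ack_req, ack_rdy), where arb[i] is the one-hot PCV code
    granted to raster i, or 0 when raster i receives nothing.
    """
    remaining = req & 0xF
    arb = []
    ack_rdy = 0
    for raster in range(4):
        ready = (rdy >> raster) & 1
        # Select pri_code when the raster is ready, otherwise 4'd0.
        arb_o = priority_encode(remaining) if ready else 0
        # OR-reduction of arb_o gives the response to this raster.
        if arb_o != 0:
            ack_rdy |= 1 << raster
        # Invert arb_o and AND it with the requests: the granted request
        # is excluded from what the next level sees.
        remaining &= ~arb_o & 0xF
        arb.append(arb_o)
    # Every request that was removed along the cascade has been granted.
    ack_req = req & ~remaining & 0xF
    return arb, ack_req, ack_rdy


# PCV0, PCV1 and PCV3 request; raster1 is not ready.
arb, ack_req, ack_rdy = allocate(0b1011, 0b1101)
print(arb, bin(ack_req), bin(ack_rdy))  # [1, 0, 2, 8] 0b1011 0b1101
```

In hardware the four stages are combinational and chained, which is how all grants can be resolved within a single beat; the loop above only models that chain sequentially.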
As an improvement of the hardware circuit implementation method for rapid multi-target multi-task allocation, the data transmission control circuit module generates the write data signals as follows:
firstly, the data transmission control circuit module latches the four allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre, which carry the allocation result of the target tasks output by the allocator circuit module, and generates four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;
secondly, it selects among the geometry acceleration module PCV data signals rdata3-0 according to the four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;
finally, it generates four write data signals wdata3-0, which are the write data buses to the 4 rasterizers raster.
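The selection step amounts to a per-raster multiplexer steered by the latched grant. The sketch below is a minimal illustration, assuming one-hot grant codes and a hypothetical function name; it is not taken from the patent figures.

```python
def route_write_data(arb_rlst, rdata):
    """Select, for each raster, the data word of the PCV it was granted.

    arb_rlst[i]: latched one-hot grant for raster i (0 means no grant).
    rdata[j]:    data word currently returned by PCV j.
    Returns the four write data words wdata[0..3]; an ungranted raster gets 0.
    """
    wdata = []
    for grant in arb_rlst:
        word = 0
        for pcv in range(4):
            if (grant >> pcv) & 1:
                word = rdata[pcv]
        wdata.append(word)
    return wdata


# raster0 <- PCV0, raster1 idle, raster2 <- PCV1, raster3 <- PCV3
print(route_write_data([0b0001, 0, 0b0010, 0b1000], [0xA0, 0xA1, 0xA2, 0xA3]))
# [160, 0, 161, 163]
```

Because each raster has its own multiplexer, the four transfers proceed in parallel once the grants are latched.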
As an improvement of the hardware circuit implementation method for rapid multi-target multi-task allocation, the read address generation circuit module outputs the read data addresses as follows:
firstly, the generated request response signals ack_req[3:0] are latched and output as the read data enable signals ren3-0, which are the read data requests issued to the geometry acceleration modules PCV in response to req[3:0]; when one of them is valid, data transmission starts;
secondly, the read data enable signals ren3-0 each enable one of 4 counters; when a counter's value equals the attribute count signal attr_num, the counter is cleared, the read enable signal is deasserted, and one data transmission is complete, where attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;
finally, four read data addresses are output based on the value of each counter.
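One of the four read channels can be modelled as a latched enable plus a counter. This is a cycle-level sketch under assumptions the source leaves open: the class name is hypothetical, outputs are treated as registered (so ren rises one cycle after ack_req), and the counter is taken to run from 0 up to and including attr_num before it is cleared.

```python
class ReadAddressGenerator:
    """Cycle-level model of one read channel (one PCV)."""

    def __init__(self, attr_num: int):
        self.attr_num = attr_num
        self.ren = False   # latched read data enable
        self.count = 0     # attribute counter, doubles as the read address

    def tick(self, ack_req: bool):
        """Advance one clock; return the (ren, raddr) pair visible this cycle."""
        out = (self.ren, self.count)
        if self.ren:
            if self.count == self.attr_num:
                # Terminal count: clear the counter and drop the enable.
                self.ren, self.count = False, 0
            else:
                self.count += 1
        elif ack_req:
            # Latch the request response as the read enable.
            self.ren = True
        return out


gen = ReadAddressGenerator(attr_num=2)
trace = [gen.tick(ack) for ack in [True, False, False, False, False]]
print(trace)
# [(False, 0), (True, 0), (True, 1), (True, 2), (False, 0)]
```

Instantiating four such generators, one per bit of ack_req[3:0], gives the four independent read address streams raddr3-0.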
As an improvement of the hardware circuit implementation method for rapid multi-target multi-task allocation, the write address generation circuit module generates the write enable signals and outputs the write data addresses as follows:
firstly, the ready response signals ack_rdy[3:0] are latched to generate the four signals wen0_pre/wen1_pre/wen2_pre/wen3_pre, which are delayed by one beat to generate wen3-0; the signals wen3-0 are the write enables to the rasterizers, and when one of them is valid, the corresponding rasterizer starts to receive data; ack_rdy[3:0] is the response for allocating a target task to a rasterizer raster, and when it is valid, the corresponding rasterizer raster has been allocated a task and can start to receive the allocated data;
secondly, the latched ready response signals are delayed by one clock cycle to generate the write data enable signals wen3-0;
thirdly, the signals wen0_pre/wen1_pre/wen2_pre/wen3_pre produced by latching the response signals each enable one of 4 counters; when a counter's value equals the attribute count signal attr_num, the counter is cleared, the write enable signal is deasserted, and one data transmission is complete, where attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;
and finally, the values of the four counters are delayed by one clock cycle and output as the four write data addresses.
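The write side mirrors the read side, with the enable and the address each passed through one extra register stage. The following sketch models a single write channel under the same assumptions as before (hypothetical function name, registered outputs, counter running from 0 through attr_num); the one-beat delay presumably lines the write address up with the data returned by the PCV, although the source does not say so explicitly.

```python
def simulate_write_channel(ack_rdy_seq, attr_num):
    """Return the per-cycle (wen, waddr) pairs of one write channel.

    ack_rdy_seq: per-cycle values of this raster's ack_rdy bit.
    """
    wen_pre, count = False, 0   # latched response and its counter
    wen, waddr = False, 0       # one-beat-delayed copies driving the raster
    trace = []
    for ack_rdy in ack_rdy_seq:
        trace.append((wen, waddr))
        # The delay registers sample the pre-stage values of this cycle.
        next_wen, next_waddr = wen_pre, count
        if wen_pre:
            if count == attr_num:
                # Terminal count: clear the counter and drop the enable.
                wen_pre, count = False, 0
            else:
                count += 1
        elif ack_rdy:
            wen_pre = True
        wen, waddr = next_wen, next_waddr
    return trace


print(simulate_write_channel([True, False, False, False, False, False], 2))
# [(False, 0), (False, 0), (True, 0), (True, 1), (True, 2), (False, 0)]
```

Compared with the read channel, the first valid write address appears one cycle later, which is the effect of the extra delay stage.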
Compared with the prior art, the invention has the following beneficial effects:
By rapidly allocating the result data generated by the 4 geometry acceleration modules PCV to the 4 rasterization modules, the invention enables the hardware circuit for rapid multi-target multi-task allocation in a general-purpose graphics processor to complete the allocation of multiple targets and multiple tasks in a single clock cycle (one beat). Once allocation finishes, data transmission channels are established automatically between the tasks and the targets, and the multiple targets and tasks can transmit data simultaneously.
Drawings
FIG. 1 is a schematic diagram of the peripheral operating environment of a hardware circuit for rapid multi-target multi-task allocation according to an embodiment of the present invention;
FIG. 2 is an abstract circuit schematic of a hardware circuit that rapidly allocates the results of 4 PCV modules to 4 raster modules according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the implementation principle of the allocator circuit module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the level-1 allocator implementation according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of the data transmission control circuit module according to an embodiment of the present invention;
FIG. 6 is a block diagram of the read address generation circuit module according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of the write address generation circuit module according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
The present invention will be described in further detail with reference to the accompanying drawings, but the present invention is not limited thereto.
As an aid to understanding the invention, a hardware circuit for rapid multi-target multi-task allocation in a general-purpose graphics processor is mainly used to rapidly allocate the result data of the 4 geometry acceleration modules PCV to the 4 rasterization modules raster. The signals involved are as follows:
attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;
req indicates the data transmission requests of the 4 modules PCV3-0;
ack_req is the corresponding request response; once it is valid, the corresponding data transmission request has been allocated and data transmission can proceed;
raddr3-0 are the attribute addresses for reading the data output by PCV3-0 and form the address buses for data transmission;
ren3-0 are the read data requests to PCV3-0 and serve as the read data enable signals of a data transmission; when one is valid, data transmission starts;
rdata3-0 are the read data returned by PCV3-0 and form the data buses for data transmission;
rdy indicates whether the 4 rasterization modules raster can receive data;
ack_rdy is the response for allocating tasks to the 4 rasterization modules raster; once it is valid, the corresponding rasterization module has been allocated a task and can begin to receive the allocated data;
waddr3-0 are the attribute addresses written to the 4 raster modules and form the data transmission address buses on the raster side;
wen3-0 are the write enables of the 4 raster modules; when one is valid, the corresponding raster begins to receive data;
wdata3-0 are the write data buses to the 4 raster modules.
Referring to FIG. 1, as an embodiment of the present invention, a hardware circuit implementation method for rapid multi-target multi-task allocation is provided, which includes the following.
An allocator circuit module is constructed. It allocates a target task to one rasterizer raster according to the request signals of the geometry acceleration modules PCV in the hardware circuit, removes the task already allocated to the current rasterizer raster once that allocation finishes, and locks and outputs the allocation result of the current target task.
As shown in FIG. 2, the main idea of the allocator circuit module is to allocate a task to one raster first and then move on to the next raster. For each raster, a priority encoder finds the first pending request, that PCV request is removed once allocated, and the remaining requests serve as the allocation requests for the next raster, until all tasks have been allocated or all targets have been assigned.
Based on this technical concept, the allocator circuit module is a cascade of four level-1 allocators, as shown in FIGS. 3-4, implemented as follows:
first, according to the geometry acceleration module PCV request signals req[3:0] and the rasterizer raster ready signals rdy[3:0] in the hardware circuit, the circuit generates the request response signals ack_req[3:0] (ack_req is the corresponding request response; once it is valid, the corresponding data transmission request has been allocated and data transmission can proceed), the ready response signals ack_rdy[3:0] (ack_rdy is the response for allocating tasks to the 4 rasterization modules raster; once it is valid, the corresponding rasterization module has been allocated a task and can begin to receive the allocated data), and the allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre of the rasterizers; the 4 arb signals are the allocation results of the target tasks;
secondly, each level-1 allocator priority-encodes the response signal ack_req[3:0] of the request to generate an encoding result pri_code, which is the code of the geometry acceleration module PCV to be allocated; if rdyx is high, the rasterizer raster currently being allocated is ready, and pri_code is selected; otherwise the rasterizer is not ready, no task needs to be allocated, and 4'd0 is selected; this yields the level-1 allocation result arb_o, whose value is the code of the PCV allocated to the current raster;
thirdly, the level-1 allocation result arb_o is output through a register, and its OR-reduction generates the response signal ack_rdy_o to the data receiving end; arb_o is then inverted bitwise and ANDed bitwise with the input PCV request signal req[3:0], which excludes the already allocated request and forms the allocation request for the next level;
finally, the states of the PCV request signals req[3:0] and the rasterizer ready signals rdy[3:0] at the time of allocation are detected, and an lk_en signal is generated to latch the allocation result and the related signals. The following is also required:
A data transmission control circuit module is constructed. It latches the acquired allocation result of the current target task, synchronously generates the allocation result signals, and generates the write data signals according to the allocation result signals and the geometry acceleration module PCV request signals.
Based on this technical concept, as shown in FIG. 5, the data transmission control circuit module generates the write data signals as follows:
firstly, the data transmission control circuit module latches the four allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre, which carry the allocation result of the target tasks output by the allocator circuit module, and generates four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;
secondly, it selects among the geometry acceleration module PCV data signals rdata3-0 according to the four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;
finally, it generates four write data signals wdata3-0, which are the write data buses to the 4 raster modules. The following is also required:
A read address generation circuit module is constructed. It latches the response to the geometry acceleration module PCV request signal to output a read data enable signal, enables a counter according to the acquired read data enable signal, and outputs a read data address.
Based on this technical concept, as shown in FIG. 6, the read address generation circuit module outputs the read data addresses as follows:
first, the generated request response signals ack_req[3:0] are latched (ack_req is the corresponding request response; once it is valid, the corresponding data transmission request has been allocated and data transmission can proceed) and output as the read data enable signals ren3-0 (ren3-0 are the read data requests to PCV3-0 and serve as the read data enable signals of a data transmission; when one is valid, data transmission starts);
secondly, the read data enable signals ren3-0 each enable one of 4 counters; when a counter's value equals the attribute count signal attr_num (attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes), the counter is cleared and one data transmission is complete;
finally, four read data addresses are output based on the value of each counter. The following is also required:
A write address generation circuit module is constructed. It latches the response signal that the raster receives when the current target task is allocated, generates a write enable signal, and outputs a write data address.
Based on this technical concept, as shown in FIG. 7, the write address generation circuit module follows the same idea as the read address generation circuit module, that is:
first, the ready response signals ack_rdy[3:0] (ack_rdy is the response for allocating tasks to the 4 rasterization modules raster; once it is valid, the corresponding rasterization module has been allocated a task and can begin to receive the allocated data) are latched to generate the four signals wen0_pre/wen1_pre/wen2_pre/wen3_pre, which are delayed by one beat to generate wen3-0; the signals wen3-0 are the write enables to the rasterizers, and when one of them is valid, the corresponding rasterization module begins to receive data;
next, the latched results wen0_pre/wen1_pre/wen2_pre/wen3_pre of the response signals ack_rdy[3:0] are delayed by one clock cycle to generate the write data enable signals wen3-0;
thirdly, the latched results wen0_pre/wen1_pre/wen2_pre/wen3_pre each enable one of 4 counters; when a counter's value equals the attribute count signal attr_num, the counter is cleared, the write enable signal is deasserted, and one data transmission is complete, where attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;
and finally, the values of the four counters are delayed by one clock cycle and output as the four write data addresses.
Once the target task allocation transmission channels have been established, the multi-target tasks are transmitted simultaneously until data transmission finishes, and the next round of data is then scheduled.
With this circuit implementation, the hardware circuit for rapid multi-target multi-task allocation in a general-purpose graphics processor completes the allocation of multiple targets and multiple tasks in one beat. Once allocation finishes, data transmission channels are established automatically between the tasks and the targets, and the multiple targets and tasks can transmit data simultaneously. The large volume of parallel data generated by several upstream parallel pipelines is thus allocated quickly and efficiently to several downstream parallel pipelines, which avoids the situation in which untimely allocation leaves upstream data unallocated, starves the downstream pipelines of data, and wastes computing resources.
While there have been shown and described the fundamental principles and essential features of the invention and advantages thereof, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof; the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.

Claims (5)

1. A hardware circuit implementation method for rapid multi-target multi-task allocation, characterized by comprising the following steps:
constructing an allocator circuit module, which allocates a target task to one rasterizer raster according to the request signals of the geometry acceleration modules PCV in the hardware circuit, removes the task already allocated to the current rasterizer raster once that allocation finishes, and locks and outputs the allocation result of the current target task;
constructing a data transmission control circuit module, which latches the acquired allocation result of the current target task, synchronously generates the allocation result signals, and then generates the write data signals according to the allocation result signals and the PCV request signals;
constructing a read address generation circuit module, which latches the response signal to the geometry acceleration module PCV as that module's read data enable signal, enables a counter according to the read data enable signal, and outputs a read data address;
constructing a write address generation circuit module, which latches the response signal that the rasterizer raster receives when the current target task is allocated, generates a write enable signal, and outputs a write data address;
and, once the target task allocation transmission channels have been constructed, transmitting the multi-target tasks simultaneously until data transmission finishes, and then scheduling the next round of data.
2. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1, wherein the allocator circuit module is composed of four cascaded 1-level allocators, and is specifically implemented as follows:
firstly, a request response signal ack_req[3:0], a ready response signal ack_rdy[3:0] and allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre are generated according to the geometric acceleration module PCV request signal req[3:0] and the ready signal rdy[3:0] asserted by the rasterizers raster for target task allocation in the hardware circuit;
secondly, each 1-level allocator priority-encodes the request response signal ack_req[3:0] to generate an encoding result pri_code, which is the code of the geometric acceleration module PCV to be allocated;
if rdyx is high, the rasterizer raster currently to be allocated is ready, and pri_code is selected; otherwise, the rasterizer raster currently to be allocated is not ready and needs no task allocation, so 4'd0 is selected and the 1-level allocator performs no allocation; a 1-level allocation result arb_o is thereby generated;
thirdly, the 1-level allocation result arb_o is registered and output, and is OR-reduced to generate the response signal ack_rdy_o to the data receiving end; the 1-level allocation result arb_o is then inverted bitwise and ANDed bitwise with the input geometric acceleration module PCV request signal req[3:0], so that requests already allocated are excluded from the next-level allocation request;
finally, the states of the geometric acceleration module PCV request signal req[3:0] and of the ready signal rdy[3:0] asserted by the rasterizers raster for the current target task allocation are detected, and an lk_en signal is generated for latching the allocation result and the related signals.
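The cascade described in claim 2 can be illustrated with a small behavioural model. The following Python sketch is not the patented circuit; it assumes that pri_code is a one-hot 4-bit value, that the lowest-numbered PCV has the highest priority, and that level x of the cascade serves rasterizer x. The function names are hypothetical, and the lk_en latch is not modelled because the claim does not state its logic.

```python
def priority_encode(req):
    """One-hot mask of the lowest set bit of a 4-bit request vector (0 if none).

    Assumption: the lowest-numbered PCV wins; the claim does not fix the order.
    """
    req &= 0xF
    return req & -req


def level1_allocate(req, rdy_x):
    """One 1-level allocator: returns (arb_o, ack_rdy_o, next_level_req)."""
    pri_code = priority_encode(req)
    arb_o = pri_code if rdy_x else 0      # 4'd0 when the rasterizer is not ready
    ack_rdy_o = 1 if arb_o else 0         # OR-reduction of arb_o
    next_req = req & ~arb_o & 0xF         # mask out the request just allocated
    return arb_o, ack_rdy_o, next_req


def allocate(req, rdy):
    """Cascade four 1-level allocators, one per rasterizer."""
    remaining = req & 0xF
    arb = []
    ack_rdy = 0
    for x in range(4):
        arb_o, ack_rdy_o, remaining = level1_allocate(remaining, (rdy >> x) & 1)
        arb.append(arb_o)
        ack_rdy |= ack_rdy_o << x
    ack_req = (req & 0xF) & ~remaining & 0xF   # requests that were served
    return arb, ack_req, ack_rdy
```

For example, with req = 0b1011 and rdy = 0b0110, rasterizers 1 and 2 are ready, so they receive PCV 0 and PCV 1 respectively, while the request from PCV 3 stays pending for the next round.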
3. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1 or 2, wherein the data transmission control circuit module generates the write data signals as follows:
firstly, the data transmission control circuit module latches the four allocation selection signals arb3_pre/arb2_pre/arb1_pre/arb0_pre according to the target task allocation result output by the allocator circuit module, and generates four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;
secondly, the geometric acceleration module PCV request data signals rdata3-0 are selected according to the four allocation result signals arb3_rlst/arb2_rlst/arb1_rlst/arb0_rlst;
finally, four write data signals wdata3-0 are generated, which are the write data buses to the 4 rasterizers raster.
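The selection step of claim 3 amounts to one multiplexer per rasterizer. The Python sketch below is only an illustration under stated assumptions: each allocation result signal is taken to be a one-hot 4-bit select over the PCVs, and a rasterizer with no allocation is driven with zero. The function name select_wdata is hypothetical.

```python
def select_wdata(arb_rlst, rdata):
    """Route PCV read data onto the four rasterizer write data buses.

    arb_rlst[x] is the latched allocation result for rasterizer x
    (assumed one-hot over PCVs 0..3, 0 meaning no allocation);
    rdata[p] is the data word presented by PCV p.
    """
    wdata = []
    for sel in arb_rlst:
        word = 0
        for pcv in range(4):
            if (sel >> pcv) & 1:      # AND-OR mux: pick the selected PCV's word
                word |= rdata[pcv]
        wdata.append(word)
    return wdata
```

With arb_rlst = [0, 1, 2, 0], rasterizer 1 receives the word from PCV 0 and rasterizer 2 receives the word from PCV 1.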
4. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1 or 2, wherein the specific implementation manner of the read address generation circuit module outputting the read data address is as follows:
firstly, the generated request response signal ack_req[3:0] is latched and the read data enable signals ren3-0 are output, wherein each read data enable signal is the read data request to the corresponding geometric acceleration module PCV, and data transmission starts when it is valid;
secondly, the read data enable signals ren3-0 drive 4 counters respectively; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the read enable signal is deasserted, completing one data transmission, wherein the signal attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;
finally, four read data addresses are output based on the value of each counter.
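One channel of the read address generation in claim 4 can be modelled cycle by cycle. This Python sketch is a simplification under stated assumptions: the read address is taken to be the raw counter value, the enable rises one clock after the response bit is seen, and the counter runs from 0 up to and including attr_num before clearing (the claim does not say whether a transfer has attr_num or attr_num + 1 beats). The function name read_channel is hypothetical.

```python
def read_channel(ack_req_seq, attr_num):
    """Simulate one read channel; returns the (ren, raddr) pair seen in each cycle.

    ack_req_seq holds the channel's ack_req bit for successive clock cycles.
    """
    ren, count = 0, 0
    trace = []
    for ack in ack_req_seq:
        trace.append((ren, count))     # values visible during this cycle
        if ren:
            if count == attr_num:      # last beat: clear counter, drop enable
                ren, count = 0, 0
            else:
                count += 1
        elif ack:
            ren = 1                    # latch the request response
    return trace
```

With attr_num = 2 and a single response pulse in cycle 0, the channel reads addresses 0, 1 and 2 in cycles 1 to 3 and then goes idle.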
5. The hardware circuit implementation method for fast multi-target multi-task allocation according to claim 1 or 2, wherein the specific implementation manner of the write address generation circuit module generating the write enable signal and outputting the write data address is as follows:
firstly, the response signal ack_rdy[3:0] to the ready signal is latched to generate four signals wen0_pre/wen1_pre/wen2_pre/wen3_pre, wherein the signals wen3-0 are the write enables to the rasterizers raster, and when one of them is valid the corresponding rasterizer raster starts to receive data; the response signal ack_rdy[3:0] is the response for allocating a target task to a rasterizer raster, and when it is valid the corresponding rasterizer raster has been allocated a task and can start receiving the allocated data;
secondly, the latched response signal ack_rdy[3:0] to the ready signal is delayed by one clock cycle to generate the write data enable signals wen3-0;
thirdly, the latched response signals wen0_pre/wen1_pre/wen2_pre/wen3_pre drive 4 counters respectively; when the value of a counter equals the attribute number signal attr_num, the counter is cleared and the write enable signal is deasserted, completing one data transmission, wherein the signal attr_num is configuration information of the primitive and mainly indicates the number of primitive attributes;
and finally, the values of the four counters are delayed by one clock cycle and output as four write data addresses.
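The write side in claim 5 mirrors the read side with one extra register stage on the enable and the address. The following Python sketch models a single channel under the same assumptions as a simple counter model: the counter runs from 0 up to and including attr_num, and the write address is the delayed raw counter value. The function name write_channel is hypothetical.

```python
def write_channel(ack_rdy_seq, attr_num):
    """Simulate one write channel; returns the (wen, waddr) pair seen in each cycle.

    wen_pre is the latched ack_rdy bit; wen and waddr are copies of
    wen_pre and the counter delayed by one clock cycle.
    """
    wen_pre, count = 0, 0
    wen, waddr = 0, 0
    trace = []
    for ack in ack_rdy_seq:
        trace.append((wen, waddr))            # values visible during this cycle
        next_wen, next_waddr = wen_pre, count # delay registers sample old values
        if wen_pre:
            if count == attr_num:             # last beat: clear counter, drop enable
                wen_pre, count = 0, 0
            else:
                count += 1
        elif ack:
            wen_pre = 1                       # latch the ready response
        wen, waddr = next_wen, next_waddr
    return trace
```

With attr_num = 2 and a response pulse in cycle 0, the write enable is active in cycles 2 to 4 with addresses 0, 1 and 2, one cycle behind a counter that starts in cycle 1.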
CN202210831288.1A 2022-07-15 2022-07-15 Hardware circuit implementation method for rapid multi-target multi-task allocation Pending CN115344307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210831288.1A CN115344307A (en) 2022-07-15 2022-07-15 Hardware circuit implementation method for rapid multi-target multi-task allocation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210831288.1A CN115344307A (en) 2022-07-15 2022-07-15 Hardware circuit implementation method for rapid multi-target multi-task allocation

Publications (1)

Publication Number Publication Date
CN115344307A true CN115344307A (en) 2022-11-15

Family

ID=83948798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210831288.1A Pending CN115344307A (en) 2022-07-15 2022-07-15 Hardware circuit implementation method for rapid multi-target multi-task allocation

Country Status (1)

Country Link
CN (1) CN115344307A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117478707A (en) * 2023-12-27 2024-01-30 天津数智物联科技有限公司 Multi-target energy management data transmission method
CN117478707B (en) * 2023-12-27 2024-05-07 天津数智物联科技有限公司 Multi-target energy management data transmission method

Similar Documents

Publication Publication Date Title
CN110796588B (en) Simultaneous computing and graphics scheduling
US6642928B1 (en) Multi-processor graphics accelerator
US9250697B2 (en) Application programming interfaces for data parallel computing on multiple processors
KR100843548B1 (en) Concurrent access of shared resources
US8533435B2 (en) Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict
US20140229953A1 (en) System, method, and computer program product for management of dependency between tasks
CN103886547A (en) Technique For Storing Shared Vertices
US20110055511A1 (en) Interlocked Increment Memory Allocation and Access
KR20160134713A (en) Hardware-based atomic operations for supporting inter-task communication
CN103793876A (en) Distributed tiled caching
CN103886634A (en) Efficient Super-sampling With Per-pixel Shader Threads
US8928677B2 (en) Low latency concurrent computation
US11470394B2 (en) Scalable light-weight protocols for wire-speed packet ordering
CN103885893A (en) Technique For Accessing Content-Addressable Memory
CN103885902A (en) Technique For Performing Memory Access Operations Via Texture Hardware
CN103827842A (en) Writing message to controller memory space
CN115344307A (en) Hardware circuit implementation method for rapid multi-target multi-task allocation
CN103886538A (en) Technique For Storing Shared Vertices
CN103885903A (en) Technique For Performing Memory Access Operations Via Texture Hardware
CN116521096B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116777725A (en) Data multicasting across programmatic controls of multiple compute engines
US20140082120A1 (en) Efficient cpu mailbox read access to gpu memory
CN116737083B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
US20130262812A1 (en) Hardware Managed Allocation and Deallocation Evaluation Circuit
CA2323116A1 (en) Graphic processor having multiple geometric operation units and method of processing data thereby

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination