CN116820580B

CN116820580B - Instruction execution method, system and device, graphics processor and electronic equipment

Info

Publication number: CN116820580B
Application number: CN202311110694.XA
Authority: CN
Inventors: Name withheld at the inventors' request
Original assignee: Moore Threads Technology Co Ltd
Current assignee: Moore Threads Technology Co Ltd
Priority date: 2023-08-31
Filing date: 2023-08-31
Publication date: 2023-11-10
Anticipated expiration: 2043-08-31
Also published as: CN116820580A

Abstract

Embodiments of the present disclosure provide an instruction execution method, system, and device, a graphics processor, and electronic equipment, which aim to at least solve the problems in the related art that the obtained instruction completion time has low accuracy and that obtaining it consumes considerable resources. The method is applied to a graphics processor and includes: determining, based on an instruction sequence, a control block corresponding to a tile, wherein the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and the position of the query tag in the control block is determined based on the position of the target query instruction in the instruction sequence; and sending the query tag in the control block corresponding to the tile to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the query tag.

Description

Instruction execution method, system and device, graphics processor and electronic equipment

Technical Field

The present disclosure relates to, but is not limited to, the field of computer technology, and in particular, to a method, a system, an apparatus, a graphics processor, an electronic device, and a storage medium for executing instructions.

Background

In a graphics processor (Graphics Processing Unit, GPU) with a TBDR (Tile-Based Deferred Rendering) architecture, some instructions (e.g., draw calls, also known as drawing instructions) must pass through a front-end module (also known as the geometry stage) and a back-end module (also known as the illumination stage). In implementation, the geometry-stage processing results of multiple drawing instructions are combined before being sent to the illumination stage, so the geometry-stage results of a single drawing instruction may be distributed to multiple tiles for processing. As a result, the completion time of each drawing instruction in the illumination stage cannot be accurately defined.

In the related art, the driver typically inserts a refresh request after each drawing instruction, so that the geometry-stage processing result of each drawing instruction is sent to the illumination stage separately and the completion time of that drawing instruction can then be obtained. However, this differs from the conventional usage scenario, so the measured completion time is not accurate. In addition, the object to be rendered must be read and written repeatedly, which increases the consumption of resources such as bandwidth and power and degrades system performance.

Disclosure of Invention

Embodiments of the present disclosure provide a method, system, and apparatus for executing instructions, a graphics processor, an electronic device, a storage medium, and a computer program product.

The technical scheme of the embodiment of the disclosure is realized as follows:

an embodiment of the present disclosure provides an instruction execution method, applied to a graphics processor, including:

determining a control block corresponding to a block based on an instruction sequence, wherein the control block comprises a query tag corresponding to a target query instruction in the instruction sequence, and the position of the query tag in the control block is determined based on the position of the target query instruction in the instruction sequence;

and sending the query tag in the control block corresponding to the block to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the query tag.

An embodiment of the present disclosure provides an instruction execution method, applied to a tag collector, including:

receiving a query tag in a control block corresponding to a tile, sent by a graphics processor, wherein the query tag corresponds to a target query instruction in an instruction sequence, and the position of the query tag in the corresponding control block is determined based on the position of the target query instruction in the instruction sequence;

and determining an execution state of the target query instruction based on the query tag.

Embodiments of the present disclosure provide an execution system of instructions, the system comprising a graphics processor and a tag collector, wherein:

the graphics processor is configured to determine, based on an instruction sequence, a control block corresponding to a tile, wherein the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and the position of the query tag in the control block is determined based on the position of the target query instruction in the instruction sequence; and to send the query tag in the control block corresponding to the tile to the tag collector;

the tag collector is configured to receive the query tag sent by the graphics processor, and to determine an execution state of the target query instruction based on the query tag.

An embodiment of the present disclosure provides an instruction execution apparatus, which is applied to a graphics processor, and includes:

a first determining module, configured to determine, based on an instruction sequence, a control block corresponding to a tile, where the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and a position of the query tag in the control block is determined based on a position of the target query instruction in the instruction sequence;

a sending module, configured to send the query tag in the control block corresponding to the tile to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the query tag.

An embodiment of the present disclosure provides an instruction execution apparatus, which is applied to a tag collector, and the apparatus includes:

a receiving module, configured to receive a query tag in a control block corresponding to a tile, sent by a graphics processor, wherein the query tag corresponds to a target query instruction in an instruction sequence, and the position of the query tag in the corresponding control block is determined based on the position of the target query instruction in the instruction sequence;

and a second determining module, configured to determine an execution state of the target query instruction based on the query tag.

Embodiments of the present disclosure provide a graphics processor that performs tile distribution based on a tile-based rendering (TBR) architecture, the graphics processor including a tag collector and at least two rendering cores, wherein:

the front-end module of the TBR architecture is used for determining a control block corresponding to a block based on an instruction sequence, wherein the control block comprises a query tag corresponding to a target query instruction in the instruction sequence, and the position of the query tag in the control block is determined based on the position of the target query instruction in the instruction sequence;

a back-end module of the TBR architecture is configured to determine a target rendering core for the tile from the at least two rendering cores, so that the target rendering core sends the query tag in the control block corresponding to the tile to the tag collector;

the tag collector is configured to receive the query tag sent by the target rendering core, and to determine an execution state of the target query instruction based on the query tag.

An embodiment of the present disclosure provides an electronic device including a processor and a memory storing a computer program executable on the processor, the processor implementing the above method when executing the computer program.

The disclosed embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method.

Embodiments of the present disclosure provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, implements the above-described method.

In the embodiments of the present disclosure, a control block corresponding to a tile is determined based on an instruction sequence, wherein the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and the position of the query tag in the control block is determined based on the position of the target query instruction in the instruction sequence; the query tag in the control block corresponding to the tile is then sent to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the query tag. On the one hand, the query tags corresponding to the query instructions are inserted into the control blocks corresponding to the tiles, so the driver does not need to insert a refresh request after each drawing instruction, which reduces the work required of the driver and makes the scheme more driver-friendly. On the other hand, the tag collector determines the execution state of each query instruction based on its query tag, so the boundaries of the instructions in the instruction sequence can be derived from the execution states of the query instructions and thus defined effectively at the hardware level. Because the way each instruction is processed in the illumination stage is unchanged, the graphics processor performs consistently whether or not query instructions are inserted, which improves the accuracy of performance evaluation (for example, execution duration) for each instruction in the instruction sequence and facilitates further optimization of the application program. At the same time, because memory does not need to be read and written repeatedly, the consumption of resources such as bandwidth and power is reduced and system performance is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.

FIG. 1A is a first schematic implementation flow diagram of an instruction execution method provided by an embodiment of the present disclosure;

FIG. 1B is a schematic diagram of an instruction sequence provided by an embodiment of the present disclosure;

FIG. 1C is a schematic diagram of a control block without inserted query tags provided by an embodiment of the present disclosure;

FIG. 1D is a schematic diagram of a control block with inserted query tags provided by an embodiment of the present disclosure;

FIG. 2 is a second schematic implementation flow diagram of an instruction execution method provided by an embodiment of the present disclosure;

FIG. 3A is a first schematic composition diagram of an instruction execution system provided by an embodiment of the present disclosure;

FIG. 3B is a schematic diagram of the relationship between rendering cores and tiles provided by an embodiment of the present disclosure;

FIG. 3C is a second schematic composition diagram of an instruction execution system provided by an embodiment of the present disclosure;

FIG. 3D is a third schematic composition diagram of an instruction execution system provided by an embodiment of the present disclosure;

FIG. 4 is a first schematic diagram of the composition structure of an instruction execution device provided by an embodiment of the present disclosure;

FIG. 5 is a second schematic diagram of the composition structure of an instruction execution device provided by an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a hardware entity of an electronic device in an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present disclosure, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present disclosure.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.

In the following description, the terms "first", "second", "third" and the like are used only to distinguish similar objects and do not denote a particular ordering of the objects. It should be understood that, where permitted, "first", "second", and "third" may be interchanged in a particular order or sequence, so that the embodiments of the disclosure described herein can be practiced in an order other than that illustrated or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the present disclosure only and is not intended to be limiting of the present disclosure.

TBDR is a graphics rendering technique based on deferred rendering. Unlike conventional deferred rendering, TBDR divides the screen into a plurality of small blocks (referred to as "tiles" or "blocks"), each of which can be rendered independently. In the geometry stage, each tile is processed and a separate G-buffer (containing attribute information for each pixel) is generated and stored in memory. In the illumination stage, each tile is rendered independently and illumination calculations are performed on its pixels. Finally, the rendering results of all tiles are merged to form the final image. TBDR makes better use of the parallel computing capability of the GPU and avoids unnecessary rendering overhead, thereby improving real-time rendering performance. Furthermore, since TBDR can be optimized per tile, it can also support some effects (e.g., antialiasing, shading) that conventional deferred rendering cannot.

A draw call (also known as a drawing instruction) is a concept in computer graphics programming: an instruction sent to the graphics hardware that tells the graphics processor which triangles to draw, how to color them, how to render them, and so on. In TBDR, a drawing instruction must pass through both the geometry stage and the illumination stage. In practice, the geometry-stage processing results of multiple drawing instructions are combined. In the illumination stage, the geometry-stage data to be processed for a single tile originates from multiple drawing instructions, and the geometry-stage result of a single drawing instruction must be distributed to multiple tiles, so the completion time of a drawing instruction in the illumination stage cannot be effectively defined.

When an application needs to evaluate performance and therefore requests the completion time of drawing instructions, the driver forcibly inserts a refresh request after each drawing instruction and sends the geometry-stage processing result of each drawing instruction to the illumination stage on its own, so that the completion time of the drawing instruction is the time at which the illumination stage completes.

On the one hand, all inputs to the illumination stage then come from a single drawing instruction, whereas in the conventional usage scenario the geometry-stage processing results of multiple drawing instructions are collected first and sent to the illumination stage together; the performance data obtained this way is therefore not accurate. On the other hand, because each drawing instruction is forcibly refreshed and completes the illumination stage on its own, the object to be rendered must be read and written repeatedly, which increases the consumption of resources such as bandwidth and power and degrades system performance.

According to the instruction execution method provided by the embodiments of the present disclosure, on the one hand, the query tags corresponding to the query instructions are inserted into the control blocks corresponding to the tiles, so the driver does not need to insert a refresh request after each drawing instruction, which reduces the work required of the driver and makes the scheme more driver-friendly. On the other hand, the tag collector determines the execution state of each query instruction based on its query tag, so the boundaries of the instructions in the instruction sequence can be derived from the execution states of the query instructions and thus defined effectively at the hardware level. Because the way each instruction is processed in the illumination stage is unchanged, the graphics processor performs consistently whether or not query instructions are inserted, which improves the accuracy of performance evaluation (for example, execution duration) for each instruction in the instruction sequence and facilitates further optimization of the application program. At the same time, because memory does not need to be read and written repeatedly, the consumption of resources such as bandwidth and power is reduced and system performance is improved.

The method provided by the embodiments of the present disclosure may be performed by an electronic device, which may be a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), or another type of terminal, and may also be implemented as a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

In the following, the technical solutions in the embodiments of the present disclosure will be clearly and completely described with reference to the drawings in the embodiments of the present disclosure.

Fig. 1A is a schematic implementation flow chart of an instruction execution method provided in an embodiment of the present disclosure, which is applied to a graphics processor, as shown in fig. 1A, and the method includes steps S11 to S12, where:

Step S11, determining a control block corresponding to a tile based on an instruction sequence, wherein the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and the position of the query tag in the control block is determined based on the position of the target query instruction in the instruction sequence.

Here, the instruction sequence includes at least one instruction, where the instruction may be any suitable instruction, for example, a drawing instruction (Draw Call), a Query instruction (Query), and the like. Query instructions in the GPU are used to obtain internal data (e.g., execution time, utilization of rendering cores, bandwidth throughput, etc.) of the GPU to optimize and evaluate the performance of the application. The execution time may include, but is not limited to, a copy time, an execution time of a drawing instruction, and the like. In implementation, the instruction sequence is a plurality of instructions issued by the CPU.

The number of target query instructions may be at least one. For example, one target query instruction may be inserted before the drawing instruction A and another after it, and the boundary of the drawing instruction A can then be obtained from the two target query instructions. As another example, a target query instruction may be inserted after the drawing instruction B, and the execution state of the drawing instruction B can then be obtained from that target query instruction. The execution state may include, but is not limited to, an incomplete state, a complete state, and the like. In practice, the different execution states may be represented in any suitable manner, such as "0" for the incomplete state and "1" for the complete state.
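As a rough illustration of the "0"/"1" execution states described above, the following Python sketch models a tag collector. How the collector decides that a query instruction has completed is not specified at this point in the text; the rule used here (a query is complete once its tag has arrived from every tile expected to send it) is an assumption made for illustration, and the names `TagCollector`, `expect`, `receive`, and `state` are hypothetical.

```python
INCOMPLETE = 0  # "0" denotes the incomplete state
COMPLETE = 1    # "1" denotes the complete state


class TagCollector:
    """Hypothetical tag collector that tracks query tags per query instruction."""

    def __init__(self):
        # query id -> set of tile ids that have not yet sent the tag
        self._pending = {}

    def expect(self, query_id, tile_ids):
        """Register the tiles whose control blocks contain this query tag."""
        self._pending[query_id] = set(tile_ids)

    def receive(self, query_id, tile_id):
        """Record that a tile has sent the query tag for query_id."""
        self._pending[query_id].discard(tile_id)

    def state(self, query_id):
        """Assumed rule: complete once every expected tile has reported."""
        return COMPLETE if not self._pending[query_id] else INCOMPLETE


collector = TagCollector()
collector.expect(query_id=0, tile_ids=[0, 1])
collector.receive(query_id=0, tile_id=0)
state_after_one_tile = collector.state(0)
collector.receive(query_id=0, tile_id=1)
state_after_all_tiles = collector.state(0)
```

Under this assumed rule, the query stays in the incomplete state until the last expected tile has sent its tag.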

In some implementations, the target query instruction may be explicitly inserted. For example, when it is necessary to determine the boundary of a drawing instruction, a target query instruction may be inserted before and after the drawing instruction.

FIG. 1B is a schematic diagram of an instruction sequence provided by an embodiment of the present disclosure. As shown in FIG. 1B, the instruction sequence 101 includes three drawing instructions, namely drawing instruction 0, drawing instruction 1, and drawing instruction 2. If the boundary of drawing instruction 1 needs to be known, two query instructions, namely query instruction 0 and query instruction 1, may be inserted before and after drawing instruction 1, respectively, and the instruction sequence 101 is then updated to the instruction sequence 102.
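The update from instruction sequence 101 to instruction sequence 102 can be sketched as follows. This is a minimal Python illustration only; the `Instruction` type and the `bracket_with_queries` helper are hypothetical names, and numbering new query instructions after any existing ones is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    kind: str   # "draw" or "query"
    index: int  # number of the instruction within its kind


def bracket_with_queries(sequence, draw_index):
    """Return a new sequence with a query instruction inserted before and
    after the drawing instruction whose number is draw_index."""
    next_query = sum(1 for ins in sequence if ins.kind == "query")
    result = []
    for ins in sequence:
        if ins.kind == "draw" and ins.index == draw_index:
            result.append(Instruction("query", next_query))
            result.append(ins)
            result.append(Instruction("query", next_query + 1))
            next_query += 2
        else:
            result.append(ins)
    return result


# Instruction sequence 101: three drawing instructions.
sequence_101 = [Instruction("draw", i) for i in range(3)]
# Instruction sequence 102: drawing instruction 1 bracketed by two queries.
sequence_102 = bracket_with_queries(sequence_101, draw_index=1)
```

The original sequence is left untouched, so the same helper could be applied again to bracket a different drawing instruction.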

The control block represents the relation between its corresponding tile and each primitive block (PB); in implementation, the control block may be a control queue. A PB includes the geometry-stage processing result of at least one primitive corresponding to a drawing instruction, together with other state configuration information that needs to be passed to the illumination stage. The state configuration information may include, but is not limited to, fragment shader information, rendering pipeline information, and the storage addresses of the layers in the rendering object. In some implementations, one drawing instruction may be used to draw N PBs, N being an integer. In implementation, if at least one PB covers a tile, the control block corresponding to the tile needs to include a control sub-block corresponding to each such PB; if no PB covers the tile, the control block corresponding to the tile may be empty.

In some implementations, the number of tiles is at least one. Each block corresponds to a control block.

Fig. 1C is a schematic diagram of a control block without a query tag inserted according to an embodiment of the present disclosure, where, as shown in fig. 1C, a rendering object is divided into four tiles, namely: block 0-block 3, wherein:

for tile 0, since it is covered by primitive block 0 to primitive block 2, a corresponding control sub-block needs to be created for each primitive block in the corresponding control block 110, namely control sub-block M0 to control sub-block M2;

for tile 1, since it is covered by primitive block 0 and primitive block 2, control sub-blocks corresponding to primitive block 0 and primitive block 2 need to be created in the corresponding control block 111, namely control sub-block N0 and control sub-block N1;

for tile 2, the corresponding control block 112 is empty since the tile is not covered by any primitive block;

for tile 3, the corresponding control block 113 is empty since the tile is not covered by any primitive block.
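The FIG. 1C example can be reproduced with a small Python sketch. The data layout is an assumption for illustration: `coverage` maps each primitive block number to the set of tiles it covers, and each control sub-block is represented simply as a `("pb", number)` tuple; `build_control_blocks` is a hypothetical helper name.

```python
def build_control_blocks(num_tiles, coverage):
    """Build one control block (a list of control sub-blocks) per tile.

    coverage maps a primitive block number to the set of tile numbers
    that the primitive block covers.
    """
    control_blocks = {tile: [] for tile in range(num_tiles)}
    for pb in sorted(coverage):
        for tile in sorted(coverage[pb]):
            # One control sub-block per (tile, covering primitive block) pair.
            control_blocks[tile].append(("pb", pb))
    return control_blocks


# Coverage as described for FIG. 1C: tile 0 is covered by primitive blocks
# 0 to 2, tile 1 by primitive blocks 0 and 2, tiles 2 and 3 by none.
coverage = {0: {0, 1}, 1: {0}, 2: {0, 1}}
control_blocks = build_control_blocks(4, coverage)
```

Tiles that no primitive block covers keep an empty list, matching the empty control blocks 112 and 113.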

The query tag may include, but is not limited to, a target query instruction, identification information of the target query instruction. The identification information may be location information of the target query instruction in a certain memory, or a number of a certain cache queue, etc. The memory may be a video memory on the GPU or may be a memory external to the GPU.

In some implementations, the control block corresponding to the tile includes at least one control sub-block, and the "determining the control block corresponding to the tile based on the instruction sequence" in the step S11 includes a step S111, where:

Step S111, for each instruction in the instruction sequence, determining an instruction sub-block corresponding to the instruction based on the instruction, and, when the instruction sub-block corresponding to the instruction is not the preset second control sub-block, taking the instruction sub-block as one control sub-block in the control block corresponding to the tile.

Here, the instruction sub-blocks may include, but are not limited to, a query tag, a first control sub-block, a second control sub-block, and the like. Wherein the first control sub-block characterizes coverage information between primitive blocks and tiles, namely: at least a portion of at least one primitive in the primitive block is located in the tile. The second control sub-block may be a preset empty sub-block. In implementation, different instructions correspond to different instruction sub-blocks. For example, for a query instruction, its corresponding instruction sub-block is a query tag; for drawing instructions, the corresponding instruction sub-block may be a first control sub-block or a second control sub-block.

In some embodiments, "based on the instruction, determining an instruction sub-block corresponding to the instruction" in the step S111 includes step S131 and/or step S132, where:

Step S131, when the instruction is a query instruction, determining the query tag corresponding to the query instruction and taking the query tag as the instruction sub-block corresponding to the query instruction.

Here, the instruction sub-block corresponding to the query instruction is a query tag. In implementation, different query instructions correspond to different query tags. For example, in fig. 1B, in order to obtain the boundary of the drawing instruction 1, a query instruction 0 and a query instruction 1 may be inserted before and after the drawing instruction 1, and then the query flag corresponding to the query instruction 0 may be the query instruction 0 or the identification information of the query instruction 0; the query tag corresponding to the query instruction 1 may be the query instruction 1, or may be identification information of the query instruction 1.

In some embodiments, the query tag corresponding to a query instruction may be determined according to the length of the query instruction. For example, if the length of the query instruction satisfies a first preset condition, the query instruction itself is used as its query tag; if the length does not satisfy the first preset condition, the query tag is determined based on the location of the query instruction in a memory. The first preset condition may be any suitable condition, for example, that the length is less than a preset length such as 16 bits or 32 bits. The memory may be any suitable storage. For example, it may be a memory located outside the GPU, such as system memory or DDR. As another example, it may be a memory built into the GPU, such as on-chip memory, which has a small capacity and is typically used to store data that needs high-speed processing inside the chip. In implementation, the memory includes at least one storage address, and the storage address of the query instruction, or an index value corresponding to that storage address, is used as the query tag corresponding to the query instruction.
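The length-based choice of query tag can be sketched in Python as follows. This is an illustration under stated assumptions: the query instruction is modeled as a `bytes` value, the first preset condition is taken to be "shorter than 32 bits", the memory is modeled as a Python list whose indices stand in for index values of storage addresses, and `make_query_tag` is a hypothetical name.

```python
MAX_INLINE_BITS = 32  # assumed preset length for the first preset condition


def make_query_tag(query, memory):
    """Return a query tag for `query` (bytes), storing it in `memory` if needed."""
    if len(query) * 8 < MAX_INLINE_BITS:
        # Short enough: the query instruction itself serves as the query tag.
        return ("inline", query)
    # Too long: store the instruction and use the index of its slot as the tag.
    memory.append(query)
    return ("index", len(memory) - 1)


memory = []
short_tag = make_query_tag(b"\x2a", memory)       # 8 bits, kept inline
long_tag = make_query_tag(b"\x00" * 8, memory)    # 64 bits, stored in memory
```

Using an index keeps the tag small regardless of how long the stored query instruction is.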

Step S132, when the instruction is a drawing instruction, determining an instruction sub-block corresponding to the drawing instruction based on a primitive block corresponding to the drawing instruction and the tile.

Here, the instruction sub-blocks corresponding to the drawing instruction may include, but are not limited to, a first control sub-block, a second control sub-block, and the like.

In some implementations, the "determining the instruction sub-block corresponding to the drawing instruction based on the primitive block corresponding to the drawing instruction and the tile" in the step S132 includes step S1321 and/or step S1322, wherein:

Step S1321, when the primitive block belongs to the range of the tile, generating a first control sub-block based on the primitive block and the tile, and taking the first control sub-block as the instruction sub-block corresponding to the drawing instruction.

Here, the first control sub-block characterizes the coverage information between the primitive block and the tile, namely that at least a portion of at least one primitive in the primitive block is located in the tile. In implementation, the location information of each primitive in the primitive block is compared with the range of the tile to determine the coverage information. For example, in FIG. 1C, since primitive block 0 covers tile 0, the control block 110 corresponding to tile 0 includes the control sub-block M0, and the coverage information between primitive block 0 and tile 0 is represented by the control sub-block M0.

Step S1322, when the primitive block does not belong to the range of the tile, taking the preset second control sub-block as the instruction sub-block corresponding to the drawing instruction.

Here, the second control sub-block may be an empty (null) sub-block. In implementation, if the instruction sub-block corresponding to a drawing instruction is the second control sub-block, it does not need to appear in the control block. For example, in FIG. 1C, since none of primitive block 0 to primitive block 2 covers tile 2 or tile 3, the control blocks corresponding to tile 2 and tile 3 are empty, that is, they contain no control sub-blocks.
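Steps S1321 and S1322 can be sketched together in Python. The geometry test is an assumption for illustration: each primitive is reduced to an axis-aligned bounding box `(x0, y0, x1, y1)` and compared with the tile rectangle, which is only one possible way of comparing primitive locations with the range of a tile. The names `NULL_SUB_BLOCK`, `overlaps`, and `sub_block_for_draw` are hypothetical.

```python
NULL_SUB_BLOCK = None  # stands in for the preset (empty) second control sub-block


def overlaps(box, tile_rect):
    """True if two half-open rectangles (x0, y0, x1, y1) intersect."""
    bx0, by0, bx1, by1 = box
    tx0, ty0, tx1, ty1 = tile_rect
    return bx0 < tx1 and tx0 < bx1 and by0 < ty1 and ty0 < by1


def sub_block_for_draw(pb_id, primitive_boxes, tile_id, tile_rect):
    """Return the instruction sub-block of a drawing instruction for one tile."""
    covered = [i for i, box in enumerate(primitive_boxes)
               if overlaps(box, tile_rect)]
    if not covered:
        # Step S1322: no primitive falls in the tile, use the second sub-block.
        return NULL_SUB_BLOCK
    # Step S1321: first control sub-block carrying the coverage information.
    return {"tile": tile_id, "primitive_block": pb_id, "primitives": covered}


tile_0 = (0, 0, 16, 16)
tile_1 = (16, 0, 32, 16)
boxes = [(2, 2, 6, 6), (10, 10, 14, 20)]  # bounding boxes of two primitives
first = sub_block_for_draw(0, boxes, 0, tile_0)
second = sub_block_for_draw(0, boxes, 1, tile_1)
```

A caller building the control block would append `first` and skip `second`, since the second control sub-block does not need to appear in the control block.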

In some embodiments, the control block corresponding to the tile includes at least one control sub-block, the instruction sequence includes the target query instruction and a target drawing instruction adjacent to the target query instruction, and the "determining the control block corresponding to the tile based on the instruction sequence" in the step S11 includes steps S141 to S142, where:

Step S141, determining a query tag corresponding to the target query instruction, and taking the query tag as one control sub-block in the control block corresponding to the tile.

Here, the query tag may be a target query instruction, a storage address of the target query instruction in the memory, or an index value corresponding to the storage address.

Step S142, determining an instruction sub-block corresponding to the target drawing instruction, and, in the case that the instruction sub-block corresponding to the target drawing instruction is not the preset second control sub-block, taking the instruction sub-block corresponding to the target drawing instruction as one control sub-block in the control block corresponding to the tile.

Here, the target drawing instruction may precede or follow the target query instruction. For example, in FIG. 1B, the target drawing instruction may be drawing instruction 1, and the target query instruction may be query instruction 0 or query instruction 1. In implementation, the control sub-blocks are inserted into the control block corresponding to the tile in the order of the instructions in the instruction sequence, that is, the position of an instruction in the instruction sequence corresponds to the position of its control sub-block in the control block.

Fig. 1D is a schematic diagram of a control block with a query tag inserted according to an embodiment of the present disclosure. As shown in fig. 1D, an instruction sequence 111 includes 5 instructions, namely: drawing instruction 0, query instruction 0, drawing instruction 1, query instruction 1, and drawing instruction 2. Drawing instruction 0 to drawing instruction 2 are used for drawing primitive block 0 to primitive block 2, respectively, and the rendering object is divided into 4 tiles, namely tile 0 to tile 3, wherein:

for the control block 120 corresponding to tile 0, since tile 0 is covered by primitive block 1, query tag 0 corresponding to query instruction 0 is located before the control sub-block M1 corresponding to primitive block 1, and query tag 1 corresponding to query instruction 1 is located after the control sub-block M1; meanwhile, since tile 0 is also covered by primitive block 0 and primitive block 2, the control block 120 includes five control sub-blocks, which are in turn: control sub-block M0, query tag 0, control sub-block M1, query tag 1, and control sub-block M2, wherein control sub-block M0 represents the coverage relationship between primitive block 0 drawn by drawing instruction 0 and tile 0, query tag 0 is the query tag corresponding to query instruction 0, control sub-block M1 represents the coverage relationship between primitive block 1 drawn by drawing instruction 1 and tile 0, query tag 1 is the query tag corresponding to query instruction 1, and control sub-block M2 represents the coverage relationship between primitive block 2 drawn by drawing instruction 2 and tile 0;

for the control block 121 corresponding to tile 1, since tile 1 is not covered by primitive block 1, query tag 0 corresponding to query instruction 0 and query tag 1 corresponding to query instruction 1 are adjacent; meanwhile, since tile 1 is covered by primitive block 0 and primitive block 2, the control block 121 includes four control sub-blocks, which are in turn: control sub-block N0, query tag 0, query tag 1, and control sub-block N1, wherein control sub-block N0 represents the coverage relationship between primitive block 0 drawn by drawing instruction 0 and tile 1, query tag 0 is the query tag corresponding to query instruction 0, query tag 1 is the query tag corresponding to query instruction 1, and control sub-block N1 represents the coverage relationship between primitive block 2 drawn by drawing instruction 2 and tile 1;

For the control block 122 corresponding to the tile 2, since the tile 2 is not covered by the primitive block 1, the query tag 0 corresponding to the query instruction 0 and the query tag 1 corresponding to the query instruction 1 are adjacent, and meanwhile, since the tile 2 is not covered by the primitive block 0 and the primitive block 2, the control block 122 includes two control sub-blocks, which are in turn: query tag 0 and query tag 1, wherein query tag 0 is a query tag corresponding to query instruction 0, and query tag 1 is a query tag corresponding to query instruction 1;

for the control block 123 corresponding to tile 3, since tile 3 is not covered by primitive block 1, query tag 0 corresponding to query instruction 0 and query tag 1 corresponding to query instruction 1 are adjacent; meanwhile, since tile 3 is not covered by primitive block 0 or primitive block 2, the control block 123 includes two control sub-blocks, which are in turn: query tag 0 and query tag 1, wherein query tag 0 is the query tag corresponding to query instruction 0, and query tag 1 is the query tag corresponding to query instruction 1.
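The construction of the four control blocks of fig. 1D can be sketched as follows. This is a minimal Python illustration, assuming the coverage relationships that yield the control blocks listed above (tile 0 covered by primitive blocks 0 to 2, tile 1 covered by primitive blocks 0 and 2, tiles 2 and 3 covered by none); the names `SEQUENCE`, `COVERAGE`, and `build_control_block`, and the labels `PBn` and `Qn`, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Instruction sequence of fig. 1D: ("draw", n) draws primitive block n,
# ("query", n) is query instruction n.
SEQUENCE: List[Tuple[str, int]] = [
    ("draw", 0), ("query", 0), ("draw", 1), ("query", 1), ("draw", 2),
]

# Assumed coverage: the set of primitive blocks that cover each tile.
COVERAGE: Dict[int, Set[int]] = {0: {0, 1, 2}, 1: {0, 2}, 2: set(), 3: set()}


def build_control_block(tile: int) -> List[str]:
    """Walk the sequence in order so that the position of each control
    sub-block mirrors the position of its instruction."""
    block: List[str] = []
    for kind, n in SEQUENCE:
        if kind == "query":
            # A query tag is inserted into the control block of every tile.
            block.append(f"Q{n}")
        elif n in COVERAGE[tile]:
            # Coverage sub-block for primitive block n on this tile.
            block.append(f"PB{n}")
        # Otherwise the drawing instruction maps to the empty second
        # control sub-block, which is not embodied in the control block.
    return block


control_blocks = {tile: build_control_block(tile) for tile in COVERAGE}
for tile, block in control_blocks.items():
    print(tile, block)
# 0 ['PB0', 'Q0', 'PB1', 'Q1', 'PB2']
# 1 ['PB0', 'Q0', 'Q1', 'PB2']
# 2 ['Q0', 'Q1']
# 3 ['Q0', 'Q1']
```

Because every tile receives every query tag regardless of coverage, the number of copies of each query tag always equals the number of tiles, which is what allows the tag collector to detect completion by counting.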

And step S12, sending the query tag in the control block corresponding to the block to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the query tag.

Here, in the illumination phase, the GPU, when processing the control block corresponding to each tile, if encountering a query tag, sends the query tag into the tag collector. The tag collector may be implemented in hardware, firmware, software, etc.

In some embodiments, the tag collector may be located in the GPU, and the "send the query tag in the control block corresponding to the tile to the tag collector" in step S12 includes step S121, where:

step S121, sending the query tag in the control block corresponding to the tile to the tag collector of the graphics processor.

Here, the query tag is sent to the tag collector by internal communication. Thus, the physical distance between two parties is shortened, the delay time of data transmission is reduced, and the transmission efficiency is improved.

The execution state may include, but is not limited to, an incomplete state, a completion state, and the like. In implementation, if the number of query tags corresponding to the target query instruction is equal to the number of tiles, the query tags have all been collected and the execution state of the target query instruction is the completion state; otherwise, if the number of query tags corresponding to the target query instruction is smaller than the number of tiles, the query tags have not all been collected and the execution state of the target query instruction is the incomplete state.

In the embodiment of the disclosure, on the one hand, the query tags corresponding to the query instructions are inserted into the control block corresponding to each tile, so the driver does not need to insert a refresh request after each drawing instruction, which reduces the operations required of the driver and makes the scheme friendlier to the driver. On the other hand, the tag collector determines the execution state of each query instruction based on the query tags, so the boundaries of the instructions in the instruction sequence can be obtained from the execution states of the query instructions and are thus effectively defined at the hardware level. Because the way each instruction in the instruction sequence is processed in the illumination stage is unchanged, the performance of the graphics processor is consistent whether or not query instructions are inserted, which improves the accuracy of performance evaluation (for example, of the execution duration) of each instruction and facilitates further optimization of the application program. Meanwhile, because the memory does not need to be read and written repeatedly, the consumption of resources such as bandwidth and power is reduced and the performance of the system is improved.

In some embodiments, the method further comprises step S151 and/or step S152, wherein:

Step S151, taking the target query instruction as the query tag when the length of the target query instruction meets a first preset condition.

Here, the first preset condition may be any suitable condition, for example, that the length is less than a preset length. In implementation, because the length and complexity of the query instructions provided by different GPUs differ, a query instruction whose length is less than the preset length can itself be used as the query tag. The preset length may be any suitable length, for example, 16 bits, 32 bits, etc.

Step S152, in the case that the length of the target query instruction does not meet the first preset condition, storing the target query instruction in a preset memory, and generating the query tag based on the position of the target query instruction in the memory.

Here, the memory may be any suitable storage. For example, the memory may be located outside the GPU, such as system memory or DDR. For another example, the memory may be built into the GPU, such as an on-chip memory (On-Chip Memory), which has a small capacity and is typically used to store data that must be processed at high speed inside the chip. In practice, the memory includes at least one memory address.

When the length of the query instruction is greater than the preset length, inserting the query instruction itself into the control block would occupy more storage space. Therefore, a piece of on-chip or off-chip storage space can first be allocated as the memory, the query instruction is stored in the memory, an index value of the corresponding position is generated, and the index value is used as the query tag.
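The choice between steps S151 and S152 can be sketched as follows. This is a minimal Python illustration, assuming a 32-bit preset length and a Python list standing in for the preset memory; the names `PRESET_LENGTH_BITS`, `query_memory`, and `make_query_tag` are hypothetical.

```python
from typing import List, Tuple, Union

# Assumed threshold; the description gives 16 bits and 32 bits as examples.
PRESET_LENGTH_BITS = 32

# Stand-in for the preset memory (on-chip or off-chip) holding long query instructions.
query_memory: List[bytes] = []


def make_query_tag(instruction: bytes) -> Tuple[str, Union[bytes, int]]:
    """Return ("inline", instruction) or ("index", position in query_memory)."""
    if len(instruction) * 8 < PRESET_LENGTH_BITS:
        # Step S151: a short instruction is used directly as the query tag.
        return ("inline", instruction)
    # Step S152: store the instruction and use its position as the query tag.
    query_memory.append(instruction)
    return ("index", len(query_memory) - 1)


short_tag = make_query_tag(b"\x01\x02")   # 16-bit instruction
long_tag = make_query_tag(b"\x00" * 16)   # 128-bit instruction

print(short_tag)  # ('inline', b'\x01\x02')
print(long_tag)   # ('index', 0)
```

With the index form, every control block carries only a small fixed-width value, and the full instruction is stored once rather than once per tile.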

In the embodiment of the disclosure, on the one hand, when the structure of the query instruction is simple, the query instruction can be used directly as the query tag, which is convenient to implement and makes errors easy to detect; on the other hand, when the structure of the query instruction is complex and the instruction would occupy a large storage space, the identification information of the query instruction can be used as the query tag, so that the information can be transmitted over a fixed, narrow channel, which reduces the data volume and the occupation of resources.

In some embodiments, the method further comprises step S16, wherein:

step S16, in response to receiving the execution state of the target query instruction transmitted by the mark collector, determining the execution information of a first drawing instruction associated with the target query instruction; wherein the execution state of the target query instruction is determined by the tag collector based on the number of query tags.

Here, the execution state of the target query instruction is a completion state. In implementation, when the tag collector has collected all the query tags corresponding to the target query instruction, it sends a notification of the completion state in a preset manner, where the preset manner may include, but is not limited to, an interrupt, a register write, a memory write, etc. The GPU then processes the corresponding instruction upon receiving the completion state.

The target query instruction may be associated with at least one first drawing instruction. For example, in FIG. 1D, query instruction 0 may be associated with drawing instruction 0, and/or drawing instruction 1.

The execution information may include, but is not limited to, an execution duration, an execution state, etc., where the execution state may include, but is not limited to, a completion state, an incomplete state, etc. For example, the execution information may be the execution duration of the first drawing instruction or the execution state of the first drawing instruction.

In some embodiments, the execution information of the first drawing instruction includes an execution duration, and the "determining the execution information of the first drawing instruction associated with the target query instruction" in the step S16 includes steps S161 to S162, where:

step S161, determining the completion time of the target query instruction.

Here, the completion time may be determined in any suitable manner, for example, the system clock information at the moment is collected as the completion time of the target query instruction.

Step S162, determining an execution duration of the first drawing instruction based on the completion time of the target query instruction.

Here, the first drawing instruction may be associated with at least one target query instruction. For example, in fig. 1D, drawing instruction 1 associates query instruction 0 and query instruction 1, and then the execution duration of drawing instruction 1 can be obtained by using the completion time of query instruction 0 and the completion time of query instruction 1.

In some embodiments, the target query instruction includes a first query instruction and a second query instruction; the first query instruction, the first drawing instruction, and the second query instruction are adjacent in sequence; and the step S162 includes steps S1621 to S1622, wherein:

step S1621, determining a difference between the completion time of the first query instruction and the completion time of the second query instruction.

Here, the completion time of the first query instruction and the completion time of the second query instruction may be determined in the same manner or in different manners. For example, the completion time of the first query instruction and the completion time of the second query instruction may each be the system clock value sampled at the corresponding moment.

Step S1622, determining the execution duration of the first drawing instruction based on the difference value.

Here, the execution duration may be, but is not limited to, the difference itself, a weighted value of the difference, and the like. For example, the difference is taken as the execution duration.
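Steps S1621 and S1622 can be sketched as follows. This is a minimal Python illustration, assuming that a completion timestamp (in clock ticks) is recorded when each query instruction is reported as completed; the timestamp values and the names `completion_time` and `execution_duration` are hypothetical.

```python
from typing import Dict

# Hypothetical completion timestamps, in clock ticks, sampled when the tag
# collector reports each query instruction as completed.
completion_time: Dict[str, int] = {"query0": 1_000, "query1": 1_750}


def execution_duration(first_query: str, second_query: str, weight: float = 1.0) -> float:
    """Duration of the drawing instruction bracketed by two query instructions.

    Step S1621: take the difference of the two completion times.
    Step S1622: use the difference, optionally scaled by a weight.
    """
    difference = completion_time[second_query] - completion_time[first_query]
    return weight * difference


# Drawing instruction 1 sits between query instruction 0 and query instruction 1.
duration = execution_duration("query0", "query1")
print(duration)  # 750.0
```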

In an embodiment of the disclosure, the execution information of the first drawing instruction associated with the target query instruction is determined in response to receiving the execution state of the target query instruction transmitted by the tag collector. Therefore, the execution information of the drawing instruction can be obtained from the execution state of the target query instruction. On the one hand, the memory does not need to be read frequently and blindly to obtain the execution information of the drawing instruction, which improves the working efficiency of the GPU and reduces the consumption of resources such as bandwidth and power; on the other hand, the boundary of the drawing instruction can be effectively defined at the hardware level without the driver explicitly inserting a forced refresh request, which improves the accuracy of the execution information.

In some embodiments, the execution information of the first drawing instruction includes an execution state, and the method further includes step S17, wherein:

step S17, executing a second drawing instruction depending on the first drawing instruction when the execution state of the first drawing instruction is a completion state.

Here, the execution state may include, but is not limited to, a completed state, an unfinished state, and the like. In practice, since the second drawing instruction depends on the execution result of the first drawing instruction, it is necessary to wait until the execution of the first drawing instruction is completed, and then the second drawing instruction can be executed.

In an embodiment of the present disclosure, the second drawing instruction dependent on the first drawing instruction is executed when the execution state of the first drawing instruction is a completion state. Therefore, the state of the second drawing instruction is switched in a wake-up mode (namely, the state is switched from the suspension state to the execution state), and the memory is not required to be read frequently and blindly to acquire the execution state of the drawing instruction, so that the working efficiency of the GPU is improved, the consumption of resources such as bandwidth and power consumption is reduced, and the normal execution of the dependent drawing instruction is ensured.
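The wake-up behaviour of step S17 can be sketched as follows. This is a minimal Python illustration, assuming that a completion notification is delivered as a callback; the class `DependencyScheduler` and its methods are hypothetical and are not part of the disclosure.

```python
from typing import Callable, Dict, List, Set


class DependencyScheduler:
    """Holds dependent drawing instructions until their prerequisite completes."""

    def __init__(self) -> None:
        self.suspended: Dict[str, List[Callable[[], None]]] = {}
        self.completed: Set[str] = set()

    def submit(self, depends_on: str, run: Callable[[], None]) -> None:
        if depends_on in self.completed:
            run()  # Prerequisite already finished: execute immediately.
        else:
            # Suspend until the completion notification arrives.
            self.suspended.setdefault(depends_on, []).append(run)

    def on_completed(self, instruction: str) -> None:
        """Called when an instruction reaches the completion state."""
        self.completed.add(instruction)
        # Wake every instruction that was waiting on this one.
        for run in self.suspended.pop(instruction, []):
            run()


log: List[str] = []
scheduler = DependencyScheduler()
scheduler.submit("draw0", lambda: log.append("draw1 executed"))
print(log)  # [] because draw1 is still suspended
scheduler.on_completed("draw0")
print(log)  # ['draw1 executed']
```

The dependent instruction never polls memory for the state of its prerequisite; it runs only when the completion notification arrives.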

Fig. 2 is a second implementation flow chart of an instruction execution method provided in an embodiment of the present disclosure, which is applied to a tag collector, as shown in fig. 2, where the method includes steps S21 to S22, where:

step S21, receiving a query mark in a control block corresponding to a block sent by a graphic processor; wherein the query tag corresponds to a target query instruction in an instruction sequence, and the location of the query tag in the corresponding control block is determined based on the location of the target query instruction in the instruction sequence.

Here, the mark collector may be implemented as hardware, firmware, software, or the like. In some implementations, the mark collector may be located within the GPU.

The query tag may include, but is not limited to, a target query instruction, identification information of the target query instruction.

The number of the blocks may be at least one, and the determination manner of the control block corresponding to each block may refer to the specific embodiment of step S11.

Step S22, based on the query mark, determining the execution state of the target query instruction.

Here, the execution state may include, but is not limited to, an incomplete state, a completion state, and the like. In implementation, when the execution state of the target query instruction is the completion state: if the query tag is the target query instruction itself, a notification of the completion state may be sent in a preset manner (e.g., an interrupt, a register write, a memory write, etc.), so that other modules perform subsequent actions according to the completion state; if the query tag is the identification information of the target query instruction, the corresponding target query instruction can be acquired from the memory according to the identification information and sent to other modules for processing. In this way, the tag collector exchanges instructions with memory inside the chip; compared with exchanging instructions with memory outside the chip, the physical distance between the two communicating sides is shorter, so the latency of instruction reads and writes can be reduced and the execution efficiency of instructions is improved.

In some embodiments, the step S22 includes steps S221 to S222, wherein:

step S221, taking the completion status as the execution status of the target query instruction if the number of the query tags satisfies the second preset condition.

Here, the second preset condition may be any suitable condition, for example, that the number of query tags is equal to the total number of tiles. In implementation, if the number of query tags corresponding to the target query instruction is equal to the total number of tiles, the completion state is taken as the execution state of the target query instruction. The number of query tags corresponding to the target query instruction is not greater than the total number of tiles. For example, if the rendered object includes 6 tiles, the number of query tags corresponding to the target query instruction is at most 6.

Step S222, taking the incomplete state as the execution state of the target query instruction if the number of the query tags does not meet the second preset condition.

Here, if the number of the plurality of query tags corresponding to the target query instruction is smaller than the total number of tiles, the incomplete state is taken as the execution state of the target query instruction.
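Steps S221 and S222 can be sketched as follows. This is a minimal Python illustration, assuming that the second preset condition is "the count equals the total number of tiles" and that each query tag is identified by a string; the class `TagCollector` and its methods are hypothetical.

```python
from collections import Counter


class TagCollector:
    """Counts received query tags and derives the execution state per query."""

    def __init__(self, total_tiles: int) -> None:
        self.total_tiles = total_tiles
        self.counts: Counter = Counter()

    def receive(self, query_tag: str) -> str:
        """Record one query tag sent while a tile's control block is processed."""
        self.counts[query_tag] += 1
        return self.state(query_tag)

    def state(self, query_tag: str) -> str:
        # Step S221: all tiles have reported, so the query is completed.
        # Step S222: fewer tags than tiles, so the query is incomplete.
        if self.counts[query_tag] == self.total_tiles:
            return "completed"
        return "incomplete"


collector = TagCollector(total_tiles=4)
states = [collector.receive("Q0") for _ in range(4)]
print(states)                 # ['incomplete', 'incomplete', 'incomplete', 'completed']
print(collector.state("Q1"))  # incomplete
```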

In the embodiment of the disclosure, on the one hand, the query tag corresponding to the query instruction is inserted into the control block corresponding to the tile, so the driver does not need to insert a refresh request after each drawing instruction, which reduces the operations required of the driver and makes the scheme friendlier to the driver. On the other hand, the tag collector determines the execution state of the query instruction based on the query tags, so the boundaries of the instructions in the instruction sequence can be obtained from the execution state of the query instruction and are thus effectively defined at the hardware level. Because the way each instruction in the instruction sequence is processed in the illumination stage is unchanged, the performance of the graphics processor is consistent whether or not query instructions are inserted, which improves the accuracy of performance evaluation (for example, of the execution duration) of each instruction and facilitates further optimization of the application program. Meanwhile, because the memory does not need to be read and written repeatedly, the consumption of resources such as bandwidth and power is reduced and the performance of the system is improved.

Based on the above embodiments, the present disclosure provides an instruction execution system, and fig. 3A is a schematic diagram of the composition of the instruction execution system provided by the present disclosure, as shown in fig. 3A, the instruction execution system 30 includes a graphics processor 31 and a tag collector 32, where:

the graphics processor 31 is configured to determine, based on an instruction sequence, a control block corresponding to a tile, where the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and a position of the query tag in the control block is determined based on a position of the target query instruction in the instruction sequence; sending the query marks in the control blocks corresponding to the image blocks to a mark collector;

the mark collector 32 is configured to receive the query mark sent by the graphics processor; based on the query tag, an execution state of the target query instruction is determined.

Here, the instruction sequence includes at least one instruction, for example, at least one drawing instruction, at least one query instruction, and the like.

The control block is used to characterize the relationship between the corresponding tile and each primitive block (PB). In implementation, the manner of determining the control block corresponding to each tile may refer to the specific embodiment of step S11.

The tag collector may be implemented in hardware, firmware, software, etc. In some implementations, the mark collector may be located in the GPU. In implementation, the determination of the execution state of the target query instruction may be referred to in the foregoing embodiment of step S22.

In some implementations, the graphics processor 31 has at least two rendering cores, and the graphics processor 31 performs tile distribution based on a tile-based rendering (TBR) architecture; wherein:

the front end module of the TBR architecture is used for determining a control block corresponding to the block based on an instruction sequence;

and the back-end module of the TBR architecture is used for determining a target rendering core of the block from the at least two rendering cores so that the target rendering core sends the query tag in the control block corresponding to the block to the tag collector.

Here, the GPU may include at least two rendering cores (Core), for example, 4 cores, 8 cores, etc.

The front-end module (also known as the geometry stage) may be divided into a vertex processing module, a graphics processing module, and a tiling (Tiling) module. The vertex processing module is configured to perform vertex and primitive transformations on the graphics data (Vertex Processing). The graphics processing module is used for culling, clipping, and similar operations on the primitives. The tiling module is used for completing screen segmentation, recording the primitive data that covers each tile (Tile), and writing the generated information, such as the tile information (Primitive List) and the vertex information (Vertex Data), into the system memory. The Primitive List is a fixed-length array whose length is the number of tiles; each element of the array is a linked list that stores pointers to all triangles intersecting the current tile, and these pointers point into the Vertex Data. The Vertex Data holds the vertices and vertex attribute data. In implementation, after the tiling module of the front-end module has executed, the control block corresponding to each tile can be generated.
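The per-tile list structure written by the tiling module can be sketched as follows. This is a minimal Python illustration, assuming a 2x2 grid of 32x32 tiles and conservative bounding-box binning instead of an exact triangle-tile intersection test; Python lists stand in for the linked lists, and all names and coordinates are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]
Triangle = Tuple[int, int, int]  # indices into vertex_data

TILE_SIZE = 32
TILES_X, TILES_Y = 2, 2

# Vertex Data: the vertices, shared by all tiles.
vertex_data: List[Vertex] = [
    (2.0, 2.0), (20.0, 4.0), (8.0, 28.0),
    (40.0, 40.0), (60.0, 44.0), (44.0, 60.0),
]
triangles: List[Triangle] = [(0, 1, 2), (3, 4, 5)]

# Per-tile list: a fixed-length array with one entry per tile; each entry
# lists the triangles that may intersect that tile.
primitive_list: List[List[int]] = [[] for _ in range(TILES_X * TILES_Y)]

for tri_index, tri in enumerate(triangles):
    xs = [vertex_data[v][0] for v in tri]
    ys = [vertex_data[v][1] for v in tri]
    # Bin the triangle into every tile touched by its bounding box.
    for ty in range(int(min(ys)) // TILE_SIZE, int(max(ys)) // TILE_SIZE + 1):
        for tx in range(int(min(xs)) // TILE_SIZE, int(max(xs)) // TILE_SIZE + 1):
            if 0 <= tx < TILES_X and 0 <= ty < TILES_Y:
                primitive_list[ty * TILES_X + tx].append(tri_index)

print(primitive_list)  # [[0], [], [], [1]]
```

Storing triangle references per tile while keeping the vertices in one shared array avoids duplicating vertex attributes for triangles that span several tiles.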

The back-end module (also referred to as the illumination stage) may be divided into a rasterization module, a hidden surface removal (Hidden Surface Removal, HSR) module, a pixel shading module, and an output merging module. The rasterization module is used for converting primitives into a two-dimensional image; each point in the two-dimensional image carries color, depth, and texture data, and a point together with its related information is called a fragment (Fragment). The HSR module is used for eliminating occluded triangles. The pixel shading module is used for shading pixels. The output merging module is used for merging a plurality of tiles and outputting them to a render target (Render Target, RT).

In some implementations, the rendering core may process any suitable preset number of tiles, e.g., 4, 6, 8, etc. In practice, the number of tiles that different rendering cores may process may be the same or different.

In some embodiments, the rendering cores may be sorted according to their load information to obtain order information for each rendering core, and the target rendering core corresponding to each tile is determined according to that order information. The order information may be determined by, including but not limited to, sorting the load indicated by the load information from high to low or from low to high, or by the tile threshold, where the tile threshold characterizes the maximum number of tiles that a rendering core can process. For example, in the case where the load information of rendering Core 1 and the load information of rendering Core 2 are the same, the order information may be determined according to the naming information of the two cores, so that the order information of rendering Core 1 is 1 and the order information of rendering Core 2 is 2.

The manner of determining the target rendering core may include, but is not limited to, random selection, custom selection, selecting the first in order, sequential selection, and the like. For example, the rendering core whose order information is first is taken as the target rendering core. For another example, one rendering core is randomly selected as the target rendering core from among a plurality of rendering cores having the same order information. For yet another example, if the GPU includes 4 rendering cores, Core 0 to Core 3, each of which can process 4 tiles, and the load information of Core 0 to Core 3 is relatively close, 4 tiles may be distributed to the four cores respectively for parallel processing.
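One possible way of selecting target rendering cores can be sketched as follows. This is a minimal Python illustration of only one of the options mentioned above, assuming that the load is measured as the number of tiles already assigned, that the least-loaded core is chosen first, and that ties are broken by core name; `TILE_THRESHOLD` and `assign_tiles` are hypothetical names.

```python
from typing import Dict, List

TILE_THRESHOLD = 4  # assumed maximum number of tiles per rendering core


def assign_tiles(tiles: List[int], load: Dict[str, int]) -> Dict[int, str]:
    """Assign each tile to the least-loaded core; ties are broken by core name."""
    load = dict(load)  # work on a copy so the caller's data is unchanged
    assignment: Dict[int, str] = {}
    for tile in tiles:
        # Only cores below their tile threshold can accept another tile.
        candidates = [core for core in load if load[core] < TILE_THRESHOLD]
        if not candidates:
            raise RuntimeError("all rendering cores are at their tile threshold")
        # Order by (load, name): the first core in this order is the target.
        target = min(candidates, key=lambda core: (load[core], core))
        assignment[tile] = target
        load[target] += 1
    return assignment


assignment = assign_tiles(
    [0, 1, 2, 3], {"Core0": 0, "Core1": 0, "Core2": 0, "Core3": 0}
)
print(assignment)  # {0: 'Core0', 1: 'Core1', 2: 'Core2', 3: 'Core3'}
```

With equal initial loads the four tiles are spread over the four cores, matching the parallel distribution described above; with unequal loads the lighter cores receive tiles first.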

Fig. 3B is a schematic diagram of the relationship between rendering cores and tiles provided by an embodiment of the disclosure. As shown in fig. 3B, the GPU includes 4 rendering cores, namely rendering core 0 to rendering core 3, and the rendering object is divided into 4 tiles, namely tile 0 to tile 3; tile 0 to tile 3 can then be distributed to the 4 rendering cores for parallel processing, so as to shorten the processing time of the back-end module.

Fig. 3C is a schematic diagram of a second component structure of an instruction execution system according to an embodiment of the present disclosure, where, as shown in fig. 3C, the instruction execution system includes a graphics processor 31 and a tag collector 32, and a rendering object is divided into 4 tiles, namely: block 0-block 3, the graphics processor 31 includes at least 4 rendering cores, namely: rendering cores 0-3, each connected unidirectionally to a mark collector 32. Thus, each rendering core, in processing the control blocks of the corresponding tiles, if encountering a query tag, sends the query tag into the tag collector.

Fig. 3D is a schematic diagram of a third component structure of an instruction execution system according to an embodiment of the present disclosure, where, as shown in fig. 3D, the instruction execution system includes a graphics processor 31 and a tag collector 32, where the tag collector 32 is located in the graphics processor 31, and the graphics processor 31 further includes a rendering core 0-3 and a buffer area 311 (corresponding to the foregoing preset memory), where:

in the front-end module of the TBR architecture, the rendering object is divided into 4 tiles, namely: block 0-block 3, and generating a control block corresponding to each block; wherein, the control block includes index values corresponding to the storage positions of the query instruction 0 and the query instruction 1 in the buffer 311;

in the back-end module of the TBR architecture, the target rendering core corresponding to each tile is determined, namely: the target rendering core of tile 0 is rendering core 0, the target rendering core of tile 1 is rendering core 1, the target rendering core of tile 2 is rendering core 2, and the target rendering core of tile 3 is rendering core 3, so that each rendering core, when processing the control block of its corresponding tile, sends each query tag in the control block to the tag collector 32;

the tag collector 32 counts the number of query tags corresponding to each query instruction; when the number of query tags is equal to the number of tiles, it acquires the corresponding query instruction from the buffer 311 and issues the execution state (completion state) of the query instruction.

In the embodiment of the disclosure, firstly, the query tags corresponding to the query instructions are inserted into the control block corresponding to each tile, so the driver does not need to insert a refresh request after each drawing instruction, which reduces the operations required of the driver and makes the scheme friendlier to the driver. Secondly, when the structure of the query instruction is simple, the query instruction can be used directly as the query tag, which is convenient to implement and makes errors easy to detect; when the structure of the query instruction is complex and the instruction would occupy a large storage space, the identification information of the query instruction can be used as the query tag, so that the execution state of the target query instruction can be transferred over a single fixed, narrow channel, which reduces the data volume and the occupation of resources. Finally, a distribution and recovery mechanism for the query tags is used in the illumination stage, so the boundaries of the instructions in the instruction sequence can be obtained from the execution states of the query instructions and are thus effectively defined at the hardware level. Because the way each instruction in the instruction sequence is processed in the illumination stage is unchanged, the performance of the graphics processor is consistent whether or not query instructions are inserted, which improves the accuracy of performance evaluation (for example, of the execution duration) of each instruction and facilitates further optimization of the application program. Meanwhile, because the memory does not need to be read and written repeatedly, the consumption of resources such as bandwidth and power is reduced and the performance of the system is improved.

In some embodiments, the graphics processor 31 is further configured to: for each instruction in the instruction sequence, determine the instruction sub-block corresponding to the instruction, and, in the case that this instruction sub-block is not the preset second control sub-block, take it as one control sub-block in the control block corresponding to the tile.

In some embodiments, the graphics processor 31 is further configured to: under the condition that the instruction is a query instruction, determining a query mark corresponding to the query instruction, and taking the query mark as an instruction sub-block corresponding to the query instruction; and/or, if the instruction is a drawing instruction, determining an instruction sub-block corresponding to the drawing instruction based on a primitive block corresponding to the drawing instruction and the tile block.

In some embodiments, the graphics processor 31 is further configured to: taking the target query instruction as the query mark under the condition that the length of the target query instruction meets a first preset condition; and/or storing the target query instruction into a preset memory under the condition that the length of the target query instruction does not meet the first preset condition, and generating the query mark based on the position of the target query instruction in the memory.
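The two branches above can be sketched as follows. This is only an illustration under stated assumptions: the "first preset condition" is modeled as a byte-length threshold (`MAX_INLINE_BYTES`), the preset memory is modeled as a Python list (`QueryBuffer`), and the names `InlineTag`, `IndexTag` and `make_query_tag` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumed threshold standing in for the "first preset condition".
MAX_INLINE_BYTES = 8


@dataclass(frozen=True)
class InlineTag:
    """Query tag that carries the whole (short) query instruction."""
    payload: bytes


@dataclass(frozen=True)
class IndexTag:
    """Query tag that carries only the position of the stored instruction."""
    slot: int


@dataclass
class QueryBuffer:
    """Hypothetical stand-in for the preset memory holding long query instructions."""
    slots: List[bytes] = field(default_factory=list)

    def store(self, instruction: bytes) -> int:
        # Append the instruction and return its position in the memory.
        self.slots.append(instruction)
        return len(self.slots) - 1


def make_query_tag(instruction: bytes, buffer: QueryBuffer) -> Union[InlineTag, IndexTag]:
    """Use the instruction itself as the tag if it is short, else store it and tag by position."""
    if len(instruction) <= MAX_INLINE_BYTES:
        return InlineTag(instruction)
    return IndexTag(buffer.store(instruction))
```

With this modeling, a long query instruction costs one small index per tile rather than one full copy per tile, which matches the narrow-channel motivation given earlier in the description.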

In some embodiments, the graphics processor 31 is further configured to: generating a first control sub-block based on the primitive block and the image block under the condition that the primitive block belongs to the range of the image block, and taking the first control sub-block as an instruction sub-block corresponding to the drawing instruction; wherein the first control sub-block characterizes coverage information between the primitive block and the tile; and/or taking the preset second control sub-block as an instruction sub-block corresponding to the drawing instruction under the condition that the primitive block does not belong to the range of the image block.
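As a rough sketch of the two cases above, the snippet below assumes that both the primitive block and the tile can be summarized by axis-aligned rectangles and that the "coverage information" is their overlap rectangle. These are illustrative assumptions only; the disclosure does not fix a representation, and `CoverageSubBlock`, `EMPTY_SUB_BLOCK` and `draw_sub_block` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (x0, y0, x1, y1) in pixels, half-open: x0 <= x < x1 and y0 <= y < y1.
Rect = Tuple[int, int, int, int]


@dataclass(frozen=True)
class CoverageSubBlock:
    """Stand-in for the first control sub-block: which draw, and where it overlaps the tile."""
    draw_id: int
    overlap: Rect


# Stand-in for the preset second control sub-block (the draw does not touch the tile).
EMPTY_SUB_BLOCK = None


def draw_sub_block(draw_id: int, prim_bounds: Rect, tile: Rect) -> Optional[CoverageSubBlock]:
    """Return a coverage sub-block if the primitive block overlaps the tile, else the empty one."""
    x0 = max(prim_bounds[0], tile[0])
    y0 = max(prim_bounds[1], tile[1])
    x1 = min(prim_bounds[2], tile[2])
    y1 = min(prim_bounds[3], tile[3])
    if x0 >= x1 or y0 >= y1:
        return EMPTY_SUB_BLOCK
    return CoverageSubBlock(draw_id, (x0, y0, x1, y1))
```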

In some embodiments, the control block corresponding to the tile includes at least one control sub-block, the instruction sequence includes the target query instruction and a target drawing instruction adjacent to the target query instruction, and the graphics processor 31 is further configured to: determining a query tag corresponding to the target query instruction, and taking the query tag as one control sub-block in a control block corresponding to the image block; determining an instruction sub-block corresponding to the target drawing instruction, and taking the instruction sub-block corresponding to the target drawing instruction as one control sub-block in a control block corresponding to the image block under the condition that the instruction sub-block corresponding to the target drawing instruction is not a preset second control sub-block.
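The construction of per-tile control blocks described above can be illustrated with a small sketch. It assumes that the set of tiles touched by each drawing instruction has already been computed, and it represents control sub-blocks as short strings for readability; `Draw`, `Query` and `build_control_blocks` are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union


@dataclass(frozen=True)
class Draw:
    draw_id: int
    tiles: FrozenSet[int]  # tiles the draw touches (assumed precomputed by the geometry stage)


@dataclass(frozen=True)
class Query:
    query_id: int


Instruction = Union[Draw, Query]


def build_control_blocks(sequence: List[Instruction], tile_ids: List[int]) -> Dict[int, List[str]]:
    """Build one control block per tile, keeping sub-blocks in instruction-sequence order."""
    blocks: Dict[int, List[str]] = {t: [] for t in tile_ids}
    for instr in sequence:
        for t in tile_ids:
            if isinstance(instr, Query):
                # A query tag goes into every tile's control block.
                blocks[t].append(f"Q{instr.query_id}")
            elif t in instr.tiles:
                # A draw contributes a sub-block only to tiles it covers.
                blocks[t].append(f"D{instr.draw_id}")
            # Otherwise the draw maps to the preset second control sub-block and is skipped.
    return blocks
```

Because every tile receives every query tag in sequence order, a tag reaching the tag collector from a given tile implies that the tile has passed all earlier draws in its control block.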

In some embodiments, the tag collector is located in the graphics processor; the graphics processor 31 is further configured to: send the query tags in the control blocks corresponding to the tiles to the tag collector of the graphics processor.

In some embodiments, the graphics processor 31 is further configured to: determining the execution information of a first drawing instruction associated with the target query instruction in response to receiving the execution state of the target query instruction transmitted by the mark collector; wherein the execution state of the target query instruction is determined by the tag collector based on the number of query tags.

In some embodiments, the execution information of the first drawing instruction includes an execution duration; the graphics processor 31 is further configured to: determining the completion time of the target query instruction; and determining the execution duration of the first drawing instruction based on the completion time of the target query instruction.

In some embodiments, the target query instruction comprises a first query instruction and a second query instruction, and the first query instruction, the first drawing instruction, and the second query instruction are adjacent in sequence; the graphics processor 31 is further configured to: determine the difference between the completion time of the first query instruction and the completion time of the second query instruction; and determine the execution duration of the first drawing instruction based on the difference.
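As a minimal sketch of this timing step, the function below takes the simplest reading, in which the execution duration equals the difference itself. The description only says the duration is determined "based on" the difference, so this equality is an assumption, as are the function name and the use of integer timestamps in arbitrary ticks.

```python
def draw_duration(first_query_done: int, second_query_done: int) -> int:
    """Estimate the duration of the draw bracketed by two query instructions.

    Both arguments are completion timestamps in the same (assumed) tick unit.
    """
    if second_query_done < first_query_done:
        raise ValueError("second query completed before the first query")
    return second_query_done - first_query_done
```

For example, if the first query instruction is reported complete at tick 1000 and the second at tick 1450, the bracketed drawing instruction is attributed 450 ticks.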

In some embodiments, the execution information of the first drawing instruction includes an execution state; the graphics processor 31 is further configured to: and executing a second drawing instruction which depends on the first drawing instruction under the condition that the execution state of the first drawing instruction is a completion state.
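The dependency case above can be sketched as a small gate that holds back a dependent drawing instruction until the instruction it depends on is reported complete. The class `DependencyGate` and its methods are hypothetical and model only the ordering logic, not any real scheduler interface.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class DependencyGate:
    """Issue a draw only after the draw it depends on has reached the completion state."""

    def __init__(self) -> None:
        self._waiting: Dict[int, List[int]] = defaultdict(list)
        self._done: Set[int] = set()
        self.issued: List[int] = []  # draws released for execution, in order

    def submit(self, draw_id: int, depends_on: Optional[int] = None) -> None:
        if depends_on is None or depends_on in self._done:
            self.issued.append(draw_id)
        else:
            # Park the draw until its dependency completes.
            self._waiting[depends_on].append(draw_id)

    def mark_complete(self, draw_id: int) -> None:
        """Call when the execution state of draw_id becomes the completion state."""
        self._done.add(draw_id)
        for dependent in self._waiting.pop(draw_id, []):
            self.issued.append(dependent)
```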

In some embodiments, the marker collector 32 is further configured to: taking the completion state as the execution state of the target query instruction under the condition that the number of the query marks meets a second preset condition; and taking the unfinished state as the execution state of the target query instruction under the condition that the number of the query marks does not meet the second preset condition.
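A minimal sketch of this counting logic follows. It assumes the "second preset condition" is that the number of collected query tags has reached the number of tiles, which is the example given earlier in the description; the class name `TagCollector` and the string states are hypothetical.

```python
from collections import Counter


class TagCollector:
    """Count query tags per query instruction and report its execution state."""

    def __init__(self, tile_count: int) -> None:
        if tile_count <= 0:
            raise ValueError("tile_count must be positive")
        self.tile_count = tile_count
        self._counts: Counter = Counter()

    def receive(self, query_id: int) -> str:
        """Record one query tag arriving from some tile and return the current state."""
        self._counts[query_id] += 1
        return self.state(query_id)

    def state(self, query_id: int) -> str:
        # Assumed condition: complete once every tile has delivered its tag.
        if self._counts[query_id] >= self.tile_count:
            return "complete"
        return "incomplete"
```

With three tiles, the first two tags for a query leave it "incomplete" and the third makes it "complete"; at that point every tile has processed all instructions that precede the query in its control block.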

The description of the system embodiments above is similar to that of the method embodiments above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the system of the present disclosure, please refer to the description of the embodiments of the method of the present disclosure for understanding.

Based on the foregoing embodiments, an embodiment of the present disclosure provides an instruction execution device applied to a graphics processor. Fig. 4 is a schematic diagram of the composition structure of the instruction execution device provided by an embodiment of the present disclosure. As shown in Fig. 4, the instruction execution device 40 includes a first determining module 41 and a sending module 42, where:

The first determining module 41 is configured to determine, based on an instruction sequence, a control block corresponding to a tile, where the control block includes a query tag corresponding to a target query instruction in the instruction sequence, and a position of the query tag in the control block is determined based on a position of the target query instruction in the instruction sequence;

the sending module 42 is configured to send the query tag in the control block corresponding to the tile to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the query tag.

In some embodiments, the first determining module 41 is further configured to: for each instruction in the instruction sequence, determine the instruction sub-block corresponding to the instruction, and, in the case that this instruction sub-block is not the preset second control sub-block, take it as one control sub-block in the control block corresponding to the tile.

In some embodiments, the first determining module 41 is further configured to at least one of: under the condition that the instruction is a query instruction, determining a query mark corresponding to the query instruction, and taking the query mark as an instruction sub-block corresponding to the query instruction; and determining an instruction sub-block corresponding to the drawing instruction based on the primitive block corresponding to the drawing instruction and the image block under the condition that the instruction is the drawing instruction.

In some embodiments, the first determining module 41 is further configured to at least one of: under the condition that the length of the target query instruction meets a first preset condition, taking the target query instruction as a query mark corresponding to the target query instruction; and storing the target query instruction into a preset memory under the condition that the length of the target query instruction does not meet the first preset condition, and generating the query mark based on the position of the target query instruction in the memory.

In some embodiments, the first determining module 41 is further configured to at least one of: generating a first control sub-block based on the primitive block and the image block under the condition that the primitive block belongs to the range of the image block, and taking the first control sub-block as an instruction sub-block corresponding to the drawing instruction; wherein the first control sub-block characterizes coverage information between the primitive block and the tile; and under the condition that the primitive block does not belong to the range of the image block, taking a preset second control sub-block as an instruction sub-block corresponding to the drawing instruction.

In some embodiments, the control block corresponding to the tile includes at least one control sub-block, the instruction sequence includes the target query instruction and a target drawing instruction adjacent to the target query instruction, and the first determining module 41 is further configured to: determining a query tag corresponding to the target query instruction, and taking the query tag as one control sub-block in a control block corresponding to the image block; determining an instruction sub-block corresponding to the target drawing instruction, and taking the instruction sub-block corresponding to the target drawing instruction as one control sub-block in a control block corresponding to the image block under the condition that the instruction sub-block corresponding to the target drawing instruction is not a preset second control sub-block.

In some embodiments, the tag collector is located in the graphics processor, and the sending module 42 is further configured to: send the query tags in the control blocks corresponding to the tiles to the tag collector of the graphics processor.

In some embodiments, the apparatus further includes a third determining module, configured to determine, in response to receiving an execution state of the target query instruction transmitted by the mark collector, execution information of a first drawing instruction associated with the target query instruction; wherein the execution state of the target query instruction is determined by the tag collector based on the number of query tags.

In some embodiments, the execution information of the first drawing instruction includes an execution duration, and the third determining module is further configured to: determining the completion time of the target query instruction; and determining the execution duration of the first drawing instruction based on the completion time of the target query instruction.

In some embodiments, the target query instruction includes a first query instruction and a second query instruction, the first query instruction, the first drawing instruction, and the second query instruction are adjacent in sequence, and the third determining module is further configured to: determine the difference between the completion time of the first query instruction and the completion time of the second query instruction; and determine the execution duration of the first drawing instruction based on the difference.

In some embodiments, the apparatus further comprises an execution module for: and executing a second drawing instruction which depends on the first drawing instruction under the condition that the execution state of the first drawing instruction is a completion state.

The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the description of the embodiments of the method of the present disclosure for understanding.

Based on the above embodiments, an embodiment of the present disclosure provides an instruction execution device applied to a tag collector. Fig. 5 is a second schematic diagram of the composition structure of the instruction execution device provided by an embodiment of the present disclosure. As shown in Fig. 5, the instruction execution device 50 includes a receiving module 51 and a second determining module 52, where:

the receiving module 51 is configured to receive a query tag in a control block corresponding to a tile sent by the graphics processor; wherein the query tag corresponds to a target query instruction in an instruction sequence, the location of the query tag in the corresponding control block being determined based on the location of the target query instruction in the instruction sequence;

The second determining module 52 is configured to determine an execution state of the target query instruction based on the query tag.

In some embodiments, the second determining module 52 is further configured to: taking the completion state as the execution state of the target query instruction under the condition that the number of the query marks meets a second preset condition; and taking the unfinished state as the execution state of the target query instruction under the condition that the number of the query marks does not meet the second preset condition.

Based on the above embodiments, the disclosed embodiments provide a graphics processor for tile distribution based on a tile rendering TBR architecture, the graphics processor comprising a marker collector and at least two rendering cores, wherein:

The description of the graphics processor embodiments above is similar to that of the method embodiments described above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the description of the embodiments of the method of the present disclosure for understanding.

It should be noted that, in the embodiments of the present disclosure, if the above method is implemented in the form of a software functional module and sold or used as a separate product, it may also be stored in a computer readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the related art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions to cause an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program code. As such, embodiments of the present disclosure are not limited to any specific combination of hardware and software.

An embodiment of the present disclosure provides an electronic device including a memory and a processor, where the memory stores a computer program executable on the processor, and where the processor implements the above method when executing the computer program.

The disclosed embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method. The computer readable storage medium may be transitory or non-transitory.

Embodiments of the present disclosure provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

It should be noted that fig. 6 is a schematic diagram of a hardware entity of an electronic device in an embodiment of the disclosure, and as shown in fig. 6, the hardware entity of the electronic device 600 includes: a processor 601, a communication interface 602, and a memory 603, wherein:

the processor 601 generally controls the overall operation of the electronic device 600.

The communication interface 602 may enable the electronic device to communicate with other terminals or servers over a network.

The memory 603 is configured to store instructions and applications executable by the processor 601, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or processed by various modules in the processor 601 and the electronic device 600, which may be implemented by a FLASH memory (FLASH) or a random access memory (Random Access Memory, RAM). Data transfer may be performed between the processor 601, the communication interface 602, and the memory 603 via the bus 604.

It should be noted here that: the description of the storage medium and apparatus embodiments above is similar to that of the method embodiments described above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present disclosure, please refer to the description of the embodiments of the method of the present disclosure for understanding.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure. The foregoing embodiment numbers of the present disclosure are merely for description and do not represent advantages or disadvantages of the embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interface, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present disclosure may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.

Alternatively, the above-described integrated units of the present disclosure may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present disclosure may be embodied essentially or in a part contributing to the related art in the form of a software product stored in a storage medium, including several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.

The foregoing is merely an embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope of the present disclosure shall be covered by the protection scope of the present disclosure.

Claims

1. A method of executing instructions for use in a graphics processor, the method comprising:

and sending the query tags in the control blocks corresponding to the tiles to a tag collector, so that the tag collector determines the execution state of the target query instruction based on the number of the query tags and a second preset condition.

2. The method of claim 1, wherein the control block corresponding to the tile includes at least one control sub-block, and wherein the determining the control block corresponding to the tile based on the instruction sequence includes:

And determining an instruction sub-block corresponding to the instruction according to each instruction in the instruction sequence, and taking the instruction sub-block corresponding to the instruction as one control sub-block in a control block corresponding to the block under the condition that the instruction sub-block corresponding to the instruction is not a preset second control sub-block.

3. The method of claim 2, wherein the determining, based on the instruction, an instruction sub-block to which the instruction corresponds includes at least one of:

under the condition that the instruction is a query instruction, determining a query mark corresponding to the query instruction, and taking the query mark as an instruction sub-block corresponding to the query instruction;

and determining an instruction sub-block corresponding to the drawing instruction based on the primitive block corresponding to the drawing instruction and the image block under the condition that the instruction is the drawing instruction.

4. The method of claim 3, wherein the determining the instruction sub-block corresponding to the drawing instruction based on the primitive block corresponding to the drawing instruction and the tile comprises at least one of:

generating a first control sub-block based on the primitive block and the image block under the condition that the primitive block belongs to the range of the image block, and taking the first control sub-block as an instruction sub-block corresponding to the drawing instruction; wherein the first control sub-block characterizes coverage information between the primitive block and the tile;

And under the condition that the primitive block does not belong to the range of the image block, taking a preset second control sub-block as an instruction sub-block corresponding to the drawing instruction.

5. The method of any one of claims 1 to 4, wherein the control block corresponding to the tile includes at least one control sub-block, and the instruction sequence includes the target query instruction and a target drawing instruction adjacent to the target query instruction;

the determining, based on the instruction sequence, a control block corresponding to the tile includes:

determining a query tag corresponding to the target query instruction, and taking the query tag as one control sub-block in a control block corresponding to the image block;

determining an instruction sub-block corresponding to the target drawing instruction, and taking the instruction sub-block corresponding to the target drawing instruction as one control sub-block in a control block corresponding to the image block under the condition that the instruction sub-block corresponding to the target drawing instruction is not a preset second control sub-block.

6. The method of any one of claims 1 to 4, wherein the marker collector is located in the graphics processor;

the sending the query tag in the control block corresponding to the block to the tag collector includes:

and sending the query tags in the control blocks corresponding to the tiles to the tag collector of the graphics processor.

7. The method according to any one of claims 1 to 4, further comprising at least one of:

under the condition that the length of the target query instruction meets a first preset condition, taking the target query instruction as a query mark corresponding to the target query instruction;

and storing the target query instruction into a preset memory under the condition that the length of the target query instruction does not meet the first preset condition, and generating a query mark corresponding to the target query instruction based on the position of the target query instruction in the memory.

8. The method according to any one of claims 1 to 4, further comprising:

determining the execution information of a first drawing instruction associated with the target query instruction in response to receiving the execution state of the target query instruction transmitted by the mark collector; wherein the execution state of the target query instruction is determined by the tag collector based on the number of query tags.

9. The method of claim 8, wherein the execution information of the first drawing instruction includes an execution duration, and wherein the determining the execution information of the first drawing instruction associated with the target query instruction includes:

determining the completion time of the target query instruction;

and determining the execution duration of the first drawing instruction based on the completion time of the target query instruction.

10. The method of claim 9, wherein the target query instruction comprises a first query instruction and a second query instruction, the first query instruction, the first drawing instruction, and the second query instruction being adjacent in sequence;

the determining the execution duration of the first drawing instruction based on the completion time of the target query instruction includes:

determining a difference between a completion time of the first query instruction and a completion time of the second query instruction;

and determining the execution time length of the first drawing instruction based on the difference value.

11. The method of claim 8, wherein the execution information of the first drawing instruction includes an execution state, the method further comprising:

and executing a second drawing instruction which depends on the first drawing instruction under the condition that the execution state of the first drawing instruction is a completion state.

12. A method of executing instructions for use in a mark collector, the method comprising:

and determining the execution state of the target query instruction based on the number of the query tags and a second preset condition.

13. The method of claim 12, wherein the determining the execution state of the target query instruction based on the number of query tags and a second preset condition comprises:

taking the completion state as the execution state of the target query instruction under the condition that the number of the query marks meets the second preset condition;

and taking the unfinished state as the execution state of the target query instruction under the condition that the number of the query marks does not meet the second preset condition.

14. An execution system for instructions, the system comprising a graphics processor and a tag collector, wherein:

the mark collector is used for receiving the query mark sent by the graphic processor; and determining the execution state of the target query instruction based on the number of the query tags and a second preset condition.

15. The system of claim 14, wherein the graphics processor has at least two rendering cores, the graphics processor performing tile distribution based on a tile rendering TBR architecture, wherein:

16. An execution apparatus for instructions, for use with a graphics processor, the apparatus comprising:

and the sending module is used for sending the query marks in the control blocks corresponding to the tiles to a mark collector so that the mark collector determines the execution state of the target query instruction based on the number of the query marks and a second preset condition.

17. An execution device for instructions, for use in a mark collector, the device comprising:

And the second determining module is used for determining the execution state of the target query instruction based on the number of the query marks and a second preset condition.

18. A graphics processor, characterized in that the graphics processor performs tile distribution based on a tile rendering TBR architecture, the graphics processor comprising a marker collector and at least two rendering cores, wherein:

the mark collector is used for receiving the query mark sent by the target rendering core; and determining the execution state of the target query instruction based on the number of the query tags and a second preset condition.

19. An electronic device comprising a processor and a memory, the memory storing a computer program executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 13 when executing the computer program.

20. A computer readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, implements the method of any of claims 1 to 13.