CN115641251A

CN115641251A - 2D desktop image pre-fetching block fusion method, device and equipment

Info

Publication number: CN115641251A
Application number: CN202211384580.XA
Authority: CN
Inventors: 曹杨; 蒋新; 杨盼; 林苍松
Original assignee: Changsha Jingmei Integrated Circuit Design Co., Ltd.; Changsha Jingjia Microelectronics Co., Ltd.
Current assignee: Changsha Jingmei Integrated Circuit Design Co., Ltd.; Changsha Jingjia Microelectronics Co., Ltd.
Priority date: 2022-11-07
Filing date: 2022-11-07
Publication date: 2023-01-24

Abstract

The embodiments of the application provide a method, a device and equipment for prefetching, blocking and fusing 2D desktop images, and relate to the technical field of computers. First, a fusion command is acquired from a GPGPU (general-purpose graphics processing unit) and sent to each fusion component. In each fusion component, source data is read from a frame memory in Tile/linear block mode and written into a CACHE. In each fusion component, the data in the CACHE is then read, byte-aligned, and sent to a fusion component unit for data processing. After data processing is completed in each fusion component unit, the data is written into the CACHE, block compression is performed, and a Burst write-back to the frame memory completes the fusion. The application accelerates the 2D desktop without being limited by complex 3D application scenarios and achieves a smooth 2D desktop display.

Description

2D desktop image pre-fetching block fusion method, device and equipment

Technical Field

The application relates to the technical field of computers, and in particular to a method, a device and computer equipment for prefetching, blocking and fusing 2D desktop images.

Background

By application field, operating systems are divided into desktop, server and embedded operating systems. Desktop operating systems are the most widely used. Their basic function is human-computer interaction, in which a smooth desktop window display is pursued to provide a friendly user experience.

A 2D desktop can be implemented directly in software, but its performance then depends heavily on the CPU; to remove this complete dependence on the CPU, hardware acceleration is provided by a GPGPU. However, in some application scenarios, when the GPGPU drawing engine is occupied by complex 3D drawing, the 2D desktop stutters in use. Independent 2D desktop image blocking and fusion operations accelerate the 2D desktop on their own without occupying 3D drawing engine resources.

The prior art includes a scheme that processes multi-channel projection pictures by performing color difference adjustment, geometric correction, picture fusion and similar processes on them. However, that scheme offers only a single processing mode, does not describe how multi-format pictures are divided and processed, and does not prefetch data, so the effect of processing while outputting is not achieved.

Disclosure of Invention

To address at least one of the above technical defects, embodiments of the present application provide a method, an apparatus and a computer device for 2D desktop image prefetching and blocking fusion.

According to a first aspect of the embodiments of the present application, there is provided a 2D desktop image pre-fetching blocking fusion method, including:

acquiring a fusion command from a GPGPU (general purpose graphics processing unit), and sending the fusion command to each fusion component;

in each fusion component, reading source data from the frame memory and writing the read source data into a CACHE;

in each fusion component, reading the data in the CACHE, performing byte alignment, and sending the data to the fusion component unit for data processing;

after data processing is completed in each fusion component unit, writing the data into the CACHE and then performing a Burst write-back to the frame memory to complete the fusion.

In an optional embodiment of the present application, the step of reading the data in the CACHE in each fusion component, performing byte alignment, and sending the data to the fusion component unit for data processing further includes:

in each fusion component, reading the data in the CACHE, performing pixel alignment, sending the data to the fusion component unit, and performing color gamut conversion.

In an optional embodiment of the present application, the step of writing the data into the CACHE after data processing is completed in each fusion component unit and then performing a Burst write-back to the frame memory to complete the fusion further includes:

after color gamut conversion is completed in each fusion component unit, writing the pixels into the CACHE, transmitting the block-processed pattern in block-compressed form, and then performing a Burst write-back to the frame memory to complete the fusion.

In an optional embodiment of the present application, the step of writing the pixels into the CACHE after color gamut conversion is completed in each fusion component unit, transmitting the block-processed pattern in block-compressed form, and then performing a Burst write-back to the frame memory to complete the fusion further includes:

after color gamut conversion is completed in each fusion component unit, performing alignment according to the start address of the target object, writing the pixels into the CACHE, transmitting the block-processed pattern in block-compressed form, and then performing a Burst write-back to the frame memory to complete the fusion.

In an optional embodiment of the present application, the block compression in the method includes:

the compression is performed by any one of lossless compression, lossy compression, and bypass compression.

In an optional embodiment of the present application, the step of reading the source data from the frame memory and writing the read source data into the CACHE in each fusion component further includes:

reading the source data in either a TILE format mode or a linear pixel reading mode; or

reading the source data in a read operation mode in which the specific block size can be configured.

According to a second aspect of the embodiments of the present application, a 2D desktop image pre-fetching block fusion apparatus is provided. The apparatus includes a command distribution module, a CACHE write module, a byte alignment module and a frame memory write-back module, wherein:

the command distribution module is used for acquiring the fusion command from the GPGPU and sending the fusion command to each fusion component;

the CACHE write module is used for reading the source data from the frame memory in each fusion component and writing the read source data into the CACHE;

the byte alignment module is used for reading the data in the CACHE in each fusion component, performing byte alignment, and sending the data to the fusion component unit for data processing;

and the frame memory write-back module is used for writing the data into the CACHE after data processing is completed in each fusion component unit, and then performing a Burst write-back to the frame memory to complete the fusion.

According to a third aspect of the embodiments of the present application, there is provided a computer device, including: a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the steps of the method according to any one of the first aspect of the embodiments of the present application.

According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon; the computer program is executed by a processor to implement the steps of the method according to any one of the first aspect of the embodiments of the application.

The 2D desktop image prefetching and blocking fusion method provided by the embodiments of the application achieves the following beneficial effects:

1. The frame memory is written in a pipelined, high-efficiency Burst mode. The Burst operation of each component can reach 128 × 256 bits, so at a 1024 × 768 display resolution one full row, or one TILE data block's worth of pixels, can be processed continuously in a single operation;

2. The pattern fusion operation supported by the application occupies few resources and has an efficient circuit structure;

3. The method is implemented with independent 2D fusion. Commands can be received in three modes: written into a command CACHE over the configuration bus, or actively read in ringbuffer mode from either the memory or the frame memory. The frame memory is accessed through a standard internal memory bus interface, so the design is highly reusable and can be reused in the design of high-performance general-purpose GPGPU graphics chips.
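The Burst sizing in item 1 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 32-bit RGBA pixels (4 bytes each) and reading "128 × 256 bits" as the total payload of one Burst; the 32 × 32 tile in the example is a hypothetical tile size, since the text does not state one.

```python
# Arithmetic check of the Burst sizing in item 1.
# Assumptions: one Burst carries 128 x 256 bits; pixels are 32-bit RGBA.
BURST_BITS = 128 * 256
BYTES_PER_PIXEL = 4  # RGBA, 8 bits per channel (assumed)


def burst_bytes() -> int:
    """Payload of one Burst in bytes."""
    return BURST_BITS // 8


def pixels_per_burst(bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """How many whole pixels fit in one Burst."""
    return burst_bytes() // bytes_per_pixel


def fits_in_one_burst(width_px: int, height_px: int = 1) -> bool:
    """True if a row (height 1) or a width x height tile fits in a single Burst."""
    return width_px * height_px * BYTES_PER_PIXEL <= burst_bytes()


if __name__ == "__main__":
    print(burst_bytes())              # 4096
    print(pixels_per_burst())         # 1024
    print(fits_in_one_burst(1024))    # one 1024-pixel row of a 1024x768 frame
    print(fits_in_one_burst(32, 32))  # a hypothetical 32x32 tile
```

Under these assumptions one Burst holds exactly 1,024 pixels, which is one full row at 1024 × 768.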

In summary, the method of the present application implements a block-prefetching image fusion operation that is relatively simple, occupies a small share of hardware resources and delivers very high performance. The processed pattern is output in block-compressed form, achieving processing while outputting. The 2D desktop is accelerated without being limited by complex 3D application scenarios, and a smooth 2D desktop display is achieved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

Fig. 1 is a flowchart of a 2D desktop image pre-fetching blocking fusion method provided in an embodiment of the present application;

Fig. 2 is a schematic diagram of a fusion component architecture provided in an embodiment of the present application;

Fig. 3 is a schematic diagram of an adaptive fusion blocking strategy provided in an embodiment of the present application;

Fig. 4 is a diagram illustrating a prefetch mechanism provided in an embodiment of the present application;

Fig. 5 is a structural diagram of a 2D desktop image prefetching blocking fusion apparatus provided in an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present application.

Detailed Description

To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described below in further detail with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. It should be noted that the embodiments of the present application and the features in them may be combined with each other provided they do not conflict.

A general-purpose graphics processing unit (GPGPU) uses a graphics processor, originally designed for graphics tasks, to perform general-purpose computing tasks that were originally handled by the central processing unit. These general-purpose computations are often unrelated to graphics processing. The powerful parallel processing capability and programmable pipelines of modern graphics processors allow their stream processors to process non-graphics data. In particular, for single instruction, multiple data (SIMD) workloads in which the computation load far exceeds the cost of data scheduling and transfer, a general-purpose graphics processor can greatly outperform a conventional CPU.

A 2D desktop can be implemented directly in software, but its performance then depends heavily on the CPU; to remove this complete dependence on the CPU, hardware acceleration is provided by a GPGPU. While developing the application, the inventors found that in some application scenarios, when the GPGPU drawing engine is occupied by complex 3D drawing, the 2D desktop stutters in use, and that independent 2D desktop image blocking and fusion operations can accelerate the 2D desktop on their own without occupying 3D drawing engine resources.

In view of the above problems, the embodiments of the present application provide a relatively simple implementation of a block-prefetching image fusion operation that occupies a small share of hardware resources and delivers very high performance. Processed patterns are output in block-compressed form, achieving processing while outputting, accelerating the 2D desktop, and achieving a smooth 2D desktop display without being limited by complex 3D application scenarios.

The method and the device complete the pixel synthesis of the source object and the target object through an image block prefetching fusion operation. A prefetching mechanism, in which multiple objects are fused and output simultaneously, increases the fusion rate, and image blocking is performed with a specific block size to realize image fusion. Refer to the steps shown in Fig. 1:

S1: Acquire the fusion command in the GPGPU and send it to each fusion component. In a specific implementation, a dispatch component can distribute the fusion command to multiple fusion components of the GPGPU block fusion engine; optionally, there are at most 32 fusion components.
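Step S1 only states that the command is distributed to up to 32 fusion components. The sketch below assumes, purely for illustration, that a fusion command covers a destination rectangle, that it is split into horizontal slices of a fixed height, and that the slices are assigned round-robin; `FusionCommand` and `dispatch` are hypothetical names, not the patent's command format.

```python
from dataclasses import dataclass
from typing import List

MAX_COMPONENTS = 32  # upper bound on fusion components stated in S1


@dataclass(frozen=True)
class FusionCommand:
    """Hypothetical command: a destination rectangle to be fused."""
    x: int
    y: int
    width: int
    height: int


def dispatch(cmd: FusionCommand, n_components: int,
             rows_per_slice: int = 16) -> List[List[FusionCommand]]:
    """Split cmd into horizontal slices and hand them out round-robin."""
    if not 1 <= n_components <= MAX_COMPONENTS:
        raise ValueError("n_components must be between 1 and 32")
    queues: List[List[FusionCommand]] = [[] for _ in range(n_components)]
    bottom = cmd.y + cmd.height
    for i, y0 in enumerate(range(cmd.y, bottom, rows_per_slice)):
        h = min(rows_per_slice, bottom - y0)  # the last slice may be shorter
        queues[i % n_components].append(FusionCommand(cmd.x, y0, cmd.width, h))
    return queues


if __name__ == "__main__":
    # Eight components, as in the Fig. 2 example, fusing a 1024x768 frame.
    qs = dispatch(FusionCommand(0, 0, 1024, 768), n_components=8)
    print([len(q) for q in qs])
```

Round-robin slicing is only one way to balance load; the real dispatch policy is not described.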

S2: In each fusion component, read the source data from the frame memory and write it into the CACHE.

In a specific implementation, the source data may be read in any one of a TILE format mode, a linear pixel reading mode, or a read operation mode in which the specific block size can be configured.

Based on steps S1 and S2, in this embodiment of the present application a command is first acquired, parsed and dispatched to the fusion components. Each fusion component then reads source data from the memory or the frame memory into a data prefetch CACHE; the read may use the TILE format, a linear pixel reading mode, or a read operation mode in which the specific block size can be configured.
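The difference between the TILE and linear read modes lies in how a pixel coordinate maps to a frame memory offset. A minimal sketch, assuming 4-byte pixels and a simple row-major arrangement of 8 × 8 tiles; the text does not specify the tile geometry, so `tile_w`, `tile_h` and the tile ordering are assumptions.

```python
def linear_offset(x: int, y: int, pitch_bytes: int, bpp: int = 4) -> int:
    """Byte offset of pixel (x, y) in a linear (scan-line) surface."""
    return y * pitch_bytes + x * bpp


def tiled_offset(x: int, y: int, width_px: int,
                 tile_w: int = 8, tile_h: int = 8, bpp: int = 4) -> int:
    """Byte offset of pixel (x, y) when the surface is stored as row-major tiles,
    each tile holding its pixels contiguously in row-major order (assumed layout)."""
    tiles_per_row = (width_px + tile_w - 1) // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    within_tile = (y % tile_h) * tile_w + (x % tile_w)
    return (tile_index * tile_w * tile_h + within_tile) * bpp


if __name__ == "__main__":
    # Pixels (0, 0) and (0, 1) are one pitch (4096 bytes) apart in linear mode ...
    print(linear_offset(0, 1, pitch_bytes=4096) - linear_offset(0, 0, 4096))
    # ... but only one tile row (8 pixels = 32 bytes) apart in TILE mode.
    print(tiled_offset(0, 1, 1024) - tiled_offset(0, 0, 1024))
```

A TILE read keeps a small 2D neighbourhood contiguous, which suits block-wise fusion, while a linear read suits scan-line operations.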

S3: In each fusion component, read the data in the CACHE, perform byte alignment, and then send the data to a fusion component unit for data processing.

In a specific implementation, each fusion component reads the data in its CACHE, performs pixel alignment, and sends the data to the fusion component unit, which performs color gamut conversion.
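The text does not say which conversion matrix the fusion component unit applies. As an illustration only, the sketch below converts a full-range YUV (YCbCr) pixel to RGB using the JPEG/JFIF BT.601 coefficients; the choice of matrix and range is an assumption, and hardware would normally use fixed-point arithmetic rather than floats.

```python
from typing import Tuple


def clamp8(v: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Full-range BT.601 (JFIF) YCbCr -> RGB, with U = Cb and V = Cr."""
    d, e = u - 128, v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    print(yuv_to_rgb(128, 128, 128))  # mid grey
    print(yuv_to_rgb(76, 85, 255))    # close to pure red
```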

S4: After data processing is completed in each fusion component unit, write the data into the CACHE and then perform a Burst write-back to the frame memory to complete the fusion.

In a specific implementation, after color gamut conversion is completed in each fusion component unit, the pixels are written into the CACHE, the block-processed pattern is transmitted in block-compressed form, and a Burst write-back to the frame memory then completes the fusion.

Specifically, after color gamut conversion is completed in each fusion component unit, alignment is performed according to the start address of the target object, the pixels are written into the CACHE, the block-processed pattern is transmitted in block-compressed form, and a Burst write-back to the frame memory then completes the fusion.

Specifically, the block compression in the embodiment of the present application may be any one of lossless compression, lossy compression and bypass compression. The compression block size can be configured according to actual needs, for example as a multiple of 32 bytes, 64 bytes, 128 bytes, 256 bytes or 512 bytes. Data compression can also be performed for different formats, specifically according to the color format and the data format: the color formats include RGB and YUV, and the data formats include the TILE format.
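The compression modes and block-size options in the preceding paragraph can be modelled as follows. This is a minimal sketch under stated assumptions: block sizes must be a positive multiple of one of the listed units (32 to 512 bytes), bypass mode passes the block through unchanged, and a simple per-pixel run-length encoding stands in for the unspecified lossless scheme. The lossy mode is deliberately left unimplemented because the text gives no detail about it.

```python
ALLOWED_UNITS = (32, 64, 128, 256, 512)  # byte multiples listed in the text
PIXEL_BYTES = 4                          # assumed RGBA pixel size


def check_block_size(size: int, unit: int) -> None:
    """Reject block sizes that are not a positive multiple of an allowed unit."""
    if unit not in ALLOWED_UNITS or size <= 0 or size % unit:
        raise ValueError(f"block size {size} is not a multiple of {unit} bytes")


def rle_encode(data: bytes) -> bytes:
    """Stand-in lossless codec: (count, pixel) pairs, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        pixel = data[i:i + PIXEL_BYTES]
        run = 1
        while (run < 255 and
               data[i + run * PIXEL_BYTES:i + (run + 1) * PIXEL_BYTES] == pixel):
            run += 1
        out.append(run)
        out += pixel
        i += run * PIXEL_BYTES
    return bytes(out)


def rle_decode(enc: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for i in range(0, len(enc), PIXEL_BYTES + 1):
        out += enc[i + 1:i + 1 + PIXEL_BYTES] * enc[i]
    return bytes(out)


def compress_block(block: bytes, mode: str, unit: int = 32) -> bytes:
    """Compress one block in 'bypass' or 'lossless' mode after a size check."""
    check_block_size(len(block), unit)
    if mode == "bypass":
        return block
    if mode == "lossless":
        return rle_encode(block)
    raise NotImplementedError(f"mode {mode!r} is not modelled in this sketch")


if __name__ == "__main__":
    solid = bytes([0x20, 0x40, 0x60, 0xFF]) * 64  # 256-byte block of one colour
    packed = compress_block(solid, "lossless", unit=256)
    print(len(solid), len(packed), rle_decode(packed) == solid)
```

Run-length coding is chosen only because it is short and obviously lossless; desktop content with large flat regions compresses well under it, but the actual codec is not disclosed.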

In summary, the present application reads data from the CACHE, performs byte alignment, sends the data to the fusion unit, writes the pixels into the CACHE, transmits the block-processed pattern in block-compressed form, and then, under controller control, performs a Burst write-back to the frame memory. With image block prefetching, the controller directly generates the frame memory write operation together with the corresponding Burst operation. The whole image fusion process is efficient and greatly improves 2D desktop smoothness in complex 3D application scenarios.

In addition, the image prefetching of the embodiments of the present application may use multi-channel interleaved prefetching (1, 2, 4, 8, 16 or 32 channels). Based on the above technical solution, each fusion component performs fully pipelined 128-bit operation, and the fusion operation is pipelined across the fusion components according to the real addresses of the source and the target, so that 128 pixels (RGBA, 32 bits each) are fused in a single cycle. In a specific implementation, the fusion only needs to be pipelined according to the target start address. The fusion pipeline comprises reading pixels from the CACHE, two-stage CACHE alignment, a two-stage pipelined fusion operation, and pipelined write-back. The fusion supports both the scan-line pixel storage mode and the pixel block storage mode, meeting the pixel storage requirements of different operations.
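One reading that is consistent with the numbers in the text: a 128-bit datapath carries four 32-bit RGBA pixels, so 32 components (the stated maximum) would give 128 pixels per cycle. The sketch below makes that arithmetic explicit and, as an assumption not stated in the text, models multi-channel interleaved prefetching as rows assigned to channels by row index modulo the channel count.

```python
DATAPATH_BITS = 128  # per fusion component, from the text
PIXEL_BITS = 32      # RGBA, 8 bits per channel
VALID_CHANNELS = (1, 2, 4, 8, 16, 32)


def pixels_per_cycle(n_components: int) -> int:
    """Pixels fused per cycle if every component retires one full datapath word."""
    return n_components * (DATAPATH_BITS // PIXEL_BITS)


def prefetch_channel(row: int, n_channels: int) -> int:
    """Assumed interleaving policy: row r is prefetched by channel r mod n."""
    if n_channels not in VALID_CHANNELS:
        raise ValueError("channel count must be 1, 2, 4, 8, 16 or 32")
    return row % n_channels


if __name__ == "__main__":
    print(pixels_per_cycle(32))  # 32 components x 4 pixels each
    print([prefetch_channel(r, 4) for r in range(8)])
```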

Based on the above technical solution, fast multi-image fusion, transparency adjustment, and full-screen or local rectangular block fusion can be realized for the 2D desktop. Images are transmitted in compressed form, which reduces the transmission bandwidth, speeds up GPU pattern fusion, and achieves a smooth 2D desktop display.
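The text gives no formula for transparency adjustment or local rectangular block fusion. A common choice, assumed here for illustration, is a source-over blend with a global 8-bit alpha applied per channel and restricted to a destination rectangle; `blend_rect` and its arguments are hypothetical names.

```python
from typing import List, Tuple

Pixel = Tuple[int, ...]


def blend_channel(src: int, dst: int, alpha: int) -> int:
    """(src*a + dst*(255-a)) / 255 rounded to the nearest integer, integer-only."""
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def blend_pixel(src: Pixel, dst: Pixel, alpha: int) -> Pixel:
    """Blend every channel of one pixel with the same global alpha."""
    return tuple(blend_channel(s, d, alpha) for s, d in zip(src, dst))


def blend_rect(src: List[List[Pixel]], dst: List[List[Pixel]],
               x0: int, y0: int, alpha: int) -> None:
    """Blend the whole src image into dst at (x0, y0), modifying dst in place."""
    for j, row in enumerate(src):
        for i, px in enumerate(row):
            dst[y0 + j][x0 + i] = blend_pixel(px, dst[y0 + j][x0 + i], alpha)


if __name__ == "__main__":
    desktop = [[(0, 0, 255)] * 4 for _ in range(3)]  # 4x3 blue background
    window = [[(255, 0, 0)] * 2 for _ in range(2)]   # 2x2 red window
    blend_rect(window, desktop, x0=1, y0=1, alpha=128)
    print(desktop[1][1], desktop[0][0])
```

Integer rounding with `+ 127` keeps the arithmetic exact for alpha 0 and 255, which matters for opaque windows and fully transparent overlays.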

Referring to fig. 2 to 4, the embodiment of the present application takes the contents shown in fig. 2 to 4 as an example to further describe the steps S1 to S4:

As shown in Fig. 2, the GPGPU acquires a command, parses it into a fusion command, and sends the fusion command through a dispatch component to the eight fusion components of the block fusion engine. Once each fusion component has received the fusion command, fusion starts and is completed with the support of the network-on-chip (NoC) and the GDDR memory.

Each fusion component reads source data from the frame memory and writes it into the CACHE. Specifically, as shown in Fig. 3, in the embodiment of the present application the image is divided into blocks using the TILE format and the linear pixel reading mode (linear format), and the source data is read accordingly.
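The details of the adaptive blocking strategy in Fig. 3 are not given in the text. One plausible interpretation, sketched below as an assumption, is to cut an arbitrary fusion rectangle along the tile grid so that interior blocks coincide with whole TILEs and only the edge blocks are partial; the 8 × 8 block size in the example is likewise an assumption.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def split_aligned(x: int, y: int, w: int, h: int,
                  bw: int, bh: int) -> List[Block]:
    """Cut the rectangle (x, y, w, h) at multiples of bw/bh in absolute coordinates."""
    blocks: List[Block] = []
    yy = y
    while yy < y + h:
        y_end = min((yy // bh + 1) * bh, y + h)  # next grid line or rectangle edge
        xx = x
        while xx < x + w:
            x_end = min((xx // bw + 1) * bw, x + w)
            blocks.append((xx, yy, x_end - xx, y_end - yy))
            xx = x_end
        yy = y_end
    return blocks


if __name__ == "__main__":
    # A 20x8 rectangle starting at x=5 on an 8x8 grid: partial, full, full, partial.
    for b in split_aligned(5, 0, 20, 8, bw=8, bh=8):
        print(b)
```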

The CACHE data is read, the current pixels are aligned and sent to the fusion component unit, and the corresponding color gamut conversion is performed. After each pipeline completes the conversion, the pixels are aligned according to the start address of the target object and written into the CACHE. As soon as the CACHE is non-empty, a frame memory write Burst can start directly, and the data is transmitted in compressed form. If the storage bandwidth available to the Burst operation is sufficient, fully pipelined performance can be achieved.
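The benefit of starting the write Burst as soon as the CACHE is non-empty, rather than waiting for a whole block, can be seen in a toy cycle model. This sketch assumes, for illustration only, that the fusion stage produces one beat per cycle, the write-back stage drains one beat per cycle, and a beat written into the CACHE in one cycle can be drained in the next; real latencies and bandwidth limits are not modelled.

```python
from collections import deque


def simulate(n_beats: int, stream: bool) -> int:
    """Return the cycle in which the last beat reaches frame memory.

    stream=True : write-back starts whenever the CACHE is non-empty.
    stream=False: write-back waits until the whole block is in the CACHE.
    """
    cache: deque = deque()
    produced = written = cycle = 0
    while written < n_beats:
        cycle += 1
        # Write-back stage runs first, so it only sees beats from earlier cycles.
        if cache and (stream or produced == n_beats):
            cache.popleft()
            written += 1
        # Fusion stage appends one finished beat per cycle.
        if produced < n_beats:
            cache.append(produced)
            produced += 1
    return cycle


if __name__ == "__main__":
    print(simulate(128, stream=True), simulate(128, stream=False))
```

Under these assumptions streaming finishes in n + 1 cycles instead of 2n, which is the processing-while-outputting effect the text describes.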

Based on the prefetching mechanism shown in Fig. 4, image block prefetching directly generates the frame memory write operation together with the corresponding Burst operation, and a byte mask controls which bytes are written to the frame memory. 128 pixels can be output per cycle. The whole fusion process is simple and efficient and greatly improves 2D desktop smoothness in complex 3D application scenarios.
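The byte mask lets a Burst start and end at addresses that are not aligned to the bus width. A minimal sketch, assuming a 16-byte (128-bit) write beat to match the 128-bit component datapath and one mask bit per byte lane (bit i enables byte i of the beat); the beat width and mask convention are assumptions.

```python
from typing import List, Tuple


def beats_with_masks(start: int, length: int,
                     beat_bytes: int = 16) -> List[Tuple[int, int]]:
    """Split the byte range [start, start + length) into beat-aligned writes.

    Returns (beat_address, byte_mask) pairs; bit i of byte_mask enables lane i.
    """
    if length <= 0:
        return []
    end = start + length
    addr = start - start % beat_bytes          # align down to the first beat
    beats: List[Tuple[int, int]] = []
    while addr < end:
        lo = max(start, addr) - addr             # first enabled lane in this beat
        hi = min(end, addr + beat_bytes) - addr  # one past the last enabled lane
        beats.append((addr, ((1 << (hi - lo)) - 1) << lo))
        addr += beat_bytes
    return beats


if __name__ == "__main__":
    # Ten RGBA pixels (40 bytes) written to a target starting at 0x1004.
    for addr, mask in beats_with_masks(0x1004, 40):
        print(hex(addr), format(mask, "016b"))
```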

It should be understood that, although the steps in the flowchart are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least some of the sub-steps or stages of other steps.

Referring to Fig. 5, an embodiment of the present application provides a 2D desktop image pre-fetching block fusion apparatus, which includes a command distribution module 10, a CACHE write module 20, a byte alignment module 30 and a frame memory write-back module 40, wherein:

the command distribution module 10 is configured to obtain a fusion command in the GPGPU and send the fusion command to each fusion component;

the CACHE write module 20 is configured to read source data from the frame memory in each fusion component and write the read source data into the CACHE.

Based on the command distribution module 10 and the CACHE write module 20, in this embodiment a command is first acquired, parsed and dispatched to the fusion components. Each fusion component then reads source data from the memory or the frame memory into a data prefetch CACHE, using the TILE format, a linear pixel reading mode, or a read operation mode in which the specific block size can be configured.

The byte alignment module 30 is configured to read the data in the CACHE in each fusion component, perform byte alignment, and send the data to the fusion component unit for data processing;

the frame memory write-back module 40 is configured to write the data into the CACHE after data processing is completed in each fusion component unit, and then perform a Burst write-back to the frame memory to complete the fusion.
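How the four modules of Fig. 5 hand data to each other can be sketched in software. This is a minimal, hypothetical model: frame memory is a flat byte buffer, each module is a plain function, rows are distributed round-robin, and the per-byte `op` stands in for the fusion component unit's processing. Byte alignment is a pass-through here because source and destination share the same pitch and offsets; none of these function names come from the patent.

```python
from typing import Callable, List


def command_distribution(height: int, n_components: int) -> List[List[int]]:
    """Module 10 (sketch): assign destination rows to components round-robin."""
    return [list(range(c, height, n_components)) for c in range(n_components)]


def cache_write(frame: bytes, pitch: int, row: int) -> bytearray:
    """Module 20 (sketch): copy one source row from frame memory into a CACHE line."""
    return bytearray(frame[row * pitch:(row + 1) * pitch])


def byte_align_and_process(line: bytearray, op: Callable[[int], int]) -> bytearray:
    """Module 30 (sketch): alignment is a no-op here; apply the fusion op per byte."""
    return bytearray(op(b) for b in line)


def frame_write_back(frame: bytearray, pitch: int, row: int, line: bytearray) -> None:
    """Module 40 (sketch): write the processed CACHE line back to frame memory."""
    frame[row * pitch:row * pitch + len(line)] = line


def run(src: bytes, dst: bytearray, pitch: int, height: int,
        n_components: int, op: Callable[[int], int]) -> None:
    """Drive modules 10 to 40 in order for every row of every component."""
    for rows in command_distribution(height, n_components):
        for r in rows:
            line = cache_write(src, pitch, r)
            frame_write_back(dst, pitch, r, byte_align_and_process(line, op))


if __name__ == "__main__":
    src = bytes(range(12))  # 3 rows x 4 bytes
    dst = bytearray(12)
    run(src, dst, pitch=4, height=3, n_components=2, op=lambda b: 255 - b)
    print(list(dst))
```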

In summary, in this embodiment data is read from the CACHE and byte-aligned, sent to the fusion unit, and the pixels are written into the CACHE; the block-processed pattern is transmitted in block-compressed form, and a Burst write-back to the frame memory is then performed under controller control. With image block prefetching, the controller directly generates the frame memory write operation together with the corresponding Burst operation. The whole image fusion process is efficient and greatly improves 2D desktop smoothness in complex 3D application scenarios.

For the specific limitations of the 2D desktop image pre-fetching block fusion apparatus, refer to the limitations of the 2D desktop image pre-fetching block fusion method above; they are not repeated here. The modules of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.

In one embodiment, a computer device is provided, whose internal structure may be as shown in Fig. 6. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements the 2D desktop image pre-fetching block fusion method described above; that is, the memory stores the computer program, and the processor executes it to implement any step of the 2D desktop image pre-fetching block fusion method.

In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out any of the steps of the above 2D desktop image pre-fetching block fusion method.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solution in the embodiments of the present application may be implemented in various computer languages, for example the C language, VHDL, Verilog, the object-oriented programming language Java, and the interpreted scripting language JavaScript.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A 2D desktop image pre-fetching block fusion method, characterized by comprising the following steps:

acquiring a fusion command in a GPGPU, and sending the fusion command to each fusion component;

in each fusion component, reading source data from a frame memory and writing the read source data into a CACHE;

in each fusion component, reading the data in the CACHE, performing byte alignment, and sending the data to a fusion component unit for data processing;

and after the data processing is completed in each fusion component unit, writing the data into the CACHE and then performing a Burst write-back to the frame memory to complete the fusion.

2. The 2D desktop image pre-fetching blocking fusion method according to claim 1, wherein the step of reading the data in the CACHE in each fusion component, respectively, performing byte alignment, and sending the data to a fusion component unit for data processing further comprises:

in each fusion component, reading the data in the CACHE, performing pixel alignment, sending the data to the fusion component unit, and performing color gamut conversion.

3. The 2D desktop image pre-fetching blocking fusion method according to claim 2, wherein the step of writing the data into the CACHE after the data processing is completed in each fusion component unit and then performing a Burst write-back to the frame memory to complete the fusion further comprises:

after the color gamut conversion is completed in each fusion component unit, writing the pixels into the CACHE, transmitting the block-processed pattern in block-compressed form, and then performing a Burst write-back to the frame memory to complete the fusion.

4. The 2D desktop image pre-fetching block fusion method according to claim 3, wherein the step of writing the pixels into the CACHE after the color gamut conversion is completed in each fusion component unit, transmitting the block-processed pattern in block-compressed form, and then performing a Burst write-back to the frame memory to complete the fusion further comprises:

after the color gamut conversion is completed in each fusion component unit, performing alignment according to the start address of a target object, writing the pixels into the CACHE, transmitting the block-processed pattern in block-compressed form, and then performing a Burst write-back to the frame memory to complete the fusion.

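5. The 2D desktop image pre-fetching block fusion method of claim 4, wherein the block compression comprises:

performing the compression by any one of lossless compression, lossy compression and bypass compression.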

6. The 2D desktop image pre-fetching block fusion method according to claim 4, wherein the step of reading source data from a frame memory and writing the read source data into a CACHE in each fusion component further comprises:

reading the source data in either a TILE format mode or a linear pixel reading mode.

7. The 2D desktop image pre-fetching block fusion method according to claim 4, wherein the step of reading source data from a frame memory and writing the read source data into a CACHE in each fusion component further comprises:

reading the source data in a read operation mode in which the specific block size can be configured.

8. A 2D desktop image pre-fetching block fusion device, characterized by comprising a command distribution module, a CACHE write module, a byte alignment module and a frame memory write-back module, wherein:

the command distribution module is used for acquiring fusion commands from the GPGPU and sending the fusion commands to each fusion component;

the CACHE write module is used for reading source data from the frame memory in each fusion component and writing the read source data into the CACHE;

the byte alignment module is used for reading the data in the CACHE in each fusion component, performing byte alignment, and sending the data to the fusion component unit for data processing;

and the frame memory write-back module is used for writing the data into the CACHE after the data processing is completed in each fusion component unit, and then performing a Burst write-back to the frame memory to complete the fusion.

9. A computer device, comprising:

a memory;

a processor; and

a computer program;

wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-7.

10. A computer-readable storage medium, having stored thereon a computer program; the computer program is executed by a processor to implement the method of any one of claims 1-7.