CN111179403A - Method and device for parallel generation of texture mapping Mipmap image and computer storage medium - Google Patents


Info

Publication number
CN111179403A
Authority
CN
China
Prior art keywords
mipmap
generated
image
parallel
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010069472.8A
Other languages
Chinese (zh)
Inventor
马超
孙建康
王世凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xintong Semiconductor Technology Co Ltd
Original Assignee
Nanjing Xintong Semiconductor Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xintong Semiconductor Technology Co Ltd filed Critical Nanjing Xintong Semiconductor Technology Co Ltd
Priority to CN202010069472.8A priority Critical patent/CN111179403A/en
Publication of CN111179403A publication Critical patent/CN111179403A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping

Abstract

The embodiments of the invention disclose a method, an apparatus, and a computer storage medium for generating texture-mapping Mipmap images in parallel. The method may include: determining, for each level of Mipmap image to be generated, the corresponding filter pixel block size relative to the original image; establishing a corresponding processing task for each level of Mipmap image to be generated; and running the processing tasks in parallel, according to the original image and the filter pixel block sizes corresponding to each level of Mipmap image to be generated, to generate the Mipmap images at all levels.

Description

Method and device for parallel generation of texture mapping Mipmap image and computer storage medium
Technical Field
Embodiments of the invention relate to the technical field of computer graphics rendering, and in particular to a method and a device for generating texture-mapping Mipmap images in parallel, and a computer storage medium.
Background
In computer three-dimensional graphics rendering, texture mapping is generally used to give the surfaces of objects in a scene an appearance as close as possible to the real world. Briefly, texture mapping is equivalent to wrapping a picture around the surface of an object, like wrapping paper around a gift box. In reality, the surface texture of an object closer to the observer should appear larger and clearer, while that of an object farther from the observer should appear smaller and more blurred; therefore, if a texture map of the same size is used regardless of an object's distance in the field of view, distance-related rendering errors are produced. Mipmap is a technique for deciding how large or small the applied texture map should be.
A Mipmap image set is a sequence of images formed by repeatedly reducing an original image by a factor of 2 until an image of 1 pixel × 1 pixel is obtained. The texture width and height of each Mipmap level are half those of the previous level. During texture mapping, an object farther away in the field of view is rendered with a smaller Mipmap image instead of the original image; this not only eliminates aliasing and produces a more realistic near-large, far-small effect, but also improves rendering efficiency and shortens rendering time. The efficiency of Mipmap image generation is therefore an important consideration in the texture mapping process.
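As an illustration of the halving described above, the following sketch (not part of the patent; the function name is an assumption) computes the width and height of every level of a Mipmap chain down to 1 pixel × 1 pixel:

```python
# Illustrative sketch: enumerate Mipmap level sizes.
# Width and height each halve at every level (clamped at 1) until 1x1.
def mipmap_sizes(width, height):
    sizes = [(width, height)]          # Level 0 is the original image
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        sizes.append((width, height))
    return sizes
```

For the 2048 × 2048 checkerboard texture of fig. 1, this yields 12 levels, Level 0 through Level 11.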
Disclosure of Invention
In view of the above, embodiments of the present invention are directed to a method, an apparatus, and a computer storage medium for parallel generation of texture-mapping Mipmap images, which can improve the generation efficiency of Mipmap images, reduce their generation delay, and improve the overall performance of graphics rendering.
The technical scheme of the embodiment of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a method for generating Mipmap images in parallel, where the method includes:
determining the corresponding filter pixel block size between the Mipmap image to be generated at each level and the original image;
establishing corresponding processing tasks for the Mipmap images to be generated at all levels respectively;
and running the processing task in parallel according to the original image and the corresponding filter pixel block sizes between the Mipmap images to be generated at all levels and the original image to generate the Mipmap images to be generated at all levels.
In a second aspect, an embodiment of the present invention provides a method for generating Mipmap images in parallel, where the method includes:
loading an original image;
determining a plurality of task groups based on the size of a Mipmap image to be generated;
for each task group, determining a corresponding pixel block of each task group in the original image according to a filter pixel block size corresponding to the level of detail LOD of the to-be-generated Mipmap image;
for each task group, carrying out filtering calculation on pixel blocks corresponding to each task group in parallel; wherein, the result of the parallel filtering calculation is the Mipmap image to be generated.
In a third aspect, an embodiment of the present invention provides an apparatus for generating Mipmap images in parallel, where the apparatus includes a first determining part, a task establishing part, and a parallel operation part; wherein:
the first determining part is configured to determine a corresponding filtering pixel block size between each stage of a Mipmap image to be generated and an original image;
the task establishing part is configured to establish corresponding processing tasks for the Mipmap images to be generated at all levels respectively;
and the parallel operation part is configured to operate the processing task in parallel according to the original image and the corresponding filtering pixel block sizes between the Mipmap images to be generated at all levels and the original image so as to generate the Mipmap images to be generated at all levels.
In a fourth aspect, an embodiment of the present invention provides an apparatus for generating Mipmap images in parallel, where the apparatus includes a loading part, a second determining part, a third determining part, and a parallel filtering part; wherein:
the loading part is configured to load an original image;
the second determining part is configured to determine a plurality of task groups based on the size of a Mipmap image to be generated;
the third determining part is configured to determine, for each task group, a pixel block corresponding to each task group in the original image according to a filter pixel block size corresponding to a level of detail LOD of the Mipmap image to be generated;
the parallel filtering part is configured to perform filtering calculation on pixel blocks corresponding to the task groups in parallel for the task groups; wherein, the result of the parallel filtering calculation is the Mipmap image to be generated.
In a fifth aspect, an embodiment of the present invention provides a GPU, including:
a memory configured to store an original image;
a compute shader unit; the compute shader unit is configured to perform the steps of the method of generating Mipmap images in parallel of the first aspect or the second aspect.
In a sixth aspect, an embodiment of the present invention provides a computer storage medium storing a program for generating Mipmap images in parallel which, when executed by at least one processor, implements the method for generating Mipmap images in parallel of the first aspect or the second aspect.
The embodiments of the invention provide a method, an apparatus, and a computer storage medium for generating texture-mapping Mipmap images in parallel. In the process of generating a multi-level Mipmap image, the levels are generated by parallel computation from the original image, using the filter pixel block size, found by recursion, that corresponds to each level of Mipmap image to be generated relative to the original image. This improves the generation efficiency of Mipmap images, reduces their generation delay, saves the running resources of the CPU (Central Processing Unit), and improves the overall performance of graphics rendering.
Drawings
FIG. 1 is a schematic diagram of a Mipmap image of a set of checkerboard textures provided by an embodiment of the present invention;
FIG. 2 is a block diagram of a computing device according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a GPU according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the relationship between a graphics rendering pipeline and a compute shader according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of a method for generating Mipmap images in parallel according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a pixel block according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of a specific example of parallel generation of Mipmap images according to an embodiment of the present invention;
fig. 8 is a schematic flow chart of a method for generating a single-level Mipmap image according to an embodiment of the present invention;
fig. 9 is a schematic flowchart of an embodiment of a method for generating a single-level Mipmap image according to the present invention;
fig. 10 is a schematic diagram illustrating an apparatus for parallel generating Mipmap images according to an embodiment of the present invention;
fig. 11 is a schematic composition diagram of another apparatus for parallel generating Mipmap images according to the embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Generally, taking the set of checkerboard-texture Mipmap images shown in fig. 1 as an example, the images are arranged in order of the Mipmap Level of Detail (LOD): the original image is Level 0, the LOD increases level by level, and the width and height of each Mipmap image are each 1/2 of those of the previous level. As shown in fig. 1, the original checkerboard-texture image is 2048 pixels × 2048 pixels, which is the Level 0 image of the set; accordingly, the Level 1 image of the set is 1024 pixels × 1024 pixels, and so on. For simplicity, fig. 1 only shows images up to Level 4 of the set; it should be understood that the set of checkerboard-texture Mipmap images may include levels down to a size of 1 pixel × 1 pixel.
Currently, a set of Mipmap images is generated by filtering level by level, starting from the original image (Level 0): each next Mipmap image is produced by performing a filtering calculation over the previous Mipmap image with a 2 pixel × 2 pixel block, so a given level can only be generated after the previous level has already been generated. Therefore, when Mipmap images are needed, either all levels are generated in advance, before the graphics application runs (i.e., generated off-line), or they are generated directly while the graphics application is running; but both approaches generate the Level 1 through Level N Mipmap images level by level, pixel by pixel, following the serial computation process described above. This occupies CPU resources and increases processing delay; and if generation happens while the graphics application is running, the overall performance of the running application may also be reduced.
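The serial dependency described above can be sketched as follows. This is an illustration, not the patent's implementation: a simple box (average) filter over a square grayscale image, represented as a list of row lists, is assumed, and each level blocks on the completion of the previous one:

```python
# Sketch of the conventional serial scheme: Level i is produced by 2x2
# box-filtering Level (i-1), so levels must be generated one after another.
def downsample_2x2(img):
    h, w = len(img), len(img[0])
    return [
        [
            (img[2*y][2*x] + img[2*y][2*x+1]
             + img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]

def serial_mipmaps(level0):
    levels = [level0]
    while len(levels[-1]) > 1:         # square image assumed; stop at 1x1
        levels.append(downsample_2x2(levels[-1]))
    return levels
```

Each call to `downsample_2x2` can only start once its input level exists, which is exactly the serialization the embodiments seek to remove.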
In view of this, the embodiments of the present invention aim to describe a technique for generating texture-mapping Mipmap images in parallel, based on the pixel-block filtering that characterizes the Mipmap generation process. For example, Mipmap images can be generated by parallel computation using the strong parallel processing capability of a GPU, so as to improve the generation efficiency of Mipmap images, reduce their generation delay, save the running resources of the CPU, and improve the overall performance of graphics rendering.
Fig. 2 shows a computing device 100 capable of implementing the parallel texture-mapping Mipmap image generation technique described in the embodiments of the present invention. The computing device 100 may include, but is not limited to: wireless devices, mobile or cellular telephones (including so-called smart phones), personal digital assistants (PDAs), video game consoles (including video displays, mobile video game devices, mobile video conferencing units), laptop computers, desktop computers, television set-top boxes, tablet computing devices, electronic book readers, fixed or mobile media players, and the like. In the example of fig. 2, computing device 100 may include a Central Processing Unit (CPU) 102 and a system memory 104 that communicate via an interconnection path that may include a memory bridge 105. The memory bridge 105, which may be, for example, a north bridge chip, is connected to an I/O (input/output) bridge 107 via a bus or other communication path 106, such as a HyperTransport link. I/O bridge 107, which may be, for example, a south bridge chip, receives user input from one or more user input devices 108 (e.g., a keyboard, mouse, trackball, a touch screen incorporated as part of display device 110, or other type of input device) and forwards the input to CPU 102 via path 106 and memory bridge 105. Graphics processing unit (GPU) 112 is coupled to memory bridge 105 via a bus or other communication path 113 (e.g., PCI Express, Accelerated Graphics Port, or HyperTransport link); in one embodiment, GPU 112 may be a graphics subsystem that delivers pixels to display device 110 (e.g., a conventional CRT- or LCD-based monitor). System disk 114 is also connected to I/O bridge 107. Switch 116 provides a connection between I/O bridge 107 and other components, such as network adapter 118 and various add-in cards 120 and 121.
Other components (not explicitly shown), including USB or other port connections, CD drives, DVD drives, film recording devices, and the like, may also be connected to I/O bridge 107. Communication paths interconnecting the various components in fig. 2 may be implemented using any suitable protocols, such as PCI (peripheral component interconnect), PCI-Express, AGP (accelerated graphics port), hypertransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols as is known in the art.
In one embodiment, GPU112 includes circuitry optimized for graphics and video processing, including, for example, video output circuitry. In another embodiment, GPU112 includes circuitry optimized for general purpose processing while preserving the underlying (underlying) computing architecture. In yet another embodiment, GPU112 may be integrated with one or more other system elements, such as memory bridge 105, CPU102, and I/O bridge 107, to form a system on a chip (SoC).
It will be appreciated that the system shown herein is exemplary and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of GPUs 112, may be modified as desired. For example, in some embodiments, system memory 104 is directly connected to CPU102 rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, GPU112 is connected to I/O bridge 107 or directly to CPU102, rather than to memory bridge 105. While in other embodiments, I/O bridge 107 and memory bridge 105 may be integrated onto a single chip. Numerous embodiments may include two or more CPUs 102 and two or more GPUs 112. The particular components shown herein are optional; for example, any number of add-in cards or peripherals may be supported. In some embodiments, switch 116 is eliminated and network adapter 118 and add-in cards 120, 121 are directly connected to I/O bridge 107.
Fig. 3 is a schematic block diagram of a GPU 112 capable of implementing embodiments of the present invention, in which a graphics memory 204 may be a part of GPU 112. Thus, GPU 112 may read data from graphics memory 204 and write data to graphics memory 204 without using a bus. In other words, GPU 112 may process data locally using local storage instead of off-chip memory. Such graphics memory 204 may be referred to as on-chip memory. This allows GPU 112 to operate more efficiently by eliminating the need for GPU 112 to read and write data via a bus, which may experience heavy traffic. In some cases, however, GPU 112 may not include a separate memory, but rather utilize system memory 104 via a bus. Graphics memory 204 may include one or more volatile or non-volatile memories or storage devices, such as random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic data media, or optical storage media.
Based on this, GPU112 may be configured to perform various operations related to: generate pixel data from graphics data provided by CPU102 and/or system memory 104 via memory bridge 105 and bus 113, interact with local graphics memory 204 (e.g., a general frame buffer) to store and update pixel data, transfer pixel data to display device 110, and so on.
In operation, CPU102 is the main processor of computing device 100, controlling and coordinating the operation of other system components. Specifically, CPU102 issues commands that control the operation of GPU 112. In some embodiments, CPU102 writes command streams for GPU112 into data structures (not explicitly shown in fig. 2 or 3) that may be located in system memory 104, graphics memory 204, or other storage locations accessible to both CPU102 and GPU 112. A pointer to each data structure is written to a pushbuffer to initiate processing of the command stream in the data structure. GPU112 reads the command stream from one or more pushbuffers and then executes the commands asynchronously with respect to the operation of CPU 102. Execution priority may be specified for each pushbuffer to control scheduling of different pushbuffers.
As particularly depicted in FIG. 3, the GPU112 includes an I/O (input/output) unit 205 that communicates with the rest of the computing device 100 via a communication path 113 that is connected to the memory bridge 105 (or, in an alternative embodiment, directly to the CPU 102). The connection of the GPU112 to the rest of the computing device 100 may also vary. In some embodiments, GPU112 may be implemented as an add-in card that may be inserted into an expansion slot of computer system 100. In other embodiments, GPU112 may be integrated on a single chip with a bus bridge, such as memory bridge 105 or I/O bridge 107. While in other embodiments some or all of the elements of GPU112 may be integrated with CPU102 on a single chip.
In one embodiment, communication path 113 can be a PCI-EXPRESS link in which a dedicated channel is allocated to GPU112 as is known in the art. The I/O unit 205 generates data packets (or other signals) for transmission over the communication path 113 and also receives all incoming data packets (or other signals) from the communication path 113, directing the incoming data packets to the appropriate components of the GPU 112. For example, commands related to processing tasks may be directed to scheduler 207, while commands related to memory operations (e.g., reads or writes to graphics memory 204) may be directed to graphics memory 204.
In GPU112, an array 230 of rendering cores may be included, where array 230 may include C general purpose rendering cores 208, where C > 1. Based on the generic rendering cores 208 in the array 230, the GPU112 is able to concurrently perform a large number of program tasks or computational tasks. For example, each rendering core may be programmed to be able to perform processing tasks related to a wide variety of programs, including, but not limited to, linear and non-linear data transformations, video and/or audio data filtering, modeling operations (e.g., applying laws of physics to determine the position, velocity, and other attributes of objects), graphics rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or fragment shader programs), and so forth.
Further, a fixed function processing unit 231, which may include hardware that is hardwired to perform certain functions, may also be included in GPU 112. Although fixed-function hardware may be configured to perform different functions via, for example, one or more control signals, the fixed-function hardware typically does not include program memory capable of receiving user-compiled programs. In some examples, fixed function processing unit 231 may include, for example, a processing unit that performs primitive assembly, a processing unit that performs rasterization, and a processing unit that performs fragment operations. For the processing unit executing the primitive assembly, the processing unit can restore the vertexes which are colored by the vertex shader unit into a grid structure of a graph, namely the primitive, according to the original connection relation, so that the subsequent fragment shader unit can process the graph; the rasterization operation includes converting the new primitive and outputting the fragments to a fragment shader; the fragment operation includes, for example, a depth test, a cropping test, an Alpha blend, or a transparency blend, and the pixel data output by the above operations can be displayed as graphics data by the display device 110. Combining the rendering core array 230 and the fixed-function processing unit 231, a complete logic model of the graphics rendering pipeline can be implemented.
In addition, rendering core array 230 may receive processing tasks to be performed from scheduler 207. Scheduler 207 may independently schedule the tasks for execution by resources of GPU112, such as one or more rendering cores 208 in rendering core array 230. In one example, scheduler 207 may be a hardware processor. In the example shown in fig. 3, scheduler 207 may be included in GPU 112. In other examples, scheduler 207 may also be a separate unit from CPU102 and GPU 112. Scheduler 207 may also be configured as any processor that receives a stream of commands and/or operations.
Scheduler 207 may process one or more command streams, which includes scheduling the operations contained in those command streams for execution by GPU 112. Specifically, scheduler 207 may process one or more command streams and schedule their operations for execution by rendering core array 230. In operation, CPU 102, via GPU driver 103 included with system memory 104 in fig. 2, may send a command stream to scheduler 207 that includes a series of operations to be performed by GPU 112. Scheduler 207 may receive the command stream through I/O unit 205, process its operations sequentially based on their order in the stream, and schedule them for execution by one or more processing units in rendering core array 230.
As shown in fig. 4, while one or a portion of the general-purpose rendering cores 208 in rendering core array 230 are programmed, in conjunction with fixed-function processing unit 231, to perform the processing associated with graphics rendering pipeline 41, one or a portion of the general-purpose rendering cores 208 can also be programmed to run compute shader 42, which performs parallel computation independently of graphics rendering pipeline 41. Both compute shader 42 and graphics rendering pipeline 41 can call and write to resources within graphics memory 204 in GPU 112. Since the Mipmap image is the basis on which a fragment shader in the graphics rendering pipeline performs texture mapping, the embodiments of the present invention preferably employ compute shader 42 to implement the technique of generating texture-mapping Mipmap images in parallel. It should be noted that implementing the technique by compute shader 42 is only a preferred example; in a specific implementation, the technique can still be realized by programming one or a portion of the general-purpose rendering cores 208 in rendering core array 230, scheduled by means of scheduler 207. This is not described or limited further herein.
With reference to the above drawings and the corresponding description, referring to fig. 5, a method for generating Mipmap images in parallel is shown, where the method may be applied to a GPU with a parallel processing function, and the method may include:
s501: determining the corresponding filter pixel block size between the Mipmap image to be generated at each level and the original image;
s502: establishing corresponding processing tasks for the Mipmap images to be generated at all levels respectively;
s503: and running the processing task in parallel according to the original image and the corresponding filter pixel block sizes between the Mipmap images to be generated at all levels and the original image to generate the Mipmap images to be generated at all levels.
It should be noted that, for a set of Mipmap images, each level of Mipmap image — say the Level i Mipmap image, with i ≠ 0 — is generated by performing a traversal filtering calculation over the Mipmap image of the level above it, i.e. the Level (i-1) Mipmap image, with a 2 pixel × 2 pixel block. It can then be found by recursion that, to generate the Level i Mipmap image, a traversal filtering calculation can instead be performed directly over the original image, i.e. the Level 0 Mipmap image, with a 2^i pixel × 2^i pixel block. As shown in fig. 6, the Level 1 Mipmap image is generated by a traversal filtering calculation over the Level 0 Mipmap image with 2 pixel × 2 pixel blocks, and the Level 2 Mipmap image is generated by a traversal filtering calculation over the Level 0 Mipmap image with 2^2 pixel × 2^2 pixel blocks. Based on this recursion, for the technical solution shown in fig. 5, in some examples, the determining of the corresponding filter pixel block size between each level of Mipmap image to be generated and the original image includes:
determining, according to the Level of Detail (LOD) Level i of the Mipmap image to be generated, that the corresponding filter pixel block size between the Mipmap image to be generated and the original image is 2^i pixels × 2^i pixels.
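The recursion result can be illustrated with a short sketch (an assumption for illustration, not the patent's code: box averaging is used as the filter, and for a box filter, averaging one 2^i × 2^i block of Level 0 equals i successive rounds of 2 × 2 averaging):

```python
# Sketch of the recursion: Level i is produced directly from the original
# image by averaging non-overlapping 2^i x 2^i pixel blocks, with no
# dependency on Level (i-1).
def direct_level(level0, i):
    b = 2 ** i                         # filter pixel block size for LOD Level i
    h, w = len(level0) // b, len(level0[0]) // b
    return [
        [
            sum(level0[y*b + dy][x*b + dx]
                for dy in range(b) for dx in range(b)) / (b * b)
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Because `direct_level` reads only Level 0, every level can be computed independently, which is what makes the task-level parallelism of fig. 5 possible.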
For the technical solution shown in fig. 5, in some examples, the establishing of the corresponding processing tasks for the Mipmap images to be generated at each stage respectively includes:
respectively establishing processing tasks for the Mipmap images to be generated correspondingly according to the sizes of the filtering pixel blocks corresponding to the Mipmap images to be generated at all levels; the processing task is executed to perform traversal filtering calculation on the original image according to the filtering pixel block size corresponding to each level of the Mipmap image to be generated.
Specifically, as shown in fig. 7, in the implementation process a kernel may be set up for each Mipmap image to be generated; it can be understood that each kernel completes one corresponding processing task (Task). The embodiment of the present invention takes OpenCL kernels as the example implementation; it is understood that other APIs conforming to graphics standards are also applicable to the technical solution of the embodiments, and details are not described here. The input of each kernel can be configured as two items: the original image, i.e. the Level 0 Mipmap image data, and the LOD of the Mipmap image to be generated by that kernel; the output of each kernel can be configured as the Mipmap image data to be generated by that kernel. After the kernels are set up, each kernel can be compiled by an OpenCL compiler into an executable program that runs on specific hardware, e.g., a GPU. When the executable program runs, the scheduler (Dispatcher) can distribute each kernel to the Processing Elements (PEs) of the GPU; it should be noted that, in the GPU, each PE may include a plurality of processing cores. The processing tasks corresponding to the kernels are executed in parallel by the respective PEs, so that the Mipmap images to be generated at all levels can be output in parallel.
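The one-kernel-per-level dispatch described above can be sketched in Python, with a thread pool standing in for the GPU scheduler and processing elements. This is a hedged illustration of the scheme's structure, not the patent's OpenCL code; box filtering is assumed as the filter:

```python
# Sketch of per-level task dispatch: one task ("kernel") per Mipmap level,
# each taking only the original image and the target LOD as inputs, so all
# levels can be generated concurrently.
from concurrent.futures import ThreadPoolExecutor

def level_kernel(level0, lod):
    # Box-filter Level 0 with a 2^lod x 2^lod block (square image assumed).
    b = 2 ** lod
    h, w = len(level0) // b, len(level0[0]) // b
    return [
        [sum(level0[y*b + dy][x*b + dx]
             for dy in range(b) for dx in range(b)) / (b * b)
         for x in range(w)]
        for y in range(h)
    ]

def parallel_mipmaps(level0, max_lod):
    with ThreadPoolExecutor() as pool:
        # Submit every level at once; no level waits on another.
        futures = [pool.submit(level_kernel, level0, lod)
                   for lod in range(1, max_lod + 1)]
        return [f.result() for f in futures]
```

Note that, unlike the serial scheme, no task here reads the output of another task: each reads only Level 0, which is why the tasks can be distributed to separate processing elements.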
Through the technical solution shown in fig. 5, it is found by recursion that each level of Mipmap image, besides being generated by filtering from the Mipmap image of the previous level, can also be generated by a traversal filtering calculation over the original image with a 2^i pixel × 2^i pixel block. Therefore, the levels of Mipmap images do not need to be generated in the conventional serial manner, but can be generated in parallel, on demand, using the powerful parallel computing capability of the GPU. This improves the generation efficiency of multi-level Mipmap images and reduces their generation delay; and because the GPU is used, the running resources of the CPU are saved and the overall performance of graphics rendering is improved.
For the technical solution shown in fig. 5, parallel generation of the multi-level Mipmap image is implemented at task-level granularity; however, in the generation process of a single-level Mipmap image, if the processing task still performs serial traversal on the original image according to the conventional scheme, the parallel processing capability of the GPU cannot be brought into play. In the traversal filtering calculation, it can be found that no pixel is ever traversed repeatedly when traversing by pixel blocks, and the original image is traversed completely. Based on this, traversal can be replaced by the idea of blocking: the original image is divided, according to the pixel blocks, into image blocks that completely cover it without overlapping, so that the traversal filtering used to generate the Mipmap image can be replaced by block filtering; and because the divided image blocks do not overlap and completely cover the original image, the block filtering can be performed in parallel. Based on the above description, an embodiment of the present invention further provides a method for generating a single-level Mipmap image, where the method may still be applied to a GPU with a parallel processing function; referring to fig. 8, the method includes:
S801: loading an original image;
S802: determining a plurality of task groups based on the size of a Mipmap image to be generated;
S803: for each task group, determining a corresponding pixel block of each task group in the original image according to the filter pixel block size corresponding to the level of detail LOD of the Mipmap image to be generated;
S804: for each task group, performing filtering calculation on the pixel blocks corresponding to the task groups in parallel; wherein the result of the parallel filtering calculation is the Mipmap image to be generated.
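For illustration, the mapping from task groups to filter pixel blocks in steps S802–S803 can be sketched as follows; this is a hedged Python sketch with hypothetical names, not the shader code itself:

```python
def task_groups(out_w, out_h):
    # S802: one task group per pixel of the to-be-generated Mipmap
    # image, i.e. out_w x out_h groups in total.
    return [(x, y) for y in range(out_h) for x in range(out_w)]

def block_for_group(group, lod):
    # S803: map a task group to its filter pixel block in the original
    # image: a 2^lod x 2^lod tile whose top-left corner is the group
    # index scaled by the block size.  Tiles never overlap and together
    # cover the original image completely, which is what makes the
    # S804 filtering step safe to run in parallel.
    bs = 2 ** lod
    x, y = group
    return (x * bs, y * bs, bs, bs)  # (left, top, width, height)
```

Because no two tiles share a pixel, the S804 filtering of different task groups touches disjoint input regions and independent output pixels, so no synchronization between groups is needed.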
For the solution shown in fig. 8, in some examples, the determining a plurality of task groups based on a size of a Mipmap image to be generated includes:
determining width × height task groups according to the width and height of the Mipmap image to be generated; wherein the width and height are in pixels.
For the technical solution shown in fig. 8, in some examples, the LOD corresponding to the Mipmap image to be generated is Level i, and the filter pixel block size corresponding to the level of detail LOD of the Mipmap image to be generated is 2^i pixels × 2^i pixels.
In combination with the above technical solution shown in fig. 8, in a specific implementation process, as shown in fig. 9, the method may include:
Step 1: initialize the GPU so that it is in a standby working state.
Step 2: load the original image used to generate the Mipmap image.
Step 3: verify whether the pixel data format of the original image can be supported; if it can be supported, proceed to step 4; otherwise, end the process.
Step 4: initiate width × height task groups according to the width (in pixels) and height (in pixels) of the Mipmap image to be generated.
Step 5: use these task groups to compute the individual pixel values in the Mipmap image to be generated in parallel.
As for the technical solution and the specific implementation process shown in fig. 8, the embodiment of the present invention may be implemented by the aforementioned compute shader: the generation algorithm for any pixel of each level of Mipmap image is programmed into a compute shader (ComputeShader); the compute shader program is compiled through the OpenGL standard API, linked, and loaded into the memory of the GPU; and the GPU cores execute the width × height task groups in parallel, where width and height are the width and height of the Mipmap image currently to be generated. For example, if the original image size is 2048 pixels × 2048 pixels and the Mipmap image to be generated is the next-level Mipmap of the original image, i.e., 1024 pixels × 1024 pixels, then the number of task groups for generating that level of Mipmap image is 1024 × 1024; that is, one pixel corresponds to one task group. The GPU will execute the task groups as parallel as possible, depending on the number of physical cores actually available. Therefore, with the parallel filtering scheme shown in fig. 8, all pixels of the Mipmap image currently to be generated can be generated almost simultaneously.
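The task-group count in the example above follows directly from the level size; as a small illustrative sketch (hypothetical helper, not part of the patent):

```python
def group_count(orig_w, orig_h, lod):
    """Number of task groups for generating the level-`lod` Mipmap image:
    one group per output pixel, i.e. the width x height of that level,
    where each dimension of the original is divided by 2^lod."""
    return (orig_w >> lod) * (orig_h >> lod)
```

For a 2048 × 2048 original, level 1 therefore needs 1024 × 1024 = 1048576 task groups, matching the example in the text, and the 1 × 1 top level needs exactly one.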
For a compute shader, its algorithmic program to implement the solution shown in FIG. 8 may be as follows:
#version 430 core

// One invocation computes one texel of the Mipmap image to be generated.
layout (local_size_x = 1, local_size_y = 1) in;
layout (rgba32f, binding = 0) readonly uniform image2D input_image;
// Note: the output image must use a different binding point than the input.
layout (rgba32f, binding = 1) writeonly uniform image2D output_image;

uniform int mipmap_level;

void main(void)
{
    // Accumulate from zero so that all channels, including alpha,
    // are averaged correctly.
    vec4 texel_total = vec4(0.0);
    ivec2 uv = ivec2(gl_GlobalInvocationID.xy);
    int bs = int(exp2(mipmap_level));   // filter pixel block size: 2^level
    for (int i = 0; i < bs; i++)
        for (int j = 0; j < bs; j++)
        {
            texel_total += imageLoad(input_image, uv * bs + ivec2(i, j));
        }
    // Box filter: average over all bs x bs texels of the original image.
    // (GLSL pow() takes floats, so use an integer product instead.)
    vec4 next_level_texel = texel_total / float(bs * bs);
    imageStore(output_image, uv, next_level_texel);
}
For the above program, it should be noted that: first, the pixel block size in the original image corresponding to each pixel of the current level is calculated according to the mipmap_level of the Mipmap image to be generated; for example, with mipmap_level = 1, and assuming the level-0 size is 32 × 32 pixels, the pixel block of the previous level corresponding to one pixel of the current level is 2 × 2. Then, the average value of all pixel values in that pixel block of the original image is calculated as one pixel value of the current level. Finally, the above process calculates one pixel value of the image at this level, which corresponds to one task group. When the GPU hardware runs, the OpenGL library compiles the shader program into GPU-executable commands and allocates them to the GPU cores for parallel computation according to the size of the Mipmap image to be generated; for example, if the size of level 1 is 16 pixels × 16 pixels, the executable commands are allocated to 256 GPU cores for parallel computation. All pixels of the Mipmap image currently to be generated are thereby obtained in parallel.
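The per-invocation logic of the shader can be mirrored in plain Python for checking; this sketch (hypothetical names, single-channel pixels instead of vec4) follows the same load–accumulate–average–store pattern:

```python
def shader_invocation(input_image, uv, mipmap_level):
    """Mirror of one compute-shader invocation: `uv` plays the role of
    gl_GlobalInvocationID.xy, and the doubly nested loop averages the
    bs x bs block of the original image that maps to this output texel."""
    bs = 2 ** mipmap_level          # int(exp2(mipmap_level))
    ux, uy = uv
    total = 0.0
    for i in range(bs):             # offset in x, as in ivec2(i, j)
        for j in range(bs):         # offset in y
            total += input_image[uy * bs + j][ux * bs + i]
    return total / (bs * bs)       # box-filter average
```

Running one such invocation per output pixel, in any order, reproduces the same result as the parallel dispatch, since invocations neither share input blocks nor write the same output texel.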
It can be understood that the technical solution shown in fig. 8 can not only be implemented separately to obtain a single-level Mipmap image, but can also serve as a preferred specific implementation of S503 in the technical solution shown in fig. 5, being executed for each level of Mipmap image to be generated so as to generate the Mipmap images of all levels; details are not described again here in the embodiment of the present invention.
Based on the above description, the embodiment of the present invention uses the parallel computing capability of the GPU to generate the Mipmap images of all levels in parallel and to compute each pixel value of a single-level Mipmap image in parallel, fully utilizing the computing resources of the GPU and greatly reducing the delay of generating Mipmap images.
Based on the above description about the technical solution shown in fig. 5, referring to fig. 10, an apparatus 90 for generating Mipmap images in parallel according to an embodiment of the present invention is shown, where the apparatus 90 includes: a first determining part 901, a task establishing part 902, and a parallel running part 903; wherein,
the first determining part 901 is configured to determine a corresponding filter pixel block size between each stage of Mipmap image to be generated and the original image;
the task establishing part 902 is configured to establish corresponding processing tasks for Mipmap images to be generated at each level respectively;
the parallel running part 903 is configured to run the processing task in parallel according to the original image and the corresponding filter pixel block sizes between the Mipmap images to be generated at each level and the original image, so as to generate Mipmap images to be generated at each level.
In some examples, the first determining portion 901 is configured to:
determining, according to the Level of detail LOD of the Mipmap image to be generated being Level i, that the corresponding filter pixel block size between the Mipmap image to be generated and the original image is 2^i pixels × 2^i pixels.
In some examples, the task creation portion 902 is configured to:
respectively establishing processing tasks for the Mipmap images to be generated correspondingly according to the sizes of the filtering pixel blocks corresponding to the Mipmap images to be generated at all levels; the processing task is executed to perform traversal filtering calculation on the original image according to the filtering pixel block size corresponding to each level of the Mipmap image to be generated.
Based on the above description about the technical solution shown in fig. 8, referring to fig. 11, another apparatus 90 for generating Mipmap images in parallel according to the embodiment of the present invention is shown, where the apparatus 90 includes: a loading part 904, a second determining part 905, a third determining part 906, and a parallel filtering part 907; wherein,
the loading part 904 configured to load an original image;
the second determining section 905 configured to determine a plurality of task groups based on a size of a Mipmap image to be generated;
the third determining part 906 is configured to determine, for each task group, a pixel block corresponding to each task group in the original image according to a filter pixel block size corresponding to the level of detail LOD of the Mipmap image to be generated;
the parallel filtering portion 907 is configured to perform filtering calculation on pixel blocks corresponding to each task group in parallel for each task group; wherein, the result of the parallel filtering calculation is the Mipmap image to be generated.
In some examples, the second determining portion 905 is configured to:
determining width × height task groups according to the width and height of the Mipmap image to be generated; wherein the width and height are in pixels.
In some examples, the LOD corresponding to the Mipmap image to be generated is Level i, and the filter pixel block size corresponding to the level of detail LOD of the Mipmap image to be generated is 2^i pixels × 2^i pixels.
It should be noted that the apparatus 90 may be implemented by a compute shader in a GPU. In a specific implementation process, the apparatus 90 shown in fig. 10 and the apparatus 90 shown in fig. 11 may each be implemented separately according to the generation requirement of a Mipmap image; in addition, the apparatus 90 shown in fig. 11 may also be implemented as a component of the apparatus 90 shown in fig. 10, which is not described in detail herein in the embodiment of the present invention.
It is understood that in this embodiment, a "part" may be part of a circuit, part of a processor, part of a program or software, etc.; it may also be a unit, and may be modular or non-modular.
In addition, each component in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
If the integrated unit is implemented in the form of software functional modules and is sold or used as a stand-alone product, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media may include computer data storage media or communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise a USB flash disk, a removable hard disk, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or those wireless technologies are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The code may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. Accordingly, the terms "processor" and "processing unit" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of embodiments of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (i.e., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by a collection of interoperative hardware units, including one or more processors as described above.
Various aspects of the present invention have been described. These and other embodiments are within the scope of the following claims. It should be noted that: the technical schemes described in the embodiments of the present invention can be combined arbitrarily without conflict.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method of generating Mipmap images in parallel, the method comprising:
determining the corresponding filter pixel block size between the Mipmap image to be generated at each level and the original image;
establishing corresponding processing tasks for the Mipmap images to be generated at all levels respectively;
and running the processing task in parallel according to the original image and the corresponding filter pixel block sizes between the Mipmap images to be generated at all levels and the original image to generate the Mipmap images to be generated at all levels.
2. The method of claim 1, wherein determining a corresponding filter pixel block size between each stage of the Mipmap image to be generated and the original image comprises:
determining, according to the Level of detail LOD of the Mipmap image to be generated being Level i, that the corresponding filter pixel block size between the Mipmap image to be generated and the original image is 2^i pixels × 2^i pixels.
3. The method according to claim 1, wherein the establishing the corresponding processing tasks for the Mipmap images to be generated at each stage respectively comprises:
respectively establishing processing tasks for the Mipmap images to be generated correspondingly according to the sizes of the filtering pixel blocks corresponding to the Mipmap images to be generated at all levels; the processing task is executed to perform traversal filtering calculation on the original image according to the filtering pixel block size corresponding to each level of the Mipmap image to be generated.
4. A method of generating Mipmap images in parallel, the method comprising:
loading an original image;
determining a plurality of task groups based on the size of a Mipmap image to be generated;
for each task group, determining a corresponding pixel block of each task group in the original image according to a filter pixel block size corresponding to the level of detail LOD of the to-be-generated Mipmap image;
for each task group, carrying out filtering calculation on pixel blocks corresponding to each task group in parallel; wherein, the result of the parallel filtering calculation is the Mipmap image to be generated.
5. The method of claim 4, wherein determining the plurality of task groups based on the size of the Mipmap image to be generated comprises:
determining width × height task groups according to the width and height of the Mipmap image to be generated; wherein the width and height are in pixels.
6. The method of claim 4, wherein the LOD corresponding to the Mipmap image to be generated is Level i, and the filter pixel block size corresponding to the level of detail LOD of the Mipmap image to be generated is 2^i pixels × 2^i pixels.
7. An apparatus for parallel generation of Mipmap images, the apparatus comprising: a first determining part, a task establishing part, and a parallel operation part; wherein,
the first determining part is configured to determine a corresponding filtering pixel block size between each stage of a Mipmap image to be generated and an original image;
the task establishing part is configured to establish corresponding processing tasks for the Mipmap images to be generated at all levels respectively;
and the parallel operation part is configured to operate the processing task in parallel according to the original image and the corresponding filtering pixel block sizes between the Mipmap images to be generated at all levels and the original image so as to generate the Mipmap images to be generated at all levels.
8. An apparatus for parallel generation of Mipmap images, the apparatus comprising: a loading part, a second determining part, a third determining part, and a parallel filtering part; wherein,
the loading part is configured to load an original image;
the second determination section configured to determine a plurality of task groups based on a size of a Mipmap image to be generated;
the third determining part is configured to determine, for each task group, a pixel block corresponding to each task group in the original image according to a filter pixel block size corresponding to a level of detail LOD of the Mipmap image to be generated;
the parallel filtering part is configured to perform filtering calculation on pixel blocks corresponding to the task groups in parallel for the task groups; wherein, the result of the parallel filtering calculation is the Mipmap image to be generated.
9. A GPU, comprising:
a memory configured to store an original image;
a compute shader unit; the compute shader unit is configured to perform the steps of the method of parallel generating Mipmap images of any one of claims 1 to 6.
10. A computer storage medium storing a program for parallel generation of Mipmap images, which when executed by at least one processor implements the steps of the method of parallel generation of Mipmap images of any one of claims 1 to 6.
CN202010069472.8A 2020-01-21 2020-01-21 Method and device for parallel generation of texture mapping Mipmap image and computer storage medium Pending CN111179403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010069472.8A CN111179403A (en) 2020-01-21 2020-01-21 Method and device for parallel generation of texture mapping Mipmap image and computer storage medium


Publications (1)

Publication Number Publication Date
CN111179403A true CN111179403A (en) 2020-05-19

Family

ID=70652789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010069472.8A Pending CN111179403A (en) 2020-01-21 2020-01-21 Method and device for parallel generation of texture mapping Mipmap image and computer storage medium

Country Status (1)

Country Link
CN (1) CN111179403A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070211070A1 (en) * 2006-03-13 2007-09-13 Sony Computer Entertainment Inc. Texture unit for multi processor environment
CN101344961A (en) * 2007-06-07 2009-01-14 辉达公司 Extrapolation of nonresident mipmap data using resident MIPMAP data
CN106254877A (en) * 2015-06-11 2016-12-21 Arm有限公司 Processing system for video
CN106683171A (en) * 2016-12-12 2017-05-17 中国航空工业集团公司西安航空计算技术研究所 GPU multi-thread texture mapping SystemC modeling structure


Similar Documents

Publication Publication Date Title
CN110036413B (en) Gaze point rendering in tiled architecture
KR101697910B1 (en) Fault-tolerant preemption mechanism at arbitrary control points for graphics processing
CN108027955B (en) Techniques for storage of bandwidth-compressed graphics data
CN111062858B (en) Efficient rendering-ahead method, device and computer storage medium
US9280956B2 (en) Graphics memory load mask for graphics processing
US20160048980A1 (en) Bandwidth reduction using texture lookup by adaptive shading
EP3353746B1 (en) Dynamically switching between late depth testing and conservative depth testing
CN112801855B (en) Method and device for scheduling rendering task based on graphics primitive and storage medium
KR20170132758A (en) Hybrid 2d/3d graphics rendering
EP3427229B1 (en) Visibility information modification
CN111127299A (en) Method and device for accelerating rasterization traversal and computer storage medium
CN111080761B (en) Scheduling method and device for rendering tasks and computer storage medium
CN110928610B (en) Method, device and computer storage medium for verifying shader function
CN111989715A (en) Compressed visibility state for GPU compatible with hardware instantiation
CN111383314A (en) Method and device for verifying shader function and computer storage medium
CN111179403A (en) Method and device for parallel generation of texture mapping Mipmap image and computer storage medium
CN113256764A (en) Rasterization device and method and computer storage medium
CN111179151A (en) Method and device for improving graphic rendering efficiency and computer storage medium
CN114037795A (en) Invisible pixel eliminating method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 265503 No. 402, No. 7, No. 300, Changjiang Road, economic and Technological Development Zone, Yantai City, Shandong Province

Applicant after: Yantai Xintong Semiconductor Technology Co.,Ltd.

Address before: 211800 b403, No. 320, pubin Road, Jiangpu street, Pukou District, Nanjing City, Jiangsu Province

Applicant before: Nanjing Xintong Semiconductor Technology Co.,Ltd.