WO2024007293A1 - Graphics processing system, method and GPU based on bitmap primitives - Google Patents


Info

Publication number
WO2024007293A1
Authority
WO
WIPO (PCT)
Prior art keywords
primitive
bitmap
data
pixel
layer
Application number
PCT/CN2022/104589
Other languages
English (en)
French (fr)
Inventor
卓永红
Original Assignee
卓永红
Application filed by 卓永红
Priority to PCT/CN2022/104589
Priority to CN202280002320.3A (CN115349136A)
Publication of WO2024007293A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3243 Power saving in microcontroller unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/325 Power saving in peripheral device
    • G06F1/3275 Power saving in memory, e.g. RAM, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1028 Power efficiency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/22 Employing cache memory using specific memory technology
    • G06F2212/221 Static RAM
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45 Caching of specific data in cache memory
    • G06F2212/455 Image or video data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the technical field of computer graphics processing. More specifically, it concerns a graphics processing system, method and GPU based on bitmap primitives.
  • A GPU is a microprocessor that performs graphics operations on many kinds of smart computing devices. Examples include computer workstations, mobile phones, embedded systems, personal computers, tablets and video game consoles.
  • The main purpose of the GPU is to convert the display information the computing device needs and to drive the display. It supplies line-scan signals so that the display shows the content correctly.
  • A GPU usually includes computing units and storage units. The more of each it has, the faster it processes graphics and the more expensive it is.
  • The technical problem this invention addresses is to provide a graphics processing system, method and GPU based on bitmap primitives. These should achieve better display quality at lower cost while significantly reducing power consumption.
  • A GPU based on bitmap primitives includes one or more sub-processors connected through a GPU bus. It also has a first cache and a second cache that the sub-processors can access in parallel. A timing generator is connected to the sub-processors, the first cache and the second cache.
  • Each sub-processor includes at least one primitive filter, at least one depth processor, at least one command parser, a plurality of primitive processors, at least one color processor that controls the primitive processors, and at least one pixel shader.
  • The color processor works from the primitive data and primitive formatted data of the one or more bitmap primitives that cover the pixel to be drawn. From these it starts the primitive processors that can handle those primitives' formats. It then collects the pixel ARGB values those processors generate and passes them to the pixel shader.
  • The pixel shader computes the value of the pixel to be drawn. It uses the pixel ARGB values and the pixel synthesis commands of the bitmap primitives covering that pixel.
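The split between the color processor and the pixel shader can be modelled in software. Below is a hypothetical Python sketch, not the patent's circuit. Several details are assumptions: the format names `SOLID` and `ARGB32`, the dictionary layout of each primitive, and the two pixel synthesis commands `COPY` and `SRC_OVER`. The text does not define its commands, so `SRC_OVER` here is straight-alpha Porter-Duff "source over".

```python
from typing import Callable, Dict, List, Tuple

ARGB = Tuple[int, int, int, int]  # (A, R, G, B), each 0..255


def solid_color(prim: dict, x: int, y: int) -> ARGB:
    # Solid-color primitive processor: every pixel takes the fill ARGB value.
    return prim["fill_argb"]


def argb32(prim: dict, x: int, y: int) -> ARGB:
    # ARGB primitive processor: look the pixel up relative to the primitive origin.
    ox, oy = prim["origin"]
    return prim["pixels"][y - oy][x - ox]


# Hypothetical registry: primitive format -> primitive processor.
PROCESSORS: Dict[str, Callable[[dict, int, int], ARGB]] = {
    "SOLID": solid_color,
    "ARGB32": argb32,
}


def src_over(dst: ARGB, src: ARGB) -> ARGB:
    # Porter-Duff "source over" on straight (non-premultiplied) alpha, integer math.
    sa, da = src[0], dst[0]
    out_a = sa + da * (255 - sa) // 255
    if out_a == 0:
        return (0, 0, 0, 0)

    def blend(s: int, d: int) -> int:
        return (s * sa + d * da * (255 - sa) // 255) // out_a

    return (out_a, blend(src[1], dst[1]), blend(src[2], dst[2]), blend(src[3], dst[3]))


def shade_pixel(prims: List[dict], x: int, y: int,
                background: ARGB = (255, 0, 0, 0)) -> ARGB:
    # prims must already be sorted bottom-to-top by layer number.
    out = background
    for prim in prims:
        color = PROCESSORS[prim["format"]](prim, x, y)  # color processor dispatch
        cmd = prim.get("pixel_cmd", "SRC_OVER")         # pixel synthesis command
        out = color if cmd == "COPY" else src_over(out, color)
    return out


layers = [
    {"format": "SOLID", "fill_argb": (255, 0, 0, 255), "pixel_cmd": "COPY"},
    {"format": "ARGB32", "origin": (0, 0),
     "pixels": [[(0, 0, 0, 0), (255, 255, 0, 0)]]},
]
print(shade_pixel(layers, 0, 0))  # (255, 0, 0, 255): transparent pixel shows blue
print(shade_pixel(layers, 1, 0))  # (255, 255, 0, 0): opaque red covers blue
```

A table keyed by format mirrors the idea that only the processors able to handle a primitive's format are started. Blending stays in one place, as the pixel shader does here.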
  • The first cache is configured with a bitmap primitive data cache area for caching the primitive data of bitmap primitives.
  • The second cache is configured with a primitive cache area for caching the primitive formatted data of bitmap primitives.
  • The primitive data of a bitmap primitive includes at least:
    • data recording its primitive format;
    • data recording its ARGB values;
    • data recording its storage address and size.
  • The primitive formatted data includes at least:
    • object index data identifying the bitmap primitive;
    • data recording the position and size of the bitmap primitive's area;
    • data recording the layer overlay display relationship between the bitmap primitive and other bitmap primitives;
    • data recording the pixel synthesis commands between the bitmap primitive and other bitmap primitives.
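The two-part data model above can be written down as two record types. This is a minimal Python sketch under stated assumptions: the class names, field names and the `split_primitive` helper are hypothetical and chosen only for illustration. The patent lists the kinds of data, not a concrete layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PrimitiveData:
    """Held in the first cache (CACHE1); field names are illustrative."""
    bitmap_format: str   # primitive format, e.g. "SOLID" (name assumed)
    argb: bytes          # ARGB payload: one fill value or per-pixel values
    address: int         # storage address of this primitive data
    size: int            # size of the primitive data in bytes


@dataclass
class PrimitiveFormatData:
    """Held in the second cache (CACHE2); field names are illustrative."""
    object_index: int                  # identifies the bitmap primitive
    rect: Tuple[int, int, int, int]    # (x, y, w, h) of the primitive area
    layer: int                         # overlay order among primitives
    pixel_cmd: str                     # pixel synthesis command (assumed name)


def split_primitive(index: int, fmt: str, argb: bytes,
                    rect: Tuple[int, int, int, int], layer: int,
                    pixel_cmd: str, address: int):
    """Split one bitmap primitive into the two records kept in separate caches."""
    data = PrimitiveData(fmt, argb, address, len(argb))
    fmt_data = PrimitiveFormatData(index, rect, layer, pixel_cmd)
    return data, fmt_data


data, fmt_data = split_primitive(7, "SOLID", bytes([255, 0, 0, 255]),
                                 (10, 20, 64, 32), 3, "SRC_OVER", 0x1000)
print(data.size, fmt_data.rect)  # 4 (10, 20, 64, 32)
```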
  • A graphics processing system based on bitmap primitives includes a CPU and three static random access memories (SRAMs) that can be accessed in parallel.
  • All three SRAMs are connected to the GPU.
  • The second SRAM is configured with a bitmap primitive data buffer area that stores the primitive data of bitmap primitives.
  • The third SRAM is configured with a buffer that stores the primitive formatted data of bitmap primitives.
  • A graphics processing method based on bitmap primitives includes dividing the data of each bitmap primitive into at least primitive data and primitive formatted data.
  • The primitive data includes at least data recording the bitmap primitive's primitive format, its ARGB values, and the storage address and size of its primitive data.
  • The primitive formatted data includes at least:
    • object index data identifying the bitmap primitive;
    • data recording its position and size;
    • data recording the layer overlay display relationship between it and other bitmap primitives;
    • data recording the pixel synthesis commands between it and the other bitmap primitives.
  • The method stores the primitive data and the primitive formatted data in different storage areas of one memory, or in different memories. It then gives the GPU parallel access to both.
  • The invention splits the data of a bitmap primitive into primitive data and primitive formatted data. It stores the two parts in separate SRAMs that can be accessed in parallel, so the GPU can fetch both quickly.
  • It also uses a color-processing method suited to each primitive format.
  • As a result, it achieves better display quality with the smallest possible SRAM area. It lowers hardware cost and system power consumption, and it removes the need for a DRAM display buffer.
  • Figure 1 is an example of bitmap primitive layering of a car instrument UI interface according to some embodiments of the present invention.
  • Figure 2 is a structural block diagram of a graphics processing system based on a single-core GPU according to one embodiment.
  • Figure 3 is a structural block diagram of a multi-core GPU-based graphics processing system according to one embodiment.
  • FIG. 4 is a simplified flowchart of a graphics processing method based on bitmap primitives according to one embodiment.
  • Figure 5 is an example of data in SRAM2 according to some embodiments of the present invention.
  • Figure 6 is another example of data in SRAM2 according to some embodiments of the present invention.
  • Figure 7 is an example of data in SRAM3 according to some embodiments of the present invention.
  • Figure 8 is an example of data in CACHE1 according to some embodiments of the present invention.
  • Figure 9 is an example of data in CACHE2 according to some embodiments of the present invention.
  • Figure 10 is a structural block diagram of a solid color primitive processor according to one embodiment.
  • Figure 11 is a structural block diagram of an ARGB primitive processor according to one embodiment.
  • Figure 12 is a structural block diagram of a linear gradient primitive processor according to one embodiment.
  • Figure 13 is a structural block diagram of a radial gradient primitive processor according to one embodiment.
  • Figure 14 is a structural block diagram of a general primitive processor according to one embodiment.
  • Figure 15 is a structural block diagram of a color processor according to one embodiment.
  • Figure 16 is a schematic diagram of a ColorReady connection node in a CrossBar-based dynamically reconfigurable matrix circuit according to one embodiment.
  • Figure 17 is a structural block diagram of a pixel shader according to one embodiment.
  • Figure 18 is a structural block diagram of a primitive filter according to one embodiment.
  • Figure 19 is a structural block diagram of a depth processor according to one embodiment.
  • Figure 20 is a flowchart of a command parser obtaining a pixel synthesis command according to one embodiment.
  • Figure 21 is a flow chart of a primitive filter filtering bitmap primitives according to one embodiment.
  • Bitmap graphics are made of pixels. Enlarging them causes distortion and their files are larger, but they can show more realistic and richer color.
  • Vector graphics are made of straight lines and curves. They can be enlarged without distortion and their files are smaller, so they are typically used for drawing icons.
  • A graphics processing system usually includes a CPU, a GPU and memory. The memory stores the application (APP), the graphics API, the GPU driver and the graphics data.
  • The CPU runs the graphics API and the application, and calls the GPU driver to start the GPU.
  • The GPU reads the graphics data from memory, builds the graphical UI, and outputs it to the display.
  • A UI designed entirely from bitmap graphic elements often produces larger program files and needs more capable hardware.
  • Graphic elements are the visual building blocks of a UI. Examples include text in various glyphs, symbols and icons of various shapes, and multicolored or single-color background pictures.
  • "Bitmap" refers to a pixel-based (raster) graphics encoding format.
  • The term "layer" in this application has three meanings:
    • the hierarchical attribute of a graphic element;
    • the overlay display level of a graphic element;
    • the bitmap graphic element itself during overlay display.
  • The graphics processing system of the present invention includes a CPU packaged in an MCU, plus static random access memories SRAM1, SRAM2 and SRAM3. The SRAMs connect to the CPU through a system communication bus. The GPU also connects to the CPU and to SRAM1, SRAM2 and SRAM3 through that bus.
  • SRAM1 is configured with three buffers. They store the application (APP), the graphics API and the GPU driver respectively.
  • The GPU may be single-core, with only one sub-processor, or multi-core, with several sub-processors working in parallel.
  • FIG. 2 shows a structural block diagram of a graphics processing system based on a single-core GPU according to one embodiment. FIG. 3 shows the same for a multi-core GPU according to one embodiment.
  • The GPU also includes a first cache CACHE1 and a second cache CACHE2. Both connect to the one or more sub-processors through the GPU's internal communication bus (GPU bus).
  • CACHE1 and CACHE2 may also be high-speed static random access memories (SRAM). Compared with dynamic random access memory (DRAM), SRAM reads faster and uses less power, but it costs more.
  • Each pixel of a bitmap primitive in Bitmap format has specific coordinates (x, y) and an ARGB value giving its transparency and color.
  • The color of each pixel is given by RGB and its transparency by A. By color depth, bitmaps can be 1, 4, 8, 16, 24 or 32 bits per pixel, among others.
  • The more bits used per pixel, the more colors are available and the more realistic the color. However, the data volume grows accordingly, and so does the storage needed.
  • Prior-art embedded systems usually process Bitmap primitives with a frame buffer memory (Frame Buffer).
  • Each storage unit of the frame buffer corresponds to one pixel on the screen. The whole frame buffer holds one frame, a direct image of what the screen displays.
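The link between color depth and storage can be made concrete with a little arithmetic. This Python sketch is illustrative only. It assumes an uncompressed bitmap whose rows are padded to a whole byte (some formats, such as Windows BMP, pad rows to 4 bytes instead). It also uses a hypothetical 100 x 100 icon.

```python
def bitmap_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes for an uncompressed bitmap; each row is padded to a whole byte (assumption)."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# Hypothetical 100 x 100 icon at each of the color depths mentioned above.
for bpp in (1, 4, 8, 16, 24, 32):
    print(f"{bpp:2d} bpp: {bitmap_bytes(100, 100, bpp):6d} bytes")
```

Going from 1 bpp (1300 bytes) to 32 bpp (40000 bytes) multiplies storage roughly thirtyfold for the same image.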
  • Existing technologies usually use cheaper DRAM. However, DRAM uses a lot of power and also makes the device more complex.
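The one-to-one mapping between frame buffer storage units and screen pixels can be sketched as below. This is a minimal sketch under two assumptions. First, the layout is row-major. Second, the display is a hypothetical 1280 x 480 panel with 32-bit ARGB pixels. The per-frame size suggests why holding a whole frame in SRAM is costly, which pushes prior designs toward DRAM.

```python
def fb_offset(x: int, y: int, width: int, bytes_per_pixel: int) -> int:
    """Byte offset of screen pixel (x, y) in a row-major frame buffer."""
    return (y * width + x) * bytes_per_pixel


def fb_size(width: int, height: int, bytes_per_pixel: int) -> int:
    """Bytes needed to hold one full frame."""
    return width * height * bytes_per_pixel


# Hypothetical 1280 x 480 display with 32-bit ARGB pixels.
print(fb_offset(2, 1, 1280, 4))   # 5128
print(fb_size(1280, 480, 4))      # 2457600 bytes (about 2.3 MiB) per frame
```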
  • FIG. 4 shows a simplified flowchart of a graphics processing method based on bitmap primitives according to some embodiments of the present invention.
  • The method includes blocks 70-72.
  • The data of a bitmap primitive is split into at least primitive data and primitive formatted data.
  • The primitive data includes at least data recording the primitive format of the bitmap primitive, data recording its ARGB values, and data recording the storage address and size of its primitive data.
  • The primitive formatted data includes at least:
    • object index data identifying the bitmap primitive;
    • data recording the position and size of its area;
    • data recording the layer overlay display relationship between it and other bitmap primitives;
    • data recording the pixel synthesis commands between it and other bitmap primitives.
  • The primitive data and the primitive formatted data are stored in different storage areas of the memory, or in different memories.
  • SRAM2 stores the primitive data and SRAM3 stores the primitive formatted data.
  • CACHE1 reads the data provided by SRAM2, and CACHE2 reads the data provided by SRAM3.
  • The GPU is given parallel access to the primitive data and the primitive formatted data.
  • The GPU's sub-processors access SRAM2 and SRAM3 in parallel. They apply a processing method suited to each primitive format. This keeps the SRAM area as small as possible, speeds up processing, improves display quality, and lowers hardware cost.
  • Blocks 70-72 may be performed in a different order as needed.
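The parallel access to the two halves of a primitive can be imitated in software with two concurrent reads. This is a loose analogy, not the hardware path. The two dictionaries stand in for SRAM2 and SRAM3. Keying both by object index is an assumption made for brevity, since in the text the primitive data is located through its recorded storage address.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the two memories; keying both by object index is an assumption.
SRAM2 = {7: {"format": "SOLID", "argb": (255, 0, 0, 255)}}                 # primitive data
SRAM3 = {7: {"rect": (0, 0, 10, 10), "layer": 3, "pixel_cmd": "SRC_OVER"}}  # formatted data


def fetch_primitive(index: int):
    """Read both halves of one bitmap primitive concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_future = pool.submit(SRAM2.__getitem__, index)
        fmt_future = pool.submit(SRAM3.__getitem__, index)
        return data_future.result(), fmt_future.result()


data, fmt_data = fetch_primitive(7)
print(data["format"], fmt_data["layer"])  # SOLID 3
```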
  • The method also sorts graphic elements into at least three layer levels:
    • window layer primitives, the base level;
    • control layer primitives, which belong to window layer primitives;
    • bitmap layer primitives, which belong to control layer primitives.
  • A two-dimensional UI contains one or more window layer primitives. Each window layer primitive contains one or more control layer primitives, and each control layer primitive contains one or more bitmap layer primitives.
  • The bitmap layer primitive is the smallest node and the window layer primitive is the largest.
  • Window layer primitives correspond to ordinary graphics windows, pop-up windows, dialog windows, floating windows, etc.
  • Control layer primitives correspond to buttons, scroll bars, status lists, edit boxes, picture boxes, etc.
  • Bitmap layer primitives correspond to static or dynamic pictures, text, numbers, icons, etc.
  • The method also sorts bitmap primitives into at least three primitive formats:
    • solid-color bitmap primitives, whose pixels all share the same color and transparency;
    • ARGB bitmap primitives, whose pixels have different colors and the same or different transparency;
    • glyph bitmap primitives, whose pixels have the same or different colors and the same or different transparency.
  • SRAM2 is configured with a bitmap primitive data buffer 30 that stores the primitive data of bitmap primitives.
  • The primitive data 301 of a solid-color bitmap primitive includes:
    • data (BitmapFormat) recording its primitive format;
    • data recording its fill-color ARGB value;
    • data recording the storage address and size of its primitive data.
  • The primitive data 302 of an ARGB bitmap primitive includes:
    • data (BitmapFormat) recording its primitive format;
    • data 310 recording the ARGB values of its pixels;
    • data recording the storage address and size of its primitive data.
  • By size and format, ARGB bitmap primitives can be divided into:
    • 32-bit ARGB bitmap primitives;
    • ARGB-C bitmap primitives;
    • 24-bit ARGB bitmap primitives with the same transparency;
    • 24-bit ARGB bitmap primitives with different transparency;
    • 16-bit ARGB bitmap primitives.
  • 32-bit ARGB bitmap primitive: transparency A and color components R, G and B each take 8 bits per pixel.
  • 24-bit ARGB bitmap primitive with the same transparency: color components R, G and B each take 8 bits per pixel.
  • 24-bit ARGB bitmap primitive with different transparency: transparency A takes 8 bits per pixel, R and B take 5 bits each, and G takes 6 bits.
  • 16-bit ARGB bitmap primitive: R and B take 5 bits each and G takes 6 bits per pixel. Transparency A defaults to 0xFF.
  • Mono bitmap primitive: each pixel records its color with 1 bit. Transparency defaults to 0xFF.
  • Palette bitmap primitive: each pixel records its ARGB value as an 8-bit index. The index table has 256 entries, so 256 colors can be defined.
  • Each pixel of a JPEG bitmap primitive has the same transparency, and its color value comes from the JPEG image decoder. The color and transparency values of each pixel of a PNG bitmap primitive come from the PNG image decoder.
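The per-pixel layouts listed above can be decoded to a common 8-bit ARGB form as follows. This is a Python sketch under assumptions the text does not settle:
- bit packing: 0xAARRGGBB for 32-bit, and A8 above R5 G6 B5 for the 24-bit different-transparency format;
- widening 5- and 6-bit channels to 8 bits by bit replication;
- the foreground and background colors of the mono format, which are hypothetical.

JPEG and PNG pixels come from their decoders and are not modelled.

```python
from typing import List, Tuple

ARGB = Tuple[int, int, int, int]  # (A, R, G, B), each 0..255


def expand5(v: int) -> int:
    # Widen a 5-bit channel to 8 bits by bit replication.
    return (v << 3) | (v >> 2)


def expand6(v: int) -> int:
    # Widen a 6-bit channel to 8 bits by bit replication.
    return (v << 2) | (v >> 4)


def decode_argb32(word: int) -> ARGB:
    # 32-bit ARGB: 8 bits each, assumed packed as 0xAARRGGBB.
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)


def decode_rgb565(word: int, alpha: int = 0xFF) -> ARGB:
    # 16-bit: R5 G6 B5; transparency defaults to 0xFF.
    r, g, b = (word >> 11) & 0x1F, (word >> 5) & 0x3F, word & 0x1F
    return (alpha, expand5(r), expand6(g), expand5(b))


def decode_argb8565(word: int) -> ARGB:
    # 24-bit with per-pixel transparency: A8 in the top byte, then R5 G6 B5 (assumed order).
    return decode_rgb565(word & 0xFFFF, alpha=(word >> 16) & 0xFF)


def decode_mono(bit: int, on: ARGB = (0xFF, 0xFF, 0xFF, 0xFF),
                off: ARGB = (0xFF, 0x00, 0x00, 0x00)) -> ARGB:
    # 1-bit mono: the on/off colors are hypothetical; transparency is 0xFF.
    return on if bit else off


def decode_palette(index: int, table: List[ARGB]) -> ARGB:
    # 8-bit palette: index into a 256-entry ARGB table.
    return table[index]


grey_ramp = [(0xFF, i, i, i) for i in range(256)]
print(decode_argb32(0x80FF0010))      # (128, 255, 0, 16)
print(decode_rgb565(0xF800))          # (255, 255, 0, 0)
print(decode_argb8565(0x40F800))      # (64, 255, 0, 0)
print(decode_palette(10, grey_ramp))  # (255, 10, 10, 10)
```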
  • ARGB-C bitmap primitives have only a few valid pixels (non-zero transparency). Most of their area is transparent (transparency 0).
  • The primitive data 304 of an ARGB-C bitmap primitive includes:
    • data (BitmapFormat) recording its primitive format;
    • data recording its valid pixels;
    • data recording the storage address and size of its primitive data.
  • The valid-pixel data is recorded row by row as one or more row data blocks 320.
  • Each row data block includes the coordinates of the first valid pixel in that row of valid pixels.
  • The first row data block 321 records three things for the first row of the ARGB-C bitmap primitive:
    • the coordinates of its first valid pixel;
    • the number of valid pixels in that row;
    • the ARGB values of valid pixels [1...m] of that row, arranged by column.
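The row-block encoding of ARGB-C primitives resembles run-length storage of the opaque parts of an image. The Python sketch below rests on one assumption: each row data block holds a single contiguous run of valid pixels. On that reading, a row with gaps yields several blocks, which fits "one or more row data blocks". The tuple layout `(x, y, count, values)` is hypothetical.

```python
from typing import List, Tuple

ARGB = Tuple[int, int, int, int]             # (A, R, G, B)
RowBlock = Tuple[int, int, int, List[ARGB]]  # (x, y, count, ARGB values)
TRANSPARENT: ARGB = (0, 0, 0, 0)


def encode_argb_c(pixels: List[List[ARGB]]) -> List[RowBlock]:
    """Keep only runs of valid (A != 0) pixels, one row data block per run."""
    blocks: List[RowBlock] = []
    for y, row in enumerate(pixels):
        x = 0
        while x < len(row):
            if row[x][0] == 0:            # transparent pixel: not stored
                x += 1
                continue
            start = x
            while x < len(row) and row[x][0] != 0:
                x += 1
            blocks.append((start, y, x - start, row[start:x]))
    return blocks


def sample_argb_c(blocks: List[RowBlock], x: int, y: int) -> ARGB:
    """Return the ARGB value at (x, y); anything not covered is transparent."""
    for bx, by, count, values in blocks:
        if by == y and bx <= x < bx + count:
            return values[x - bx]
    return TRANSPARENT


T, R, G = TRANSPARENT, (255, 255, 0, 0), (128, 0, 255, 0)
image = [[T, R, R, T],
         [T, T, T, T],
         [G, T, T, G]]
blocks = encode_argb_c(image)
print(len(blocks), sample_argb_c(blocks, 2, 0))  # 3 (255, 255, 0, 0)
```

Only 4 of the 12 pixels are stored here. The saving grows with the share of transparent area, which is the situation ARGB-C primitives are meant for.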
  • Linear gradient bitmap primitives have a linear gradient effect.
  • The primitive data 305 of a linear gradient bitmap primitive includes:
    • data (BitmapFormat) recording its primitive format;
    • data recording the coordinates of the gradient's starting point;
    • data recording the storage address and size of its primitive data.
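The text above names only the gradient's starting point among the geometry fields. A common way to evaluate a linear gradient is to project each pixel onto the axis from a start point to an end point. This Python sketch does that under assumptions: an end point, two color stops (`c0`, `c1`) and clamping outside the segment are all hypothetical, not taken from the patent.

```python
from typing import Tuple

ARGB = Tuple[int, int, int, int]
Point = Tuple[float, float]


def linear_gradient(p: Point, start: Point, end: Point, c0: ARGB, c1: ARGB) -> ARGB:
    """Color at p for a two-stop linear gradient from start (c0) to end (c1)."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Project p onto the start->end axis; t = 0 at start, 1 at end.
    t = ((p[0] - start[0]) * dx + (p[1] - start[1]) * dy) / (dx * dx + dy * dy)
    t = min(1.0, max(0.0, t))  # clamp outside the segment
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


black, white = (255, 0, 0, 0), (255, 255, 255, 255)
print(linear_gradient((5, 3), (0, 0), (10, 0), black, white))  # (255, 128, 128, 128)
```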
  • Radial gradient bitmap primitives have a radial gradient effect.
  • The primitive data 306 of a radial gradient bitmap primitive includes:
    • data (BitmapFormat) recording its primitive format;
    • data recording the center coordinates of the gradient-area circle;
    • data recording the inner radius of that circle;
    • data recording its outer radius;
    • data recording the radial gradient calculation formula;
    • data recording the radial gradient positions;
    • data recording the storage address and size of its primitive data.
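The patent stores a "calculation formula" for radial gradients but does not give it. This Python sketch assumes the simplest choice. Color varies linearly with distance from the center between the inner radius (`c0`) and the outer radius (`c1`), and is clamped beyond them. The two color stops are hypothetical.

```python
import math
from typing import Tuple

ARGB = Tuple[int, int, int, int]


def radial_gradient(p: Tuple[float, float], center: Tuple[float, float],
                    r_inner: float, r_outer: float, c0: ARGB, c1: ARGB) -> ARGB:
    """Color at p: c0 inside the inner circle, c1 outside the outer one, linear in between."""
    d = math.hypot(p[0] - center[0], p[1] - center[1])
    t = (d - r_inner) / (r_outer - r_inner)   # assumed linear calculation formula
    t = min(1.0, max(0.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


inner, outer = (255, 0, 0, 0), (255, 200, 100, 0)
print(radial_gradient((3, 4), (0, 0), 2, 6, inner, outer))  # (255, 150, 75, 0)
```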
  • The primitive data 307 of a JPEG bitmap primitive includes:
    • data (BitmapFormat) recording the primitive format of the JPEG bitmap;
    • the address of the storage area holding the JPEG-encoded data;
    • the byte length of the code stream.
  • The primitive data 308 of a PNG bitmap primitive includes:
    • data (BitmapFormat) recording the primitive format of the PNG bitmap;
    • the address of the storage area holding the PNG-encoded data;
    • the byte length of the code stream.
  • The primitive data 303 of a glyph bitmap primitive includes:
    • data recording its primitive format;
    • data recording its glyph ARGB value;
    • data recording its glyph outline;
    • data recording the storage address and size of its primitive data.
  • A glyph bitmap primitive can be an 8-bit glyph bitmap primitive. Each pixel uses 8 bits to record its transparency A, and the glyph ARGB value sets the color of the text.
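For 8-bit glyph bitmap primitives, the per-pixel byte acts as coverage and the glyph ARGB value supplies the color. The text does not say how the glyph ARGB value's own alpha combines with the per-pixel transparency. This Python sketch assumes they are multiplied, with rounding to nearest. The function names are hypothetical.

```python
from typing import List, Tuple

ARGB = Tuple[int, int, int, int]


def glyph_pixel(glyph_argb: ARGB, coverage: int) -> ARGB:
    """Combine the glyph ARGB value with one pixel's 8-bit transparency (0..255)."""
    a, r, g, b = glyph_argb
    # Scale the glyph's own alpha by the per-pixel value, rounding to nearest (assumption).
    return ((a * coverage + 127) // 255, r, g, b)


def glyph_row(glyph_argb: ARGB, coverages: List[int]) -> List[ARGB]:
    """Color one row of glyph pixels with a single text color."""
    return [glyph_pixel(glyph_argb, c) for c in coverages]


red_text = (255, 255, 0, 0)
print(glyph_row(red_text, [0, 128, 255]))
# [(0, 255, 0, 0), (128, 255, 0, 0), (255, 255, 0, 0)]
```

Storing one byte per pixel plus a single ARGB value takes a quarter of the space of a full 32-bit ARGB glyph. It also lets the same glyph be drawn in any text color.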
  • A bitmap primitive data cache area 50 is allocated in CACHE1. It reads and caches the contents of the bitmap primitive data buffer 30 in SRAM2.
  • The cache area 50 has several data blocks, each storing the primitive data of one bitmap primitive. Suppose the pixel to be drawn involves bitmap primitives on six different layers. Their primitive data can then be stored in data blocks 501, 502, 503, 504, 505 and 506 respectively.
  • SRAM3 is configured with three buffers for primitive formatted data:
    • a window layer buffer 40 for window layer primitives;
    • a control layer buffer 41 for control layer primitives;
    • a bitmap layer buffer 42 for bitmap layer primitives.
  • The window layer buffer 40 is divided into several window layer primitive formatted data blocks 401.
  • A window layer primitive formatted data block 409 stores the primitive formatted data of one window layer primitive. This includes:
    • object index data identifying the window layer primitive, such as a dialog window, floating window, message window or regular window (window index window_index);
    • data recording its primitive format (BitmapFormat);
    • data recording the position and size of its area (window area window_rect(x, y, w, h));
    • data recording how many control layer primitives belong to it (number of control layer primitives);
    • object index data identifying those control layer primitives (control index widget_index);
    • a layer serial number marking its layer overlay display relationship with other window layer primitives (window layer serial number window_layer);
    • data recording the pixel synthesis command between it and other window layer primitives.
  • The control layer buffer 41 is divided into several control layer primitive formatted data blocks 411.
  • A control layer primitive formatted data block 419 stores the primitive formatted data of one control layer primitive. This includes:
    • object index data identifying the control layer primitive, such as a button, scroll bar, status list, edit box or picture box (control index widget_index);
    • data recording its primitive format (BitmapFormat);
    • data recording the position and size of its area (control area widget_rect(x, y, w, h));
    • object index data identifying the bitmap layer primitives that belong to it (bitmap index bitmap_index);
    • data recording how many such bitmap layer primitives there are (number of bitmap layer primitives);
    • a layer serial number marking its layer overlay display relationship with other control layer primitives in the same window layer primitive (control layer serial number widget_layer);
    • data recording the pixel synthesis command between it and other control layer primitives (pixel synthesis command widget_pixel_cmd);
    • object index data identifying the window layer primitive it belongs to (window index window_index);
    • object index data identifying other control layer primitives in the same window layer primitive (sibling control index).
  • The bitmap layer buffer 42 is divided into several bitmap layer primitive formatted data blocks 421.
  • A bitmap layer primitive formatted data block 429 stores the primitive formatted data of one bitmap layer primitive. This includes:
    • data recording its primitive format (pixel format BitmapFormat);
    • object index data identifying the control layer primitive it belongs to (control index widget_index);
    • data recording the position and size of its envelope rectangle (bitmap envelope rectangle DispRect(x, y, w, h));
    • data recording the position and size of its clipping rectangle (bitmap clipping rectangle ClipRect(x, y, w, h));
    • data recording where its primitive data is stored;
    • a layer serial number marking its layer overlay display relationship with other bitmap layer primitives in the same control layer primitive (bitmap layer serial number bitmap_layer);
    • data recording the pixel synthesis command between it and other bitmap layer primitives (pixel synthesis command bitmap_pixel_cmd);
    • object index data identifying other bitmap layer primitives in the same control layer primitive (sibling bitmap index sibling_Bitmap_index).
  • Using sibling_Bitmap_index, the GPU can traverse all bitmap layer primitives under the same control layer primitive.
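The sibling index makes the bitmap layer primitives of one control a singly linked chain. This Python sketch of the traversal assumes three things. The control's `bitmap_index` points at the first bitmap in the chain. The "number of bitmap layer primitives" (here the hypothetical key `bitmap_count`) bounds the walk, which also guards against a malformed chain. `None` marks the end of the chain. The table is modelled on control layer primitive 12 in the example later in this section, with hypothetical layer numbers.

```python
from typing import Dict, List


def bitmaps_of_control(control: dict, bitmap_table: Dict[int, dict]) -> List[int]:
    """Follow sibling_Bitmap_index links from the control's first bitmap_index,
    visiting exactly bitmap_count entries, then order them bottom-to-top by
    bitmap_layer."""
    found: List[int] = []
    idx = control["bitmap_index"]
    for _ in range(control["bitmap_count"]):
        found.append(idx)
        idx = bitmap_table[idx]["sibling_Bitmap_index"]
    return sorted(found, key=lambda i: bitmap_table[i]["bitmap_layer"])


# Modelled on control layer primitive 12; layer numbers are hypothetical.
table = {
    121: {"bitmap_layer": 1, "sibling_Bitmap_index": 122},
    122: {"bitmap_layer": 3, "sibling_Bitmap_index": 123},
    123: {"bitmap_layer": 2, "sibling_Bitmap_index": 124},
    124: {"bitmap_layer": 4, "sibling_Bitmap_index": None},
}
control_12 = {"bitmap_index": 121, "bitmap_count": 4}
print(bitmaps_of_control(control_12, table))  # [121, 123, 122, 124]
```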
  • CACHE2 is configured with a primitive cache area 60. It reads and caches the primitive formatted data from the window layer buffer 40, the control layer buffer 41 and the bitmap layer buffer 42 in SRAM3.
  • The cache area 60 has several data blocks, each storing the primitive formatted data of a different bitmap primitive. Suppose the pixel to be drawn involves six different bitmap primitives. Their primitive formatted data can then be stored in data blocks 601, 602, 603, 604, 605 and 606 respectively.
  • Layer numbers follow the overlay relationship: bitmap layer primitives sit on top of control layer primitives, which sit on top of window layer primitives.
  • When layer numbers are initialized, a higher bitmap primitive gets a larger layer number and a lower one gets a smaller number.
  • An upper bitmap layer primitive has a larger layer number than a lower one.
  • An upper control layer primitive has a larger layer number than a lower one.
  • Control layer primitives with larger layer numbers are drawn on top of those with smaller numbers.
  • Window layer primitives with larger layer numbers are drawn on top of those with smaller numbers.
  • A new layer is drawn on top of the existing layers. The GPU therefore gives it a layer number greater than those of the existing layers.
  • Window layer primitive 1 includes control layer primitive 11, control layer primitive 12, and control layer primitive 13.
  • the control layer primitive 11 also includes the bitmap layer primitive 111
  • the control layer primitive 12 also includes the bitmap layer primitive 121, the bitmap layer primitive 122, the bitmap layer primitive 123, the bitmap layer primitive 124
  • the control layer primitive 13 also includes a bitmap layer primitive 131.
  • Window layer primitive 2 includes control layer primitive 21, control layer primitive 22; control layer primitive 21 also includes bitmap layer primitive 211, bitmap layer primitive 212, bitmap layer primitive 213; control layer primitive 22 also includes bitmap layer primitive 221 and bitmap layer primitive 222.
  • control layer primitive 11, control layer primitive 12, and control layer primitive 13 belonging to window layer primitive 1 are displayed on window layer primitive 1.
  • bitmap layer primitive 111 is superimposed on control layer primitive 11; bitmap layer primitives 121, 122, 123 and 124, which belong to control layer primitive 12, are superimposed on control layer primitive 12
  • bitmap layer primitive 131 is superimposed on control layer primitive 13, and window layer primitive 1 is at the bottom display level.
  • control layer primitives 21 and 22, which both belong to window layer primitive 2, are in the middle; bitmap layer primitives 211, 212, 213, 221 and 222 are at the top; and window layer primitive 2 is at the bottom. Among the bitmap layer primitives belonging to control layer primitive 21, bitmap layer primitives 212 and 213, which have larger layer numbers, are superimposed on bitmap layer primitive 211 for display.
  • bitmap layer primitives 223 and 224 are to be added to the original control layer primitive 22: bitmap layer primitive 223 is to be superimposed on the original bitmap layer primitive 222, and bitmap layer primitive 224 is to be superimposed directly on the original control layer primitive 22.
  • the GPU traverses the areas where bitmap layer primitives 223 and 224 are to be displayed to check whether an original bitmap layer primitive belonging to the same control layer primitive 22 lies there. If so, the bitmap layer number bitmap_layer of that original bitmap layer primitive, incremented by 1, becomes the bitmap layer number of the newly added bitmap layer primitive; if not, the bitmap layer number of the newly added bitmap layer primitive is 0.
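The layer-number rule for newly added bitmap layer primitives can be sketched as follows. This is a minimal Python illustration under stated assumptions. Display areas are rectangles `(x, y, w, h)`. "Lies in the area" is treated as rectangle overlap. When several original siblings overlap, the largest `bitmap_layer` among them is used. The rectangle coordinates and the helper names (`overlaps`, `new_bitmap_layer`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


@dataclass
class BitmapLayer:
    bitmap_index: int
    widget_index: int
    rect: Rect
    bitmap_layer: int


def new_bitmap_layer(new_rect: Rect, widget_index: int, existing: List[BitmapLayer]) -> int:
    """bitmap_layer for a newly added bitmap layer primitive.

    Original primitives of the same control that overlap the new area push it
    one layer above the highest of them; otherwise it starts at 0.
    """
    below = [b.bitmap_layer for b in existing
             if b.widget_index == widget_index and overlaps(b.rect, new_rect)]
    return max(below) + 1 if below else 0


# Hypothetical geometry for control layer primitive 22.
existing = [
    BitmapLayer(221, 22, (0, 0, 50, 50), 0),
    BitmapLayer(222, 22, (40, 40, 50, 50), 1),
]
print(new_bitmap_layer((60, 60, 10, 10), 22, existing))    # 223 over 222 -> 2
print(new_bitmap_layer((200, 200, 10, 10), 22, existing))  # 224 on control 22 -> 0
```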
  • the bitmap primitives containing this pixel are, from top to bottom, bitmap layer primitive 223, bitmap layer primitive 222, control layer primitive 22 and window layer primitive 2.
  • the layer numbers record both the overlay display relationship of the window layer, control layer and bitmap layer and the relationship in which new bitmap layer primitives are superimposed on original bitmap layer primitives. Based on these layer numbers, the GPU calculates a depth value Z for each of these bitmap primitives and sorts them by depth value Z from small to large.
  • the GPU then allocates the color channels in that order and assigns each bitmap primitive a primitive processor according to its primitive format. Finally, according to the pixel synthesis command of each bitmap primitive, it generates the pixel synthesis ARGB value with the overlay display effect and outputs it to the display.
  • the pixel synthesis command may be set according to PORTER-DUFF image synthesis rules or other image synthesis rules.
  • bitmap primitives with larger depth values Z are superimposed on those with smaller depth values Z, so the bitmap primitive with the largest depth value is displayed on top.
  • a single-core GPU packages one sub-processor that generates and outputs pixel synthesis ARGB values. It also packages the first cache CACHE1 and the second cache CACHE2, both connected to the sub-processor through the GPU bus, and a timing generator connected to the sub-processor, the first cache CACHE1 and the second cache CACHE2 through the GPU bus.
  • a multi-core GPU packages multiple sub-processors that generate and output pixel synthesis ARGB values. It also packages the first cache CACHE1 and the second cache CACHE2, both connected to the sub-processors through the GPU bus, and a timing generator connected to the sub-processors, the first cache CACHE1 and the second cache CACHE2 through the GPU bus.
  • Each sub-processor works the same but processes different pixels. If a multi-core GPU containing 200 sub-processors is used to process the example in Figure 1, the multi-core GPU can assign sub-processor 1 to process pixel P(200,100), sub-processor 2 to process pixel P(200,101), and sub-processor 3 to process pixel P(200,102)..., 200 sub-processors can process 200 pixels at the same time.
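The pixel distribution in the 200-sub-processor example can be written down directly. This is a minimal Python sketch that mirrors the P(200,100), P(200,101), P(200,102), ... pattern. It assumes one batch of consecutive pixels per dispatch and ignores row wrap-around. `assign_pixels` is a hypothetical helper, not part of the described hardware.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]


def assign_pixels(first: Pixel, num_subprocessors: int) -> Dict[int, Pixel]:
    """Map sub-processor k (1-based) to the k-th consecutive pixel starting at `first`,
    mirroring P(200,100), P(200,101), P(200,102), ... in the example."""
    a, b = first
    return {k: (a, b + k - 1) for k in range(1, num_subprocessors + 1)}


batch = assign_pixels((200, 100), 200)
print(batch[1], batch[2], batch[3], batch[200])  # (200, 100) (200, 101) (200, 102) (200, 299)
```

Since every sub-processor runs the same pipeline, the pixels in one batch can be processed at the same time.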
  • the CPU calls the program data of the application APP stored in SRAM1, starts the application APP, calls the program data of the graphics API and GPU driver, and starts the GPU driver.
  • the GPU driver initializes the GPU and decomposes the UI elements of the application APP into the primitive formatted data of window layer primitives, control layer primitives and bitmap layer primitives. It saves these respectively to the window layer buffer 40, the control layer buffer 41 and the bitmap layer buffer 42 of SRAM3, starts the GPU, and stores the APP metadata in the bitmap metadata buffer 30 of SRAM2.
  • according to the primitive formatted data the sub-processor needs to draw the current pixel, the second cache CACHE2 reads that data through the system communication bus from the window layer buffer 40, the control layer buffer 41 and the bitmap layer buffer 42 of SRAM3 and sends it to the sub-processor.
  • according to the metadata the sub-processor needs to draw the current pixel, the first cache CACHE1 reads the bitmap metadata through the system communication bus from the bitmap metadata buffer 30 of SRAM2 and sends it to the sub-processor.
  • the first cache CACHE1 and the second cache CACHE2 read data in parallel.
  • the sub-processor of the GPU includes: a primitive filter that, for the pixel to be drawn generated by the timing generator, reads the second cache CACHE2 and filters out all bitmap primitives whose area contains the pixel to be drawn;
  • a layer data register array BitmapList that saves the layer data of the filtered bitmap primitives;
  • a depth processor that calculates the depth value Z of all filtered bitmap primitives and sorts their primitive parameter information by depth value Z;
  • a depth sorting register array ZSortList that saves the primitive parameter information of all sorted bitmap primitives;
  • a command parser that reads the pixel synthesis command of each bitmap primitive from the second cache CACHE2 according to the sorted primitive parameter information;
  • a primitive command register array that saves the pixel synthesis commands of the sorted bitmap primitives;
  • multiple primitive processors, divided according to the primitive format BitmapFormat, that read the data of the first cache CACHE1 and generate pixel ARGB values;
  • a color processor that reads the data of the first cache CACHE1, schedules the color channels in the order of the sorted bitmap primitives, and schedules the corresponding primitive processors according to the primitive format BitmapFormat, so that together they generate the ARGB values of the pixel to be drawn in one or more bitmap primitives; and
  • a pixel shader that generates and outputs the pixel synthesis ARGB value of the pixel to be drawn from its ARGB values in the one or more bitmap primitives and the pixel synthesis commands.
  • layer data may include the bitmap index bitmap_index, primitive format bitmapFormat, window index window_index, control index widget_index, bitmap layer number bitmap_layer, control layer number widget_layer, window layer number window_layer, etc.
  • the GPU includes multiple primitive processors, and each primitive format BitmapFormat is configured with at least one primitive processor. The number of primitive processors will affect how quickly the graphics refreshes the display.
  • Figure 10 illustrates the structural block diagram of one of the solid color primitive processors used to process solid color bitmap primitives.
  • the solid color primitive processor may include a primitive parameter register P1 that receives and saves configuration parameters sent by the color processor, and an addressing/calculation unit S1 connected to the primitive parameter register P1.
  • the primitive parameter register P1 may include several registers. A command register R0a stores various work commands, such as setting primitive processor parameters and starting, stopping or pausing the primitive processor. A primitive format register R1a stores the primitive format BitmapFormat, and a primitive area coordinate register R2a stores the primitive area coordinates. A pixel coordinate register R3a stores the coordinates of the pixel to be drawn, a pixel color register R4a stores the ARGB value of the pixel to be drawn, and a primitive color register F1a stores the fill color ARGB value.
  • the ARGB value of the pixel to be drawn in the pixel color register R4a is equal to the fill color ARGB value of the primitive color register F1a.
  • the addressing/calculation unit S1 directly writes the fill color ARGB value of the primitive color register F1a into the pixel color register R4a without calculation, and then outputs the ColorReady signal indicating that the color has been completed to the color processor.
  • Figure 11 illustrates the structural block diagram of one of the ARGB primitive processors used to process ARGB bitmap primitives.
  • the ARGB primitive processor may include:
    - an ARGB cache A2 that obtains the primitive data of the ARGB bitmap primitive from the first cache CACHE1 through the GPU bus and saves it;
    - a primitive parameter register P2 that receives and saves the configuration parameters sent by the color processor;
    - an addressing/calculation unit S2 connected to the primitive parameter register P2 and the ARGB cache A2; and
    - a pixel color buffer C2 connected to the primitive parameter register P2 and the addressing/calculation unit S2.
  • the addressing/calculation unit S2 obtains data from the ARGB cache A2 and the primitive parameter register P2 and calculates the ARGB value of the pixel to be drawn according to the preset calculation formula.
  • the primitive parameter register P2 may include a command register R0b, a primitive format register R1b, a primitive area coordinate register R2b, a pixel coordinate register R3b, a pixel color register R4b used to store the ARGB value of the pixel to be drawn, a data storage address register F1b, and a data row byte length register F2b.
  • the data storage address register F1b stores the primitive data storage address of the first pixel of the ARGB bitmap primitive.
  • the data row byte length register F2b stores the row byte length of the ARGB bitmap primitive.
  • the following registers have the same functions as their counterparts in the solid color primitive processor of the example in Figure 10: the command register R0b (R0a), the primitive format register R1b (R1a), the primitive area coordinate register R2b (R2a), the pixel coordinate register R3b (R3a), and the pixel color register R4b (R4a).
  • the addressing/calculation unit S2 reads the data of the primitive area coordinate register R2b, the pixel coordinate register R3b, the data storage address register F1b and the data row byte length register F2b. From these, it calculates the ARGB value of the pixel to be drawn in the current ARGB bitmap primitive according to the preset calculation formula.
  • the preset calculation formula includes a step of calculating the coordinates of the pixel to be drawn, and a step of calculating, from those coordinates, the storage address in the ARGB cache of the ARGB value of the pixel to be drawn.
  • the pixel color buffer C2 stores the coordinates XY and ARGB values of one or more pixels.
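The two-step address calculation of the ARGB primitive processor can be modeled in software. This is a minimal Python sketch, assuming 32-bit ARGB8888 pixels stored as little-endian words and assuming R2b holds the primitive area as `(x, y, w, h)`. The text does not specify pixel size, byte order or the R2b layout. `argb_address` and `fetch_argb` are hypothetical names.

```python
import struct
from typing import Tuple

BYTES_PER_PIXEL = 4  # assumption: 32-bit ARGB8888 pixels


def argb_address(p: Tuple[int, int], area: Tuple[int, int, int, int],
                 base_addr: int, row_bytes: int) -> int:
    """Address of pixel p inside an ARGB bitmap primitive.

    area      -- primitive area (x, y, w, h), as held in R2b (assumed layout)
    base_addr -- address of the primitive's first pixel, as held in F1b
    row_bytes -- row byte length, as held in F2b
    """
    x, y, w, h = area
    px, py = p
    lx, ly = px - x, py - y  # step 1: coordinates inside the primitive
    if not (0 <= lx < w and 0 <= ly < h):
        raise ValueError("pixel lies outside the primitive area")
    return base_addr + ly * row_bytes + lx * BYTES_PER_PIXEL  # step 2: storage address


def fetch_argb(cache: bytes, addr: int) -> int:
    """Read one little-endian 32-bit ARGB word from the ARGB cache."""
    return struct.unpack_from("<I", cache, addr)[0]


# A 2x2 primitive at (10, 20) with an 8-byte row length.
cache = struct.pack("<4I", 0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFFFF)
addr = argb_address((11, 21), (10, 20, 2, 2), base_addr=0, row_bytes=8)
print(addr, hex(fetch_argb(cache, addr)))  # 12 0xffffffff
```

Keeping the row length in its own register (F2b) lets rows carry padding, so the stride need not equal width times bytes per pixel.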
  • Figure 12 illustrates the structural block diagram of a primitive processor for linear gradient bitmap primitives.
  • the linear gradient primitive processor includes:
    - a primitive parameter register P3 that receives and saves the configuration parameters sent by the color processor;
    - an addressing/calculation unit S3 connected to the primitive parameter register P3; and
    - a pixel color cache C3 connected to the primitive parameter register P3 and the addressing/calculation unit S3.
  • the addressing/calculation unit S3 obtains data from the primitive parameter register P3 and calculates the ARGB value of the pixel to be drawn according to the preset calculation formula.
  • the primitive parameter register P3 may include a command register R0c, a primitive format register R1c, a primitive area coordinate register R2c, a pixel coordinate register R3c, and a pixel color register R4c used to store the ARGB value of the pixel to be drawn.
  • the start color register F1c stores the linear gradient start color ARGB value of the current linear gradient bitmap primitive.
  • the end color register F2c stores the linear gradient end color ARGB value of the current linear gradient bitmap primitive.
  • the start coordinate register F3c stores the linear gradient start point coordinates of the current linear gradient bitmap primitive, and the end coordinate register F4c stores its linear gradient end point coordinates.
  • the following registers have the same functions as their counterparts in the solid color primitive processor of the example in Figure 10: the command register R0c (R0a), the primitive format register R1c (R1a), the primitive area coordinate register R2c (R2a), the pixel coordinate register R3c (R3a), and the pixel color register R4c (R4a).
  • the addressing/calculation unit S3 reads the data of the primitive parameter register P3 and calculates the ARGB value of the pixel to be drawn according to the preset calculation formula. It sends this value to the pixel color register R4c, and sends it together with the coordinates of the pixel to be drawn to the pixel color cache C3. It then outputs the ColorReady signal, which indicates that coloring is complete, to the color processor.
  • the preset calculation formula is the linear gradient calculation formula contained in the primitive data of the linear gradient bitmap primitive.
  • the pixel color cache C3 stores the coordinates XY and ARGB values of one or more pixels.
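The text states only that the formula is carried in the primitive data of the linear gradient bitmap primitive. The Python sketch below therefore swaps in a common linear gradient formulation as an assumption: the pixel is projected onto the start-to-end axis (F3c to F4c), the parameter t is clamped to [0, 1], and the start and end colors (F1c, F2c) are interpolated per ARGB channel. The function names are hypothetical.

```python
from typing import Tuple

ARGB = Tuple[int, int, int, int]


def lerp_argb(c0: ARGB, c1: ARGB, t: float) -> ARGB:
    """Per-channel linear interpolation between two ARGB colors."""
    a, r, g, b = (round(u + (v - u) * t) for u, v in zip(c0, c1))
    return (a, r, g, b)


def linear_gradient(p, start_xy, end_xy, start_color: ARGB, end_color: ARGB) -> ARGB:
    """Color of pixel p for a gradient from start_xy (F3c) to end_xy (F4c),
    using start_color (F1c) and end_color (F2c)."""
    (px, py), (x0, y0), (x1, y1) = p, start_xy, end_xy
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate gradient: use the start color
        return start_color
    t = ((px - x0) * dx + (py - y0) * dy) / length_sq  # projection onto the gradient axis
    t = min(max(t, 0.0), 1.0)                          # clamp beyond the end points
    return lerp_argb(start_color, end_color, t)


print(linear_gradient((5, 3), (0, 0), (10, 0), (255, 0, 0, 0), (255, 200, 100, 0)))
# (255, 100, 50, 0)
```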
  • Figure 13 illustrates the structural block diagram of a primitive processor for radial gradient bitmap primitives.
  • the radial gradient primitive processor may include:
    - a primitive parameter register P4 that receives and saves the configuration parameters sent by the color processor;
    - an addressing/calculation unit S4 connected to the primitive parameter register P4; and
    - a pixel color buffer C4 connected to the primitive parameter register P4 and the addressing/calculation unit S4.
  • the primitive parameter register may include a command register R0d, a primitive format register R1d, a primitive area coordinate register R2d, a pixel coordinate register R3d, a pixel color register R4d, and a start color register F1d used to store the radial gradient start color ARGB value.
  • the end color register F2d is used to store the ARGB value of the radial gradient end color
  • the circle center coordinate register F3d is used to store the coordinates of the center point of the circle in the gradient area
  • the inner radius register F4d is used to store the inner radius of the gradient area circle
  • the outer radius register F5d is used to store the outer radius of the gradient area circle.
  • the following registers have the same functions as their counterparts in the solid color primitive processor of the example in Figure 10: the command register R0d (R0a), the primitive format register R1d (R1a), the primitive area coordinate register R2d (R2a), the pixel coordinate register R3d (R3a), and the pixel color register R4d (R4a).
  • the addressing/calculation unit S4 reads the data of the primitive parameter register P4 and calculates the ARGB value of the pixel to be drawn according to the preset calculation formula. It sends this value to the pixel color register R4d and, as needed, sends it together with the coordinates of the pixel to be drawn to the pixel color buffer C4. It then outputs the ColorReady signal, which indicates that coloring is complete, to the color processor.
  • the preset calculation formula is the radial gradient calculation formula contained in the primitive data of the radial gradient bitmap primitive.
  • the pixel color buffer C4 stores the coordinates XY and ARGB values of one or more pixels.
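As with the linear case, the actual radial formula comes from the primitive data. This Python sketch assumes a common formulation: t is the distance from the circle center (F3d), measured across the ring between the inner radius (F4d) and outer radius (F5d) and clamped to [0, 1]. t then blends the start color (F1d) into the end color (F2d). `radial_gradient` is a hypothetical name.

```python
import math
from typing import Tuple

ARGB = Tuple[int, int, int, int]


def radial_gradient(p, center, r_inner: float, r_outer: float,
                    start_color: ARGB, end_color: ARGB) -> ARGB:
    """Color of pixel p for a radial gradient around center (F3d) between the
    inner radius (F4d) and outer radius (F5d), from start_color (F1d) to end_color (F2d)."""
    d = math.hypot(p[0] - center[0], p[1] - center[1])  # distance to the circle center
    if r_outer <= r_inner:  # degenerate ring: hard edge at the inner radius
        return start_color if d <= r_inner else end_color
    t = min(max((d - r_inner) / (r_outer - r_inner), 0.0), 1.0)
    a, r, g, b = (round(u + (v - u) * t) for u, v in zip(start_color, end_color))
    return (a, r, g, b)


print(radial_gradient((15, 0), (0, 0), 10, 20, (255, 0, 0, 0), (255, 200, 0, 0)))
# (255, 100, 0, 0)
```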
  • Figure 14 illustrates the structural block diagram of one of the general primitive processors.
  • the general primitive processor can process bitmap primitives in a variety of different primitive formats, and executes corresponding processing procedures according to the primitive format BitmapFormat sent by the color processor.
  • the general primitive processor may include:
    - an ARGB cache A5 that obtains the primitive data of the bitmap primitive to be drawn from the first cache CACHE1 through the GPU bus and saves it;
    - a primitive parameter register P5 that receives and saves the configuration parameters sent by the color processor;
    - an addressing/calculation unit S5 connected to the primitive parameter register P5 and the ARGB cache A5; and
    - a pixel color cache C5 connected to the primitive parameter register P5 and the addressing/calculation unit S5.
  • the primitive parameter register P5 includes the command register R0e, the primitive format register R1e, the primitive area coordinate register R2e, the pixel coordinate register R3e and the pixel color register R4e. It also includes format-specific registers F1e, F2e, ..., Fne, which store the calculation parameters required to calculate the ARGB value of the pixel to be drawn in multiple primitive formats.
  • the format-specific registers may include:
    - a register used to store the fill color ARGB value;
    - a data storage address register used to store the primitive data storage address of the first pixel of the ARGB bitmap primitive;
    - a row byte length register used to store the row byte length of the ARGB bitmap primitive;
    - a start color register and an end color register used to store the linear gradient start and end color ARGB values;
    - a start coordinate register and an end coordinate register used to store the linear gradient start and end point coordinates;
    - a start color register and an end color register used to store the radial gradient start and end color ARGB values; and
    - other registers used for the radial gradient.
  • the primitive parameter register P5 receives the configuration parameters (including primitive format BitmapFormat, command parameters, primitive data, etc.) sent by the color processor.
  • according to the primitive format, the addressing/calculation unit S5 reads the corresponding calculation parameters from the format-specific registers F1e, F2e, ..., Fne and, as the calculation requires, the data of the ARGB cache A5. It then calculates the ARGB value of the pixel to be drawn according to the preset calculation formula and sends it to the pixel color register R4e. As needed, it also sends the ARGB value together with the coordinates of the pixel to be drawn to the pixel color cache C5. Finally, it outputs the ColorReady signal, which indicates that coloring is complete, to the color processor.
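The general primitive processor's behavior, selecting the calculation by BitmapFormat, can be sketched in software as a dispatch table. This minimal Python model covers only two formats: solid color, where the fill color is copied without calculation, and ARGB8888 addressing. It assumes 4-byte little-endian pixels. The `BitmapFormat` enum values, the `params` keys and the returned `{"R4e": ..., "ColorReady": ...}` record are hypothetical stand-ins for the registers and signals described above.

```python
import struct
from enum import Enum


class BitmapFormat(Enum):  # hypothetical encoding of the primitive format
    SOLID = 0
    ARGB8888 = 1


def shade_solid(params, px, py, cache):
    return params["fill_color"]  # fill color copied without calculation


def shade_argb(params, px, py, cache):
    ax, ay = params["area_xy"]
    addr = params["base_addr"] + (py - ay) * params["row_bytes"] + (px - ax) * 4
    return struct.unpack_from("<I", cache, addr)[0]


# One handler per primitive format; gradient formats would be registered the same way.
HANDLERS = {BitmapFormat.SOLID: shade_solid, BitmapFormat.ARGB8888: shade_argb}


def general_processor(fmt: BitmapFormat, params: dict, px: int, py: int,
                      cache: bytes = b"") -> dict:
    """Select the calculation by primitive format; report the result like R4e + ColorReady."""
    handler = HANDLERS.get(fmt)
    if handler is None:
        raise ValueError(f"unsupported primitive format: {fmt}")
    return {"R4e": handler(params, px, py, cache), "ColorReady": True}


cache = struct.pack("<2I", 0xFF000000, 0xFFABCDEF)
result = general_processor(BitmapFormat.ARGB8888,
                           {"area_xy": (5, 5), "base_addr": 0, "row_bytes": 8}, 6, 5, cache)
print(hex(result["R4e"]))  # 0xffabcdef
```

A table keyed by format mirrors the choice between one general unit and several dedicated units. Adding a format means registering one more handler, but a general unit can only work on one pixel at a time.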
  • the GPU may include both a general primitive processor and dedicated primitive processors of different types, such as a solid color primitive processor, an ARGB primitive processor, a glyph primitive processor, a linear gradient primitive processor and a radial gradient primitive processor.
  • the first cache CACHE1 and the second cache CACHE2 can be configured as SRAM memories for parallel processing, so that the sub-processors can obtain primitive data and primitive formatted data at the same time.
  • multiple primitive processors can also be configured for parallel processing to increase processing speed.
  • Figure 15 illustrates the structural block diagram of one of the color processors.
  • the color processor includes a primitive pixel color processing circuit and a dynamically reconfigurable matrix circuit.
  • the primitive pixel color processing circuit works in three steps. First, it reads the primitive data of the bitmap primitive in its primitive format from the bitmap primitive data cache area 50 of the first cache CACHE1. Second, it reads the primitive formatted data of the bitmap primitive, including the object index data, from the primitive cache area 60 of the second cache CACHE2. Third, it sends the configuration parameters required to start work to the primitive parameter registers of one or more currently idle primitive processors and sends the command to start them, so that the primitive processors generate pixel ARGB values.
  • the dynamically reconfigurable matrix circuit matches one or more color channels from bottom to top, according to the overlay display relationship of the one or more bitmap primitives in which the pixel to be drawn is located. It connects these channels to the one or more primitive processors and obtains from each primitive processor the signal indicating that coloring is complete. It then reads the pixel color register of that primitive processor to obtain the pixel ARGB value and outputs it.
  • the primitive pixel color processing circuit can work from the primitive parameter information of the pixel to be drawn. This information is recorded in the ZSortList sorted lists, already sorted by depth value Z, that are stored in the depth sorting register array. It includes but is not limited to the bitmap index bitmap_index and the primitive format bitmapFormat. Based on it, the circuit obtains the corresponding bitmap primitive data from the bitmap primitive data cache area 50 of the first cache CACHE1 and the primitive formatted data from the primitive cache area 60 of the second cache CACHE2. It then starts the adapted primitive processors and at the same time controls the dynamically reconfigurable matrix circuit.
  • Figure 16 illustrates a schematic diagram of the ColorReady connection node of a dynamically reconfigurable matrix circuit based on CrossBar in one embodiment.
  • the GPU contains 13 primitive processors (Unit1 to Unit13) for processing 13 different primitive formats, and 9 color channels (ch1 to ch9) for processing 9 overlaid bitmap primitives.
  • the dynamically reconfigurable matrix circuit establishes ColorReady connections between color channels and primitive processors, one layer per channel:
    - color channel ch1 with the primitive processor Unit1, which processes the first-layer bitmap primitive bitmap_index1;
    - color channel ch2 with the primitive processor Unit3, which processes the second-layer bitmap primitive bitmap_index2;
    - color channel ch3 with the primitive processor Unit12, which processes the third-layer bitmap primitive bitmap_index3;
    - color channel ch4 with the primitive processor Unit9, which processes the fourth-layer bitmap primitive bitmap_index4;
    - color channel ch5 with the primitive processor Unit6, which processes the fifth-layer bitmap primitive bitmap_index5; and
    - color channel ch6 with the primitive processor Unit5, which processes the sixth-layer bitmap primitive bitmap_index6.
  • correspondingly, the color processor configures each primitive processor according to the primitive format of the bitmap primitive it serves:
    - Unit1 processes the first-layer bitmap primitive bitmap_index1;
    - Unit3 processes bitmap_index2, according to its primitive format bitmapformat2;
    - Unit12 processes bitmap_index3, according to its primitive format bitmapformat3;
    - Unit9 processes bitmap_index4, according to its primitive format bitmapformat4;
    - Unit6 processes bitmap_index5, according to its primitive format bitmapformat5; and
    - Unit5 processes the sixth-layer bitmap primitive bitmap_index6, according to its primitive format.
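The channel-to-processor matching in this example can be modeled as a small assignment routine. This is a minimal Python sketch, not the crossbar hardware. It assumes each primitive processor handles exactly one format, with hypothetical format names fmt1 to fmt13. Channels are filled from ch1 (bottom layer) upward, and each channel takes the first idle processor whose format matches. `build_crossbar` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def build_crossbar(layers: List[Tuple[str, str]], units: Dict[str, str],
                   num_channels: int) -> Dict[str, Tuple[str, str]]:
    """Connect color channels bottom-to-top to idle primitive processors.

    layers -- (bitmap_index, primitive format), ordered bottom (ch1) to top
    units  -- primitive processor name -> primitive format it handles
    """
    if len(layers) > num_channels:
        raise ValueError("more overlapping bitmap primitives than color channels")
    busy = set()
    crossbar = {}
    for ch, (bitmap_index, fmt) in enumerate(layers, start=1):
        unit = next((u for u, f in units.items() if f == fmt and u not in busy), None)
        if unit is None:
            raise RuntimeError(f"no idle primitive processor for format {fmt}")
        busy.add(unit)
        crossbar[f"ch{ch}"] = (unit, bitmap_index)
    return crossbar


# 13 processors, each handling one (hypothetically named) format fmt1..fmt13.
units = {f"Unit{i}": f"fmt{i}" for i in range(1, 14)}
layers = [("bitmap_index1", "fmt1"), ("bitmap_index2", "fmt3"), ("bitmap_index3", "fmt12"),
          ("bitmap_index4", "fmt9"), ("bitmap_index5", "fmt6"), ("bitmap_index6", "fmt5")]
for ch, conn in build_crossbar(layers, units, num_channels=9).items():
    print(ch, conn)
```

With this input the routine reproduces the connections listed above (ch1 to Unit1, ch2 to Unit3, ch3 to Unit12, ch4 to Unit9, ch5 to Unit6, ch6 to Unit5). Channels ch7 to ch9 stay unused.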
  • the pixel shader reads the pixel ARGB values ChannelColor[1] to ChannelColor[6] stored in the pixel color registers and the pixel synthesis commands BitmapCmd[1] to BitmapCmd[6] stored in the primitive command register array. From these, it calculates the pixel synthesis ARGB value of the final overlay display and sends it to the row pixel color buffer for storage.
  • the pixel shader may include multiple pixel calculation units connected in series, and an initial color register connected to the first pixel calculation unit.
  • the initial color register stores the initial color ARGB value;
  • the first pixel calculation unit reads the initial color ARGB value and the pixel ARGB value of the pixel to be drawn in the first-layer bitmap primitive, as provided by the color processor. It combines the two according to the pixel synthesis command defined in the first-layer bitmap primitive to calculate the first pixel synthesis ARGB value.
  • the nth pixel calculation unit reads the pixel synthesis ARGB value calculated by the previous pixel calculation unit and the pixel ARGB value of the pixel to be drawn in the nth-layer bitmap primitive, as provided by the color processor. It combines the two according to the pixel synthesis command defined in the nth-layer bitmap primitive to calculate the nth pixel synthesis ARGB value, where n is a natural number greater than 1.
  • the maximum value of n is the number of bitmap primitives involved in the pixel to be drawn. If the pixel to be drawn involves 6 bitmap primitives, the maximum value of n is 6.
  • the pixel ARGB value of the first-layer bitmap primitive of the pixel to be drawn is obtained through the color channel 1 of the color processor.
  • the pixel ARGB value of the nth layer bitmap primitive of the pixel to be drawn is obtained through the color channel n of the color processor.
  • Figure 17 illustrates a structural block diagram of a pixel shader in one embodiment.
  • the initial color register stores the initial color PixelColor0, whose ARGB value is (0,0,0,0).
  • the first pixel calculation unit F(bitmapCmd1) takes PixelColor0 and the pixel ARGB value ChannelColor[1] of color channel 1. It calculates the pixel synthesis ARGB value PixelColor1 according to the calculation formula of the bitmap primitive's pixel synthesis command BitmapCmd[1].
  • pixel calculation unit 2 takes PixelColor1 and ChannelColor[2] of color channel 2 and calculates PixelColor2 according to the calculation formula of the bitmap primitive's pixel synthesis command BitmapCmd[2]. This continues until the pixel synthesis of all color channels of the pixel to be drawn is complete, and the final pixel synthesis ARGB value is then output to the row pixel buffer.
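The serial chain of pixel calculation units can be sketched as a fold over the color channels, bottom to top. This Python example assumes premultiplied ARGB in [0, 1]. It implements three Porter-Duff rules (SRC, SRC_OVER, DST_OVER) as examples of what BitmapCmd might select. The actual command encoding is not given in the text, so the command strings and function names are hypothetical.

```python
from typing import List, Tuple

ARGB = Tuple[float, float, float, float]  # premultiplied (a, r, g, b), each in [0, 1]


def porter_duff(cmd: str, src: ARGB, dst: ARGB) -> ARGB:
    """Combine one layer (src) with the result so far (dst) using a Porter-Duff rule."""
    if cmd == "SRC":
        return src
    if cmd == "SRC_OVER":
        k = 1.0 - src[0]
        a, r, g, b = (s + d * k for s, d in zip(src, dst))
    elif cmd == "DST_OVER":
        k = 1.0 - dst[0]
        a, r, g, b = (d + s * k for s, d in zip(src, dst))
    else:
        raise ValueError(f"unsupported pixel synthesis command: {cmd}")
    return (a, r, g, b)


def pixel_shader(channel_colors: List[ARGB], bitmap_cmds: List[str]) -> ARGB:
    """Chain of pixel calculation units: PixelColor0 -> PixelColor1 -> ... -> PixelColorN."""
    pixel: ARGB = (0.0, 0.0, 0.0, 0.0)  # initial color PixelColor0
    for color, cmd in zip(channel_colors, bitmap_cmds):  # ch1 (bottom) to chN (top)
        pixel = porter_duff(cmd, color, pixel)
    return pixel


# Opaque red at the bottom, 50%-opaque blue (premultiplied) on top.
print(pixel_shader([(1.0, 1.0, 0.0, 0.0), (0.5, 0.0, 0.0, 0.5)], ["SRC_OVER", "SRC_OVER"]))
# (1.0, 0.5, 0.0, 0.5)
```

Because each stage depends only on the previous stage's output and one channel color, the chain maps naturally onto the series-connected calculation units described above.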
  • the timing generator reads the ARGB value of each pixel point by point from the row pixel buffer and outputs it to the display. When the next line or field synchronization signal arrives, it starts processing the next scan line or the next frame and sends the coordinates of the next pixel to be drawn to the sub-processor.
  • the second cache CACHE2 is set with an index register 61 (mini_window) for storing the window index window_index1611 of the first window layer primitive.
  • the primitive filter responds to the current line and field synchronization signals and obtains the coordinates of the pixel to be drawn. It reads the index register 61 (mini_window) to obtain the window index window_index1 of the first window layer primitive. Starting from that window layer primitive, it then traverses the primitive formatted data of the window layer primitives, control layer primitives and bitmap layer primitives one by one. For each bitmap primitive whose area contains the coordinates of the pixel to be drawn, it sends the layer data to the layer data register array BitmapList for storage.
  • Layer data includes at least object index data, primitive format and layer serial number.
  • Figure 18 illustrates a structural block diagram of a primitive filter in one embodiment.
  • the primitive filter may include:
    - a filter that accesses the second cache CACHE2 through the GPU bus to obtain the window index window_index1 and the primitive formatted data of the first window layer primitive, and filters out the layer data of the bitmap primitives in which the pixel to be drawn is located;
    - a pixel-to-be-drawn register cur_p, connected to the filter, that stores the coordinates of the pixel to be drawn;
    - a window index register cur_window, connected to the filter, that stores the window index window_index;
    - a control index register cur_widget that stores the control index widget_index; and
    - a bitmap index register cur_bitmap that stores the bitmap index bitmap_index.
  • Figure 21 illustrates the process of filtering bitmap primitives by a primitive filter in one embodiment.
  • the primitive filter is initialized, the index register mini_window61 is read, and the window index window_index1 of the first window layer primitive is saved in the window index register cur_window.
  • the filter first reads the window index register cur_window and checks whether the area window_rect1 (x, y, w, h) of the first window layer primitive contains the pixel coordinate P (x, y) to be drawn. If not, it continues reading the window index register cur_window to traverse the other window layer primitives.
  • if so, the window index register cur_window is used to read the control index widget_index of the control layer primitives belonging to the current window layer primitive, which is saved in the control index register cur_widget.
  • the control index register cur_widget is used to read the control areas widget_rect (x, y, w, h) of all control layer primitives belonging to the current window layer primitive, checking one by one whether each control area contains the pixel coordinate P to be drawn.
  • if one does, the control index register cur_widget is used to read, from the control primitive buffer, the bitmap index bitmap_index of the bitmap layer primitives belonging to the current control layer primitive, which is stored in the bitmap index register cur_bitmap.
  • the bitmap index register cur_bitmap is used to read, from the bitmap primitive buffer, the clipping rectangles ClipRect of all bitmap layer primitives belonging to the current control layer primitive, checking one by one whether the ClipRect of the current bitmap layer primitive contains the pixel coordinate P(x,y) to be drawn. If not, the other bitmap layer primitives are traversed; if so, the required layer data is pushed into the layer data register array BitmapList. This continues until all bitmap layer, control layer and window layer primitives have been filtered.
  • FIG 19 illustrates a structural block diagram of a depth processor of one embodiment.
  • the depth processor may include a plurality of depth value Z calculation units, a depth value Z comparator connected to the depth value Z calculation units, and a depth sorted list generator connected to the depth value Z comparator.
  • the depth value Z calculation unit obtains the filtered layer data stored in the layer data register array BitmapList and calculates the depth value Z of each filtered bitmap primitive with the preset formula $Z = \mathrm{bitmap\_layer} + \mathrm{widget\_layer} \times (N+1) + \mathrm{window\_layer} \times (N+1)^2$, where bitmap_layer is the layer number of the bitmap layer primitive, widget_layer is the layer number of the control layer primitive to which it belongs, window_layer is the layer number of the window layer primitive to which that control layer primitive belongs, and N is the maximum number of overlapping layers.
  • the depth value Z comparator compares the depth values Z of the different bitmap primitives; the depth sorted list generator then sorts the primitive parameter information of the bitmap primitives in ascending order of Z to generate a depth sorted list, which is sent to the depth sort register array ZSortList for storage.
  • the sorted depth sorting list ZSortlist represents the layer overlay display order from bottom to top of the pixels to be drawn.
  • the calculation formula of the depth value Z ensures that all bitmap primitives are overlaid according to the following relationships:
  • among bitmap layer primitives belonging to the same control layer primitive, the one with the larger bitmap layer serial number (bitmap_layer) is displayed on top of the one with the smaller serial number;
  • among control layer primitives belonging to the same window layer primitive, the one with the larger control layer serial number (widget_layer) is displayed on top of the one with the smaller serial number.
  • the command parser obtains, from the primitive cache area 60 of the second cache CACHE2, the pixel synthesis commands of the one or more different bitmap primitives in which the pixel to be drawn is located, and sends them, in bottom-to-top overlay display order, to the primitive command register array for storage.
  • the pixel shader gets this pixel composition command from the primitive command register array.
  • Figure 20 illustrates a flow chart of the command parser obtaining a pixel synthesis command according to one embodiment.
  • the command parser obtains the bitmap index BitmapIndex[1...m] of each bitmap primitive from the depth sort register array ZSortList and then, using these bitmap indexes, reads from the primitive cache area 60 of the second cache CACHE2
  • the pixel synthesis command PixelCommand[1...m] of each bitmap primitive, which is sent to the primitive command register array BitmapCommandList for storage.
  • the primitive command register array BitmapCommandList thus stores the pixel synthesis commands BitmapCmd[1...m] of the bitmap primitives in the order of ZSortList.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Liquid Crystal Display Device Control (AREA)
  • Control Of Indicators Other Than Cathode Ray Tubes (AREA)
  • Image Generation (AREA)

Abstract

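A graphics processing system, method and GPU based on bitmap primitives. The data information of a bitmap is divided into primitive data and primitive formatted data, which are stored in different SRAMs that can be accessed in parallel. The GPU accesses the different SRAMs in parallel to obtain the primitive data and the primitive formatted data, and applies a processing method adapted to the primitive format of each bitmap primitive. This achieves a better display effect with the smallest SRAM area, reduces hardware cost and system power consumption, and eliminates the need for a DRAM display buffer.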

Description

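Graphics Processing System, Method and GPU Based on Bitmap Primitives
Technical Field
The present invention relates to the technical field of computer graphics processing, and in particular to a graphics processing system, method and GPU based on bitmap primitives.
Background Art
A graphics processing unit (GPU) is a microprocessor that performs drawing operations on a variety of smart computing devices (for example, computer workstations, mobile phones, embedded systems, personal computers, tablet computers and video game consoles). The main purpose of a GPU is to convert and drive the display information required by the device and to provide line-scan signals to the display so that it displays correctly. A GPU usually includes arithmetic units and storage units; the more of each it has, the faster it processes, but also the more expensive it is.
Technical Problem
The technical problem to be solved by the present invention is to provide a graphics processing system, method and GPU based on bitmap primitives that achieve a better display effect at lower cost and with significantly reduced power consumption.
Technical Solution
According to one embodiment, a GPU based on bitmap primitives includes one or more sub-processors; a first cache and a second cache that are connected to the sub-processors through a GPU bus and provide them with parallel access; and a timing generator connected to the sub-processors, the first cache and the second cache. Each sub-processor includes at least one primitive filter, at least one depth processor, at least one command parser, a plurality of primitive processors, at least one color processor that controls the operation of the primitive processors, and at least one pixel shader. According to the primitive data and primitive formatted data of the one or more bitmap primitives in which the pixel to be drawn is located, the color processor starts one or more primitive processors capable of handling the primitive formats of those bitmap primitives, obtains the pixel ARGB values they generate, and provides these values to the pixel shader. The pixel shader computes the pixel composite ARGB value of the pixel to be drawn from the pixel synthesis commands of the one or more bitmap primitives in which the pixel is located and the pixel ARGB values, and provides the pixel composite ARGB value to the timing generator. The first cache is configured with a bitmap primitive data cache area for caching the primitive data of the bitmap primitives; the second cache is configured with a primitive cache area for caching their primitive formatted data. The primitive data of a bitmap primitive includes at least data recording its primitive format, data recording its ARGB values, and data recording the storage address and size of its primitive data. The primitive formatted data includes at least object index data identifying the bitmap primitive, data recording the position and size of the bitmap primitive area, data recording the layer overlay display relationship between the bitmap primitive and other bitmap primitives, and data recording the pixel synthesis commands between the bitmap primitive and those other bitmap primitives.
According to another embodiment, a graphics processing system based on bitmap primitives includes a CPU; a first, a second and a third static random access memory that provide parallel access; and a GPU connected to the CPU and to the three static random access memories. The second static random access memory is configured with a bitmap primitive data buffer for storing the primitive data of the bitmap primitives; the third static random access memory is configured with a buffer for storing their primitive formatted data.
According to another embodiment, a graphics processing method based on bitmap primitives includes: dividing the data information of a bitmap primitive into at least primitive data and primitive formatted data, where the primitive data includes at least data recording the primitive format of the bitmap primitive, data recording its ARGB values and data recording the storage address and size of its primitive data, and the primitive formatted data includes at least object index data identifying the bitmap primitive, data recording the position and size of the bitmap primitive area, data recording the layer overlay display relationship between the bitmap primitive and other bitmap primitives, and data recording the pixel synthesis commands between the bitmap primitive and those other bitmap primitives; storing the primitive data and the primitive formatted data in different storage areas of one memory or in different memories; and providing a GPU with parallel access to the primitive data and the primitive formatted data.
Beneficial Effects
The present invention divides the data information of bitmap primitives into primitive data and primitive formatted data and stores them in different SRAMs that can be accessed in parallel, so that the GPU can obtain both quickly. By applying a color processing method adapted to the primitive format of each bitmap primitive, it achieves a better display effect with the smallest SRAM area, reduces hardware cost and system power consumption, and eliminates the need for a DRAM display buffer.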
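Description of Drawings
FIG. 1 is an example of the bitmap primitive layering of an automotive instrument-cluster UI according to some embodiments of the present invention.
FIG. 2 is a structural block diagram of a graphics processing system based on a single-core GPU according to one embodiment.
FIG. 3 is a structural block diagram of a graphics processing system based on a multi-core GPU according to one embodiment.
FIG. 4 is a simplified flowchart of a graphics processing method based on bitmap primitives according to one embodiment.
FIG. 5 is an example of data in SRAM2 according to some embodiments of the present invention.
FIG. 6 is another example of data in SRAM2 according to some embodiments of the present invention.
FIG. 7 is an example of data in SRAM3 according to some embodiments of the present invention.
FIG. 8 is an example of data in CACHE1 according to some embodiments of the present invention.
FIG. 9 is an example of data in CACHE2 according to some embodiments of the present invention.
FIG. 10 is a structural block diagram of a solid-color primitive processor according to one embodiment.
FIG. 11 is a structural block diagram of an ARGB primitive processor according to one embodiment.
FIG. 12 is a structural block diagram of a linear gradient primitive processor according to one embodiment.
FIG. 13 is a structural block diagram of a radial gradient primitive processor according to one embodiment.
FIG. 14 is a structural block diagram of a general-purpose primitive processor according to one embodiment.
FIG. 15 is a structural block diagram of a color processor according to one embodiment.
FIG. 16 is a schematic diagram of the ColorReady connection nodes of a dynamically reconfigurable CrossBar-based matrix circuit according to one embodiment.
FIG. 17 is a structural block diagram of a pixel shader according to one embodiment.
FIG. 18 is a structural block diagram of a primitive filter according to one embodiment.
FIG. 19 is a structural block diagram of a depth processor according to one embodiment.
FIG. 20 is a flowchart of the command parser obtaining pixel synthesis commands according to one embodiment.
FIG. 21 is a flowchart of the primitive filter filtering bitmap primitives according to one embodiment.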
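Best Mode for Carrying Out the Invention
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.
Computer graphics mainly comprises two types: bitmap graphics and vector graphics. Bitmap graphics consist of pixels; they distort when enlarged and produce larger files, but they provide more realistic and richer color. Vector graphics consist of lines and curves; they do not distort when enlarged and produce smaller files, and are typically used to draw icons.
A graphics processing system usually includes a CPU, a GPU and memory. The memory stores the application (APP), the graphics API, the GPU driver and the graphics data. The CPU runs the graphics API and the application and calls the GPU driver to start the GPU, which reads the graphics data from memory, forms the UI and outputs it to the display. For a better display effect, designing an application UI entirely with bitmaps as primitives tends to produce large program files and to place higher demands on the hardware configuration.
Note that a "primitive" in this application means a graphic element, such as text in various fonts, symbols and icons of various shapes, or background pictures in rich or single colors. A "bitmap" in this application means a bitmap-based graphics encoding format. A "layer" in this application has three meanings: the hierarchical attribute of a primitive, the overlay display level of a primitive, and the bitmap primitive as it is displayed in an overlay.
The graphics processing system based on bitmap primitives of the present invention includes a CPU; static random access memories SRAM1, SRAM2 and SRAM3, packaged in an MCU and connected to the CPU through a system communication bus; and a graphics processing unit GPU connected to the CPU and to SRAM1, SRAM2 and SRAM3 through the system communication bus. SRAM1 is configured with three buffers that store the application APP, the graphics API and the GPU driver respectively. The GPU may be a single-core GPU with only one sub-processor or a multi-core GPU with several sub-processors operating in parallel. FIG. 2 shows the block diagram of a graphics processing system based on a single-core GPU according to one embodiment, and FIG. 3 that of a system based on a multi-core GPU. Besides the sub-processors, the GPU includes a first cache CACHE1 and a second cache CACHE2, connected to the one or more sub-processors through the GPU's internal communication bus (GPU bus). CACHE1 and CACHE2 may also be high-speed static random access memories (SRAM). Compared with dynamic random access memory (DRAM), SRAM reads faster and consumes less power, but is more expensive. Each pixel of a bitmap primitive in Bitmap format is assigned specific coordinates (x, y) and a transparency-and-color ARGB value: the color of each pixel is represented by RGB and its transparency by A. By information depth, bitmaps can be classified as 1-, 4-, 8-, 16-, 24- and 32-bit, among others. The more bits each pixel uses, the more colors are available and the more realistic the rendering, but the larger the amount of data and the more storage space required.
Embedded systems in the prior art usually process Bitmap primitives with a frame buffer. Each storage unit of the frame buffer corresponds to one pixel on the screen, and the entire frame buffer corresponds to one frame, a direct image of what the screen displays. A good display effect therefore requires a large amount of storage. Because SRAM is relatively expensive, the prior art usually uses cheaper DRAM, but DRAM consumes more power and also increases device complexity.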
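FIG. 4 shows a simplified flowchart of a graphics processing method based on bitmap primitives according to some embodiments of the present invention. The method includes blocks 70-72. In block 70, the data information of a bitmap primitive is divided into at least primitive data and primitive formatted data. The primitive data includes at least data recording the primitive format of the bitmap primitive, data recording its ARGB values, and data recording the storage address and size of its primitive data. The primitive formatted data includes at least object index data identifying the bitmap primitive, data recording the position and size of the bitmap primitive area, data recording the layer overlay display relationship between the bitmap primitive and other bitmap primitives, and data recording the pixel synthesis commands between the bitmap primitive and other bitmap primitives.
In block 71, the primitive data and the primitive formatted data are stored in different storage areas of a memory or in different memories. In some embodiments, SRAM2 and SRAM3 are configured to store the primitive data and the primitive formatted data respectively, and the first cache CACHE1 and the second cache CACHE2 are configured to read the data provided by SRAM2 and SRAM3 respectively.
In block 72, the GPU is provided with parallel access to the primitive data and the primitive formatted data. The GPU sub-processors access SRAM2 and SRAM3 in parallel and apply a processing method adapted to the primitive format of each bitmap primitive, achieving faster processing and a better display effect with the smallest SRAM area and reducing hardware cost.
In some embodiments, blocks 70-72 discussed above may be performed in a different order as actually needed.
In addition, in some embodiments, the graphics processing method based on bitmap primitives further includes dividing primitives into at least three layer levels: window layer primitives as the base level, control layer primitives belonging to window layer primitives, and bitmap layer primitives belonging to control layer primitives. A two-dimensional UI contains one or more window layer primitives, each window layer primitive contains one or more control layer primitives, and each control layer primitive contains one or more bitmap layer primitives. Bitmap layer primitives are the smallest nodes and window layer primitives the largest. Window layer primitives (window) correspond to ordinary graphic windows, pop-up windows, dialog windows, floating windows and so on; control layer primitives (widget) correspond to buttons, scroll bars, status lists, edit boxes, picture boxes and so on; bitmap layer primitives (bitmap) correspond to static or dynamic pictures, text, numbers, icons and so on.
In some embodiments, the method further includes classifying bitmap primitives into at least three primitive formats: solid-color bitmap primitives with the same color and the same transparency; ARGB bitmap primitives with different colors and the same or different transparency; and glyph bitmap primitives with the same or different colors and the same or different transparency.
As illustrated in the examples of FIG. 5 and FIG. 6, SRAM2 is configured with a bitmap primitive data buffer 30 for storing the primitive data of bitmap primitives.
In the example of FIG. 5, the primitive data 301 of a solid-color bitmap primitive includes data recording its primitive format (BitmapFormat), data recording its fill-color ARGB value, and data recording the storage address and size of its primitive data. The primitive data 302 of an ARGB bitmap primitive includes data recording its primitive format (BitmapFormat), data 310 recording the ARGB values of its pixels, and data recording the storage address and size of its primitive data. By size and format, ARGB bitmap primitives can be divided into 32-bit ARGB, ARGB-C, 24-bit same-transparency ARGB, 24-bit varying-transparency ARGB, 16-bit ARGB, mono, palette, linear gradient, radial gradient, JPEG and PNG bitmap primitives, among others.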
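In a 32-bit ARGB bitmap primitive, the transparency A and the color components R, G and B of each pixel are each recorded with 8 bits. In a 24-bit same-transparency ARGB bitmap primitive, R, G and B are each recorded with 8 bits and A defaults to 0xFF. In a 24-bit varying-transparency ARGB bitmap primitive, A is recorded with 8 bits, R and B with 5 bits each, and G with 6 bits. In a 16-bit ARGB bitmap primitive, R and B are recorded with 5 bits each, G with 6 bits, and A defaults to 0xFF. A mono bitmap primitive records the color of each pixel with 1 bit, with transparency defaulting to 0xFF. A palette bitmap primitive records the ARGB value of each pixel with an 8-bit index into a 256-entry table, so 256 colors can be defined. All pixels of a JPEG bitmap primitive have the same transparency, and their color values come from the JPEG decoder; the color and saturation values of each pixel of a PNG bitmap primitive come from the PNG decoder. An ARGB-C bitmap primitive is one with only a few valid pixels (non-zero transparency value), most of its area being transparent (transparency value 0). The primitive data 304 of an ARGB-C bitmap primitive includes data recording its primitive format (BitmapFormat), data recording its valid pixels, and data recording the storage address and size of its primitive data.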
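Before compositing, a primitive processor has to widen the 5/6/5-bit formats described above to 8 bits per channel. The Python sketch below shows one common way to do this, replicating each channel's high bits into the vacated low bits so that full scale maps to 0xFF. The bit layout (A in bits 23..16, R in 15..11, G in 10..5, B in 4..0) and all function names are assumptions for illustration; the text specifies only the bit widths.

```python
def expand_channel(value: int, bits: int) -> int:
    """Widen an n-bit channel (4 <= n <= 8) to 8 bits by bit replication."""
    value &= (1 << bits) - 1
    # Move the channel to the top of the byte, then refill the low bits with
    # its own most significant bits so that the maximum maps to 0xFF.
    return (value << (8 - bits)) | (value >> (2 * bits - 8))


def rgb565_to_argb8888(pixel: int, alpha: int = 0xFF) -> int:
    """16-bit ARGB primitive: RGB565 with alpha defaulting to 0xFF."""
    r = expand_channel(pixel >> 11, 5)
    g = expand_channel(pixel >> 5, 6)
    b = expand_channel(pixel, 5)
    return (alpha << 24) | (r << 16) | (g << 8) | b


def argb8565_to_argb8888(pixel: int) -> int:
    """24-bit varying-transparency primitive: 8-bit alpha above RGB565 (assumed layout)."""
    return rgb565_to_argb8888(pixel & 0xFFFF, alpha=(pixel >> 16) & 0xFF)


def palette_to_argb8888(index: int, palette: list[int]) -> int:
    """Palette primitive: an 8-bit index into a 256-entry ARGB table."""
    return palette[index & 0xFF]


if __name__ == "__main__":
    print(hex(rgb565_to_argb8888(0xF800)))      # 0xffff0000 (opaque red)
    print(hex(argb8565_to_argb8888(0x80001F)))  # 0x800000ff (half-transparent blue)
```

Bit replication is chosen here over a plain left shift because a shift alone would map the 5-bit maximum 0x1F to 0xF8 rather than 0xFF.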
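The data recording the valid pixels of an ARGB-C bitmap primitive is organized by row and consists of one or more row data blocks 320. Each row data block contains the position coordinates of the first valid pixel of that row, the number of valid pixels in the row, and the ARGB values of the row's valid pixels arranged by column.
The first row data block 321 records the coordinates of the first valid pixel in the first row of the ARGB-C bitmap primitive, the number of valid pixels in the first row, and the ARGB values of the first row's valid pixels [1…m] arranged by column.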
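The row-data-block layout can be modelled with a small Python sketch that encodes a dense ARGB grid into blocks and looks a pixel up again. It assumes each block covers one contiguous run of valid pixels, so a row with gaps would produce several blocks; the text does not say how gaps are handled. `RowBlock`, `encode_argb_c` and `lookup_argb_c` are hypothetical names.

```python
from dataclasses import dataclass

TRANSPARENT = 0x00000000


@dataclass
class RowBlock:
    x: int           # column of the first valid pixel in this run
    y: int           # row index
    argb: list[int]  # ARGB values of the valid pixels, left to right

    @property
    def count(self) -> int:
        # Number of valid pixels recorded in this block.
        return len(self.argb)


def encode_argb_c(rows: list[list[int]]) -> list[RowBlock]:
    """Keep only runs of pixels whose alpha byte is non-zero."""
    blocks: list[RowBlock] = []
    for y, row in enumerate(rows):
        x = 0
        while x < len(row):
            if (row[x] >> 24) == 0:  # fully transparent: not stored
                x += 1
                continue
            start = x
            while x < len(row) and (row[x] >> 24) != 0:
                x += 1
            blocks.append(RowBlock(start, y, row[start:x]))
    return blocks


def lookup_argb_c(blocks: list[RowBlock], px: int, py: int) -> int:
    """Return the stored ARGB value at (px, py), or transparent if none."""
    for block in blocks:
        if block.y == py and block.x <= px < block.x + block.count:
            return block.argb[px - block.x]
    return TRANSPARENT
```

A hardware implementation would more likely index blocks by row than scan linearly; the linear scan only keeps the sketch short.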
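A linear gradient bitmap primitive is a bitmap primitive with a linear gradient effect. As illustrated in FIG. 6, its primitive data 305 includes data recording its primitive format (BitmapFormat), the coordinates of the linear gradient start point, the coordinates of the linear gradient end point, the ARGB value at the start point, the ARGB value at the end point, the linear gradient calculation formula, and the storage address and size of its primitive data.
A radial gradient bitmap primitive is a bitmap primitive with a radial gradient effect. As illustrated in FIG. 6, its primitive data 306 includes data recording its primitive format (BitmapFormat), the coordinates of the center of the gradient circle, the inner radius of the gradient circle, the outer radius of the gradient circle, the radial gradient calculation formula, and the storage address and size of its primitive data. The primitive data 307 of a JPEG bitmap primitive contains data recording its primitive format (BitmapFormat) and the storage address and byte length of the JPEG-encoded stream. The primitive data 308 of a PNG bitmap contains data recording its primitive format (BitmapFormat) and the storage address and byte length of the PNG-encoded stream.
The primitive data 303 of a glyph bitmap primitive includes data recording its primitive format, data recording its glyph ARGB value, data recording its glyph outline, and data recording the storage address and size of its primitive data. A glyph bitmap primitive may be an 8-bit glyph bitmap primitive, in which each pixel uses 8 bits to record its transparency A and the color of the text is specified by the glyph ARGB value.
Storing the ARGB value of a 32-bit solid-color bitmap primitive in SRAM2 takes only 4 bytes, regardless of the size of the graphic. For icons that have valid pixels in only a few regions and are mostly transparent, a 32-bit ARGB-C bitmap primitive with a position-based compression encoding can be used, which requires less storage. For example, for a 10×10 icon with 30 valid pixels and 70 transparent pixels, a 32-bit ARGB-C bitmap primitive needs only 120 bytes (30 × 4 bytes) to store the ARGB values of the valid pixels, and the transparent area needs no storage. For control layers in the UI with a linear or radial gradient effect, linear gradient and radial gradient bitmap primitives can be used: only the coordinates and ARGB values of the start and end points and the calculation formula need to be stored, not the ARGB value of every pixel, which saves considerable storage.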
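The byte counts in the paragraph above follow from simple arithmetic, sketched below in Python. Like the text's 120-byte figure, the sketch counts only the 4-byte ARGB payload and ignores the per-row header fields (first-pixel coordinates and pixel count), whose sizes the text does not give; the function names are hypothetical.

```python
BYTES_PER_ARGB8888 = 4


def solid_color_bytes() -> int:
    # One fill colour, independent of the primitive's width and height.
    return BYTES_PER_ARGB8888


def dense_argb_bytes(width: int, height: int) -> int:
    # Uncompressed 32-bit ARGB: every pixel is stored.
    return width * height * BYTES_PER_ARGB8888


def argb_c_payload_bytes(valid_pixels: int) -> int:
    # ARGB-C: only valid (non-transparent) pixels are stored; per-row
    # header fields are deliberately not counted.
    return valid_pixels * BYTES_PER_ARGB8888


if __name__ == "__main__":
    print(solid_color_bytes(), dense_argb_bytes(10, 10), argb_c_payload_bytes(30))
    # 4 400 120
```

For the 10×10 icon, the ARGB-C payload is therefore 30% of the 400 bytes a dense 32-bit bitmap would need.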
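Using these different bitmap primitive formats to fit the different kinds of graphic elements in an application effectively reduces SRAM storage and GPU read time while achieving a more realistic, richer-colored display.
As illustrated in FIG. 8, the first cache CACHE1 is allocated a bitmap primitive data cache area 50, which reads and caches the data of the bitmap primitive data buffer 30 in SRAM2.
The bitmap primitive data cache area 50 contains multiple data blocks, each storing the primitive data of one bitmap primitive. If the pixel to be drawn involves bitmap primitives on 6 different layers, their primitive data can be stored in the 6 data blocks 501, 502, 503, 504, 505 and 506 respectively.
As illustrated in FIG. 7, SRAM3 is configured with a window layer buffer 40 for the primitive formatted data of multiple window layer primitives, a control layer buffer 41 for the primitive formatted data of multiple control layer primitives, and a bitmap layer buffer 42 for the primitive formatted data of multiple bitmap layer primitives.
The window layer buffer 40 is divided into multiple window layer primitive formatted data blocks 401. One window layer primitive formatted data block 409 stores the primitive formatted data of one window layer primitive, including: object index data identifying the window layer primitive, such as a dialog, floating window, message window or regular window (window index window_index); data recording its primitive format (BitmapFormat); data recording the position and size of its area (window area window_rect(x,y,w,h)); data recording the number of control layer primitives belonging to it (number of control layer primitives); object index data identifying the control layer primitives belonging to it (control index widget_index); a layer serial number marking its layer overlay display relationship with other window layer primitives (window layer serial number window_layer); data recording its pixel synthesis command with other window layer primitives (pixel synthesis command window_pixel_cmd); and object index data identifying other window layer primitives (sibling window index sibling_window_index). Through sibling_window_index, the GPU can traverse all window layer primitives.
The control layer buffer 41 is divided into multiple control layer primitive formatted data blocks 411. One control layer primitive formatted data block 419 stores the primitive formatted data of one control layer primitive, including: object index data identifying the control layer primitive, such as a button, scroll bar, status list, edit box or picture box (control index widget_index); data recording its primitive format (BitmapFormat); data recording the position and size of its area (control area widget_rect(x,y,w,h)); object index data identifying the bitmap layer primitives belonging to it (bitmap index bitmap_index); data recording the number of bitmap layer primitives belonging to it (number of bitmap layer primitives); a layer serial number marking its layer overlay display relationship with other control layer primitives of the same window layer primitive (control layer serial number widget_layer); data recording its pixel synthesis command with other control layer primitives (pixel synthesis command widget_pixel_cmd); object index data identifying the window layer primitive to which it belongs (window index window_index); and object index data identifying other control layer primitives of the same window layer primitive (sibling control index sibling_widget_index). Through sibling_widget_index, the GPU can traverse all control layer primitives under the same window layer primitive.
The bitmap layer buffer 42 is divided into multiple bitmap layer primitive formatted data blocks 421. One bitmap layer primitive formatted data block 429 stores the primitive formatted data of one bitmap layer primitive, including: data recording its primitive format (BitmapFormat); object index data identifying the control layer primitive to which it belongs (control index widget_index); data recording the position and size of its envelope rectangle (bitmap envelope rectangle DispRect(x,y,w,h)); data recording the position and size of its clipping rectangle (bitmap clipping rectangle ClipRect(x,y,w,h)); data recording the memory location of its primitive data; a layer serial number marking its layer overlay display relationship with other bitmap layer primitives of the same control layer primitive (bitmap layer serial number bitmap_layer); data recording its pixel synthesis command with other bitmap layer primitives (pixel synthesis command bitmap_pixel_cmd); and object index data identifying other bitmap layer primitives of the same control layer primitive (sibling bitmap index sibling_Bitmap_index). Through sibling_Bitmap_index, the GPU can traverse all bitmap layer primitives under the same control layer primitive.
As illustrated in FIG. 9, the second cache CACHE2 is configured with a primitive cache area 60, which reads and caches the primitive formatted data of the window layer buffer 40, control layer buffer 41 and bitmap layer buffer 42 in SRAM3.
The primitive cache area 60 contains multiple data blocks, each storing the primitive formatted data of a different bitmap primitive. If the pixel to be drawn involves 6 different bitmap primitives, their primitive formatted data can be stored in the 6 data blocks 601, 602, 603, 604, 605 and 606 respectively.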
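When multiple bitmap primitives are overlaid in the same area of the screen, the initial layer serial number of each is defined by the layer overlay display relationship in which bitmap layer primitives sit above control layer primitives and control layer primitives sit above window layer primitives: the higher a bitmap primitive is displayed, the larger its layer serial number; the lower it is, the smaller its layer serial number.
When bitmap layer primitives belonging to the same control layer primitive overlap, the upper one has the larger layer serial number and the lower one the smaller. When control layer primitives belonging to the same window layer primitive overlap, the upper one has the larger layer serial number and the lower one the smaller.
When bitmap layer primitives belonging to different control layer primitives overlap, those whose control layer primitive has the larger layer serial number are displayed on top and those whose control layer primitive has the smaller layer serial number underneath. When control layer primitives belonging to different window layer primitives overlap, those whose window layer primitive has the larger layer serial number are displayed on top and those whose window layer primitive has the smaller layer serial number underneath.
When an event causes the application APP to generate a new event that requires the GPU to add a layer on top of the existing layers to display it, the GPU, following the rule that a new layer is overlaid on top of existing layers, sets the layer serial number of the new layer larger than that of the existing layers.
FIG. 1 shows an automotive LCD instrument display 100. In display 100, the initially displayed two-dimensional UI is divided into window layer primitive 1 and window layer primitive 2. Window layer primitive 1 includes control layer primitives 11, 12 and 13; control layer primitive 11 includes bitmap layer primitive 111; control layer primitive 12 includes bitmap layer primitives 121, 122, 123 and 124; control layer primitive 13 includes bitmap layer primitive 131. Window layer primitive 2 includes control layer primitives 21 and 22; control layer primitive 21 includes bitmap layer primitives 211, 212 and 213; control layer primitive 22 includes bitmap layer primitives 221 and 222.
Following the rule that window, control and bitmap layers are overlaid level by level, control layer primitives 11, 12 and 13 belonging to window layer primitive 1 are displayed above window layer primitive 1; bitmap layer primitive 111 is overlaid on control layer primitive 11; bitmap layer primitives 121, 122, 123 and 124, all belonging to control layer primitive 12, are overlaid on control layer primitive 12; bitmap layer primitive 131 is overlaid on control layer primitive 13; and window layer primitive 1 is at the bottom.
Likewise, control layer primitives 21 and 22 belonging to window layer primitive 2 are in the middle, bitmap layer primitives 211, 212, 213, 221 and 222 are on top, and window layer primitive 2 is at the bottom. Further, according to the layer serial numbers of the bitmap layer primitives belonging to control layer primitive 21, bitmap layer primitives 212 and 213, which have larger layer serial numbers, are displayed over bitmap layer 211.
When a new warning event, "Accident-prone section ahead, beware of danger!", occurs, bitmap layer primitives 223 and 224 must be added to the existing control layer primitive 22: bitmap layer primitive 223 must be overlaid on the existing bitmap layer primitive 222, and bitmap layer primitive 224 on the existing control layer primitive 22. The GPU checks whether, within the areas where bitmap layer primitives 223 and 224 are to be displayed, there are existing bitmap layer primitives belonging to the same control layer primitive 22. If so, the bitmap layer serial number of the new bitmap layer primitive is the existing primitive's bitmap_layer plus 1; if not, it is 0.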
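The layer-numbering rule for newly added bitmap layer primitives can be sketched in Python as follows. Two points are assumptions not fixed by the text: "within the area" is read as rectangle intersection, and when several existing siblings overlap the new primitive, the largest of their bitmap_layer values is used. `Rect`, `BitmapEntry` and `new_bitmap_layer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Rect") -> bool:
        # Half-open rectangles intersect when they overlap on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass
class BitmapEntry:
    widget_index: int  # control layer primitive the bitmap belongs to
    rect: Rect
    bitmap_layer: int


def new_bitmap_layer(existing: list[BitmapEntry], widget_index: int, rect: Rect) -> int:
    """Layer serial number for a bitmap added to `widget_index` covering `rect`."""
    below = [e.bitmap_layer for e in existing
             if e.widget_index == widget_index and e.rect.overlaps(rect)]
    # One above the highest overlapped sibling, or 0 if nothing lies underneath.
    return max(below) + 1 if below else 0
```

Bitmaps of other control layer primitives are ignored, since their stacking is governed by widget_layer rather than bitmap_layer.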
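Suppose the pixel currently to be displayed is P(200,100) shown in FIG. 1. The bitmap primitives containing this pixel are, from top to bottom, bitmap layer primitive 223, bitmap layer primitive 222, control layer primitive 22 and window layer primitive 2. Using the layer serial numbers, which record the layer overlay display relationships (window, control and bitmap layers overlaid level by level, and new bitmap layer primitives overlaid on existing ones), the GPU calculates a depth value Z for each of these bitmap primitives, sorts them by Z in ascending order, allocates color channels in that order, assigns a primitive processor matching each primitive format, and then, according to the pixel synthesis command of each bitmap primitive, generates a pixel composite ARGB value with the overlay effect and outputs it to the display.
In some embodiments, the pixel synthesis commands may be defined according to the Porter-Duff image compositing rules or other image compositing rules. A bitmap primitive with a larger depth value Z is displayed on top of one with a smaller depth value Z.
As illustrated in FIG. 2, a single-core GPU packages one sub-processor that generates and outputs pixel composite ARGB values, a first cache CACHE1 and a second cache CACHE2 connected to the sub-processor through the GPU bus, and a timing generator connected to the sub-processor, CACHE1 and CACHE2 through the GPU bus. As illustrated in FIG. 3, a multi-core GPU packages multiple such sub-processors, CACHE1 and CACHE2 connected to them through the GPU bus, and a timing generator connected to the sub-processors, CACHE1 and CACHE2 through the GPU bus. Every sub-processor works the same way but processes different pixels. If a multi-core GPU with 200 sub-processors handles the example of FIG. 1, it can assign sub-processor 1 to pixel P(200,100), sub-processor 2 to P(200,101), sub-processor 3 to P(200,102), and so on, so that the 200 sub-processors process 200 pixels simultaneously.
At initialization, the CPU loads the program data of the application APP stored in SRAM1 and starts the application, then loads the program data of the graphics API and the GPU driver and starts the GPU driver. The GPU driver initializes the GPU, decomposes the application's UI elements into the primitive formatted data of window layer, control layer and bitmap layer primitives, saves these to the window layer buffer 40, control layer buffer 41 and bitmap layer buffer 42 of SRAM3 respectively, starts the GPU, and stores the application's primitive data in the bitmap primitive data buffer 30 of SRAM2.
According to the primitive formatted data a sub-processor needs to draw the current pixel, the second cache CACHE2 reads primitive formatted data from the window layer buffer 40, control layer buffer 41 and bitmap layer buffer 42 of SRAM3 over the system communication bus and sends it to the sub-processor. According to the primitive data the sub-processor needs to draw the current pixel, the first cache CACHE1 reads bitmap primitive data from the bitmap primitive data buffer 30 of SRAM2 over the system communication bus and sends it to the sub-processor. CACHE1 and CACHE2 read data in parallel.
In some embodiments, a GPU sub-processor includes: a primitive filter that, according to the pixel to be drawn generated by the timing generator, reads the second cache CACHE2 and filters out all bitmap primitives whose areas contain the pixel to be drawn; a layer data register array BitmapList that holds the layer data of the filtered bitmap primitives; a depth processor that calculates the depth values Z of all filtered bitmap primitives and sorts their primitive parameter information by Z; a depth sort register array ZSortList that holds the sorted primitive parameter information; a command parser that, according to the sorted primitive parameter information, reads the pixel synthesis commands of the corresponding bitmap primitives from CACHE2; a primitive command register array that holds the pixel synthesis commands of the sorted bitmap primitives; multiple primitive processors, divided by primitive format BitmapFormat, that read the first cache CACHE1 and can generate pixel ARGB values; a color processor that reads CACHE1, schedules the color channels in the sorted order of the bitmap primitives and schedules the corresponding primitive processors according to BitmapFormat, so that together they generate the ARGB values of the pixel to be drawn in one or more bitmap primitives; and a pixel shader that generates and outputs the pixel composite ARGB value of the pixel to be drawn from those ARGB values and the pixel synthesis commands.
Layer data may include the bitmap index bitmap_index, the primitive format bitmapformat, the window index window_index, the control index widget_index, the bitmap layer serial number bitmap_layer, the control layer serial number widget_layer, the window layer serial number window_layer, and so on. The GPU includes multiple primitive processors, with at least one configured for each primitive format BitmapFormat. The number of primitive processors affects how fast the display can be refreshed.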
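FIG. 10 shows the block diagram of a solid-color primitive processor for processing solid-color bitmap primitives. It may include a primitive parameter register P1 that receives and holds the configuration parameters sent by the color processor, and an addressing/calculation unit S1 connected to P1.
The primitive parameter register P1 may include a command register R0a that holds various operating commands (for example, set primitive processor parameters, and start, stop or pause the primitive processor), a primitive format register R1a that holds the primitive format BitmapFormat, a primitive area coordinate register R2a that holds the coordinates of the primitive area, a pixel coordinate register R3a that holds the coordinates of the pixel to be drawn, a pixel color register R4a that holds the ARGB value of the pixel to be drawn, and a primitive color register F1a that holds the fill-color ARGB value.
Because every pixel of a solid-color bitmap primitive has the same ARGB value, the ARGB value of the pixel to be drawn in R4a equals the fill-color ARGB value in F1a. Without any calculation, the addressing/calculation unit S1 writes the fill-color ARGB value of F1a directly into R4a and then outputs a ColorReady signal to the color processor to indicate that the color is complete.
FIG. 11 shows the block diagram of an ARGB primitive processor for processing ARGB bitmap primitives. It may include an ARGB cache A2 that obtains and holds the primitive data of ARGB bitmap primitives from the first cache CACHE1 over the GPU bus; a primitive parameter register P2 that receives and holds the configuration parameters sent by the color processor; an addressing/calculation unit S2 connected to P2 and A2; and a pixel color cache C2 connected to P2 and S2. The addressing/calculation unit obtains data from A2 and P2 and calculates the ARGB value of the pixel to be drawn according to a preset calculation formula.
The primitive parameter register P2 may include a command register R0b, a primitive format register R1b, a primitive area coordinate register R2b, a pixel coordinate register R3b, a pixel color register R4b that holds the ARGB value of the pixel to be drawn, a data storage address register F1b that holds the storage address of the primitive data of the first pixel of the ARGB bitmap primitive, and a data line byte length register F2b that holds the row byte length of the ARGB bitmap primitive.
Registers R0b, R1b, R2b, R3b and R4b have the same functions as registers R0a, R1a, R2a, R3a and R4a of the solid-color primitive processor of FIG. 10.
The addressing/calculation unit S2 reads R2b, R3b, F1b and F2b and, according to a preset calculation formula, computes the address in the ARGB cache of the current ARGB bitmap primitive's ARGB value at the coordinates of the pixel to be drawn. It then reads that ARGB value from the ARGB cache, sends it to R4b, sends the ARGB value together with the pixel coordinates to the pixel color cache C2 as needed, and outputs a ColorReady signal to the color processor to indicate that the color is complete. The preset calculation formula includes a step of computing the coordinates of the pixel to be drawn and a step of computing, from those coordinates, the address of the pixel's ARGB value in the ARGB cache.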
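The two steps of S2's preset formula are not spelled out in the text. A plausible reading, sketched in Python below, is a row-major computation: make the pixel coordinates relative to the primitive area (R2b, R3b), then add the row offset (from F2b) and the column offset to the base address (F1b). The 4-byte pixel size, the little-endian byte order in `read_argb` and both function names are assumptions.

```python
def argb_address(base: int, row_bytes: int, region_x: int, region_y: int,
                 px: int, py: int, bytes_per_pixel: int = 4) -> int:
    """Byte address of pixel (px, py) inside a row-major ARGB primitive."""
    # Step 1: pixel coordinates relative to the primitive area.
    local_x = px - region_x
    local_y = py - region_y
    if local_x < 0 or local_y < 0:
        raise ValueError("pixel lies outside the primitive area")
    # Step 2: base address (F1b) + row offset (F2b) + column offset.
    return base + local_y * row_bytes + local_x * bytes_per_pixel


def read_argb(cache: bytes, address: int) -> int:
    """Fetch one 32-bit ARGB value from a byte buffer (little-endian assumed)."""
    return int.from_bytes(cache[address:address + 4], "little")
```

Keeping a separate row byte length rather than deriving it from the width allows rows to be padded for alignment, which is a common reason such a register exists.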
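The pixel color cache C2 stores the coordinates XY and ARGB values of one or more pixels.
FIG. 12 shows the block diagram of a primitive processor for linear gradient bitmap primitives. The linear gradient primitive processor includes a primitive parameter register P3 that receives and holds configuration parameters sent by the color processor, an addressing/calculation unit S3 connected to P3, and a pixel color cache C3 connected to P3 and S3. S3 obtains data from P3 and calculates the ARGB value of the pixel to be drawn according to a preset calculation formula.
The primitive parameter register P3 may include a command register R0c, a primitive format register R1c, a primitive area coordinate register R2c, a pixel coordinate register R3c, a pixel color register R4c that holds the ARGB value of the pixel to be drawn sent by the pixel color cache, a start color register F1c that holds the linear gradient start color ARGB value of the current linear gradient bitmap primitive, an end color register F2c that holds its linear gradient end color ARGB value, a start coordinate register F3c that holds its linear gradient start point coordinates, and an end coordinate register F4c that holds its linear gradient end point coordinates.
Registers R0c, R1c, R2c, R3c and R4c have the same functions as registers R0a, R1a, R2a, R3a and R4a of the solid-color primitive processor of FIG. 10.
The addressing/calculation unit S3 reads P3 and, according to a preset calculation formula, calculates the ARGB value of the pixel to be drawn, sends it to R4c, sends it together with the pixel coordinates to the pixel color cache C3 as needed, and outputs a ColorReady signal to the color processor to indicate that the color is complete. The preset calculation formula is the linear gradient calculation formula contained in the primitive data of the linear gradient bitmap.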
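The text leaves the linear gradient formula to the primitive data. A common choice, shown in the Python sketch below, projects the pixel onto the start-to-end vector (F3c to F4c), clamps the resulting parameter t to [0, 1], and interpolates each 8-bit channel between the start and end colors (F1c, F2c). This is one standard formulation assumed for illustration, not necessarily the one a given primitive stores; the function names are hypothetical.

```python
def lerp_argb(c0: int, c1: int, t: float) -> int:
    """Interpolate each 8-bit channel of two packed ARGB values, rounding half up."""
    out = 0
    for shift in (24, 16, 8, 0):
        a = (c0 >> shift) & 0xFF
        b = (c1 >> shift) & 0xFF
        out |= int(a + (b - a) * t + 0.5) << shift
    return out


def linear_gradient(px: float, py: float,
                    start: tuple[float, float], end: tuple[float, float],
                    c_start: int, c_end: int) -> int:
    """ARGB at (px, py) for a gradient running from `start` to `end`."""
    sx, sy = start
    ex, ey = end
    dx, dy = ex - sx, ey - sy
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return c_start  # degenerate gradient: start and end coincide
    # Scalar projection of (P - start) onto the gradient direction, normalized.
    t = ((px - sx) * dx + (py - sy) * dy) / length_sq
    t = min(max(t, 0.0), 1.0)  # pad with end colours outside the segment
    return lerp_argb(c_start, c_end, t)
```

Clamping t gives the "pad" behavior outside the start and end points; repeat or reflect modes would replace the clamp with a modulo-style mapping.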
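The pixel color cache C3 stores the coordinates XY and ARGB values of one or more pixels.
FIG. 13 shows the block diagram of a primitive processor for radial gradient bitmap primitives. The radial gradient primitive processor may include a primitive parameter register P4 that receives and holds configuration parameters sent by the color processor, an addressing/calculation unit S4 connected to P4, and a pixel color cache C4 connected to P4 and S4.
The primitive parameter register may include a command register R0d, a primitive format register R1d, a primitive area coordinate register R2d, a pixel coordinate register R3d, a pixel color register R4d, a start color register F1d that holds the radial gradient start color ARGB value, an end color register F2d that holds the radial gradient end color ARGB value, a circle center coordinate register F3d that holds the coordinates of the center of the gradient circle, an inner radius register F4d that holds the inner radius of the gradient circle, and an outer radius register F5d that holds the outer radius of the gradient circle.
Registers R0d, R1d, R2d, R3d and R4d have the same functions as registers R0a, R1a, R2a, R3a and R4a of the solid-color primitive processor of FIG. 10.
The addressing/calculation unit S4 reads P4 and, according to a preset calculation formula, calculates the ARGB value of the pixel to be drawn, sends it to R4d, sends it together with the pixel coordinates to the pixel color cache C4 as needed, and outputs a ColorReady signal to the color processor to indicate that the color is complete. The preset calculation formula is the radial gradient calculation formula contained in the primitive data of the radial gradient bitmap.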
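As with the linear case, the radial formula itself is stored in the primitive data rather than given in the text. A common formulation, assumed in the Python sketch below, takes the distance d from the circle center (F3d), maps the ring between the inner radius (F4d) and outer radius (F5d) to t = (d - r_inner) / (r_outer - r_inner), clamps t, and interpolates between the start and end colors (F1d, F2d). Function names are hypothetical.

```python
import math


def lerp_argb(c0: int, c1: int, t: float) -> int:
    """Interpolate each 8-bit channel of two packed ARGB values, rounding half up."""
    out = 0
    for shift in (24, 16, 8, 0):
        a = (c0 >> shift) & 0xFF
        b = (c1 >> shift) & 0xFF
        out |= int(a + (b - a) * t + 0.5) << shift
    return out


def radial_gradient(px: float, py: float, center: tuple[float, float],
                    r_inner: float, r_outer: float,
                    c_start: int, c_end: int) -> int:
    """ARGB at (px, py) for a gradient ring between r_inner and r_outer."""
    if r_outer <= r_inner:
        raise ValueError("outer radius must exceed inner radius")
    cx, cy = center
    d = math.hypot(px - cx, py - cy)        # distance from the circle center
    t = (d - r_inner) / (r_outer - r_inner)
    t = min(max(t, 0.0), 1.0)               # start colour inside, end colour outside
    return lerp_argb(c_start, c_end, t)
```

Pixels inside the inner circle take the start color and pixels beyond the outer circle take the end color, matching the usual meaning of an inner and outer radius.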
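The pixel color cache C4 stores the coordinates XY and ARGB values of one or more pixels.
FIG. 14 shows the block diagram of a general-purpose primitive processor. A general-purpose primitive processor can process bitmap primitives of several primitive formats, executing the processing routine that corresponds to the primitive format BitmapFormat sent by the color processor.
The general-purpose primitive processor may include an ARGB cache A5 that obtains and holds the primitive data of the bitmap primitive to be drawn from the first cache CACHE1 over the GPU bus, a primitive parameter register P5 that receives and holds configuration parameters sent by the color processor, an addressing/calculation unit S5 connected to the primitive parameter register and the ARGB cache, and a pixel color cache C5 connected to the primitive parameter register and the addressing/calculation unit.
The primitive parameter register includes a command register R0e, a primitive format register R1e, a primitive area coordinate register R2e, a pixel coordinate register R3e, a pixel color register R4e, and format-specific registers F1e, F2e, … Fne that hold the parameters needed to calculate the ARGB value of the pixel to be drawn for the various primitive formats.
According to some embodiments, the format-specific registers may include: a register holding the fill-color ARGB value; a data storage address register holding the storage address of the primitive data of the first pixel of an ARGB bitmap primitive; a data line byte length register holding the row byte length of an ARGB bitmap primitive; start color, end color, start coordinate and end coordinate registers holding the linear gradient start color ARGB value, end color ARGB value, start point coordinates and end point coordinates; start color, end color, start coordinate and end coordinate registers holding the radial gradient start color ARGB value, end color ARGB value, start point coordinates and end point coordinates; a circle center coordinate register holding the coordinates of the center of the gradient circle; an inner radius register holding the inner radius of the gradient circle; and an outer radius register holding the outer radius of the gradient circle.
The primitive parameter register P5 receives the configuration parameters sent by the color processor (including the primitive format BitmapFormat, command parameters, primitive data and so on). According to the primitive format, the addressing/calculation unit S5 reads the corresponding parameters from the format-specific registers F1e, F2e, … Fne, reads data from the ARGB cache A5 as the calculation requires, calculates the ARGB value of the pixel to be drawn according to a preset calculation formula, sends it to R4e, sends it together with the pixel coordinates to the pixel color cache C5 as needed, and outputs a ColorReady signal to the color processor to indicate that the color is complete.
In some embodiments, the GPU may contain both general-purpose primitive processors and different types of dedicated primitive processors, such as solid-color, ARGB, glyph, linear gradient and radial gradient primitive processors.
For efficiency, the first cache CACHE1 and the second cache CACHE2 can be configured as SRAM memories operating in parallel, so that a sub-processor can obtain primitive data and primitive formatted data at the same time. Multiple primitive processors can also be configured to operate in parallel to increase processing speed.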
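FIG. 15 shows the block diagram of a color processor. The color processor includes a primitive pixel color processing circuit and a dynamically reconfigurable matrix circuit.
According to the primitive formats and object index data of the one or more bitmap primitives in which the pixel to be drawn is located, the primitive pixel color processing circuit reads the primitive data of the bitmap primitives with those formats from the bitmap primitive data cache area 50 of the first cache CACHE1, reads the primitive formatted data of the bitmap primitives with that object index data from the primitive cache area 60 of the second cache CACHE2, sends the configuration parameters needed to start work to the primitive parameter registers of one or more currently idle primitive processors, and sends a command to start them so that they generate pixel ARGB values.
Following the overlay display relationship of the one or more bitmap primitives in which the pixel to be drawn is located, the dynamically reconfigurable matrix circuit matches one or more color channels, from bottom to top, with the one or more primitive processors and connects them, receives each primitive processor's color-complete signal, then reads the processor's pixel color register to obtain the pixel ARGB value and outputs it.
According to the primitive parameter information of the pixel to be drawn (including but not limited to the bitmap index bitmap_index and the primitive format bitmapFormat) recorded in the ZSortList, which the depth sort register array holds already sorted by depth value Z, the primitive pixel color processing circuit obtains the primitive data and primitive formatted data of the corresponding bitmap primitives from the bitmap primitive data cache area 50 of CACHE1 and the primitive cache area 60 of CACHE2, starts the matching primitive processors, and controls the dynamically reconfigurable matrix circuit to connect, following ZSortList from bottom to top, color channel 1, color channel 2 … color channel n to primitive processor 1 processing bitmap bitmap_index1, primitive processor 2 processing bitmap bitmap_index2 … primitive processor k processing bitmap bitmap_indexn. Because the ZSortList differs from one pixel to the next, the connections between color channels and primitive processors are dynamically reconfigured.
FIG. 16 shows the ColorReady connection nodes of a dynamically reconfigurable CrossBar-based matrix circuit according to one embodiment. The GPU contains 13 primitive processors (Unit1~Unit13) for 13 different primitive formats and 9 color channels (ch1~ch9) for 9 overlaid bitmap primitives.
Suppose the current ZSortList records the parameter information of bitmap primitives on 6 layers (that is, 6 entries). The dynamically reconfigurable matrix circuit then establishes ColorReady connections between color channel ch1 and primitive processor Unit1, which processes the first-layer bitmap primitive bitmap_index1; ch2 and Unit3 (second layer, bitmap_index2); ch3 and Unit12 (third layer, bitmap_index3); ch4 and Unit9 (fourth layer, bitmap_index4); ch5 and Unit6 (fifth layer, bitmap_index5); and ch6 and Unit5 (sixth layer, bitmap_index6).
Specifically, according to the primitive format bitmapformat1 of the first-layer bitmap primitive bitmap_index1, the color processor configures primitive processor Unit1 to process it; according to bitmapformat2 of bitmap_index2, it configures Unit3; according to bitmapformat3 of bitmap_index3, Unit12; according to bitmapformat4 of bitmap_index4, Unit9; according to bitmapformat5 of bitmap_index5, Unit6; and according to bitmapformat6 of bitmap_index6, Unit5.
When primitive processors Unit1, Unit3, Unit12, Unit9, Unit6 and Unit5 finish processing, they issue ColorReady signals. On receiving these, color channels ch1~ch6 obtain the ARGB values from Unit1, Unit3, Unit12, Unit9, Unit6 and Unit5 and send them to the pixel color register, which holds the ARGB values sent by ch1~ch6 as the ARGB color array ChannelColor[123456].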
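The channel-to-processor matching described above can be modelled in software as a greedy assignment: walk the ZSortList from bottom to top, give entry k color channel k, and bind it to the first idle primitive processor that supports the entry's format. The Python sketch below is only a behavioral model under that assumption; the text does not state how a unit is chosen when several could serve, and the format names and `connect_channels` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrimitiveUnit:
    name: str
    formats: frozenset  # primitive formats this unit can process
    busy: bool = False


def connect_channels(zsort, units, num_channels=9):
    """Map ZSortList entries (bitmap_index, format), bottom to top, onto ch1..chN."""
    if len(zsort) > num_channels:
        raise ValueError("more overlapping bitmap primitives than colour channels")
    links = []
    for channel, (bitmap_index, fmt) in enumerate(zsort, start=1):
        unit = next((u for u in units if not u.busy and fmt in u.formats), None)
        if unit is None:
            raise RuntimeError(f"no idle primitive processor for format {fmt!r}")
        unit.busy = True  # reserved until its ColorReady result has been read
        links.append((f"ch{channel}", bitmap_index, unit.name))
    return links
```

Because the list is rebuilt for every pixel, the resulting links change from pixel to pixel, which is the software analogue of the crossbar being reconfigured dynamically.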
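The pixel shader then reads ChannelColor[123456] from the pixel color register and the pixel synthesis commands BitmapCmd[123456] from the primitive command register array, calculates the final overlaid pixel composite ARGB value, and sends it to the line pixel color buffer for storage.
The pixel shader may include multiple pixel calculation units connected in series and an initial color register connected to the first pixel calculation unit. The initial color register holds an initial color ARGB value. The first pixel calculation unit reads the initial color ARGB value and the pixel ARGB value, supplied by the color processor, of the pixel to be drawn in the first-layer bitmap primitive, and combines the two according to the pixel synthesis command defined for that pixel in the first-layer bitmap primitive to compute the first pixel composite ARGB value.
The n-th pixel calculation unit reads the pixel composite ARGB value computed by the previous unit and the pixel ARGB value, supplied by the color processor, of the pixel to be drawn in the n-th-layer bitmap, and combines the two according to the pixel synthesis command defined for that pixel in the n-th-layer bitmap primitive to compute the n-th pixel composite ARGB value, where n is a natural number greater than 1. The maximum value of n is the number of bitmap primitives involved in the pixel to be drawn; if the pixel involves 6 bitmap primitives, the maximum n is 6.
In some embodiments, the pixel ARGB value of the pixel to be drawn in the first-layer bitmap primitive is obtained through color channel 1 of the color processor, and its pixel ARGB value in the n-th-layer bitmap primitive through color channel n.
FIG. 17 shows the block diagram of the pixel shader according to one embodiment. The initial color register holds the initial color PixelColor0 with ARGB value (0,0,0,0). The first pixel calculation unit F(bitmapCmd1) combines PixelColor0 with the pixel ARGB value ChannelColor[1] of color channel 1 according to the formula of the pixel synthesis command BitmapCmd[1] of that bitmap primitive to compute the pixel composite ARGB value PixelColor1. Pixel calculation unit 2 combines PixelColor1 with ChannelColor[2] of color channel 2 according to BitmapCmd[2] to compute PixelColor2, and so on, until all color channels of the pixel to be drawn have been composited, after which the final pixel composite ARGB value is output to the line pixel buffer.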
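The serial chain of pixel calculation units is a left fold over the color channels. The Python sketch below models it with a few Porter-Duff operators, which the text names as one possible basis for the pixel synthesis commands. It assumes premultiplied alpha with float channels in [0, 1] stored as (a, r, g, b), and encodes BitmapCmd values as operator-name strings; both the encoding and the names are hypothetical.

```python
from typing import Callable

# Premultiplied (a, r, g, b), each channel in [0.0, 1.0].
Color = tuple[float, float, float, float]


def _src_over(dst: Color, src: Color) -> Color:
    k = 1.0 - src[0]  # the destination shows through where the source is transparent
    return tuple(s + d * k for s, d in zip(src, dst))


def _dst_over(dst: Color, src: Color) -> Color:
    k = 1.0 - dst[0]  # the source shows through where the destination is transparent
    return tuple(d + s * k for s, d in zip(src, dst))


# Hypothetical encoding of BitmapCmd values as Porter-Duff operator names.
PORTER_DUFF: dict[str, Callable[[Color, Color], Color]] = {
    "CLEAR": lambda dst, src: (0.0, 0.0, 0.0, 0.0),
    "SRC": lambda dst, src: src,
    "DST": lambda dst, src: dst,
    "SRC_OVER": _src_over,
    "DST_OVER": _dst_over,
}


def shade_pixel(channel_colors: list[Color], commands: list[str],
                initial: Color = (0.0, 0.0, 0.0, 0.0)) -> Color:
    """Fold ChannelColor[1..n] into PixelColor0 with BitmapCmd[1..n], bottom to top."""
    if len(channel_colors) != len(commands):
        raise ValueError("one pixel synthesis command is needed per colour channel")
    color = initial  # PixelColor0
    for src, cmd in zip(channel_colors, commands):
        color = PORTER_DUFF[cmd](color, src)  # PixelColor1, PixelColor2, ...
    return color
```

Premultiplied alpha keeps each operator to a multiply-add per channel, which suits a chain of small hardware units; straight-alpha inputs would need converting first.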
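The timing generator reads the ARGB value of each pixel point by point from the line pixel buffer and outputs it to the display. Then, when the next line/field synchronization signal arrives, it starts the next line scan and the processing of the next frame, sending the coordinates of the next pixel to be drawn to the sub-processor.
As in the example of FIG. 9, for efficiency the second cache CACHE2 is provided with an index register 61 (mini_window) that stores the window index window_index1 (611) of the first window layer primitive. In response to the current line/field synchronization signal, the primitive filter obtains the coordinates of the pixel to be drawn, reads the index register 61 (mini_window) to obtain the window index window_index1 of the first window layer primitive, and, starting from that primitive, traverses the primitive formatted data of the window layer, control layer and bitmap layer primitives one by one, sending the layer data of every bitmap primitive whose area contains the coordinates of the pixel to be drawn to the layer data register array BitmapList for storage. The layer data includes at least object index data, primitive format and layer serial number.
FIG. 18 shows the block diagram of the primitive filter according to one embodiment. The primitive filter may include a filter that accesses the second cache CACHE2 over the GPU bus to obtain the window index window_index1 and the primitive formatted data of the first window layer primitive and filters out the layer data of the bitmap primitives in which the pixel to be drawn is located; a pixel-to-be-drawn register cur_p, connected to the filter, that holds the coordinates of the pixel to be drawn; and, also connected to the filter, a window index register cur_window that holds the window index window_index, a control index register cur_widget that holds the control index widget_index, and a bitmap index register cur_bitmap that holds the bitmap index bitmap_index.
FIG. 21 shows the flow by which the primitive filter filters bitmap primitives according to one embodiment. The primitive filter is initialized: it reads the index register mini_window 61 and saves the window index window_index1 of the first window layer primitive in the window index register cur_window. The filter first reads cur_window and checks whether the area window_rect1(x,y,w,h) of the first window layer primitive contains the pixel coordinate P(x,y). If not, it continues reading cur_window to traverse the other window layer primitives. If so, it uses cur_window to read the control index widget_index of the control layer primitives belonging to the current window layer primitive and saves it in the control index register cur_widget; it then uses cur_widget to read the control areas widget_rect(x,y,w,h) of all control layer primitives belonging to the current window layer primitive and checks one by one whether each contains P(x,y). If not, it continues traversing the other control layer primitives. If so, it uses cur_widget to read, from the control primitive buffer, the bitmap index bitmap_index of the bitmap layer primitives belonging to the current control layer primitive and saves it in the bitmap index register cur_bitmap; it then uses cur_bitmap to read, from the bitmap primitive buffer, the clipping rectangles ClipRect of all bitmap layer primitives belonging to the current control layer primitive and checks one by one whether each contains P(x,y). If not, it continues traversing the other bitmap layer primitives; if so, it pushes the required layer data into the layer data register array BitmapList. This continues until all bitmap layer, control layer and window layer primitives have been filtered.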
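The FIG. 21 flow is a three-level walk over sibling-linked lists, and can be modelled in Python as below. The record types stand in for the formatted-data blocks in SRAM3, the child and sibling fields play the roles of widget_index, bitmap_index and the sibling_* indexes, and the local variables mirror cur_window, cur_widget and cur_bitmap. The `-1` end-of-list marker, the half-open containment test and all type names are assumptions for illustration.

```python
from dataclasses import dataclass

END = -1  # hypothetical "no further child / sibling" marker
Rect = tuple[int, int, int, int]  # (x, y, w, h)


@dataclass
class WindowRec:
    rect: Rect          # window_rect
    window_layer: int
    first_widget: int   # widget_index of the first child control
    sibling: int        # sibling_window_index


@dataclass
class WidgetRec:
    rect: Rect          # widget_rect
    widget_layer: int
    first_bitmap: int   # bitmap_index of the first child bitmap
    sibling: int        # sibling_widget_index


@dataclass
class BitmapRec:
    clip: Rect          # ClipRect
    bitmap_layer: int
    fmt: str            # BitmapFormat
    sibling: int        # sibling_Bitmap_index


def contains(rect: Rect, p: tuple[int, int]) -> bool:
    x, y, w, h = rect
    return x <= p[0] < x + w and y <= p[1] < y + h


def filter_primitives(windows, widgets, bitmaps, mini_window, p):
    """Return BitmapList: layer data of every bitmap whose ClipRect contains p."""
    bitmap_list = []
    cur_window = mini_window
    while cur_window != END:
        win = windows[cur_window]
        if contains(win.rect, p):
            cur_widget = win.first_widget
            while cur_widget != END:
                wid = widgets[cur_widget]
                if contains(wid.rect, p):
                    cur_bitmap = wid.first_bitmap
                    while cur_bitmap != END:
                        bmp = bitmaps[cur_bitmap]
                        if contains(bmp.clip, p):
                            bitmap_list.append({
                                "bitmap_index": cur_bitmap, "format": bmp.fmt,
                                "window_index": cur_window, "widget_index": cur_widget,
                                "window_layer": win.window_layer,
                                "widget_layer": wid.widget_layer,
                                "bitmap_layer": bmp.bitmap_layer,
                            })
                        cur_bitmap = bmp.sibling
                cur_widget = wid.sibling
        cur_window = win.sibling
    return bitmap_list
```

Skipping a whole window or control as soon as its rectangle misses the pixel is what keeps the per-pixel work small: only the bitmaps under matching parents are examined.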
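FIG. 19 shows the block diagram of the depth processor according to one embodiment. The depth processor may include multiple depth value Z calculation units, a depth value Z comparator connected to the calculation units, and a depth sorted list generator connected to the comparator.
The depth value Z calculation units obtain the filtered layer data stored in the layer data register array BitmapList and calculate the depth value Z of each filtered bitmap primitive with the preset formula $Z = \mathrm{bitmap\_layer} + \mathrm{widget\_layer} \times (N+1) + \mathrm{window\_layer} \times (N+1)^2$, where bitmap_layer is the layer serial number of the bitmap layer primitive, widget_layer that of the control layer primitive to which it belongs, window_layer that of the window layer primitive to which that control layer primitive belongs, and N is the maximum number of overlapping layers.
The depth value Z comparator then compares the depth values Z of the different bitmap primitives, and the depth sorted list generator sorts the primitive parameter information of the bitmap primitives in ascending order of Z to generate a depth sorted list, which is sent to the depth sort register array ZSortList for storage.
Since a bitmap primitive with a larger depth value Z is overlaid on one with a smaller Z, the sorted list ZSortList represents the bottom-to-top layer overlay display order of the pixel to be drawn.
The formula for Z guarantees that all bitmap primitives are overlaid according to these relationships: among bitmap layer primitives belonging to the same control layer primitive, one with a larger bitmap layer serial number (bitmap_layer) is overlaid on one with a smaller serial number; among control layer primitives belonging to the same window layer primitive, one with a larger control layer serial number (widget_layer) is overlaid on one with a smaller serial number. This holds because bitmap_layer and widget_layer never exceed N, so a difference of 1 at a higher level, weighted by (N+1) or (N+1)^2, always outweighs any difference at the lower levels.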
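The depth value formula is a mixed-radix encoding with base N + 1, so sorting by Z is equivalent to sorting lexicographically by (window_layer, widget_layer, bitmap_layer). The Python sketch below computes Z and builds a ZSortList of (bitmap_index, format) pairs. The dictionary keys for the layer data and the function names are assumptions for illustration.

```python
def depth_z(bitmap_layer: int, widget_layer: int, window_layer: int, max_layers: int) -> int:
    """Z = bitmap_layer + widget_layer*(N+1) + window_layer*(N+1)**2."""
    if not (0 <= bitmap_layer <= max_layers and 0 <= widget_layer <= max_layers):
        raise ValueError("layer serial number exceeds the maximum number of overlapping layers")
    base = max_layers + 1
    return bitmap_layer + widget_layer * base + window_layer * base ** 2


def build_zsort_list(bitmap_list: list[dict], max_layers: int) -> list[tuple[int, str]]:
    """Sort layer data bottom-to-top by Z and keep (bitmap_index, format) per entry."""
    ranked = sorted(
        bitmap_list,
        key=lambda e: depth_z(e["bitmap_layer"], e["widget_layer"],
                              e["window_layer"], max_layers),
    )
    return [(e["bitmap_index"], e["format"]) for e in ranked]
```

With N = 7, the topmost possible bitmap in window 0 has Z = 7 + 7 × 8 = 63, still below the lowest bitmap in window 1 at Z = 64, which is the property the range check protects.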
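The command parser obtains, from the primitive cache area 60 of the second cache CACHE2, the pixel synthesis commands of the one or more different bitmap primitives in which the pixel to be drawn is located, and sends them, in bottom-to-top overlay display order, to the primitive command register array for storage. The pixel shader obtains the pixel synthesis commands from the primitive command register array.
FIG. 20 shows the flowchart of the command parser obtaining pixel synthesis commands according to one embodiment. The command parser obtains the bitmap index BitmapIndex[1…m] of each bitmap primitive from the depth sort register array ZSortList, then uses BitmapIndex[1…m] to obtain the pixel synthesis command PixelCommand[1…m] of each bitmap primitive from the primitive cache area 60 of CACHE2, and sends them to the primitive command register array BitmapCommandList for storage.
The primitive command register array BitmapCommandList thus stores the pixel synthesis commands BitmapCmd[1…m] of the bitmap primitives in the order of ZSortList.
The preferred embodiments above further describe the objectives, technical solutions and advantages of the present invention in detail. It should be understood that they are disclosed only to enable those skilled in the art to make or use the invention and are not intended to limit its scope of protection. Any obvious modification, equivalent replacement or improvement made on the basis of the principles defined by the present invention shall fall within its scope of protection. The scope of the claimed rights should not be limited to the above embodiments, but should be accorded the widest possible scope consistent with the principles and technical features defined by the claims.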

Claims (10)

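  1. A GPU based on bitmap primitives, characterized by comprising one or more sub-processors; a first cache and a second cache connected to the sub-processors through a GPU bus and providing the sub-processors with parallel access; and a timing generator connected to the sub-processors, the first cache and the second cache;
    the sub-processor comprises at least one primitive filter, at least one depth processor, at least one command parser, a plurality of primitive processors, at least one color processor controlling the operation of the plurality of primitive processors, and at least one pixel shader; wherein the color processor, according to the primitive data and primitive formatted data of one or more bitmap primitives in which a pixel to be drawn is located, starts one or more primitive processors capable of processing the primitive formats of the bitmap primitives in which the pixel to be drawn is located, obtains the pixel ARGB values generated by the one or more primitive processors, and provides the pixel ARGB values to the pixel shader; the pixel shader calculates the pixel composite ARGB value of the pixel to be drawn according to the pixel synthesis commands of the one or more bitmap primitives in which the pixel to be drawn is located and the pixel ARGB values, and provides the pixel composite ARGB value to the timing generator;
    the first cache is configured with a bitmap primitive data cache area for caching the primitive data of the bitmap primitives; the second cache is configured with a primitive cache area for caching the primitive formatted data of the bitmap primitives; wherein the primitive data of a bitmap primitive comprises at least data recording the primitive format of the bitmap primitive, data recording the ARGB values of the bitmap primitive, and data recording the storage address and size of the primitive data of the bitmap primitive; the primitive formatted data comprises at least object index data identifying the bitmap primitive, data recording the position and size of the bitmap primitive area, data recording the layer overlay display relationship between the bitmap primitive and other bitmap primitives, and data recording the pixel synthesis commands between the bitmap primitive and the other bitmap primitives.
  2. The GPU based on bitmap primitives according to claim 1, characterized in that the bitmap primitives are classified into at least three different primitive formats: solid-color bitmap primitives containing the same color and the same transparency, ARGB bitmap primitives containing different colors and the same or different transparency, and glyph bitmap primitives containing the same or different colors and the same or different transparency;
    the primitive data of the solid-color bitmap primitive comprises data recording the primitive format of the solid-color bitmap primitive, data recording its fill-color ARGB value, and data recording the storage address and size of its primitive data;
    the primitive data of the ARGB bitmap primitive comprises data recording the primitive format of the ARGB bitmap primitive, data recording its pixel ARGB values, and data recording the storage address and size of its primitive data;
    the primitive data of the glyph bitmap primitive comprises data recording the primitive format of the glyph bitmap primitive, data recording its glyph ARGB value, data recording its glyph outline, and data recording the storage address and size of its primitive data.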
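  3. The GPU based on bitmap primitives according to claim 1, characterized in that the bitmap primitives are divided into at least three different layer levels: window layer primitives as the base level, control layer primitives belonging to the window layer primitives, and bitmap layer primitives belonging to the control layer primitives;
    the primitive formatted data of a window layer primitive comprises object index data identifying the window layer primitive, data recording its primitive format, data recording the position and size of its area, data recording the number of control layer primitives belonging to it, object index data identifying the control layer primitives belonging to it, a layer serial number marking its layer overlay display relationship with other window layer primitives, data recording its pixel synthesis commands with the other window layer primitives, and object index data identifying the other window layer primitives;
    the primitive formatted data of a control layer primitive comprises object index data identifying the control layer primitive, data recording its primitive format, data recording the position and size of its area, object index data identifying the bitmap layer primitives belonging to it, data recording the number of bitmap layer primitives belonging to it, a layer serial number marking its layer overlay display relationship with other control layer primitives belonging to the same window layer primitive, data recording its pixel synthesis commands with the other control layer primitives, object index data identifying the window layer primitive to which it belongs, and object index data identifying the other control layer primitives;
    the primitive formatted data of a bitmap layer primitive comprises object index data identifying the bitmap layer primitive, data recording its primitive format, object index data identifying the control layer primitive to which it belongs, data recording the position and size of its envelope rectangle, data recording the position and size of its clipping rectangle, data recording the memory location storing its primitive data, a layer serial number marking its layer overlay display relationship with other bitmap layer primitives belonging to the same control layer primitive, data recording its pixel synthesis commands with the other bitmap layer primitives, and object index data identifying the other bitmap layer primitives.
  4. The GPU based on bitmap primitives according to claim 2, characterized in that, by primitive format, the ARGB bitmap primitives include ARGB-C bitmap primitives containing only a few valid pixels; the primitive data of an ARGB-C bitmap primitive comprises data recording its primitive format, data recording its valid pixels, and data recording the storage address and size of its primitive data;
    the data recording the valid pixels of the ARGB-C bitmap primitive is recorded by row and comprises one or more row data blocks, each row data block containing the position coordinates of the first valid pixel of that valid-pixel row, the number of valid pixels in the row, and the ARGB values of the row's valid pixels arranged by column.
  5. The GPU based on bitmap primitives according to claim 2, characterized in that, by primitive format, the ARGB bitmap primitives may further include linear gradient bitmap primitives having a linear gradient effect; the primitive data of a linear gradient bitmap primitive comprises data recording its primitive format, data recording the linear gradient start point coordinates, data recording the linear gradient end point coordinates, data recording the linear gradient start point ARGB value, data recording the linear gradient end point ARGB value, data recording the linear gradient calculation formula, and data recording the storage address and size of its primitive data.
  6. The GPU based on bitmap primitives according to claim 2, characterized in that, by primitive format, the ARGB bitmap primitives may further include radial gradient bitmap primitives having a radial gradient effect; the primitive data of a radial gradient bitmap primitive comprises data recording its primitive format, data recording the coordinates of the center of the gradient circle, data recording the inner radius of the gradient circle, data recording the outer radius of the gradient circle, data recording the radial gradient calculation formula, and data recording the storage address and size of its primitive data.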
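  7. The GPU based on bitmap primitives according to claim 1, characterized in that the first cache and the second cache are SRAM (static random access memory).
  8. The GPU based on bitmap primitives according to claim 1, characterized in that the color processor comprises a primitive pixel color processing circuit and a dynamically reconfigurable matrix circuit; according to the primitive formats and object index data of the one or more bitmap primitives in which the pixel to be drawn is located, the primitive pixel color processing circuit reads from the first cache the primitive data of the bitmap primitives having the primitive formats, reads from the second cache the primitive formatted data of the bitmap primitives having the object index data, and sends to one or more of the primitive processors the configuration parameters needed to start work, so that the primitive processors generate pixel ARGB values;
    the dynamically reconfigurable matrix circuit, according to the layer overlay display relationship of the one or more bitmap primitives in which the pixel to be drawn is located, matches one or more color channels from bottom to top with the one or more primitive processors and connects them, obtains the states of the primitive processors and the pixel ARGB values they generate, and outputs the pixel ARGB values.
  9. The GPU based on bitmap primitives according to claim 2, characterized in that the primitive processors include a solid-color primitive processor for processing the solid-color bitmap primitives; the solid-color primitive processor comprises a primitive parameter register that receives and holds configuration parameters sent by the color processor, and an addressing/calculation unit connected to the primitive parameter register;
    the primitive parameter register comprises a command register for holding various operating commands, a primitive format register for holding the primitive format, a primitive area coordinate register for holding the primitive area coordinates, a pixel coordinate register for holding the coordinates of the pixel to be drawn, a pixel color register for holding the ARGB value of the pixel to be drawn, and a primitive color register for holding the fill-color ARGB value;
    the addressing/calculation unit writes the fill-color ARGB value into the pixel color register and then outputs a signal to the color processor indicating that the color is complete.
  10. The GPU based on bitmap primitives according to claim 2, characterized in that the primitive processors include an ARGB primitive processor for processing the ARGB bitmap primitives; the ARGB primitive processor comprises a primitive parameter register that receives and holds configuration parameters sent by the color processor, an ARGB cache that obtains and holds primitive data from the first cache, an addressing/calculation unit connected to the primitive parameter register and the ARGB cache, and a pixel color cache connected to the primitive parameter register and the addressing/calculation unit;
    the primitive parameter register comprises a command register for holding various operating commands, a primitive format register for holding the primitive format, a primitive area coordinate register for holding the primitive area coordinates, a pixel coordinate register for holding the coordinates of the pixel to be drawn, a pixel color register for holding the ARGB value of the pixel to be drawn, a data storage address register for holding the storage address of the primitive data of the first pixel of the ARGB bitmap primitive, and a data line byte length register for holding the row byte length of the ARGB bitmap primitive;
    the addressing/calculation unit obtains data from the ARGB cache and the primitive parameter register, calculates the ARGB value of the pixel to be drawn according to a preset calculation formula, sends it to the pixel color register and the pixel color cache, and then outputs a signal to the color processor indicating that the color is complete.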
    11.如权利要求5所述的基于位图图元的GPU,其特征在于,所述图元处理器包括用以处理所述线性梯度渐变位图图元的线性梯度渐变图元处理器;所述线性梯度渐变图元处理器包括接收所述颜色处理器发送的配置参数并保存的图元参数寄存器,连接所述图元参数寄存器的寻址/计算单元,以及连接所述图元参数寄存器和所述寻址/计算单元的像素颜色缓存;
    所述图元参数寄存器包括用以存储多种工作命令的命令寄存器,用以存储图元格式的图元格式寄存器,用以存储图元区域坐标的图元区域坐标寄存器,用以存储待绘制像素坐标的像素坐标寄存器,用以存储待绘制像素ARGB值的像素颜色寄存器,用以存储线性渐变开始颜色ARGB值的开始颜色寄存器,用以存储线性渐变结束颜色ARGB值的结束颜色寄存器,用以存储线性渐变起点坐标的开始坐标寄存器,用以存储线性渐变终点坐标的结束坐标寄存器;
    所述寻址/计算单元读取所述图元参数寄存器的数据,并依据预置的计算公式,计算出待绘制像素ARGB值,发送到所述像素颜色缓存和所述像素颜色寄存器,再向所述颜色处理器输出信号表示颜色已完成。
    12.如权利要求6所述的基于位图图元的GPU,其特征在于,所述图元处理器包括用以处理所述径向梯度渐变位图图元的径向梯度渐变图元处理器;所述径向梯度渐变图元处理器包括接收所述颜色处理器发送的配置参数并保存的图元参数寄存器,连接所述图元参数寄存器的寻址/计算单元,以及连接所述图元参数寄存器和所述寻址/计算单元的像素颜色缓存;
    所述图元参数寄存器包括用以存储多种工作命令的命令寄存器,用以存储图元格式的图元格式寄存器,用以存储图元区域坐标的图元区域坐标寄存器,用以存储待绘制像素坐标的像素坐标寄存器,用以存储待绘制像素ARGB值的像素颜色寄存器,用以存储开始径向渐变颜色ARGB值的开始颜色寄存器,用以存储结束径向渐变颜色ARGB值的结束颜色寄存器,用以存储渐变区域圆中心点坐标的圆中心坐标寄存器,用以存储渐变区域圆内半径的圆内半径寄存器,用以存储渐变区域圆外半径的圆外半径寄存器;
    所述寻址/计算单元读取所述图元参数寄存器的数据,并依据预置的计算公式,计算出待绘制像素ARGB值,发送到所述像素颜色缓存和所述像素颜色寄存器,再向所述颜色处理器输出信号表示颜色已完成。
    13.如权利要求1所述的基于位图图元的GPU,其特征在于,所述图元处理器包括可处理所述多种图元格式的通用图元处理器;所述通用图元处理器包括接收所述颜色处理器发送的配置参数并保存的图元参数寄存器,从所述第一高速缓存获得图元数据并保存的ARGB高速缓存,连接所述图元参数寄存器和所述ARGB高速缓存的寻址/计算单元,以及连接所述图元参数寄存器和所述寻址/计算单元的像素颜色缓存;
    所述图元参数寄存器包括用以存储多种工作命令的命令寄存器,用以存储图元格式的图元格式寄存器,用以存储图元区域坐标的图元区域坐标寄存器,用以存储待绘制像素坐标的像素坐标寄存器,用以存储待绘制像素ARGB值的像素颜色寄存器,以及用以存储多种图元格式的计算待绘制像素ARGB值所需的计算参数的格式专用寄存器;
    所述寻址/计算单元根据所述图元格式,读取格式专用寄存器中相应的计算参数,并根据计算需要读取所述ARGB高速缓存的数据,再依据预置的计算公式,计算出待绘制像素ARGB值,发送到所述像素颜色寄存器,并根据需要将待绘制像素ARGB值与待绘制像素坐标一起发送到所述像素颜色缓存,再向所述颜色处理器输出信号表示颜色已完成。
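    14. The bitmap-primitive-based GPU according to claim 1, wherein the pixel shader includes a plurality of pixel calculation units connected in series, and an initial color register connected to the first pixel calculation unit; the initial color register stores an initial color ARGB value; the first pixel calculation unit reads the initial color ARGB value and the ARGB value, provided by the color processor, of the pixel to be drawn in the first-layer bitmap primitive, and combines the two according to the pixel composition command of the pixel to be drawn for the first-layer bitmap primitive to calculate a first composited pixel ARGB value;
    the n-th pixel calculation unit reads the composited pixel ARGB value calculated by the preceding pixel calculation unit and the ARGB value, provided by the color processor, of the pixel to be drawn in the n-th-layer bitmap primitive, combines the two according to the pixel composition command of the pixel to be drawn for the n-th-layer bitmap primitive to calculate an n-th composited pixel ARGB value, and outputs it as the composited pixel ARGB value of the pixel to be drawn; where n is a natural number greater than 1.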
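Claim 14 describes a chain of pixel calculation units, each blending one more layer onto the running result, starting from the initial color and proceeding bottom layer first. The pixel composition commands are not enumerated, so the sketch below assumes two hypothetical commands, `SRC` (the upper layer replaces what is below) and `SRC_OVER` (Porter-Duff source-over on non-premultiplied ARGB), and models the hardware chain as a loop.

```python
def _unpack(argb):
    """Packed 0xAARRGGBB -> [A, R, G, B] as floats in [0, 1]."""
    return [((argb >> shift) & 0xFF) / 255.0 for shift in (24, 16, 8, 0)]


def _pack(a, r, g, b):
    """Floats in [0, 1] -> packed 0xAARRGGBB."""
    return (round(a * 255) << 24) | (round(r * 255) << 16) | (round(g * 255) << 8) | round(b * 255)


def src(s, d):
    """Hypothetical SRC command: the upper-layer pixel replaces the accumulated one."""
    return s


def src_over(s, d):
    """Porter-Duff source-over: upper-layer pixel s over accumulated pixel d."""
    sa, sr, sg, sb = _unpack(s)
    da, dr, dg, db = _unpack(d)
    oa = sa + da * (1.0 - sa)
    if oa == 0.0:  # both fully transparent
        return 0

    def mix(sc, dc):
        # Non-premultiplied blend, divided by the output alpha.
        return (sc * sa + dc * da * (1.0 - sa)) / oa

    return _pack(oa, mix(sr, dr), mix(sg, dg), mix(sb, db))


COMMANDS = {"SRC": src, "SRC_OVER": src_over}


def shade_pixel(initial_argb, layers):
    """Emulate the serial pixel calculation units.

    layers: (pixel ARGB, composition command) per bitmap primitive, bottom layer first.
    """
    acc = initial_argb  # initial color register
    for argb, command in layers:
        acc = COMMANDS[command](argb, acc)  # one pixel calculation unit per layer
    return acc
```

Because each unit only needs the previous unit's result and its own layer's pixel, the chain maps naturally onto a pipeline with one stage per layer.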
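    15. The bitmap-primitive-based GPU according to claim 1, wherein the sub-processor further includes a layer data register array connected to the primitive filter; the second cache is configured with an index register storing object index data that identifies the first window-layer primitive; the primitive filter receives the coordinates of the pixel to be drawn sent by the timing generator, reads the index register, traverses one by one, starting from the first window-layer primitive, the primitive formatting data of the window-layer primitives, control-layer primitives, and bitmap-layer primitives provided by the second cache, and filters out the layer data of all bitmap primitives whose primitive regions contain the coordinates of the pixel to be drawn; the layer data includes at least object index data, the primitive format, and the layer number;
    the layer data register array stores the layer data.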
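The traversal in claim 15 can be sketched as a nested walk over the three layer levels. Assumed here, not stated in the claim: window-layer primitives are chained through a hypothetical `next_window` index, a window or control that does not contain the pixel is skipped together with everything it owns, and each layer data entry carries the layer numbers of all three levels so a later depth stage can order it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prim:
    """Hypothetical formatting record for window, control and bitmap-layer primitives."""
    index: int   # object index data
    fmt: str     # primitive format
    x: int
    y: int
    w: int
    h: int
    layer: int   # layer number among siblings (0 = bottom)
    children: List[int] = field(default_factory=list)  # indices of owned primitives
    next_window: Optional[int] = None  # hypothetical link to the next window-layer primitive

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def filter_pixel(table, first_window, px, py):
    """Layer data (index, fmt, (window, control, bitmap layer numbers)) of every
    bitmap-layer primitive whose region contains (px, py)."""
    hits = []
    w = first_window  # value held in the index register
    while w is not None:
        win = table[w]
        if win.contains(px, py):
            for c in win.children:
                ctrl = table[c]
                if not ctrl.contains(px, py):
                    continue
                for b in ctrl.children:
                    bmp = table[b]
                    if bmp.contains(px, py):
                        hits.append((bmp.index, bmp.fmt, (win.layer, ctrl.layer, bmp.layer)))
        w = win.next_window
    return hits


# Usage: two windows, each with one control holding bitmap-layer primitives.
table = {p.index: p for p in [
    Prim(1, "SOLID", 0, 0, 100, 100, 0, children=[10], next_window=2),
    Prim(10, "SOLID", 0, 0, 50, 50, 0, children=[100, 101]),
    Prim(100, "SOLID", 0, 0, 20, 20, 0),
    Prim(101, "ARGB", 10, 10, 30, 30, 1),
    Prim(2, "SOLID", 40, 40, 60, 60, 1, children=[20]),
    Prim(20, "SOLID", 40, 40, 60, 60, 0, children=[200]),
    Prim(200, "SOLID", 40, 40, 10, 10, 0),
]}
print(filter_pixel(table, 1, 15, 15))
```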
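    16. The bitmap-primitive-based GPU according to claim 15, wherein the sub-processor further includes a depth sorting register array connected to the depth processor; the depth processor includes a plurality of depth value Z calculation units, a depth value Z comparator connected to the depth value Z calculation units, and a depth sorting list generator connected to the depth value Z comparator;
    the depth value Z calculation units read the layer data from the layer data register array and calculate, according to a preset calculation formula, the depth value Z of each filtered bitmap primitive;
    the depth value Z comparator compares the depth values Z;
    the depth sorting list generator sorts the primitive parameter information in ascending order of depth value Z, generates a depth-sorted list, and sends it to the depth sorting register array for storage; the primitive parameter information includes the object index data and the primitive format.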
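The depth formula is also left open. Assuming the layer data carries (window, control, bitmap) layer numbers, each below 256, one plausible Z packs them into a single integer so that window order dominates, then control order, then bitmap order. Sorting by ascending Z then yields the bottom-to-top list the claim describes. The formula and names are hypothetical.

```python
def depth_value(layers):
    """Hypothetical preset formula: pack (window, control, bitmap) layer numbers,
    each assumed < 256, into one integer Z."""
    window, control, bitmap = layers
    return (window << 16) | (control << 8) | bitmap


def depth_sorted_list(layer_data):
    """Sort layer data (index, fmt, layers) by ascending Z, bottom layer first, and
    keep only the primitive parameter information (object index, primitive format)."""
    ordered = sorted(layer_data, key=lambda entry: depth_value(entry[2]))
    return [(index, fmt) for index, fmt, _ in ordered]


# Usage: entries arrive in traversal order, not stacking order.
print(depth_sorted_list([(101, "ARGB", (0, 0, 1)),
                         (200, "SOLID", (1, 0, 0)),
                         (100, "SOLID", (0, 0, 0))]))
```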
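    17. The bitmap-primitive-based GPU according to claim 16, wherein the color processor reads the depth-sorted list from the depth sorting register array; according to the primitive parameter information recorded in the depth-sorted list, reads from the first cache the primitive data of the bitmap primitive having the primitive format, and reads from the second cache the primitive formatting data of the bitmap primitive having the object index data; starts one or more primitive processors capable of processing the primitive format; matches color channels to the primitive processors in the order of the depth-sorted list to establish connections; obtains the pixel ARGB values generated by the primitive processors; and outputs the pixel ARGB values through the color channels.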
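    18. The bitmap-primitive-based GPU according to claim 1, wherein the sub-processor further includes a primitive command register array connected to the command parser;
    the command parser obtains from the second cache the pixel composition commands of the one or more bitmap primitives in which the pixel to be drawn is located, and sends the pixel composition commands to the primitive command register array for storage in bottom-to-top order of the layer stacking display relationship;
    the pixel shader obtains the pixel composition commands from the primitive command register array.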
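    19. A bitmap-primitive-based graphics processing system, comprising a CPU; a first static random access memory, a second static random access memory, and a third static random access memory that provide parallel access; and the GPU according to any one of claims 1 to 3, connected to the CPU, the first static random access memory, the second static random access memory, and the third static random access memory;
    the second static random access memory is configured with a bitmap primitive data buffer for storing the primitive data of the bitmap primitives; the third static random access memory is configured with a buffer for storing the primitive formatting data of the bitmap primitives.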
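    20. A bitmap-primitive-based graphics processing method, comprising:
    dividing the data information of a bitmap primitive into at least primitive data and primitive formatting data; the primitive data includes at least data recording the primitive format of the bitmap primitive, data recording the ARGB values of the bitmap primitive, and data recording the primitive data storage address and size of the bitmap primitive; the primitive formatting data of the bitmap primitive includes at least object index data identifying the bitmap primitive, data recording the position and size of the bitmap primitive region, data recording the layer stacking display relationship between the bitmap primitive and other bitmap primitives, and data recording the pixel composition commands between the bitmap primitive and the other bitmap primitives;
    storing the primitive data and the primitive formatting data in different storage areas of a memory or in different memories;
    providing the GPU with parallel access to the primitive data and the primitive formatting data.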
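The split in claim 20 can be modelled as two record types held in two independent stores, linked through the primitive data storage location that the formatting data records (claim 24). This is a hypothetical Python model: dictionaries stand in for the separate memories, and it shows only the separation of the data, not concurrent hardware access.

```python
from dataclasses import dataclass


@dataclass
class PrimitiveData:
    """Hypothetical record for the bitmap primitive data buffer (e.g. the second SRAM)."""
    fmt: str      # primitive format
    argb: bytes   # ARGB payload (fill color, pixel values, ...)
    addr: int     # primitive data storage address
    size: int     # size in bytes


@dataclass
class PrimitiveFormatData:
    """Hypothetical record for the formatting data buffer (e.g. the third SRAM)."""
    index: int        # object index data
    x: int            # region position and size
    y: int
    w: int
    h: int
    layer: int        # layer number (stacking display relationship)
    compose_cmd: str  # pixel composition command
    data_addr: int    # where this primitive's PrimitiveData is stored


# Two independent stores standing in for separately accessible memories.
data_buffer = {}    # addr -> PrimitiveData
format_buffer = {}  # object index -> PrimitiveFormatData


def store(prim, fmt_rec):
    """Write each half of a primitive to its own store."""
    data_buffer[prim.addr] = prim
    format_buffer[fmt_rec.index] = fmt_rec


def lookup(index):
    """Resolve one primitive: read its formatting record, then follow data_addr."""
    fmt_rec = format_buffer[index]
    return fmt_rec, data_buffer[fmt_rec.data_addr]
```

Keeping geometry-related fields apart from pixel payloads means a stage that only tests regions and layer order never has to touch the larger ARGB data.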
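    21. The bitmap-primitive-based graphics processing method according to claim 20, further comprising:
    classifying the bitmap primitives into at least three different primitive formats, namely solid-color bitmap primitives having the same color and the same transparency, ARGB bitmap primitives having different colors and the same or different transparency, and glyph bitmap primitives having the same or different colors and the same or different transparency;
    the primitive data of the solid-color bitmap primitive includes data recording the primitive format of the solid-color bitmap primitive, data recording the fill color ARGB value of the solid-color bitmap primitive, and data recording the primitive data storage address and size of the solid-color bitmap primitive;
    the primitive data of the ARGB bitmap primitive includes data recording the primitive format of the ARGB bitmap primitive, data recording the pixel ARGB values of the ARGB bitmap primitive, and data recording the primitive data storage address and size of the ARGB bitmap primitive;
    the primitive data of the glyph bitmap primitive includes data recording the primitive format of the glyph bitmap primitive, data recording the glyph ARGB value of the glyph bitmap primitive, data recording the glyph outline of the glyph bitmap primitive, and data recording the primitive data storage address and size of the glyph bitmap primitive.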
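A per-format pixel lookup makes the difference between the three primitive formats concrete. The sketch assumes packed 0xAARRGGBB colors and, because the claim does not say how the glyph outline is encoded, stands it in with an 8-bit coverage mask that scales the glyph's alpha. All class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SolidPrim:
    fill: int  # fill color ARGB value shared by every pixel


@dataclass
class ArgbPrim:
    width: int
    pixels: List[int]  # row-major pixel ARGB values


@dataclass
class GlyphPrim:
    color: int       # glyph ARGB value
    width: int
    coverage: bytes  # hypothetical 8-bit coverage mask standing in for the glyph outline


def pixel_argb(prim, dx, dy):
    """ARGB of the pixel at offset (dx, dy) inside the primitive region."""
    if isinstance(prim, SolidPrim):
        return prim.fill
    if isinstance(prim, ArgbPrim):
        return prim.pixels[dy * prim.width + dx]
    if isinstance(prim, GlyphPrim):
        cov = prim.coverage[dy * prim.width + dx]
        alpha = (prim.color >> 24) * cov // 255  # scale glyph alpha by coverage
        return (alpha << 24) | (prim.color & 0x00FFFFFF)
    raise TypeError(f"unsupported primitive format: {type(prim).__name__}")
```

A solid-color primitive needs only one stored color however large its region is, which is the storage saving the format split aims at.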
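    22. The bitmap-primitive-based graphics processing method according to claim 21, wherein, by primitive format, the ARGB bitmap primitives include ARGB-C bitmap primitives containing only a small number of valid pixels; the primitive data of the ARGB-C bitmap primitive includes data recording the primitive format of the ARGB-C bitmap primitive, data recording the valid pixels of the ARGB-C bitmap primitive, and data recording the primitive data storage address and size of the ARGB-C bitmap primitive;
    the valid-pixel data of the ARGB-C bitmap primitive is recorded row by row and comprises one or more row data blocks; each row data block includes the position coordinates of the first valid pixel of that valid-pixel row, the number of valid pixels in that row, and the ARGB values of the valid pixels of that row arranged by column.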
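The row-block layout of claim 22 can be sketched as a run-length style encoder and decoder. Assumed here: a pixel is valid when its alpha is non-zero, and a row holding several separate runs of valid pixels is stored as several blocks, one per run, because each block records only the first valid pixel and a count. Names are hypothetical.

```python
from typing import List, NamedTuple


class RowBlock(NamedTuple):
    x: int           # column of the first valid pixel in the run
    y: int           # row of the run
    count: int       # number of valid pixels
    argb: List[int]  # their ARGB values, in column order


def encode_argb_c(pixels, width, height):
    """Compress a row-major ARGB bitmap, keeping only valid (alpha != 0) pixels."""
    blocks = []
    for y in range(height):
        row = pixels[y * width:(y + 1) * width]
        x = 0
        while x < width:
            if row[x] >> 24 == 0:  # fully transparent: not a valid pixel
                x += 1
                continue
            start = x
            while x < width and row[x] >> 24 != 0:
                x += 1
            blocks.append(RowBlock(start, y, x - start, row[start:x]))
    return blocks


def decode_pixel(blocks, x, y):
    """ARGB of (x, y); pixels outside every block are transparent (0)."""
    for block in blocks:
        if block.y == y and block.x <= x < block.x + block.count:
            return block.argb[x - block.x]
    return 0
```

A real decoder would index blocks by row instead of scanning the whole list; the linear scan keeps the sketch short.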
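    23. The bitmap-primitive-based graphics processing method according to claim 21, wherein, by primitive format, the ARGB bitmap primitives include linear gradient bitmap primitives having a linear gradient effect; the primitive data of the linear gradient bitmap primitive includes data recording the primitive format of the linear gradient bitmap primitive, data recording the start point coordinates, data recording the end point coordinates, data recording the start point ARGB value, data recording the end point ARGB value, data recording the linear gradient calculation formula, and data recording the primitive data storage address and size of the linear gradient bitmap primitive.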
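    24. The bitmap-primitive-based graphics processing method according to claim 20, further comprising:
    dividing the bitmap primitives into at least three different layer levels, namely window-layer primitives as the base level, control-layer primitives belonging to a window-layer primitive, and bitmap-layer primitives belonging to a control-layer primitive;
    the primitive formatting data of the window-layer primitive includes object index data identifying the window-layer primitive, data recording the primitive format of the window-layer primitive, data recording the position and size of the window-layer primitive region, data recording the number of control-layer primitives belonging to the window-layer primitive, object index data identifying the control-layer primitives belonging to the window-layer primitive, a layer number marking the layer stacking display relationship between the window-layer primitive and other window-layer primitives, data recording the pixel composition commands between the window-layer primitive and the other window-layer primitives, and object index data identifying the other window-layer primitives;
    the primitive formatting data of the control-layer primitive includes object index data identifying the control-layer primitive, data recording the primitive format of the control-layer primitive, data recording the position and size of the control-layer primitive region, object index data identifying the bitmap-layer primitives belonging to the control-layer primitive, data recording the number of bitmap-layer primitives belonging to the control-layer primitive, a layer number marking the layer stacking display relationship between the control-layer primitive and other control-layer primitives belonging to the same window-layer primitive, data recording the pixel composition commands between the control-layer primitive and the other control-layer primitives, object index data identifying the window-layer primitive to which the control-layer primitive belongs, and object index data identifying the other control-layer primitives;
    the primitive formatting data of the bitmap-layer primitive includes object index data identifying the bitmap-layer primitive, data recording the primitive format of the bitmap-layer primitive, object index data identifying the control-layer primitive to which the bitmap-layer primitive belongs, data recording the position and size of the envelope rectangle of the bitmap-layer primitive, data recording the position and size of the clipping rectangle of the bitmap-layer primitive, data recording the memory location where the primitive data of the bitmap-layer primitive is stored, a layer number marking the layer stacking display relationship between the bitmap-layer primitive and other bitmap-layer primitives belonging to the same control-layer primitive, data recording the pixel composition commands between the bitmap-layer primitive and the other bitmap-layer primitives, and object index data identifying the other bitmap-layer primitives.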
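Claim 24 gives each bitmap-layer primitive both an envelope rectangle and a clipping rectangle. One plausible use, assumed rather than stated in the claim, is that the primitive is drawn only inside the intersection of the two, further limited to the region of its owning control-layer primitive. Rectangles are (x, y, width, height); names are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a.x, b.x), max(a.y, b.y)
    x1, y1 = min(a.x + a.w, b.x + b.w), min(a.y + a.h, b.y + b.h)
    if x1 <= x0 or y1 <= y0:
        return None
    return Rect(x0, y0, x1 - x0, y1 - y0)


def visible_region(envelope: Rect, clip: Rect, control: Rect) -> Optional[Rect]:
    """Assumed rule: draw only where envelope, clipping rectangle and the owning
    control-layer region all overlap."""
    region = intersect(envelope, clip)
    return None if region is None else intersect(region, control)
```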
    25. The bitmap-primitive-based graphics processing method according to claim 20, wherein the memory is a static random access memory (SRAM).
PCT/CN2022/104589 2022-07-08 2022-07-08 基于位图图元的图形处理系统,方法和gpu WO2024007293A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/104589 WO2024007293A1 (zh) 2022-07-08 2022-07-08 Graphics processing system, method and GPU based on bitmap primitives
CN202280002320.3A CN115349136A (zh) 2022-07-08 2022-07-08 Graphics processing system, method and GPU based on bitmap primitives

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/104589 WO2024007293A1 (zh) 2022-07-08 2022-07-08 Graphics processing system, method and GPU based on bitmap primitives

Publications (1)

Publication Number Publication Date
WO2024007293A1 (zh) 2024-01-11

Family

ID=83957694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104589 WO2024007293A1 (zh) 2022-07-08 2022-07-08 Graphics processing system, method and GPU based on bitmap primitives

Country Status (2)

Country Link
CN (1) CN115349136A (zh)
WO (1) WO2024007293A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
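CN116485629A (zh) * 2023-06-21 2023-07-25 芯动微电子科技(珠海)有限公司 Graphics processing method and system for multi-GPU parallel geometry processing
CN117152300B (zh) * 2023-10-28 2024-02-09 浙江正泰中自控制工程有限公司 Layer dynamic programming algorithm for optimizing the flowchart drawing performance of DCS systems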

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9589388B1 (en) * 2013-07-10 2017-03-07 Thinci, Inc. Mechanism for minimal computation and power consumption for rendering synthetic 3D images, containing pixel overdraw and dynamically generated intermediate images
US20180089091A1 (en) * 2016-09-26 2018-03-29 Intel Corporation Cache and compression interoperability in a graphics processor pipeline
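CN108694688A (zh) * 2017-04-07 2018-10-23 Intel Corporation Apparatus and method for managing data bias in a graphics processing architecture
CN109241555A (zh) * 2018-07-27 2019-01-18 Northwestern Polytechnical University Multi-primitive Gerber file parsing and drawing method for improving drawing accuracy
CN110291563A (zh) * 2017-02-15 2019-09-27 Microsoft Technology Licensing, LLC Multiple shader processes in graphics processing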

Also Published As

Publication number Publication date
CN115349136A (zh) 2022-11-15

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22949876

Country of ref document: EP

Kind code of ref document: A1