GB2474115A - Controlling the Reading of Arrays of Data from Memory - Google Patents

Controlling the Reading of Arrays of Data from Memory

Info

Publication number
GB2474115A
Authority
GB
United Kingdom
Prior art keywords
data
block
array
memory
processing device
Prior art date
Legal status
Granted
Application number
GB201016165A
Other versions
GB201016165D0 (en)
GB2474115B (en)
Inventor
Daren Croxford
Lars Ericsson
Jon Erik Oterhals
Current Assignee
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date
Filing date
Publication date
Priority claimed from GBGB0916924.4A (GB0916924D0)
Priority claimed from GBGB1014602.5A (GB201014602D0)
Application filed by ARM Ltd, Advanced Risc Machines Ltd
Publication of GB201016165D0
Publication of GB2474115A
Application granted
Publication of GB2474115B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures

Abstract

A display controller 7 reads blocks of data from a frame buffer 3 and stores them in a local memory buffer 8 of the display controller 7 before outputting the blocks of data to a display 2. The display controller 7 uses similarity meta-data 10 associated with the output frame in the frame buffer 3 to determine whether or not a new block of data to be processed for display is similar to a block of data already stored in the local memory 8 of the display controller 7. If it is determined that the data block to be processed is similar to a data block already stored in the local buffer 8 of the display controller 7, the display controller 7 does not read a new data block from the frame buffer 3 but instead provides the existing data block in its buffer 8 to the display 2.

Description

Methods of and Apparatus for Controlling the Reading of Arrays of Data from Memory
The present invention relates to the reading of arrays of data from memory for processing. One example of this is the operation of display controllers when processing images from a frame buffer for display.
As is known in the art, in many electronic devices and systems, arrays of data, such as images, will need to be processed. For example, an image that is to be displayed to a user will usually be processed by a so-called "display controller" of a display device for display.
Typically, the display controller will read the output image to be displayed from a so-called "frame buffer" in memory which stores the image as a data array and provide the image data appropriately to the display. In the case of a graphics processing system, for example, the output image of the graphics processing system will be stored in the frame buffer in memory when it is ready for display and the display controller will then read the frame buffer and provide it to the display (which may, e.g., be a screen or printer) for display.
As is known in the art, the frame buffer itself is usually stored in so-called "main" memory of the system in question, and that is therefore external to the display device and to the display controller. The reading of data from the frame buffer for display can therefore consume a relatively significant amount of power and memory bandwidth. For example, a new image frame may need to be read and displayed from the frame buffer at rates of 30 frames per second or higher, and each frame can require a significant amount of data, particularly for higher resolution displays and high definition (HD) graphics.
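By way of illustration only, the scale of the read traffic involved can be estimated with simple arithmetic. The resolution and pixel size used below (1920x1080 at 4 bytes per data position) are assumed figures chosen for the example and do not come from the present description; the function name is likewise hypothetical.

```python
def frame_buffer_read_rate(width: int, height: int,
                           bytes_per_pixel: int, frames_per_second: int) -> int:
    """Bytes per second read from the frame buffer if every frame is read in full."""
    return width * height * bytes_per_pixel * frames_per_second


# Assumed, illustrative figures: 1920x1080 frame, 4 bytes per pixel, 30 frames per second.
rate = frame_buffer_read_rate(1920, 1080, 4, 30)
print(rate)  # 248832000 bytes per second, i.e. roughly 249 MB/s
```

Under these assumed figures every block read that can be avoided removes a proportional share of this traffic.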
It is known therefore to be desirable to try to reduce the power consumption of frame buffer operations and various techniques have been proposed to try to achieve this.
These techniques include providing an on-chip (as opposed to external) frame buffer, frame buffer caching (buffering), frame buffer compression and dynamic colour depth control. However, each of these techniques has its own drawbacks and disadvantages.
For example, using an on-chip frame buffer, particularly for higher resolution displays, may require a large amount of on-chip resources. Frame buffer caching or buffering may not be practicable as frame generation is typically asynchronous to frame buffer display. Frame buffer compression can help, but the necessary logic is relatively complex, and the frame buffer format is altered. Lossy frame buffer compression will reduce image quality. Dynamic colour depth control is similarly a lossy scheme and therefore reduces image quality.
Other arrangements in which data arrays may need to be read from memory for processing include, for example, the situation where a CPU may need to read in an image generated by a graphics processor to modify it, and where a graphics processor may need to read in an externally generated texture that it is then to use in its graphics processing. These arrangements can also consume relatively significant memory bandwidth and power when reading the stored data array for processing.
The Applicants believe therefore that there remains scope for improvements to data array, such as frame buffer, reading operations.
According to a first aspect of the present invention, there is provided a method of processing an array of data in which a processing device processes the array of data by processing successive blocks of data each representing particular regions of the array of data and blocks of data representing particular regions of the array of data are read from a first memory in which the array of data is stored and stored in a memory of the processing device prior to the blocks of data being processed by the processing device; the method comprising: determining whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and either processing for the block of data to be processed a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
According to a second aspect of the present invention, there is provided a system comprising: a first memory for storing an array of data to be processed; a processing device for processing an array of data stored in the first memory, by processing successive blocks of data, each representing particular regions of the array of data, and the processing device having a local memory; a read controller configured to read blocks of data representing particular regions of an array of data that is stored in the first memory and to store the blocks of data in the local memory of the processing device prior to the blocks of data being processed by the processing device; and a controller configured to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
According to a third aspect of the present invention, there is provided a processing device for processing an array of data stored in a first memory, the processing device being configured to process the array of data by processing successive blocks of data, each representing particular regions of the array of data; and comprising: a local memory; a read controller configured to read blocks of data representing particular regions of an array of data that is stored in the first memory and to store the blocks of data in the local memory of the processing device prior to the blocks of data being processed by the processing device; and a controller configured to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
The present invention relates to and is implemented in systems in which an array of data to be processed (which could be, e.g., and in one preferred embodiment is, a frame to be displayed) is read from memory for processing by a processing device (which could be, e.g., and in one preferred embodiment is, a display controller) in the form of blocks of data that represent particular regions of the array of data.
In essence therefore, the present invention relates to and is intended to be implemented in systems in which data arrays to be processed by the system are read from memory and processed on a block-by-block basis, rather than directly as a single, overall, output "array".
As discussed above, this may be the case, for example, for the display of images generated by a tile-based graphics processing system. In this case, the display controller may process each frame for display from the frame buffer on a tile-by-tile basis (although as will be discussed further below, this is not essential, and, indeed, may not always be preferred).
(As is known in the art, in tile-based rendering, the two dimensional output array or frame of the rendering process (the "render target") (e.g., and typically, that will be displayed to display the scene being rendered) is sub-divided or partitioned into a plurality of smaller regions, usually referred to as "tiles", for the rendering process. The tiles (sub-regions) are each rendered separately (typically one after another). The rendered tiles (sub-regions) are then recombined to provide the complete output array (frame) (render target), e.g. for display.
Other terms that are commonly used for "tiling" and "tile based" rendering include "chunking" (the sub-regions are referred to as "chunks") and "bucket" rendering. The terms "tile" and "tiling" will be used herein for convenience, but it should be understood that these terms are intended to encompass all alternative and equivalent terms and techniques.)
In the present invention, rather than each data block (e.g. rendered tile) simply being read out of the memory where the data array is stored and processed in turn, when a data block is to be processed (e.g. for display), it is first determined whether that block is similar to a data block (e.g. tile) that is already stored in a (local) memory of the processing device (e.g. display controller) that is to process the data array. It is then determined whether to process an existing data block in the local memory or a new data block from the stored data array in memory as the data block to be processed on the basis of the similarity determination.
As will be discussed further below, the Applicants have found and recognised that this process can be used to reduce significantly the number of data blocks (e.g. rendered tiles) that will be read from main memory (e.g. the frame buffer) for processing in use, thereby significantly reducing the number of main memory (e.g. frame buffer) read transactions and hence the power and memory bandwidth consumption related to main memory (e.g. frame buffer) read operations.
It can also, accordingly, facilitate the use of lower performance, lower power memory systems, which may be particularly advantageous in the context of lower power, lower cost portable devices, for example.
For example, if it is found that a data block to be processed is the same as a data block (e.g. rendered tile) that is already present in the local memory of the processing device, it can be (and preferably is) determined to be unnecessary to read a data block from the stored data array to the processing device's local memory, thereby eliminating the need for that read "transaction". Thus, when the data block to be processed is determined to be similar to a data block already stored in the local memory of the processing device, preferably the (appropriate) existing block in the local memory of the processing device is processed by the processing device and vice-versa.
Moreover, the Applicants have recognised that, for example in the case of graphics processing, it may be a relatively common occurrence for a new data block (e.g. rendered tile) to be processed to be the same as or similar to a data block (e.g. rendered tile) that is already in the local memory of the, e.g., display controller. For example, in the case of graphics processing there will be regions of an image that will be similar to each other, such as the sky, sea, or other uniform background, or much of the user interface for many applications. By facilitating the ability to identify such regions (e.g. tiles) and to then, if desired, avoid reading such regions (e.g. tiles) to the local memory of the display controller again, a significant saving in read traffic (read transactions) to the local memory of the, e.g., display controller can be achieved.
Thus the present invention can be used to significantly reduce the power consumed and memory bandwidth used for frame buffer and memory read operations, in effect by facilitating the identification and elimination of unnecessary memory (e.g. frame buffer) read transactions.
Furthermore, compared to the prior art schemes discussed above, the present invention requires relatively little on-chip hardware, can be a lossless process, and doesn't change the data array (e.g. frame buffer) format. It can also readily be used in conjunction with, and is complementary to, existing output (e.g. frame buffer) power reduction schemes, thereby facilitating further power savings if desired.
As will be discussed further below, the present invention can also be used to avoid the writing of data blocks to the initial data array in the first place. Such write transaction elimination can lead to further memory (e.g. frame buffer) transaction power and memory bandwidth savings (although as the data array is likely to be read more times than it is written to (updated), eliminating read transactions may generally be more beneficial).
As discussed above, in a particularly preferred embodiment, the processing device determines whether to read a new data block from the data array in main memory into the local memory of the processing device or not on the basis of the similarity determination.
Thus, in a particularly preferred embodiment, if it is determined that a (e.g. the next) block of data to be processed is to be considered to be similar to a block of data already stored in the local memory of the processing device, a new block of data is not read from the data array in the main memory and stored in the local memory of the processing device, but instead the existing block of data in the local memory of the processing device is processed as the (e.g. next) block of data to be processed by the processing device.
On the other hand, if it is determined that a (e.g. the next) block of data to be processed is not to be considered to be similar to a block of data already stored in the local memory of the processing device, a new block of data is read from the data array in the main memory and stored in the local memory of the processing device, and then processed as the (e.g. next) block of data to be processed by the processing device.
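By way of illustration only, the two cases just described (reuse the existing block, or read a new one) can be sketched as follows. The class `BlockReader`, its method names, and the way the similarity result is passed in (`similar_to`, the index of a block judged similar, or `None`) are hypothetical choices made for this sketch and are not taken from the present description; the local memory is modelled as an unbounded dictionary for simplicity.

```python
from typing import Callable, Dict, Optional


class BlockReader:
    """Hypothetical read controller with a simple local block store."""

    def __init__(self, read_from_first_memory: Callable[[int], bytes]):
        self._read = read_from_first_memory
        self.local: Dict[int, bytes] = {}  # block index -> data held in local memory
        self.reads = 0                     # number of read transactions performed

    def get_block(self, index: int, similar_to: Optional[int]) -> bytes:
        # Similar block already held locally: process it and skip the read.
        if similar_to is not None and similar_to in self.local:
            return self.local[similar_to]
        # Otherwise read a new block from the first memory and store it locally.
        data = self._read(index)
        self.reads += 1
        self.local[index] = data
        return data


# Three blocks, of which the second matches the first.
frame = [b"sky", b"sky", b"sea"]
reader = BlockReader(lambda i: frame[i])
output = [reader.get_block(0, None), reader.get_block(1, 0), reader.get_block(2, None)]
print(output == frame, reader.reads)  # True 2
```

In this sketch the second block is served from local memory, so only two of the three blocks give rise to read transactions.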
As will be discussed further below, the similarity determination is preferably based on similarity information (meta-data) that is associated with the data blocks in question. The generation of such similarity information is a further aspect of the present invention. This is discussed in more detail below.
The present invention can be used in any system where data is stored as an array and read out to and processed by a processing device on a block-by-block basis. Thus it may be used, for example, in graphics processors, CPUs, video processors, composition engines, display controllers, etc.
In general the invention is useful in eliminating read transactions (and write transactions) where nearby data blocks in a data array to be processed are likely to be similar or the same. Thus, the scheme can be used to eliminate read transactions (and write transactions) when, for example, image data is transferred between any two of: a graphics processor (GPU), a CPU, a video processor, a camera controller, and a display controller.
For example, as well as the operation of a display controller as discussed above (potentially and typically processing images to be displayed in the form of blocks of data that represent the image), a video processor may generate an image that is to be transferred to a graphics processor for use as a texture, in which case the technique of the present invention could be used to eliminate read transactions when the graphics processor reads in the image (texture) for use. Likewise, a frame generated by a graphics processor may be manipulated by a CPU, in which case the CPU may be operated in the manner of the present invention to reduce the read transactions needed for the CPU to read the frame to manipulate it. This may also have the additional benefit that fewer cache lines can be used in the CPU.
Similarly a camera (video or still) may, e.g. process the image generated by its sensor on a block-by-block basis for storing in memory and subsequent provision to a data processing system, such as a computer, display, etc., that is to process the image.
The memory that the array of data is stored in may comprise any suitable such memory and may be configured in any suitable and desired manner. For example, it may be a memory that is on-chip with the processing device or it may be an external memory. In a preferred embodiment it is an external memory, such as a main memory of the system. It may be dedicated memory for this purpose or it may be part of a memory that is used for other data as well. In the case of a graphics processing system, in a preferred embodiment the memory that the data array is stored in is a frame buffer that the graphics processing system's output is to be provided to.
The array of data that is stored in the first (e.g. main) memory, and that is to be read out therefrom for processing can be any suitable and desired such array of data. It may, for example, comprise any suitable and desired array of data that a graphics processor may be used to generate. In a preferred embodiment it is data representing an image, e.g. that is to be displayed.
In one particularly preferred embodiment it comprises an output frame for display, but it may also or instead comprise other outputs of a graphics processor such as a graphics texture (where, e.g., the render "target" is a texture that the graphics processor is being used to generate (e.g. in "render to texture" operation) or other surface to which the output of the graphics processor system is to be written. It could also, e.g., comprise, as discussed above, an image generated by a video processor, or a CPU.
The processing device may be any device that is to read the data array (in a block-by-block fashion) and process it, e.g., for use or to alter its content. Thus it may, e.g., be, and in a preferred embodiment is, one of a display controller, a CPU, a video processor and a graphics processor.
The local memory of the processing device may similarly be any suitable such memory. It is preferably a buffer or cache memory of or associated with the processing device. The cache may be fully or set associative, for example.
As discussed above, in a particularly preferred embodiment, the present invention is implemented in respect of a data array generated by a graphics processing system (a graphics processor), in which case the data array to be processed is preferably an output frame to be displayed, and the first, main memory in which the data array is stored is preferably a frame buffer of the graphics processing system. Similarly, the processing device that is to process the data array is preferably a display controller of or for the display device (e.g. screen or printer) that the output frame is to be displayed on. It may also, e.g., be a CPU that is to manipulate a frame generated by the graphics processor, as discussed above.
The blocks of data that are processed (and compared) can each represent any suitable and desired region (area) of the overall array of data. So long as the overall array of data is divided or partitioned into a plurality of identifiable smaller regions each representing a part of the overall array, and that can accordingly be represented as blocks of data that can be identified and considered, then the sub-division of the array of data into blocks of data can be done as desired.
Each block of data preferably represents a different part (sub-region) of the overall data array (although the blocks could overlap if desired). Each block should represent an appropriate portion (area) of the data array, such as a plurality of data positions within the array. Suitable data block sizes would be, e.g., 8x8, 16x16 or 32x32 data positions in the data array.
In one particularly preferred embodiment, the array of data is divided into regularly sized and shaped regions (blocks of data), preferably in the form of squares or rectangles. However, this is not essential and other arrangements could be used if desired.
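By way of illustration only, a regular sub-division of a data array into square blocks might be enumerated as sketched below. The function name `block_regions` is hypothetical, and the clipping of blocks at the right and bottom edges (where the array size is not a multiple of the block size) is an assumption of this sketch rather than something the present description specifies.

```python
from typing import Iterator, Tuple


def block_regions(width: int, height: int,
                  block: int = 16) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for each block, in raster order."""
    for y in range(0, height, block):
        for x in range(0, width, block):
            # Assumption: edge blocks are clipped to the array bounds.
            yield x, y, min(block, width - x), min(block, height - y)


regions = list(block_regions(64, 32, 16))
print(len(regions))   # 8 blocks of 16x16 data positions
print(regions[-1])    # (48, 16, 16, 16)
```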
The similarity determination and consequent determination to process either a block of data that is already stored in the memory of the processing device or a new block of data from the array of data stored in the first memory may be performed in any desired and suitable manner and at any desired and suitable point and time as the data array is processed.
For example, the similarity determination and consequent data block selection may be (and in one preferred embodiment is) performed for each data block when it is the data block's turn to be processed. In this case, for example, it would be determined whether the next block of data to be processed after the current block of data that is being processed has been processed is similar to a block of data that is already stored in the memory of the processing device or not, and then a new or existing block of data processed for that next block of data accordingly.
However, in a particularly preferred embodiment, the similarity determination and consequent data block selection is performed in advance of the data blocks actually being processed. In this case, the similarity determination will be used to, for example, control the, in effect, "pre-fetching" of data blocks into the local memory of the processing device in advance of those data blocks then being taken from the local memory of the processing device and processed. This arrangement would be suitable where, for example, the processing device (e.g. display controller) operates by queuing data blocks to be processed in its local memory and then processes those blocks for display one-by-one from the queue. In such an arrangement, the similarity determination could be used to control the fetching of data blocks into the queue in the local memory (i.e. whether to, in effect, repeat a data block that is already in the queue or to fetch a new data block to the queue from the stored data array).
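By way of illustration only, the use of the similarity determination to control the filling of such a queue might be sketched as follows. The function `fill_queue` and its callback parameters are hypothetical; the sketch assumes that similarity is judged against the most recently queued block and, for simplicity, that the queue is unbounded.

```python
from collections import deque
from typing import Callable, Deque, Tuple


def fill_queue(block_count: int,
               similar_to_previous: Callable[[int], bool],
               fetch: Callable[[int], bytes]) -> Tuple[Deque[bytes], int]:
    """Queue blocks for processing, repeating a queued block where possible."""
    queue: Deque[bytes] = deque()
    fetches = 0
    for i in range(block_count):
        if queue and similar_to_previous(i):
            queue.append(queue[-1])   # repeat the block already in the queue
        else:
            queue.append(fetch(i))    # fetch a new block from the stored array
            fetches += 1
    return queue, fetches


stored = [b"a", b"a", b"b", b"b", b"b"]
queue, fetches = fill_queue(len(stored),
                            lambda i: stored[i] == stored[i - 1],
                            lambda i: stored[i])
print(list(queue) == stored, fetches)  # True 2
```

A practical queue would have a bounded depth and would be drained by the processing device as it is filled; both are omitted here.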
The determination of whether a new data block to be processed is similar to a block that is already stored in the local memory of the processing device (e.g. display controller) can be done in any suitable and desired manner. For example, a new data block to be read from the stored data array could be compared with a block or blocks that are already stored in the local memory to determine if the blocks are similar or not. Thus, for example, some of the content of the new data block could be compared with some or all of the content of a or the data block or blocks already stored in the local memory.
In a particularly preferred embodiment, information that is associated with the data array is used to determine whether any given blocks should be considered to be similar to each other or not. Thus, in a particularly preferred embodiment, rather than comparing the content of the data blocks themselves, the similarity determination process determines whether a data block to be processed is similar to a block that is already stored in the local memory using information that is associated with the array of data.
In other words, the similarity determination process preferably uses "meta-data" (information) that is associated with the data array to determine whether a data block to be processed is similar to a block that is already in the local memory of the processing device or not. As will be discussed further below, using meta-data associated with the data array for this purpose reduces the burden on the processing device and can provide a particularly effective mechanism for reducing the number of read transactions in use.
Any suitable form of meta-data (information) that can be used by the processing device to determine if the data blocks should be considered to be similar or not can be used (and associated appropriately with the stored array of data).
For example, the meta-data could comprise, and in one preferred embodiment does comprise, information to allow the processing device itself to assess whether the data blocks should be considered to be similar to each other or not.
In one preferred such embodiment, the information (meta-data) that is associated with the array of data and that is to be used to determine if the blocks of data are similar or not comprises information representative of and/or derived from the content of the data blocks in question. In this case, the similarity determination process preferably then determines whether the respective data blocks are similar or not by comparing information representative of and/or derived from the content of the new data block with information representative of and/or derived from the content of the data block that is already stored in the local memory.
The information representative of the content of each data block in these arrangements may take any suitable form, but is preferably based on or derived from the content of the data block. Most preferably it is in the form of a "signature" for the data block which is generated from or based on the content of the data block. Such a data block content "signature" may comprise, e.g., and preferably, any suitable set of derived information that can be considered to be representative of the content of the data block, such as a checksum, a CRC, or a hash value, etc., derived from (generated for) the data block. Suitable signatures would include standard CRCs, such as CRC32, or other forms of signature such as MD5, SHA-1, etc.
Thus, in one particularly preferred embodiment, a signature indicative or representative of, and/or that is derived from, the content of the data block is generated for each data block that is to be compared, and the similarity determination process compares the signatures of the respective data blocks to determine if the blocks are similar or not.
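By way of illustration only, a signature comparison of this kind might be sketched using a CRC32, one of the signature types named above. The function names are hypothetical, and treating signature equality as the similarity test is an assumption of this sketch.

```python
import zlib


def block_signature(block: bytes) -> int:
    """CRC32 of the block content, as an unsigned 32-bit value."""
    return zlib.crc32(block) & 0xFFFFFFFF


def signatures_match(new_signature: int, stored_signature: int) -> bool:
    # Assumption: blocks are considered similar when their signatures are equal.
    return new_signature == stored_signature


# Two identical 16x16 RGBA blocks, and one differing in a single byte.
block_a = bytes(16 * 16 * 4)
block_b = bytes(16 * 16 * 4)
block_c = b"\x01" + bytes(16 * 16 * 4 - 1)

print(signatures_match(block_signature(block_a), block_signature(block_b)))  # True
print(signatures_match(block_signature(block_a), block_signature(block_c)))  # False
```

Only the signatures need be compared, so the content of the new block need not be read in order to decide whether to read it.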
It would, e.g., be possible to generate a single signature for an, e.g., RGBA, data block (e.g. rendered tile), or a separate signature (e.g. CRC) could be generated for each colour plane. Similarly, colour conversion could be performed and a separate signature generated for the Y, U, V planes if desired.
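By way of illustration only, per-plane signatures for an RGBA block might be generated as sketched below. The sketch assumes the block is held as interleaved 8-bit R, G, B, A samples; that layout, and the function name `plane_signatures`, are assumptions rather than details from the present description.

```python
import zlib
from typing import Tuple


def plane_signatures(rgba: bytes) -> Tuple[int, int, int, int]:
    """Return one CRC32 per colour plane (R, G, B, A) of an interleaved block."""
    if len(rgba) % 4 != 0:
        raise ValueError("expected interleaved RGBA data")
    # rgba[c::4] selects every fourth byte starting at channel c.
    return tuple(zlib.crc32(rgba[c::4]) & 0xFFFFFFFF for c in range(4))


opaque = bytes([10, 20, 30, 255]) * 64
# Same colour data, but the last alpha sample differs.
changed_alpha = opaque[:-1] + b"\x00"

sig_opaque = plane_signatures(opaque)
sig_changed = plane_signatures(changed_alpha)
print(sig_opaque[:3] == sig_changed[:3])  # True: R, G and B planes match
print(sig_opaque[3] == sig_changed[3])    # False: A plane differs
```

Separate signatures of this kind allow a match to be assessed plane by plane rather than only for the block as a whole.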
As will be appreciated by those skilled in the art, the longer the signature that is generated for a data block is (the more accurately the signature represents the data block), the less likely it is that there will be a false "match" between signatures (and thus, e.g., the erroneous non-reading of a new data block). Thus, in general, a longer or shorter signature (e.g. CRC) could be used, depending on the accuracy desired (and as a trade-off relative to the memory and processing resources required for the signature generation and processing, for example).
The signatures could also be weighted towards a particular aspect or aspects of the content of the data blocks to allow, e.g., a given overall length of signature to provide better overall results by weighting the signature to those parts of the data block content (data) that will have more effect on the overall output (e.g. as perceived by a viewer of the image that the data array represents).
It would also be possible to use different length signatures for different applications, etc., depending upon the, e.g., application's, e.g., display, requirements. This may further help to reduce power consumption. Thus, in a preferred embodiment, the length of the signature that is used can be varied in use.
Preferably the length of the signature can be changed depending upon the application in use (can be tuned adaptively depending upon the application that is in use).
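By way of illustration only, a signature of selectable length might be obtained by keeping only some of the bits of a longer signature, as sketched below. Truncating a CRC32 in this way, the function names, and the idealised estimate of the false match rate (which assumes signatures are uniformly distributed, something a truncated CRC does not guarantee) are all assumptions of this sketch.

```python
import zlib


def signature(block: bytes, bits: int = 32) -> int:
    """Return the low `bits` bits of the block's CRC32 (1 <= bits <= 32)."""
    if not 1 <= bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    return zlib.crc32(block) & ((1 << bits) - 1)


def ideal_false_match_probability(bits: int) -> float:
    # Idealised assumption: two different blocks collide with probability 2**-bits.
    return 2.0 ** -bits


block = bytes(range(256))
print(signature(block, 16) == signature(block, 32) & 0xFFFF)  # True
print(ideal_false_match_probability(16))  # 1.52587890625e-05
```

A shorter signature costs less to store and compare, at the price of a higher chance of a false match.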
In a particularly preferred arrangement of these embodiments, the data block signatures are "salted" (i.e. have another number (a salt value) added to the generated signature value) when they are created. The salt value may conveniently be, e.g., the data array (e.g. frame) number since boot, or a random value. This will, as is known in the art, help to make any error caused by any inaccuracies in the signature comparison process non-deterministic (i.e. avoid, for example, the error always occurring at the same point for repeated viewings of a given sequence of images such as, for example, where the process is being used to display a film or television programme).
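By way of illustration only, a literal reading of this salting step (a salt value added to the generated signature) might be sketched as follows. The function name, the use of CRC32, and the wrapping of the sum to 32 bits are assumptions of this sketch; it is also assumed that any two signatures being compared were created with the same salt.

```python
import zlib

MASK32 = 0xFFFFFFFF


def salted_signature(block: bytes, salt: int) -> int:
    """CRC32 of the block with a salt value added, wrapped to 32 bits."""
    return (zlib.crc32(block) + salt) & MASK32


block = bytes(16 * 16 * 4)
frame_number = 7  # assumed salt: the frame number since boot

print(salted_signature(block, frame_number) == salted_signature(block, frame_number))      # True
print(salted_signature(block, frame_number) == salted_signature(block, frame_number + 1))  # False
```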
In the above arrangements, the similarity determination process uses meta-data (information) associated with two (or more) data blocks to determine whether a new data block to be processed is similar to a data block that is already stored in the local memory of the processing device.
However, in another particularly preferred embodiment, the meta-data (information) that is associated with the data array is in the form of similarity information that indicates directly whether a given data block in the data array is similar to another block in the data array. In this case, the processing device can simply read the meta-data to determine if a new data block is to be considered to be similar to a data block that is already stored in the local memory of the processing device or not: there is no need for the processing device to carry out any form of similarity assessment of the blocks themselves using the meta-data.
This reduces the processing requirements on the processing device during the data array processing operation.
Thus, while in one preferred embodiment the information (meta-data) that is associated with the array of data in the first (main) memory comprises information that can be used to assess the similarity between respective data blocks (such as data block "signatures", as discussed above), in a particularly preferred embodiment, this information (meta-data) comprises information indicating (directly) whether a respective data block can be considered to be similar to another data block in the data array or not.
Where the meta-data indicates directly whether a data block can be considered to be similar to another data block in the data array or not, the meta-data can take any suitable and desired form to do that. It could, for example, comprise a hierarchical quad-tree. In a preferred embodiment it is in the form of a (2D) bitmap.
In one particularly preferred such embodiment, the meta-data (e.g. bit-map) represents the data blocks to be read from the data array and each meta-data (e.g. bitmap) entry indicates for a corresponding data block whether that data block is similar to another data block in the data array or not. Most preferably each data block position in the data array has associated with it a meta-data entry indicating whether that block is similar to another block (or not). In this case, the similarity determination process need simply read the relevant meta-data entry for the data block position in question to determine whether the data block is similar to a data block that is already stored in the local memory of the processing device or not.
Thus, in a particularly preferred embodiment, the data array has associated with it meta-data, such as a bitmap, indicating for each respective data block in the data array whether that data block is similar to another data block in the data array, and the similarity determination process (processing device) determines whether a new data block to be processed is similar to a data block that is already stored in the local memory of the processing device using the relevant meta-data for the new data block.
In these arrangements, the meta-data can be constructed and arranged as desired. For example, it could and in one preferred embodiment does, simply indicate whether a data block is similar to the immediately preceding data block in the data array or not. In this case each meta-data entry need comprise only a single-bit, with one value (e.g. "1") indicating that the corresponding block is similar to the immediately preceding block and the other value (e.g. "0") indicating that it is not.
To facilitate this, the data blocks should be processed in a particular, predefined order (both for writing them to the data array and reading them from that array). Preferably an order that can exploit any spatial coherence between the blocks is used.
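The single-bit arrangement can be illustrated with a minimal sketch. The function names `build_prev_similarity_bits` and `read_with_bits` are hypothetical, blocks are modelled as `bytes` objects, and exact equality stands in for whatever similarity test is actually used; none of these details come from the description itself.

```python
from typing import List, Sequence, Tuple


def build_prev_similarity_bits(blocks: Sequence[bytes]) -> List[int]:
    """One bit per block: 1 if the block matches its immediately preceding block."""
    bits = []
    for i, block in enumerate(blocks):
        # The first block has no predecessor, so it is always marked 0.
        # Exact equality is an assumption standing in for the similarity test.
        bits.append(1 if i > 0 and block == blocks[i - 1] else 0)
    return bits


def read_with_bits(blocks: Sequence[bytes], bits: Sequence[int]) -> Tuple[List[bytes], int]:
    """Process blocks in the same predefined order, fetching only when the bit is 0."""
    local = None   # the block currently held in the (simulated) local memory
    reads = 0      # number of simulated fetches from the main memory
    processed = []
    for block, bit in zip(blocks, bits):
        if bit == 0:
            local = block
            reads += 1
        processed.append(local)
    return processed, reads
```

Because each bit refers to "the preceding block", the writer and the reader must traverse the block positions in the same order, which is why a predefined order is needed.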
It would also be possible to use a more sophisticated meta-data arrangement, for example where data blocks are not just considered in relation to their immediately preceding data block but in relation to more than one data block in the data array. In this case the meta-data (e.g. bitmap entry) associated with each respective block position should indicate not only that the corresponding data block is similar to another data block in the data array but also which data block in the data array it is similar to. In this case the meta-data (e.g. bitmap entry) associated with each data block position will be larger than a single bit as more information is being conveyed for each block position. The actual size of the meta-data entries will depend, e.g., on how many data blocks in the data array each data block is to be compared with for similarity purposes (as that then determines how many possible similar block permutations each meta-data entry has to be able to represent).
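As a rough illustration of how the entry size grows with the number of candidate blocks, the following sketch computes the bits needed per entry. It assumes (this is not stated in the description) that one code value is reserved for "not similar to any candidate" and that the remaining values each name one candidate block; `entry_bits` is a hypothetical helper.

```python
def entry_bits(num_candidates: int) -> int:
    """Bits per meta-data entry for `num_candidates` reference blocks.

    Assumes num_candidates + 1 distinct states: one "not similar" state
    plus one state per candidate block.
    """
    if num_candidates < 1:
        raise ValueError("at least one candidate block is required")
    # For n >= 1, ceil(log2(n + 1)) equals n.bit_length().
    return num_candidates.bit_length()
```

Under this assumption a single candidate (the preceding block) needs one bit, three candidates need two bits, and seven candidates need three bits.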
In these arrangements, each similarity value (meta-data entry) can, e.g., give a relative indication of which other data block in the data array the data block in question is similar to (such that, e.g., "001" indicates the previous data block relative to the current data block), or an absolute indication of which other data block in the data array the data block in question is similar to (such that, e.g., meta-data "125" indicates the block is similar to the 125th data block in the data array in question).
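The difference between the relative and absolute forms can be sketched as follows. The helper `resolve_reference` is hypothetical, and the encoding is an assumption made for illustration: an entry of 0 means "not similar", a relative entry k means "k blocks before the current one", and an absolute entry is a 1-based ordinal (so "125" names the 125th block, at zero-based index 124).

```python
from typing import Optional


def resolve_reference(entry: int, current_index: int, relative: bool) -> Optional[int]:
    """Return the zero-based index of the block that `entry` points at, or None."""
    if entry == 0:
        return None  # assumed encoding for "not similar to any other block"
    if relative:
        target = current_index - entry   # e.g. 0b001 -> the previous block
    else:
        target = entry - 1               # e.g. 125 -> the 125th block
    if target < 0:
        raise ValueError("entry points outside the data array")
    return target
```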
The choice of the size of the meta-data entries will be a trade-off or optimisation between the overhead for preparing and storing the meta-data and the potentially greater number of read transactions that will be eliminated if the meta-data can indicate similarity to a greater number of other data blocks in the data array. The choice of the meta-data arrangement to use can therefore be made based, e.g., on these criteria and, e.g. the expected or anticipated use or implementation conditions of the system.
(It should also be noted here that the use of meta-data in the manner of the present embodiments can facilitate using much smaller data block sizes (such as at the level of cache lines), as the meta-data overhead per data block can be relatively small.)
In these arrangements, it would also be possible to include with each meta-data entry a "likeness" value that indicates how similar the respective data blocks are. The similarity determination process could then, e.g., use this likeness value to determine whether to read a new block from the data array or to re-use the already existing similar data block in the local memory of the processing device in use. For example, the similarity determination process could set a likeness value threshold, and compare the likeness value for a new data block to that threshold and read in the new data block or not, accordingly. This would then allow the read process to be modified, e.g. to provide for a more or less accurate data array reading process, in use, for example by varying the likeness value threshold in use.
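A likeness threshold of this kind might be applied as in the sketch below. The names `must_read_block` and `count_reads` are hypothetical, and the scale is an assumption: likeness is taken to be an integer where a higher value means the blocks are more alike.

```python
from typing import Iterable


def must_read_block(likeness: int, threshold: int) -> bool:
    """True if the new block must be fetched because it is not alike enough."""
    return likeness < threshold


def count_reads(likeness_values: Iterable[int], threshold: int) -> int:
    """Number of blocks that would be fetched for a given threshold."""
    return sum(1 for value in likeness_values if must_read_block(value, threshold))
```

Raising the threshold gives a more accurate but more read-intensive process; lowering it re-uses local blocks more aggressively.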
In a further preferred embodiment, the meta-data (similarity information) that is associated with the data array is in the form of a command list that instructs the processing device to read the data blocks into the local memory of the processing device according to their relative similarities. For example, a command list could be prepared that says: read block 1 into the local memory of the processing device, repeat that block for the next three blocks, then read in the 5th data block from the data array into the local memory, repeat that block once, evict the first data block from the local memory, read in the 7th block from the data array, read in and process the 8th block from the data array, and so on. Such a command list could be generated directly, or, for example, a similarity bitmap could first be generated and then parsed to create a command list that is then stored for the data array.
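Parsing a similarity bitmap into a command list could look roughly like the following sketch. It assumes the single-bit "similar to the preceding block" bitmap, uses a hypothetical two-command vocabulary (`READ`, `REPEAT`), and omits eviction commands for brevity.

```python
from typing import List, Sequence, Tuple


def bitmap_to_commands(bits: Sequence[int]) -> List[Tuple[str, int]]:
    """Turn preceding-block similarity bits into READ/REPEAT commands.

    ("READ", i) means fetch the block at zero-based position i;
    ("REPEAT", n) means re-use the last fetched block for the next n positions.
    """
    commands: List[Tuple[str, int]] = []
    for index, bit in enumerate(bits):
        if bit == 0 or not commands:
            # A block with no usable predecessor must always be fetched.
            commands.append(("READ", index))
        elif commands[-1][0] == "REPEAT":
            commands[-1] = ("REPEAT", commands[-1][1] + 1)
        else:
            commands.append(("REPEAT", 1))
    return commands
```

For the bitmap `[0, 1, 1, 1, 0, 1, 0, 0]` this yields a read of the first block, three repeats, a read of the fifth block, one repeat, and then reads of the seventh and eighth blocks, which mirrors the example in the text apart from the eviction step.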
Where similarity meta-data (information) is associated with the data array, it will be necessary to also generate the necessary meta-data that is to be associated with the data array. The present invention also extends, in its preferred embodiments at least, to the generation of the meta-data.
The meta-data may be generated and associated with the data array in any desired and suitable manner. It is preferably generated as the data array is being generated. In one preferred embodiment the meta-data is generated by the device that is generating the data array (which device may, as discussed above, be a graphics processor, a video processor, a camera controller (processing data generated by the camera's sensor), or a CPU, for example).
Where the meta-data comprises content "signatures" for each data block, those signatures could be generated as the data blocks are generated and then stored in association with the generated data blocks in an appropriate manner.
In the case where the meta-data indicates directly whether a data block can be considered to be the same as another data block, such as the "similarity" bitmap discussed above, then the data array generation process preferably includes comparing the blocks of data as they are generated and generating the similarity information, e.g., bitmap, accordingly.
In this case, the data block comparison could be done, e.g., by comparing information, such as the signatures discussed above, representative of and/or derived from the content of a data block with information representative of and/or derived from the content of another data block, so as to assess the similarity or otherwise of the data blocks.
However, in a particularly preferred embodiment, the actual content of the blocks (rather than some representation of their content) is compared to determine if the blocks are to be considered to be similar or not. To do this, some or all of the content of a data block of the data array may be compared with some or all of the content of another data block (or blocks) of the data array. Comparing some or all of the actual content of the data blocks may reduce complexity and reduce errors in the comparison process.
The comparison process preferably uses some form of threshold criteria to determine if a block should be considered to be similar to another block or not. For example, and preferably, if a selected number of the bits of the respective block's contents match, the blocks are considered to be similar. Preferably there is some maximum visual deviation between the blocks that is permitted (where the data array represents an image).
Most preferably, a maximum deviation, such as an amount of differences in the LSB of the pixels, is allowed before blocks are considered not to be similar.
Preferably this threshold, e.g. maximum content deviation, can be varied (e.g. programmed) in use. It could, for example, be set per application, based on the proportion of static and dynamic frame data, and/or based on the power mode (e.g. low power mode or not) in use, etc.
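One simple way to realise a programmable content-deviation threshold is sketched below. It is only an illustration under stated assumptions: blocks are sequences of 8-bit values, deviation is approximated by ignoring a configurable number of least significant bits, and a configurable number of remaining mismatches is tolerated. The function `blocks_similar` and its parameters are hypothetical.

```python
def blocks_similar(a: bytes, b: bytes, ignored_lsbs: int = 1, max_mismatches: int = 0) -> bool:
    """Compare block contents directly, tolerating differences in the low bits."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same size")
    if not 0 <= ignored_lsbs <= 8:
        raise ValueError("ignored_lsbs must be between 0 and 8")
    # Keep only the upper bits of each 8-bit value.
    mask = (0xFF << ignored_lsbs) & 0xFF
    mismatches = sum(1 for x, y in zip(a, b) if (x & mask) != (y & mask))
    return mismatches <= max_mismatches
```

Masking low bits is a simplification rather than a true numerical deviation (values that straddle a bit boundary, such as 15 and 16, still count as different), but it shows how the two parameters could be reprogrammed per application or per power mode.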
In one particularly preferred embodiment, the blocks of data that are considered each comprise one cache line of the local memory of the processing device, or a 2D sub-tile of the data array (where the array is made up of separate tiles, such as would be the case for a tile-based graphics processing system).
These are particularly effective implementations because they use units of stored data that can be efficiently manipulated by the processing elements of, and that can be fetched efficiently from memory by, a processing device that is to process the data array.
In a graphics processing system, in one preferred embodiment each data block corresponds to a rendered tile that the graphics processor produces as its rendering output. This is beneficial, as the graphics processor will generate the rendering tiles directly, and so there will be no need for any further processing to "produce" the data blocks that will be considered and compared.
In these arrangements, the (rendering) tiles that the render target (the data array) is divided into for rendering purposes can be any desired and suitable size or shape. The rendered tiles are preferably all the same size and shape, as is known in the art, although this is not essential. In a preferred embodiment, each rendered tile is rectangular, and preferably 8x8, 16x16 or 32x32 sampling positions in size.
In another particularly preferred embodiment, data blocks of a different size and/or shape to the tiles that the rendering process operates on (produces) may be, and preferably are, used.
For example, in a preferred embodiment, a or each data block that is considered and compared may be made up of a set of plural "rendered" tiles, and/or may comprise only a sub-portion of a rendered tile. In these cases there may be an intermediate stage that, in effect, "generates" the desired data block from the rendered tile or tiles that the graphics processor generates.
In one preferred embodiment, the same block (region) configuration (size and shape) is used across the entire array of data. However, in another preferred embodiment, different block configurations (e.g. in terms of their size and/or shape) are used for different regions of a given data array. Thus, in one preferred embodiment, different data block sizes may be used for different regions of the same data array.
In a particularly preferred embodiment, the block configuration (e.g. in terms of the size and/or shape of the blocks being considered) can be varied in use, e.g. on a data array (e.g. output frame) by data array basis. Most preferably the block configuration can be adaptively changed in use, for example, and preferably, depending upon the number or rate of read (and/or write) transactions that are being eliminated (avoided). For example, and preferably, if it is found that using a particular block size only results in a low probability of a block not needing to be read from the main memory, the block size being considered could be changed for subsequent arrays of data (e.g., and preferably, made smaller) to try to increase the probability of avoiding the need to read blocks of data from the main memory.
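An adaptive policy of this kind might be sketched as follows. The function `next_block_size`, the 10% cut-off and the minimum size of 8 are all assumptions chosen for illustration; the description only says that the block size could be made smaller when few reads are being avoided.

```python
def next_block_size(current: int, reads_avoided: int, blocks_total: int,
                    low_rate: float = 0.1, min_size: int = 8) -> int:
    """Choose the block edge length for the next data array (e.g. frame).

    If only a small fraction of block reads was avoided with the current
    size, halve the size (down to min_size) to improve the chance of matches.
    """
    if blocks_total <= 0:
        raise ValueError("blocks_total must be positive")
    avoided_rate = reads_avoided / blocks_total
    if avoided_rate < low_rate and current > min_size:
        return current // 2
    return current
```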
Where the data block size is varied in use, then that may be done, for example, over the entire data array, or over only particular portions of the data array, as desired.
A data block can be compared with one, or with more than one, other data block. Preferably the comparison is done by storing the respective blocks in an on-chip buffer/cache.
In one preferred embodiment, a data block is compared with a single stored data block only, preferably its immediately preceding data block in the data array.
In another preferred embodiment, a data block is compared to plural other data blocks of the data array. This may help to further reduce the number of data blocks that need to be read from the data array, as it may allow the reading of data blocks that are similar to data blocks in other positions in the data array to be eliminated.
Where a data block is compared to plural other data blocks of the data array, then while each data block could be compared to all the data blocks of the data array, preferably each data block is only compared to some, but not all, of the other data blocks of the data array, such as, and preferably, to those data blocks in the same area of the data array as the data block in question (e.g. those data blocks covering and surrounding the position of the data block). This will provide an increased likelihood of detecting data block matches, without the need to check all the data blocks in the data array. Most preferably a data block is compared to the data blocks on the same line in the data array (in the order that the blocks are being generated in).
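Restricting the comparison to blocks on the same line can be expressed as a small index calculation. The sketch below assumes blocks are numbered in raster order with `blocks_per_line` blocks per line and that only earlier blocks (already generated) are candidates; `same_line_candidates` and `depth` are hypothetical names.

```python
from typing import List


def same_line_candidates(index: int, blocks_per_line: int, depth: int) -> List[int]:
    """Indices of up to `depth` earlier blocks on the same line, nearest first."""
    if blocks_per_line < 1 or depth < 0 or index < 0:
        raise ValueError("invalid arguments")
    line_start = (index // blocks_per_line) * blocks_per_line
    first = max(line_start, index - depth)
    return list(range(index - 1, first - 1, -1))
```

The `depth` parameter corresponds to the search depth that could be varied in use, which in turn sets how many candidates each meta-data entry has to be able to name.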
It would also be possible to vary the number of other data blocks that each data block is compared with in use, e.g. on a frame-by-frame basis. Varying the data block comparison search depth would allow the meta-data width to be varied.
In one preferred embodiment, each and every data block of the data array is compared with another data block or blocks. However, this is not essential, and so in another preferred embodiment, the comparison is carried out in respect of some but not all of the data blocks of a given data array (e.g. output frame).
In a particularly preferred embodiment, the number of data blocks that are compared with another data block or blocks for respective data arrays is varied, e.g., and preferably, on a data array by data array (e.g. frame-by-frame), or over sequences of data arrays (e.g. frames), basis. This is preferably based on the expected correlation (or not) between successive data arrays (e.g. frames).
Thus the meta-data generation process preferably comprises means for or a step of selecting the number of the data blocks in the data array that are to be compared with another data block or blocks for a given data array.
In a particularly preferred embodiment, the number of data blocks that are compared can be, and preferably is, different for different regions of the data array.
In a preferred embodiment, it is possible for a software application (e.g. that is triggering the generation of the data array) to indicate and control which regions of the data array the data block comparison process should be performed for. This would then allow the comparison process to be "turned off" by the application for regions of the data array the application "knows" will always be different.
This may be achieved as desired. In a preferred embodiment registers are provided that enable/disable data block (e.g. rendered tile) comparisons for data array regions, and the software application then sets the registers accordingly (e.g. via the graphics processor driver).
As discussed above, it is believed that the generation of "similarity" meta-data for data blocks of an array of data to be processed may be new and advantageous in its own right.
Thus, according to a fourth aspect of the present invention, there is provided a method of generating meta-data for use when processing an array of data that is stored in memory, the method comprising: for each of one or more blocks of data representing particular regions of an array of data to be processed: determining whether the block of data should be considered to be similar to another block of data for the data array; and storing similarity information indicating whether the block of data was determined to be similar to another block of data for the data array in association with the array of data.
According to a fifth aspect of the present invention, there is provided a data processing system, comprising: a data processor for generating an array of data for processing; means for determining for each of one or more blocks of data representing particular regions of the array of data whether the block of data should be considered to be similar to another block of data for the data array; and means for storing similarity information indicating whether a block of data was determined to be similar to another block of data for the data array in association with the array of data.
According to a sixth aspect of the present invention, there is provided a data processor comprising: means for generating an array of data for processing; means for determining for each of one or more blocks of data representing particular regions of the array of data whether the block of data should be considered to be similar to another block of data for the data array; and means for storing similarity information indicating whether a block of data was determined to be similar to another block of data for the data array in association with the array of data.
As will be appreciated by those skilled in the art, these aspects and embodiments of the invention can and preferably do include any one or more or all of the preferred and optional features of the invention described herein, as appropriate. Thus, for example, the similarity indicating information is preferably in the form of a bitmap that is associated with the array of data. The similarity of the data blocks is preferably determined by comparing the data blocks, preferably by comparing their content directly. The array of data is preferably data representing an image, and the data processor (the data array generating processor) is preferably a graphics processor (but it may also be a video processor or a CPU, for example).
Preferably in these aspects and arrangements, the system generates, as discussed above, the output data array together with a set of associated similarity information (meta-data) indicating which regions (blocks) in the output data array are the same (can be considered to be similar).
Most preferably the entire data array is divided into appropriate data blocks and it is determined, for each data block that the data array is divided into, whether that data block is similar to another data block of the data array or not (and similarity information stored for the data block accordingly).
In a particularly preferred embodiment, the similarity information is generated as the data array is being written to the memory (i.e. as the data array is being generated). This avoids the need to process the data array once it has been generated to generate the similarity information. In this case, the data array is preferably generated by writing data to the data array in blocks, and as each new block is generated for writing to the array, it is preferably determined whether that block is similar to another block that has already been generated for the data array and its similarity information (meta-data) generated accordingly.
Thus, in a particularly preferred embodiment, the array of data is stored in memory (e.g. the frame buffer) by writing blocks of data representing particular regions of the array of data to the stored array in memory, and when a new block of data is generated for the data array, it is determined whether that new block of data should be considered to be similar to a block of data that has already been generated for the data array, and the similarity information indicating whether that new block of data was determined to be similar to a block of data that had already been generated for the data array is generated and stored in association with the array of data accordingly.
In these arrangements, the data blocks are preferably buffered or cached in a local memory for the similarity information generation process, to avoid having, e.g., to read blocks from the main memory where the data array is to be stored in order to generate the similarity information.
It would also or instead be possible, e.g., to generate "signatures" (as discussed above) for blocks of data as the array is generated, and then use the signatures to generate further similarity information, such as a similarity bitmap, for the data array.
In the above aspects and embodiments, the meta-data (information), such as the block similarity bitmap and/or signatures for the data blocks, that is associated with the data array and that is to be used when the data array is processed should be stored appropriately. In a preferred embodiment it is stored with the data array in memory (in the first memory). However, this need not be the case, and the similarity meta-data could if desired be stored in a different location to the array of data, such as any other suitable location in the system. Indeed, as the similarity meta-data may be relatively small, it could, e.g., be stored in an on-chip memory or buffer, rather than in off-chip memory, if desired.
When the meta-data is to be used, it can be retrieved appropriately by the processing device. Preferably the meta-data, e.g. signatures, for one or more data blocks, and preferably for a plurality of data blocks is cached locally to the processing device, e.g. on the processing device itself, for example in an on-chip meta-data, e.g. signature, buffer. This may avoid the need to fetch the meta-data from an external memory every time a block similarity assessment is to be made, and so help to reduce the memory bandwidth used for reading the meta-data.
Most preferably, the meta-data for a data array that is being processed is retrieved (read) in portions (corresponding to plural blocks of the data array) in advance of the reading and processing of the data blocks to which it relates. Thus, the similarity meta-data (information) is preferably pre-fetched for the reading process. This can allow the similarity determination to be performed more rapidly.
Where the meta-data, such as data block signatures, is cached locally on the processing device, e.g., stored in an on-chip buffer, then the data blocks are preferably processed in a suitable order, such as a Hilbert order, so as to increase the likelihood of matches with the data block(s) whose meta-data is cached locally (stored in the on-chip buffer).
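A Hilbert ordering of block positions can be generated with the standard index-to-coordinate conversion for the Hilbert curve, sketched below. It assumes the array is a square grid of `n` by `n` blocks with `n` a power of two; the function names are hypothetical.

```python
from typing import List, Tuple


def hilbert_point(n: int, d: int) -> Tuple[int, int]:
    """Map distance d along a Hilbert curve to (x, y) on an n-by-n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n: int) -> List[Tuple[int, int]]:
    """Block coordinates of an n-by-n grid in Hilbert order (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return [hilbert_point(n, d) for d in range(n * n)]
```

Consecutive positions in this order are always adjacent in the grid, so recently processed blocks (and their cached meta-data) tend to be spatial neighbours of the next block.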
Although, as will be appreciated by those skilled in the art, the generation and storage of meta-data for data blocks (e.g. rendered tiles) will require some processing and memory resource, the Applicants believe that this will be outweighed by the potential savings in terms of power consumption and memory bandwidth that can be provided by then using that data in the manner discussed above.
As will be appreciated by those skilled in the art, in a particularly preferred embodiment, the generated data array and meta-data is then read and used by a processing device in the manner discussed above.
Thus, according to a further aspect of the present invention, there is provided a method of processing an array of data, the method comprising: generating an array of data to be processed; for each of one or more blocks of data representing particular regions of the array of data to be processed: determining whether the block of data should be considered to be similar to another block of data of the data array; and generating similarity information indicating whether the block of data was determined to be similar to another block of data of the data array; storing the array of data and its associated generated similarity information in a first memory; reading blocks of data each representing particular regions of the array of data from the first memory and storing them in a memory of a processing device that is to process the data array, prior to the blocks of data being processed by the processing device; using the similarity information generated for the data array to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device; and either processing for the block of data to be processed a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
According to another aspect of the present invention, there is provided a data processing system, comprising: a first memory for storing an array of data to be processed; a data processor for generating an array of data to be processed; means for determining for each of one or more blocks of data representing particular regions of the array of data whether the block of data should be considered to be similar to another block of data of the data array; means for generating similarity information indicating whether the block of data was determined to be similar to another block of data of the data array; means for storing the array of data and its associated generated similarity information in the first memory; and a processing device for processing the array of data stored in the first memory, by processing successive blocks of data, each representing particular regions of the array of data, the processing device having a local memory; a read controller configured to read blocks of data representing particular regions of an array of data that is stored in the first memory and to store the blocks of data in the local memory of the processing device prior to the blocks of data being processed by the processing device; and control circuitry configured to use the similarity information generated for the data array to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
As will be appreciated by those skilled in the art, these aspects and arrangements can, and preferably do, include one or more or all of the preferred and optional features of the invention discussed herein, as appropriate.
Although as discussed above the present technology is particularly concerned with the process of reading data from memory for use, the Applicants have recognised that the principles of the present technology can also be used to improve the process of writing the data array to memory in the first place. For example, and in particular, the Applicants have recognised that if a data block is determined to be sufficiently similar to a block that has already been generated for the data array then it may be unnecessary to also store the new data block in the data array.
Thus, in a particularly preferred embodiment, when the data blocks for the data array are being written to the data array in memory, a completed data block (e.g. rendered tile) is not written to the data array in memory if it has been determined that that data block should be considered to be similar to a data block that has already been generated for the data array (i.e. that will already be stored in the data array). This thereby avoids writing to the data array a data block that has been determined to be the same as a data block that will already be stored in the data array.
In this case therefore, as each data block to be written to the data array is generated, it may be compared with another data block or blocks of the data array and the new data block then written or not to the data array on the basis of that comparison.
Thus, in a particularly preferred embodiment, there is a step of or means for, when a data block for the data array has been completed, comparing that data block to at least one other data block of the data array, and determining whether or not to write the completed data block to the data array on the basis of the comparison.
This process preferably uses the same block comparison arrangements as discussed above to determine if the blocks are similar, such as comparing signatures representative of the content of the data blocks, or, most preferably, comparing the content of the blocks directly.
In these arrangements, although the data blocks themselves may not be written to the data array, the similarity meta-data should still be generated and stored for the block position in question, as that information will be needed to determine which other block of the data array should be processed by the processing device instead.
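The interaction between write elimination and the similarity meta-data can be illustrated with a small simulation. It assumes the single-bit "similar to the preceding block" meta-data and exact equality as the similarity test; the main memory is modelled as a dictionary, and `write_array` and `read_array` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple


def write_array(blocks: Sequence[bytes]) -> Tuple[Dict[int, bytes], List[int]]:
    """Write blocks, skipping those that match the previously generated block.

    Returns the positions actually written and one similarity bit per position.
    """
    stored: Dict[int, bytes] = {}
    bits: List[int] = []
    previous = None
    for position, block in enumerate(blocks):
        if position > 0 and block == previous:
            # The block data is not written, but its meta-data still is.
            bits.append(1)
        else:
            stored[position] = block
            bits.append(0)
        previous = block
    return stored, bits


def read_array(stored: Dict[int, bytes], bits: Sequence[int]) -> List[bytes]:
    """Reconstruct the processed sequence using only written blocks and meta-data."""
    processed: List[bytes] = []
    local = None
    for position, bit in enumerate(bits):
        if bit == 0:
            local = stored[position]
        processed.append(local)
    return processed
```

The reader never touches a position whose block was skipped, which is why the meta-data must be stored for every block position even when the block itself is not.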
In one preferred embodiment of these arrangements, the write elimination process is performed in respect of (by comparing) blocks being generated for the same data array (the current data array) only.
However, the comparison could be extended to include data blocks from a previous data array that is already stored in the memory (e.g. frame buffer) so as to avoid having to write a similar data block again to the memory for the data array if it is already present in the memory from a previous data array. This may particularly be useful where a series of similar data arrays (such as frames of a video sequence) is being generated. In this case, a newly generated data block could be compared (e.g. based on its content or a content signature) with a block or blocks of a data array that is already stored in the memory.
In these arrangements, the system is preferably configured to always write a newly generated data block to the data array in memory periodically, e.g., once a second, in respect of each given data block (data block position). This will then ensure that a new data block is written into the data array at least periodically for every data block position, and thereby avoid, e.g., erroneously matched data blocks (e.g. because the data blocks' signatures happen to match even though the data blocks' content actually varies) being retained in the data array for more than a given, e.g. desired or selected, period of time. This may be done, e.g., by simply writing out an entire new data array periodically (e.g. once a second), or by writing new data blocks out to the data array on a rolling basis in a cyclic pattern, so that over time all the data block positions are eventually written out as new.
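A rolling refresh of this kind could be scheduled as in the sketch below. It assumes the refresh period is expressed in data arrays (e.g. 60 frames for roughly once a second at 60 frames per second) and that block positions are spread evenly across the period; `force_write` is a hypothetical helper.

```python
def force_write(block_index: int, frame_number: int, period: int) -> bool:
    """True if this block position must be written unconditionally in this frame.

    Each block position is selected exactly once in every `period` frames.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    return block_index % period == frame_number % period
```

With `period` set to 1 the function selects every block in every frame, which corresponds to disabling write elimination altogether.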
In a particularly preferred embodiment, the present invention is used in conjunction with another power and bandwidth reduction scheme or schemes, such as, and preferably, a data array (e.g. frame buffer) compression scheme (which may be lossy or loss-less, as desired).
As discussed above, although the present techniques have particular application to graphics processor operation, the Applicants have recognised that they can equally be applied to other systems that process data in the form of blocks in a similar manner to, e.g., tile-based graphics processing systems, and that, for example, read frame buffers, textures and/or images. Thus, they may, for example, be applied to a host processor manipulating the frame buffer, a graphics processor reading a texture, a composition engine reading images to be composited, or a video processor reading reference frames for video decoding. Thus the present techniques may equally be used, for example, for video processing (as video processing operates on blocks of data analogous to tiles in graphics processing), and for composite image processing (as again the composition frame buffer will be processed as distinct blocks of data). They may also be used, e.g., where digital cameras are processing data (images) generated by the camera's sensor, and when processing, e.g., for display, data (images) generated by digital cameras.
The present techniques may also be used where there are plural master devices each writing to the same data array, e.g., frame in a frame buffer. This may be the case, for example, when a host processor generates an "overlay" to be displayed on an image that is being generated by a graphics processor.
In this case, each device writing to the data array could update the similarity meta-data accordingly, or, e.g., the meta-data for those parts of the data array that another master is writing to could be invalidated or cleared (so that those parts of the data array will be read out in full to the processing device). The latter would be necessary where a given master device is unable to update the similarity meta-data. It would also be possible to invalidate (clear) the meta-data for the entire data array if, e.g., another master modifies a relatively large portion of the data array (or modifies the data array at all).
More particularly, in the case where there is a "third party" device that is also reading and/or writing to the data array, then in the case where only read elimination is being employed, the third party device when reading from the data array could simply read the data array normally without using (or, indeed, without knowing about) the similarity meta-data, or the third party device could use the meta-data to eliminate read transactions.
Where the third party device is writing to the data array, then it could either update the meta-data associated with the data array, or a portion or the entirety of the similarity meta-data for the data array could be invalidated. In the latter case there could, for example, be a data array meta-data invalidate bit at the very start of the meta-data.
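The two invalidation options may be sketched as follows, purely as an illustration. The class and method names are hypothetical. The sketch also assumes single-bit "same as previous block" meta-data, under which the entry for the block following an overwritten range is cleared as well, because that block may have been marked as the same as a block whose content has now changed.

```python
class SimilarityMetadata:
    """Illustrative per-array meta-data with an array-wide validity flag."""

    def __init__(self, bits):
        self.valid = True        # stands in for an invalidate bit at the start
        self.bits = list(bits)   # 1 = same as previous block, 0 = read anew

    def invalidate_all(self):
        # Entire meta-data invalidated: every block will be read in full.
        self.valid = False

    def invalidate_range(self, first, last):
        # Clear the entries for the overwritten positions and for the one
        # position after them (assumption explained in the lead-in).
        stop = min(last + 1, len(self.bits) - 1)
        for position in range(first, stop + 1):
            self.bits[position] = 0

    def may_reuse(self, position):
        # A reader may skip the read only if the meta-data is still valid.
        return self.valid and self.bits[position] == 1
```

A writer that cannot update the meta-data meaningfully would call `invalidate_range` for the region it touched, or `invalidate_all` if it modified a large part of the array.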
Where both read and write transaction elimination is being used, then in the case of reading from the data array, the third party device will use the similarity meta-data to eliminate read transactions. (Unlike in the case where only read elimination is being used and therefore a third party device reading the data array may or may not use the meta-data to eliminate reads, as desired, in the case where write elimination is enabled, the third party device must read and use the meta-data when reading from the data array because as write elimination has been used, the data array may not be "complete" (because in the case of a data block whose writing to the data array has been "eliminated", the reading device will have to determine from the meta-data which block to use instead).) In the case of writing to the data array in this case, then as for the case above where only read elimination is enabled, the third party device could when writing data to the data array either update the meta-data, or a portion of or the entirety of the meta-data could be invalidated.
The meta-data generation process (and data block comparison process where used) may be performed as desired. In one preferred embodiment it is performed by the data array generating processor (e.g. GPU, CPU, etc.) itself but in another preferred embodiment there is a separate block or hardware element (logic) that does this that is intermediate the data array generation process and the memory (e.g. frame buffer) where the data array is to be stored. In the case where the meta-data generation "unit" is separate (external) to the data array generating processor, it may reside as a separate logic block, or be part of the bus fabric and/or interconnect, for example.
Thus, in one preferred embodiment, there is a meta-data generation hardware element (logic) that is separate to the data array generating processor (e.g. graphics processor), and in another preferred embodiment the meta-data generation logic is integrated in (part of) that processor. Thus, in one preferred embodiment, the meta-data generating means, etc., will be part of the data generating processor (e.g. the graphics processor) itself, but in another preferred embodiment, the system will comprise a data generating processor, and a separate "meta-data generation" unit or element.
The present invention also extends to the provision of a particular hardware element for performing the comparison and consequent similarity meta-data determination. As discussed above, this hardware element (logic) may, for example, be provided as an integral part of a, e.g., graphics processor, or may be a standalone element that can, e.g., interface between a graphics processor, for example, and an external memory controller. It may be a programmable or dedicated hardware element.
Thus, according to a further aspect of the present invention, there is provided meta-data generation apparatus for use in a data processing system in which an array of data generated by the data processing system is read from an output buffer by reading blocks of data representing particular regions of the array of data from the output buffer, the apparatus comprising: means for comparing a block of data for the data array with at least one other block of data for the data array, and for generating information indicating whether or not the block of data is to be considered to be similar to another block of data of the data array on the basis of the comparison; and means for storing that similarity information in association with the data array.
As will be appreciated by those skilled in the art, these aspects and embodiments can and preferably do include any one or more or all of the preferred and optional features described herein. Thus, for example, the comparison preferably comprises comparing some or all of the contents of the respective data blocks.
The similarity determination process (and consequent data block selection process) may similarly be performed as desired. In one preferred embodiment it is performed by the processing device (e.g. display controller, GPU, CPU, etc.) itself, but in another preferred embodiment there is a separate block or hardware element (logic) that does this that is intermediate the data processing device and the memory (e.g. frame buffer) where the data array is stored. In the case where the similarity determination, etc., "unit" is separate (external) to the processing device, it may again reside as a separate logic block, or be part of the bus fabric and/or interconnect, for example.
Thus, in one preferred embodiment, there is a similarity determination hardware element (logic) that is separate to the data array processing device (e.g. display controller), and in another preferred embodiment the similarity determination logic is integrated in (part of) the data array processing device. Thus, in one preferred embodiment, the similarity determination means, etc., (the read controller and controller of the system) will be part of the processing device (e.g. display controller) itself, but in another preferred embodiment, the system will comprise a processing device, and a separate "similarity determination" unit or element (comprising the read controller and/or controller).
The present invention also extends to the provision of a particular hardware element for performing the similarity and consequent data block determination. As discussed above, this hardware element (logic) may, for example, be provided as an integral part of a, e.g., display controller, or may be a standalone element that can, e.g., interface between a display controller, for example, and an external memory controller. It may be a programmable or dedicated hardware element.
Thus, according to a further aspect of the present invention, there is provided a similarity determination apparatus for use when processing an array of data stored in a first memory, the apparatus comprising: a read controller configured to read blocks of data representing particular regions of an array of data that is stored in the first memory and to store the blocks of data in a local memory of a processing device that is to process the array of data prior to the blocks of data being processed by the processing device; and a controller configured to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
As will be appreciated by those skilled in the art, these aspects and embodiments can and preferably do include any one or more or all of the preferred and optional features described herein. Thus, for example, the similarity determination is preferably based on similarity meta-data that is associated with the data array.
Various other preferred and alternative arrangements are possible. For example, in the case of a stereoscopic display, where left and right images are generated and used, respective "left" and "right" blocks to be displayed are preferably compared for the purpose of read (and, optionally, write) elimination (rather than comparing blocks for the "left" image of the frame only with blocks for the "left" image (and "right" blocks only with "right" blocks)). In other words, preferably left and right parts of the image are compared with each other as well as comparing blocks in the respective parts of the image with each other. This will help to further reduce the number of read transactions since, as the Applicants have recognised, many of the left and right tiles in the image will be the same as each other. Similar arrangements can be (and preferably are) used for displays that use more than two images and for volume displays.
In a particularly preferred embodiment, the determined similarity information is also used to manage the storing of the data blocks in the local memory of the processing device and in particular as a factor in determining the eviction of data blocks from the local memory. For example, in one preferred embodiment the meta-data is used to determine a data block or blocks that is going to be used repeatedly by the processing device (e.g. used in a frame being displayed), and that data block (or blocks) is then temporarily locked in the local memory of the processing device (once it is written there) so that it will be available in the local memory when it is needed in the future. Thus, the meta-data is preferably used to try to identify in advance those data blocks that it would be advantageous to retain in the local memory of the processing device (where that is possible) and the local memory is then managed accordingly. This could be done, e.g., by counting how many other data blocks are noted as being similar to a given data block as the meta-data is being prepared. This information could then be used to control the storage of the data blocks in the processing device's local memory accordingly.
It would also be possible to keep a count of the number of times a given data block in the local memory is to be used in the near future (based, e.g., on meta-data that has been pre-fetched for the portion of the data array that is being processed), and to only allow a data block to be evicted from the local memory when its "use" count is zero.
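The "use" count arrangement may be illustrated by the following minimal sketch. The `LocalBuffer` class, its capacity-based eviction and the choice of the first evictable block as the victim are all assumptions of the sketch and are not taken from the embodiments.

```python
from collections import Counter


class LocalBuffer:
    """Illustrative local memory in which only blocks with a zero
    pending-use count may be evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}               # block id -> block data
        self.pending_uses = Counter()  # block id -> uses expected soon

    def note_upcoming_use(self, block_id):
        # Called while scanning pre-fetched meta-data for future references.
        self.pending_uses[block_id] += 1

    def use(self, block_id):
        # Consume one expected use of a block held in the buffer.
        data = self.blocks[block_id]
        if self.pending_uses[block_id] > 0:
            self.pending_uses[block_id] -= 1
        return data

    def store(self, block_id, data):
        if block_id not in self.blocks and len(self.blocks) >= self.capacity:
            # Only a block whose use count is zero is eligible for eviction.
            victim = next(
                (b for b in self.blocks if self.pending_uses[b] == 0), None)
            if victim is None:
                raise RuntimeError("no evictable block in local buffer")
            del self.blocks[victim]
        self.blocks[block_id] = data
```

In this sketch a block that the pre-fetched meta-data shows will be needed again is effectively locked in the buffer until its count returns to zero.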
Thus, in a particularly preferred embodiment, the eviction of data blocks from the local memory of the processing device is controlled, at least in part, in accordance with similarity meta-data that is associated with the data array in question.
The present invention can be implemented in any suitable system, such as a suitably configured micro-processor based system. In a preferred embodiment, the present invention is implemented in a computer and/or micro-processor based system.
The various functions of the present invention can similarly be carried out in any desired and suitable manner. For example, the functions of the present invention can be implemented in hardware or software, as desired. Thus, for example, the various functional elements and "means" of the invention may comprise a suitable processor or processors, controller or controllers, functional units, circuitry, processing logic, microprocessor arrangements, etc., that are operable to perform the various functions, etc., such as appropriately dedicated hardware elements and/or programmable hardware elements that can be programmed to operate in the desired manner.
In a preferred embodiment the output data array generating processor and/or meta-data generation unit is implemented as a hardware element (e.g. ASIC). Thus, in another aspect the present invention comprises a hardware element including the apparatus of, or operated in accordance with the method of, any one or more of the aspects of the invention described herein.
It should also be noted here that, as will be appreciated by those skilled in the art, the various functions, etc., of the present invention may be duplicated and/or carried out in parallel on a given processor.
Where used in a graphics processing system, the present invention is applicable to any suitable form or configuration of graphics processor and renderer, such as processors having a "pipelined" rendering arrangement (in which case the renderer will be in the form of a rendering pipeline). It is particularly applicable to tile-based graphics processors and graphics processing systems.
As will be appreciated from the above, the present invention is particularly, although not exclusively, applicable to 2D and 3D graphics processors and processing devices, and accordingly extends to a 2D and/or 3D graphics processor and a 2D and/or 3D graphics processing platform including the apparatus of, or operated in accordance with the method of, any one or more of the aspects of the invention described herein. Subject to any hardware necessary to carry out the specific functions discussed above, such a 2D and/or 3D graphics processor can otherwise include any one or more or all of the usual functional units, etc., that 2D and/or 3D graphics processors include.
It will also be appreciated by those skilled in the art that all of the described aspects and embodiments of the present invention can include, as appropriate, any one or more or all of the preferred and optional features described herein.
The methods in accordance with the present invention may be implemented at least partially using software e.g. computer programs. It will thus be seen that when viewed from further aspects the present invention provides computer software specifically adapted to carry out the methods herein described when installed on data processing means, a computer program element comprising computer software code portions for performing the methods herein described when the program element is run on data processing means, and a computer program comprising code means adapted to perform all the steps of a method or of the methods herein described when the program is run on a data processing system.
The data processing system may be a microprocessor, a programmable FPGA (Field Programmable Gate Array), etc.
The invention also extends to a computer software carrier comprising such software which when used to operate a processor or system comprising data processing means causes in conjunction with said data processing means said processor or system to carry out the steps of the methods of the present invention.
Such a computer software carrier could be a physical storage medium such as a ROM chip, CD ROM or disk, or could be a signal such as an electronic signal over wires, an optical signal or a radio signal such as to a satellite or the like.
It will further be appreciated that not all steps of the methods of the invention need be carried out by computer software and thus from a further broad aspect the present invention provides computer software and such software installed on a computer software carrier for carrying out at least one of the steps of the methods set out herein.
The present invention may accordingly suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer readable instructions fixed on a tangible medium, such as a non-transitory computer readable medium, for example, diskette, CD ROM, ROM, or hard disk. It could also comprise a series of computer readable instructions transmittable to a computer system, via a modem or other interface device, over either a tangible medium, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques.
The series of computer readable instructions embodies all or part of the functionality previously described herein.
Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, for example, shrink wrapped software, pre-loaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.
A number of preferred embodiments of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:
Figure 1 shows schematically a first embodiment in which the present invention is used in conjunction with a tile-based graphics processor;
Figure 2 shows schematically how the relevant data is stored in memory in an embodiment of the present invention;
Figure 3 shows schematically and in more detail the display controller of the embodiment shown in Figure 1;
Figure 4 shows the operation of the display controller in the embodiment shown in Figure 1;
Figure 5 shows schematically and in more detail the graphics processor of the embodiment shown in Figure 1; and
Figure 6 shows the operation of the graphics processor in the embodiment shown in Figure 1.
A number of preferred embodiments of the present invention will now be described. These embodiments will be described primarily with reference to the processing of an image generated by a graphics processing system for display by a display controller, although, as noted above, the present invention is applicable to other arrangements in which a data array is processed in blocks representing regions of the overall array.
Figure 1 shows schematically an arrangement of a system that can be operated in accordance with the present embodiment.
The system includes, as shown in Figure 1, a tile-based graphics processor (GPU) 1. This is the element of the system that, in this embodiment, generates the data arrays to be processed. The data arrays may, as is known in the art, typically be output frames intended for display on a display device 2, such as a screen or printer, but may also, for example, comprise a "render to texture" output of the graphics processor 1, etc.
The graphics processor, as is known in the art, generates output data arrays, such as output frames, to be processed, by generating tiles representing different regions of a respective output data array.
As is known in the art, in such an arrangement, once a tile has been generated by the graphics processor 1 it would then normally be written to an output buffer in the form of a frame buffer 3 in main memory 4 (which memory may be DDR-SDRAM) of the system via an interconnect 5 which is connected to a memory controller 6.
Sometime later the data array in the frame buffer 3 will be read by a display controller 7 and output to the display device 2. (Thus the display controller 7 is the processing device that is to process the data array that is generated by the graphics processor 1 (in this case to display it).) As part of this process, the display controller will read blocks of data from the frame buffer 3 and store them in a local memory buffer 8 of the display controller 7 before outputting those blocks of data to the display 2. The display device 2 may, e.g., be a screen or printer.
In the present embodiment this process further comprises the display controller 7 determining whether a new block of data to be output (processed) for display is to be considered to be similar to a block of data already stored in the local memory 8 of the display controller 7 or not. To do this, in the present embodiment the display controller 7 uses similarity meta-data associated with the output frame in the frame buffer that has been generated by the graphics processor 1 when it generated the output frame. (This process is discussed in more detail below.) In essence, and as will be discussed in more detail below, the display controller 7 determines whether a data block to be processed is to be considered to be similar to a data block already stored in its local buffer 8, and if it is found that the data block to be processed is similar to a data block already stored in the local buffer 8 of the display controller 7, the display controller does not read a new data block from the frame buffer 3 but instead provides the existing data block in its buffer 8 to the display 2.
In this way, the present embodiment can avoid read traffic between the display controller 7 and the frame buffer 3 for blocks of data in the frame buffer 3 that are similar to blocks of data that are already stored in the local buffer 8 of the display controller 7. (In the case of a game, for example, this may typically be the case for much of the user interface, the sky, etc., as well as most of the playfield when the camera position is static.) This can save a significant amount of bandwidth and power consumption in relation to the frame read operation.
On the other hand, if a data block to be processed is determined not to be similar to a data block already stored in local buffer 8 of the display controller 7, then the display controller reads a new data block from the frame buffer 3 into its local buffer 8 and then provides that new data block to the display 2.
In the present embodiment the data blocks that are read from the frame buffer 3 and compared to data blocks already stored in the buffer 8 of the display controller 7 comprise cache lines, as that is the amount of data that is read for each reading operation by the display controller 7 from the frame buffer 3. However, other arrangements would be possible. For example, the display controller could operate this process in respect of data blocks that correspond to the rendered tiles that the graphics processor 1 generates, or to 2D "sub-tiles" of the rendered tiles.
Figure 1 also shows a host CPU 9 that is also capable of interacting with the main memory 4 via the interconnect 5 and which can also, for example, write to the frame buffer 3 in the main memory 4. This possibility will be discussed in more detail below.
In the present embodiment, as discussed above, the display controller 7 determines whether a given data block (cache line) to be processed for display is to be considered to be similar to a data block already stored in its local buffer 8 by assessing metadata in the form of a bitmap that is stored in association with the data blocks making up the frame in question.
Each data block position (cache line) in the stored data array in the frame buffer 3 has associated with it a single bit in a bitmap that corresponds to the frame (with each bit in the bitmap corresponding to one data block position (cache line in this case) of the frame). The bit in the bitmap for a data block (cache line) is set to "1" if the data block is to be considered to be the same as the previous data block (cache line) to be read (processed) from the frame, or set to "0" if the data block is considered to be different to the previous data block.
In this way, the display controller can read the bitmap entry associated with a data block that it is due to process, and if that bitmap entry is set to "1", will know that that data block is to be considered the same as a previous data block that was read into the buffer 8 of the display controller 7 (and so can display that data block that is already in its buffer 8 instead of reading a new data block into the local memory 8 of the display controller 7). Alternatively, if the metadata associated with the data block to be processed is "0", the display controller knows that it should read a new data block from the frame buffer 3 into its local buffer 8 and then display it on the display 2.
Figure 2 shows an exemplary memory layout for the data array in the frame buffer 3 and its associated metadata (data block similarity information) 10. In this case, the data blocks making up the frame are stored as a frame buffer 3 and the associated data block similarity bitmap 10 is stored in another portion of the memory 4. (Other arrangements would, of course, be possible.) As shown in Figure 2, each data block in the data array in the frame buffer 3 has an associated entry in the similarity information bitmap 10. Thus, for example, data block 11 in the frame buffer 3 is associated with bitmap entry 13 in the bitmap 10 and data block 12 in the frame buffer 3 is associated with bitmap entry 14 in the similarity bitmap 10.
Figure 2 also shows the nature of the bitmap entries. Thus bitmap entry 13 has the value "0" to indicate that the data block 11 in the data array in the frame buffer 3 is not the same as the previous data block (and so is a "new" data block that should be read from the frame buffer into the local memory 8 of the display controller 7). On the other hand, bitmap entry 14 for the next data block 12 has the entry "1" to indicate that that data block 12 is the same as data block 11 in the frame buffer 3. This will then cause the display controller to display the data block 11 that is stored in its local memory 8 instead of reading the new data block 12 from the frame buffer 3.
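As a purely illustrative sketch, a single-bit similarity bitmap of this kind could be derived from a sequence of data blocks as follows. The function name is hypothetical, and exact equality of block content is assumed as the similarity test, which is only one of the comparisons contemplated.

```python
def build_similarity_bitmap(blocks):
    """Return one bit per block position: 1 if the block is the same as
    the previous block in read order, otherwise 0."""
    bits = []
    previous = None
    for block in blocks:
        # The first block has no predecessor, so it is always marked "0".
        bits.append(1 if previous is not None and block == previous else 0)
        previous = block
    return bits
```

For a run of blocks with contents A, A, B, B, B the sketch yields the entries 0, 1, 0, 1, 1, so that only the first block of each run needs to be read.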
Other similarity metadata arrangements could be used if desired. For example, each data block could potentially be indicated as being similar to more than one data block in the data array, in which case each bitmap entry could comprise more bits so as to indicate to the display controller 7 which of the data blocks in the data array the data block to which the bitmap entry corresponds is to be considered to be similar to. In these arrangements, each similarity value (meta-data entry) can, e.g., give a relative indication of which other data block in the data array the data block in question is similar to (such that, e.g., "001" indicates the previous data block relative to the current data block), or an absolute indication of which other data block in the data array the data block in question is similar to (such that, e.g., meta-data "125" indicates the block is similar to the 125th data block in the data array in question).
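The relative and absolute indications may be decoded as in the following sketch, which is given by way of example only. It assumes, beyond what is stated above, that an entry of 0 is reserved to mean "no similar block", that absolute entries count blocks from 1, and that block positions are indexed from 0; the function name and the `mode` parameter are likewise hypothetical.

```python
def resolve_source(index, entry, mode):
    """Return the zero-based index of the block whose data should be used
    for the block at position `index`."""
    if entry == 0:
        # Assumed convention: 0 means the block itself must be read.
        return index
    if mode == "relative":
        # E.g. an entry of 1 ("001") refers to the previous block.
        return index - entry
    if mode == "absolute":
        # E.g. an entry of 125 refers to the 125th block (one-based).
        return entry - 1
    raise ValueError(f"unknown mode: {mode}")
```

Thus, for the block at position 10, a relative entry of 1 resolves to position 9, while an absolute entry of 125 resolves to position 124.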
It would also be possible to include with each meta-data entry a "likeness" value that indicates how similar the respective data blocks are. The similarity determination process could then, e.g., use this likeness value to determine whether to read a new block from the data array or to re-use the already existing similar data block in the local memory of the processing device in use. For example, the similarity determination process could set a likeness value threshold, and compare the likeness value for a new data block to that threshold and read in the new data block or not, accordingly.
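A likeness threshold of this kind might be applied as in the following minimal sketch. It assumes, for illustration only, that likeness values lie between 0 and 1, that a higher value means greater similarity, and that a block with no usable predecessor carries a likeness of 0; the function name is hypothetical.

```python
def plan_reads(likeness_values, threshold):
    """Return, for each block, True if a new block should be read from the
    data array, or False if the existing similar block may be re-used."""
    return [likeness < threshold for likeness in likeness_values]
```

With a threshold of 0.8, likeness values of 0.0, 1.0, 0.4 and 0.9 give the decisions read, re-use, read and re-use respectively. Lowering the threshold trades fidelity for fewer read transactions.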
It would also be possible to use arrangements other than bitmaps, such as hierarchical quad trees, etc. The meta-data (similarity information) that is associated with the data array could also be in the form of a command list that instructs the processing device to read the data blocks into the local memory of the processing device according to their relative similarities.
Also as will be discussed further below, although in the above bitmap example the similarity metadata (bitmap) indicates directly to the display controller 7 whether a respective data block should be considered to be similar to another data block in the data array or not, it would also be possible to associate with each data block some information which allows the display controller itself to carry out a comparison between the data blocks so as to determine whether they should be considered to be similar or not. For example, it would be possible to store instead information representative of the content of each data block and for the display controller 7 to then compare the respective content information of the data blocks to determine if they should be considered to be similar or not.
Figure 3 shows the structure of the display controller 7 in more detail and Figure 4 is a flowchart showing the above operation of the display controller 7.
As shown in Figure 3, the display controller 7 includes a bus interface unit 20, a metadata buffer 21, a display formatter and output unit 22, and a state machine controller 23, in addition to the local buffer 8 in which it stores the data blocks from the frame buffer 3 in main memory 4 before they are displayed.
The state machine controller 23 acts to control the display controller 7 to execute the operation of the embodiment described above. The metadata buffer 21 is used to store chunks of the metadata bitmap 10 for the frame (data array) in question, to improve off-chip memory access efficiency. Other arrangements, such as the display controller always reading the metadata in the main memory 4 directly, would, of course, be possible. When a new frame is to be displayed, the display controller will first read an appropriate portion of the metadata 10 associated with that frame from the main memory 4 and store it in its metadata buffer 21. The display controller will then read blocks of data from the frame buffer 3 in main memory 4 into its data cache/buffer 8 and provide those blocks of data appropriately via the display formatter/output unit 22 to the display 2 for display. The display controller operates to pre-fetch the blocks of data to be displayed into its local memory 8. This is so as to ensure that there is always data available to be displayed (as buffer/memory under-runs could result in the displayed image glitching). The blocks are then read from the local memory 8 one after another for display. However, this operation is modified under the control of the state machine 23 to follow the process shown in Figure 4 (and discussed above).
As shown in Figure 4, when a new data block (cache line) is to be pre-fetched into the local memory 8, in order to be processed for display (which may be triggered, e.g., by the display of a block from the local memory 8, thereby prompting the need to fetch a new block to add to the "queue" in the local memory 8), the state machine controller 23 reads the appropriate location in the similarity metadata bitmap in the metadata buffer 21 for that new data block (step 31). It then determines whether the bit stored in the appropriate location in the similarity bitmap has the value "1" or not (step 32).
If it is determined that the value in the bitmap location is "1", then that indicates that the new data block is the same as the previous data block (which should therefore already be in the local memory 8 of the display controller) and so instead of reading a new data block from the frame buffer 3, the state machine controller 23 causes the display controller to (at the appropriate time) use the previous data block that is already in its local buffer 8, i.e. to provide that previous data block from the local buffer 8 to the display 2 (step 33). (It will be appreciated here that if there is a sequence of similar blocks (i.e. blocks for which the meta-data has the value "1"), then the state machine controller will cause the display controller to, in effect, reuse (repeat) the first block in the sequence for each successive similar data block.)
On the other hand, if the value in the bitmap is "0", then that indicates that the data block is not the same as the previous data block and so the data block will need to be pre-fetched from the frame buffer 3 into the local memory 8 for display.
In this case the state machine controller 23 causes the display controller to read the data block from the frame buffer 3 in the main memory 4 (step 34) and to store that data block in the local buffer 8 of the display controller (step 35). The new block is then provided (at the appropriate time) from the local buffer 8 of the display controller 7 to the display device 2 (step 36).
The data block is then displayed (step 37).
The process is then repeated for the next data block to be processed (to be pre-fetched into the local memory 8) and so on.
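By way of illustration only, the read-side operation of Figure 4 could be sketched as follows. This is a minimal model, not the state machine controller 23 itself: the names `display_frame` and `read_block` are hypothetical, the local buffer 8 is reduced to a single "previous block" slot, and the similarity bitmap is assumed to be available as a list of 0/1 values.

```python
from typing import Callable, List, Tuple


def display_frame(similarity_bits: List[int],
                  read_block: Callable[[int], bytes]) -> Tuple[List[bytes], int]:
    """Return the blocks sent to the display and the number of frame buffer reads."""
    displayed: List[bytes] = []
    reads = 0
    previous = None  # block currently held in the (modelled) local buffer
    for position, bit in enumerate(similarity_bits):   # steps 31 and 32
        if bit == 1 and previous is not None:
            block = previous                 # step 33: reuse the buffered block
        else:
            block = read_block(position)     # steps 34 and 35: fetch and store
            reads += 1
            previous = block
        displayed.append(block)              # steps 36 and 37: provide for display
    return displayed, reads
```

For a run of blocks whose bits are all "1", the sketch repeats the first block of the run and performs no further reads, which is the behaviour described for a sequence of similar blocks.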
In the present embodiment, the metadata that is used by the display controller 7 to determine whether or not a new block to be processed is the same as a data block already stored in its local buffer 8 is generated by the graphics processor 1 as the tiles making up the frame are generated. Figure 5 shows the architecture of the graphics processor 1 that carries out this process and Figure 6 is a flow diagram showing the steps of the metadata generating process.
As shown in Figure 5, the graphics processor 1 is modified to include after its tile rendering logic 40, additional data block generation logic and block comparison logic which is used to generate the appropriate metadata for association with the data array (frame) in the frame buffer 3.
The block generating logic 41 acts to generate the appropriate data blocks from the tiles that are generated by the tile rendering logic 40. In the present embodiment the block generating logic accordingly generates blocks that correspond to cache lines in the cache memory 8 of the display controller 7.
However, as discussed above, other sizes and forms of data block would be possible and could be generated by the block generating logic 41 if desired.
The block generating logic stores the successive blocks that it generates in buffers 42. Comparison logic 43 then compares respective data blocks that are stored in the buffers 42 (in this case a new data block with the immediately preceding data block), and generates an appropriate metadata output bit on the basis of the comparison. To increase memory efficiency, the meta-data output bits for plural blocks are collected and merged in a buffer, and then stored appropriately in the metadata bitmap 10 in the main memory 4 (written to off-chip memory).
(Other arrangements would, of course, be possible.) The data blocks are also read from the buffers 42 and stored appropriately in the frame buffer 3.
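The merging of the meta-data output bits for plural blocks before they are written out could, for example, look like the sketch below. The bit order (least significant bit first within each byte) and the helper names `pack_bits` and `get_bit` are assumptions made for illustration; the text does not specify a layout for the metadata bitmap 10.

```python
from typing import List


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 similarity flags into bytes, least significant bit first (assumed order)."""
    packed = bytearray((len(bits) + 7) // 8)
    for index, bit in enumerate(bits):
        if bit:
            packed[index // 8] |= 1 << (index % 8)
    return bytes(packed)


def get_bit(packed: bytes, index: int) -> int:
    """Read back the similarity flag for block position `index`."""
    return (packed[index // 8] >> (index % 8)) & 1
```

Packing eight block positions per byte means that one write to off-chip memory carries the similarity information for many blocks, rather than one write per block.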
To facilitate this operation, the data blocks making up the output frame are processed in a particular, predefined order (both for writing them to the frame buffer and reading them therefrom). An order that can exploit any spatial coherence between the blocks is preferably used.
This process is shown as a flowchart in Figure 6.
As shown in Figure 6, the block generation logic 41 generates data blocks (in this case corresponding to cache lines) from the rendered tiles produced by the tile rendering logic 40 (step 51). The data blocks are then stored in the buffers 42.
The comparison logic 43 then compares a new data block with the previous data block (which will already be stored in the buffers 42) (step 52). In the present embodiment, the comparison logic 43 compares the content of the data blocks with each other. Other arrangements would be possible. For example, the comparison logic could generate a signature, such as 32-bit CRC, for each block in question, to represent the content of the blocks, and then compare the signatures of the blocks rather than the actual content of the blocks.
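The signature-based alternative mentioned above could be sketched as follows, using the 32-bit CRC provided by Python's standard `zlib` module. The function names are hypothetical, and the choice of this particular CRC polynomial is an assumption; the text only calls for a signature "such as 32-bit CRC".

```python
import zlib


def signature(block: bytes) -> int:
    """32-bit CRC of the block content, used as a content signature."""
    return zlib.crc32(block) & 0xFFFFFFFF


def same_by_signature(new_block: bytes, previous_block: bytes) -> bool:
    """Compare two blocks by signature rather than by their actual content."""
    return signature(new_block) == signature(previous_block)
```

Comparing signatures means only 32 bits per block need to be retained for the comparison, at the cost that two different blocks can, rarely, share a signature and so be matched in error.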
The comparison logic then determines whether the new block should be considered to be similar to the previous block or not (step 53). In the present embodiment this assessment is based on how similar the contents of the two blocks being compared are. A threshold of a particular amount of differences in the LSBs of the pixels is set, and if the difference between the content of the two blocks is less than this threshold, the blocks are determined to be similar, and vice-versa.
(This threshold can be varied (e.g. programmed) in use. It could, for example, be set per application, based on the proportion of static and dynamic frame data, and/or based on the power mode (e.g. low power mode or not) in use, etc.)
If the blocks are determined to be different (i.e. not to be similar) by the comparison logic in step 53, then the comparison logic operates to write the value "0" into the appropriate location in the meta-data bitmap 10 (step 54). The new data block is itself written from the buffers 42 to the frame buffer 3 in the main memory 4 (step 55).
On the other hand, if at step 53 it is determined that the blocks should be considered to be similar, then the comparison logic 43 operates to cause a "1" to be written into the appropriate location in the meta-data bitmap 10 (step 56).
It would then be possible again simply to write the new block into the frame buffer 3 in the main memory 4 as was the case where the blocks were considered to be different. However, Figure 6 shows a preferred arrangement in which a possible "write elimination" operation may be enabled in the graphics processor 1.
This write elimination process operates, as will be discussed further below, to allow the graphics processor to avoid writing blocks that are determined to be similar to each other into the data array in the frame buffer 3. Thus, as shown in Figure 6, if the write elimination process is enabled (step 57), then in the case that the two blocks are considered to be similar to each other, the new block is not written into the data array in the frame buffer (step 58). (On the other hand, if the write elimination process is not enabled at step 57, then the new block would be written to the frame buffer as normal (step 55).)
The write elimination process in step 57 thus operates such that if a data block is determined to be the same as the previous data block (i.e. it is the same as the data block that will have already been stored in the frame buffer 3), then the new data block is not written into the frame buffer as well. In this way, the write elimination process can avoid write traffic for sections of the data array (frame buffer) that are the same as each other. This can further save bandwidth and power consumption in relation to the frame buffer operation. On the other hand, if the data blocks are determined to be different, then the new data block is written to the frame buffer as would be the case without the write elimination process.
In these arrangements, although the data blocks themselves may not be written to the data array, the similarity meta-data should still be generated and stored for the block position in question, as the processing device (the display controller in the present embodiment) will still need to use that information to determine which other block should be processed instead.
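The write-side process of Figure 6 could be modelled along the following lines. This is a sketch under stated assumptions: the similarity test shown (ignoring the lowest `lsb_bits` bits of every byte) is only one possible reading of the LSB-difference threshold, and the names `similar` and `write_frame` and the list standing in for the frame buffer 3 are hypothetical.

```python
from typing import List, Optional, Tuple


def similar(a: bytes, b: bytes, lsb_bits: int = 0) -> bool:
    """Assumed test: blocks match once the lowest `lsb_bits` bits of each byte are ignored."""
    mask = (0xFF << lsb_bits) & 0xFF
    return len(a) == len(b) and all((x & mask) == (y & mask) for x, y in zip(a, b))


def write_frame(blocks: List[bytes],
                frame_buffer: List[Optional[bytes]],
                lsb_bits: int = 0,
                write_elimination: bool = True) -> Tuple[List[int], int]:
    """Return the similarity bitmap and the number of frame buffer writes."""
    bitmap: List[int] = []
    writes = 0
    previous: Optional[bytes] = None
    for position, block in enumerate(blocks):
        # steps 52 and 53: compare with the immediately preceding block
        is_similar = previous is not None and similar(block, previous, lsb_bits)
        bitmap.append(1 if is_similar else 0)        # step 56 or step 54
        if not (is_similar and write_elimination):   # step 57
            frame_buffer[position] = block           # step 55
            writes += 1
        # otherwise step 58: the write is eliminated, but the bit is still stored
        previous = block
    return bitmap, writes
```

Note that the bitmap entry is produced for every block position, including positions whose block is never written, because the reading device relies on that entry to know which block to process instead.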
In a particularly preferred arrangement of these embodiments, where the data block comparisons may not be exact (may erroneously match blocks that do in fact differ) the system is configured to always write a newly generated data block to the frame buffer periodically, e.g., once a second, in respect of each given data block (data block position). This will then ensure that a new data block is written into the frame buffer at least periodically for every data block position, and thereby avoid, e.g., erroneously matched data blocks being retained in the frame buffer for more than a given, e.g. desired or selected, period of time. This may be done, e.g., by simply writing out an entire new output data array periodically (e.g. once a second), or by writing new data blocks out to the frame buffer on a rolling basis in a cyclic pattern, so that over time all the data block positions are eventually written out as new.
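One possible cyclic pattern for the rolling refresh is sketched below. The function name `must_refresh` and the modulo-based schedule are assumptions for illustration only; the sketch merely selects which block positions are forced out on a given frame, and does not address how the similarity meta-data is set for a forced write.

```python
def must_refresh(position: int, frame_number: int, period_frames: int) -> bool:
    """True if this block position is forced to be written as new on this frame."""
    return position % period_frames == frame_number % period_frames
```

With this schedule every block position is forced out exactly once in any `period_frames` consecutive frames, so choosing `period_frames` equal to the frame update rate would correspond to the "once a second" example.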
Various alternatives and modifications to the above arrangements would be possible. For example, the output array of data that the graphics processor is generating may also or instead comprise other outputs of a graphics processor such as a graphics texture (where, e.g., the render "target" is a texture that the graphics processor is being used to generate (e.g. in "render to texture" operation)) or other surface to which the output of the graphics processor system is to be written.
It would be possible to use a more sophisticated metadata arrangement, for example where data blocks are not just compared to their immediately preceding data block but to more than one data block in the output frame (data array). In this case the metadata (e.g. bitmap entry) associated with each respective block position should indicate not only that the corresponding data block is similar to another data block in the output data array but also which data block in the output data array it is similar to.
Similarly, the current, completed data block could be compared to plural data blocks that are in the data array. This may help to further reduce the number of data blocks that need to be read from the main memory for the processing, as it will allow the reading of data blocks that are similar to data blocks in other positions in the data array to be eliminated.
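A multi-valued metadata entry of this kind could, for instance, record how many positions back the matching block lies. In the sketch below an entry of 0 means "no match" and an entry of k means "same as the block k positions earlier"; this encoding, the exact-equality comparison, and the names `encode_matches` and `decode` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def encode_matches(blocks: List[bytes], window: int = 3) -> List[int]:
    """For each block, record the distance back to the nearest identical block (0 if none)."""
    entries: List[int] = []
    for i, block in enumerate(blocks):
        entry = 0
        for k in range(1, min(window, i) + 1):
            if blocks[i - k] == block:
                entry = k
                break
        entries.append(entry)
    return entries


def decode(entries: List[int],
           read_block: Callable[[int], bytes]) -> Tuple[List[bytes], int]:
    """Rebuild the block sequence, reading from memory only where the entry is 0."""
    out: List[bytes] = []
    reads = 0
    for i, entry in enumerate(entries):
        if entry:
            out.append(out[i - entry])   # reuse a block already held locally
        else:
            out.append(read_block(i))
            reads += 1
    return out, reads
```

With `window=3` each entry fits in two bits, and the reading device needs to keep only the last three blocks in its local memory to honour every reference.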
In a preferred embodiment, it is possible for a software application (e.g. that is triggering the generation of the data array, and/or that is to use and/or receive the output array that is being generated) to indicate and control which regions of the output data array are processed in the manner of the present embodiment, and in particular, and preferably, to indicate which regions of the output array the data block comparison process should be performed for. This would then allow the process of the present invention to be "turned off" by the application for regions of the output array the application "knows" will be always updated.
This may be achieved as desired. In a preferred embodiment registers are provided that enable/disable data block (e.g. rendered tile) comparisons for output array regions, and the software application then sets the registers accordingly (e.g. via the graphics processor driver).
Although the present embodiment has been described above with particular reference to graphics processor operation, the Applicants have recognised that the principles of the present invention can equally be applied to other systems that process data in the form of blocks in a similar manner to, e.g., tile-based graphics processing systems, and that, for example, read frame buffers or textures. Thus, it may, for example, be applied to a host processor manipulating the frame buffer, a graphics processor reading a texture, a composition engine reading images to be composited, or a video processor reading reference frames for video decoding.
Thus the techniques of the embodiment may equally be used, for example, for video processing (as video processing operates on blocks of data analogous to tiles in graphics processing), and for composite image processing (as again the composition frame buffer will be processed as distinct blocks of data).
They may also be used, for example, when processing the data (images) generated by (digital) cameras (video or still). In this case, the data from the camera's sensor could, e.g., be processed as discussed above by the camera's controller to generate the appropriate meta-data for the image data that is written to memory (and to control the writing of the image data if desired). The so-stored image and meta-data could then be processed in the manner of the present invention by an, e.g., display controller that is to display the images from the camera.
The present embodiment may also be used where there are plural master devices each writing to the same output data array, e.g., frame in a frame buffer.
This may be the case, for example, when a host processor 9 generates an "overlay" to be displayed on an image that is being generated by the graphics processor 1.
In this case, each device writing to the output data array could update the similarity meta-data accordingly, or, e.g., the meta-data for those parts of the output array that another master is writing to could be invalidated or cleared (so that those parts of the output array will be read out in full to the output device). The latter would be necessary where a given master device is unable to update the similarity meta-data. It would also be possible to invalidate (clear) the meta-data for the entire output array if, e.g., another master modifies a relatively large portion of the output array (or modifies the output array at all).
Various other preferred and alternative arrangements of the present embodiment are possible.
For example, the metadata may also be used to manage the storing of the data blocks in the local memory 8 of the display controller 7 and in particular as a factor in determining the eviction of data blocks from the local memory 8. For example, the metadata may be used to determine a data block or blocks that is going to be used repeatedly and that data block (or blocks) then be locked (for the time being) in the local memory of the processing device (once it is written there) so that it will be available in the local memory when it is needed in the future.
It would also be possible to keep a count of the number of times a given data block in the local memory 8 is to be used in the near future (based, e.g., on meta-data that has been pre-fetched for the portion of the output array that is being processed), and to only allow a data block to be evicted from the local memory when its "use" count is zero.
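The "use" count could, for example, be derived from the pre-fetched similarity bits as sketched below. The function name `use_counts` is hypothetical and the single-bit ("same as previous block") metadata of the present embodiment is assumed.

```python
from typing import Dict, List


def use_counts(similarity_bits: List[int]) -> Dict[int, int]:
    """Map each block position that must be fetched to the number of times it will be used."""
    counts: Dict[int, int] = {}
    current = None  # position of the block that following "1" bits refer back to
    for position, bit in enumerate(similarity_bits):
        if bit == 1 and current is not None:
            counts[current] += 1      # another reuse of the already fetched block
        else:
            current = position        # this position needs its own fetch
            counts[current] = 1
    return counts
```

A block's count would be decremented each time the block is provided for display, and the block would become eligible for eviction from the local memory 8 only once its count reaches zero.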
It can be seen from the above that the present invention, in its preferred embodiments at least, can help to reduce, for example, display controller power consumption and memory bandwidth.
This is achieved, in the preferred embodiments of the present invention at least, by eliminating unnecessary "main" memory read transactions. This reduces the amount of data that is read from main memory, thereby significantly reducing system power consumption and the amount of memory bandwidth consumed. It can be applied to graphics frame buffer, graphics render to texture, video frame buffer and composition frame buffer read transactions, etc.
The power and bandwidth savings when using the present invention can be relatively significant. For example, for game and video content, with a standard definition frame buffer, using 32 byte linear blocks, where the previous 4 blocks are analysed (requiring a multi-bit bitmap), the Applicants have found that about 17% of read and write transactions can be eliminated. For high definition frame buffers the elimination rate is even higher. For GUI content with a similar configuration about 80% of frame buffer read and write transactions can be eliminated.
Where both reads and writes are eliminated for HD (1920x1080x24bpp), with 60fps frame display rate (read) and 30fps frame update rate (write) and assuming 2.4nJ per 32-bit off-chip transfer this equates to a bandwidth saving of about 90MB/s and a power saving of 57mW for game and video content. For GUI content the savings are 427MB/s and 268mW.
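The quoted figures can be reproduced approximately with the arithmetic below. The sketch assumes that the 17% and 80% elimination rates apply equally to reads and writes, that "MB" means 2^20 bytes (an inference from the fact that the numbers then agree), and that each eliminated 32-bit transfer saves exactly the stated energy; the function name `savings` is hypothetical.

```python
MIB = 1024 * 1024  # assumed meaning of "MB" in the quoted figures


def savings(width: int, height: int, bytes_per_pixel: int,
            read_fps: int, write_fps: int,
            eliminated_fraction: float,
            joules_per_32bit_transfer: float):
    """Return (bandwidth saved in MiB/s, power saved in mW)."""
    frame_bytes = width * height * bytes_per_pixel
    traffic = frame_bytes * (read_fps + write_fps)    # bytes per second
    saved_bytes = traffic * eliminated_fraction
    saved_watts = (saved_bytes / 4) * joules_per_32bit_transfer
    return saved_bytes / MIB, saved_watts * 1000


game_mib, game_mw = savings(1920, 1080, 3, 60, 30, 0.17, 2.4e-9)
gui_mib, gui_mw = savings(1920, 1080, 3, 60, 30, 0.80, 2.4e-9)
```

Under these assumptions the game and video case comes out at roughly 91 MiB/s and 57 mW, and the GUI case at roughly 427 MiB/s and just under 269 mW.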
So far as the additional overhead due to the need to store meta-data in the present invention is concerned, for a system where only the preceding data block is analysed (i.e. the meta-data comprises a single bit per data block position), a high definition frame using data blocks corresponding to 32 byte cache lines has been found to result in an additional 32 KB of control data for an HD frame occupying 7.9 MB. If using data blocks corresponding to 64 byte tile lines, the control data is 16 KB. For data blocks corresponding to 512 byte half tiles it is 2 KB, and for data blocks corresponding to 1024 byte tiles, it is 1KB.
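The control data sizes follow from one bit per block position, as the sketch below shows. It assumes a 1920x1080 frame at 4 bytes per pixel, since that is the pixel size at which the frame occupies about 7.9 MB (with MB taken as 2^20 bytes); the function name `bitmap_bytes` is hypothetical.

```python
def bitmap_bytes(frame_bytes: int, block_bytes: int) -> int:
    """Bytes needed for a one-bit-per-block similarity bitmap."""
    blocks = -(-frame_bytes // block_bytes)   # ceiling division
    return -(-blocks // 8)


FRAME_BYTES = 1920 * 1080 * 4                 # assumed 32 bits per pixel
overhead = {size: bitmap_bytes(FRAME_BYTES, size) for size in (32, 64, 512, 1024)}
```

This gives 32,400 bytes, 16,200 bytes, 2,025 bytes and 1,013 bytes respectively, in line with the approximately 32 KB, 16 KB, 2 KB and 1 KB figures stated above.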

Claims (32)

  1. 1. A method of processing an array of data in which a processing device processes the array of data by processing successive blocks of data each representing particular regions of the array of data and blocks of data representing particular regions of the array of data are read from a first memory in which the array of data is stored and stored in a memory of the processing device prior to the blocks of data being processed by the processing device; the method comprising: determining whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and either processing for the block of data to be processed a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
  2. 2. The method of claim 1, wherein the step of determining whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and either processing for the block of data to be processed a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination, comprises: if it is determined that a block of data to be processed is to be considered to be similar to a block of data already stored in the local memory of the processing device, not reading a new block of data from the data array stored in the first memory and storing it in the memory of the processing device, but instead processing the existing block of data in the memory of the processing device as the block of data to be processed by the processing device; and if it is determined that a block of data to be processed is not to be considered to be similar to a block of data already stored in the memory of the processing device, reading a new block of data from the data array stored in the first memory and storing it in the memory of the processing device, and then processing that new block of data as the block of data to be processed by the processing device.
  3. 3. The method of claim 1 or 2, wherein the processing device is one of a display controller, a CPU, a video processor and a graphics processor.
  4. 4. The method of any one of the preceding claims, wherein the similarity determination process determines whether a data block to be processed is similar to a block that is already stored in the memory of the processing device using similarity information that is associated with the array of data.
  5. 5. The method of any one of the preceding claims, wherein the data array has associated with it similarity information indicating for each respective data block in the data array whether that data block is similar to another data block in the data array, and the similarity determination process determines whether a data block to be processed is similar to a data block that is already stored in the memory of the processing device using the relevant similarity information for the data block.
  6. 6. A method of generating meta-data for use when processing an array of data that is stored in memory, the method comprising: for each of one or more blocks of data representing particular regions of an array of data to be processed: determining whether the block of data should be considered to be similar to another block of data for the data array; generating similarity information indicating whether the block of data was determined to be similar to another block of data for the data array; and storing the similarity information indicating whether the block of data was determined to be similar to another block of data for the data array in association with the array of data.
  7. 7. The method of claim 6, wherein the step of determining whether the block of data should be considered to be similar to another block of data for the data array comprises comparing some or all of the actual content of the data blocks to determine if the data blocks are to be considered to be similar or not.
  8. 8. A method of processing an array of data, the method comprising: generating an array of data to be processed; for each of one or more blocks of data representing particular regions of the array of data to be processed: determining whether the block of data should be considered to be similar to another block of data for the data array; and generating similarity information indicating whether the block of data was determined to be similar to another block of data for the data array; storing the array of data and its associated generated similarity information; reading blocks of data each representing particular regions of the array of data from the stored array of data and storing them in a memory of a processing device that is to process the data array, prior to the blocks of data being processed by the processing device; using the similarity information generated for the data array to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device; and either processing for the block of data to be processed a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
  9. 9. The method of any one of claims 6 to 8, further comprising: not writing a data block to the data array in memory if it has been determined that that data block should be considered to be similar to another data block for the data array.
  10. 10. The method of any one of the preceding claims, wherein the array of data is data representing an image.
  11. 11. The method of any one of the preceding claims, wherein the blocks of data that are considered each comprise a cache line or a 2D sub-tile of the data array.
  12. 12. A system comprising: a first memory for storing an array of data to be processed; a processing device for processing an array of data stored in the first memory, by processing successive blocks of data, each representing particular regions of the array of data, the processing device having a local memory; a read controller configured to read blocks of data representing particular regions of an array of data that is stored in the first memory and to store the blocks of data in the local memory of the processing device prior to the blocks of data being processed by the processing device; and a controller configured to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
  13. 13. The system of claim 12, wherein the read controller and controller are part of the processing device.
  14. 14. An apparatus for use when processing an array of data stored in a first memory, comprising: a read controller configured to read blocks of data representing particular regions of an array of data that is stored in the first memory and to store the blocks of data in a local memory of a processing device that is to process the array of data prior to the blocks of data being processed by the processing device; and a controller configured to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
  15. 15. The system or apparatus of claim 12, 13 or 14, wherein the controller is configured to: if it is determined that a block of data to be processed is to be considered to be similar to a block of data already stored in the local memory of the processing device, cause the read controller to not read a new block of data from the data array stored in the first memory and store it in the memory of the processing device, and to cause the processing device to process the existing block of data in the memory of the processing device as the block of data to be processed by the processing device; and if it is determined that a block of data to be processed is not to be considered to be similar to a block of data already stored in the memory of the processing device, cause the read controller to read a new block of data from the data array stored in the first memory and store it in the memory of the processing device, and to cause the processing device to then process that new block of data as the block of data to be processed by the processing device.
  16. 16. The system or apparatus of claim 12, 13, 14, or 15, wherein the processing device is one of a display controller, a CPU, a video processor and a graphics processor.
  17. 17. The system or apparatus of any one of claims 12 to 16, wherein the controller determines whether a data block to be processed is similar to a block that is already stored in the memory of the processing device using similarity information that is associated with the array of data.
  18. 18. The system or apparatus of any one of claims 12 to 17, wherein the data array has associated with it similarity information indicating for each respective data block of the data array whether that data block is similar to another data block in the data array, and the controller determines whether a data block to be processed is similar to a data block that is already stored in the memory of the processing device using the relevant similarity information for that data block.
  19. 19. A data processing system, comprising: a data processor for generating an array of data for processing; means for determining for each of one or more blocks of data representing particular regions of the array of data whether the block of data should be considered to be similar to another block of data for the data array; means for generating similarity information indicating whether the block of data was determined to be similar to another block of data for the data array; and means for storing the similarity information indicating whether a block of data was determined to be similar to another block of data for the data array in association with the array of data.
  20. 20. The system of claim 19, wherein the means for determining for each of one or more blocks of data representing particular regions of the array of data whether the block of data should be considered to be similar to another block of data for the data array, the means for generating similarity information indicating whether the block of data was determined to be similar to another block of data for the data array, and the means for storing the similarity information indicating whether a block of data was determined to be similar to another block of data for the data array in association with the array of data, are part of the data processor.
  21. 21. The system of claim 19 or 20, wherein the data processor is one of a camera controller, a graphics processor, a CPU and a video processor.
  22. 22. The system of claim 19, 20 or 21, wherein the means for determining whether the block of data should be considered to be similar to another block of data for the data array comprises means for comparing some or all of the actual content of the data blocks to determine if the data blocks are to be considered to be similar or not.
  23. 23. An apparatus for use in a data processing system in which an array of data generated by the data processing system is read from an output buffer by reading blocks of data representing particular regions of the array of data from the output buffer, the apparatus comprising: means for comparing a block of data for the data array with at least one other block of data for the data array, and for generating information indicating whether or not the block of data is to be considered to be similar to another block of data for the data array on the basis of the comparison; and means for storing that similarity information in association with the data array.
  24. 24. A data processing system, comprising: a data processor for generating an array of data to be processed; means for determining for each of one or more blocks of data representing particular regions of the array of data whether the block of data should be considered to be similar to another block of data for the data array; means for generating similarity information indicating whether the block of data was determined to be similar to another block of data for the data array; means for storing the array of data and its associated generated similarity information; and a processing device for processing the stored array of data, by processing successive blocks of data, each representing particular regions of the array of data, the processing device having a local memory; a read controller configured to read blocks of data representing particular regions of the array of data from the stored array of data and to store the blocks of data in the local memory of the processing device prior to the blocks of data being processed by the processing device; and a controller configured to use the similarity information generated for the data array to determine whether a block of data to be processed for the data array is similar to a block of data that is already stored in the memory of the processing device, and to cause the processing device to process for the block of data to be processed either a block of data that is already stored in the memory of the processing device, or a new block of data from the array of data stored in the first memory, on the basis of the similarity determination.
  25. 25. The system or apparatus of any one of claims 19 to 24, further comprising: means for not writing a data block to the data array in memory if it has been determined that that data block should be considered to be similar to another data block for the data array.
  26. 26. The system or apparatus of any one of claims 12 to 25, wherein the array of data is data representing an image.
  27. 27. The system or apparatus of any one of claims 12 to 26, wherein the blocks of data that are considered each comprise a cache line or a 2D sub-tile of the data array.
  28. 28. A computer program comprising code for performing all the steps of the method of any one of claims 1 to 12 when the program is run on a data processing system.-52 -
  29. 29. A method of processing an array of data substantially as herein described with reference to any one of the accompanying drawings.
  30. 30. A method of generating meta-data for use when processing an array of data substantially as herein described with reference to any one of the accompanying drawings.
  31. 31. A data-processing system substantially as herein described with reference to any one of the accompanying drawings.
  32. 32. An apparatus for use in a data processing system substantially as herein described with reference to any one of the accompanying drawings.
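
As a rough illustration only, the arrangement recited in claims 23 and 24 might be sketched as follows. The sketch is not taken from the specification: it assumes that "similar" means exactly equal, that each block is compared only with the block immediately before it, and that the local memory of the processing device holds a single block. The names `generate_similarity_info`, `ReadController` and `process_array` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One block of the data array, e.g. a cache line or a flattened 2D sub-tile.
Block = Tuple[int, ...]


def generate_similarity_info(blocks: Sequence[Block]) -> List[bool]:
    """Return one flag per block; flags[i] is True when block i is
    considered similar to block i - 1.

    Assumption: "similar" is exact equality with the preceding block.
    """
    flags = [False] * len(blocks)
    for i in range(1, len(blocks)):
        flags[i] = blocks[i] == blocks[i - 1]
    return flags


class ReadController:
    """Reads blocks in order, reusing the locally held block when the
    similarity information says the next block matches it."""

    def __init__(self, stored_blocks: Sequence[Block],
                 similarity_flags: Sequence[bool]) -> None:
        self.stored_blocks = stored_blocks
        self.similarity_flags = similarity_flags
        self.local_block: Optional[Block] = None  # single-block local memory
        self.memory_reads = 0                     # reads from the stored array

    def fetch(self, index: int) -> Block:
        # Must be called with successive indices 0, 1, 2, ... so that the
        # local block always corresponds to block index - 1.
        if self.local_block is not None and self.similarity_flags[index]:
            return self.local_block               # reuse, no memory read
        self.local_block = self.stored_blocks[index]
        self.memory_reads += 1
        return self.local_block


def process_array(blocks: Sequence[Block]) -> Tuple[List[Block], int]:
    """Generate similarity information, then read every block in order.
    Returns the blocks as seen by the processing device and the number of
    reads actually made from the stored array."""
    flags = generate_similarity_info(blocks)
    controller = ReadController(blocks, flags)
    seen = [controller.fetch(i) for i in range(len(blocks))]
    return seen, controller.memory_reads
```

In this sketch the processing device sees the same data it would have seen by reading every block, while runs of identical neighbouring blocks cost only one read from the stored array.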
GB201016165A 2009-09-25 2010-09-24 Methods of and apparatus for controlling the reading of arrays of data from memory Active GB2474115B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0916924.4A GB0916924D0 (en) 2009-09-25 2009-09-25 Graphics processing systems
GBGB1014602.5A GB201014602D0 (en) 2010-09-02 2010-09-02 Methods of and apparatus for controlling the reading of arrays of data from memory

Publications (3)

Publication Number Publication Date
GB201016165D0 GB201016165D0 (en) 2010-11-10
GB2474115A GB2474115A (en) 2011-04-06
GB2474115B GB2474115B (en) 2012-10-03

Family

ID=43127977

Family Applications (2)

Application Number Title Priority Date Filing Date
GB201016162A Active GB2474114B (en) 2009-09-25 2010-09-24 Graphics processing systems
GB201016165A Active GB2474115B (en) 2009-09-25 2010-09-24 Methods of and apparatus for controlling the reading of arrays of data from memory

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB201016162A Active GB2474114B (en) 2009-09-25 2010-09-24 Graphics processing systems

Country Status (3)

Country Link
JP (2) JP5835879B2 (en)
CN (2) CN102033809B (en)
GB (2) GB2474114B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2504587A (en) * 2011-03-03 2014-02-05 Advanced Risc Mach Ltd Associating meta-information with vertex shader attributes
GB2504814A (en) * 2011-03-03 2014-02-12 Advanced Risc Mach Ltd Associating meta-information with vertex shader attributes
GB2507851B (en) * 2012-09-06 2017-05-17 Imagination Tech Ltd Systems and methods of partial frame buffer updating

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120133659A1 (en) * 2010-11-30 2012-05-31 Ati Technologies Ulc Method and apparatus for providing static frame
CN102427533B (en) * 2011-11-22 2013-11-06 苏州科雷芯电子科技有限公司 Video transmission device and method
US9659393B2 (en) 2013-10-07 2017-05-23 Intel Corporation Selective rasterization
GB2521170A (en) * 2013-12-11 2015-06-17 Advanced Risc Mach Ltd Method of and apparatus for displaying an output surface in data processing systems
US20150278981A1 (en) * 2014-03-27 2015-10-01 Tomas G. Akenine-Moller Avoiding Sending Unchanged Regions to Display
KR102197067B1 (en) * 2014-04-02 2020-12-30 삼성전자 주식회사 Method and Apparatus for rendering same region of multi frames
GB2525223B (en) 2014-04-16 2020-07-15 Advanced Risc Mach Ltd Graphics processing systems
US9940686B2 (en) 2014-05-14 2018-04-10 Intel Corporation Exploiting frame to frame coherency in a sort-middle architecture
GB2531015B (en) 2014-10-07 2021-06-30 Advanced Risc Mach Ltd Data processing systems
GB2531014B (en) 2014-10-07 2020-12-09 Advanced Risc Mach Ltd Data processing systems
GB2531358B (en) * 2014-10-17 2019-03-27 Advanced Risc Mach Ltd Method of and apparatus for processing a frame
GB2548852B (en) * 2016-03-30 2020-10-28 Advanced Risc Mach Ltd Method of operating a graphics processing pipeline by compressing a block of sampling positions having the same data value
EP3510483B1 (en) 2016-09-23 2023-12-20 Huawei Technologies Co., Ltd. Binary image differential patching
US10276125B2 (en) 2016-09-30 2019-04-30 Arm Limited Method of and apparatus for controlling overrun when writing data from a display controller to memory
CN108170393A (en) * 2017-12-29 2018-06-15 佛山市幻云科技有限公司 A kind of SCM Based display methods and system
GB2572404B (en) 2018-03-29 2020-04-15 Imagination Tech Ltd Method and system for controlling processing
GB2579590B (en) 2018-12-04 2021-10-13 Imagination Tech Ltd Workload repetition redundancy
GB2579591B (en) * 2018-12-04 2022-10-26 Imagination Tech Ltd Buffer checker
US11221976B2 (en) * 2019-01-25 2022-01-11 Microchip Technology Incorporated Allocation of buffer interfaces for moving data, and related systems, methods and devices
CN110673815B (en) * 2019-10-15 2023-06-06 重庆远视科技有限公司 Bitmap display method, device, equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63298485A (en) * 1987-05-28 1988-12-06 Matsushita Electric Ind Co Ltd Image processor
JPH05266177A (en) * 1992-03-19 1993-10-15 Nec Corp Plotting device
JPH11355536A (en) * 1998-06-08 1999-12-24 Konica Corp Image processing method and image processor

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05227476A (en) * 1992-02-14 1993-09-03 Hitachi Ltd Picture data storing system
US6094203A (en) * 1997-09-17 2000-07-25 Hewlett-Packard Company Architecture for a graphics processing unit using main memory
JPH11328441A (en) * 1998-05-11 1999-11-30 Hitachi Ltd Graphics display control method and computer graphics
US6885378B1 (en) * 2000-09-28 2005-04-26 Intel Corporation Method and apparatus for the implementation of full-scene anti-aliasing supersampling
US8683024B2 (en) * 2003-11-26 2014-03-25 Riip, Inc. System for video digitization and image correction for use with a computer management system
JP2005195899A (en) * 2004-01-07 2005-07-21 Matsushita Electric Ind Co Ltd Image transfer system
US20060050976A1 (en) * 2004-09-09 2006-03-09 Stephen Molloy Caching method and apparatus for video motion compensation
JP4795808B2 (en) * 2005-02-23 2011-10-19 パナソニック株式会社 Drawing apparatus, drawing method, drawing program, and drawing integrated circuit
JP2006252480A (en) * 2005-03-14 2006-09-21 Fuji Xerox Co Ltd Computer, image processing system, and image processing method
CN1332300C (en) * 2005-04-30 2007-08-15 广东威创日新电子有限公司 Remote display processing method based on server end/client end structure
JP4591291B2 (en) * 2005-09-14 2010-12-01 日本電気株式会社 Turbo decoding apparatus and method and program thereof
US20080002894A1 (en) * 2006-06-29 2008-01-03 Winbond Electronics Corporation Signature-based video redirection

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2504587A (en) * 2011-03-03 2014-02-05 Advanced Risc Mach Ltd Associating meta-information with vertex shader attributes
GB2504814A (en) * 2011-03-03 2014-02-12 Advanced Risc Mach Ltd Associating meta-information with vertex shader attributes
GB2504587B (en) * 2011-03-03 2015-03-11 Advanced Risc Mach Ltd Graphics processing
GB2504814B (en) * 2011-03-03 2015-03-11 Advanced Risc Mach Ltd Graphics processing
GB2507851B (en) * 2012-09-06 2017-05-17 Imagination Tech Ltd Systems and methods of partial frame buffer updating

Also Published As

Publication number Publication date
GB2474114A (en) 2011-04-06
JP5835879B2 (en) 2015-12-24
GB2474114B (en) 2012-02-15
JP5751782B2 (en) 2015-07-22
JP2011070672A (en) 2011-04-07
JP2011070671A (en) 2011-04-07
CN102033728A (en) 2011-04-27
GB201016162D0 (en) 2010-11-10
GB201016165D0 (en) 2010-11-10
GB2474115B (en) 2012-10-03
CN102033809A (en) 2011-04-27
CN102033728B (en) 2016-04-13
CN102033809B (en) 2015-11-25

Similar Documents

Publication Publication Date Title
US8988443B2 (en) Methods of and apparatus for controlling the reading of arrays of data from memory
JP5835879B2 (en) Method and apparatus for controlling reading of an array of data from memory
US9406155B2 (en) Graphics processing systems
US9881401B2 (en) Graphics processing system
US10001941B2 (en) Graphics processing systems
EP3274841B1 (en) Compaction for memory hierarchies
US9996363B2 (en) Methods of and apparatus for displaying windows on a display
CN106030652B (en) Method, system and composite display controller for providing output surface and computer medium
US11023152B2 (en) Methods and apparatus for storing data in memory in data processing systems
US10832639B2 (en) Method of and apparatus for generating a signature representative of the content of an array of data
US11308570B2 (en) Video data processing system for storing frames of video data
US10896536B2 (en) Providing output surface data to a display in data processing systems
US11954038B2 (en) Efficient evict for cache block memory
US11205243B2 (en) Data processing systems
TWI793644B (en) Cache arrangements for data processing systems
US10283073B2 (en) Data processing systems
US20230206380A1 (en) Optimizing partial writes to compressed blocks