WO2009109891A1 - Processor comprising a cache memory - Google Patents

Processor comprising a cache memory

Info

Publication number
WO2009109891A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
data
cache memory
mode
cma
Application number
PCT/IB2009/050823
Other languages
French (fr)
Inventor
Philippe Durieux
Olivier Guttin
Original Assignee
Nxp B.V.
Application filed by Nxp B.V. filed Critical Nxp B.V.
Publication of WO2009109891A1 publication Critical patent/WO2009109891A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0888 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N 19/43 Hardware specially adapted for motion estimation or compensation
    • H04N 19/433 Hardware specially adapted for motion estimation or compensation characterised by techniques for memory access
    • H04N 19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • An aspect of the invention relates to a processor comprising a cache memory.
  • the processor may be in the form of, for example, a decoder for decoding video data that has been encoded in accordance with an MPEG standard, or a similar standard (MPEG is an acronym for Moving Picture Experts Group).
  • the cache memory is typically used for storing picture areas that are referred to in a motion compensation process.
  • Other aspects of the invention relate to a video apparatus comprising a processor with a cache memory, a method of processing data that involves use of a cache memory, and a computer program product for a programmable processor.
  • a cache memory can be regarded as a buffer memory, which can be accessed more rapidly than a main memory. Data that may be needed by a given process at a given instant is transferred in advance from the main memory to the cache memory. This can be regarded as an anticipatory action. At the instant when the process needs the data, the cache memory can provide this data within a relatively short delay. That is, the process obtains the data more quickly than if the process was to fetch the data from the main memory. Consequently, a cache memory contributes to achieving a relatively high processing speed.
  • International patent application published under number WO 2004/102971 describes a video processing device that comprises an external memory for storing reference pictures. A cache memory temporarily stores data corresponding to a prediction area. A motion compensation circuit reads this data from the cache memory in order to reconstruct pictures from decoded data.
  • Data which needs to be processed may comprise respective segments that refer to respective data portions, whereby a segment is accompanied by an indication of the data portion to which the segment refers.
  • MPEG-coded video data is an example of such type of data.
  • a block of pixels that has been coded by means of motion estimation and compensation constitutes such a segment.
  • the block of pixels refers to a picture area in another picture by means of a motion vector. Let it be assumed that a processor for processing the type of data concerned comprises a cache memory.
  • the cache memory preferably has a capacity that allows storage of a data portion to which a segment refers, as well as several neighboring data portions. This approach may provide satisfactory results in case there is a relatively low degree of disparity in position between respective data portions referred to by respective segments that are successively processed. In that case, there is a relatively high probability that a data portion, which is referred to by a segment, is indeed present in the cache memory when the segment is processed. However, it may occur that respective segments, which are successively processed, refer to respective data portions that exhibit a relatively high degree of disparity in terms of position.
  • the respective data portions occupy respective positions within the data of interest that differ to a relatively large extent. In that case, it may be necessary to refresh the cache memory, as it were, relatively frequently. In an extreme case, the cache memory needs to be refreshed with each segment to be processed. Refreshing the cache memory generally entails transferring a relatively large amount of data from the main memory to the cache memory. Consequently, a relatively high degree of disparity in position, as mentioned hereinbefore, may necessitate data traffic between the main memory and the cache memory that exceeds the available capacity. In that case, some data will be lost, which will lead to errors. It is possible to prevent such errors by increasing the available capacity for data transfer between the main memory and the cache memory, or by increasing memory capacity, or both. However, these solutions can be relatively costly.
  • a processor comprises a cache memory arrangement that can operate in a normal mode and in a bypass mode.
  • In the normal mode, a request for a data portion is presented to the cache memory.
  • In the bypass mode, such a request is systematically presented to the main memory.
  • a cache mode controller analyzes the respective indications accompanying a group of segments that will be successively processed, so as to establish a degree of disparity in position between the respective data portions which are referred to.
  • the cache memory arrangement operates in the bypass mode if the degree of disparity exceeds a predefined limit and in the normal mode otherwise.
  • the cache memory is effectively bypassed in case a group of segments is processed that refer to data portions that exhibit a relatively high degree of disparity in position.
  • the data portion to which the segment refers is directly fetched from the main memory. Transferring the data portion from the main memory to the processor entails less data traffic than refreshing the cache memory.
  • the cache memory is used only if relatively few refreshes are required for processing a group of segments. Accordingly, the data traffic between the main memory and the cache memory can be kept below a relatively low limit, even in critical cases. Data losses and errors can be prevented without having to resort to relatively costly measures such as, for example, increasing data transfer capacity or increasing memory capacity, or both.
  • An implementation of the invention advantageously comprises one or more of the following additional features, which are described in separate paragraphs that correspond with individual dependent claims.
  • the data which is to be processed, may comprise video data that has been encoded in accordance with a standard according to which a macroblock is divided into smaller blocks, and according to which a smaller block can be encoded with reference to a particular picture area, a smaller block being accompanied by an indication of the particular picture area with reference to which the smaller block has been encoded.
  • the cache mode controller is then preferably arranged to analyze the respective indications accompanying the respective smaller blocks of a macroblock, so as to cause the cache memory arrangement to operate in a bypass mode or in the normal mode on a macroblock basis. This allows error- free and efficient video processing at moderate cost.
  • the indication that accompanies a smaller block may comprise an index that specifies a picture that comprises the particular picture area with reference to which the smaller block has been encoded.
  • the indication may further comprise motion vector data that specifies a displacement between the smaller block and the particular picture area.
  • the cache mode controller then preferably establishes a degree of disparity in position on the basis of the respective indices and the respective motion vector data of the respective smaller blocks comprised in a macroblock.
  • the cache mode controller may establish the highest number of smaller blocks within a macroblock that have the same index, and cause the cache memory arrangement to operate in the bypass mode if this number is below a predefined value.
  • the cache mode controller may establish a sum of absolute differences between the respective displacements associated with the respective smaller blocks, and cause the cache memory arrangement to operate in the bypass mode if this sum of absolute differences is above a predefined value.
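The two criteria above can be sketched as follows. This is a minimal illustration only: the function name, the representation of a macroblock as a list of (reference index, displacement) pairs, and the threshold values are assumptions, since the patent merely requires some predefined limits.

```python
from collections import Counter

def select_cache_mode(partitions, min_same_index=3, max_disparity=64):
    """Decide the cache mode for one macroblock.

    `partitions` is a list of (reference_index, (dx, dy)) pairs, one per
    smaller block of the macroblock. Names and thresholds are illustrative.
    """
    # Criterion 1: the highest number of smaller blocks sharing the same
    # reference index must reach a predefined value.
    index_counts = Counter(ref for ref, _ in partitions)
    if index_counts.most_common(1)[0][1] < min_same_index:
        return "bypass"

    # Criterion 2: the sum of absolute differences between successive
    # displacements must stay below a predefined value.
    sad = 0
    for (_, (ax, ay)), (_, (bx, by)) in zip(partitions, partitions[1:]):
        sad += abs(ax - bx) + abs(ay - by)
    if sad > max_disparity:
        return "bypass"

    return "normal"
```

With four partitions sharing one index and near-identical small displacements, the function returns "normal"; with four distinct indices, or with widely scattered displacements, it returns "bypass", which matches the behavior the description calls for on a macroblock basis.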
  • the cache memory arrangement preferably comprises a cache control module that divides the cache memory into respective cache lines in which respective rectangular picture areas can be stored.
  • the main memory may provide a data burst, which has a maximum size in terms of number of data elements that can be comprised in the data burst.
  • a cache line preferably has a size, in terms of number of data elements that can be stored in the cache line, that corresponds with the maximum size of the data burst.
  • Fig. 1 is a block diagram that illustrates a video system.
  • Fig. 2 is a block diagram that illustrates a decoder, which forms part of the video system.
  • Fig. 3 is a conceptual diagram that illustrates storage of a rectangular picture area in a cache memory of the decoder.
  • Fig. 4 is a conceptual diagram that illustrates a case where a cache memory arrangement in the decoder may operate in a normal mode because there is a relatively small degree of disparity in position between respective picture areas referred to by respective blocks of pixels.
  • Fig. 5 is a conceptual diagram that illustrates a case where the cache memory arrangement may operate in a bypass mode because there is a relatively high degree of disparity, in terms of spatial position, between respective picture areas referred to by respective blocks of pixels.
  • Fig. 6 is a conceptual diagram that illustrates another case where the cache memory arrangement operates in the bypass mode because there is a relatively high degree of disparity, in terms of temporal position, between respective picture areas referred to by respective blocks of pixels.
  • Fig. 7 is a flow chart diagram that illustrates a series of steps to appropriately set the cache memory arrangement in the normal mode or in the bypass mode.
  • Fig. 1 illustrates a video system VSY that is coupled to a network NW, which may be, for example, a cable television network or the Internet.
  • a video server SV is also coupled to the network NW.
  • the video system VSY comprises a video apparatus VA, a display device DPL, and a remote-control device RCD.
  • the video apparatus VA may be in the form of, for example, a so-called settop box.
  • the video apparatus VA comprises an input module INM, a decoder DEC, an output module OUM, and a controller CTRL.
  • the video system VSY basically operates as follows.
  • a user may select a particular video title that is present on the video server SV by means of the remote-control device RCD.
  • the video apparatus VA submits a request, which indicates this selected video title, to the video server SV.
  • the video server SV provides a transport stream TS that conveys the selected video title.
  • the transport stream TS may be, for example, in accordance with an MPEG-4 encoding standard.
  • the video apparatus VA receives the transport stream TS via the network NW.
  • the input module INM of the video apparatus VA extracts from the transport stream TS coded video data VC that represents the selected video title.
  • the decoder DEC decodes this coded video data VC so as to obtain decoded video data VD.
  • the output module OUM provides a display driver signal DD on the basis of the decoded video data VD and supplemental data, if any, such as, for example, audio or text, which is to be overlaid.
  • the display driver signal DD causes the display device DPL to display the selected video title. It is desirable that the selected video title is displayed in an error-free fashion, that is, without any annoying artifacts. It is also desirable that the video apparatus VA and, in particular, the decoder DEC can achieve this while being relatively low cost.
  • Fig. 2 illustrates the decoder DEC, which forms part of the video apparatus VA.
  • the decoder DEC comprises two main entities: a decoding processor DPR and a main memory MM, which are coupled to each other via a bus system BS.
  • the decoding processor DPR may be in the form of an integrated circuit, which comprises one or more data processing circuits and one or more memory circuits.
  • the main memory MM may be in the form of, for example, a commercially available dynamic random access memory (DRAM) or another type of commercially available memory, which offers a relatively high storage/price ratio.
  • the main memory MM is external to the decoding processor DPR.
  • the bus system BS will then typically be provided on a circuit board, or another substrate, on which the decoding processor DPR and the main memory MM are physically mounted.
  • the decoding processor DPR comprises various functional entities that constitute a decoding path: a front-end module FE, a residual decoder RD, a motion compensator MC, and a back-end module BE.
  • the decoding processor DPR further comprises various functional entities that relate to memory access: a cache mode controller CMC, a cache memory arrangement CMA, and a memory interface MIF.
  • the cache memory arrangement CMA comprises further functional entities: a cache control module CC, a cache memory CM, and two multiplexers MUX1, MUX2.
  • Each of the aforementioned functional entities may be implemented by means of a dedicated circuit, which has a particular topology defining one or more data handling operations that the functional entity concerned can carry out. This can be regarded as a hardware-based implementation.
  • Each of the aforementioned functional entities may also be implemented by means of a programmable circuit, which can execute a set of instructions defining one or more data handling operations that the functional entity concerned can carry out. This can be regarded as a software-based implementation. It should be noted that several functional entities may be implemented by means of a single programmable circuit.
  • the decoding processor DPR may be hybrid in the sense that it comprises dedicated circuits as well as programmable circuits.
  • the decoder DEC basically operates as follows.
  • the coded video data VC from the input module INM illustrated in Fig. 1 may be temporarily stored in the main memory MM.
  • the main memory MM may comprise a section dedicated to that purpose.
  • the controller CTRL illustrated in Fig. 1, or a dedicated memory controller, may define and manage this section.
  • the decoded video data VD which the decoder DEC provides, may also be temporarily stored in the main memory MM in a similar fashion. Accordingly, at any given instant, the main memory MM will typically comprise one or more pictures, which have been decoded relatively recently, and which may be needed for decoding subsequent pictures.
  • the decoding processor DPR accesses the main memory MM via the memory interface MIF. A memory access may concern a read operation or a write operation.
  • the coded video data VC comprises a sequence of coded pictures.
  • a coded picture is composed of various coded macroblocks.
  • a coded macroblock represents a relatively large block of pixels within the picture of interest, typically a block of 16 x 16 pixels.
  • the coded macroblock itself is composed of various coded blocks of pixels of smaller size. These coded blocks of pixels are commonly referred to as macroblock partitions in MPEG-4 standards.
  • a motion estimation operation has individually been carried out at a coding end for each of these coded blocks of pixels.
  • Motion estimation is a process of reducing temporal redundancy.
  • a picture area is searched for in a neighboring picture, on the basis of which a similar block of pixels can be established that best matches a block of pixels to be encoded.
  • the similar block of pixels, as well as the picture area on which it is based, is indicated by means of a motion vector.
  • If the motion vector has sub-pixel precision, the similar block of pixels is obtained through interpolation between pixels in the picture area of interest.
  • the similar block of pixels is effectively subtracted from the block of pixels to be encoded. Accordingly, a block of residual pixels is obtained, which undergoes a data compression operation.
  • a coded block of pixels can therefore be regarded as a segment that refers to a data portion, namely a picture area, in a neighboring picture.
  • the segment is accompanied by an indication, in the form of a motion vector, of the data portion (the picture area) that is referred to.
  • the front-end module FE receives a coded macroblock MBC, which the decoding processor DPR reads from the main memory MM. For each coded block of pixels comprised in the coded macroblock MBC, the front-end module FE extracts motion compensation parameters MP and coded residual data RC from the coded macroblock MBC.
  • the motion compensation parameters MP comprise motion vector data, as well as a reference index. More specifically, the motion vector data comprises a differential motion vector, which expresses the motion vector of the block of pixels concerned with respect to another motion vector. This other motion vector, which is referred to, may be based on one or more motion vectors of one or more neighboring blocks of pixels.
  • the reference index can be regarded as an integer offset value that indicates a neighboring picture that comprises the picture area to which the coded block of pixels of interest refers.
  • the coded residual data RC represents a block of residual pixels, which has been obtained at the coding end by subtracting a similar block of pixels from the block of pixels of interest.
  • the cache mode controller CMC receives from the front-end module FE a set of motion compensation parameters MPS, which belong to the coded macroblock MBC.
  • the set of motion compensation parameters MPS comprises the respective differential motion vectors and the respective reference indices of the respective coded blocks of pixels, which are comprised in the coded macroblock MBC.
  • the cache mode controller CMC sets the cache memory arrangement CMA in a normal mode or in a bypass mode during the decoding of the macroblock of interest.
  • the cache mode controller CMC sets the cache memory CM in the normal mode or in the bypass mode by means of a mode control signal MO. This will be explained in greater detail hereinafter.
  • the residual decoder RD, the motion compensator MC, and the back-end module BE successively decode the blocks of pixels that are comprised in a coded macroblock MBC.
  • a coded block of pixels is decoded in the following manner.
  • the residual decoder RD decodes the coded residual data RC, which is comprised in the coded block of pixels. Accordingly, a decoded block of residual pixels RD is obtained that corresponds with the block of residual pixels that has been obtained at the coding end as described hereinbefore.
  • the motion compensator MC generates a direct memory access (DMA) request on the basis of the motion compensation parameters MP of the coded block of pixels of interest.
  • This direct memory access request, which will be referred to as data request RQ hereinafter, defines a range of addresses under which the picture area is stored to which the coded block of pixels refers.
  • the motion compensator MC submits the data request RQ to the cache memory arrangement CMA.
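As a rough illustration of how a data request of this kind can define a range of addresses, the following hypothetical helper computes the byte ranges covered by a picture area in a reference picture stored row by row. The function name, the `stride` parameter, and the 8-bit-pixel assumption are not taken from the patent; they merely make the address arithmetic concrete.

```python
def picture_area_addresses(base, stride, x, y, width=4, height=4):
    """Byte-address ranges (start, end) covered by a width x height
    picture area whose upper-left pixel lies at (x, y), in a reference
    picture stored row by row starting at `base` with `stride` bytes
    per row (8-bit pixels assumed). Hypothetical helper."""
    return [(base + (y + row) * stride + x,
             base + (y + row) * stride + x + width)
            for row in range(height)]
```

For example, a 4 x 2 area at (8, 2) in a picture with a 64-byte stride covers two short, non-contiguous byte ranges, one per pixel line.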
  • the cache memory arrangement CMA provides image data ID that represents the picture area of interest.
  • the motion compensator MC generates a motion-compensated block of pixels BC on the basis of this image data ID and the motion compensation parameters MP.
  • the motion-compensated block of pixels BC corresponds with the similar block of pixels that has been obtained at the coding end as described hereinbefore.
  • the back-end module BE adds the decoded block of residual pixels RD, which the residual decoder RD provides, to the motion-compensated block of pixels BC, which the motion compensator MC provides. Accordingly, the back-end module BE provides a decoded block of pixels BD, which is temporarily stored in the main memory MM as a part of a decoded picture.
  • the cache control module CC processes a data request RQ that emanates from the motion compensator MC.
  • the cache control module CC checks whether the picture area to which the data request RQ pertains is present in the cache memory CM. If so, the cache control module CC effectively translates the data request RQ into a corresponding access request for the cache memory CM.
  • the cache memory CM provides cache read data CD that corresponds with the image data ID that the motion compensator MC requires, as described hereinbefore. There has been a so-called cache hit.
  • In the opposite case, that of a cache miss, the cache control module CC refreshes at least a portion of the cache memory CM so that the picture area concerned will be present in the cache memory CM.
  • Cache memory refreshing generally concerns a relatively large amount of data, such as, for example, 128 or 256 bytes, for reasons of overall speed and efficiency. Accordingly, the cache control module CC generates an extended data request EQ, which defines a relatively large range of addresses under which the picture area of interest is stored as well as surrounding picture areas.
  • the extended data request EQ constitutes a main memory read request MQ, which the memory interface MIF submits to the main memory MM via the bus system BS.
  • the main memory MM provides memory read data MD that is stored under the relatively large range of addresses defined by the extended data request EQ.
  • This memory read data MD, which is relatively large in size, is stored in the cache memory CM.
  • the cache control module CC may apply a particular replacement policy: the cache control module CC decides which data has to leave the cache memory CM in order to make room for the aforementioned memory read data MD, with which the cache memory CM is refreshed.
  • the cache control module CC processes the data request RQ as described hereinbefore with respect to a cache hit. That is, the cache control module CC causes the cache memory CM to produce the cache read data CD, which corresponds with the image data ID that the motion compensator MC requires.
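The hit/miss handling of the cache control module CC can be modeled as follows. This is a toy sketch under stated assumptions: the class name, the dictionary-based memories, the 128-byte line size, and the FIFO replacement policy are all illustrative choices; the patent only says that the cache control module applies "a particular replacement policy".

```python
class CacheControl:
    """Toy model of the cache control module CC (structure assumed)."""

    LINE_BYTES = 128  # assumed cache-line / burst size

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory   # dict: line address -> list of bytes
        self.lines = {}                  # cached lines: line address -> data
        self.order = []                  # FIFO order for the replacement policy
        self.num_lines = num_lines

    def read(self, address):
        line_addr = address - (address % self.LINE_BYTES)
        if line_addr not in self.lines:
            # Cache miss: decide which data leaves the cache to make room,
            # then refresh one full line from the main memory.
            if len(self.order) == self.num_lines:
                evicted = self.order.pop(0)
                del self.lines[evicted]
            self.lines[line_addr] = self.main_memory[line_addr]
            self.order.append(line_addr)
        # Cache hit path: serve the request from the cache line.
        return self.lines[line_addr][address - line_addr]
```

Whether the request hits or misses, the caller receives the requested data; a miss merely adds the latency and bus traffic of one line-sized transfer, as described above.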
  • the motion compensator MC receives the image data ID required, which represents the picture area of interest, irrespective of whether the data request RQ leads to a cache hit or a cache miss.
  • a cache miss will introduce additional latency; it will take more time before the motion compensator MC finally receives the image data ID compared with a cache hit.
  • a cache miss leads to the transfer of a relatively large amount of data from the main memory MM to the cache memory CM over the bus system BS, in order to refresh the cache memory CM. Consequently, repetitive cache misses will lead to relatively heavy data traffic on the bus system BS, which may cause congestion and data losses. This is unlikely to occur in the decoder DEC illustrated in Fig. 2, for the following reason.
  • the cache memory arrangement CMA is set to operate in the bypass mode when repetitive cache misses are expected to occur in the normal mode.
  • a data request RQ that emanates from the motion compensator MC is directly presented to the main memory MM as a main memory read request MQ.
  • the main memory MM provides memory read data MD that represents the picture area of interest.
  • the memory read data MD is directly applied to the motion compensator MC as the image data ID that has been requested for the purpose of generating the motion-compensated block of pixels BC. Accordingly, in the bypass mode, the memory read data MD can be restricted to the image data ID that represents the picture area of interest, without any surplus.
  • the memory read data MD will be relatively small in size compared with the normal mode hereinbefore, in which the memory read data MD is used for refreshing the cache memory CM. Accordingly, relatively heavy traffic on the bus system BS can be avoided, so that there is less risk of congestion and data losses.
  • the cache memory arrangement CMA is made to operate in the normal mode or the bypass mode by means of the two multiplexers MUX1, MUX2, which receive the mode control signal MO from the cache mode controller CMC.
  • Multiplexer MUX1 has two inputs: a first input coupled to receive an extended data request EQ, if any, from the cache control module CC, and a second input coupled to receive a data request RQ from the motion compensator MC.
  • An output of multiplexer MUX1 is coupled to submit a main memory read request MQ to the main memory MM.
  • Multiplexer MUX2 also has two inputs: a first input coupled to receive memory read data MD from the main memory MM, and a second input coupled to receive cache read data CD from the cache memory CM.
  • An output of multiplexer MUX2 is coupled to apply image data ID to the motion compensator MC.
  • In the normal mode, the mode control signal MO causes the first input of multiplexer MUX1 and the second input of multiplexer MUX2 to be coupled to the respective outputs of these multiplexers, so that the extended data request EQ reaches the main memory MM and the cache read data CD reaches the motion compensator MC.
  • In the bypass mode, the mode control signal MO causes the second input of multiplexer MUX1 and the first input of multiplexer MUX2 to be coupled to the respective outputs of these multiplexers, so that the data request RQ reaches the main memory MM and the memory read data MD reaches the motion compensator MC.
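The mode-dependent routing through the two multiplexers can be summarized in a few lines. The routing follows the functional description given earlier (in the bypass mode, the data request RQ goes directly to the main memory and the memory read data MD directly to the motion compensator; in the normal mode, the extended data request EQ and the cache read data CD are used); the function itself is an illustrative model, not part of the patent.

```python
def route(mode, rq, eq, md, cd):
    """Model the two multiplexers MUX1 and MUX2 under the mode control
    signal MO. Returns (main memory read request MQ, image data ID
    delivered to the motion compensator MC)."""
    if mode == "bypass":
        # MUX1 passes the data request RQ straight to the main memory;
        # MUX2 passes the memory read data MD straight to the MC.
        return rq, md
    # Normal mode: MUX1 forwards the extended data request EQ from the
    # cache control module; MUX2 forwards the cache read data CD.
    return eq, cd
```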
  • Fig. 3 illustrates storage of a rectangular picture area A1, which forms part of a picture, in the cache memory CM. Such storage typically occurs when refreshing the cache memory CM.
  • a pixel is designated by means of a reference sign P that comprises two indices that indicate the position of a pixel in the picture.
  • a left-hand index indicates the position along a horizontal axis x in units of pixels; a right-hand index indicates the position along a vertical axis y in units of pixels.
  • the rectangular picture area A1 comprises 4 lines of 32 pixels each.
  • Pixel P(i, j) constitutes the upper left corner of the rectangular picture area A1, i and j being integer numbers depending on the position of the rectangular picture area A1 within the picture.
  • a first line of the rectangular picture area A1 extends from this pixel P(i, j) to pixel P(i+31, j).
  • a second line extends from pixel P(i, j+1) to pixel P(i+31, j+1).
  • a third line extends from pixel P(i, j+2) to pixel P(i+31, j+2).
  • a fourth line extends from pixel P(i, j+3) to pixel P(i+31, j+3).
  • the cache control module CC in Fig. 2 effectively divides the cache memory CM into several cache lines L1, ..., L4, as illustrated in Fig. 3.
  • This division which is of a logical nature rather than of a physical nature, defines a granularity of the cache memory CM in terms of access. That is, a cache line constitutes an access unit that comprises a group of memory cells, which need not necessarily form part of a single physical row or a single physical column of the cache memory CM.
  • a cache line may comprise 32 memory cells, which can collectively be accessed.
  • a memory cell may be, for example, 32 bits wide, meaning that a memory cell can store a 32-bit data word.
  • Fig. 3 illustrates four cache lines only for the sake of simplicity. In practice, the cache memory CM may comprise a relatively large number of cache lines, such as, for example, 1024 cache lines.
  • the rectangular picture area A1 is stored in a single cache line, namely cache line L1. Respective neighboring picture areas A2, A3, A4 of similar size may be stored in respective other cache lines L2, L3, L4.
  • a pixel comprises 8 bits.
  • cache line L1 comprises 32 memory cells of 32 bits each, as mentioned hereinbefore. That is, there is room for 4 pixels in a memory cell; 8 memory cells are needed to store a line of 32 pixels.
  • the pixels P(i, j), ..., P(i+31, j) of the first line of the rectangular picture area A1 may be stored in a first set of 8 memory cells of cache line L1, the pixels P(i, j+1), ..., P(i+31, j+1) of the second line may be stored in a second set of 8 memory cells, and so on.
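The storage arithmetic of this example can be checked in a few lines, assuming 8-bit pixels and 32-bit memory cells as stated in the description:

```python
# Storage arithmetic for the example of Fig. 3.
PIXEL_BITS = 8          # one pixel comprises 8 bits
CELL_BITS = 32          # one memory cell stores a 32-bit data word
AREA_WIDTH = 32         # pixels per line of the rectangular area
AREA_HEIGHT = 4         # lines per rectangular area

pixels_per_cell = CELL_BITS // PIXEL_BITS                # 4 pixels fit in a cell
cells_per_pixel_line = AREA_WIDTH // pixels_per_cell     # 8 cells per 32-pixel line
cells_per_area = cells_per_pixel_line * AREA_HEIGHT      # 32 cells: one cache line
area_bytes = AREA_WIDTH * AREA_HEIGHT * PIXEL_BITS // 8  # 128 bytes per area
```

The 128-byte result matches the maximum burst size mentioned below, which is precisely why aligning cache lines with data bursts is efficient.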
  • the main memory MM illustrated in Fig. 2 may communicate data in a burst-like fashion. That is, the main memory MM provides a data burst in response to a data request RQ. Such a data burst will typically have a maximum size in terms of number of data elements that can be comprised in the data burst. For example, a data burst may comprise 128 bytes or 256 bytes at the most.
  • the aforementioned cache lines preferably have a size, in terms of number of data elements that can be stored in a cache line, that corresponds with the maximum size of the data burst. That is, cache lines are preferably aligned with data bursts from the main memory MM in terms of address and size. This contributes to efficient data communication between the main memory MM and the cache memory arrangement CMA over the bus system BS.
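The alignment between cache lines and data bursts can be checked numerically. The sketch below uses the example figures from the text (a 128-byte maximum burst and a cache line of 32 cells of 32 bits); the helper name is an illustrative assumption.

```python
# Illustrative sanity check: a cache line of 32 cells x 32 bits holds
# 128 bytes, matching the 128-byte maximum burst size given as an
# example, so one burst from the main memory MM fills one cache line.

BURST_BYTES = 128          # example maximum burst size
CELL_BYTES = 4             # 32-bit memory cell
CELLS_PER_LINE = 32
LINE_BYTES = CELLS_PER_LINE * CELL_BYTES

assert LINE_BYTES == BURST_BYTES  # sizes correspond

def is_line_aligned(address):
    """True if an address is aligned on a cache-line (burst) boundary."""
    return address % LINE_BYTES == 0

assert is_line_aligned(0) and is_line_aligned(256)
assert not is_line_aligned(100)
```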
  • Fig. 4 illustrates a case where the cache memory arrangement CMA may operate in the normal mode because there is a relatively low degree of disparity in position.
  • Fig. 4 is a conceptual diagram in which relatively large parallelograms represent pictures.
  • a series of four pictures PN, ... , PN-3 is represented.
  • the pictures are each designated by means of a reference sign, which has an index that indicates the position of the picture concerned within the series of pictures.
  • N is an arbitrary integer number identifying an arbitrary picture represented by the coded video data VC.
  • Fig. 4 illustrates a group of coded pixel blocks B1, ..., B4, which belong to picture PN.
  • the group of coded pixel blocks B1, ..., B4 may constitute, for example, a coded macroblock MBC (Fig. 2).
  • the respective coded pixel blocks B1, ..., B4 refer to respective picture areas R1, ..., R4 in another picture.
  • This is symbolically represented by means of arrows I1, ..., I4. That is, arrow I1 symbolically represents the motion compensation parameters MP (Fig. 2) of coded pixel block B1.
  • arrows I2, I3, I4 represent the respective motion compensation parameters MP of coded pixel blocks B2, B3, B4, respectively.
  • the respective picture areas R1, ..., R4 to which the respective coded pixel blocks refer belong to the same picture PN-1.
  • the respective picture areas R1, ..., R4 are relatively close to each other. In other words, there is a relatively low degree of disparity in the respective positions that these picture areas have.
  • the cache memory CM illustrated in Fig. 2 may be capable of storing a relatively large contiguous picture area, which encompasses at least two of the four picture areas R1, ..., R4 referred to.
  • the cache memory CM may even be capable of storing all four picture areas R1, ..., R4.
  • Fig. 5 illustrates a case where the cache memory arrangement CMA may operate in the bypass mode because there is a relatively high degree of disparity in position.
  • Fig. 5 is a conceptual diagram similar to that of Fig. 4. Similar entities are represented in a similar fashion, using identical reference signs.
  • the respective picture areas R1, ..., R4 to which the respective coded pixel blocks B1, ..., B4 refer belong to the same picture PN-1.
  • the respective picture areas R1, ..., R4 are relatively remote from each other. In other words, there is a relatively high degree of disparity between these picture areas in terms of spatial position.
  • Although the cache memory CM illustrated in Fig. 2 may be capable of storing a relatively large contiguous picture area, it may occur that this relatively large picture area cannot cover more than one picture area that is referred to. In other words, the cache memory CM can store only one such picture area at a time. Frequent refreshing will be needed, which will lead to relatively heavy data traffic on the bus system BS if the cache memory arrangement CMA operates in the normal mode. Consequently, it is more judicious to have the cache memory arrangement CMA operate in the bypass mode.
  • Fig. 6 illustrates another case where the cache memory arrangement CMA may operate in the bypass mode.
  • Fig. 6 is a conceptual diagram similar to that of Fig. 4. Similar entities are represented in a similar fashion, using identical reference signs.
  • the respective picture areas R1, ..., R4 to which the respective coded pixel blocks B1, ..., B4 refer are scattered over three different pictures PN-1, PN-2, PN-3.
  • In the example illustrated in Fig. 6, only two coded blocks of pixels B1, B4 refer to the same picture PN-1.
  • the cache memory CM may jointly store picture areas R1 and R4 illustrated in Fig. 6. Frequent refreshing will be needed, which will lead to relatively heavy data traffic on the bus system BS if the cache memory arrangement CMA operates in the normal mode. Consequently, like in the case illustrated in Fig. 5, it is more judicious to have the cache memory arrangement CMA operate in the bypass mode.
  • Fig. 7 illustrates a series of steps S1, ..., S7, which the cache mode controller CMC illustrated in Fig. 2 may carry out in order to appropriately set the cache memory arrangement CMA in the normal mode or in the bypass mode.
  • the cache mode controller CMC carries out the series of steps S1, ..., S7 on a macroblock basis.
  • the series of steps are carried out upon reception of a coded macroblock MBC by the front-end module FE, before the coded macroblock MBC is actually decoded by the residual decoder RD, the motion compensator MC, and the back-end module BE.
  • the series of steps are carried out anew upon reception of a new macroblock.
  • the cache mode controller CMC may be in the form of a programmable circuit.
  • Fig. 7 may thus be regarded as a flowchart representation of a software program, that is, a set of instructions, which causes the programmable circuit to carry out various operations described hereinafter with reference to Fig. 7.
  • In step S1, the cache mode controller CMC receives a set of motion compensation parameters MPS, which belong to a coded macroblock MBC that the front-end module FE illustrated in Fig. 2 has received.
  • the set of motion compensation parameters MPS comprises the respective reference indices and the respective differential motion vectors of the respective coded blocks of pixels, which are comprised in the coded macroblock MBC (MPSMBC: RIMBC, DVMBC).
  • the temporal disparity value represents the highest number of coded blocks of pixels that have the same reference index. For example, let it be assumed that the case illustrated in Fig. 4 applies: the four coded blocks of pixels B1, ..., B4 all refer to picture PN-1. In that case, the temporal disparity value may be equal to 4. The same applies to the case illustrated in Fig. 5. However, in the case illustrated in Fig. 6, only two coded blocks of pixels B1, B4 refer to the same picture PN-1 and therefore have the same reference index. In that case, the temporal disparity value may be equal to 2. That is, in this example, the lower the temporal disparity value is, the higher the degree of temporal disparity is.
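The temporal disparity value described above can be sketched as follows. The function name and the representation of the reference indices as a plain list are illustrative assumptions; the definition itself (the highest number of coded blocks sharing the same reference index) is taken from the text.

```python
from collections import Counter

def temporal_disparity_value(reference_indices):
    """Highest number of coded blocks of pixels within a macroblock
    that share the same reference index."""
    return max(Counter(reference_indices).values())

# Figs. 4 and 5: all four blocks B1..B4 refer to picture PN-1.
assert temporal_disparity_value([1, 1, 1, 1]) == 4
# Fig. 6: only B1 and B4 share a reference index.
assert temporal_disparity_value([1, 2, 3, 1]) == 2
```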
  • In step S3, the cache mode controller CMC establishes whether the following condition applies: the temporal disparity value is smaller than a temporal disparity threshold value (TDV < THTD?).
  • the temporal disparity threshold value represents a limit in terms of the number of coded blocks of pixels that should have the same reference index.
  • If this condition applies, the cache mode controller CMC subsequently carries out step S7, which will be described hereinafter.
  • Otherwise, the cache mode controller CMC subsequently carries out step S4.
  • the spatial disparity value represents a sum of absolute values of the respective differential motion vectors. This sum of absolute values corresponds with the sum of absolute differences between the respective motion vectors that have been established for the respective coded blocks of pixels at the coding end.
  • Each motion vector specifies a displacement between the block of pixels concerned and the particular picture area to which the block of pixels refers. For example, let it be assumed that the case illustrated in Fig. 4 applies: the respective picture areas Rl, ..., R4 are relatively close to each other in terms of spatial position within the picture concerned.
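The spatial disparity value of step S4 can be sketched as follows. The function name and the representation of each differential motion vector as a (dx, dy) pair are illustrative assumptions; the definition (a sum of absolute values of the differential motion vectors) is taken from the text.

```python
def spatial_disparity_value(differential_motion_vectors):
    """Sum of absolute values of the differential motion vectors of
    the coded blocks of pixels comprised in a macroblock."""
    return sum(abs(dx) + abs(dy) for dx, dy in differential_motion_vectors)

# Closely clustered picture areas (Fig. 4) yield small differentials...
assert spatial_disparity_value([(0, 0), (1, 0), (0, -1), (1, 1)]) == 4
# ...while widely scattered areas (Fig. 5) yield a large sum.
assert spatial_disparity_value([(40, -30), (-25, 60), (0, 0), (15, 5)]) == 175
```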
  • In step S5, the cache mode controller CMC establishes whether the following condition applies: the spatial disparity value is greater than a spatial disparity threshold value (SDV > THSD?).
  • the spatial disparity threshold value represents a limit in terms of relative remoteness of the coded blocks of pixels within the coded macroblock MBC of interest, that is, the remoteness of the coded blocks of pixels relative to each other.
  • If this condition applies, the cache mode controller CMC subsequently carries out step S7, which will be described hereinafter.
  • Otherwise, the cache mode controller CMC subsequently carries out step S6.
  • In step S6, the cache mode controller CMC sets the mode control signal MO to a value that represents the normal mode. This value may be, for example, equal to zero in case the mode control signal MO is a binary signal. Accordingly, the cache mode controller CMC causes the cache memory arrangement CMA to operate in the normal mode in case the conditions that are checked in steps S3 and S5 do not apply. Otherwise, the cache mode controller CMC will not reach step S6 but, instead, step S7.
  • In step S7, the cache mode controller CMC sets the mode control signal MO to a value that represents the bypass mode. This value may be, for example, one in case the mode control signal MO is a binary signal.
  • the cache mode controller CMC causes the cache memory arrangement CMA to operate in the bypass mode in case any one of the conditions that are checked in steps S3 and S5 applies. In that case, the cache memory CM would have to be refreshed relatively frequently if the cache memory arrangement CMA were to operate in the normal mode. As explained hereinbefore, this may lead to congestion on the bus system BS, which may cause loss of data or other undesired effects. Since, in the bypass mode, the cache memory CM is prevented from being frequently refreshed, these problems are avoided.
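The complete decision sequence of Fig. 7 can be sketched as a single function. The threshold values, function names, and data representations below are illustrative assumptions; the control flow follows the described steps: compute the temporal disparity value, compare it with THTD, then compute the spatial disparity value, compare it with THSD, and set the binary mode control signal accordingly.

```python
from collections import Counter

NORMAL_MODE, BYPASS_MODE = 0, 1  # example values of the binary mode signal MO

def select_cache_mode(reference_indices, diff_motion_vectors,
                      th_td=3, th_sd=64):
    """Steps S1-S7 of Fig. 7 for one coded macroblock MBC:
    bypass the cache if temporal disparity is high (TDV < THTD)
    or spatial disparity is high (SDV > THSD); otherwise use the
    normal mode. Threshold values are illustrative placeholders
    for the programmable parameters mentioned in the text."""
    tdv = max(Counter(reference_indices).values())           # step S2
    if tdv < th_td:                                          # step S3
        return BYPASS_MODE                                   # step S7
    sdv = sum(abs(dx) + abs(dy)                              # step S4
              for dx, dy in diff_motion_vectors)
    if sdv > th_sd:                                          # step S5
        return BYPASS_MODE                                   # step S7
    return NORMAL_MODE                                       # step S6

# Fig. 4: same picture, small displacements -> normal mode.
assert select_cache_mode([1, 1, 1, 1],
                         [(1, 0), (0, 1), (1, 1), (0, 0)]) == NORMAL_MODE
# Fig. 6: blocks scattered over three pictures -> bypass mode.
assert select_cache_mode([1, 2, 3, 1], [(0, 0)] * 4) == BYPASS_MODE
```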
  • the invention may be applied to advantage in numerous types of products or methods that involve a cache memory.
  • Video processing is merely an example.
  • the invention may be applied for processing any type of data that comprises respective segments that refer to respective data portions, whereby a segment is accompanied by an indication of a data portion to which the segment refers.
  • the invention may be applied for coding data as well as decoding data, and for other types of data processing.
  • Data, which is to be processed, may be received in numerous different fashions.
  • Fig. 2 illustrates an example in which data is received from a server via a network.
  • data may be received from a reading module that reads a data carrier, such as, for example, an optical disk, a hard disk, or a solid-state memory.
  • Such a reading module may form part of, for example, a video player in which the invention is applied.
  • There are numerous ways of implementing a cache mode controller in accordance with the invention.
  • Fig. 7 illustrates an example in the form of a series of steps S1, ..., S7, which may be executed by means of a software program. Numerous variations are possible. For example, steps S2 and S3 may be interchanged with steps S4 and S5. It is also possible to combine steps S3 and S5 in order to establish a disparity vector, which has a temporal component and a spatial component. Such a disparity vector may be compared with a limit, which may also be expressed as a vector, in order to decide whether the cache memory arrangement should operate in the normal mode or in the bypass mode.
  • a simplified cache mode controller may consider only one type of disparity, spatial or temporal, in order to set the cache memory arrangement in the normal mode or in the bypass mode.
  • An implementation preferably provides programmable parameters corresponding with one or more threshold values, or limits, which are considered upon deciding between the normal mode and the bypass mode.
  • the cache mode controller is implemented as a dedicated circuit, the circuit preferably comprises one or more programmable registers into which these parameters can be written.
  • Fig. 2 illustrates an example that comprises two multiplexers MUXl, MUX2. However, this does not mean that multiplexers are indispensable.
  • an alternative implementation may be obtained by applying the following modifications to the example illustrated in Fig. 2.
  • the two multiplexers MUXl, MUX2 are removed.
  • the cache control module CC is replaced by a different control module that takes over the functions that the two multiplexers MUX1, MUX2 fulfill.
  • Such a control module may receive the cache mode control signal MO, which the cache mode controller CMC provides.
  • the control module may formulate different types of main memory requests, and apply these requests to the main memory MM.
  • Such a control module may selectively apply cache read data CD or main memory read data MD as image data ID to the motion compensator MC.
  • the term "segment" should be understood in a broad sense. The term includes any portion or element of a structured set of data. In broad terms, there are numerous ways of implementing functional entities by means of hardware or software, or a combination of both. In this respect, the drawings are very diagrammatic. Although a drawing shows different functional entities as different blocks, this by no means excludes implementations in which a single entity carries out several functions, or in which several entities carry out a single function. For example, referring to Fig. 2, the front-end module FE and the cache mode controller CMC may be implemented by means of a single software module.
  • software which allows a programmable circuit to operate in accordance with the invention.
  • software may be stored in a suitable medium, such as an optical disk or a memory circuit.
  • a medium in which software is stored may be supplied as an individual product or together with another product, which may execute software. Such a medium may also be part of a product that enables software to be executed.
  • Software may also be distributed via communication networks, which may be wired, wireless, or hybrid. For example, software may be distributed via the Internet. Software may be made available for download by means of a server. Downloading may be subject to a payment.

Abstract

A processor processes data (VC) comprising respective segments that refer to respective data portions. A segment is accompanied by an indication of a data portion to which the segment refers. A segment may be, for example, a smaller block of a video macroblock. The processor comprises a cache memory arrangement (CMA) that can operate in a normal mode and in a bypass mode. In the normal mode, a request (RQ) for a data portion is presented to a cache memory (CM). In the bypass mode, such a request is systematically presented to a main memory (MM). A cache mode controller (CMC) analyzes the respective indications (MPS) accompanying a group (MBC) of segments that will be successively processed, so as to establish a degree of disparity in position between the respective data portions which are referred to. The cache memory arrangement (CMA) operates in the bypass mode if the degree of disparity exceeds a predefined limit and in the normal mode otherwise.

Description

Processor comprising a cache memory
FIELD OF THE INVENTION
An aspect of the invention relates to a processor comprising a cache memory. The processor may be in the form of, for example, a decoder for decoding video data that has been encoded in accordance with an MPEG standard, or a similar standard (MPEG is an acronym for Moving Picture Experts Group). In such an application, the cache memory is typically used for storing picture areas that are referred to in a motion compensation process. Other aspects of the invention relate to a video apparatus comprising a processor with a cache memory, a method of processing data that involves use of a cache memory, and a computer program product for a programmable processor.
BACKGROUND OF THE INVENTION
A cache memory can be regarded as a buffer memory, which can be accessed more rapidly than a main memory. Data that may be needed by a given process at a given instant is transferred in advance from the main memory to the cache memory. This can be regarded as an anticipatory action. At the instant when the process needs the data, the cache memory can provide this data within a relatively short delay. That is, the process obtains the data more quickly than if the process were to fetch the data from the main memory. Consequently, a cache memory contributes to achieving a relatively high processing speed. International patent application published under number WO 2004/102971 describes a video processing device that comprises an external memory for storing reference pictures. A cache memory temporarily stores data corresponding to a prediction area. A motion compensation circuit reads this data from the cache memory in order to reconstruct pictures from decoded data.
SUMMARY OF THE INVENTION
There is a need for processing data in an error-free manner at moderate cost. The following points have been taken into consideration in order to better address this need. Data, which needs to be processed, may comprise respective segments that refer to respective data portions, whereby a segment is accompanied by an indication of a data portion to which the segment refers. MPEG-coded video data is an example of such type of data. A block of pixels that has been coded by means of motion estimation and compensation constitutes such a segment. The block of pixels refers to a picture area in another picture by means of a motion vector. Let it be assumed that a processor for processing the type of data concerned comprises a cache memory. Let it further be assumed that segments, which need to be processed, are systematically stored in the cache memory as in, for example, the aforementioned international patent application. The cache memory preferably has a capacity that allows storage of a data portion to which a segment refers, as well as several neighboring data portions. This approach may provide satisfactory results in case there is a relatively low degree of disparity in position between respective data portions referred to by respective segments that are successively processed. In that case, there is a relatively high probability that a data portion, which is referred to by a segment, is indeed present in the cache memory when the segment is processed. However, it may occur that respective segments, which are successively processed, refer to respective data portions that exhibit a relatively high degree of disparity in terms of position. The respective data portions occupy respective positions within the data of interest that differ to a relatively large extent. In that case, it may be necessary to refresh the cache memory, as it were, relatively frequently. In an extreme case, the cache memory needs to be refreshed with each segment to be processed. 
Refreshing the cache memory generally entails transferring a relatively large amount of data from the main memory to the cache memory. Consequently, a relatively high degree of disparity in position as mentioned hereinbefore, may necessitate a data traffic between the main memory and the cache memory that exceeds available capacity. In that case, some data will be lost, which will lead to errors. It is possible to prevent such errors by increasing the available capacity for data transfer between the main memory and the cache memory, or by increasing memory capacity, or both. However, these solutions can be relatively costly.
In accordance with an aspect of the invention, a processor comprises a cache memory arrangement that can operate in a normal mode and in a bypass mode. In the normal mode, a request for a data portion is presented to a cache memory. In the bypass mode, such a request is systematically presented to a main memory. A cache mode controller analyzes the respective indications accompanying a group of segments that will be successively processed, so as to establish a degree of disparity in position between the respective data portions which are referred to. The cache memory arrangement operates in the bypass mode if the degree of disparity exceeds a predefined limit and in the normal mode otherwise.
Accordingly, the cache memory is effectively bypassed in case a group of segments is processed that refer to data portions that exhibit a relatively high degree of disparity in position. In order to process a segment of such a group, the data portion to which the segment refers is directly fetched from the main memory. Transferring the data portion from the main memory to the processor entails smaller data traffic than refreshing the cache memory. The cache memory is used only if relatively few refreshes are required for processing a group of segments. Accordingly, the data traffic between the main memory and the cache memory can be kept below a relatively low limit, even in critical cases. Data losses and errors can be prevented without having to resort to relatively costly measures such as, for example, increasing data transfer capacity or increasing memory capacity, or both.
An implementation of the invention advantageously comprises one or more of the following additional features, which are described in separate paragraphs that correspond with individual dependent claims.
The data, which is to be processed, may comprise video data that has been encoded in accordance with a standard according to which a macroblock is divided into smaller blocks, and according to which a smaller block can be encoded with reference to a particular picture area, a smaller block being accompanied by an indication of the particular picture area with reference to which the smaller block has been encoded. The cache mode controller is then preferably arranged to analyze the respective indications accompanying the respective smaller blocks of a macroblock, so as to cause the cache memory arrangement to operate in a bypass mode or in the normal mode on a macroblock basis. This allows error-free and efficient video processing at moderate cost. The indication that accompanies a smaller block may comprise an index that specifies a picture that comprises the particular picture area with reference to which the smaller block has been encoded. The indication may further comprise motion vector data that specifies a displacement between the smaller block and the particular picture area. The cache mode controller then preferably establishes a degree of disparity in position on the basis of the respective indices and the respective motion vector data of the respective smaller blocks comprised in a macroblock.
The cache mode controller may establish the highest number of smaller blocks within a macroblock that have the same index, and cause the cache memory arrangement to operate in the bypass mode if this number is below a predefined value. The cache mode controller may establish a sum of absolute differences between the respective displacements associated with the respective smaller blocks, and cause the cache memory arrangement to operate in the bypass mode if this sum of absolute differences is above a predefined value. The cache memory arrangement preferably comprises a cache control module that divides the cache memory into respective cache lines in which respective rectangular picture areas can be stored.
The main memory may provide a data burst, which has a maximum size in terms of number of data elements that can be comprised in the data burst. In that case, a cache line preferably has a size, in terms of number of data elements that can be stored in the cache line, that corresponds with the maximum size of the data burst. This additional feature, as well as each additional feature specified in the preceding paragraphs, contributes to error-free and efficient video processing at moderate cost.
A detailed description, with reference to drawings, illustrates the invention summarized hereinbefore as well as the additional features.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram that illustrates a video system.
Fig. 2 is a block diagram that illustrates a decoder, which forms part of the video system.
Fig. 3 is a conceptual diagram that illustrates storage of a rectangular picture area in a cache memory of the decoder.
Fig. 4 is a conceptual diagram that illustrates a case where a cache memory arrangement in the decoder may operate in a normal mode because there is a relatively small degree of disparity in position between respective picture areas referred to by respective blocks of pixels.
Fig. 5 is a conceptual diagram that illustrates a case where the cache memory arrangement may operate in a bypass mode because there is a relatively high degree of disparity, in terms of spatial position, between respective picture areas referred to by respective blocks of pixels.
Fig. 6 is a conceptual diagram that illustrates another case where the cache memory arrangement may operate in the bypass mode because there is a relatively high degree of disparity, in terms of temporal position, between respective picture areas referred to by respective blocks of pixels.
Fig. 7 is a flow chart diagram that illustrates a series of steps to appropriately set the cache memory arrangement in the normal mode or in the bypass mode.
DETAILED DESCRIPTION
Fig. 1 illustrates a video system VSY that is coupled to a network NW, which may be, for example, a cable television network or the Internet. A video server SV is also coupled to the network NW. The video system VSY comprises a video apparatus VA, a display device DPL, and a remote-control device RCD. The video apparatus VA may be in the form of, for example, a so-called set-top box. The video apparatus VA comprises an input module INM, a decoder DEC, an output module OUM, and a controller CTRL.
The video system VSY basically operates as follows. A user may select a particular video title that is present on the video server SV by means of the remote-control device RCD. The video apparatus VA submits a request, which indicates this selected video title, to the video server SV. In response, the video server SV provides a transport stream TS that conveys the selected video title. The transport stream TS may be, for example, in accordance with an MPEG-4 encoding standard. The video apparatus VA receives the transport stream TS via the network NW.
The input module INM of the video apparatus VA extracts from the transport stream TS coded video data VC that represents the selected video title. The decoder DEC decodes this coded video data VC so as to obtain decoded video data VD. The output module OUM provides a display driver signal DD on the basis of the decoded video data VD and supplemental data, if any, such as, for example, audio or text, which is to be overlaid. The display driver signal DD causes the display device DPL to display the selected video title. It is desirable that the selected video title is displayed in an error-free fashion, that is, without any annoying artifacts. It is also desirable that the video apparatus VA and, in particular, the decoder DEC can achieve this while being relatively low cost.
Fig. 2 illustrates the decoder DEC, which forms part of the video apparatus VA. The decoder DEC comprises two main entities: a decoding processor DPR and a main memory MM, which are coupled to each other via a bus system BS. The decoding processor DPR may be in the form of an integrated circuit, which comprises one or more data processing circuits and one or more memory circuits. The main memory MM may be in the form of, for example, a commercially available dynamic random access memory (DRAM) or another type of commercially available memory, which offers a relatively high storage/price ratio. In such preferred implementations, the main memory MM is external to the decoding processor DPR. The bus system BS will then typically be provided on a circuit board, or another substrate, on which the decoding processor DPR and the main memory MM are physically mounted.
In more detail, the decoding processor DPR comprises various functional entities that constitute a decoding path: a front-end module FE, a residual decoder RD, a motion compensator MC, and a back-end module BE. The decoding processor DPR further comprises various functional entities that relate to memory access: a cache mode controller CMC, a cache memory arrangement CMA, and a memory interface MIF. The cache memory arrangement CMA comprises further functional entities: a cache control module CC, a cache memory CM, and two multiplexers MUX1, MUX2.
Each of the aforementioned functional entities may be implemented by means of a dedicated circuit, which has a particular topology defining one or more data handling operations that the functional entity concerned can carry out. This can be regarded as a hardware-based implementation. Each of the aforementioned functional entities may also be implemented by means of a programmable circuit, which can execute a set of instructions defining one or more data handling operations that the functional entity concerned can carry out. This can be regarded as a software-based implementation. It should be noted that several functional entities may be implemented by means of a single programmable circuit. The decoding processor DPR may be hybrid in the sense that it comprises dedicated circuits as well as programmable circuits.
The decoder DEC basically operates as follows. The coded video data VC from the input module INM illustrated in Fig. 1 may be temporarily stored in the main memory MM. To that end, the main memory MM may comprise a section dedicated to that purpose. The controller CTRL illustrated in Fig. 1, or a dedicated memory controller, may define and manage this section. The decoded video data VD, which the decoder DEC provides, may also be temporarily stored in the main memory MM in a similar fashion. Accordingly, at any given instant, the main memory MM will typically comprise one or more pictures, which have been decoded relatively recently, and which may be needed for decoding subsequent pictures. The decoding processor DPR accesses the main memory MM via the memory interface MIF. A memory access may concern a read operation or a write operation.
The coded video data VC comprises a sequence of coded pictures. A coded picture is composed of various coded macroblocks. A coded macroblock represents a relatively large block of pixels within the picture of interest, typically a block of 16 x 16 pixels. The coded macroblock itself is composed of various coded blocks of pixels of smaller size. These coded blocks of pixels are commonly referred to as macroblock partitions in MPEG-4 standards. A motion estimation operation has individually been carried out at a coding end for each of these coded blocks of pixels.
Motion estimation is a process of reducing temporal redundancy. A picture area is searched for in a neighboring picture, on the basis of which a similar block of pixels can be established that best matches a block of pixels to be encoded. The similar block of pixels, as well as the picture area on which it is based, is indicated by means of a motion vector. In case the motion vector has sub-pixel precision, the similar block of pixels is obtained through interpolation between pixels in the picture area of interest. The similar block of pixels is effectively subtracted from the block of pixels to be encoded. Accordingly, a block of residual pixels is obtained, which undergoes a data compression operation. A coded block of pixels can therefore be regarded as a segment that refers to a data portion, namely a picture area, in a neighboring picture. The segment is accompanied by an indication, in the form of a motion vector, of the data portion (the picture area) that is referred to.
The front-end module FE receives a coded macroblock MBC, which the decoding processor DPR reads from the main memory MM. For each coded block of pixels comprised in the coded macroblock MBC, the front-end module FE extracts motion compensation parameters MP and coded residual data RC from the coded macroblock MBC. The motion compensation parameters MP comprise motion vector data, as well as a reference index. More specifically, the motion vector data comprises a differential motion vector, which expresses the motion vector of the block of pixels concerned with respect to another motion vector. This other motion vector, which is referred to, may be based on one or more motion vectors of one or more neighboring blocks of pixels. The reference index can be regarded as an integer offset value that indicates a neighboring picture that comprises the picture area to which the coded block of pixels of interest refers. The coded residual data RC represents a block of residual pixels, which has been obtained at the coding end by subtracting a similar block of pixels from the block of pixels of interest.
Before a macroblock is actually decoded, the cache mode controller CMC receives from the front-end module FE a set of motion compensation parameters MPS, which belong to the coded macroblock MBC. The set of motion compensation parameters MPS comprises the respective differential motion vectors and the respective reference indices of the respective coded blocks of pixels, which are comprised in the coded macroblock MBC. On the basis of this information, the cache mode controller CMC sets the cache memory arrangement CMA in a normal mode or in a bypass mode during the decoding of the macroblock of interest. The cache mode controller CMC sets the cache memory CM in the normal mode or in the bypass mode by means of a mode control signal MO. This will be explained in greater detail hereinafter. The residual decoder RD, the motion compensator MC, and the back-end module BE successively decode the blocks of pixels that are comprised in a coded macroblock MBC. A coded block of pixels is decoded in the following manner. The residual decoder RD decodes the coded residual data RC, which is comprised in the coded block of pixels. Accordingly, a decoded block of residual pixels RD is obtained that corresponds with the block of residual pixels that has been obtained at the coding end as described hereinbefore.
The motion compensator MC generates a direct memory access (DMA) request on the basis of the motion compensation parameters MP of the coded block of pixels of interest. This direct memory access request, which will be referred to as data request RQ hereinafter, defines a range of addresses under which the picture area is stored to which the coded block of pixels refers. The motion compensator MC submits the data request RQ to the cache memory arrangement CMA. In response, the cache memory arrangement CMA provides image data ID that represents the picture area of interest. The motion compensator MC generates a motion-compensated block of pixels BC on the basis of this image data ID and the motion compensation parameters MP. The motion-compensated block of pixels BC corresponds with the similar block of pixels that has been obtained at the coding end as described hereinbefore.
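The range of addresses defined by such a data request RQ may be sketched as follows, assuming a simple raster-scan picture layout with one byte per pixel. The layout and all parameter names are illustrative assumptions; real memory layouts (tiling, multi-byte pixels) differ.

```python
def data_request(base_addr, picture_width, x, y, area_w, area_h):
    """Return the per-row byte-address ranges covering the picture area
    whose upper-left pixel is at (x, y), as (start, end) pairs with an
    exclusive end. Assumes raster-scan layout, one byte per pixel."""
    ranges = []
    for row in range(area_h):
        start = base_addr + (y + row) * picture_width + x
        ranges.append((start, start + area_w))
    return ranges
```

The cache memory arrangement CMA (or, in the bypass mode, the main memory MM directly) serves these ranges back as the image data ID.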
The back-end module BE adds the decoded block of residual pixels RD, which the residual decoder RD provides, to the motion-compensated block of pixels BC, which the motion compensator MC provides. Accordingly, the back-end module BE provides a decoded block of pixels BD, which is temporarily stored in the main memory MM as a part of a decoded picture.
Let it be assumed that the cache memory arrangement CMA operates in the normal mode. The cache control module CC processes a data request RQ that emanates from the motion compensator MC. The cache control module CC checks whether the picture area to which the data request RQ pertains is present in the cache memory CM. If so, the cache control module CC effectively translates the data request RQ into a corresponding access request for the cache memory CM. In response, the cache memory CM provides cache read data CD that corresponds with the image data ID that the motion compensator MC requires, as described hereinbefore. There has been a so-called cache hit.
However, it may occur that the picture area to which the data request RQ pertains is not present in the cache memory CM. There is a so-called cache miss. In that case, the cache control module CC refreshes at least a portion of the cache memory CM so that the picture area concerned will be present in the cache memory CM. Cache memory refreshing generally concerns a relatively large amount of data, such as, for example, 128 or 256 bytes, for reasons of overall speed and efficiency. Accordingly, the cache control module CC generates an extended data request EQ, which defines a relatively large range of addresses under which the picture area of interest is stored as well as surrounding picture areas.
The extended data request EQ constitutes a main memory read request MQ, which the memory interface MIF submits to the main memory MM via the bus system BS. In response, the main memory MM provides memory read data MD that is stored under the relatively large range of addresses defined by the extended data request EQ. This memory read data MD, which is relatively large in size, is stored in the cache memory CM. To that end, the cache control module CC may apply a particular replacement policy: the cache control module CC decides which data has to leave the cache memory CM in order to make room for the aforementioned memory read data MD, with which the cache memory CM is refreshed. Once the cache memory CM has been refreshed, the cache control module CC processes the data request RQ as described hereinbefore with respect to a cache hit. That is, the cache control module CC causes the cache memory CM to produce the cache read data CD, which corresponds with the image data ID that the motion compensator MC requires.
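The hit-and-refresh behavior of the normal mode described hereinbefore can be sketched as follows. The direct-mapped organisation is an assumption: the text leaves the replacement policy open, and a real cache control module CC may apply a different policy. Requests are assumed not to straddle a cache line boundary.

```python
class DirectMappedCache:
    """Minimal sketch of normal-mode lookups with line refresh on a miss."""

    def __init__(self, main_memory, line_size=128, num_lines=4):
        self.mem = main_memory          # backing store: a list of bytes
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # which line address each slot holds
        self.lines = [None] * num_lines
        self.misses = 0

    def read(self, addr, length):
        line_addr = (addr // self.line_size) * self.line_size
        slot = (addr // self.line_size) % self.num_lines
        if self.tags[slot] != line_addr:
            # Cache miss: extended data request EQ fetches the whole line
            # from the main memory, evicting (overwriting) the old content.
            self.misses += 1
            self.lines[slot] = self.mem[line_addr:line_addr + self.line_size]
            self.tags[slot] = line_addr
        offset = addr - line_addr
        return self.lines[slot][offset:offset + length]
```

Note that a miss moves an entire line (e.g. 128 bytes) over the bus even when only a few bytes were requested, which is exactly the traffic the bypass mode avoids.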
In summary, the motion compensator MC receives the image data ID required, which represents the picture area of interest, irrespective of whether the data request RQ leads to a cache hit or a cache miss. However, a cache miss will introduce additional latency; it will take more time before the motion compensator MC finally receives the image data ID compared with a cache hit. What is more, a cache miss leads to the transfer of a relatively large amount of data from the main memory MM to the cache memory CM over the bus system BS, in order to refresh the cache memory CM. Consequently, repetitive cache misses will lead to relatively heavy data traffic on the bus system BS, which may cause congestion and data losses. This is unlikely to occur in the decoder DEC illustrated in Fig. 2 because the cache memory arrangement CMA is set to operate in the bypass mode when repetitive cache misses are expected to occur in the normal mode. In the bypass mode, a data request RQ that emanates from the motion compensator MC is directly presented to the main memory MM as a main memory read request MQ. In response, the main memory MM provides memory read data MD that represents the picture area of interest. The memory read data MD is directly applied to the motion compensator MC as the image data ID that has been requested for the purpose of generating the motion-compensated block of pixels BC. Accordingly, in the bypass mode, the memory read data MD can be restricted to the image data ID that represents the picture area of interest, without any surplus. The memory read data MD will be relatively small in size compared with the normal mode hereinbefore, in which the memory read data MD is used for refreshing the cache memory CM. Accordingly, relatively heavy traffic on the bus system BS can be avoided, so that there is less risk of congestion and data losses.
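As a rough illustration of this traffic argument, the following hypothetical model counts the bytes crossing the bus system BS in each mode. The miss rate and sizes are assumed figures for illustration, not values taken from the description.

```python
def bus_traffic(num_requests, bytes_per_area, miss_rate, line_size=128):
    """Hypothetical model of main-memory traffic per mode.

    Normal mode: only misses generate traffic, but each miss fetches a
    whole cache line. Bypass mode: every request goes to main memory,
    but fetches only the picture area itself, without any surplus."""
    normal_mode = num_requests * miss_rate * line_size
    bypass_mode = num_requests * bytes_per_area
    return normal_mode, bypass_mode
```

With a high miss rate (e.g. 90 %) and modest picture areas, the bypass mode moves fewer bytes; with a low miss rate the normal mode wins. Arbitrating this trade-off per macroblock is precisely the role of the cache mode controller CMC.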
It should be noted that, in the bypass mode, there will systematically be a certain degree of latency in processing data requests RQ. It will take a relatively long time before the motion compensator MC receives image data ID after having issued a data request RQ. However, a cache miss also causes an additional latency, which may be even greater than the aforementioned latency in the bypass mode. That is, the decoder DEC will anyway have to temporarily slow down its pace, as it were, if the cache memory arrangement CMA is forced to operate in the normal mode whereas it would otherwise have operated in the bypass mode. Allowing the cache memory arrangement CMA to operate in the bypass mode will hardly affect processing speed, and may even have a beneficial effect.
In the decoder DEC illustrated in Fig. 2, the cache memory arrangement CMA is made to operate in the normal mode or the bypass mode by means of the two multiplexers MUX1, MUX2, which receive the mode control signal MO from the cache mode controller CMC. Multiplexer MUX1 has two inputs: a first input coupled to receive an extended data request EQ, if any, from the cache control module CC, and a second input coupled to receive a data request RQ from the motion compensator MC. An output of multiplexer MUX1 is coupled to submit a main memory read request MQ to the main memory MM. Multiplexer MUX2 also has two inputs: a first input coupled to receive cache read data CD from the cache memory CM, and a second input coupled to receive memory read data MD from the main memory MM. An output of multiplexer MUX2 is coupled to apply image data ID to the motion compensator MC. In the normal mode, the mode control signal MO causes the respective first inputs of the two multiplexers MUX1, MUX2 to be coupled to the respective outputs of these multiplexers. In the bypass mode, the mode control signal MO causes the respective second inputs of the two multiplexers MUX1, MUX2 to be coupled to the respective outputs of these multiplexers.
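The routing performed by the two multiplexers can be sketched functionally as follows, taking MO = 0 as the normal mode and MO = 1 as the bypass mode (the binary encoding is the example given further below; the wiring is as described above, with the normal-mode path delivering cache read data CD and the bypass path delivering memory read data MD).

```python
def mux(mo, first_input, second_input):
    """Two-input multiplexer: first input in normal mode (MO=0),
    second input in bypass mode (MO=1)."""
    return first_input if mo == 0 else second_input

def route_request(mo, extended_request_eq, data_request_rq):
    """MUX1: choose which request reaches the main memory as MQ."""
    return mux(mo, extended_request_eq, data_request_rq)

def route_data(mo, cache_read_data_cd, memory_read_data_md):
    """MUX2: cache data in the normal mode, main-memory data in bypass."""
    return mux(mo, cache_read_data_cd, memory_read_data_md)
```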
Fig. 3 illustrates storage of a rectangular picture area A1, which forms part of a picture, in the cache memory CM. Such storage typically occurs when refreshing the cache memory CM. A pixel is designated by means of a reference sign P that comprises two indices that indicate the position of the pixel in the picture. A left-hand index indicates the position along a horizontal axis x in units of pixels; a right-hand index indicates the position along a vertical axis y in units of pixels. The rectangular picture area A1 comprises 4 lines of 32 pixels each. Pixel P(i,j) constitutes an upper left corner of the rectangular picture area A1, i and j being integer numbers depending on the position of the rectangular picture area A1 within the picture. A first line of the rectangular picture area A1 extends from this pixel P(i,j) to pixel P(i+31,j). A second line extends from pixel P(i,j+1) to pixel P(i+31,j+1). A third line extends from pixel P(i,j+2) to pixel P(i+31,j+2). A fourth line extends from pixel P(i,j+3) to pixel P(i+31,j+3).
The cache control module CC in Fig. 2 effectively divides the cache memory CM into several cache lines L1, ..., L4, as illustrated in Fig. 3. This division, which is of a logical nature rather than of a physical nature, defines a granularity of the cache memory CM in terms of access. That is, a cache line constitutes an access unit that comprises a group of memory cells, which need not necessarily form part of a single physical row or a single physical column of the cache memory CM. For example, a cache line may comprise 32 memory cells, which can collectively be accessed. A memory cell may be, for example, 32 bits wide, meaning that a memory cell can store a 32-bit data word. Fig. 3 illustrates four cache lines only for the sake of simplicity. In practice, the cache memory CM may comprise a relatively large number of cache lines, such as, for example, 1024 cache lines.
The rectangular picture area A1 is stored in a single cache line, namely cache line L1. Respective neighboring picture areas A2, A3, A4 of similar size may be stored in respective other cache lines L2, L3, L4. For example, let it be assumed that a pixel comprises 8 bits. Let it further be assumed that cache line L1 comprises 32 memory cells of 32 bits each, as mentioned hereinbefore. That is, there is room for 4 pixels in a memory cell; 8 memory cells are needed to store a line of 32 pixels. The pixels P(i,j), ..., P(i+31,j) of the first line of the rectangular picture area A1 may be stored in a first set of 8 memory cells of cache line L1, the pixels P(i,j+1), ..., P(i+31,j+1) of the second line may be stored in a second set of 8 memory cells, and so on.
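The mapping from a pixel position to its place inside such a cache line can be sketched as follows, under exactly the assumptions of the example above (8-bit pixels, four pixels per 32-bit memory cell, 32-pixel lines, hence 8 cells per line of the area).

```python
def cell_index(x0, y0, x, y, area_width=32, pixels_per_cell=4):
    """Locate pixel P(x, y) inside the cache line holding the 4 x 32
    picture area whose upper-left corner is P(x0, y0).
    Returns (memory_cell, byte_within_cell)."""
    col = x - x0
    row = y - y0
    cells_per_line = area_width // pixels_per_cell  # 8 cells per 32 pixels
    cell = row * cells_per_line + col // pixels_per_cell
    byte = col % pixels_per_cell
    return cell, byte
```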
The main memory MM illustrated in Fig. 2 may communicate data in a burst-like fashion. That is, the main memory MM provides a data burst in response to a data request RQ. Such a data burst will typically have a maximum size in terms of number of data elements that can be comprised in the data burst. For example, a data burst may comprise 128 bytes or 256 bytes at the most. The aforementioned cache lines preferably have a size, in terms of number of data elements that can be stored in a cache line, that corresponds with the maximum size of the data burst. That is, cache lines are preferably aligned with data bursts from the main memory MM in terms of address and size. This contributes to efficient data communication between the main memory MM and the cache memory arrangement CMA over the bus system BS.
Fig. 4 illustrates a case where the cache memory arrangement CMA may operate in the normal mode because there is a relatively low degree of disparity in position. Fig. 4 is a conceptual diagram in which relatively large parallelograms represent pictures. A series of four pictures PN, ..., PN-3 is represented. The pictures are each designated by means of a reference sign, which has an index that indicates the position of the picture concerned within the series of pictures. N is an arbitrary integer number identifying an arbitrary picture represented by the coded video data VC.
Fig. 4 illustrates a group of coded pixel blocks B1, ..., B4, which belong to picture PN. The group of coded pixel blocks B1, ..., B4 may constitute, for example, a coded macroblock MBC (Fig. 2). The respective coded pixel blocks B1, ..., B4 refer to respective picture areas R1, ..., R4 in another picture. This is symbolically represented by means of arrows I1, ..., I4. That is, arrow I1 symbolically represents the motion compensation parameters MP (Fig. 2) of coded pixel block B1. Similarly, arrows I2, I3, I4 represent the respective motion compensation parameters MP of coded pixel blocks B2, B3, B4, respectively.
In Fig. 4, the respective picture areas R1, ..., R4 to which the respective coded pixel blocks refer belong to the same picture PN-1. Moreover, the respective picture areas R1, ..., R4 are relatively close to each other. In other words, there is a relatively low degree of disparity in the respective positions that these picture areas have. As a result, the cache memory CM illustrated in Fig. 2 may be capable of storing a relatively large contiguous picture area, which encompasses at least two of the four picture areas R1, ..., R4 referred to. The cache memory CM may even be capable of storing all four picture areas R1, ..., R4. In case the cache memory arrangement CMA operates in the normal mode, the cache memory CM will therefore not have to be frequently refreshed. This implies relatively modest data traffic on the bus system BS for the purpose of motion compensation. In all, using the cache memory CM will be efficient and effective.

Fig. 5 illustrates a case where the cache memory arrangement CMA may operate in the bypass mode because there is a relatively high degree of disparity in position. Fig. 5 is a conceptual diagram similar to that of Fig. 4. Similar entities are represented in a similar fashion, using identical reference signs. In Fig. 5, the respective picture areas R1, ..., R4 to which the respective coded pixel blocks B1, ..., B4 refer belong to the same picture PN-1. However, the respective picture areas R1, ..., R4 are relatively remote from each other. In other words, there is a relatively high degree of disparity between these picture areas in terms of spatial position. Although the cache memory CM illustrated in Fig. 2 may be capable of storing a relatively large contiguous picture area, it may occur that this relatively large picture area cannot cover more than one picture area that is referred to. In other words, the cache memory CM can store only one such picture area at a time.
Frequent refreshing will be needed, which will lead to relatively heavy data traffic on the bus system BS if the cache memory arrangement CMA operates in the normal mode. Consequently, it is more judicious to have the cache memory arrangement CMA operate in the bypass mode.

Fig. 6 illustrates another case where the cache memory arrangement CMA may operate in the bypass mode. Fig. 6 is a conceptual diagram similar to that of Fig. 4. Similar entities are represented in a similar fashion, using identical reference signs. In Fig. 6, the respective picture areas R1, ..., R4 to which the respective coded pixel blocks B1, ..., B4 refer are scattered over three different pictures PN-1, PN-2, PN-3. In the example illustrated in Fig. 6, only two coded blocks of pixels B1, B4 refer to the same picture PN-1. There is a relatively high degree of disparity in terms of temporal position. Although the cache memory CM illustrated in Fig. 2 may be capable of storing a relatively large contiguous picture area, this relatively large picture area cannot cover a relatively large number of picture areas that are referred to. At best, the cache memory CM may jointly store picture areas R1 and R4 illustrated in Fig. 6. Frequent refreshing will be needed, which will lead to relatively heavy data traffic on the bus system BS if the cache memory arrangement CMA operates in the normal mode. Consequently, like in the case illustrated in Fig. 5, it is more judicious to have the cache memory arrangement CMA operate in the bypass mode.
Fig. 7 illustrates a series of steps S1, ..., S7, which the cache mode controller CMC illustrated in Fig. 2 may carry out in order to appropriately set the cache memory arrangement CMA in the normal mode or in the bypass mode. The cache mode controller CMC carries out the series of steps S1, ..., S7 on a macroblock basis. Referring to Fig. 2, the series of steps are carried out upon reception of a coded macroblock MBC by the front-end module FE, before the coded macroblock MBC is actually decoded by the residual decoder RD, the motion compensator MC, and the back-end module BE. The series of steps are carried out anew upon reception of a new macroblock. As indicated hereinbefore, the cache mode controller CMC may be in the form of a programmable circuit. Fig. 7 may thus be regarded as a flowchart representation of a software program, that is, a set of instructions, which causes the programmable circuit to carry out various operations described hereinafter with reference to Fig. 7.
In step S1, the cache mode controller CMC receives a set of motion compensation parameters MPS, which belong to a coded macroblock MBC that the front-end module FE illustrated in Fig. 2 has received. The set of motion compensation parameters MPS comprises the respective reference indices and the respective differential motion vectors of the respective coded blocks of pixels, which are comprised in the coded macroblock MBC (MPSMBC: RIMBC, DVMBC).
In step S2, the cache mode controller CMC establishes a temporal disparity value on the basis of the respective reference indices that belong to the coded macroblock of interest (RIMBC => TDV). The temporal disparity value represents the highest number of coded blocks of pixels that have the same reference index. For example, let it be assumed that the case illustrated in Fig. 4 applies: the four coded blocks of pixels B1, ..., B4 all refer to picture PN-1. In that case, the temporal disparity value may be equal to 4. The same applies to the case illustrated in Fig. 5. However, in the case illustrated in Fig. 6, only two coded blocks of pixels B1, B4 refer to the same picture PN-1 and have therefore the same reference index. In that case, the temporal disparity value may be equal to 2. That is, in this example, the lower the temporal disparity value is, the higher the degree of temporal disparity is.
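Step S2 amounts to a simple tally of the reference indices, and can be sketched as follows (the function name is illustrative only):

```python
from collections import Counter

def temporal_disparity_value(reference_indices):
    """Highest number of coded blocks of pixels within the macroblock
    that share the same reference index (step S2)."""
    return max(Counter(reference_indices).values())
```

For the Fig. 4/Fig. 5 cases, all four blocks share one index and the value is 4; for the Fig. 6 case only two blocks share an index and the value is 2.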
In step S3, the cache mode controller CMC establishes whether the following condition applies: the temporal disparity value is smaller than a temporal disparity threshold value (TDV < THTD?). The temporal disparity threshold value represents a limit in terms of the number of coded blocks of pixels that should have the same reference index. In case the aforementioned condition applies, the cache mode controller CMC subsequently carries out step S7, which will be described hereinafter. In case the temporal disparity value is greater than, or equal to, the temporal disparity threshold value, meaning that the aforementioned condition does not apply, the cache mode controller CMC subsequently carries out step S4.
In step S4, the cache mode controller CMC establishes a spatial disparity value on the basis of the respective differential motion vectors that belong to the coded macroblock of interest (DVMBC => SDV). The spatial disparity value represents a sum of absolute values of the respective differential motion vectors. This sum of absolute values corresponds with the sum of absolute differences between the respective motion vectors that have been established for the respective coded blocks of pixels at the coding end. Each motion vector specifies a displacement between the block of pixels concerned and the particular picture area to which the block of pixels refers. For example, let it be assumed that the case illustrated in Fig. 4 applies: the respective picture areas R1, ..., R4 are relatively close to each other in terms of spatial position within the picture concerned. In that case, the spatial disparity value will be relatively low. However, in the case illustrated in Fig. 5, the respective picture areas R1, ..., R4 are relatively remote from each other. In that case, the spatial disparity value will be relatively high.

In step S5, the cache mode controller CMC establishes whether the following condition applies: the spatial disparity value is greater than a spatial disparity threshold value (SDV > THSD?). The spatial disparity threshold value represents a limit in terms of relative remoteness of the coded blocks of pixels within the coded macroblock MBC of interest, that is, the remoteness of the coded blocks of pixels relative to each other. In case the aforementioned condition applies, the cache mode controller CMC subsequently carries out step S7, which will be described hereinafter. In case the spatial disparity value does not exceed the spatial disparity threshold value, meaning that the aforementioned condition does not apply, the cache mode controller CMC subsequently carries out step S6.
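Step S4 can likewise be sketched as follows, representing each differential motion vector as a hypothetical (dx, dy) pair and taking the absolute value component-wise:

```python
def spatial_disparity_value(differential_motion_vectors):
    """Sum of absolute values of the differential motion vectors
    of the coded blocks of pixels within the macroblock (step S4)."""
    return sum(abs(dx) + abs(dy) for dx, dy in differential_motion_vectors)
```

Closely clustered picture areas (Fig. 4) yield a low sum; remote areas (Fig. 5) yield a high sum, which step S5 then compares against the threshold THSD.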
In step S6, the cache mode controller CMC sets the mode control signal MO to a value for which the cache memory arrangement CMA (Fig. 2) will be in the normal mode (MO=0). This value may be, for example, equal to zero in case the mode control signal MO is a binary signal. Accordingly, the cache mode controller CMC causes the cache memory arrangement CMA to operate in the normal mode, in case the conditions that are checked in steps S3 and S5 do not apply. Otherwise, the cache mode controller CMC will not reach step S6 but, instead, step S7.
In step S7, the cache mode controller CMC sets the mode control signal MO to a value for which the cache memory arrangement CMA will be in the bypass mode (MO=1). This value may be, for example, equal to one in case the mode control signal MO is a binary signal. Accordingly, the cache mode controller CMC causes the cache memory arrangement CMA to operate in the bypass mode in case any one of the conditions that are checked in steps S3 and S5 applies. In that case, the cache memory CM would have to be refreshed relatively frequently if the cache memory arrangement CMA were to operate in the normal mode. As explained hereinbefore, this may lead to congestion on the bus system BS, which may cause loss of data or other undesired effects. Since, in the bypass mode, the cache memory CM is prevented from being frequently refreshed, these problems are avoided.
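The complete decision of steps S1, ..., S7 can be condensed into a single function as follows. The threshold values are hypothetical defaults: the description only states that such limits should preferably be programmable.

```python
from collections import Counter

def select_cache_mode(reference_indices, differential_motion_vectors,
                      th_td=3, th_sd=64):
    """Return the mode control signal MO: 1 for bypass, 0 for normal.

    th_td and th_sd stand in for the programmable thresholds THTD and
    THSD; the default values are illustrative assumptions."""
    tdv = max(Counter(reference_indices).values())        # step S2
    if tdv < th_td:                                       # step S3
        return 1                                          # step S7: bypass
    sdv = sum(abs(dx) + abs(dy)
              for dx, dy in differential_motion_vectors)  # step S4
    if sdv > th_sd:                                       # step S5
        return 1                                          # step S7: bypass
    return 0                                              # step S6: normal
```

High temporal disparity (few blocks sharing a reference picture) or high spatial disparity (widely scattered picture areas) both select the bypass mode; otherwise the cache is used normally.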
CONCLUDING REMARKS

The detailed description hereinbefore with reference to the drawings is merely an illustration of the invention and the additional features, which are defined in the claims. The invention can be implemented in numerous different ways. In order to illustrate this, some alternatives are briefly indicated.
The invention may be applied to advantage in numerous types of products or methods that involve a cache memory. Video processing is merely an example. The invention may be applied for processing any type of data that comprises respective segments that refer to respective data portions, whereby a segment is accompanied by an indication of a data portion to which the segment refers. The invention may be applied for coding data as well as decoding data, and for other types of data processing.

Data, which is to be processed, may be received in numerous different fashions. Fig. 2 illustrates an example in which data is received from a server via a network. In other applications, data may be received from a reading module that reads a data carrier, such as, for example, an optical disk, a hard disk, or a solid-state memory. Such a reading module may form part of, for example, a video player in which the invention is applied.

There are numerous ways of implementing a cache mode controller in accordance with the invention. Fig. 7 illustrates an example in the form of a series of steps S1, ..., S7, which may be executed by means of a software program. Numerous variations are possible. For example, steps S2 and S3 may be interchanged with steps S4 and S5. It is also possible to combine steps S3 and S5 in order to establish a disparity vector, which has a temporal component and a spatial component. Such a disparity vector may be compared with a limit, which may also be expressed as a vector, in order to decide whether the cache memory arrangement should operate in the normal mode or in the bypass mode. A simplified cache mode controller may consider only one type of disparity, spatial or temporal, in order to set the cache memory arrangement in the normal mode or in the bypass mode.
An implementation preferably provides programmable parameters corresponding with one or more threshold values, or limits, which are considered upon deciding between the normal mode and the bypass mode. For example, in case the cache mode controller is implemented as a dedicated circuit, the circuit preferably comprises one or more programmable registers into which these parameters can be written.

There are numerous ways of implementing a cache memory arrangement in accordance with the invention. Fig. 2 illustrates an example that comprises two multiplexers MUX1, MUX2. However, this does not mean that multiplexers are indispensable. For example, an alternative implementation may be obtained by applying the following modifications to the example illustrated in Fig. 2. The two multiplexers MUX1, MUX2 are removed. The cache control module CC is replaced by a different control module that takes over the functions that the two multiplexers MUX1, MUX2 fulfill. Such a control module may receive the cache mode control signal MO, which the cache mode controller CMC provides. The control module may formulate different types of main memory requests, and apply these requests to the main memory MM. Such a control module may selectively apply cache read data CD or main memory read data MD as image data ID to the motion compensator MC.
The term "segment" should be understood in a broad sense. The term includes any portion or element of a structured set of data. In broad terms, there are numerous ways of implementing functional entities by means of hardware or software, or a combination of both. In this respect, the drawings are very diagrammatic. Although a drawing shows different functional entities as different blocks, this by no means excludes implementations in which a single entity carries out several functions, or in which several entities carry out a single function. For example, referring to Fig. 2, the front-end module FE and the cache mode controller CMC may be implemented by means of a single software module.
There are numerous ways of storing and distributing a set of instructions, that is, software, which allows a programmable circuit to operate in accordance with the invention. For example, software may be stored in a suitable medium, such as an optical disk or a memory circuit. A medium in which software is stored may be supplied as an individual product or together with another product, which may execute the software. Such a medium may also be part of a product that enables software to be executed. Software may also be distributed via communication networks, which may be wired, wireless, or hybrid. For example, software may be distributed via the Internet. Software may be made available for download by means of a server. Downloading may be subject to a payment.
The remarks made hereinbefore demonstrate that the detailed description with reference to the drawings illustrates rather than limits the invention. There are numerous alternatives, which fall within the scope of the appended claims. Any reference sign in a claim should not be construed as limiting the claim. The word "comprising" does not exclude the presence of other elements or steps than those listed in a claim. The word "a" or "an" preceding an element or step does not exclude the presence of a plurality of such elements or steps. The mere fact that respective dependent claims define respective additional features does not exclude a combination of additional features, which corresponds to a combination of dependent claims.

Claims

1. A processor (DEC) for processing data (VC) comprising respective segments (B1, ..., B4) that refer to respective data portions (R1, ..., R4), a segment being accompanied by an indication (I1, ..., I4) of a data portion to which the segment refers, the processor comprising: - a cache memory arrangement (CMA) arranged to operate in a normal mode wherein a request (RQ) for a data portion is presented to a cache memory (CM), and in a bypass mode wherein such a request is systematically presented to a main memory (MM); and - a cache mode controller (CMC) arranged to analyze the respective indications (MPS) accompanying a group (MBC) of segments that will be successively processed, so as to establish a degree of disparity in position (TDV, SDV) between the respective data portions which are referred to, and to cause the cache memory arrangement to operate in the bypass mode if the degree of disparity exceeds a predefined limit (THTD, THSD), and in the normal mode otherwise.
2. A processor according to claim 1, wherein the data (VC) comprises video data that has been encoded in accordance with a standard according to which a macroblock (MBC) is divided into smaller blocks (B1, ..., B4), and according to which a smaller block can be encoded with reference to a particular picture area (R1, ..., R4), a smaller block being accompanied by an indication (I1, ..., I4) of the particular picture area with reference to which the smaller block has been encoded, the cache mode controller (CMC) being arranged to analyze the respective indications accompanying the respective smaller blocks of a macroblock, so as to cause the cache memory arrangement (CMA) to operate in the bypass mode or in the normal mode on a macroblock basis.
3. A processor according to claim 2, wherein the indication that accompanies a smaller block comprises: an index that specifies a picture that comprises the particular picture area with reference to which the smaller block has been encoded; and motion vector data that specifies a displacement between the smaller block and the particular picture area, the cache mode controller (CMC) being arranged to establish a degree of disparity in position on the basis of the respective indices (RIMBC) and the respective motion vector data (DVMBC) of the respective smaller blocks comprised in a macroblock.
4. A processor according to claim 3, wherein the cache mode controller (CMC) is arranged to establish the highest number (TDV) of smaller blocks within a macroblock that have the same index, and to cause the cache memory arrangement (CMA) to operate in the bypass mode if this number is below a predefined value (THTD).
5. A processor according to claim 3, wherein the cache mode controller (CMC) is arranged to establish a sum of absolute differences (SDV) between the respective displacements associated with the respective smaller blocks, and to cause the cache memory arrangement (CMA) to operate in the bypass mode if this sum of absolute differences is above a predefined value (THSD).
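Claims 4 and 5 can be read together as a simple decision procedure per macroblock. The sketch below is a non-normative illustration, not the claimed implementation: the threshold values, the representation of a macroblock as a list of (reference_index, (dx, dy)) pairs, and the interpretation of claim 5's sum of absolute differences as pairwise differences between successive block displacements are all assumptions. It computes the highest same-index count (TDV) and the displacement disparity (SDV), then selects the bypass or normal mode against the limits (THTD, THSD).

```python
from collections import Counter

BYPASS, NORMAL = "bypass", "normal"

def select_cache_mode(blocks, thtd=2, thsd=16):
    """Choose a cache mode for one macroblock.

    blocks: list of (ref_index, (dx, dy)) pairs, one per smaller block.
    thtd, thsd: illustrative thresholds standing in for THTD and THSD.
    """
    # Claim 4: highest number of smaller blocks sharing the same
    # reference index (TDV); few shared indices means poor cache reuse.
    tdv = max(Counter(ref for ref, _ in blocks).values())
    if tdv < thtd:
        return BYPASS

    # Claim 5 (interpreted here as successive-pair differences):
    # sum of absolute differences between displacements (SDV).
    sdv = sum(
        abs(dx1 - dx0) + abs(dy1 - dy0)
        for (_, (dx0, dy0)), (_, (dx1, dy1)) in zip(blocks, blocks[1:])
    )
    return BYPASS if sdv > thsd else NORMAL
```

For example, four blocks referencing the same picture with near-identical motion vectors yield the normal mode, while four blocks each referencing a different picture yield the bypass mode.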
6. A processor according to claim 1, the cache memory arrangement (CMA) comprising a cache control module (CC) for dividing the cache memory (CM) into respective cache lines (L1, ..., L4) in which respective rectangular picture areas (A1, ..., A4) can be stored.
7. A processor according to claim 6, wherein the main memory (MM) is arranged to provide a data burst, which has a maximum size in terms of the number of data elements that can be comprised in the data burst, a cache line (L1) having a size, in terms of the number of data elements that can be stored in the cache line, that corresponds with the maximum size of the data burst.
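Claims 6 and 7 describe a cache organized in lines that each hold a rectangular picture area, with the line size matched to the maximum burst size of the main memory, so that a single burst fills a whole line. A minimal sketch under stated assumptions: a hypothetical 8×8-pixel tile (64 elements, assumed to equal the maximum burst), four cache lines as in L1, ..., L4, and direct-mapped line selection, none of which is specified in the claims.

```python
TILE_W, TILE_H = 8, 8            # hypothetical rectangular-area geometry
LINE_ELEMS = TILE_W * TILE_H     # 64 elements = assumed maximum burst size
NUM_LINES = 4                    # standing in for cache lines L1, ..., L4

class TileCache:
    def __init__(self, main_memory, width):
        self.mem = main_memory           # flat list of pixel values (main memory)
        self.width = width               # picture width in pixels
        self.lines = [None] * NUM_LINES  # each line holds (tag, tile data) or None
        self.bypass = False              # mode flag set by the cache mode controller

    def read(self, x, y):
        if self.bypass:
            # Bypass mode: the request goes systematically to main memory.
            return self.mem[y * self.width + x]
        tx, ty = x // TILE_W, y // TILE_H     # tile that contains the pixel
        tag = (tx, ty)
        idx = (tx + ty) % NUM_LINES           # direct-mapped line selection
        line = self.lines[idx]
        if line is None or line[0] != tag:
            # Miss: fetch exactly one burst-sized rectangular tile.
            base_x, base_y = tx * TILE_W, ty * TILE_H
            data = [self.mem[(base_y + r) * self.width + base_x + c]
                    for r in range(TILE_H) for c in range(TILE_W)]
            line = (tag, data)
            self.lines[idx] = line
        return line[1][(y % TILE_H) * TILE_W + (x % TILE_W)]
```

Matching the line size to the burst size means a miss is serviced by one full-width transfer, which is the bandwidth rationale behind claim 7.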
8. A video apparatus (VA) comprising a processor (DEC) according to claim 1.
9. A method of processing data (VC) comprising respective segments (B1, ..., B4) that refer to respective data portions (R1, ..., R4), a segment being accompanied by an indication (I1, ..., I4) of a data portion to which the segment refers, the method involving use of a cache memory arrangement (CMA) that can operate in a normal mode wherein a request (RQ) for a data portion is presented to a cache memory (CM), and in a bypass mode wherein such a request is systematically presented to a main memory (MM), the method comprising:
- a series of cache mode control steps (S1, ..., S7) in which the respective indications (MPS) accompanying a group (MBC) of segments that will be successively processed are analyzed so as to establish a degree of disparity in position (TDV, SDV) between the respective data portions which are referred to, and in which the cache memory arrangement (CMA) is caused to operate in the bypass mode if the degree of disparity exceeds a predefined limit (THTD, THSD), and in the normal mode otherwise.
10. A computer program product that comprises a set of instructions, which, when loaded into a programmable processor, causes the programmable processor to carry out the method as claimed in claim 9.
PCT/IB2009/050823 2008-03-03 2009-03-02 Processor comprising a cache memory WO2009109891A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP08290207 2008-03-03
EP08290207.3 2008-03-03

Publications (1)

Publication Number Publication Date
WO2009109891A1 true WO2009109891A1 (en) 2009-09-11

Family

ID=40886407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2009/050823 WO2009109891A1 (en) 2008-03-03 2009-03-02 Processor comprising a cache memory

Country Status (1)

Country Link
WO (1) WO2009109891A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004102971A1 (en) * 2003-05-19 2004-11-25 Koninklijke Philips Electronics N.V. Video processing device with low memory bandwidth requirements
US20040239680A1 (en) * 2003-03-31 2004-12-02 Emberling Brian D. Method for improving cache-miss performance
WO2006029382A2 (en) * 2004-09-09 2006-03-16 Qualcomm Incorporated Caching method and apparatus for video motion compensation
US20080059716A1 (en) * 2006-09-04 2008-03-06 Fujitsu Limited Moving-picture processing apparatus and pre-fetch control method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENG W-C ET AL: "IMPROVING DATA CACHING FOR SOFTWARE MPEG VIDEO DECOMPRESSION", PROCEEDINGS OF THE SPIE - THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING, SPIE, PO BOX 10 BELLINGHAM WA 98227-0010 USA, vol. 2668, 31 January 1996 (1996-01-31), pages 94 - 104, XP000617098, ISSN: 0277-786X *
VILAYANNUR M ET AL: "Discretionary caching for I/O on clusters", CLUSTER COMPUTING AND THE GRID, 2003. PROCEEDINGS. CCGRID 2003. 3RD IE EE/ACM INTERNATIONAL SYMPOSIUM ON 12-15 MAY 2003, PISCATAWAY, NJ, USA,IEEE, 12 May 2003 (2003-05-12), pages 96 - 103, XP010639741, ISBN: 978-0-7695-1919-7 *
ZHOU X ET AL: "Implementation of H.264 decoder on general-purpose processors with media instructions", PROCEEDINGS OF THE SPIE - THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING, SPIE, PO BOX 10 BELLINGHAM WA 98227-0010 USA, vol. 5022, 1 January 2003 (2003-01-01), pages 224 - 235, XP002367314, ISSN: 0277-786X *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190073312A1 (en) * 2017-09-06 2019-03-07 Shanghai Zhaoxin Semiconductor Co., Ltd. Hardware accelerators and access methods thereof
US11263139B2 (en) * 2017-09-06 2022-03-01 Shanghai Zhaoxin Semiconductor Co., Ltd. Hardware accelerators and access methods thereof


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09717762

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09717762

Country of ref document: EP

Kind code of ref document: A1