WO2007135602A1 - Electronic device and method for storing and retrieving data - Google Patents

Electronic device and method for storing and retrieving data

Info

Publication number
WO2007135602A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
address
memory
compression
Application number
PCT/IB2007/051783
Other languages
French (fr)
Inventor
Abraham K. Riemens
Pieter Van Der Wolf
Renatus J. Van Der Vleuten
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2007135602A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/0223 - User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0292 - User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
    • G06F12/023 - Free address space management
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/426 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40 - Specific encoding of data in memory or cache
    • G06F2212/401 - Compressed data

Definitions

  • the present invention relates to an electronic device as well as a method for storing and retrieving data.
  • Video and image data processing systems can be implemented as a system-on- chip, which may require a high bandwidth for some of the applications run on the image processing system. Therefore, to reduce the bandwidth to a background memory and to reduce the actual amount of memory usage, the actual amount of data transferred to and from the memory is reduced by an embedded compression of the data stored in the memory, i.e. only compressed data is transferred to the memory.
  • For example, in a data processing system which processes image data in a streaming manner, the processing typically starts from top left and ends at the bottom right of an image.
  • the image data can be decompressed immediately, i.e. on-the-fly, and is processed to be finally compressed again and is stored again in the background memory.
  • As these systems are designed for a regular streaming data access, any random accesses may lead to difficulties. This is in particular true for an address-based computer like a system with a mix of hardware and software signal processing.
  • the memory access speed improves at a much slower pace than processing speed.
  • memory bandwidth is hampered by the latency associated with setting up a data transfer, i.e. the time required between offering an address and having the corresponding data word available.
  • a memory transaction typically consists of a burst of consecutive data words by addressing data once while multiple data words are transferred as they belong to consecutive addresses. In this way memory bandwidth can be used efficiently.
  • In order to remain efficient, the increasing latency (in terms of processor cycles) requires an increased burst length.
  • Image processing functions have a good spatial locality, as large sets of neighboring pixels are typically processed together.
  • Fig. 1 shows a block diagram of an image data processing system according to the prior art.
  • the system typically comprises a CPU and several image processing units IPU A - IPU C as well as a shared memory MU.
  • the CPU, the image processing units, and the shared memory MU are typically coupled by a bus B.
  • bus interfaces BI are used.
  • the shared memory MU will be coupled to the bus B via a memory interface MI.
  • the CPU can allocate a buffer in the shared memory in order to facilitate its processing. Apart from allocating the buffers, the CPU may initiate a processing of the image processing units by programming the respective parameters into the image processing units. This may include the set up of the addresses in the buffer as allocated by the CPU.
  • the image processing units IPU A - IPU C are typically dedicated processing units for performing various image processing. Thereafter, the image processing units IPU A - IPU C will perform their dedicated image processing and will store and retrieve the required image data from the buffers in the shared memory as notified by the CPU. After the dedicated image processing, the results are stored in an output buffer allocated in the shared memory MU. The data in the output buffer can be used by any one of the image processing units, by the CPU or can be output. In addition, a second level cache can be implemented in order to reduce any off-chip traffic.
  • any access to the shared memory MU is typically performed in bursts of 64, 128 or 256 bytes of consecutive data. This is advantageous as the addressing only has to be performed once for every memory transfer. Burst-based memory transfers are also required for SDRAM memories.
  • the memory system may be pipelined and the actual bus protocol can be decoupled from the specific system design including the overall memory bandwidth such that the memory unit can be based on a single or double data rate SDRAM without any influence on the bus protocol.
  • WO 2004/092960 shows a data processing apparatus for processing data being associated to a data address in a range of data addresses.
  • the data is compressed and stored as compressed blocks in a memory.
  • the memory address occupied by each block starts from a respective preferred starting address for a multi-address transfer to and from the memory system.
  • Each of the blocks represents compressed data which are associated to the data addresses in the sub-range.
  • a decompressor unit is provided for decompressing the compressed data from the memory. By storing compressed data in the memory system, the memory bandwidth can be reduced.
  • Fig. 2 shows a schematic representation of the compression of image data according to the prior art.
  • a picture P is analyzed and the respective image data (the uncompressed data) is associated to application addresses AD.
  • the compressed data will be arranged in the actual physical address PA such that some of the address range is in use and some is not in use, i.e. there are unoccupied address ranges between the compressed data.
  • This is in particular because of the actual compression being used which is a lossy compression with e.g. a fixed compression ratio of 2.
  • This will result in unused spaces in the physical addresses of the memory. Accordingly, the bandwidth to the memory system is reduced by a factor of 2 due to the compression ratio.
  • the actual compression of the image data will not lead to a reduced memory size.
  • a data processing device which comprises at least one processing unit for processing data based on a logical address space and a communication infrastructure comprising an interconnect for communicating data and addresses between the at least one processing unit and a memory unit storing compressed and/or uncompressed data based on a physical address space.
  • the data processing device further comprises a compression unit for compressing data and/or for decompressing compressed data.
  • a transformation unit is provided for performing an address transformation between the logical address space and the physical address space of an address associated with the data compressed and/or decompressed by the compression unit.
  • the size of the compressed data in the physical address space is smaller than the size of the corresponding uncompressed data in the logical address space. Therefore, a data processing device is provided which enables a transparent address transformation for all data which need to be stored in the memory unit.
  • the compression unit performs a lossy compression.
  • As a lossy compression is performed, the size of the compressed data will be smaller than the size of the uncompressed data such that memory space can be saved.
  • the compression unit and/or the transformation unit are activated and deactivated according to the value of the address. Therefore, the compression unit and the transformation unit will only be activated if required, wherein the activation is performed based on the actual address of the memory access.
  • a memory interface unit is coupled to the interconnect for handling a communication to the memory unit. The memory interface unit will take care of the communication between the memory unit and the interconnect such that the memory unit does not need to take care of the communication particulars.
  • an interface unit is associated to a processing unit for handling the communication between the processing unit and communication infrastructure. Therefore, the interface unit will take care of the communication between the processing unit and the communication infrastructure such that the processing units only need to perform their dedicated processings.
  • an access to the memory unit is performed in bursts of data, wherein the address transformation is performed once per burst.
  • the latency of a memory access can be significantly reduced by performing the access to the memory in bursts and by performing the address transformation once per burst.
  • the address transformation is performed based on a start address of the burst to the memory unit.
  • As the memory access will be performed in bursts, the data of the burst will be stored in a consecutive memory space such that merely the start address is required for the address transformation.
  • the address transformation unit is adapted to calculate an address by evaluating a mathematical expression involving a constant offset to be added to a logical address within the logical address space. By merely adding a constant offset to the logical address, the respective physical address can be achieved.
  • the data processing device comprises a control unit for activating and deactivating the transformation unit and/or the compression unit, wherein said control unit comprises settings registers.
  • the settings registers store information regarding the address ranges where data is required to be compressed.
  • the control unit only activates the compression unit to compress or decompress data and/or the transformation unit to perform the address transformation if the control unit determines that an address of a memory access falls within the range of addresses of compressed data.
  • the invention also relates to a video processing system, which comprises a memory unit for storing compressed and/or uncompressed data based on a physical address space, and a memory interface unit for handling a communication between the memory unit and a communication infrastructure.
  • the video processing system furthermore comprises at least one processing unit for processing data based on a logical address space and a communication infrastructure comprising an interconnect for communicating data and addresses between the at least one processing unit and a memory unit storing compressed and/or uncompressed data based on a physical address space.
  • the data processing device further comprises a compression unit for compressing data and/or for decompressing compressed data.
  • a transformation unit for performing an address transformation between the logical address space and the physical address space of an address associated with the data compressed and/or decompressed by the compression unit.
  • the size of the compressed data in the physical address space is smaller than the size of the corresponding uncompressed data in the logical address space.
  • the invention also relates to a method for storing and retrieving data in a data processing device having at least one processing unit for processing data based on a logical address space.
  • Data and addresses are communicated between the at least one processing unit and a memory unit which stores compressed and/or uncompressed data based on a physical address space.
  • Data is compressed and/or compressed data is decompressed.
  • An address transformation is performed between the logical address space and the physical address space of an address associated with the data compressed and/or decompressed by the compression unit.
  • the size of the compressed data in the physical address space is smaller than the size of the corresponding uncompressed data in the logical address space.
  • the invention relates to the idea to provide a video processing system which distinguishes a logical address space used for processing from the actual physical address space used in a background memory.
  • the logical address space may be larger than the physical address space such that the memory space is logically extended.
  • the data processing of the electronic device will be based on logical addresses.
  • An address transformation unit is provided for transforming any logical address to a physical address. The transformation of the addresses as well as the compression/decompression of data is controlled by an address discrimination.
  • Fig. 1 shows a block diagram of a data processing system according to the prior art
  • Fig. 2 shows a schematic representation of a compression of an image according to the prior art
  • Fig. 3 shows a block diagram of a video processing device according to the present invention
  • Fig. 4 shows a block diagram of an interface unit according to a first embodiment
  • Fig. 5 shows a representation of the image compression and the address translation according to a second embodiment
  • Fig. 6 shows a block diagram of an interface unit according to the second embodiment
  • Fig. 7 shows a basic representation of a memory map according to the second embodiment
  • Fig. 8 shows a representation of a compression and address translation according to a third embodiment
  • Fig. 9 shows a representation of a compression and address translation according to a fourth embodiment
  • Fig. 10 shows a basic representation of a memory map according to the fourth embodiment
  • Fig. 11 shows a block diagram of an interface unit according to a fifth embodiment
  • Fig. 12 shows a representation of a memory management.
  • Fig. 3 shows a block diagram of an image or video data processing device according to the present invention.
  • the system typically comprises a CPU and several so called IP blocks IP (which can be implemented as computation elements, memories, subsystems containing interconnect modules or image or video processing units) as well as a shared memory MU (which may be internal or external).
  • the CPU, the image processing units IP, and the shared memory MU are typically coupled by a bus B.
  • the interconnect can also be realized by a network on chip or a network extending over several chips or devices.
  • interface units IU are used.
  • the shared memory MU will be coupled to the bus B via a memory interface MI.
  • the CPU can allocate a buffer in the shared memory MU in order to facilitate its processing. Apart from allocating the buffers, the CPU may initiate a processing of the image processing units IP by programming the respective parameters into the image processing units. This may include the set up of the addresses in the buffer as allocated by the CPU.
  • the image processing units IPU are typically dedicated processing units for performing various image processing. Thereafter, the image processing units IP will perform their dedicated image processing and will store and retrieve the required image data from the buffers in the shared memory as notified by the CPU. After the dedicated image processing, the results are stored in an output buffer allocated in the shared memory MU.
  • the data in the output buffer can be used by any one of the image processing units, by the CPU or can be output.
  • an interface unit IU may also be provided for several IP blocks.
  • the following embodiments relate to a data processing device, in particular for image or video processing with an external memory device.
  • These devices may be implemented as systems-on-chip.
  • a substantial part of the available memory bandwidth is consumed by image data and a memory-based communication is present between various (hardware or software) components.
  • Typically all (or almost all) images are stored in an off-chip memory.
  • the data processing device may comprise three types of components. IP blocks constitute hardware components dedicated to specific signal processing functions.
  • FIG. 4 shows a block diagram of an interface unit according to a first embodiment.
  • the interface unit IU comprises a compression unit CU, a transformation unit TU and optionally a control unit CTRL.
  • the compression unit serves to compress data to be stored in the memory and to decompress data from the memory.
  • the transformation unit TU serves to perform an address transformation for a memory access.
  • the control unit CTRL serves to activate and deactivate the compression unit CU and the transformation unit TU. This activation will depend on the address of the memory access.
  • the interface unit IU will be coupled between an IP block IP and a memory MU within the data processing system.
  • the processing of the IP blocks IP will be performed in a logical address space LAS while the memory is based on a physical address space PAS.
  • the logical address space is larger than the actual physical space of the memory such that a logical extension of the memory is provided. Therefore, the transformation unit TU serves to transform the logical address into a physical address.
  • the IP block IP will request a data access and may indicate an address as well as the data to be written.
  • the data to be transferred (uncompressed data) dtu is forwarded to the compression unit CU where this data is compressed and the compressed data dtc is forwarded to the memory.
  • the address addri (in the logical address space) is supplied to the transformation unit TU which performs the address transformation to transform the logical address addri to the physical address addrp. Accordingly, the compressed data will be stored in the memory at the modified address, i.e. the physical address addrp.
  • If the IP block IP needs to read data from the memory, the IP block will supply an address in the logical address space. The address will be transformed in the transformation unit TU into a physical address and the data at this address is fetched from the memory. If this data is compressed data, this compressed data will be decompressed in the compression unit CU and will be forwarded to the IP block IP such that the IP block IP may perform its processing thereon.
  • the IP block can also access data that is not subject to data compression. In this case neither (de)compression nor an address translation is carried out, so both the address and the data are passed unmodified.
  • the control unit CTRL keeps track of those logical addresses and the corresponding physical addresses in the memory which contain compressed data and those logical addresses and the corresponding physical address ranges which do not comprise compressed data.
  • the compression unit CU and the transformation unit TU will only be activated if the memory access involves compressed data. Therefore, the control unit CTRL compares the address of a memory access with the address ranges of compressed data and the address ranges of uncompressed data. This can for example be performed by determining whether the address of the memory access is within the logical address range which corresponds to the logical extension range. Alternatively, the comparison may be performed in the transformation unit.
  • When e.g. a filter IP block IP processes an image, its operation is set up by control software executing on a processor, which specifies the source address of the input image and the destination address of the output image.
  • the IP block autonomously performs direct memory access (DMA) to traverse the images.
  • When finished, the IP block typically issues an interrupt to the processor. This way, the software maintains the memory buffers while multiple hardware blocks can perform various processing steps in an application concurrently.
  • Fig. 5 shows a representation of the image compression and the address translation according to a second embodiment.
  • the compression of data and the address translation according to the second embodiment will correspond to the compression of data and address translation as described according to the first embodiment.
  • the processing according to the second embodiment regarding the compression and translation, will be performed by an interface unit IU as described according to Fig. 4.
  • two images I1, I2 are shown which are segmented into segments of 128 bytes. Each of these segments is compressed to 64 bytes in a logical address space LAS according to a fixed compression factor of 2.
  • As the start address of each compressed image segment is unchanged, holes or empty spaces will occur between the end of a compressed image segment and the start of the next compressed image segment.
  • the logical address is translated to the physical address PA, preferably by the transformation unit TU as shown in Fig. 4.
  • This can be performed by using an address offset which is typically constant within an image.
  • the first image I1 will have an offset of zero such that the logical address will correspond to the physical address.
  • the second image I2 will have an offset such that the compressed image segments of the second image I2 can fill up the empty holes between adjacent compressed image segments from the first image I1.
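  • As an illustration of this interleaving, the following sketch assumes a fixed compression factor of 2 with 128-byte segments compressed to 64 bytes and a shared physical base address; the function name and the base address are illustrative, not taken from the patent:

        #include <stdint.h>

        #define SEG_SIZE  128u   /* uncompressed segment size in bytes          */
        #define COMP_SIZE  64u   /* compressed segment size (fixed factor of 2) */

        /* Physical address of compressed segment k when images I1 and I2 share
         * one physical buffer: I1 keeps its original segment start addresses,
         * I2 is shifted by a constant offset so it fills the 64-byte holes. */
        static uint32_t compressed_segment_addr(uint32_t phys_base, int image,
                                                uint32_t k)
        {
            uint32_t slot = phys_base + k * SEG_SIZE;       /* start of slot k      */
            return (image == 1) ? slot : slot + COMP_SIZE;  /* I2 lands in the hole */
        }

    For example, with an illustrative base of 0xC0800000, segment 0 of I1 would start at 0xC0800000 and segment 0 of I2 at 0xC0800040, so both compressed images fit into the footprint of one uncompressed image.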
  • the transformation from the logical space to the physical address space is performed by introducing an address offset.
  • the address offset is constant at least within a certain address range and can be constant within an image. Accordingly, by merely storing the address offset setting in the compression unit CU or in the translation unit TU, no further state information is needed during the compression processing.
  • the address translation is very simple if merely an address offset is used.
  • the physical address p will correspond to an application address a plus the address offset o.
  • a single image can be stored in a smaller buffer. This can be performed by interweaving two halves of a single image into a single physical address area resulting in more efficient memory utilization when e.g. an application requires an odd number of equal-sized images at the cost of an extra entry in the logical address table. When this is applied for all images, the number of entries is twice the number of images (still well manageable, but relatively high).
  • a single physical allocation of half the image size is used for two logical memory chunks located adjacent to each other to store the image. Then all physical memory is in use, so there is no need to keep track of free physical memory. Furthermore, all newly allocated logical memory is also in use, so there is also no need to keep track of available and free logical memory. Hence, the memory allocation is simplified, since no state needs to be maintained by a memory allocator.
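  • A minimal sketch of this allocation, assuming a fixed compression factor of 2 and the 128-byte segmentation introduced above (the structure and all names are illustrative):

        #include <stdint.h>

        typedef struct {
            uint32_t log_start, log_end;  /* logical address range of one half     */
            uint32_t offset;              /* added to a logical address -> physical */
        } range_entry;

        /* One image of 'size' bytes, compressed by a fixed factor of 2, kept in a
         * physical buffer of only size/2 bytes: the two logical halves of the
         * image are mapped onto the same physical allocation, the second half
         * filling the 64-byte holes left by the first (128-byte segments). */
        static void setup_image_halves(uint32_t log_base, uint32_t phys_base,
                                       uint32_t size, range_entry e[2])
        {
            uint32_t half = size / 2u;

            e[0].log_start = log_base;
            e[0].log_end   = log_base + half - 1u;
            e[0].offset    = phys_base - log_base;                 /* slot starts */

            e[1].log_start = log_base + half;
            e[1].log_end   = log_base + size - 1u;
            e[1].offset    = phys_base + 64u - (log_base + half);  /* fills holes */
        }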
  • a factor of 1.6 for one image reduces 128 bytes to 80 bytes. This can be combined with another image compressed by a factor of 2.67. This may be utilized in e.g. a video compression standard like MPEG, where B frames are less sensitive to error propagation (and are thus compressed more aggressively) compared to I and P frames. So in cases where different images have different quality requirements, optimal selection of the compression factors is possible. Further, different compression factors can be applied to different types of data, like images, depth maps, graphics, etc.
  • Fig. 6 shows a block diagram of an interface unit according to the second embodiment.
  • the interface unit IU according to the second embodiment substantially corresponds to the interface unit according to Fig. 4.
  • the control unit CTRL comprises a plurality of setting registers SR which may contain the start address start, the end address end as well as an address offset offset.
  • the settings will define an address range within which the compression unit CU needs to be activated for compression or decompression.
  • the transformation unit TU determines based on the settings in the setting registers, i.e. the start address, the end address as well as the address offset, whether an address of a memory access falls within this range or not. If the address of the memory access does not fall into the range where a compression is required, the logical address will correspond to the physical address, i.e. the address offset will correspond to zero. In this case, the data will not undergo a compression and will therefore bypass the compression unit. Such a bypass may be implemented inside or outside the compression unit as depicted in Fig. 6.
  • If the transformation unit TU determines that the address of a memory access falls within the range determined by the setting registers, the data dtu are compressed within the compression unit CU and the compressed data dtc is forwarded to the memory.
  • an address offset as stored in the setting registers SR will be added to the logical address addri in order to obtain the physical address addrp.
  • the compressed data will then be stored at the physical address as determined by the transformation unit TU.
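  • A behavioral sketch of this decision, using the start, end and offset fields of the setting registers SR described above (the structure and function names are illustrative):

        #include <stdint.h>
        #include <stdbool.h>

        typedef struct {
            uint32_t start;   /* first logical address of the compressed range */
            uint32_t end;     /* last logical address of the compressed range  */
            uint32_t offset;  /* added to the logical address -> physical      */
        } setting_regs;

        /* Translate the (burst start) address of one memory access and report
         * whether the compression unit has to be activated for it. */
        static uint32_t translate(const setting_regs *sr, uint32_t logical,
                                  bool *compress)
        {
            if (logical >= sr->start && logical <= sr->end) {
                *compress = true;               /* (de)compress this access */
                return logical + sr->offset;    /* addrp = addri + offset   */
            }
            *compress = false;                  /* bypass: physical == logical */
            return logical;
        }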
  • Fig. 7 shows a basic representation of a memory map according to the second embodiment.
  • the left hand side shows a memory map of the logical address space LAS and the right hand side depicts a memory map of the physical address space PAS.
  • the memory unit MU can e.g. comprise a memory space of 16 Mbyte within an address range of 0xC0000000 - 0xC0FFFFFF, i.e. this will correspond to the physical address space PAS.
  • the logical address space may be larger than the physical address space, wherein the difference between the logical address space and the physical address space can be referred to as a logical extension. In the present case, the logical extension will start at an address of 0xA0000000.
  • each image will require 0x1FA400 bytes. If the memory allocation corresponds to the physical address range of 0xC0800000 - 0xC09FA3FF, then the settings of the address transformation unit with respect to the first image I1 will correspond to:
  • the settings of the address transformation unit with respect to the second image I2 are as follows:
  • the memory access will be performed accordingly.
  • the address set up is performed by the memory allocation during an initialization of an application.
  • the interface units IU are coupled between the IP blocks IP and the memory unit. These interface units IU will take care of all of the communication between the IP block IP and the memory such that the IP block IP can perform its dedicated processing without having to care for any of the particulars of the communication with the memory.
  • Fig. 8 shows a representation of a compression and address translation according to a third embodiment.
  • the compression and the address translation are performed on a single image.
  • the compression as well as the address translation can be performed by an interface unit IU as depicted in Fig. 4 or Fig. 6.
  • As the start address as well as the compression ratio are constant at least within the address range of an image, these values can be stored in the setting registers and can be accessed by the transformation unit TU.
  • buffering or caching can be performed advantageously allowing larger data transfers to off-chip SDRAM (in physical address space) to increase efficiency of the SDRAM accesses. Further this requires on-chip buffering of compressed data to utilize locality of reference. Thus, bus efficiency is improved.
  • the third embodiment is advantageous with respect to the first embodiment as compression factors can be used which are not a power of 2.
  • a compression factor of 1.6 corresponds to a compression ratio of 5/8, which is conveniently coded as a fixed-point value for the multiplication in the address calculation.
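  • As an illustrative sketch of such a fixed-point address calculation (the names and the multiply-and-shift coding of 5/8 are assumptions): the position of an address within the image is scaled by the compression ratio before the physical start address is added.

        #include <stdint.h>

        /* Scale the position within the image by the compression ratio 5/8
         * (i.e. a compression factor of 1.6), coded as a fixed-point
         * multiplication and shift, then relocate to the physical buffer. */
        static uint32_t to_physical_scaled(uint32_t logical, uint32_t log_start,
                                           uint32_t phys_start)
        {
            uint32_t rel    = logical - log_start;   /* offset inside the image */
            uint32_t scaled = (rel * 5u) >> 3;       /* rel * 5/8               */
            return phys_start + scaled;
        }

    With this scaling, successive 128-byte segments start 80 bytes apart in physical memory, matching the factor of 1.6 mentioned above.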
  • an application being run in the system still needs to access the application addresses, which reside in physical memory space. Therefore, the "free" or unoccupied space at the end of the compressed image cannot be utilized for other purposes in a transparent manner, since that address space is actually used by the application. Therefore, there is no saving of memory footprint.
  • Fig. 9 shows a representation of a compression and address translation according to a fourth embodiment.
  • the fourth embodiment may be based on a combination of the second and third embodiment such that the advantages of both systems are combined at the cost of a slightly more expensive address transformation.
  • Fig. 10 shows a basic representation of a memory map according to the fourth embodiment.
  • the physical memory has 16 Mbyte, which is located at an address range of 0xC0000000 - 0xC0FFFFFF.
  • the logical extension of this memory starts at address 0xA0000000.
  • a first and a second image I1 and I2 require 0x1FA400 bytes such that the physical address range corresponds to 0xC0800000 - 0xC09FA3FF. Therefore, the settings of the address transformation unit with respect to the first image are:
  • the offset value can be coded as a 2's complement number, which can also hold negative values.
  • the logical extension can relate to higher or lower addresses compared to the physical memory.
  • the compression unit CU may be active on multiple address ranges, i.e. the settings registers SR comprise a logical address table with multiple entries and an associative lookup operation is required to decide whether the data associated with a given address needs to be (de)compressed.
  • the address range and offset setting of a compression unit CU may be valid for a single image, or for multiple images. In other words, a logical address range does not necessarily coincide with an image. This may reduce the number of entries in the logical address table, which reduces register space and increases address lookup speed. It is possible to accommodate various image sizes simultaneously. E.g. one HD image can be interwoven with multiple SD images.
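  • A sketch of such a logical address table and its associative lookup; the entry layout (start, end, offset and a per-range compression ratio) and the linear search are assumptions made for illustration:

        #include <stdint.h>
        #include <stddef.h>

        typedef struct {
            uint32_t start, end;    /* logical address range                */
            uint32_t offset;        /* constant offset for this range       */
            uint8_t  ratio_num;     /* compression ratio num/den, e.g. 5/8  */
            uint8_t  ratio_den;
        } lat_entry;

        /* Associative lookup: returns the matching entry or NULL, in which
         * case the access bypasses (de)compression and address translation. */
        static const lat_entry *lat_lookup(const lat_entry *tab, size_t n,
                                           uint32_t logical)
        {
            for (size_t i = 0; i < n; i++)
                if (logical >= tab[i].start && logical <= tab[i].end)
                    return &tab[i];
            return NULL;
        }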
  • Fig. 11 shows a block diagram of an interface unit IU according to a fifth embodiment.
  • the interface unit IU comprises a compression unit CU, a transformation unit TU and a control unit CTRL.
  • the start address of the logical extension is stored in a register LEA ("logical extension address").
  • the transformation unit TU comprises a comparison unit COMU for comparing the logical address with the start address of the logical extension stored in the register LEA.
  • the output of the comparison unit COMU will control the input of the compression unit such that only the data corresponding to the respective addresses (i.e. addresses within the logical extension) is compressed.
  • the transformation unit TU furthermore comprises a look up unit LU for looking up the start and end address within the registers of the control unit CTRL.
  • the transformation unit furthermore comprises a calculation unit CAU for calculating the physical address according to the offset and the compression ratio.
  • the criterion for activation of the compression unit is simplified: all accesses in the logical extension need to be (de)compressed. This can be performed by a single address comparison, simplifying the hardware implementation, where fast detection of bypass mode is important for regular (i.e. non-video) data accesses. Accordingly, the impact on the system behavior for regular processing is minimal. According to this embodiment, the decision to bypass the (de)compression unit can be taken before the result of the lookup operation in the logical address table is available.
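  • A sketch of this activation check, under the assumption that the logical extension is one contiguous range starting at the address held in the LEA register (names are illustrative):

        #include <stdint.h>
        #include <stdbool.h>

        /* Fifth embodiment: the bypass decision only needs the logical
         * extension start address held in register LEA; it can be taken
         * before the lookup in the logical address table completes. */
        static inline bool in_logical_extension(uint32_t logical,
                                                uint32_t lea, uint32_t ext_end)
        {
            /* A single comparison suffices when the extension sits at one
             * end of the address map; both bounds are shown for clarity. */
            return logical >= lea && logical <= ext_end;
        }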
  • the fast selection saves power, e.g. in a pipelined system it avoids starting the (de)compression unit before it is known whether it needs to be active at all. Even further, the (de)compression unit and address lookup logic can be deactivated when not required. In all of the above embodiments, Quality of Service can still be applied.
  • the memory allocation can be arranged for e.g. a compression factor of 2.
  • the compression factor may be increased (potentially dynamically, i.e. while the application is running), thus reducing bandwidth while the memory addressing is not changed.
  • This requires separate control of the compression ratio that is used by the (de)compression unit and the compression ratio used by the address translation, i.e. separate settings registers are required (not shown in the figure).
  • the interface unit can be associated to every IP block accessing compressed data.
  • the interface unit as described above can also be used at the main memory interface, reducing off-chip bandwidth and memory footprint, while the on-chip bandwidth is not reduced.
  • the interface unit can also be applied between a first and second level data cache. This is particularly advantageous since this way compressed data is stored in the second level cache, thus increasing its effective memory capacity (or allowing a reduction of the amount of 2nd level cache memory). So this either increases application use of the cache or it reduces chip area and thus cost.
  • Fig. 12 shows a representation of a memory management. A specific image allocation will now be described which can abstract the control of logical memory space from the rest of the software system. The arrows in the Fig. 12 indicate the control flow.
  • the functions malloc() and free() are typically available in the system to manage physical memory.
  • the functions malloc_image() and free_image() are provided by the image memory allocator IMA to obtain and release memory in logical address space.
  • the state maintains the administration of the physical and logical memory that is managed by the module. Therefore, it keeps track of the physical memory allocated by the module, the logical memory that is in use by the application, and the logical memory that is free.
  • The corresponding fragment of malloc_image() reads:

        {
            ptr = get_logical_memory();  /* mark as used in state info        */
            setup_hw();                  /* setup of address range and offset */
            return ptr;
        }
  • free_image() is the counterpart of malloc_image(). It also keeps track of the state and assures that the hardware settings are erased. When physical memory is not in use anymore, free() can be called to release it to the operating system.
  • the image memory allocator is the only component in the system that maintains the relation between the physical and logical memory space. Furthermore, it is the only component in the system that is aware of the distinction between logical and physical memory. For some embodiments, the image memory allocation routine also requires the compression ratio. Then the function prototype corresponds to:
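  • The prototype itself is not reproduced in this extract; a hypothetical sketch, purely for illustration, could be:

        #include <stddef.h>

        /* Hypothetical prototypes of the image memory allocator IMA: the
         * allocator returns an address in logical address space and, for
         * embodiments with per-image ratios, also takes the compression
         * ratio (names and types are assumptions, not the patent's text). */
        void *malloc_image(size_t size, double compression_ratio);
        void  free_image(void *ptr);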
  • the state can be stored in a globally accessible memory area, i.e. accessible from any task or process in the system. Only the image memory allocator needs access to its state. In a concurrent operating system, management of the state must be considered a critical section where access needs to be guarded, e.g. by means of semaphores, to ensure mutual exclusion.
  • the image memory allocation module is typically an extension of a standard C library. The principles of the invention can be exploited in very large software systems, with only minimal additional software. Furthermore, merely during initialization minor software adaptations are required compared to not using the principles of the invention.
  • the memory footprint is reduced without the need to adapt any signal processing component such that compression and address translation is transparent for the IP blocks IP. Furthermore, only marginal impact is present on memory access latency for regular data accesses that do not require compression. Only minor adaptations of an application are required.
  • the relation between logical and physical memory can be abstracted in a special image memory allocation software module. The principles of the invention are particularly interesting in case the required amount of memory exceeds the available memory footprint.
  • the commercially available commodity memory devices typically have a memory size of a power of 2, hence the choice is limited to e.g. 16 Mbyte or 32 Mbyte. If e.g. the system would require 18 Mbyte, 14 Mbytes would be left unused in the system if no compression is performed. With application of embedded compression according to the invention, the memory use may be reduced to 16 Mbyte, thus leading to substantial cost savings.
  • the amount of on-chip memory can be kept equal (compared to storage of uncompressed image data) and embedded compression can be used to store more images on-chip, reducing the off-chip memory bandwidth.
  • Caching strategies can still be applied, independent of the choice to where to perform the address transformation. In other words, caching can be applied in the physical as well as in the logical address space. Hence, the method is also transparent for the cache.
  • the principles of the invention can also be used as a risk reduction feature.
  • system use cases are analyzed in an early design phase of a system-on-chip (SoC).
  • insufficient details of the application and its implementation are known, so effective bus utilization and resource use are hard to estimate.
  • These can be significantly impacted by later design decisions on e.g. bus topologies.
  • specification changes during the design process may lead to new use cases, which are hard to anticipate.
  • the principles of the invention allow different image quality versus system resource use trade-offs, without the need to adapt the IC design. This invention therefore supports quality of service and graceful degradation.

Abstract

A data processing device is provided, which comprises at least one processing unit (IP) for processing data based on a logical address space (LAS); and a communication infrastructure (B) comprising an interconnect (B) for communicating data and addresses between the at least one processing unit (IP) and a memory unit (MU) storing compressed and/or uncompressed data based on a physical address space (PAS). The data processing device further comprises a compression unit (CU) for compressing data and/or for decompressing compressed data. Moreover, a transformation unit (TU) is provided for performing an address transformation between the logical address space (LAS) and the physical address space (PAS) of an address associated with the data compressed and/or decompressed by the compression unit (CU). The size of the compressed data in the physical address space (PAS) is smaller than the size of the corresponding uncompressed data in the logical address space (LAS).

Description

Electronic device and method for storing and retrieving data
The present invention relates to an electronic device as well as a method for storing and retrieving data.
The ongoing advance of IC technology leads to steadily and rapidly increasing computational power for media processing applications, both for fixed-function implementations using dedicated IP blocks as well as for programmable systems. The increased processing power also requires increased data throughput.
Video and image data processing systems can be implemented as a system-on- chip, which may require a high bandwidth for some of the applications run on the image processing system. Therefore, to reduce the bandwidth to a background memory and to reduce the actual amount of memory usage, the actual amount of data transferred to and from the memory is reduced by an embedded compression of the data stored in the memory, i.e. only compressed data is transferred to the memory.
For example, in a data processing system which processes image data in a streaming manner, the processing typically starts from top left and ends at the bottom right of an image. The image data can be decompressed immediately, i.e. on-the-fly, and is processed to be finally compressed again and is stored again in the background memory. As these systems are designed for a regular streaming data access, any random accesses may lead to difficulties. This is in particular true for an address-based computer like a system with a mix of hardware and software signal processing. Within such a data processing system the memory access speed improves at a much slower pace than processing speed. For typical memory devices, memory bandwidth is hampered by the latency associated with setting up a data transfer, i.e. the time required between offering an address and having the corresponding data word available. However, when this data word is available, neighboring data words are also available and can be transferred during subsequent clock cycles. For this reason, a memory transaction typically consists of a burst of consecutive data words by addressing data once while multiple data words are transferred as they belong to consecutive addresses. In this way memory bandwidth can be used efficiently. In order to remain efficient, the increasing latency (in terms of processor cycles) requires an increased burst length. Larger bursts will only be effective if the data words in the burst are actually needed by the processing units. Image processing functions have a good spatial locality, as large sets of neighboring pixels are typically processed together.
Fig. 1 shows a block diagram of an image data processing system according to the prior art. The system typically comprises a CPU and several image processing units IPU A - IPU C as well as a shared memory MU. The CPU, the image processing units, and the shared memory MU, are typically coupled by a bus B. To facilitate the communication between the CPU and the image processing units with the bus, bus interfaces BI are used. The shared memory MU will be coupled to the bus B via a memory interface MI. The CPU can allocate a buffer in the shared memory in order to facilitate its processing. Apart from allocating the buffers, the CPU may initiate a processing of the image processing units by programming the respective parameters into the image processing units. This may include the set up of the addresses in the buffer as allocated by the CPU. The image processing units IPU A - IPU C are typically dedicated processing units for performing various image processing. Thereafter, the image processing units IPU A - IPU C will perform their dedicated image processing and will store and retrieve the required image data from the buffers in the shared memory as notified by the CPU. After the dedicated image processing, the results are stored in an output buffer allocated in the shared memory MU. The data in the output buffer can be used by any one of the image processing units, by the CPU or can be output. In addition, a second level cache can be implemented in order to reduce any off-chip traffic.
To further optimize the usage and efficiency of the memory system, any access to the shared memory MU is typically performed in bursts of 64, 128 or 256 bytes of consecutive data. This is advantageous as the addressing only has to be performed once for every memory transfer. Burst-based memory transfers are also required for SDRAM memories. In addition, the memory system may be pipelined and the actual bus protocol can be decoupled from the specific system design including the overall memory bandwidth such that the memory unit can be based on a single or double data rate SDRAM without any influence on the bus protocol.
WO 2004/092960 shows a data processing apparatus for processing data being associated to a data address in a range of data addresses. The data is compressed and stored as compressed blocks in a memory. The memory address occupied by each block starts from a respective preferred starting address for a multi-address transfer to and from the memory system. Each of the blocks represents compressed data which are associated to the data addresses in the sub-range. A decompressor unit is provided for decompressing the compressed data from the memory. By storing compressed data in the memory system, the memory bandwidth can be reduced.
Fig. 2 shows a schematic representation of the compression of image data according to the prior art. A picture P is analyzed and the respective image data (the uncompressed data) is associated to application addresses AD. After the compression, the compressed data will be arranged in the actual physical address PA such that some of the address range is in use and some is not in use, i.e. there are unoccupied address ranges between the compressed data. This is in particular because of the actual compression being used which is a lossy compression with e.g. a fixed compression ratio of 2. This will result in unused spaces in the physical addresses of the memory. Accordingly, the bandwidth to the memory system is reduced by a factor of 2 due to the compression ratio. On the other hand, the actual compression of the image data will not lead to a reduced memory size.
It is therefore an object of the invention to provide a data processing device and a method for storing and retrieving data which enable a reduced memory bandwidth as well as a reduced memory footprint.
This object is solved by a data processing device according to claim 1, a video processing system according to claim 18 and a method for storing and retrieving data according to claim 19.
Therefore, a data processing device is provided, which comprises at least one processing unit for processing data based on a logical address space and a communication infrastructure comprising an interconnect for communicating data and addresses between the at least one processing unit and a memory unit storing compressed and/or uncompressed data based on a physical address space. The data processing device further comprises a compression unit for compressing data and/or for decompressing compressed data. Moreover, a transformation unit is provided for performing an address transformation between the logical address space and the physical address space of an address associated with the data compressed and/or decompressed by the compression unit. The size of the compressed data in the physical address space is smaller than the size of the corresponding uncompressed data in the logical address space. Therefore, a data processing device is provided which enables a transparent address transformation for all data which need to be stored in the memory unit.
According to an aspect of the invention the compression unit performs a lossy compression. As a lossy compression is performed, the size of the compressed data will be smaller than the size of the uncompressed data such that memory space can be saved.
According to an aspect of the invention the compression unit and/or the transformation unit are activated and deactivated according to the value of the address. Therefore, the compression unit and the transformation unit will only be activated if required, wherein the activation is performed based on the actual address of the memory access. According to an aspect of the invention a memory interface unit is coupled to the interconnect for handling a communication to the memory unit. The memory interface unit will take care of the communication between the memory unit and the interconnect such that the memory unit does not need to take care of the communication particulars.
According to an aspect of the invention an interface unit is associated to a processing unit for handling the communication between the processing unit and communication infrastructure. Therefore, the interface unit will take care of the communication between the processing unit and the communication infrastructure such that the processing units only need to perform their dedicated processings.
According to an aspect of the invention an access to the memory unit is performed in bursts of data, wherein the address transformation is performed once per burst. The latency of a memory access can be significantly reduced by performing the access to the memory in bursts and by performing the address transformation once per burst.
According to an aspect of the invention the address transformation is performed based on a start address of the burst to the memory unit. As the memory access will be performed in bursts, the data of the burst will be stored in a consecutive memory space such that merely the start address is required for the address transformation.
According to an aspect of the invention the address transformation unit is adapted to calculate an address by evaluating a mathematical expression involving a constant offset to be added to a logical address within the logical address space. By merely adding a constant offset to the logical address, the respective physical address can be achieved.
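As a minimal sketch (the names are illustrative), the transformation then amounts to a single addition applied to the start address of each burst:

    #include <stdint.h>

    /* p = a + o: the constant offset o is added once per burst to the
     * logical start address a; the remaining words of the burst follow at
     * consecutive physical addresses. */
    static inline uint32_t burst_start_physical(uint32_t logical_start,
                                                uint32_t offset)
    {
        return logical_start + offset;
    }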
According to an aspect of the invention multiple logical address ranges are mapped to overlapping physical address ranges such that the respective images are stored interleaved in the physical address range. By storing images in an interleaved manner in the physical address range, memory space can be saved.
According to an aspect of the invention the data processing device comprises a control unit for activating and deactivating the transformation unit and/or the compression unit, wherein said control unit comprises settings registers. The settings registers store information regarding the address ranges where data is required to be compressed. The control unit only activates the compression unit to compress or decompress data and/or the transformation unit to perform the address transformation if the control unit determines that an address of a memory access falls within the range of addresses of compressed data. By the usage of the control unit, the compression/decompression and transformation can be performed exactly and only for those data where it is required.
The invention also relates to a video processing system, which comprises a memory unit for storing compressed and/or uncompressed data based on a physical address space, and a memory interface unit for handling a communication between the memory unit and a communication infrastructure. The video processing system furthermore comprises at least one processing unit for processing data based on a logical address space and a communication infrastructure comprising an interconnect for communicating data and addresses between the at least one processing unit and a memory unit storing compressed and/or uncompressed data based on a physical address space. The data processing device further comprises a compression unit for compressing data and/or for decompressing compressed data. Moreover, a transformation unit is provided for performing an address transformation between the logical address space and the physical address space of an address associated with the data compressed and/or decompressed by the compression unit. The size of the compressed data in the physical address space is smaller than the size of the corresponding uncompressed data in the logical address space.
The invention also relates to a method for storing and retrieving data in a data processing device having at least one processing unit for processing data based on a logical address space. Data and addresses are communicated between the at least one processing unit and a memory unit which stores compressed and/or uncompressed data based on a physical address space. Data is compressed and/or compressed data is decompressed. An address transformation is performed between the logical address space and the physical address space of an address associated with the data compressed and/or decompressed by the compression unit. The size of the compressed data in the physical address space is smaller than the size of the corresponding uncompressed data in the logical address space.
The invention relates to the idea to provide a video processing system which distinguishes a logical address space used for processing from the actual physical address space used in a background memory. In particular, the logical address space may be larger than the physical address space such that the memory space is logically extended. The data processing of the electronic device will be based on logical addresses. An address transformation unit is provided for transforming any logical address to a physical address. The transformation of the addresses as well as the compression/decompression of data is controlled by an address discrimination.
The embodiments and advantages of the invention will now be described in more detail with reference to the figures.
Fig. 1 shows a block diagram of a data processing system according to the prior art,
Fig. 2 shows a schematic representation of a compression of an image according to the prior art,
Fig. 3 shows a block diagram of a video processing device according to the present invention,
Fig. 4 shows a block diagram of an interface unit according to a first embodiment,
Fig. 5 shows a representation of the image compression and the address translation according to a second embodiment,
Fig. 6 shows a block diagram of an interface unit according to the second embodiment,
Fig. 7 shows a basic representation of a memory map according to the second embodiment,
Fig. 8 shows a representation of a compression and address translation according to a third embodiment,
Fig. 9 shows a representation of a compression and address translation according to a fourth embodiment,
Fig. 10 shows a basic representation of a memory map according to the fourth embodiment,
Fig. 11 shows a block diagram of an interface unit according to a fifth embodiment, and
Fig. 12 shows a representation of a memory management.
Fig. 3 shows a block diagram of an image or video data processing device according to the present invention. The system typically comprises a CPU and several so-called IP blocks IP (which can be implemented as computation elements, memories, subsystems containing interconnect modules or image or video processing units) as well as a shared memory MU (which may be internal or external). The CPU, the image processing units IP, and the shared memory MU are typically coupled by a bus B. However, the interconnect can also be realized by a network on chip or a network extending over several chips or devices. To facilitate the communication between the CPU and the image processing units with the bus, interface units IU are used. The shared memory MU will be coupled to the bus B via a memory interface MI.
The CPU can allocate a buffer in the shared memory MU in order to facilitate its processing. Apart from allocating the buffers, the CPU may initiate processing by the image processing units IP by programming the respective parameters into the image processing units. This may include the setup of the addresses in the buffer as allocated by the CPU. The image processing units IP are typically dedicated processing units for performing various image processing tasks. Thereafter, the image processing units IP will perform their dedicated image processing and will store and retrieve the required image data from the buffers in the shared memory as notified by the CPU. After the dedicated image processing, the results are stored in an output buffer allocated in the shared memory MU. The data in the output buffer can be used by any one of the image processing units, by the CPU, or can be output. Although in the above an interface unit IU is associated with each IP block IP, an interface unit IU may also be provided for several IP blocks.
The following embodiments relate to a data processing device, in particular for image or video processing with an external memory device. These devices may be implemented as systems-on-chip. A substantial part of the available memory bandwidth is consumed by image data, and memory-based communication takes place between various (hardware or software) components. Typically all (or almost all) images are stored in an off-chip memory. The data processing device may comprise three types of components. IP blocks constitute hardware components dedicated to specific signal processing functions.
Signal processing functions, however, can also be implemented as software modules. Finally, control software implements an application by taking care of the setup of the signal processing components, buffer management, data flow, etc. Preferably, the image or video data is communicated via the memory in a streaming manner.
Fig. 4 shows a block diagram of an interface unit according to a first embodiment. The interface unit IU comprises a compression unit CU, a transformation unit TU and optionally a control unit CTRL. The compression unit serves to compress data to be stored in the memory and to decompress data retrieved from the memory. The transformation unit TU serves to perform an address transformation for a memory access. The control unit CTRL serves to activate and deactivate the compression unit CU and the transformation unit TU. This activation will depend on the address of the memory access. Preferably, the interface unit IU will be coupled between an IP block IP and a memory MU within the data processing system. The processing of the IP blocks IP will be performed in a logical address space LAS while the memory is based on a physical address space PAS. According to the first embodiment, the logical address space is larger than the actual physical space of the memory such that a logical extension of the memory is provided. Therefore, the transformation unit TU serves to transform the logical address into a physical address.
If data is to be written into the memory, the IP block IP will request a data access and may indicate an address as well as the data to be written. The data to be transferred (uncompressed data) dtu is forwarded to the compression unit CU where this data is compressed and the compressed data dtc is forwarded to the memory. At the same time, the address addri (in the logical address space) is supplied to the transformation unit TU which performs the address transformation to transform the logical address addri to the physical address addrp. Accordingly, the compressed data will be stored in the memory at the modified address, i.e. the physical address addrp.
If the IP block IP needs to read data from the memory, the IP block will supply an address in the logical address space. The address will be transformed in the transformation unit TU into a physical address and the data at this address is fetched from the memory. If this data is compressed data, this compressed data will be decompressed in the compression unit CU and will be forwarded to the IP block IP such that the IP block IP may perform its processing thereon. The IP block can also access data that is not subject to data compression. In this case neither (de)compression nor an address translation is carried out, so both the address and the data are passed unmodified. The control unit CTRL keeps track of those logical addresses and the corresponding physical addresses in the memory which contain compressed data and those logical addresses and the corresponding physical address ranges which do not comprise compressed data. The compression unit CU and the transformation unit TU will only be activated if the memory access involves compressed data. Therefore, the control unit CTRL compares the address of a memory access with the address ranges of compressed data and the address ranges of uncompressed data. This can for example be performed by determining whether the address of the memory access is within the logical address range which corresponds to the logical extension range. Alternatively, the comparison may be performed in the transformation unit.
When e.g. a filter IP block IP processes an image, its operation is set up by control software executing on a processor which specifies the source address of the input image and the destination address of the output image. The IP block autonomously performs direct memory access (DMA) to traverse the images. When finished, the IP block typically issues an interrupt to the processor. This way, the software maintains the memory buffers while multiple hardware blocks can perform various processing steps in an application concurrently.
Fig. 5 shows a representation of the image compression and the address translation according to a second embodiment. The compression of data and the address translation according to the second embodiment correspond to the compression of data and address translation as described for the first embodiment. In particular, the processing according to the second embodiment, regarding the compression and translation, will be performed by an interface unit IU as described with reference to Fig. 4. In Fig. 5, two images I1, I2 are shown which are segmented into segments of 128 bytes. Each of these segments is compressed to 64 bytes in a logical address space LAS according to a fixed compression factor of 2. As the start addresses of the image segments of the compressed data are unchanged, holes or empty spaces will occur between the end of a compressed image segment and the start of the next compressed image segment. Thereafter, the logical address is translated to the physical address, preferably by the transformation unit TU as shown in Fig. 4. This can be performed by using an address offset which is typically constant within an image. The first image I1 will have an offset of zero such that the logical address will correspond to the physical address. The second image I2 will have an offset such that the compressed image segments of the second image I2 fill up the empty holes between adjacent compressed image segments of the first image I1. Accordingly, the transformation from the logical address space to the physical address space is performed by introducing an address offset. The address offset is constant at least within a certain address range and can be constant within an image. Accordingly, by merely storing the address offset setting in the compression unit CU or in the translation unit TU, no state information needs to be maintained during the compression processing. As mentioned above, the address translation is very simple if merely an address offset is used: the physical address p will correspond to the application address a plus the address offset o.
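As a small, self-contained illustration of this interleaving, the following C sketch computes the physical start addresses of the first few compressed segments of both images, assuming the fixed compression factor of 2 of Fig. 5 and the concrete base addresses and offset of the memory-map example of Fig. 7 (these concrete values are illustrative only):

#include <inttypes.h>
#include <stdio.h>

#define SEG_SIZE      128u   /* uncompressed segment size */
#define COMP_SEG_SIZE  64u   /* compressed segment size (compression factor 2) */

int main(void)
{
    uint32_t i1_base = 0xC0800000u, i1_offset = 0x00000000u;  /* image I1 */
    uint32_t i2_base = 0xA0000000u, i2_offset = 0x20800040u;  /* image I2, in the logical extension */

    for (uint32_t k = 0; k < 4; k++) {
        /* segment start addresses are left unchanged in the logical address space */
        uint32_t a1 = i1_base + k * SEG_SIZE;
        uint32_t a2 = i2_base + k * SEG_SIZE;
        /* physical address = logical address + constant offset */
        printf("segment %" PRIu32 ": I1 at 0x%08" PRIX32 ", I2 at 0x%08" PRIX32 "\n",
               k, a1 + i1_offset, a2 + i2_offset);
    }
    return 0;
}

For segment 0 this prints 0xC0800000 for I1 and 0xC0800040 for I2, i.e. each 64-byte compressed segment of I2 occupies the hole left behind the corresponding compressed segment of I1.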
If a compression factor of at least 2 is used, a single image can be stored in a smaller buffer. This can be performed by interweaving the two halves of a single image into a single physical address area, resulting in more efficient memory utilization when e.g. an application requires an odd number of equal-sized images, at the cost of an extra entry in the logical address table. When this is applied for all images, the number of entries is twice the number of images (still well manageable, but relatively high). A single physical allocation of half the image size is used for two logical memory chunks located adjacent to each other to store the image. Then all physical memory is in use, so there is no need to keep track of free physical memory. Furthermore, all newly allocated logical memory is also in use, so there is no need to keep track of available and free logical memory either. Hence, the memory allocation is simplified, since no state needs to be maintained by a memory allocator.
Furthermore, if multiple images are to be stored in multiple buffers, different compression factors can be combined. E.g. a factor of 1.6 for one image reduces 128 bytes to 80 bytes. This can be combined with another image compressed by a factor of 2.67. This may be utilized in e.g. a video compression standard like MPEG, where B frames are less sensitive to error propagation (and can thus be compressed more aggressively) compared to I and P frames. So in cases where different images have different quality requirements, an optimal selection of the compression factors is possible. Further, different compression factors can be applied to different types of data, like images, depth maps, graphics, etc.
Fig. 6 shows a block diagram of an interface unit according to the second embodiment. The interface unit IU according to the second embodiment substantially corresponds to the interface unit according to Fig. 4. The control unit CTRL comprises a plurality of setting registers SR which may contain the start address start, the end address end as well as an address offset offset. The settings will define an address range within which the compression unit CU needs to be activated for compression or decompression.
The transformation unit TU determines based on the settings in the setting registers, i.e. the start address, the end address as well as the address offset, whether an address of a memory access falls within this range or not. If the address of the memory access does not fall into the range where a compression is required, the logical address will correspond to the physical address, i.e. the address offset will correspond to zero. In this case, the data will not undergo a compression and will therefore bypass the compression unit. Such a bypass may be implemented inside or outside the compression unit as depicted in Fig. 6. However, if the transformation unit TU determines that the address of a memory access falls within the range determined by the setting registers, the data dtu are compressed within the compression unit CU and the compressed data dtc is forwarded to the memory. At the same time, an address offset as stored in the setting registers SR will be added to the logical address addri in order to obtain the physical address addrp. The compressed data will then be stored at the physical address as determined by the transformation unit TU.
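The range check and translation performed with these settings can be sketched as follows (a minimal illustration in C; the struct layout and the function name are assumptions and not taken from the description):

#include <stdbool.h>
#include <stdint.h>

struct settings_regs {       /* one set of settings registers SR */
    uint32_t start;          /* first logical address of the compressed range */
    uint32_t end;            /* last logical address of the compressed range */
    uint32_t offset;         /* constant offset added to logical addresses */
};

/* Returns the physical address and reports whether the compression unit
   has to be activated for this access. */
uint32_t translate(const struct settings_regs *sr, uint32_t logical, bool *compress)
{
    if (logical >= sr->start && logical <= sr->end) {
        *compress = true;              /* data is routed through the compression unit */
        return logical + sr->offset;   /* logical address -> physical address */
    }
    *compress = false;                 /* bypass: the offset is effectively zero */
    return logical;
}

With the settings of the Fig. 7 example below for the second image (start 0xA0000000, end 0xA01FA3FF, offset 0x20800040), an access to logical address 0xA0000000 would be mapped to physical address 0xC0800040.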
Fig. 7 shows a basic representation of a memory map according to the second embodiment. The left hand side shows a memory map of the logical address space LAS and the right hand side depicts a memory map of the physical address space PAS. The memory unit MU can e.g. comprise a memory space of 16 Mbyte within an address range of 0xC0000000 - 0xC0FFFFFF, i.e. this will correspond to the physical address space PAS. As mentioned above, the logical address space may be larger than the physical address space, wherein the difference between the logical address space and the physical address space can be referred to as a logical extension. In the present case, the logical extension will start at an address of 0xA0000000. If the images I1, I2 have a resolution of, for example, 1920 x 1080, each image will require 0x1FA400 bytes. If the memory allocation corresponds to the physical address range of 0xC0800000 - 0xC09FA3FF, then the settings of the address transformation unit with respect to the first image I1 will correspond to
Start: 0xC0800000
End: 0xC09FA3FF
Offset: 0x00000000
The settings of the address transformation unit with respect to the second image I2 are as follows:
Start: 0xA0000000
End: 0xA01FA3FF
Offset: 0x20800040
By the offset, the logical start address 0xA0000000 of the second image I2 is translated to a physical start address of 0xC0800040, as 0xA0000000 + 0x20800040 = 0xC0800040. Once the memory allocation has been performed, the memory accesses will be performed accordingly. Typically, the address setup is performed by the memory allocation during initialization of an application. As soon as the application is running, none of the components of the data processing system needs to be aware of the existence of any embedded compression nor of any distinction between logical and physical address space. This can be realized by the usage of the interface units which are coupled between the IP blocks IP and the memory units. These interface units IU will take care of all of the communication between the IP block IP and the memory such that the IP block IP can perform its dedicated processing without having to take care of the communication with the memory.
Fig. 8 shows a representation of a compression and address translation according to a third embodiment. Here, the compression and the address translation are performed on a single image. The compression as well as the address translation can be performed by an interface unit IU as depicted in Fig. 4 or Fig. 6. Here, the physical address p will correspond to p = s + (a-s)*r, wherein "a" corresponds to the application address, "s" corresponds to the start address of the image and "r" corresponds to the compression ratio. As the start address as well as the compression ratio are constant within at least an address range of an image, these values can be stored in the setting registers and can be accessed by the transformation unit TU. Accordingly, buffering or caching can advantageously be applied, allowing larger data transfers to the off-chip SDRAM (in the physical address space) to increase the efficiency of the SDRAM accesses. This requires on-chip buffering of compressed data in order to exploit locality of reference. Thus, bus efficiency is improved.
The third embodiment is advantageous with respect to the first embodiment as compression factors can be used which are not a power of 2. E.g. a compression factor of 1.6 corresponds to a compression ratio of 5/8, which is conveniently coded as a fixed-point value for the multiplication in the address calculation.
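A minimal sketch of this calculation in C, with the ratio coded in units of 1/8 as an illustrative fixed-point format (so a compression factor of 1.6 corresponds to a numerator of 5), could look as follows:

#include <stdint.h>

#define RATIO_SHIFT 3                    /* ratio expressed in units of 1/8 */

uint32_t translate_ratio(uint32_t a,     /* application (logical) address */
                         uint32_t s,     /* start address of the image */
                         uint32_t r_num) /* ratio numerator, e.g. 5 for 5/8 */
{
    return s + (((a - s) * r_num) >> RATIO_SHIFT);   /* p = s + (a - s) * r */
}

For example, translate_ratio(s + 128, s, 5) yields s + 80, matching the reduction of a 128-byte segment to 80 bytes for a compression factor of 1.6.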
However, according to the third embodiment, an application being run in the system still needs to access the application addresses, which reside in physical memory space. Therefore, the "free" or unoccupied space at the end of the compressed image cannot be utilized for other purposes in a transparent manner, since that address space is actually used by the application. Therefore, there is no saving of memory footprint.
Fig. 9 shows a representation of a compression and address translation according to a fourth embodiment. The fourth embodiment may be based on a combination of the second and third embodiment such that the advantages of both systems are combined at the cost of a slightly more expensive address transformation. When the application accesses image data at application address a, the address transformation unit calculates a physical address p as follows: p = o + s + (a-s)*r, wherein "o" corresponds to the constant address offset, "s" corresponds to the start address of the image (in logical address space), and "r" corresponds to the compression ratio.
Fig. 10 shows a basic representation of a memory map according to the fourth embodiment. The physical memory has 16 Mbyte, which is located at an address range of 0xC0000000 - 0xC0FFFFFF. The logical extension of this memory starts at address 0xA0000000. As in the example of Fig. 7, a first and a second image I1 and I2 each require 0x1FA400 bytes, and the physical address range used corresponds to 0xC0800000 - 0xC09FA3FF. Therefore, the settings of the address transformation unit with respect to the first image are as follows:
Start: 0xA01FA400
End: 0xA03F47FF
Offset: 0x20702E00
Compression ratio: 0.5
The settings of the address transformation unit with respect to the second image are as follows:
Start: 0xA0000000
End: 0xA01FA3FF
Offset: 0x20800000
Compression ratio: 0.5
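These settings can be checked with a small program implementing the fourth-embodiment calculation (a sketch in C; the ratio of 0.5 is implemented as a shift by one and the function name is illustrative):

#include <assert.h>
#include <stdint.h>

static uint32_t translate4(uint32_t a, uint32_t s, uint32_t o)
{
    return o + s + ((a - s) >> 1);   /* p = o + s + (a - s) * 0.5 */
}

int main(void)
{
    /* second image: its compressed data starts at the base of the physical buffer */
    assert(translate4(0xA0000000u, 0xA0000000u, 0x20800000u) == 0xC0800000u);
    assert(translate4(0xA01FA3FFu, 0xA0000000u, 0x20800000u) == 0xC08FD1FFu);
    /* first image: placed directly behind the compressed second image */
    assert(translate4(0xA01FA400u, 0xA01FA400u, 0x20702E00u) == 0xC08FD200u);
    assert(translate4(0xA03F47FFu, 0xA01FA400u, 0x20702E00u) == 0xC09FA3FFu);
    return 0;
}

Together the two compressed images thus exactly fill the physical range 0xC0800000 - 0xC09FA3FF that a single uncompressed image would otherwise occupy.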
According to a further embodiment, the offset value can be coded as a 2's complement number, which can also hold negative values. Hence, the logical extension can relate to higher or lower addresses compared to the physical memory. In each of the above embodiments the compression unit CU may be active on multiple address ranges, i.e. the settings registers SR comprise a logical address table with multiple entries and an associative lookup operation is required to decide whether the data associated with a given address needs to be (de)compressed. The address range and offset setting of a compression unit CU may be valid for a single image, or for multiple images. In other words, a logical address range does not necessarily coincide with an image. This may reduce the number of entries in the logical address table, which reduces register space and increases address lookup speed. It is possible to accommodate various image sizes simultaneously. E.g. one HD image can be interwoven with multiple SD images.
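Such a lookup over a table with multiple entries can be sketched as follows (in C, as a simple linear scan for clarity; hardware would typically compare all entries in parallel, and the field names, including the fixed-point coding of the ratio, are illustrative assumptions):

#include <stddef.h>
#include <stdint.h>

struct range_entry {
    uint32_t start, end;     /* logical address range holding compressed data */
    int32_t  offset;         /* 2's complement offset, may be negative */
    uint32_t ratio_q8;       /* compression ratio in units of 1/256 (assumed coding) */
};

/* Returns the matching table entry, or NULL when the access has to bypass
   the (de)compression unit and the address translation. */
const struct range_entry *lookup(const struct range_entry *table, size_t n,
                                 uint32_t logical)
{
    for (size_t i = 0; i < n; i++)
        if (logical >= table[i].start && logical <= table[i].end)
            return &table[i];
    return NULL;
}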
Fig. 11 shows a block diagram of an interface unit IU according to a fifth embodiment. The interface unit IU comprises a compression unit CU, a transformation unit TU and a control unit CTRL. The start address of the logical extension is stored in a register LEA ("logical extension address"). The transformation unit TU comprises a comparison unit COMU for comparing the logical address with the start address of the logical extension stored in the register LEA. The output of the comparison unit COMU controls the input of the compression unit such that only data whose address lies within the logical extension is compressed. The transformation unit TU furthermore comprises a lookup unit LU for looking up the start and end address within the registers of the control unit CTRL. The transformation unit furthermore comprises a calculation unit CAU for calculating the physical address according to the offset and the compression ratio.
If all images reside in the logical address extension, i.e. a non-zero offset is always used to access physical memory, then the criterion for activation of the compression unit is simplified: all accesses in the logical extension need to be (de)compressed. This can be performed by a single address comparison, which simplifies the hardware implementation; fast detection of the bypass mode is important for regular (i.e. non-video) data accesses. Accordingly, the impact on the system behavior for regular processing is minimal. According to this embodiment, the decision to bypass the (de)compression unit can be taken before the result of the lookup operation in the logical address table is available.
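A sketch of this fast bypass decision in C, assuming for illustration that the logical extension occupies all addresses at or above the value held in the register LEA (with an extension below the physical memory the comparison direction would simply be reversed):

#include <stdbool.h>
#include <stdint.h>

/* Single comparison against the logical extension address register LEA;
   only when it indicates a compressed access is the slower lookup in the
   logical address table consulted for offset and ratio. */
static inline bool in_logical_extension(uint32_t logical, uint32_t lea)
{
    return logical >= lea;   /* assumed placement of the logical extension */
}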
Furthermore, the fast selection saves power: in a pipelined system, for example, it avoids starting the (de)compression unit before it is known whether it needs to be active at all. Even further, the (de)compression unit and the address lookup logic can be deactivated when not required. In all of the above embodiments, quality-of-service mechanisms can still be applied.
The memory allocation can be arranged for e.g. a compression factor of 2. When the system runs out of bandwidth resources, the compression factor may be increased (potentially dynamically, i.e. while the application is running), thus reducing bandwidth while the memory addressing is not changed. This requires separate control of the compression ratio that is used by the (de)compression unit and the compression ratio used by the address translation, i.e. separate settings registers are required (not shown in the figure).
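Such separate settings could be represented, for instance, by two independent register fields (an illustrative sketch; the field names and the fixed-point coding are assumptions):

#include <stdint.h>

struct ratio_settings {
    uint32_t addr_ratio_q8;   /* ratio used by the address translation; fixed once the buffers have been allocated */
    uint32_t data_ratio_q8;   /* ratio used by the (de)compression unit; may be lowered at run time to compress more aggressively */
};

Lowering data_ratio_q8 while leaving addr_ratio_q8 untouched reduces the amount of data transferred per segment without changing where the segments are addressed, at the price of leaving part of each allocated segment unused.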
The principles of the invention can be applied at various places in the memory hierarchy. As already described above, the interface unit can be associated with every IP block accessing compressed data.
The interface unit as described above can also be used at the main memory interface, reducing off-chip bandwidth and memory footprint, while the on-chip bandwidth is not reduced.
Alternatively, the interface unit can also be applied between a first and a second level data cache. This is particularly advantageous since in this way compressed data is stored in the second-level cache, thus increasing its effective memory capacity (or allowing a reduction of the amount of second-level cache memory). This either increases the cache capacity available to the application or reduces chip area and thus cost.
Fig. 12 shows a representation of a memory management. A specific image memory allocation will now be described which abstracts the control of the logical memory space from the rest of the software system. The arrows in Fig. 12 indicate the control flow.
The functions malloc() and free() are typically available in the system to manage physical memory. The functions malloc_image() and free_image() are provided by the image memory allocator IMA to obtain and release memory in logical address space. The state maintains the administration of the physical and logical memory that is managed by the module. Therefore, it keeps track of the physical memory allocated by the module, the logical memory that is in use by the application, and the logical memory that is free.
Its operation is illustrated by means of the following pseudo code.
void *malloc_image(size_t size)
{
    if (not logical memory available)      /* check state information */
    {
        malloc();                          /* obtain new physical memory */
        register_new_memory();             /* logical memory available in state */
    }
    ptr = get_logical_memory();            /* mark as used in state info */
    setup_hw();                            /* setup of address range and offset */
    return ptr;
}
The function free_image() is the counterpart of malloc_image(). It also keeps track of the state and ensures that the hardware settings are erased. When physical memory is not in use anymore, free() can be called to release it to the operating system.
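A possible counterpart can be sketched in the same pseudo-code style; mark_as_free() and erase_hw_settings() are placeholders for the state administration described above, not functions defined by the description:

void free_image(void *ptr)
{
    mark_as_free(ptr);                      /* update state information */
    erase_hw_settings(ptr);                 /* clear address range and offset */
    if (physical memory no longer in use)   /* check state information */
    {
        free();                             /* release physical memory to the operating system */
    }
}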
The image memory allocator is the only component in the system that maintains the relation between the physical and the logical memory space. Furthermore, it is the only component in the system that is aware of the distinction between logical and physical memory. For some embodiments, the image memory allocation routine also requires the compression ratio. The function prototype then corresponds to:
extern void *malloc_image(size_t size, ratio_t ratio);
The state can be stored in a globally accessible memory area, i.e. accessible from any task or process in the system. Only the image memory allocator needs access to its state. In a concurrent operating system, management of the state must be considered a critical section where mutually exclusive access needs to be guaranteed, e.g. by means of semaphores. The image memory allocation module is typically an extension of a standard C library. The principles of the invention can be exploited in very large software systems with only minimal additional software. Furthermore, compared to not using the principles of the invention, only minor software adaptations are required, and these are confined to the initialization phase.
With the above-described interface unit, the memory footprint is reduced without the need to adapt any signal processing component, such that compression and address translation are transparent to the IP blocks IP. Furthermore, there is only a marginal impact on memory access latency for regular data accesses that do not require compression. Only minor adaptations of an application are required. The relation between logical and physical memory can be abstracted in a special image memory allocation software module. The principles of the invention are particularly interesting in case the required amount of memory exceeds the available memory footprint.
If an external shared memory is required, the commercially available commodity memory devices typically have a memory size that is a power of 2, hence the choice is limited to e.g. 16 Mbyte or 32 Mbyte. If the system requires e.g. 18 Mbyte, a 32 Mbyte device has to be used and 14 Mbyte would be left unused in the system if no compression is performed. With application of embedded compression according to the invention, the memory use may be reduced to 16 Mbyte, thus leading to substantial cost savings.
With the current progress in IC technology, it becomes feasible to store images in on-chip caches or buffers. For on-chip storage, the system designer has more freedom to adapt the memory size. Applying the principles of the invention can directly result in cost saving due to reduction of the amount of on-chip memory.
Alternatively, the amount of on-chip memory can be kept equal (compared to storage of uncompressed image data) and embedded compression can be used to store more images on-chip, reducing the off-chip memory bandwidth.
Caching strategies can still be applied, independent of the choice of where to perform the address transformation. In other words, caching can be applied in the physical as well as in the logical address space. Hence, the method is also transparent for the cache.
The principles of the invention can also be used as a risk reduction feature. Often, system use cases are analyzed in an early design phase of a system-on-chip (SoC). At that early design stage, insufficient details of the application and its implementation are known, so effective bus utilization and resource use are hard to estimate. These can be significantly impacted by later design decisions on e.g. bus topologies. Furthermore, specification changes during the design process may lead to new use cases, which are hard to anticipate. Instead of hitting hard bandwidth or memory size barriers, the principles of the invention allow different image quality versus system resource use trade-offs, without the need to adapt the IC design. The invention therefore supports quality of service and graceful degradation.
Only minor additional hardware is required compared to the system according to WO 2004/092960, which is incorporated by reference. Furthermore, the hardware can be controlled by a single, straightforward software module, which is only activated at initialization of an application. Accordingly, the hardware as well as the software modifications are small compared to state-of-the-art systems such that the extra cost of this invention is negligible in all practical design cases, while the savings are paramount.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Furthermore, any reference signs in the claims shall not be construed as limiting the scope of the claims.

Claims

1. Data processing device, comprising: at least one processing unit (IP) for processing data based on a logical address space (LAS); a communication infrastructure (B) comprising an interconnect (B) for communicating data and addresses between the at least one processing unit (IP) and a memory unit (MU) storing compressed and/or uncompressed data based on a physical address space (PAS); a compression unit (CU) for compressing data and/or for decompressing compressed data; and a transformation unit (TU) for performing an address transformation between the logical address space (LAS) and the physical address space (PAS) of an address associated with the data compressed and/or decompressed by the compression unit (CU); wherein the size of the compressed data in the physical address space (PAS) is smaller than the size of the corresponding uncompressed data in the logical address space (LAS).
2. Data processing device according to claim 1, wherein the compression unit (CU) is adapted to perform a lossy compression.
3. Data processing device according to claim 1, wherein the compression unit (CU) and/or the transformation unit (TU) are activated and deactivated according to the value of the address.
4. Data processing device according to claim 1, further comprising a memory interface unit (MI) coupled to the interconnect (B) for handling a communication to the memory unit (MU).
5. Data processing device according to claim 1, further comprising an interface unit (IU) being associated to a processing unit (IP) for handling the communication between the processing unit (IP) and communication infrastructure.
6. Data processing device according to claim 4, further comprising an interface unit (IU) being associated to a memory interface unit (MI) for handling the communication between the communication infrastructure and the memory interface.
7. Data processing device according to claim 3, wherein an access to the memory unit (MU) is performed in bursts of data, wherein the address transformation is performed once per burst.
8. Data processing device according to claim 7, wherein the address transformation is performed based on a start address of the burst to the memory unit (MU).
9. Data processing device according to claim 1, wherein the address transformation unit (TU) is adapted to calculate an address by evaluating a mathematical expression.
10. Data processing device according to claim 9, wherein the mathematical expression involves a constant offset to be added to a logical address within the logical address space (LAS).
11. Data processing device according to claim 8, wherein multiple logical address ranges are mapped to overlapping physical address ranges such that the respective images are stored interleaved in the physical address range.
12. Data processing device according to claim 9, wherein the expression involves a compression ratio, and an image start address, and/or adding an optional constant offset.
13. Data processing device according to claim 1, further comprising a control unit for activating and deactivating the transformation unit (TU) and/or the compression unit (CU), wherein said control unit (CTRL) comprises settings registers (SR).
14. Data processing device according to claim 13, wherein the settings registers (SR) store information regarding the address ranges where data is required to be compressed, wherein the control unit (CTRL) only activates the compression unit (CU) to compress or decompress data and/or the transformation unit (TU) to perform the address transformation if the control unit (CTRL) determines that an address of a memory access falls within the range of addresses of compressed data.
15. Data processing device according to claim 13, wherein the settings registers (SR) store the values of the parameters of the mathematical expression.
16. Data processing device according to claim 13, wherein the settings registers further comprise a logical extension address register (LEA) which is used by the control unit (CTRL) to determine whether the address translation unit (TU) and/or the compression unit (CU) require activation.
17. Data processing device according to claim 13, wherein the settings registers further comprise registers for storing a compression ratio value wherein a compression ratio value is associated with an address range and the compression unit uses that value when processing data within the address range.
18. Video processing system, comprising: a memory unit (MU) for storing compressed and/or uncompressed data based on a physical address space (PAS); a memory interface unit (MI) for handling a communication between the memory unit (MU) and a communication infrastructure; at least one processing unit (IP) for processing data based on a logical address space (LAS); and a communication infrastructure (B) comprising an interconnect (B) for communicating data and addresses between the at least one processing unit (IP) and a memory unit (MU); and a compression unit (CU) for compressing data and/or for decompressing compressed data, and a transformation unit (TU) for performing an address transformation between the logical address space (LAS) and the physical address space (PAS) of an address associated with the data compressed and/or decompressed by the compression unit (CU); wherein the size of the compressed data in the physical address range is smaller than the size of the corresponding uncompressed data in the logical address range.
19. Method for storing and retrieving data in a data processing device having at least one processing unit (IP) for processing data based on a logical address space (LAS); comprising the steps of: communicating data and addresses between the at least one processing unit (IP) and a memory unit (MU) storing compressed and/or uncompressed data based on a physical address space (PAS); compressing data and/or decompressing compressed data, and performing an address transformation between the logical address space (LAS) and the physical address space (PAS) of an address associated with the data compressed and/or decompressed by the compression unit (CU); wherein the size of the compressed data in the physical address space (PAS) is smaller than the size of the corresponding uncompressed data in the logical address space (LAS).
PCT/IB2007/051783 2006-05-24 2007-05-11 Electronic device and method for storing and retrieving data WO2007135602A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP06114455 2006-05-24
EP06114455.6 2006-05-24

Publications (1)

Publication Number Publication Date
WO2007135602A1 true WO2007135602A1 (en) 2007-11-29

Family

ID=38515466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/051783 WO2007135602A1 (en) 2006-05-24 2007-05-11 Electronic device and method for storing and retrieving data

Country Status (1)

Country Link
WO (1) WO2007135602A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000045516A1 (en) * 1999-01-29 2000-08-03 Interactive Silicon, Inc. System and method for parallel data compression and decompression
US20040162954A1 (en) * 2002-07-31 2004-08-19 Texas Instruments Incorporated Reformat logic to translate between a virtual address and a compressed physical address
WO2004092960A2 (en) * 2003-04-16 2004-10-28 Koninklijke Philips Electronics N.V. Selectable procession / decompression for data stored in memory

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8718142B2 (en) 2009-03-04 2014-05-06 Entropic Communications, Inc. System and method for frame rate conversion that utilizes motion estimation and motion compensated temporal interpolation employing embedded video compression
US8631055B2 (en) 2009-09-30 2014-01-14 Samplify Systems, Inc. Enhanced multi-processor waveform data exchange using compression and decompression
WO2011153075A1 (en) * 2010-06-01 2011-12-08 Qualcomm Incorporated Virtual buffer interface methods and apparatuses for use in wireless devices
US8527993B2 (en) 2010-06-01 2013-09-03 Qualcomm Incorporated Tasking system interface methods and apparatuses for use in wireless devices
US8725915B2 (en) 2010-06-01 2014-05-13 Qualcomm Incorporated Virtual buffer interface methods and apparatuses for use in wireless devices
US9158695B2 (en) 2011-08-09 2015-10-13 Seagate Technology Llc System for dynamically adaptive caching
EP2642397A1 (en) * 2012-03-23 2013-09-25 LSI Corporation System for dynamically adaptive caching
JP2013200868A (en) * 2012-03-23 2013-10-03 Lsi Corp System for dynamically adaptive caching
US9026568B2 (en) 2012-03-30 2015-05-05 Altera Corporation Data compression for direct memory access transfers
US9158686B2 (en) 2012-03-30 2015-10-13 Altera Corporation Processing system and method including data compression API

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07735858

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07735858

Country of ref document: EP

Kind code of ref document: A1