WO2000065453A1 - Direct memory access engine for data cache control - Google Patents

Direct memory access engine for data cache control

Info

Publication number
WO2000065453A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
cache
block
execution unit
transfer
Prior art date
Application number
PCT/US2000/010503
Other languages
French (fr)
Inventor
Govind Kizhepat
Phillip Lowe
Kenneth Ying Yuen Choy
Debashis Chatterjee
Original Assignee
Icompression, Inc.
Priority date
Filing date
Publication date
Application filed by Icompression, Inc. filed Critical Icompression, Inc.
Priority to AU43623/00A priority Critical patent/AU4362300A/en
Publication of WO2000065453A1 publication Critical patent/WO2000065453A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 Cache access modes

Abstract

A DMA engine is coupled to a data cache and an execution unit. The DMA engine operates independently of the execution unit and can load blocks of data from external memory to the data cache and from the data cache to external memory without assistance from the execution unit. The execution unit programs the DMA engine with block transfer information and the DMA engine performs the rest of the operations independently. The block transfer information allows the system to transfer blocks of data to and from the cache. The audio and video blocks of data are loaded into the data cache by the DMA engine immediately prior to when they are needed by the execution unit. When a block has been transferred into the cache, the DMA engine sets a status flag to indicate to the execution unit that the block transfer is complete and the requested block is available for processing. The data cache is organized into sets which can be used for storage of multimedia blocks of data under the control of the DMA engine or for traditional data storage under the cache controller using conventional line replacement policies. The allocation of each set is dynamic and can be optimized for the computational requirements of the system.

Description

DIRECT MEMORY ACCESS ENGINE FOR DATA CACHE CONTROL
BACKGROUND OF THE INVENTION
1. Field of the Invention.
The application relates generally to memory management systems, and specifically to
a DMA engine for data cache control.
2. Description of the Background Art.
Multimedia encoders, such as those used for MPEG and MPEG2 encoding, provide
the necessary compression to allow video and audio data to be transferred, stored, and played
in a computer environment. Integrated MPEG encoders use an embedded processor to
perform the encoding operations required to compress the video and audio data. Figure 1
illustrates a block diagram of a conventional embedded processor 100 for use in an integrated
MPEG encoder. The instruction cache 128 feeds an instruction stream to the instruction
decode unit 124 which decodes instructions within the stream for the execution unit 104. The
decoded instructions are executed by the execution unit 104. The data cache controller 112
supervises the operation of the data cache 116 using conventional cache management
techniques.
The data cache is typically divided into a number of sets, where each set contains a
number of cache lines for storing data. Each cache line has a tag that holds a number of
address bits and several control bits (e.g., valid bit, lock indicator, dirty bit). A cache line is
filled at the request of the execution unit 104 when a data location is needed which is not
currently represented in the cache. This is commonly known as a cache miss. When a cache
miss occurs, the data cache controller 112 initiates one or more external memory accesses and
brings the requested cache line into the data cache 116 and updates the tags accordingly. The
issue of which existing cache line to replace is treated using conventional algorithms such as the "Least-Recently Used" methodology in which the least recently used data line is replaced
with the new line.
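The least-recently-used replacement just described can be sketched as a short model. This is illustrative only, not code from the patent; the class and function names (`CacheSet`, `fetch`) are invented for the example.

```python
# Minimal sketch of conventional least-recently-used line replacement.
class CacheSet:
    def __init__(self, num_lines):
        self.lines = {}      # tag -> data
        self.order = []      # tags from least- to most-recently used
        self.num_lines = num_lines

    def access(self, tag, fetch):
        """Return data for `tag`, filling the line on a cache miss."""
        if tag in self.lines:                    # cache hit
            self.order.remove(tag)
            self.order.append(tag)
            return self.lines[tag], "hit"
        if len(self.lines) == self.num_lines:    # miss: evict the LRU line
            victim = self.order.pop(0)
            del self.lines[victim]
        self.lines[tag] = fetch(tag)             # external memory access
        self.order.append(tag)
        return self.lines[tag], "miss"

s = CacheSet(num_lines=2)
fetch = lambda tag: f"data@{tag}"
s.access(0x10, fetch)                # miss, fills a line
s.access(0x20, fetch)                # miss, fills the other line
s.access(0x10, fetch)               # hit: 0x10 becomes most recent
_, kind = s.access(0x30, fetch)     # miss: evicts 0x20, the LRU line
assert kind == "miss" and 0x20 not in s.lines and 0x10 in s.lines
```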
In an integrated MPEG encoder, the execution unit 104 must operate on blocks and
macroblocks which are typical for processing audio and video data. Conventional cache
management techniques are not optimal for systems in which blocks of data are required to be
transferred to and from the cache 116 and external memory 108. For example, if there is a
minor variation in a block of data, conventional schemes replace the entire block which is
time consuming and resource inefficient. In these systems, it is desirable to be able to load
an entire data set with a standard block of data in advance of when the execution unit 104
requires the data, such that the data will be available to the execution unit 104 when the
processing of the data block begins. Additionally, it is desirable in such systems to pre-load
the data cache 116 while minimizing the involvement of the execution unit 104, allowing the
execution unit 104 to devote its resources to other computationally intensive tasks.
Thus, a system is needed in which blocks of data can be pre-loaded prior to the
processing of the blocks of data, and which minimizes the involvement of an execution unit
to improve the overall processing power of the system.
SUMMARY OF THE INVENTION
In accordance with the present invention, a DMA engine is coupled to a data cache
and an execution unit. The DMA engine operates independently of the execution unit and
can load blocks of data from external memory to the data cache and from the data cache to
external memory without assistance from the execution unit. The execution unit programs
the DMA engine with block transfer information and the DMA engine performs the rest of
the operations independently. In contrast to conventional implementations of DMA engines
which require a dedicated memory buffer to operate, in accordance with the present invention
the memory buffer has been merged with the data cache memory. The block transfer
information allows the system to transfer blocks of data to and from the cache which is
advantageous in multimedia encoding systems in which the data is grouped into blocks and
macroblocks for processing by the execution unit. In a further embodiment, the data cache is
organized into sets which can be used for storage of multimedia blocks of data under the
control of the DMA engine or for traditional data storage under the cache controller using
conventional line replacement policies. The cache controller and DMA engine are
implemented separately; however, both share the same buffers for storage. The execution
unit preferably dynamically determines whether the cache controller or the DMA engine will
control each transfer of data responsive to instructions in the code or program directing the
execution unit to use the DMA engine or the cache controller to perform the data transfer.
Thus, the data transfers can be optimized for the computational requirements of the system,
providing greater flexibility and improving the overall processing power of the system.
In a preferred embodiment, the audio and video blocks of data are loaded into the data
cache by the DMA engine immediately prior to when they are needed by the execution unit.
This optimizes the processing of the system because there is no delay in waiting for blocks to be transferred while the execution unit is idle. When a block has been transferred into the
cache, the DMA engine sets a status flag to indicate to the execution unit that the block
transfer is complete and the requested block is available for processing. The blocks can be of
any size, and are preferably chosen to contain only data which has changed. By limiting the
transfer of data which has not changed, the amount of overall data transfers is reduced,
improving the system's processing capabilities.
Finally, the present invention is equally applicable to instruction cache management.
In this embodiment, data need only be retrieved from main memory, but using the DMA
engine of the present invention, only the portion of the code required is retrieved, which
maximizes the use of the system resources.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a prior art embedded processor.
Figure 2 is a block diagram of a preferred embodiment of an embedded processor in
accordance with the present invention.
Figure 3 is a block diagram of a data cache.
Figure 4 is a block diagram of a data cache line tag in accordance with the present
invention.
Figure 5 is a block diagram of an embodiment of a DMA engine in accordance with
the present invention.
Figure 6 is a flowchart illustrating a preferred method of transferring a block of data
from external memory to a data cache.
Figure 7 is a flowchart illustrating a preferred method of transferring a block of data to
external memory from a data cache.
Figure 8 is a flowchart illustrating a preferred method of allocating sets in the data
cache.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 2 is a block diagram illustrating a preferred embodiment of an embedded
processor 200 of an integrated multimedia encoder. The embedded processor 200 comprises
an instruction cache controller 232, instruction cache 228, instruction decoder 224, external
memory 208, and data cache controller 212. In accordance with the present invention, a
direct memory access (DMA) engine 250 is coupled to the execution unit 204, external
memory 208, data cache 216, and instruction cache 228. The DMA engine 250 is designed to
perform the transfer of blocks of data from the data cache 216 to the external memory 208,
and from external memory 208 to the data cache 216 across lines 213, 211. Additionally, the
DMA engine 250, or a separate DMA engine, transfers instruction data to and from the
instruction cache 228 and external memory 208 across lines 201, 211. The execution unit
204 programs block transfer information into the DMA engine 250 across line 205, and the
DMA engine 250 then performs the block transfer without further assistance from the
execution unit 204. The execution unit transmits standard data requests to the cache
controller 212 across line 209.
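The programming step described above amounts to handing the DMA engine a small block descriptor and moving on. A minimal sketch, with hypothetical field names standing in for the starting address, byte count, and direction information:

```python
# Hedged sketch of "programming" the DMA engine: the execution unit hands
# over a block descriptor, then continues with other work. Field names are
# assumptions, not terms from the patent.
from dataclasses import dataclass

@dataclass
class BlockTransfer:
    start_address: int   # starting address of the block
    byte_count: int      # exact size of the block in bytes
    is_read: bool        # direction: read from external memory?

def program_dma(engine_queue, xfer):
    """The execution unit's only involvement: hand over the descriptor."""
    engine_queue.append(xfer)

queue = []
program_dma(queue, BlockTransfer(0x8000, 256, is_read=True))
assert queue[0].byte_count == 256   # the DMA engine takes over from here
```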
In conventional data cache memory, information is stored and transferred in cache
lines. However, in multimedia encoder applications, blocks and macroblocks which are
larger than individual cache lines are the groupings of data which are processed by the
execution unit 204. If a multimedia encoder is using a conventional cache management
system, multiple cache line transfer operations must be performed in order to complete a
usable data transfer, which lengthens the processing time of the system. Also, when a block of data which needs to be replaced is smaller than a cache line, the entire cache line is
replaced by the conventional cache-management system, which overuses system
resources. However, by using a DMA engine, only the data which needs to be replaced is retrieved
from main memory 208 and placed in a cache line. Another major benefit of using a DMA
engine 250 to provide the transfer of data and instructions is that it allows the execution unit
204 to be free to concentrate its resources on computation. This division of labor also greatly
improves the overall processing power of the system. Further, in contrast to existing
implementations of DMA engines, the DMA engine 250 of the present invention does not
require a separate dedicated memory buffer. Rather, the data cache 216 itself is used as the
buffer for the DMA engine 250 which is shared with the cache controller 212. No additional
hardware is required to store data required by the DMA engine 250 to perform its data
transfer operations.
Figure 3 illustrates a preferred embodiment of the data cache 216 in accordance with
the present invention. The data cache 216 is optimized to support both traditional data
storage functions as well as the block data storage required by the processing of multimedia
data. The data cache 216 is organized into sets 304. Each set 304 contains a number of cache
data lines 300. The sets 304 are organized in the data cache 216 responsive to the type of
applications required. For example, for digital video applications, a typical set is 256x32 bits,
while for digital audio applications a typical set is 64x32 bits. The illustrative data cache 216
shown in Figure 3 is shown for digital audio applications, each block representing a byte.
Two sets 304 are shown; one is illustrated having eight cache lines 300. Thus, the cache is 32
bits by 128 bits.
In a preferred embodiment, the data cache 216 has a busy tag 308 and a direction tag
312. The busy tag 308 indicates to the execution unit 204 whether the data cache 216 is being used for a DMA data transfer. Only one DMA data transfer is permitted to occur at a
time. The direction tag 312 indicates to the DMA engine 250 whether an operation is a read
or a write. An address tag 320 is also provided for the data cache which indicates the starting
address for a data transfer by the DMA engine 250. A lock indicator 324 is also provided for
each data set 304. The lock indicator 324 indicates to the execution unit 204 whether access
is permitted to an individual data set 304. If the program code instructs that a particular set
304 contains data which should not be modified, the lock indicator 324 for that set 304 is
written, and no data transfers will then occur involving the locked set 304. If there is a
particular set 304 from which data should be read, all of the lock indicators 324 for the other
sets 304 are written, and the unlocked set 304 will therefore be the set from which data is read
by the DMA engine 250. A separate portion of the data cache 216 is used as a buffer 316 for
the use of the DMA engine 250. This buffer serves as a memory for the DMA engine 250,
and stores information the DMA engine 250 requires to perform data transfers. The data
cache controller 212 also uses the data cache 216 as a buffer for its operations.
As shown in Figure 4, each cache line 300 has a data part 412 and a control part 414
which is comprised of an address section 408 and control bits 400. Control bits 400 typically
include a valid bit 400(1), a lock indicator 400(2) for the cache line, and a dirty bit 400(3).
The address section 408 is used to determine cache hits or cache misses, as in normal cache
management systems. However, in contrast with the DMA engine 250 in accordance with the
present invention, the cache controller 212 does not perform data transfers in response to
program instructions.
In operation, upon requiring a DMA transfer as indicated by instructions contained
within executing code, the execution unit 204 checks the busy tag 308 of the data cache 216
to determine whether a DMA transfer is currently occurring. If the busy tag 308 is set, indicating that a DMA transfer is occurring, no action is taken. In an embodiment using a
queue for the DMA requests, a request for a DMA transfer is stored in the queue if the busy
tag 308 is set. Once the busy tag 308 is cleared, the data transfer will begin. If the busy tag
308 indicates that the data cache 216 is available for a DMA transfer, the execution unit 204
writes the starting address into the address tag 320 of the data cache 216, and writes to the
direction tag 312 to indicate whether the transfer is to be a read or a write. If the data transfer
is a write, a set 304 is chosen from the unlocked sets 304 using least-recently-used principles
as the set to which data is to be written.
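The initiation sequence above (check the busy tag, queue the request if a transfer is in flight, otherwise write the address and direction tags) can be modeled as follows; the dictionary fields and function name are assumptions for illustration:

```python
# Illustrative model of DMA transfer initiation by the execution unit.
def request_dma(cache, start_addr, is_read, queue):
    """Return True if the transfer can start now, False if it was queued."""
    if cache["busy"]:                          # a DMA transfer is occurring
        queue.append((start_addr, is_read))    # held until the busy tag clears
        return False
    cache["busy"] = True                       # claim the single DMA slot
    cache["address_tag"] = start_addr          # starting address of the block
    cache["direction_tag"] = "read" if is_read else "write"
    return True

cache = {"busy": True, "address_tag": None, "direction_tag": None}
queue = []
assert request_dma(cache, 0x4000, True, queue) is False   # busy: queued
assert queue == [(0x4000, True)]
cache["busy"] = False                                     # prior transfer done
assert request_dma(cache, 0x4000, True, queue) is True    # starts immediately
assert cache["direction_tag"] == "read"
```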
Figure 5 is a block diagram illustrating a preferred embodiment of DMA engine 250.
The execution unit 204 determines whether the cache controller 212 or the DMA engine 250
will transfer the data. In a preferred embodiment, the determination of which to use is made
by the programmer in creating the code. Typically the programmer will specify the use of the
DMA engine 250 when the data does not change very often. When data is constantly being
replaced, the cache controller 212 is more appropriately specified to perform the data transfer
operations. Responsive to the instructions, the execution unit 204 selects either the cache
controller 212 or the DMA engine 250 to perform the data transfer. If a DMA operation is
specified, the execution unit 204 then checks the busy tag 308 of the data cache 216 across
line 503 to determine whether the cache 216 is available for a DMA data transfer operation.
If the busy tag 308 is clear, the execution unit 204 transmits block transfer information to the
DMA engine 250 to allow the DMA engine 250 to transfer data responsive to a cache miss.
Block information preferably includes address information, byte count information, and a
control indication as to whether the operation is a read or a write. The address information is
transmitted over address line 507 to the address tag 320, the byte count information is
transmitted over data line 501 to the data cache buffer 316, and the control signal is transmitted over control line 503 to the direction tag 308. The use of the byte count
information and starting address to identify a block of data allows blocks of data of precise
size to be specified. This allows the blocks of data to be transferred to include only the
information which is changing to be replaced, rather than requiring the transfer of an entire
set of data, most of which has not changed since the last transfer. Other block transfer
information transmitted over different or a single line can also be used in accordance with the
present invention.
Once the execution unit 204 verifies that the busy tag 308 is clear, the execution unit
204 initiates the DMA engine 250 across line 205 to begin the transfer. The data
manipulation module 500 retrieves the block transfer information over lines 509, 511, and
515 and performs the required tasks. For example, if the direction tag 312 indicates a read
operation is to be performed, the data manipulation module 500 accesses the external memory
208 through data line 211 at the address specified by the execution unit 204. The data
manipulation module 500 begins reading bits of data from external memory 208 which are
counted by the counter 524. When the counter value equals the byte count value received
from the execution unit 204, the data manipulation module 500 stops reading. The current
counter value and base count value are also stored in the data cache buffer 316. The data
manipulation module 500 then determines which sets 304 are unlocked, and selects an
unlocked set 304 to which to write the data. Selection of the set 304 is based upon least-
recently used principles; however, the selection is only limited to the unlocked sets 304.
Once a set 304 is selected, the data manipulation module 500 writes the busy tag 308
to indicate to the execution unit 204 that the DMA engine 250 is working on transferring a
data block and that the execution unit 204 cannot initiate a DMA transfer. The data
manipulation module 500 then writes the data across line 519 through counter 520, across line 213 into the set 304. After the transfer is complete, the DMA engine 250 disables the
busy tag 308 to indicate to the execution unit 204 that it may initiate a new DMA transfer.
Alternatively, the DMA engine 250 examines a DMA queue in the data cache 216 to see if
there are any pending DMA data transfers to be executed. After storing new data into a data
set 304, the execution unit 204 may lock the set 304 to preserve the newly transferred data
from subsequent DMA operations.
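The counter-gated read performed by the data manipulation module 500 can be sketched as below; the function and variable names are illustrative, not from the patent:

```python
# Sketch of the counter-driven read: bytes are read from external memory
# until the counter reaches the programmed byte count, then the block is
# written into the selected set. Names are assumptions for illustration.
def dma_read_block(external_memory, start_address, byte_count):
    block = []
    counter = 0
    while counter < byte_count:        # the counter gates the read
        block.append(external_memory[start_address + counter])
        counter += 1
    return block                       # then written into the chosen set

# A toy external memory: address 0x100 + i holds the value i * 2.
memory = {0x100 + i: i * 2 for i in range(16)}
block = dma_read_block(memory, 0x100, byte_count=4)
assert block == [0, 2, 4, 6]           # exactly byte_count bytes, no more
```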
For a read operation, the data manipulation module 500 retrieves the data across line
213 from the designated set 304 through counter 520. The data is counted through counter
520 until the size of the retrieved data matches the specified byte count. The data is then
transferred to external memory 208. The data manipulation module 500 may be implemented
as specialized hardware, as a microprocessor, or other conventional means.
Figure 6 is a flowchart illustrating a preferred method of writing to data cache. First,
the DMA engine 250 determines 600 whether the request is to read a block of data from
external memory 208. If it is not, the process proceeds to step 700, discussed below. If it is,
the DMA engine 250 receives 604 the block transfer information from the execution unit 204.
Then, the DMA engine 250 enables 608 the busy tag 308 of the data cache 216 to indicate to
the execution unit 204 that the block is being transferred. Next, the DMA engine 250
accesses 612 external memory 208, and locates the specified address. A block of data of the
requested size is retrieved 616 and stored 620 into the data cache 216. As the transfer is
completed for the lines 300, the busy tag 308 is disabled 624.
Figure 7 is a flow chart illustrating a preferred method of transferring data from the
data cache 216 to external memory 208. First, the DMA engine 250 receives 700 block
transfer information such as the starting address and the byte count. Then, the busy tag 308 is
enabled 704 to indicate that the data cache is in use by the DMA engine 250. The data is read from the data cache line 300 and transferred to available memory lines in external
memory 208. After the data is read from each line 300 the busy tag 308 for that line is
disabled, allowing the execution unit 204 to access that line 300. Thus, the DMA engine 250
in accordance with the present invention provides for the independent transfer of data from
memory 208 to the data cache 216 and back, allowing the execution unit 204 to devote its
resources to more computationally intensive tasks. A busy tag 308 is provided to streamline
communication between the DMA engine 250 and the execution unit 204 and ensure that
only the correct data is read from a memory location.
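The line-by-line transfer and release in the Figure 7 flow can be sketched as follows, again with hypothetical names; releasing each line as it drains is what lets the execution unit reuse lines before the whole block finishes:

```python
# Rough model of the Figure 7 flow: copy each cache line to external
# memory, then release that line's busy indication immediately.
# All names are invented for this illustration.
def dma_write_back(cache_lines, external_memory, base_address):
    for i, line in enumerate(cache_lines):
        external_memory[base_address + i] = line["data"]  # transfer the line
        line["busy"] = False       # line is reusable as soon as it is drained

lines = [{"data": d, "busy": True} for d in ("a", "b", "c")]
mem = {}
dma_write_back(lines, mem, base_address=0x200)
assert mem[0x201] == "b" and all(not l["busy"] for l in lines)
```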
Figure 8 illustrates a preferred method of allocating sets 304 in the data cache 216. First,
the execution unit 204 determines 800 if a data transfer is to be made by the DMA engine
250. Again, this information is preferably provided in the code to be executed by the
execution unit 204, allowing the dynamic designation of each data transfer to be performed
by either the DMA engine 250 or the cache controller 212. If the data transfer is to be made
by the DMA engine 250, the execution unit 204 selects 804 a cache set 304 to which to
transfer data. Next, the execution unit determines 806 whether the selected set 304 is
unlocked. If it is, the set 304 is added 808 to a list of unlocked sets. If it is locked, the set
304 is not added 812 to the list. The execution unit 204 determines 816 whether there are
more sets. If there are, the execution unit 204 selects 820 a next set. The process is repeated
until a list of all unlocked sets 304 is obtained. Once the list of unlocked sets 304 is
obtained, the execution unit 204 orders 824 the sets responsive to their latest time of access
by the DMA engine 250. The set 304 which has not been accessed for the longest period is
selected 828 as the set 304 to which to transfer the data.
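The Figure 8 selection policy — gather the unlocked sets, then pick the one least recently accessed by the DMA engine — can be sketched as follows. The data layout (`sets` mapping a set id to a lock flag and a last-access time) is an assumption made for illustration:

```python
def select_dma_set(sets):
    """Pick the cache set for a DMA transfer per Figure 8.
    sets maps set id -> (locked, last_access_time)."""
    # steps 806-816: build the list of unlocked sets
    unlocked = [sid for sid, (locked, _) in sets.items() if not locked]
    if not unlocked:
        return None                # every set is locked; no set can be selected
    # steps 824-828: order by last access and take the stalest set
    return min(unlocked, key=lambda sid: sets[sid][1])
```

This is the familiar least-recently-used policy restricted to the unlocked sets.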
If the data transfer is to be made by the cache controller 212, the execution unit 204
transfers the data to a cache line 300 using the cache controller 212, as described above. Both the cache controller and the DMA engine 250 use the same instruction set. In normal
operation, the cache controller compares address tags of cache lines 300 to requests for data
to determine cache hits and cache misses. Upon finding a cache miss, the missing data is
retrieved from memory and stored in a cache line as described above. If the program code
instructs the execution unit 204 to perform a DMA data transfer, the DMA engine 250
performs the operation as described above.
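The two access paths sharing one instruction set can be sketched as a dispatch: a normal access goes through address-tag comparison (hit or miss), while a request the program marks for DMA is handed off to the DMA engine's queue. The function, the 32-byte tag granularity, and the queue representation are assumptions for illustration:

```python
def access_data(addr, want_dma, cache, dma_queue, fetch):
    """Dispatch one data access either to the cache controller path
    (tag compare, hit/miss handling) or to the DMA engine."""
    if want_dma:
        dma_queue.append(addr)     # DMA engine services it as in Figure 6
        return None
    tag = addr // 32               # assumed cache-line granularity of 32 bytes
    if tag in cache:               # address-tag compare: cache hit
        return cache[tag]
    data = fetch(addr)             # cache miss: retrieve from external memory
    cache[tag] = data              # fill the cache line as described above
    return data
```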
In a further embodiment, referring again to Figure 2, the instruction cache is also
managed by the DMA engine 250. In this embodiment, the DMA engine 250 fetches
instructions from the external memory 208 across line 211 responsive to receiving an address
and a byte count from the execution unit 204. The DMA engine 250 transmits the instruction
data across line 201 into the instruction cache 128 for access by the execution unit 204. The
use of the DMA engine 250 to perform instruction retrieval allows the execution unit 204 to
devote its resources to performing computation tasks, thus maximizing the performance of
the system.
While the present invention has been described with reference to certain preferred
embodiments, those skilled in the art will recognize that various modifications may be
provided. These and other variations upon and modifications to the preferred embodiments
are provided for by the present invention.

Claims

WHAT IS CLAIMED IS:
1. A data cache memory for use with digital video and audio processing applications
executed by an execution unit comprising:
memory, for storing blocks of data;
a data cache, for storing blocks of data of which immediate access is desirable;
and
a direct memory access engine, for transferring blocks of video and audio data
between the data cache and memory responsive to receiving block
information from the execution unit.
2. The apparatus of claim 1 wherein the block information includes an address in the
memory where the data is located and a length of the block to be transferred.
3. The apparatus of claim 1 wherein the data cache includes a status indicator, coupled to
the direct memory access engine, for indicating to the execution unit whether a block of data
is being accessed by the direct memory access engine.
4. The apparatus of claim 1 wherein the data cache includes a direction indicator,
coupled to the direct memory access engine, to indicate whether a data transfer is to be a read
operation or a write operation.
5. The apparatus of claim 1 wherein the data cache comprises a plurality of sets, and
each set has a lock indicator to indicate whether data transfer operations may access data
within the set.
6. The apparatus of claim 1 further comprising a data cache controller, coupled to the
memory and the execution unit for transferring blocks of data between the data cache and
memory responsive to instructions received from the execution unit.
7. The apparatus of claim 1 wherein the data cache further comprises a direct memory
access engine buffer, coupled to the direct memory access engine, for temporarily storing data
used by the direct memory access engine in performing data transfer operations.
8. The apparatus of claim 1 further comprising an instruction cache, and wherein the
direct memory access engine is coupled to the instruction cache for transferring instruction
data between the external memory and the instruction cache responsive to instructions
received from the execution unit.
9. A method of storing and retrieving data in a multimedia encoder system having a data
cache, an execution unit, external memory, and a direct memory access engine, comprising
the steps of:
receiving block transfer information specifying a block of data to be
transferred;
determining whether the data cache is available for a data transfer operation;
responsive to determining the data cache is available for a data transfer
operation, determining whether the block is to be transferred to the
external memory or to the data cache;
responsive to determining the block is to be transferred to the data cache,
retrieving the block from external memory responsive to the block
transfer information; and
storing the retrieved block into the data cache.
10. The method of claim 9 wherein the block transfer information includes address
information specifying a starting address of the block, the method further comprising the step
of:
accessing the external memory at the address specified to retrieve the block.
11. The method of claim 10 wherein the block transfer information includes a byte count,
and the retrieving step further comprises:
retrieving data from the external memory until a size of the retrieved block is
approximately equal to the byte count.
12. The method of claim 9 further comprising, after storing the block, the step of:
signaling the execution unit that the block transfer is complete.
13. The method of claim 9 further comprising the step of:
responsive to determining the data cache is not available for a data transfer
operation, storing the block transfer information in a queue.
14. The method of claim 9 in a system in which the data cache is comprised of sets, said
method comprising:
determining whether a set is available for a data transfer operation, and the
step of storing further comprises:
storing the block to be transferred into the set responsive to determining the set
is available for a data transfer operation.
15. The method of claim 9 further comprising:
for each set, determining whether the set is available for a data transfer
operation;
for each available set, determining when a last access of the set was made; and
selecting a set which has not been accessed for a longest period of time to
which to transfer the data.
16. The method of claim 9 in a system having a cache controller, further comprising:
determining whether a cache controller or a direct memory access engine is to
perform the data transfer.
17. The method of claim 16 wherein determining whether a cache controller or a direct
memory access engine is to perform the data transfer further comprises:
receiving an instruction from the execution unit indicating the data transfer is
to be performed by the direct memory access engine.
PCT/US2000/010503 1999-04-23 2000-04-19 Direct memory access engine for data cache control WO2000065453A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU43623/00A AU4362300A (en) 1999-04-23 2000-04-19 Direct memory access engine for data cache control

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US29867399A 1999-04-23 1999-04-23
US09/298,673 1999-04-23

Publications (1)

Publication Number Publication Date
WO2000065453A1 true WO2000065453A1 (en) 2000-11-02

Family

ID=23151527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/010503 WO2000065453A1 (en) 1999-04-23 2000-04-19 Direct memory access engine for data cache control

Country Status (2)

Country Link
AU (1) AU4362300A (en)
WO (1) WO2000065453A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6502169B1 (en) * 2000-06-27 2002-12-31 Adaptec, Inc. System and method for detection of disk storage blocks containing unique values

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0282248A2 (en) * 1987-03-10 1988-09-14 Fujitsu Limited Block access system using cache memory
US5598576A (en) * 1994-03-30 1997-01-28 Sigma Designs, Incorporated Audio output device having digital signal processor for responding to commands issued by processor by emulating designated functions according to common command interface
US5668957A (en) * 1995-11-02 1997-09-16 International Business Machines Corporation Method and apparatus for providing virtual DMA capability on an adapter connected to a computer system bus with no DMA support

Also Published As

Publication number Publication date
AU4362300A (en) 2000-11-10


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP