US20040258147A1

US20040258147A1 - Memory and array processor structure for multiple-dimensional signal processing

Info

Publication number: US20040258147A1
Application number: US10/867,027
Authority: US
Inventors: Tsu-Chang Lee
Original assignee: Individual
Current assignee: VICHIP Corp
Priority date: 2003-06-23
Filing date: 2004-06-14
Publication date: 2004-12-23
Also published as: CN1809839A; WO2005002235A3; EP1644896A2; WO2005001773A2; US7499491B2; US20050013369A1; US7471724B2; WO2005001773A3; WO2005002235A2; US20050111548A1; JP2007525078A

Abstract

An object oriented n-dimensional signal object store and processing array structure to enable efficient processing of n-dimensional signal data, including: a fast n-dimensional signal storage memory capable of rapidly storing and accessing n-dimensional signal objects, a multi-level mass memory structure to store massive amounts of data before transferring to the fast n-dimensional signal storage memory, and an n-dimensional signal processor array to process the n-dimensional signal object data in the n-dimensional signal object store.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority from a U.S. provisional patent application, Ser. No. 60/480,985, filed on Jun. 23, 2003, entitled “Method and Apparatus for Adaptive Multiple-Dimensional Signal Sequences Encoding/Decoding,” which is hereby incorporated by reference. This application is related to a co-pending U.S. utility patent application, filed on Jun. 14, 2004, entitled “Method and Apparatus for Adaptive Multiple-Dimensional Signal Sequences Encoding/Decoding,” which is hereby incorporated by reference. [0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

This disclosure relates generally to data encoding, storage, distribution, and decoding, and more particularly to a memory and array signal processing structure to process the n-dimensional signal sequence.

2. Description of the Prior Art

Digital data systems are frequently challenged to handle large quantities of data quickly enough to meet practical needs. Compact disc music requires about 1500 kilobits per second, and video needs over 200,000 kilobits per second. Both transmission and storage of data are costly, and in some cases impossible. For example, a current telephone line modem can carry a maximum bit rate of only 56 kilobits per second, even under perfect line conditions. Although video image frames need only be handled at approximately 30 cycles per second in order to give an observer the impression of continual image transmission, the data content of each image frame is very large.

Solutions to the problem of quickly handling large quantities of data have been developed by using methods of data compression, i.e., methods of reducing the quantity of bits required. Data compression has made possible technological developments including digital television, DVD movies, streaming Internet video, home digital photography, and video conferencing. Compressing coders and decoders (CODECs) are used to encode (at the capturing/production side) and decode (at the receiving/reproduction side) data containing statistical redundancy.

FIG. 1A is a simplified block diagram describing a prior art system for compressing image sequence data with an encoder 1. An image frame 4 is provided as input to the encoder 1 and the data is encoded. The encoded frame is then transmitted, and a copy of the encoded frame is decoded at the decoder 10 and stored as a reference frame 6 in the frame buffer 3. With the current input image frame 4 as input, the encoder 1 then searches the reference frame 6 for a closest match point in the reference frame 6 for each block of a plurality of blocks that make up the current frame. This includes the use of a motion estimator (ME) 5, and requires a calculation of what is called an “energy difference” measure, such as sum of square error or sum of absolute error between the current frame block and corresponding reference frame block located at each search point in the reference frame. The best match location is then represented as a “motion vector” 7, specifying the two-dimensional location displacement of the block in the reference frame 6 relative to the corresponding block in the current frame. Also, the difference (residue) 8 between the best match block in the reference frame 6 and the current image input frame 4 is determined and is typically called the “Residue” or “Block Prediction Difference” (BPD). Both the motion vector 7 and the residue 8 are then encoded and transmitted. The encoder 1 will then decode the motion vector 7 and residue 8 and reconstruct the current frame in the same way that a decoder receiving the same data would reconstruct the frame, and then store this frame as a reference frame.

A very significant issue is the amount of computational power that is required by an encoder in accomplishing the task of finding the best match for each block in the current frame, i.e., determining the displacement vector (motion vector) such that the displacement block in the reference frame is most “similar” to the current block. One prior art method of performing this task involves searching every possible location for block matching within a pre-defined search area. This method requires astronomical computing power and is not feasible for practical real time implementation. There are many simplified methods to search a fraction of the large and complete search space to reduce the computation cost. However, even with the reduced computing cycles, the data access is still the major bottleneck for system performance throughput. This is especially true for multi-dimensional data (2D for images, and 3D for video with multiple frame considerations) that need to be rapidly accessed in a selected pattern.
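As an illustration only, the exhaustive block-matching search described above can be sketched in Python as follows. The function names, the 4×4 block size, and the search radius of 2 pixels are assumptions chosen to keep the example small, and the sum of absolute error is used as the energy difference measure. This is a sketch of the general prior art technique, not the implementation of any particular encoder.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute error between a current block and a displaced reference block."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n=4, radius=2):
    """Try every displacement within +/-radius; return (motion_vector, residue)."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    dx, dy = best_mv
    # Residue: current block minus the best matching reference block.
    residue = [
        [cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
        for j in range(n)
    ]
    return best_mv, residue
```

Each block requires (2·radius+1)² cost evaluations of n² pixel pairs each, which is why both the computation and the reference frame data access grow quickly with the search area.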

SUMMARY OF THE INVENTION

The present invention is a memory organization and processing array structure to enable efficient processing of n-dimensional signal frames, including: an n-dimensional object store capable of rapidly storing and accessing n-dimensional objects, which can optionally include a multi-level mass memory structure to store massive amounts of data economically; and a signal processor array to process the data in the n-dimensional object store. Embodiments of the invention can be used to encode general n-dimensional signal sequences, such as one-dimensional, two-dimensional, and three-dimensional signals. One important application of this method is in video encoding for transmission and storage purposes. Because of this, many of the descriptions below illustrate the compression of two-dimensional video signal sequences. However, the method and apparatus taught here can be extended to compress a general sequence of n-dimensional signals, where n is a positive integer.

A first aspect of the invention is directed to an object oriented memory organization and processing array structure to enable efficient processing of n-dimensional signal frames. The system includes: an n-dimensional object store capable of rapidly storing and accessing blocks of n-dimensional signals, a multi-level mass memory structure to store a large amount of data before the transfer to the n-dimensional memory, and a signal processor array to process the data in the n-dimensional memory.

A second aspect of the invention is directed to an n-dimensional signal processing array to process n-dimensional data inputs. The processing array includes: an array of signal processing units; a group of data registers, to store the data for the signal processing units; and means for controlling the processing array to allow one data element to be used by more than one processor in the array.

A third aspect of the invention is directed to a method to operate an n-dimensional memory system for storing and retrieving n-dimensional data in an n-dimensional frame. The method includes: storing one data item into each slice of L memory slices, where L is a positive integer; organizing the n-dimensional data to allow all the data in a given cube, which can be located anywhere in the frame, to be accessed in M=B/L cycles, where B is the total number of points inside the cube; accessing the data from the L memory slices based on n-dimensional address inputs from an addressing translation module; and providing data flow from the L-slices through a data multiplexer and data de-multiplexer to outside processing modules using the n-dimensional data.

A fourth aspect of the invention is directed to a method to operate a two-dimensional memory system for storing and retrieving two-dimensional data in a two-dimensional frame. The method includes: storing one data item into each slice of L memory slices, where L is a positive integer; organizing the two-dimensional data to allow all the data in a given cube, which can be located anywhere in the frame, to be accessed in M=B/L cycles, where B is the total number of points inside the cube; accessing the data from the L memory slices based on two-dimensional address inputs from an addressing translation module; and providing data flow from the L-slices through a data multiplexer and data de-multiplexer to outside processing modules using the two-dimensional data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates a data compression system, in accordance with the prior art. [0014]
FIG. 1B illustrates a preferred embodiment of a system of the present invention for storage and processing of multi-dimensional signal data. [0015]
FIG. 1C illustrates a 2-dimensional application of the invention to video data arranged in a 2-dimensional frame, in accordance with one embodiment of the present invention. [0016]
FIG. 2 illustrates a block diagram of 2-level memory for n-dimensional storage, in accordance with one embodiment of the invention. [0017]
FIG. 3 illustrates allocation of data for a two-dimensional 3×4 block from a frame, in accordance with one embodiment of the invention. [0018]
FIG. 4 illustrates allocation of data for a two-dimensional 3×4 block from a frame, in accordance with an alternative embodiment of the invention. [0019]
FIG. 5 illustrates a frame buffer data allocation pattern in an SDRAM, in accordance with one embodiment of the invention. [0020]
FIG. 6 illustrates the order of arrangement of pixel data within one block, in accordance with one embodiment of the invention. [0021]
FIG. 7 illustrates a 2-dimensional array implementation of the n dimensional processing engine, in accordance with one embodiment of the invention. [0022]
FIG. 8 shows a spiral search with a step size of 4 pixels, in accordance with one embodiment of the invention. [0023]
FIG. 9 shows a parallel spiral pattern with P search points in parallel, in accordance with one embodiment of the invention. [0024]
FIG. 10 illustrates the method of data sharing with 3×3 array of processing units, in accordance with one embodiment of the invention. [0025]
FIG. 11 illustrates a memory access example, in accordance with one embodiment of the invention. [0026]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

N-Dimensional Object Store (NDOS) [0027]
The implementation of block matching and motion estimation typically has a bottleneck at accessing the blocks from any location in the reference frames. With the video streams typically organized in 2-dimensional or 3-dimensional objects in video algorithms, the use of 1-dimensional linear addressing based memory does not provide efficient results. [0028]
To address this problem, what is needed is a specially organized memory and processor structure that handles the particular data access and processing requirements of the n-dimensional signal space efficiently and with high throughput. Special memory organization and processing methods that address data access throughput issues for some space oriented data requests are known. For example, U.S. Pat. No. 5,818,726, entitled System and Method for Determining Acceptable Logic Cell Locations and Generating a Legal Location Structure, by Tsu-Chang Lee, issued on Oct. 6th, 1998, demonstrates the solution of the layout placement location searching problem with a specially organized cell array structure and processing method to achieve very rapid IC placement data throughput. The method taught in that patent enabled the placement of multi-million gate designs with data throughput more than 100 times faster than prior methods without a special memory structure. The present invention is a different application of concepts similar to those taught in that IC placement patent, to solve the n-dimensional encoding and decoding signal processing problems. [0029]
FIG. 1B shows a preferred embodiment of this invention. The invention includes applying an “object oriented” principle as used in software design to organize the data and processor structure to solve the encoding and decoding signal processing problems. There are three main components in the encoding and decoding memory/processing system shown in FIG. 1B. The n dimensional encoding decoding object store (NDOS) 108 organizes and stores the n dimensional signal data in the encoding and decoding space to support high throughput processing for encoding and decoding data processing. Since there is a large volume of data to be stored in the encoding and decoding signal data store, the NDOS 108 can in alternative embodiments be organized according to a hierarchical structure. Inside NDOS 108, the encoding and decoding (n dimensional: N-D) signal data is stored in a general mass storage memory (NDMS) 103. A chunk of locally used encoding and decoding signal data is decomposed and organized into a fast object oriented n dimensional (N-D) signal storage memory (NDFS) 102. The NDFS 102 functions as a cache to provide data with high throughput to serve encoding and decoding data requests from the encoding and decoding signal processing engine 101. Since most of the encoding and decoding signal processing follows some kind of pre-defined “navigation” pattern through the encoding and decoding space, the encoding and decoding data can be pre-loaded into the NDFS 102 from the NDMS 103 through the mass storage bus 109 with some amount of pipelining control. (For example, the spiral searching pattern for ME search and the neighborhood preserving scanning pattern described in the co-pending U.S. utility patent application, filed on Jun. 11, 2004, entitled “Method and Apparatus for Adaptive Multiple-Dimensional Signal Sequences Encoding/Decoding,” are two example space data access cases.) [0030]
The n dimensional (N-D) signal processing engine (NDPE) 101 requests N-D “objects” from the NDOS 108 through the encoding and decoding (N-D) object access bus 105. Note here that the requested N-D objects are “higher level” items that are meaningful to the n dimensional encoding decoding signal processing system, compared to the low level bits and bytes of traditional data access from traditional memory. In other words, the NDOS 108 “understands” the data processing applications and some semantics of the data. The N-D objects requested from the NDOS 108 will be stored in a data register in the NDRF 104 associated with the NDPE 101. The NDRF 104 contains registers (which may include data, control, and system status registers) to be used by the NDPE 101 directly. The data transfer between the NDOS 108, NDPE 101, and NDRF 104 is through a very high speed N-D data bus (NDDB) 106. [0031]
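The object-level request path described above can be illustrated with a toy software model. The following Python sketch is only an analogy under stated assumptions: the class name `ToyNDOS`, the 8×8 tile size, and the dictionary used as the fast store are all hypothetical, and no bus or register behavior is modeled. It shows a processing engine asking for a 2-dimensional block object while the store decides what to load from the mass memory into the fast memory.

```python
class ToyNDOS:
    """Toy 2-D object store: a fast tile cache (NDFS stand-in) over a frame (NDMS stand-in)."""

    def __init__(self, frame, tile=8):
        self.frame = frame      # full frame held in the "mass storage" level
        self.tile = tile        # assumed tile size moved into the fast store at once
        self.fast = {}          # fast store: {(tile_x, tile_y): list of tile rows}
        self.mass_reads = 0     # number of tile loads from the mass storage level

    def _tile(self, tx, ty):
        """Return one tile, loading it from mass storage on first use."""
        key = (tx, ty)
        if key not in self.fast:
            t = self.tile
            rows = self.frame[ty * t:(ty + 1) * t]
            self.fast[key] = [row[tx * t:(tx + 1) * t] for row in rows]
            self.mass_reads += 1
        return self.fast[key]

    def get_block(self, x, y, w, h):
        """Object-level request: a w-by-h block whose top-left pixel is (x, y)."""
        t = self.tile
        return [
            [self._tile(xx // t, yy // t)[yy % t][xx % t] for xx in range(x, x + w)]
            for yy in range(y, y + h)
        ]
```

In this model, a second request for a block that overlaps a previously requested one is served from the fast store without further mass storage reads, which is the reuse that the cache-like NDFS is intended to provide.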
NDOS Implementation [0032]
As a specific implementation of the preferred embodiment specified above, details follow below for constructing an object oriented n-dimensional memory store, based on a traditional 1-dimensional addressing based memory to optimize the memory access efficiency and access pattern flexibility for ME algorithm frame buffer accessing. However, the use of this structure is not limited to the ME algorithm. Any n-dimensional data processing can use this mechanism for the flexibility and efficiency advantages. [0033]
This memory access problem is illustrated in FIG. 1C, which uses a 2-dimensional case from the ME algorithm as an example. In a video application, video data is typically arranged in a 2-dimensional “frame” 131, which shows a picture at any instant on the TV screen. Inside the frame 131, the data is typically organized into smaller 2-dimensional blocks 133. These blocks 133 usually have a size of 16×16 or 8×8 pixels, but can have other configurations. These blocks 133 are formed with a fixed grid pattern 132 on each frame. [0034]
Video algorithms need to access these blocks very efficiently, e.g., to get all pixels in a block in one single cycle or one single burst of cycles. Furthermore, video algorithms need to access a 2-dimensional block at any random location not aligned to the fixed grid 133, as shown in FIG. 1C. [0035]
Currently, electronic memories (SDRAM, SRAM, etc.) are organized around a 1-dimensional addressing mechanism that allows at best a simultaneous access or burst of pixels in a linear way, i.e., a row of pixels. With some pre-arrangement of the pixel data allocation in the memory, it is possible to burst access a block aligned to the fixed grid pattern in the frame. However, it is not possible to access a randomly located block in one cycle or burst. One embodiment of the invention provides a structure to solve this problem. [0036]
FIG. 2 shows a specific embodiment of the general N-D object oriented memory structure shown in FIG. 1B. In this block diagram, the n-dimensional object memory 102 is separated into L slices. Each of the memory slices is a traditional 1-dimensional memory (e.g., SRAM). The data width of each slice is the minimal element size of the object. In video, this size is a pixel (e.g., 8 bits). In other applications, the bus width of the memory slice can be any size. The goal of the L-slice organization is to allow the access of an n-dimensional block in one cycle (if the data block has L elements), or in a burst of multiple access cycles with L elements each. To achieve this, the major issue is how the n-dimensional block data is allocated into the L slices. There are two criteria for the data allocated to each slice: [0037]
The data elements belonging to the same block should be evenly allocated into the L slices such that the L data elements in the block can be accessed simultaneously without conflict. [0038]
If the number of slices L is less than the number of data elements in a block, say B=L*M, where B is the number of elements in a block, then multiple elements (M) of a block reside in the same slice. In one embodiment, the M data elements are put in a contiguous range on a slice to enable a single burst of block access. [0039]
One application of the invention is illustrated in FIG. 3. In FIG. 3, a 2-dimensional 3×4 block with L=12 illustrates the allocation of data. A memory slice ID 302, a row access 304, and a random block access 306 are shown. In this way, any 3×4 block in the frame can be accessed in a single cycle. [0040]
Another application of the invention with L=6 and M=2 is illustrated in FIG. 4. A memory slice ID 302, a row access 304, and a random block access 306 are shown. In this case, any 3×4 block contains two elements for each memory slice ID 302. That is, the 3×4 block can be accessed in two clock cycles. In addition, as observed in FIG. 3 and FIG. 4, any L pixels in a row access 304 can be accessed in one clock cycle, because no memory slice is duplicated in the set of row pixels. [0041]
Once the data allocation is done properly, the address translation and data multiplexing control in FIG. 2 is designed to reflect the allocation pattern. Note that in one embodiment of the invention, the number of dimensions n, the block size in each dimension, and the number of memory slices L can all be parameterized to fit any specific application. [0042]
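One allocation that satisfies both criteria for the 3×4 examples can be sketched as follows. This is an illustrative skewed mapping and is not necessarily the pattern drawn in FIG. 3 and FIG. 4. The function names are hypothetical, the block is assumed to be 4 pixels wide and 3 rows high, and L is assumed to be a multiple of the block height.

```python
from collections import Counter


def slice_id(x, y, num_slices, block_h):
    """Assumed skewed allocation: shift each row by num_slices // block_h slices."""
    skew = num_slices // block_h
    return (x + skew * y) % num_slices


def block_slice_counts(x0, y0, block_w, block_h, num_slices):
    """Count how many elements of the block at (x0, y0) land in each slice."""
    return Counter(
        slice_id(x, y, num_slices, block_h)
        for y in range(y0, y0 + block_h)
        for x in range(x0, x0 + block_w)
    )


def block_access_cycles(x0, y0, block_w, block_h, num_slices):
    """Cycles to read the block when every slice delivers one element per cycle."""
    return max(block_slice_counts(x0, y0, block_w, block_h, num_slices).values())
```

Under this mapping, a 4-wide by 3-high block at any position touches each of the 12 slices once when L=12 (M=B/L=1 cycle) and each of the 6 slices twice when L=6 (M=2 cycles), and any L consecutive pixels in a row fall in L different slices.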
Multi-Level N-dimensional Signal Storage Memory [0043]
The video ME algorithm has the following unique set of requirements that differentiate it from a non-real-time CPU system. [0044]
1. Large Capacity [0045]
2. Large Bandwidth [0046]
3. Random Access of 2-dimensional data elements [0047]
4. Low Cost [0048]
Among these requirements, the second and third can be met by the memory mechanism described previously. However, the large capacity and low cost requirements are not met if the n-dimensional storage mechanism is used alone. Furthermore, a large slice number L provides large access bandwidth while increasing the cost at the same time. [0049]
A conventional multi-level cache memory hierarchy can be applied to the n-dimensional memory very well. Note that the high speed and cost of the n-dimensional store make it most suitable as the innermost level of a multi-level memory hierarchy, closest to the processing engine. [0050]
A 2-level memory embodiment for the n-dimensional store was previously shown in FIG. 2. In this mechanism, the data is first read from the second level memory 103 (in this embodiment, an SDRAM) and stored in the on-chip n-dimensional store. Once the data is in the n-dimensional store, it can be accessed flexibly and reused many times. In this way, the demands on the bandwidth and access pattern flexibility of the external SDRAM 103 are reduced. [0051]
When an SDRAM is used as the second level of memory in a 2-level n-dimensional store, some elaboration on the use of the SDRAM is needed to support the n-dimensional data structure and overcome the SDRAM architecture limitations. Due to the architecture of an SDRAM design, there is overhead associated with SDRAM access. Typically, an SDRAM access involves the following steps, each with various delays which incur overhead between bursts: [0052]
Pre-Charge of a previously accessed memory bank [0053]
Sending a RAS command. [0054]
Sending a CAS command [0055]
Without a proper arrangement of the pixel data, the overhead between burst accesses can be very high. On the other hand, the SDRAM is organized into multiple banks, which allows commands to be issued to, and pre-charge to be performed on, each bank independently. With a proper organization of the pixel data of a frame, the SDRAM access overhead can be minimized. To do this, the frame buffer data allocation pattern is fixed in the SDRAM as illustrated in FIG. 5. A frame buffer 501 is first pre-partitioned into blocks of a fixed size (16×16, 8×8, or other fixed size) with each block allocated into one bank of SDRAM memory. The example in FIG. 5 shows 8×8 blocks 504 and a horizontal row of pixels 502. The blocks 504 are aligned to the fixed grid pattern as explained in 132 of FIG. 1C. These blocks 504 are arranged sequentially into the sequential bank ID 508 as shown in FIG. 5. Within one block, the 8×8 pixel data 610 are arranged in a zigzag order 608 shown in FIG. 6. [0056]
With this arrangement, the following access patterns to the SDRAM are done with zero overhead: [0057]
Block Burst: The whole block is arranged contiguously within a bank. Therefore the access of the whole block is done with one single burst. [0058]
Sequential Blocks Burst: Burst access of multiple blocks in the raster scan order (as shown in FIG. 5) is achieved with multiple bursts. Since each block is allocated into a different bank, these burst commands are pipelined such that there is no overhead. [0059]
Row Access: A row of pixels in the same line can be accessed with multiple bursts. Again, the multiple bursts belong to different banks, so pipelining across bursts is possible. Whether there is zero overhead depends on the length of the burst within one block and on the CAS and RAS delays of the SDRAM. [0060]
Even though the external SDRAM supports only a very limited set of access patterns, the multi-level N-dimensional store using the SDRAM as the second or higher level of memory allows flexible access to the data, once the data is read from the SDRAM to the n-dimensional store. [0061]
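The block-to-bank allocation described above can be sketched as a simple address calculation. In the following Python sketch, the function name, the choice of 4 banks, and the way blocks are stacked inside a bank are assumptions made for illustration. The pixel order inside a block is plain row-major order, used here only as a stand-in for the order shown in FIG. 6, which is not reproduced.

```python
def sdram_location(x, y, frame_w, block=8, num_banks=4):
    """Map pixel (x, y) to (bank, address within bank) for a block-per-bank layout."""
    blocks_per_row = frame_w // block
    # Blocks are numbered in raster scan order over the fixed grid.
    block_index = (y // block) * blocks_per_row + (x // block)
    # Consecutive blocks go to consecutive banks, so their bursts can be pipelined.
    bank = block_index % num_banks
    # Each block occupies one contiguous range inside its bank (assumed stacking).
    slot = block_index // num_banks
    # Row-major order inside the block: an assumed stand-in for the FIG. 6 order.
    offset = (y % block) * block + (x % block)
    return bank, slot * block * block + offset
```

With this layout, all 64 pixels of one grid-aligned 8×8 block share a bank and a contiguous address range (one burst), while horizontally adjacent blocks, and therefore the segments of a pixel row, fall in different banks.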
Parallel Spiral Pattern (PSP) Array Processors for ME Search [0062]
FIG. 7 shows an array processor implementation of an NDPE in FIG. 1B, in accordance with one embodiment of the invention. The array structure reduces the reference bandwidth needed in an ME algorithm by using a parallel spiral search pattern and array processors. This approach allows multiple processors to share the same data output from the reference buffer. Here the N-D object data (in this case, a Macro Block, or MB) is fetched into the current MB register 701. The 9 processors, J(0,0)-J(2,2), also receive inputs through temp registers 704, 706, and 708, and can process the search objective function evaluation concurrently with the reference frame data coming from the high speed data bus (NDDB) 106. The outputs of the 9 processors, J(0,0)-J(2,2), are received by a motion vector decision making block 710, which provides motion vectors 712 to the high speed data bus (NDDB) 106. [0063]
This embodiment of the invention exploits the fixed search/access pattern nature of the ME algorithm. One way to share the 2-level memory output is to pre-specify the search pattern in the ME algorithm such that multiple search points are evaluated in parallel. ME implementations traditionally use a variety of search algorithms. One embodiment uses a spiral search that follows a pre-specified search trace until it eventually finds the best search point. [0064]
FIG. 8 illustrates a spiral search with a step size of 4 pixels, in accordance with one embodiment of the invention. To allow parallel searching with a fixed memory access pattern, this embodiment of the invention uses a search pattern 804 called a “Parallel Spiral Search” to search pixels 802. [0065]
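A pre-specified spiral search trace with a fixed step size can be generated as in the following Python sketch. The square spiral shape, the starting direction, and the function name are assumptions made for illustration; the exact trace of FIG. 8 is not reproduced here.

```python
def spiral_offsets(num_points, step=4):
    """Return the first num_points displacements of an outward square spiral."""
    x = y = 0
    dx, dy = 1, 0                      # assumed starting direction
    seg_len, seg_done, turns = 1, 0, 0
    trace = []
    for _ in range(num_points):
        trace.append((x * step, y * step))
        x, y = x + dx, y + dy
        seg_done += 1
        if seg_done == seg_len:
            # End of a straight segment: turn 90 degrees.
            seg_done = 0
            dx, dy = -dy, dx
            turns += 1
            if turns % 2 == 0:
                # Segments get one unit longer after every second turn.
                seg_len += 1
    return trace
```

Because the trace is fixed in advance, the reference data needed for upcoming search points is known before the search reaches them, which is what makes pre-loading and sharing of reference data possible.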
FIG. 9 shows an example of the parallel spiral pattern 902 with P search points in parallel, with P=9 in this example to search pixels 802. With the P search points processing in parallel in a fixed pattern, e.g., a 3×3 grid pattern, the input data can be further analyzed to enhance the sharing and reduce the memory bandwidth usage. [0066]
One embodiment of this concept is shown in FIG. 10. Each of the search points in FIG. 10 specifies the location where a cost function evaluation is to be performed. In this case, the cost function is assumed to be based on a 16×16 size block. The search-points 1, 2 and 3 share 16 pixels out of the 24 pixels input in each row of pixels 802. In this way, when the first row is read from the reference buffer, it is shared by all three search-points 1, 2, and 3. Starting from row 5, the data is shared by search-points 1, 2, 3, 4, 5, and 6. Starting from the ninth row, the data is shared by all nine search-points, 1-9. Since the nine search-points are arranged in a fixed 3×3 grid, the access pattern for the reference buffer is fixed and easily designed to reuse the data when it is read out from the buffer. Note that in this array processing architecture based on the parallel spiral search pattern, the search pattern step-size and the array size in the x and y dimensions are all parameters that can be set to any specific value. [0067]
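The row sharing described above can be modeled in software as follows. This Python sketch assumes a 16×16 block, a 4-pixel spacing between search points (so that three points span 24 pixels), and the sum of absolute error as the cost function; the function names are hypothetical. Each reference row is read once and fed to every search point that needs it, which mirrors the data sharing but not the cycle-level hardware behavior.

```python
def active_row_offsets(r, block=16, step=4):
    """Vertical search offsets (dy) whose 16-row window contains reference row r."""
    return [dy for dy in (0, step, 2 * step) if 0 <= r - dy < block]


def points_sharing_row(r, block=16, step=4):
    """Number of search points in the 3x3 grid that consume reference row r."""
    return 3 * len(active_row_offsets(r, block, step))


def psp_sad_3x3(cur, ref_window, block=16, step=4):
    """Accumulate the cost of 9 search points while reading each reference row once."""
    offsets = (0, step, 2 * step)
    sad = {(dx, dy): 0 for dy in offsets for dx in offsets}
    for r, row in enumerate(ref_window):            # one read of each 24-pixel row
        for dy in active_row_offsets(r, block, step):
            cur_row = cur[r - dy]
            for dx in offsets:                      # three horizontal points share the row
                sad[(dx, dy)] += sum(
                    abs(cur_row[i] - row[dx + i]) for i in range(block)
                )
    return sad
```

With 0-based row indices, `points_sharing_row` gives 3 for rows 0 to 3, 6 for rows 4 to 7, and 9 for rows 8 to 15, matching the sharing by three, six, and nine search-points described for the first, fifth, and ninth rows.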
PSP Array Processors with N-dimensional Memory Storage for ME Search [0068]
Alternatively, the PSP array processor can also take in a column of data, or a block of data (e.g., 4×4), at a time if an n-dimensional memory is used with the parallel spiral array processor. An embodiment of this combination is shown in FIG. 11. Once again, the search-points 1, 2 and 3 share 16 pixels out of the 24 pixels input in each row of pixels 802. In this way, when the first row is read from the reference buffer, it is shared by all three search-points: 1, 2, and 3. Starting from row 5, the data is shared by search-points: 1, 2, 3, 4, 5, and 6. Starting from the ninth row, the data is shared by all nine search-points: 1, 2, 3, 4, 5, 6, 7, 8, and 9. [0069]
The use of the parallel spiral array processor with the n-dimensional store provides better performance. Without the n-dimensional store, only a row or column of data is read and shared by the array processor. Assuming that the reference buffer has a data width of 16 pixels, providing input data of 16 pixels at a time, consider the case in FIG. 11. If there is no n-dimensional store available, only a row or a column of 16 pixels is read at a time. To access the total of 24 rows of 24 pixels each, 48 cycles are needed (two 16-pixel reads for each of the 24 rows), and the data is shared by 9 processors. In this way, the number of cycles per processor is 48/9=5.33. [0070]
If an n-dimensional store is available to allow access of a 4×4 block in one cycle, a total of 36 cycles is needed. The number of cycles per processor in this case is 36/9=4. Note that without the PSP and array processor, the number of cycles is 16 per processor. The performance improves from 16 to 5.33 cycles for the PSP processor alone, and to 4 cycles for the PSP with the n-dimensional store. [0071]
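The cycle counts quoted above can be reproduced with a short calculation. In this Python sketch the function name and parameters are hypothetical; it assumes that each read delivers a fixed read_w × read_h rectangle of 16 pixels and that the window is tiled by such reads, with partial reads at the edges counted as full cycles.

```python
import math


def read_cycles(window, read_w, read_h, points):
    """Return (total read cycles, cycles per search point) for a square window."""
    reads = math.ceil(window / read_w) * math.ceil(window / read_h)
    return reads, reads / points


# 24x24 window, 16x1 row reads, shared by 9 search points: 48 cycles in total.
row_total, row_per_point = read_cycles(24, 16, 1, 9)
# 24x24 window, 4x4 block reads, shared by 9 search points: 36 cycles in total.
blk_total, blk_per_point = read_cycles(24, 4, 4, 9)
# Single search point: a 16x16 block read as 16 rows of 16 pixels.
single_total, single_per_point = read_cycles(16, 16, 1, 1)
```

The row-based case wastes part of the second read of every row (32 pixels fetched for 24 needed), whereas 4×4 block reads tile the 24×24 window exactly, which accounts for the difference between 48 and 36 cycles.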
In summary, the array processor architecture can be used alone, or with the n-dimensional memory as taught. The usage of the “Parallel Spiral Pattern with Array processor” with the “2-level Memory” enables a more efficient implementation of the ME algorithm, which can search many more points than a traditional single-point spiral search pattern and can therefore achieve much higher compression performance. [0072]
In the description herein, numerous specific details are provided, such as the description of system components and methods, to provide a thorough understanding of embodiments of the invention. One skilled in relevant arts will recognize, however, that the invention can be practiced without one or more of the specific details, or with other systems, methods, components, materials, parts, and the like. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention. [0073]
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. [0074]

Claims

What is claimed is:

1. An object oriented n-dimensional memory system (n is a positive integer) for storing and retrieving n-dimensional signal object in an n-dimensional signal space, comprising:

an n-dimensional object oriented signal object data store (NDOS), which contains at least one fast object oriented encoding and decoding signal data storage memory (NDFS) for high throughput encoding and decoding (N-D) signal object data transfer; and

means for organizing the n-dimensional data into the NDOS, to allow access for certain n-dimensional signal object data operations.

2. The NDOS of claim 1, further comprising:

at least one level of general mass storage memory (NDMS); and

means for data loading at least one NDFS from at least one NDMS to support a high throughput N-D signal object data operation.

3. The system in claim 2, wherein the at least one NDMS includes at least one SDRAM.

4. The system of claim 1, wherein the memory system is adapted for use in storing two-dimensional images.

5. The system of claim 1, wherein the memory system is adapted for use in storing three-dimensional video sequences.

6. An n-dimensional signal processing system to process n-dimensional data inputs, comprising:

an n-dimensional encoding decoding signal processing engine (NDPE);

a plurality of data registers, to store data for the NDPE; and

means for controlling the NDPE to allow at least one n-dimensional signal object to be used efficiently in NDPE.

7. The system of claim 6, further comprising: a motion estimator to explore a plurality of reference frames through a parallel spiral search.

8. The system in claim 6, further comprising: an NDOS to output n-dimensional signal object.

9. The system in claim 8, wherein the NDOS has more than one level.

10. The system in claim 7, wherein the motion estimator evaluates at least one system objective on multiple points in the reference frame(s) using at least one NDPE structure.

11. The system of claim 6, wherein the system is adapted to encoding two-dimensional video sequences.

12. The system of claim 10, wherein the system is adapted to encoding two-dimensional video sequences.

13. The system in claim 10, wherein the n-dimensional encoding decoding signal processing engine (NDPE) is composed of a 3×3 array of processors.

14. A method to operate an n-dimensional object oriented signal object data store (NDOS) for storing and retrieving n-dimensional signal data in an n-dimensional frame sequence, comprising:

storing an n-dimensional signal object data into the NDOS;

organizing the n-dimensional signal data in the NDOS to allow high throughput access;

accessing the data from the NDOS, based on an n-dimensional object request from at least one processor; and

providing the data from the NDOS to at least one outside processing module requesting the n-dimensional object data.

15. A method to operate a two-dimensional memory system for storing and retrieving two-dimensional data in a two-dimensional frame, comprising:

storing two-dimensional signal object data into an n-dimensional object oriented signal data object store (NDOS);

organizing the two-dimensional data in the NDOS to allow high throughput access;

accessing the two-dimensional data from the NDOS based on a two-dimensional object request from at least one processor; and

providing data from the NDOS to at least one outside processing module requesting the two-dimensional object data.