Image processing circuit with block accessible buffer memory
The invention relates to an image processing circuit and to a method of processing an image.
It is known to provide an image processing system with a buffer memory to provide fast access to storing pixel values for a rectangular window of pixel locations from a larger image. Popular image processing tasks, such as the computation of a DCT (Discrete Cosine Transform), two-dimensional filtering and matching between nearby blocks of pixels require repeated execution of the same operation, each time applied to pixel values for a different window of pixel locations. For each execution of the operation the pixel values for the corresponding window are kept in the buffer memory, so that these pixel values can be accessed quickly as part of the operation.
Before the next execution of the operation the content of the buffer memory is updated. Typically, a sliding window is used which moves over a predetermined horizontal distance from one execution of the operation to another. In this case usually only the pixel values for the rightmost column(s) of the image are newly loaded to replace the pixel values of the leftmost column(s).
To support this partial replacement some form of circular addressing is preferably used in the buffer memory. This can be realized for example by means of some form of circular address translation of window-relative X-Y addresses into memory addresses. Thus, for a horizontally moving window, a same memory address is the translation of X-Y addresses with successively decreasing X values. When the X value has been decreased outside the window, a new pixel value is loaded for the memory address and the X value is increased by the window size.
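The circular address translation described above can be sketched as follows. This is a minimal illustration only: the row-by-row memory layout, the `base` counter that records how far the window has moved, and both function names are assumptions that do not appear in the source.

```python
def translate(x, y, base, n):
    """Map a window-relative (x, y) address to a buffer memory address.

    Assumptions (illustrative only): the buffer stores n pixel values per
    row, and `base` counts how many columns the window has moved so far.
    """
    column = (x + base) % n  # circular wrap in the X direction
    return y * n + column


def slide(base, n):
    """Advance the window by one column.

    Returns the new base and the memory column whose contents must be
    replaced by newly loaded pixel values.
    """
    stale_column = base % n  # column that held the old leftmost pixels
    return base + 1, stale_column
```

With this translation, the memory address of window-relative X=1 before a move equals the memory address of X=0 after the move, so only the stale column needs to be reloaded.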
Of course, other solutions can be used to support partial replacement. As an alternative, old pixel values can be moved to shifted memory locations between each execution of the operation for different windows and new pixel values can be loaded into vacated memory locations. That is, shift registers are effectively used for respective lines of the image. In this case, a fixed translation of X-Y addresses into memory locations can be used to address locations in the shift registers.
Advantageously, a buffer memory for image processing provides for parallel access to a plurality of pixel values, for example in parallel to pixel values for a horizontal line of pixel locations in the window. In this way parallel processor circuits can be used, each for processing one or more pixel values for a respective pixel location or group of pixel locations in parallel with the other processing circuits.
The size and organization of the buffer memory are typically chosen dependent on the operation that must be performed. If an operation requires blocks of pixel values for 16x16 pixel locations in the X and Y direction, a buffer memory of 256 (=16x16) memory locations is typically used and the X-Y addresses of the operation preferably contain two four-bit parts for addressing X and Y locations relative to the window respectively.
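An address made of two four-bit parts can be illustrated with a short sketch. Placing the Y part in the upper four bits is an assumption made here for illustration; the source does not specify the bit order, and the function names are hypothetical.

```python
def pack_address(x, y):
    """Combine two 4-bit window-relative coordinates into one 8-bit address.

    Assumption: Y occupies the upper four bits, X the lower four bits.
    """
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("coordinates must fit in four bits")
    return (y << 4) | x


def unpack_address(address):
    """Split an 8-bit address back into its (x, y) parts."""
    return address & 0xF, address >> 4
```

For a 16x16 window every one of the 256 memory locations then has exactly one such address.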
However, not all processing tasks require pixel values for a window of the same size. Some operations require data for windows of 8x8 pixel locations; others require data for windows of 16x16 pixel locations or for 9x9 or 17x17 pixel locations etc. For fast processing it is desirable that the buffer memory has sufficient locations to store all pixel values for the largest required window.
In this case, when an operation is executed that requires a smaller window, only part of the pixel values will be used. When the buffer memory supports parallel access to pixel values for a line of pixel locations, only part of the parallel access capacity will be used. When a plurality of parallel processor circuits is provided that corresponds to the largest possible window size, only part of these processor circuits will be used.
Among others it is an object of the invention to increase the utilization of processing resources in an image processing system that provides for parallel access to a plurality of memory locations in a buffer memory for storing pixel values for an at least two-dimensional window of pixel locations. The invention provides for an image processing circuit according to Claim 1.
The invention provides for use of blocks of different dimensions during access to a buffer memory. The term "block" will be used herein to refer to collections of pixel locations within a window of pixel locations. Typically the pixel values for locations of the block are accessed in parallel. Respective shift circuits are provided for respective functional rows of the buffer memory to change the access ports through which the pixel values of the rows can be accessed independently of one another. Amounts of shift for different rows are selected dependent on a mode signal that indicates the dimensions of a block that is accessed. The amounts of shift used by the circuit have values that meet the condition that pixel values for respective lines of pixel locations in the block that are stored in different ones of the rows can
be accessed in parallel. Thus for example, if a block with lines of N/m pixels wide is used and N access ports are available N pixels can be accessed in parallel when access to pixel values in different rows is shifted over integer multiples of N/m relative to one another. The functional rows are preferably implemented as geometrical rows in a memory matrix on an integrated circuit, but the invention is not limited to functional rows that are also geometrical rows.
Preferably, shifting is realized by implementing the functional rows of the buffer memory each as a circular shift register, pixel values being transferred along the shift register to implement access via different access ports. A simple serially connected chain of registers may be used. Alternatively multiplexers may be used between the registers in the shift register to provide for selectable shifting steps to speed up shifting. In another embodiment non-circular shift registers may be used, but in this case typically larger shift registers are needed, which are wider than a row of the window, to preserve pixel values that are shifted "out of view".
These and other objects and advantageous aspects of the invention will be described using non-limitative examples of embodiments.
Figure 1 shows an image processing circuit
Figures 2a-b illustrate relations between pixel locations and parallel output of pixels
Figure 3 shows a buffer memory
Figure 3a shows a shift control part
Figure 3b shows a column addressing part
Figure 4 shows a shift register
Figure 5 shows a buffer memory
Figure 6 shows a further shift control part
Figures 7a-b show fractional storage of image lines
Figure 1 shows an image processing circuit that comprises a main memory 10, a memory interface 12, a buffer memory 14, an addressing circuit 15, a plurality of processing circuits 16 and a control circuit 18. Buffer memory 14 is coupled to main memory 10 via interface 12. Buffer memory 14 has parallel data access ports 17 coupled to respective
ones of processing circuits 16. Control circuit 18 has an instruction output coupled to processing circuits 16, an address/control output coupled to addressing circuit 15 and a control output coupled to interface 12. Addressing circuit 15 has outputs coupled to buffer memory 14. Although three access ports 17 and processing circuits 16 have been shown, it should be understood that any number may be provided in parallel, for example eight or sixteen access ports 17 and processing circuits 16.
In an embodiment control circuit 18 comprises an instruction memory for storing a program of instructions and a program counter to address these instructions. The instructions include instructions that comprise a command part and an address part. In this embodiment control circuit 18 is arranged to feed the address part of the issued instructions to addressing circuit 15 and the command part of the instructions to processing circuits 16. Furthermore the program counter is coupled to interface circuit 12, for triggering a transfer between main memory 10 and buffer memory 14 at a predetermined position in a cycle of instructions. In this embodiment processing circuits 16 are SIMD processing circuits, arranged to receive the same instruction from control circuit 18. However, it should be realized that the invention is not limited to this embodiment. Other forms of providing an address to addressing circuit 15 and controlling operation of processing circuits 16 may be used. Furthermore, although each processing circuit 16 is shown connected to one access port 17, it should be appreciated that in a further embodiment each processing circuit 16 may be connected to a plurality of access ports 17, e.g. to those of its neighbors as well or even to those of the neighbors of its neighbors etc. When the access ports are read ports this allows more complicated operations to be performed in each processing circuit 16. In another embodiment sets of processing circuits 16 may be replaced by respective larger processing circuits each with inputs coupled to a plurality of access ports 17 to perform more complicated operations.
In operation control circuit 18 supplies instructions to processing circuits 16 and window-relative addresses and mode signals to addressing circuit 15. Typically control circuit 18 supplies at most one window-relative address to address pixel values for all processing circuits 16. Addressing circuit 15 translates a combination of each window-relative address and the mode signals into control signals for buffer memory 14.
Buffer memory 14 may be arranged to support both reading and writing from processing circuits 16, or only reading or only writing. When both reading and writing are possible control circuit 18 also sends read/write control signals to addressing circuit 15, in association with window-relative addresses. For the sake of illustration a read operation will
be discussed. In response to the address for the read operation buffer memory 14 retrieves pixel values for a plurality of pixel locations from buffer memory 14 and outputs these pixel values in parallel at ports 17.
Buffer memory 14 and addressing circuit 15 support control of the relation between the pixel locations of the pixel values that have been loaded from main memory 10 and the combination in which these pixel values will be output in parallel during reading.
Figures 2a-b illustrate the relation between the pixel locations and the combination of pixel values that will be output during a first and second mode of operation respectively for an example where there are four access ports 17 and processing circuits 16 in parallel. In the first mode pixel values for one line segment (one Y address) of four successive pixel locations (four X addresses) are output in parallel. In this first mode the window-relative address from control circuit 18 corresponds for example to the window-relative XY address of the leftmost pixel location of the line segment. In the second mode pixel values for two successive line segments (two Y addresses) of two pixel locations each (two X addresses) are output in parallel. In this second mode the window-relative address from control circuit 18 corresponds for example to the window-relative XY address of the leftmost pixel location of the upper line segment.
The first mode can be used for example to process pixel values for blocks that are four pixel locations wide. In this mode, during execution of an instruction, each processing circuit 16 accesses a pixel value for a respective pixel location along a line segment in the block in parallel. The second mode can be used for example to process pixel values for blocks that are two pixel locations wide. In this mode, during execution of an instruction, processing circuits 16 are divided into two groups of two processing circuits 16 each. A first group of two processing circuits 16 accesses pixel values for a first line segment during execution of the instruction as a result of a window-relative address that control circuit 18 applies to addressing circuit 15. A second group of processing circuits 16 accesses pixel values for a second line segment during execution of the same instruction and as a result of the same application of the window-relative address. Thus, all processing circuits 16 can be utilized for both block widths. It is not necessary to leave part of access ports 17 and processing circuits 16 unused if the number of pixel locations along the width of the block is less than the number of access ports 17 and processing circuits 16.
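The two modes can be illustrated by listing which window-relative pixel locations are accessed together for a given block width. This is a sketch under stated assumptions: the function name is hypothetical, the block width is assumed to divide the number of access ports, and the list is given in block raster order rather than in access port order, because the port assignment depends on the shifting described for the buffer memory.

```python
def accessed_locations(x, y, width, ports=4):
    """List the window-relative (x, y) locations accessed in parallel.

    `width` is the block width selected by the mode signal. The result is
    in block raster order, not necessarily in access port order.
    """
    if width <= 0 or ports % width != 0:
        raise ValueError("width is assumed to divide the number of ports")
    return [(x + p % width, y + p // width) for p in range(ports)]
```

For four ports, `accessed_locations(0, 0, 4)` gives one line segment of four locations, and `accessed_locations(0, 0, 2)` gives two line segments of two locations each, so all four ports are used in both modes.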
Figure 3 illustrates an embodiment of buffer memory 14 and addressing circuit 15. For the sake of example a small buffer memory 14 with storage space for pixel values for 4x4 pixel locations is shown, but it should be appreciated that in practice larger buffer
memories may be used. In this embodiment buffer memory 14 comprises a plurality of circular shift register circuits 30. Multiplexing circuits 32 are provided, organized in groups that each correspond to a respective access port 17. Each group of multiplexing circuits 32 is coupled between outputs of respective ones of the shift registers 30 and its corresponding access port 17.
Addressing circuit 15 has an input X, Y for receiving an X, Y address and an input M for receiving a mode signal. Addressing circuit 15 has separate control outputs coupled to shift control inputs of respective ones of circular shift registers 30. Furthermore addressing circuit 15 has separate control outputs coupled to the multiplexing circuits for respective access ports 17.
Addressing circuit 15 is arranged to cause shifts in the shift registers 30 dependent on the mode signal from mode signal input M, so that the shifts in different shift registers may be mutually different. The mode signal indicates the width W of the block from which pixel values are needed. The pixel values in successive shift registers 30 are shifted over an optional common amount of shift C plus respective integer multiples of W, modulo N, where N is the number of pixel values stored in each shift register 30 (the width of the window that is stored in buffer memory). In one embodiment:
shift for ith row = C + i*W modulo N
The common amount of shift C, which is the same for all shift registers 30, may be zero (C=0) for example. Thus for the example of a 4x4 buffer memory N=4 and for a mode signal that indicates a block of width W=2, the pixel values in successive shift registers 30 are shifted alternately over 0 and 2 positions.
Figure 3a illustrates an embodiment of a shift control part of addressing circuit 15, comprising a lookup table circuit 34 with inputs that receive the mode signal "M" and outputs for shift control signals for the respective rows of buffer memory 14 (not shown). A lookup table (LUT) circuit 34 can be implemented as a memory that uses the mode signal M as address and stores sets of control signals at addresses for respective mode signal values, for example according to the preceding relation. Alternatively a logic circuit may be used that produces the same input-output relation, or arithmetic circuits to perform computation of the control signals.
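The relation between mode signal and row shifts can be sketched as a small table. This is an illustration only: the encoding of the mode signal as an index into a list of block widths and the function names are assumptions, not part of the source.

```python
def row_shifts(width, n_columns, n_rows, common=0):
    """Shift for the i-th row: (C + i*W) modulo N, for every row i."""
    return [(common + i * width) % n_columns for i in range(n_rows)]


def build_shift_lut(widths, n_columns, n_rows):
    """Build a table that maps a mode signal value to per-row shifts.

    Assumption: mode value m simply selects widths[m].
    """
    return {
        mode: row_shifts(width, n_columns, n_rows)
        for mode, width in enumerate(widths)
    }
```

For the 4x4 example, `row_shifts(2, 4, 4)` yields `[0, 2, 0, 2]`, which is the alternating pattern of 0 and 2 positions mentioned for W=2.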
Multiplexing circuits 32 may be implemented as switches that connect the outputs of selected ones of shift registers 30 to access ports 17. Addressing circuit 15 is
arranged to use a window-relative XY address signal from address signal input X, Y to generate control signals for multiplexing circuits 32. The control signals are generated so that the outputs of multiplexing circuits that contain pixels from the selected block are coupled to access ports 17. For each access port 17 one multiplexing circuit 32 is selected. It is possible to do so because a plurality of rows of buffer memory has been mutually shifted. For each relevant row of buffer memory W outputs of the shift register 30 for that row are output.
For example, in the embodiment wherein the ith row is shifted over i*W modulo N, if the window-relative address is X, Y addressing circuit 15 selects W columns in the Yth row of buffer memory 14, starting from column number X+Y*W modulo N, wrapping around to the zeroth column if necessary. If W is smaller than N, addressing circuit 15 also selects W columns in the (Y+1)st row, from column number X+(Y+1)*W modulo N, wrapping around to the zeroth column if necessary. Preferably, addressing circuit 15, if possible, selects even more rows: the (Y+j)th row, j=0, 1, ..., at W positions starting from column number X+(Y+j)*W modulo N and wrapping around to the zeroth column if necessary. In this case the j values should satisfy j=0..jmax-1, with jmax*W smaller than or equal to N.
Figure 3b illustrates a column addressing part for generating the addresses for different columns. Lookup-table (LUT) circuits 36 are provided that output required address offsets for successive access ports 17. The mode signal M, the X address and (part of) the Y address are used as inputs to the LUT circuits 36. The LUT circuit is arranged to generate offset values for respective ones of the columns as a function of mode signals M and window-relative X, Y address values. Adder circuits 38 are provided to add the Y address to the outputs of the LUT circuit 36. It will be appreciated that many implementations of addressing circuit 15 are possible to realize addressing. Adder circuits 38 may be integrated with LUT circuit 36 and/or with decoding circuits to generate selection signals for individual multiplexing circuits 32. In another embodiment, a computation circuit is provided for each access port 17, to compute the row of buffer memory that should be coupled to that access port 17.
In the embodiment of the example given above the column addressing part effectively computes integer row addresses Y+j that satisfy the conditions
P=X+k+(Y+j)*W modulo N
where P is the port number, k between 0 and W-1 enumerates the different pixel values along a row in a block and j is between 0 and jmax-1.
Thus, for the example of a 4x4 buffer memory 14 (N=4) for a two pixel location wide block W=2, if the window relative X Y address is X=2, Y=2 addressing circuit 15 will control the multiplexing circuits 32 of the second row to couple the shift registers 30 of that row to the zeroth and first access port (P=0 and 1 for k=0 and 1 respectively, corresponding to j=0). Furthermore, addressing circuit 15 will control the multiplexing circuits 32 of the third row to couple the shift registers 30 of that row to the second and third access port (P=2 and 3 for k=0 and 1 respectively, corresponding to j=1).
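The condition P=X+k+(Y+j)*W modulo N can be evaluated directly to see which row feeds each access port. The sketch below is an illustration under assumptions: rows and columns are counted from zero, the common shift C is zero, the block is assumed to lie inside the window, and the function name is hypothetical.

```python
def port_map(x, y, width, n):
    """Return {port: (row, column)} for the block addressed at (x, y).

    Assumes row i has been shifted over i*width modulo n, so that the
    pixel value of column x+k in row y+j appears at port
    (x + k + (y + j)*width) modulo n.
    """
    jmax = n // width  # largest number of rows with jmax*width <= n
    mapping = {}
    for j in range(jmax):
        row = y + j
        for k in range(width):
            port = (x + k + row * width) % n
            mapping[port] = (row, x + k)
    return mapping
```

Because k+j*W takes jmax*W distinct values below N, no two selected pixel values compete for the same port. For instance, with N=4 and W=2 a block addressed at X=0, Y=0 occupies ports 0 and 1 from row 0 and ports 2 and 3 from row 1.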
In this embodiment addressing circuit 15 controls the amount of shift in each row dependent only on the width W of the block. This may complicate address computation. Moreover, it has the effect that the access port that provides access to the pixel value for the upper left pixel location of the block depends on the X and Y address of the block. The pixel value for this location is output to the (X+Y*W modulo N)th one of access ports 17. In a further embodiment a shift circuit (not shown) is coupled between access ports 17 and multiplexing circuits 32 to shift the pixel values circularly over X+Y*W positions so that predetermined access ports 17 provide access to pixel values for predetermined pixel locations relative to the upper left corner of the block. Thus, processing circuits 16 can be adapted to the pixel values.
In alternative embodiments the dependence on the Y address is removed by adapting the amount of shift in the shift registers to the Y address and/or to the X address. LUT circuit 34 may be provided with an input for Y address signals and/or X address signals to realize this for example. In this case addressing circuit 15 may be arranged to shift the pixel values in the ith shift register 30 over (i-Y)*W-X modulo N, where X and Y are the window-relative X and Y address of the upper left pixel location of the block with which buffer memory 14 is addressed. Thus, at the access ports 17, the pixel locations that correspond to predetermined pixel locations within an addressed block correspond to predetermined access ports 17. In this case no further shift circuit is needed to realize this at access ports 17. In alternative embodiments addressing circuit 15 may control the shift of the ith shift register 30 according to (i-Y)*W modulo N. In this case the addresses of the columns can be determined from the Y address plus a Y address independent offset. In another embodiment only -X modulo N is added so that a dependence of access ports 17 on only Y position in the block is realized.
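The effect of the address-dependent shift (i-Y)*W-X modulo N can be checked with a short sketch. It assumes that a positive shift moves a pixel value toward higher port numbers and that rows and columns are counted from zero; the function names are hypothetical.

```python
def address_dependent_shift(i, x, y, width, n):
    """Shift for row i when the block is addressed at (x, y)."""
    return ((i - y) * width - x) % n


def port_of(k, j, x, y, width, n):
    """Port at which the pixel k columns and j lines from the upper left
    corner of the block appears after the address-dependent shift."""
    row = y + j
    column = x + k
    return (column + address_dependent_shift(row, x, y, width, n)) % n
```

The X and Y terms cancel, so the port reduces to (k + j*W) modulo N whatever the block address is, which is why no further shift circuit is needed at the access ports in this embodiment.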
In one embodiment addressing circuit 15 computes decoded selection signals and outputs different selection signals to individual multiplexing circuits 32 to provide on/off control over access ports 17. Alternatively, addressing circuit 15 may supply coded address
signals to multiplexing circuits 32 of a column in common, in which case multiplexing circuits 32 should be arranged to do the decoding as well.
Shifting may be controlled by enabling each shift register 30 to shift data for a controlled number of shift cycles. In one embodiment addressing circuit enables the ith shift register 30 for i*W modulo N shift cycles.
Figure 4 shows an alternative shift register circuit 40 that may be used in a row of buffer memory 14. Shift register circuit 40 comprises a circular series of registers 42 and multiplexers 44 inserted between registers 42. Each multiplexer 44 has a first input coupled to an output of a preceding register 42 and a second input coupled to the output of a register 42 a predetermined number of positions further back along the series. Addressing circuit 15 (not shown) controls multiplexers 44 and registers 42. In a shift step multiplexers 44 can be used to select between shifting by one position in shift register 40 or over a plurality of positions in a single shift cycle. Thus, the number of shift cycles that is needed to realize a large amount of shift can be reduced by controlling multiplexers 44 for selected shift registers 40 to pass pixel values from said predetermined number of positions back along the series.
It should be realized that a further reduction of the number of shift cycles can be realized by using multiplexing circuits 44 with more selectable inputs coupled to outputs of registers 42 at respective predetermined distances along the cyclical series of registers 42. In the most extreme form each multiplexer 44 has selectable inputs coupled to outputs of each of the registers 42 from which pixel values may be shifted. In this case addressing circuit 15 controls the multiplexers 44 in each row of buffer memory to pass pixel values from registers 42 at a required distance along the cyclical series (e.g. from a distance i*W, i being the row number of the shift register 40 in buffer memory 14). As a result a single shift cycle suffices to shift data along each shift register 40. In a further embodiment only a limited number of shift amounts is allowed (for example according to the different block sizes) and selectable inputs of multiplexers 44 for all these amounts are provided, but not for other amounts.
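The saving in shift cycles obtained from additional multiplexer inputs can be estimated with a simple count. The sketch below assumes a greedy policy that always takes the largest available step first; this policy and the function name are assumptions for illustration, and the count is not claimed to be minimal for arbitrary step sets.

```python
def shift_cycles(amount, steps=(1,)):
    """Count shift cycles needed to shift over `amount` positions.

    `steps` lists the selectable shift distances per cycle and must
    include 1 so that every amount can be reached.
    """
    if 1 not in steps:
        raise ValueError("a step of one position is assumed to be available")
    cycles = 0
    for step in sorted(steps, reverse=True):
        cycles += amount // step  # use the larger step as often as it fits
        amount %= step
    return cycles
```

A shift over 6 positions takes 6 cycles with single-position steps only, 3 cycles when a step of 4 is also selectable, and 2 cycles with steps of 1, 2 and 4.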
In an embodiment control circuit 18 is arranged to supply a fresh mode control signal for each access operation. In this case, each access instruction that is processed by control circuit preferably contains a field for a mode control signal value that control circuit supplies to addressing circuit 15. In an alternative embodiment, the mode control signal may be set for a plurality of access operations, for example until a new mode control signal value is set. In this case a separate instruction may be used to set the mode control signal value for subsequent access instructions. Preferably, in this alternative embodiment the pixel values in
shift registers 30 (or 40) are shifted in response to the instruction for setting the new mode control signal value.
In both embodiments, different types of processing, which use different block sizes and/or shapes, can be executed during the same task without reloading buffer memory to fit the different block sizes.
Figure 5 shows an embodiment wherein the shift registers 30 or 40 are replaced by sets of registers 50 coupled to a shift circuit 52 (e.g. a barrel shifter) that is controlled by addressing circuit 15. The outputs of shift circuit 52 are coupled to access ports 17 by multiplexing circuits 54 (not shown in detail) similar to multiplexing circuits 32. In operation buffer memory 14 of figure 5 performs row dependent shifting by means of circuit switching in shift circuits 52, that is, without transferring data from one register in a shift register chain to another. This may speed up access, but it requires a more complicated and less compact circuit than when shift registers are used.
In one embodiment addressing circuit 15 causes shift registers 30 (or 40) to shift the pixel values for each access operation by the amounts of shift needed for the access operation and back again over the same amount after the access operation.
Figure 6 shows an embodiment of a shift control part of addressing circuit 15 wherein the pixel values are shifted only before an access operation. In this embodiment a register 60 is provided for storing a last used mode control signal and a LUT circuit 34 (or any other circuit with the same input output function) for outputting shift control signals corresponding to the difference, if any, between the existing amount of shift (as indicated by the old mode control signal value) and the newly required amount of shift for the new access operation. After the shift has been controlled the stored mode control signal value is updated. Thus, the number of shift operations can be minimized, which saves power and time. When the amount of shift also depends on X and/or Y addresses, these addresses are preferably stored in register 60 as well and supplied to LUT circuit 34.
In another embodiment addressing circuit 15 is arranged to disable shifting in the shift registers that are not addressed. If this embodiment is combined with the embodiment that uses differential shifting, preferably registers are provided to represent the previous amount of shift for respective rows and the shift is controlled dependent on the change in shift with respect to that previous amount of shift.
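Differential shifting can be sketched as a small state holder that remembers the last block width and returns only the additional shift per row. This is an illustration under assumptions: the class and attribute names are hypothetical, the mode signal is represented directly by the block width W, the common shift C is zero, and the pixel values are assumed to be unshifted initially.

```python
class ShiftControl:
    """Keeps the last used block width, in the role of register 60."""

    def __init__(self, n_columns, n_rows):
        self.n = n_columns
        self.rows = n_rows
        # Assumption: data starts unshifted, which matches W = N.
        self.last_width = n_columns

    def _shifts(self, width):
        return [(i * width) % self.n for i in range(self.rows)]

    def select(self, width):
        """Return the extra shift per row needed for the new width."""
        old = self._shifts(self.last_width)
        new = self._shifts(width)
        self.last_width = width  # update the stored mode after shifting
        return [(b - a) % self.n for a, b in zip(old, new)]
```

Selecting the same width twice in a row returns all zeros the second time, so no shift cycles are spent when the mode does not change.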
Interface circuit 12 supports transfer of pixel values between main memory 10 and buffer memory 14. Any mechanism may be used to control transfer. In one embodiment transfer is linked to operation cycles of processing circuits 16. In this embodiment control
circuit 18 initially sends a signal to interface circuit 12 that triggers interface circuit 12 to load pixel values for a window of pixel locations from main memory 10 into buffer memory 14. Next control circuit 18 starts a processing cycle and upon completion of access during that operation cycle control circuit 18 sends a signal to interface circuit 12 that triggers interface circuit 12 to load an additional column of pixel values for a window of pixel locations from main memory 10 into buffer memory 14. In this case the shift mechanism of buffer memory 14 may be used to realize that the new pixel values can be written in predetermined registers. It should be noted that this only requires the same amount of shift for all rows, not the mode dependent amounts of shift during operation. Although in one embodiment interface circuit 12 may be arranged to transfer pixel values from main memory 10 to buffer memory 14 only, in an alternative embodiment interface circuit 12 may be arranged to transfer pixel values from buffer memory 14 to main memory 10 only (after processing), or to transfer pixel values from main memory 10 to buffer memory 14 before processing and back from buffer memory 14 to main memory 10 after processing.
It should be noted that other forms of transfer between main memory 10 and buffer memory 14 may be used. Instead of transfer linked to operation cycles (triggered for example by a program counter value), transfer in response to an explicit transfer instruction, or as a result of addressing outside buffer memory may be used. Although the invention has been described for an embodiment wherein control circuit 18 addresses pixel values by means of window-relative X Y addresses, an alternative embodiment may use absolute (image-relative) XY addresses. In this alternative embodiment a translation circuit may be provided to translate the absolute addresses into window relative X Y addresses, given an XY address of the window that is stored in buffer memory. Alternatively, the internal addresses (and optionally shift amounts) in buffer memory may be computed directly from the absolute addresses.
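The translation of absolute addresses into window-relative addresses can be sketched as a subtraction of the window position. The function name, the parameter names and the use of an exception to signal an address outside the buffered window are assumptions for illustration.

```python
def to_window_relative(abs_x, abs_y, win_x, win_y, win_w, win_h):
    """Translate an absolute (image-relative) address to a
    window-relative address.

    (win_x, win_y) is the absolute address of the window stored in
    buffer memory; win_w and win_h are the window dimensions.
    """
    rel_x = abs_x - win_x
    rel_y = abs_y - win_y
    if not (0 <= rel_x < win_w and 0 <= rel_y < win_h):
        # Outside the buffered window; a real circuit might instead
        # trigger a transfer from main memory at this point.
        raise LookupError("address outside buffer memory")
    return rel_x, rel_y
```

For a 16x16 window stored at absolute address (8, 4), the absolute address (10, 5) translates to the window-relative address (2, 1).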
The invention has been described by means of examples wherein there is a one to one correspondence between image lines and rows of buffer memory 14. However, alternatively pixel values for more than one image line may be stored in one row or pixel values for one image line may be stored distributed over a plurality of rows of buffer memory 14. Preferably, in this case the number of pixel locations from an image line that is stored in buffer memory 14 is an integer multiple of the number of memory locations in a row, or conversely the number of memory locations in a row is an integer multiple of the number of
pixel locations from an image line that is stored in buffer memory 14. In a further embodiment the circuit is not limited to these integer multiples.
Figures 7a-b show fractional storage of image lines for an example wherein buffer memory has six access ports and rows of six memory locations. The figures show a matrix of rows and columns that correspond to rows (shift registers 30 or 40) and columns (access ports 17) of memory locations in buffer memory 14. Pixel values for line segments of pixel locations in an image are stored in the memory locations. Memory locations that store the pixel values for the leftmost pixel locations on the line segments are indicated by circles. In the case of fractional storage, as shown, these memory locations are not in the same column. The figures also indicate the location of pixel values for line segments of rectangular blocks. Figure 7a illustrates the storage location of pixel values for a 4x2 block of pixel locations with crosses, triangles and squares. Figure 7b similarly illustrates the storage locations of pixel values for a 3x6 block of pixel locations.
Now when blocks of selectable dimensions must be accessed a mode signal is supplied to addressing circuit 15 to indicate the dimensions of the selected block. In this embodiment addressing circuit 15 also receives information about the length of the line segments that are stored in memory, or about the offsets between the starting memory locations of the line segments. In addition, an X, Y address of the block is supplied. Addressing circuit 15 controls shift amounts and addresses to allow pixel values for different line segments to be output on different access ports 17. For this purpose addressing circuit 15 controls shift amounts and addresses as a function of the mode signal and the length of the stored line segments.
The figures indicate pixel locations from respective image lines that will be accessed in parallel on access ports 17 by crosses and triangles respectively. As can be seen in the example of figure 7a four pixel values from one line in the block are output in parallel and two pixels of a next line are accessed. As can be seen in the example of figure 7b three pixel values from one line in the block are output in parallel and three pixels of a next line.
The shift amounts and addresses for this type of access are more complicated than for the case wherein the number of stored pixel locations equals the number of memory locations per row of buffer memory 14. In the case of figure 7a addressing circuit 15 may use shift amounts of zero and two for example. The row addresses generated by addressing circuit 15 for the first two access ports address the second row and the addresses for the third to sixth access port address the first row. In the case of figure 7b addressing circuit 15 may use shift amounts of two, one and one for the first, second and third row for example. The row
addresses generated by addressing circuit 15 for the first three access ports address the first row, the addresses for the following two access ports address the second row and the address for the last access port addresses the third row.
As will be appreciated, the addressing pattern is less regular in this case than for the example where the image lines start at the same position in each row of buffer memory 14. However, it is straightforward to select combinations of shift amounts and addresses that allow parallel access to pixel values from a plurality of lines for any given configuration of block dimensions, block address and line segment length of image line segments in buffer memory 14. A skilled person may select such combinations of shift amounts subject to the constraint that the required pixel locations are shifted to mutually different sets of access ports. The skilled person can then provide LUT circuits or any other convenient circuits that output control signals that effect the selected combination of shift amounts and addresses in response to information about the configuration of block dimensions, block address and line segment lengths. As will be appreciated, this technique can be used to support blocks of any shape, not just rectangular blocks.
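By way of illustration only, the selection of shift amounts and row addresses can be sketched in software as a brute-force search. The sketch below is not the circuit of the embodiment; it assumes that line segments are packed back to back in buffer memory 14, that each row can be circularly shifted by a selectable amount, and that each access port 17 then selects one row. The function names, the segment length of eight and the block address X=2, Y=0 are hypothetical choices made for this example and are not taken from the figures. The resulting table could, for instance, be used to fill a LUT circuit.

```python
from itertools import product

PORTS = 6  # memory locations per row = number of access ports (six, as in the example)

def block_pixels(x0, y0, width, height, seg_len, count=PORTS):
    """Return (row, column) storage positions of the first `count` block pixels.

    Assumed layout: line segments of seg_len pixels are packed back to back,
    so pixel (x, y) is stored at linear location y * seg_len + x.
    """
    positions = []
    for dy in range(height):
        for dx in range(width):
            loc = (y0 + dy) * seg_len + (x0 + dx)
            positions.append(divmod(loc, PORTS))  # (row, column)
    return positions[:count]

def select_shifts(positions):
    """Search for per-row circular shift amounts that move every required
    pixel to its own access port.

    Returns (shift_per_row, row_per_port), or None if no combination exists.
    """
    rows = sorted({r for r, _ in positions})
    for amounts in product(range(PORTS), repeat=len(rows)):
        shift = dict(zip(rows, amounts))
        row_per_port = {}
        for r, c in positions:
            port = (c + shift[r]) % PORTS
            if port in row_per_port:  # two pixels would collide on one port
                break
            row_per_port[port] = r
        else:
            return shift, row_per_port
    return None

# Hypothetical configuration: 4x2 block at X=2, Y=0, line segments of 8 pixels.
shift, row_per_port = select_shifts(block_pixels(2, 0, 4, 2, seg_len=8))
```

Under these assumed parameters the search yields shift amounts of zero and two for the first and second row, with the first two access ports addressing the second row and the remaining four addressing the first row. A hardware implementation would normally precompute such results rather than search at run time.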
Although the invention has been described by means of specific embodiments it should be appreciated that the invention is not limited to these embodiments. For example it should be realized that a larger buffer memory size than 4x4 pixel locations can be used. Nor is the buffer memory 14 limited to a memory for a square window.
Furthermore, although the invention has been described for a buffer memory 14 that contains one pixel value per memory location, it should be realized that more than one pixel value may be stored in each location, for example a group of four pixel values for successive pixel locations in the X direction. In this case this group preferably defines the basic granularity of access, transfer to the access port 17 being controlled in common for a group and the pixel values of the group being accessed together at the same access port 17. Of course this will limit addressing and block sizes to integer multiples of the granularity, but this is acceptable for image processing operations in most applications.
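The restriction to integer multiples of the granularity can be illustrated with a minimal sketch. The group size of four follows the example given above; the function name group_access is hypothetical and serves only to illustrate the translation from pixel units to group units.

```python
GROUP = 4  # assumed group size: pixel values stored per memory location

def group_access(x, width):
    """Translate a pixel-level X address and block width into group units.

    Addresses and widths that are not integer multiples of the group size
    cannot be expressed at this granularity and are rejected.
    """
    if x % GROUP != 0 or width % GROUP != 0:
        raise ValueError("X address and width must be multiples of the group size")
    return x // GROUP, width // GROUP
```

For example, a block of width eight at X address four translates to group address one and a width of two groups, whereas an X address of six is rejected.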
Furthermore, although the use of a circular shift register has been described, it should be appreciated that alternatively a non-circular shift register may be used (that is, a register that does not shift in pixel values at the left when these pixel values are shifted out at the right or vice versa). In this case, the non-circular shift register is preferably wider than a row of the window, with additional locations for receiving pixel values from the window when they are shifted left or right, so that no pixel values from the window are lost when the data is shifted back and forth as required for the different modes. It will be appreciated that a smaller shift register can be used when a circular shift register is available. Also, split presentation of the blocks is possible in this case.
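The effect of the additional locations can be illustrated with a small software model. This is a sketch under the assumption that the maximum left and right shift amounts are known in advance; the class name LinearShiftRegister and its parameters are hypothetical.

```python
class LinearShiftRegister:
    """Model of a non-circular shift register with spare locations.

    max_left and max_right are the assumed largest shifts in each direction;
    the register is widened by that many locations on each side.
    """

    def __init__(self, values, max_left, max_right):
        self.cells = [None] * max_left + list(values) + [None] * max_right

    def shift(self, amount):
        """Shift right for positive amounts and left for negative amounts.

        Values shifted past either end are dropped, and empty locations
        (None) are shifted in at the other end.
        """
        n = len(self.cells)
        shifted = [None] * n
        for i, value in enumerate(self.cells):
            j = i + amount
            if 0 <= j < n:
                shifted[j] = value
        self.cells = shifted

# With two spare locations on each side, shifting back and forth is lossless.
wide = LinearShiftRegister([1, 2, 3, 4], max_left=2, max_right=2)
for amount in (2, -4, 2):
    wide.shift(amount)

# Without spare locations a single shift already drops a pixel value.
narrow = LinearShiftRegister([1, 2, 3, 4], max_left=0, max_right=0)
narrow.shift(1)
narrow.shift(-1)
```

In this model the register width is the window row width plus the largest left and right shift amounts, which is why a circular shift register of only the row width suffices where one is available.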
Addressing circuit 15 may be implemented as part of processing circuits 16, or as part of a larger processing circuit that replaces a set of individual processing circuits. In this way block selection and addressing may be controlled by means of instructions in such a processing circuit. Alternatively, addressing circuit 15 can be integrated with buffer memory 14. Furthermore, part of the function of control circuit 18 may be integrated in buffer memory 14 and/or with addressing circuit 15, for example to detect when new data must be fetched from main memory 10 on the basis of address use and/or information from processing circuits 16 about future addressing.
Although the invention has been explained mainly for embodiments where the access involves reading of pixel values from buffer memory 14, it should be realized that the invention may alternatively be applied to writing only, to reading and writing, or to reading only. This merely affects the direction in which signal flow is supported between the access ports 17 and registers in the shift registers 30, 40 or registers 50. In the case of writing, multiplexing circuits 32, 54 strictly speaking function as demultiplexing circuits, but this makes no difference if these multiplexing circuits are implemented using switched connections.