WO2006120620A2 - Image processing circuit with block accessible buffer memory - Google Patents

Image processing circuit with block accessible buffer memory

Info

Publication number
WO2006120620A2
Authority
WO
WIPO (PCT)
Prior art keywords
shift
pixel values
pixel
buffer memory
circuits
Prior art date
Application number
PCT/IB2006/051411
Other languages
French (fr)
Other versions
WO2006120620A3 (en)
Inventor
Carlos A. Alba Pinto
Ramanathan Sethuraman
Original Assignee
Nxp B.V.
Priority date
Filing date
Publication date
Application filed by Nxp B.V.
Priority to EP06765667A (EP1882235A2)
Priority to JP2008510698A (JP2008541259A)
Publication of WO2006120620A2
Publication of WO2006120620A3


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management

Definitions

  • the invention relates to an image processing circuit and to a method of processing an image.
  • the content of the buffer memory is updated.
  • a sliding window is used which moves over a predetermined horizontal distance from one execution of the operation to another.
  • the pixel values for the rightmost column(s) of the image are newly loaded to replace the pixel values of the leftmost column(s).
  • some form of circular addressing is preferably used in the buffer memory. This can be realized for example by means of some form of circular address translation of window-relative X-Y addresses into memory addresses.
  • a same memory address is the translation of X-Y addresses with successively decreasing X values.
  • a new pixel value is loaded for the memory address and the X value is increased by the window size.
  • a buffer memory for image processing provides for parallel access to a plurality of pixel values, for example in parallel to pixel values for a horizontal line of pixel locations in the window. In this way parallel processor circuits can be used, each for processing one or more pixel values for a respective pixel location or group of pixel locations in parallel with the other processing circuits.
  • It is an object of the invention to increase the utilization of processing resources in an image processing system that provides for parallel access to a plurality of memory locations in a buffer memory for storing pixel values for an at least two-dimensional window of pixel locations.
  • the invention provides for an image processing circuit according to Claim 1.
  • the invention provides for use of blocks of different dimensions during access to a buffer memory.
  • the term "block" will be used herein to refer to collections of pixel locations within a window of pixel locations. Typically the pixel values for locations of the block are accessed in parallel.
  • Respective shift circuits are provided for respective functional rows of the buffer memory to change the access ports through which the pixel values of the rows can be accessed independently of one another. Amounts of shift for different rows are selected dependent on a mode signal that indicates the dimensions of a block that is accessed. The amounts of shift used by the circuit have values that meet the condition that pixel values for respective lines of pixel locations in the block that are stored in different ones of the rows can be accessed in parallel.
  • N pixels can be accessed in parallel when access to pixel values in different rows is shifted over integer multiples of N/m relative to one another.
  • the functional rows are preferably implemented as geometrical rows in a memory matrix on an integrated circuit, but the invention is not limited to functional rows that are also geometrical rows.
  • shifting is realized by implementing the functional rows of the buffer memory each as a circular shift register, pixel values being transferred along the shift register to implement access via different access ports.
  • a simple serially connected chain of registers may be used.
  • multiplexers may be used between the registers in the shift register to provide for selectable shifting steps to speed up shifting.
  • non-circular shift registers may be used, but in this case typically larger shift registers are needed, which are wider than a row of the window, to preserve pixel values that are shifted "out of view"
  • Figure 1 shows an image processing circuit
  • Figures 2a-b illustrate relations between pixel locations and parallel output of pixels
  • Figure 3 shows a buffer memory
  • Figure 3a shows a shift control part
  • Figure 3b shows a column addressing part
  • Figure 4 shows a shift register
  • Figure 5 shows a buffer memory
  • Figure 6 shows a further shift control part
  • Figures 7a-b show fractional storage of image lines
  • Figure 1 shows an image processing circuit that comprises a main memory 10, a memory interface 12, a buffer memory 14, an addressing circuit 15, a plurality of processing circuits 16 and a control circuit 18.
  • Buffer memory 14 is coupled to main memory 10 via interface 12.
  • Buffer memory 14 has parallel data access ports 17 coupled to respective ones of processing circuits 16.
  • Control circuit 18 has an instruction output coupled to processing circuits 16, an address/control output coupled to addressing circuit 15 and a control output coupled to interface 12.
  • Addressing circuit 15 has outputs coupled to buffer memory 14. Although three access ports 17 and processing circuits 16 have been shown, it should be understood that any number may be provided in parallel, for example eight or sixteen access ports 17 and processing circuits 16.
  • control circuit 18 comprises an instruction memory for storing a program of instructions and a program counter to address these instructions.
  • the instructions include instructions that comprise a command part and an address part.
  • control circuit is arranged to feed the address part of the issued instructions to addressing circuit 15 and the command part of the instructions to processing circuits 16.
  • the program counter is coupled to interface circuit 12, for triggering a transfer between main memory 10 and buffer memory 14 at a predetermined position in a cycle of instructions.
  • processing circuits 16 are SIMD processing circuits, arranged to receive the same instruction from control circuit 18.
  • Other forms of providing an address to addressing circuit 15 and controlling operation of processing circuits 16 may be used.
  • Although each processing circuit 16 is shown connected to one access port 17, it should be appreciated that in a further embodiment each processing circuit 16 may be connected to a plurality of access ports, e.g. to those of its neighbors as well, or even to those of the neighbors of its neighbors, etc. When the access ports are read ports this allows more complicated operations to be performed in each processing circuit 16. In another embodiment sets of processing circuits 16 may be replaced by respective larger processing circuits, each with inputs coupled to a plurality of access ports 17 to perform more complicated operations.
  • control circuit 18 supplies instructions to processing circuits 16 and window-relative addresses and mode signals to addressing circuit 15. Typically control circuit 18 supplies at most one window-relative address to address pixel values for all processing circuits 16. Addressing circuit 15 translates a combination of each window-relative address and the mode signals into control signals for buffer memory 14.
  • Buffer memory 14 may be arranged to support both reading and writing from processing circuits 16, or only reading or only writing. When both reading and writing are possible, control circuit 18 also sends read/write control signals to addressing circuit 15, in association with window-relative addresses. For the sake of illustration a read operation will be discussed. In response to the address for the read operation, buffer memory 14 retrieves pixel values for a plurality of pixel locations and outputs these pixel values in parallel at ports 17.
  • Buffer memory 14 and addressing circuit 15 support control of the relation between the pixel locations of the pixel values that have been loaded from main memory 10 and the combination in which these pixel values will be output in parallel during reading.
  • Figures 2a-b illustrate the relation between the pixel locations and the combination of pixel values that will be output during a first and second mode of operation respectively for an example where there are four access ports 17 and processing circuits 16 in parallel.
  • In the first mode, pixel values for one line segment (one Y address) of four successive pixel locations (four X addresses) are output in parallel.
  • the window-relative address from control circuit 18 corresponds for example to the window-relative XY address of the leftmost pixel location of the line segment.
  • In the second mode, pixel values for two successive line segments (two Y addresses) of two pixel locations each (two X addresses) are output in parallel.
  • the window-relative address from control circuit 18 corresponds for example to the window-relative XY address of the leftmost pixel location of the upper line segment.
  • the first mode can be used for example to process pixel values for blocks that are four pixel locations wide.
  • each processing circuit 16 accesses a pixel value for a respective pixel location along a line segment in the block in parallel.
  • the second mode can be used for example to process pixel values for blocks that are two pixel locations wide.
  • processing circuits 16 are divided into two groups of two processing circuits 16 each. A first group of two processing circuits 16 accesses pixel values for a first line segment during execution of the instruction as a result of a window-relative address that control circuit 18 applies to addressing circuit 15.
  • a second group of processing circuits 16 accesses pixel values for a second line segment during execution of the same instruction and as a result of the same application of the window-relative address.
  • all processing circuits 16 can be utilized for both block widths. It is not necessary to leave part of access ports 17 and processing circuits 16 unused if the number of pixel locations along the width of the block is less than the number of access ports 17 and processing circuits 16.
  • FIG. 3 illustrates an embodiment of buffer memory 14 and addressing circuit 15.
  • buffer memory 14 comprises a plurality of circular shift register circuits 30.
  • Multiplexing circuits 32 are provided, organized in groups that each correspond to a respective access port 17. Each group of multiplexing circuit 32 is coupled between outputs of respective ones of the shift registers 30 and its corresponding access port 17.
  • Addressing circuit 15 has an input X, Y for receiving an X, Y address and an input M for receiving a mode signal. Addressing circuit 15 has separate control outputs coupled to shift control inputs of respective ones of circular shift registers 30. Furthermore addressing circuit 15 has separate control outputs coupled to the multiplexing circuits for respective access ports 17.
  • Addressing circuit 15 is arranged to cause shifts in the shift registers 30 dependent on the mode signal from mode signal input M, so that the shifts in different shift registers may be mutually different.
  • the mode signal indicates the width W of the block from which pixel values are needed.
  • the pixel values in successive shift registers 30 are shifted over an optional common amount of shift C plus respective integer multiples of W, modulo N, where N is the number of pixel values stored in each shift register 30 (the width of the window that is stored in buffer memory).
  • the pixel values in successive shift registers 30 are shifted alternately over 0 and 2 positions.
  • Figure 3a illustrates an embodiment of a shift control part of addressing circuit 15.
  • a lookup table (LUT) circuit 34 can be implemented as a memory that uses the mode signal M as address and stores sets of control signals at addresses for respective mode signal values, for example according to the preceding relation. Alternatively a logic circuit may be used that produces the same input-output relation, or arithmetic circuits to perform computation of the control signals.
  • Multiplexing circuits 32 may be implemented as switches that connect the outputs of selected ones of shift registers 30 to access ports 17.
  • Addressing circuit 15 is arranged to use a window-relative XY address signal from address signal input X, Y to generate control signals for multiplexing circuits 32.
  • the control signals are generated so that the outputs of multiplexing circuits that contain pixels from the selected block are coupled to access ports 17.
  • For each access port 17 one multiplexing circuit 32 is selected. It is possible to do so because a plurality of rows of buffer memory has been mutually shifted. For each relevant row of buffer memory W outputs of the shift register 30 for that row are output.
  • Addressing circuit 15 selects W columns in the Y-th row of buffer memory 14, starting from column number (X+Y*W) modulo N, wrapping around to the zeroth column if necessary. If W is smaller than N, addressing circuit 15 also selects W columns in the (Y+1)-th row, starting from column number (X+(Y+1)*W) modulo N, wrapping around to the zeroth column if necessary.
  • Figure 3b illustrates a column addressing part for generating the addresses for different columns.
  • Lookup-table (LUT) circuits 36 are provided that output the required address offsets for successive access ports 17.
  • the mode signal M, the X address and (part of) the Y address are used as inputs to the LUT circuits 36.
  • the LUT circuit is arranged to generate offset values for respective ones of the columns as a function of mode signals M and window-relative X, Y address values.
  • Adder circuits 38 are provided to add the Y address to the outputs of the LUT circuit 36. It will be appreciated that many implementations of addressing circuit 15 are possible to realize addressing.
  • Adder circuits 38 may be integrated with LUT circuit 36 and/or with decoding circuits to generate selection signals for individual multiplexing circuits 32.
  • a computation circuit is provided for each access port 17, to compute the row of buffer memory that should be coupled to that access port 17.
  • the column addressing part effectively computes integer row addresses Y+j that satisfy the conditions
  • addressing circuit 15 controls the amount of shift in each column dependent only on the width W of the block. This may complicate address computation. Moreover, it has the effect that the access port that provides access to the pixel value for the upper left pixel location of the block depends on the X and Y address of the block. This pixel location is output to the (X+Y*W)-th one of access ports 17.
  • a shift circuit (not shown) is coupled between access ports 17 and multiplexing circuits 32 to shift the pixel values circularly over X+Y*W positions so that predetermined access ports 17 provide access to pixel values for predetermined pixel locations relative to the upper left corner of the block.
  • processing circuits 16 can be adapted to the pixel values.
  • the dependence on the Y address is removed by adapting the amount of shift in the shift registers to the Y address and/or to the X address.
  • LUT circuit 34 may be provided with an input for Y address signals and/or X address signals to realize this for example.
  • addressing circuit 15 may be arranged to shift the pixel values in the i-th shift register 30 over ((i-Y)*W-X) modulo N, where X and Y are the window-relative X and Y address of the upper left pixel location of the block with which buffer memory 14 is addressed.
  • addressing circuit 15 may control the shift of the i-th shift register 30 according to ((i-Y)*W) modulo N.
  • the addresses of the columns can be determined from the Y address plus a Y address independent offset.
  • -X modulo N is added so that a dependence of access ports 17 on only Y position in the block is realized.
  • addressing circuit 15 computes decoded selection signals and outputs different selection signals to individual multiplexing circuits 32 to provide on/off control over access ports 17.
  • addressing circuit 15 may supply coded address signals to multiplexing circuits 32 of a column in common, in which case multiplexing circuits 32 should be arranged to do the decoding as well.
  • Shifting may be controlled by enabling each shift register 30 to shift data for a controlled number of shift cycles.
  • addressing circuit 15 enables the i-th shift register 30 for (i*W) modulo N shift cycles.
  • FIG. 4 shows an alternative shift register circuit 40 that may be used in a row of buffer memory 14.
  • Shift register circuit 40 comprises a circular series of registers 42 and multiplexers 44 inserted between registers 42. Each multiplexer 44 has a first input coupled to an output of a preceding register 42 and a second input coupled to the output of a register 42 a predetermined number of positions further back along the series.
  • Addressing circuit 15 (not shown) controls multiplexers 44 and registers 42.
  • In a shift step, multiplexers 44 can be used to select between shifting by one position in shift register 40 or over a plurality of positions in a single shift cycle. Thus, the number of shift cycles that is needed to realize a large amount of shift can be reduced by controlling multiplexers 44 for selected shift registers 40 to pass pixel values from said predetermined number of positions back along the series.
  • each multiplexer 44 has selectable inputs coupled to outputs of each of the registers 42 from which pixel values may be shifted.
  • addressing circuit 15 controls the multiplexers 44 in each row of buffer memory to pass pixel values from registers 42 at a required distance along the cyclical series (e.g. from a distance i*W, i being the row number of the shift register 40 in buffer memory 14).
  • a single shift cycle suffices to shift data along each shift register 40.
  • only a limited number of shift amounts is allowed (for example according to the different block sizes) and selectable inputs of multiplexers 44 for all these amounts are provided, but not for other amounts.
  • control circuit 18 is arranged to supply a fresh mode control signal for each access operation.
  • each access instruction that is processed by control circuit preferably contains a field for a mode control signal value that control circuit supplies to addressing circuit 15.
  • the mode control signal may be set for a plurality of access operations, for example until a new mode control signal value is set.
  • a separate instruction may be used to set the mode control signal value for subsequent access instructions.
  • the pixel values in shift registers 30 (or 40) are shifted in response to the instruction for setting the new mode control signal value.
  • Figure 5 shows an embodiment wherein the shift registers 30 or 40 are replaced by sets of registers 50 coupled to a shift circuit 52 (e.g. a barrel shifter) that is controlled by addressing circuit 15.
  • the outputs of shift circuit 52 are coupled to access ports 17 by multiplexing circuits 54 (not shown in detail) similar to multiplexing circuits 32.
  • buffer memory 14 of Figure 5 performs row-dependent shifting by means of circuit switching in shift circuits 52, that is, without transferring data from one register in a shift register chain to another. This may speed up access, but it requires a more complicated and less compact circuit than when shift registers are used.
  • addressing circuit 15 causes shift registers 30 (or 40) to shift the pixel values for each access operation by the amounts of shift needed for the access operation and back again over the same amount after the access operation.
  • Figure 6 shows an embodiment of a shift control part of addressing circuit 15 wherein the pixel values are shifted only before an access operation.
  • a register 60 is provided for storing a last used mode control signal and a LUT circuit 34 (or any other circuit with the same input-output function) for outputting shift control signals corresponding to the difference, if any, between the existing amount of shift (as indicated by the old mode control signal value) and the newly required amount of shift for the new access operation.
  • the stored mode control signal value is updated.
  • the number of shift operations can be minimized, which saves power and time.
  • When the amount of shift also depends on X and/or Y addresses, these addresses are preferably stored in register 60 as well and supplied to LUT circuit 34.
  • addressing circuit 15 is arranged to disable shifting in the shift registers that are not addressed. If this embodiment is combined with the embodiment that uses differential shifting, registers are preferably provided to represent the previous amount of shift for respective columns, and the shift is controlled dependent on the change in shift with respect to that previous amount of shift.
  • Interface circuit 12 supports transfer of pixel values between main memory 10 and buffer memory 14. Any mechanism may be used to control transfer. In one embodiment transfer is linked to operation cycles of processing circuits 16.
  • control circuit 18 initially sends a signal to interface circuit 12 that triggers interface circuit 12 to load pixel values for a window of pixel locations from main memory 10 into buffer memory 14.
  • Next control circuit 18 starts a processing cycle and upon completion of access during that operation cycle control circuit 18 sends a signal to interface circuit 12 that triggers interface circuit 12 to load an additional column of pixel values for a window of pixel locations from main memory 10 into buffer memory 14.
  • the shift mechanism of buffer memory 14 may be used to realize that the new pixel values can be written in predetermined registers.
  • Although interface circuit 12 may be arranged to transfer pixel values from main memory 10 to buffer memory 14 only, in an alternative embodiment interface circuit 12 may be arranged to transfer pixel values from buffer memory 14 to main memory 10 only (after processing), or to transfer pixel values from main memory 10 to buffer memory 14 before processing and back from buffer memory 14 to main memory 10 after processing.
  • control circuit 18 addresses pixel values by means of window-relative XY addresses.
  • an alternative embodiment may use absolute (image-relative) XY addresses.
  • a translation circuit may be provided to translate the absolute addresses into window relative X Y addresses, given an XY address of the window that is stored in buffer memory.
  • the internal addresses (and optionally shift amounts) in buffer memory may be computed directly from the absolute addresses.
  • the invention has been described by means of examples wherein there is a one to one correspondence between image lines and rows of buffer memory 14.
  • pixel values for more than one image line may be stored in one row or pixel values for one image line may be stored distributed over a plurality of rows of buffer memory 14.
  • the number of pixel locations from an image line that is stored in buffer memory 14 is an integer multiple of the number of memory locations in a row, or conversely the number of memory locations in a row is an integer multiple of the number of pixel locations from an image line that is stored in buffer memory 14.
  • the circuit is not limited to these integer multiples.
  • Figure 7a-b show fractional storage of image lines for an example wherein buffer memory has six access ports and rows of six memory locations.
  • the figure shows a matrix of rows and columns that correspond to rows (shift registers 30 or 40) and columns (access ports 17) of memory locations in buffer memory 14.
  • Pixel values for line segments of pixel locations in an image are stored in the memory locations.
  • Memory locations that store the pixel values for the leftmost pixel locations on the line segments are indicated by circles. In the case of fractional storage, as shown, these memory locations are not in the same column.
  • the figures also indicate the location of pixel values for line segments of rectangular blocks.
  • Figure 7a illustrates the storage location of pixel values for a 4x2 block of pixel locations with crosses, triangles and squares.
  • Figure 7b similarly illustrates the storage locations of pixel values for a 3x6 block of pixel locations.
  • When blocks of selectable dimensions must be accessed, a mode signal is supplied to addressing circuit 15 to indicate the dimensions of the selected block.
  • addressing circuit 15 also receives information about the length of the line segments that are stored in memory, or about the offsets between the starting memory locations of the line segments.
  • an X, Y address of the block is supplied.
  • Addressing circuit 15 controls shift amounts and addresses to allow pixel values for different line segments to be output on different access ports 17. For this purpose addressing circuit 15 controls shift amounts and addresses as a function of the mode signal and the length of the stored line segments.
  • the figures indicate pixel locations from respective image lines that will be accessed in parallel on access ports 17 by crosses and triangles respectively.
  • four pixel values from one line in the block are output in parallel and two pixels of a next line are accessed.
  • three pixel values from one line in the block are output in parallel and three pixels of a next line.
  • addressing circuit 15 may use shift amounts of zero and two for example.
  • the row addresses generated by addressing circuit 15 for the first two access ports address the second row and the addresses for the third to sixth access ports address the first row.
  • addressing circuit 15 may use shift amounts of two, one and one for the first, second and third rows for example.
  • the row addresses generated by addressing circuit 15 for the first three access ports address the first row, the addresses for the following two access ports address the second row and the address for the last access port addresses the third row.
  • the addressing pattern is less regular in this case than for the example where the image lines start at the same position in each row of buffer memory 14.
  • a skilled person may select such combinations of shift amounts subject to the constraint that they have the result that the required pixel locations are shifted to mutually different sets of access ports.
  • the skilled person can then provide LUT circuits or any other convenient circuits that output control signals that effect the selected combination of shift amounts and addresses in response to information about the configuration of block dimension, block address and line segment lengths.
  • this technique can be used to support blocks of any shape, not just rectangular blocks.
  • a buffer memory 14 that contains one pixel value per memory location
  • more than one pixel value may be stored in each location, for example a group of four pixel values for successive pixel locations in the X direction.
  • this group preferably defines the basic granularity of access, transfer to the access port 17 being controlled in common for a group and the pixel values of the group being accessed together at the same access port 17.
  • this will limit addressing and block sizes to integer multiples of the granularity, however this is acceptable for image processing operations in most applications.
  • a non-circular shift register may be used (that is, a register that does not shift in pixel values at the left when these pixel values are shifted out at the right or vice versa).
  • the non-circular shift register is preferably wider than a row of the window, with additional locations for receiving pixel values from the window when they are shifted left or right, so that no pixel values from the window are lost when the data is shifted back and forth as required for the different modes. It will be appreciated that a smaller shift register can be used when a circular shift register is available. Also split presentation of the blocks is possible in this case.
  • Addressing circuit 15 may be implemented as part of processing circuits 16, or as part of a larger processing circuit that replaces a set of individual processing circuits. In this way block selection and addressing may be controlled by means of instructions in such a processing circuit.
  • addressing circuit 15 can be integrated with buffer memory 14.
  • part of the function of control circuit 18 may be integrated in buffer memory 14 and/or with addressing circuit 15, for example to detect when new data must be fetched from main memory 10 on the basis of address use and/or information from processing circuits 16 about future addressing.
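
The differential shift control described above for Figure 6 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit of the patent: the class name `ShiftControl` and its methods are hypothetical, the target shift of row i is taken to be (i*W) modulo N as in the definitions above, and the model remembers the current shift per row rather than the last mode signal.

```python
class ShiftControl:
    """Hypothetical model of a shift control part that only shifts by the difference."""

    def __init__(self, n_rows, row_length):
        self.row_length = row_length          # N: pixel values per row
        self.current = [0] * n_rows           # shift currently applied to each row

    def target_shifts(self, block_width):
        # Assumed rule from the text: row i is shifted over (i * W) modulo N.
        return [(i * block_width) % self.row_length
                for i in range(len(self.current))]

    def update(self, block_width):
        """Return the extra circular shift each row needs for the new mode."""
        target = self.target_shifts(block_width)
        delta = [(t - c) % self.row_length
                 for t, c in zip(target, self.current)]
        self.current = target                 # remember the new state
        return delta


ctl = ShiftControl(n_rows=4, row_length=4)
first = ctl.update(block_width=2)    # rows shifted alternately over 0 and 2
repeat = ctl.update(block_width=2)   # same mode again: no shifting needed
back = ctl.update(block_width=4)     # return to unshifted rows
```

When the mode does not change between accesses, every entry of the returned list is zero, which corresponds to the saving in shift operations that the text attributes to this embodiment.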

Abstract

An image processing circuit has a buffer memory (14) for storing pixel values for pixel locations in a two-dimensional moveable window within an image. The buffer memory (14) comprises a plurality of functional rows of memory circuits (30) for storing pixel values from the window. A plurality of access ports (17) is provided, each for providing access to an addressable pixel value from a respective group of pixel values from respective ones of the rows. Shift circuits (32) are provided between the memory circuits (30) and the access ports, or as part of the arrangement of memory circuits (30). Each shift circuit is provided for a respective row and arranged to shift assignment of pixel values from the respective row to the groups. An addressing circuit (15) has inputs for receiving an address of a two-dimensional block of pixel locations and a mode signal indicative of the dimensions of the block. The addressing circuit (15) controls the shift circuits (32) to set respective amounts of shift for respective ones of the rows dependent on the dimensions indicated by the mode signal. The addressing circuit sets the amounts of shift to values whereby pixel values for respective lines of pixel locations in the block that are stored in different ones of the rows are assigned to mutually non-overlapping groups. In this way the pixel values of the block can be accessed in parallel.
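
As a minimal sketch of the row-dependent shifting summarized above, the following Python model shows how shifting row i circularly over (i*W) modulo N places the lines of a W-wide block on non-overlapping sets of ports. The function names `shift_rows` and `read_block` are hypothetical, and the model assumes that N is a multiple of W and that the block does not wrap around the window edge.

```python
def shift_rows(window, block_width):
    """Circularly shift row i of the window to the right by (i * W) mod N."""
    n = len(window[0])
    shifted = []
    for i, row in enumerate(window):
        s = (i * block_width) % n
        # After the shift, column c holds the value that was at column c - s.
        shifted.append([row[(c - s) % n] for c in range(n)])
    return shifted


def read_block(shifted, x, y, block_width):
    """Return the N values seen at the N access ports for a block at (x, y)."""
    n = len(shifted[0])
    base = (x + y * block_width) % n   # port that receives the upper-left pixel
    ports = []
    for p in range(n):
        line = ((p - base) % n) // block_width   # block line served by port p
        ports.append(shifted[y + line][p])
    return ports


# Hypothetical 4x4 window whose pixel "values" are their own (x, y) locations.
window = [[(x, y) for x in range(4)] for y in range(4)]
two_wide = read_block(shift_rows(window, 2), x=1, y=1, block_width=2)
four_wide = read_block(shift_rows(window, 4), x=0, y=2, block_width=4)
```

In this model a 2x2 block and a 4x1 line segment both occupy all four ports, which is the utilization benefit the abstract describes. The port order depends on the block address, as the definitions above note for the case where the shift depends only on W.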

Description

Image processing circuit with block accessible buffer memory
The invention relates to an image processing circuit and to a method of processing an image.
It is known to provide an image processing system with a buffer memory to provide fast access to storing pixel values for a rectangular window of pixel locations from a larger image. Popular image processing tasks, such as the computation of a DCT (Discrete Cosine Transform), two-dimensional filtering and matching between nearby blocks of pixels require repeated execution of the same operation, each time applied to pixel values for a different window of pixel locations. For each execution of the operation the pixel values for the corresponding window are kept in the buffer memory, so that these pixel values can be accessed quickly as part of the operation.
Before the next execution of the operation the content of the buffer memory is updated. Typically, a sliding window is used which moves over a predetermined horizontal distance from one execution of the operation to another. In this case usually only the pixel values for the rightmost column(s) of the window are newly loaded to replace the pixel values of the leftmost column(s).
To support this partial replacement some form of circular addressing is preferably used in the buffer memory. This can be realized for example by means of some form of circular address translation of window-relative X-Y addresses into memory addresses. Thus, for a horizontally moving window, a same memory address is the translation of X-Y addresses with successively decreasing X values. When the X value has been decreased outside the window, a new pixel value is loaded for the memory address and the X value is increased by the window size.
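The circular address translation described above can be sketched as follows. This is a minimal illustration only, not the circuit of the invention: the function names and the `window_origin` bookkeeping value (the absolute X position of the leftmost window column) are assumptions introduced for the example.

```python
def memory_column(x_rel, window_origin, window_width):
    """Translate a window-relative X address into a buffer memory column.

    window_origin is an assumed bookkeeping value: the absolute X position
    of the leftmost column currently held in the window.
    """
    return (window_origin + x_rel) % window_width


def slide_window(columns, window_origin, new_column_values):
    """Move the window one column to the right.

    Only the memory column that held the old leftmost window column is
    overwritten; all other pixel values stay where they are.
    """
    width = len(columns)
    columns[window_origin % width] = new_column_values
    return window_origin + 1
```

After one slide, the memory column that previously answered to window-relative X=1 answers to X=0, so the same memory address corresponds to successively decreasing X values, as stated in the text.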
Of course, other solutions can be used to support partial replacement. As an alternative, old pixel values can be moved to shifted memory locations between each execution of the operation for different windows and new pixel values can be loaded into vacated memory locations. That is, shift registers are effectively used for respective lines of the image. In this case, a fixed translation of X-Y addresses into memory locations can be used to address locations in the shift registers. Advantageously, a buffer memory for image processing provides for parallel access to a plurality of pixel values, for example in parallel to pixel values for a horizontal line of pixel locations in the window. In this way parallel processor circuits can be used, each for processing one or more pixel values for a respective pixel location or group of pixel locations in parallel with the other processing circuits.
The size and organization of the buffer memory is typically chosen dependent on the operation that must be performed. If an operation requires blocks of pixel values for 16x16 pixel locations in the X and Y direction, a buffer memory of 256 (=16x16) memory locations is typically used and the X-Y addresses of the operation preferably contain two four-bit parts for addressing X and Y locations relative to the window respectively.
However, not all processing tasks require pixel values for a window of the same size. Some operations require data for windows of 8x8 pixel locations; others require data for windows of 16x16 pixel locations or for 9x9 or 17x17 pixel locations etc. For fast processing it is desirable that the buffer memory has sufficient locations to store all pixel values for the largest required window.
In this case, when an operation is executed that requires a smaller window, only part of the pixel values will be used. When the buffer memory supports parallel access to pixel values for a line of pixel locations, only part of the parallel access capacity will be used. When a plurality of parallel processor circuits is provided that corresponds to the largest possible window size, only part of these processor circuits will be used.
Among others it is an object of the invention to increase the utilization of processing resources in an image processing system that provides for parallel access to a plurality of memory locations in a buffer memory for storing pixel values for an at least two-dimensional window of pixel locations. The invention provides for an image processing circuit according to Claim 1.
The invention provides for use of blocks of different dimensions during access to a buffer memory. The term "block" will be used herein to refer to collections of pixel locations within a window of pixel locations. Typically the pixel values for locations of the block are accessed in parallel. Respective shift circuits are provided for respective functional rows of the buffer memory to change the access ports through which the pixel values of the rows can be accessed independently of one another. Amounts of shift for different rows are selected dependent on a mode signal that indicates the dimensions of a block that is accessed. The amounts of shift used by the circuit have values that meet the condition that pixel values for respective lines of pixel locations in the block that are stored in different ones of the rows can be accessed in parallel. Thus for example, if a block with lines of N/m pixels wide is used and N access ports are available N pixels can be accessed in parallel when access to pixel values in different rows is shifted over integer multiples of N/m relative to one another. The functional rows are preferably implemented as geometrical rows in a memory matrix on an integrated circuit, but the invention is not limited to functional rows that are also geometrical rows.
Preferably, shifting is realized by implementing the functional rows of the buffer memory each as a circular shift register, pixel values being transferred along the shift register to implement access via different access ports. A simple serially connected chain of registers may be used. Alternatively multiplexers may be used between the registers in the shift register to provide for selectable shifting steps to speed up shifting. In another embodiment non-circular shift registers may be used, but in this case typically larger shift registers are needed, which are wider than a row of the window, to preserve pixel values that are shifted "out of view".
These and other objects and advantageous aspects of the invention will be described using non-limitative examples of embodiments.
Figure 1 shows an image processing circuit
Figures 2a-b illustrate relations between pixel locations and parallel output of pixels
Figure 3 shows a buffer memory
Figure 3a shows a shift control part
Figure 3b shows a column addressing part
Figure 4 shows a shift register
Figure 5 shows a buffer memory
Figure 6 shows a further shift control part
Figure 7a-b show fractional storage of image lines
Figure 1 shows an image processing circuit that comprises a main memory 10, a memory interface 12, a buffer memory 14, an addressing circuit 15, a plurality of processing circuits 16 and a control circuit 18. Buffer memory 14 is coupled to main memory 10 via interface 12. Buffer memory 14 has parallel data access ports 17 coupled to respective ones of processing circuits 16. Control circuit 18 has an instruction output coupled to processing circuits 16, an address/control output coupled to addressing circuit 15 and a control output coupled to interface 12. Addressing circuit 15 has outputs coupled to buffer memory 14. Although three access ports 17 and processing circuits 16 have been shown, it should be understood that any number may be provided in parallel, for example eight or sixteen access ports 17 and processing circuits 16.
In an embodiment control circuit 18 comprises an instruction memory for storing a program of instructions and a program counter to address these instructions. The instructions include instructions that comprise a command part and an address part. In this embodiment control circuit 18 is arranged to feed the address part of the issued instructions to addressing circuit 15 and the command part of the instructions to processing circuits 16. Furthermore the program counter is coupled to interface circuit 12, for triggering a transfer between main memory 10 and buffer memory 14 at a predetermined position in a cycle of instructions. In this embodiment processing circuits 16 are SIMD processing circuits, arranged to receive the same instruction from control circuit 18. However, it should be realized that the invention is not limited to this embodiment. Other forms of providing an address to addressing circuit 15 and controlling operation of processing circuits 16 may be used. Furthermore, although each processing circuit 16 is shown connected to one access port 17, it should be appreciated that in a further embodiment each processing circuit 16 may be connected to a plurality of access ports, e.g. to those of its neighbors as well, or even to those of the neighbors of its neighbors etc. When the access ports are read ports this allows more complicated operations to be performed in each processing circuit 16. In another embodiment sets of processing circuits 16 may be replaced by respective larger processing circuits each with inputs coupled to a plurality of access ports 17 to perform more complicated operations.
In operation control circuit 18 supplies instructions to processing circuits 16 and window-relative addresses and mode signals to addressing circuit 15. Typically control circuit 18 supplies at most one window-relative address to address pixel values for all processing circuits 16. Addressing circuit 15 translates a combination of each window-relative address and the mode signals into control signals for buffer memory 14.
Buffer memory 14 may be arranged to support both reading and writing from processing circuits 16, or only reading or only writing. When both reading and writing is possible control circuit also sends read/write control signals to addressing circuit 15, in association with window-relative addresses. For the sake of illustration a read operation will be discussed. In response to the address for the read operation buffer memory 14 retrieves pixel values for a plurality of pixel locations from buffer memory 14 and outputs these pixel values in parallel at ports 17.
Buffer memory 14 and addressing circuit 15 support control of the relation between the pixel locations of the pixel values that have been loaded from main memory 10 and the combination in which these pixel values will be output in parallel during reading.
Figures 2a-b illustrate the relation between the pixel locations and the combination of pixel values that will be output during a first and second mode of operation respectively for an example where there are four access ports 17 and processing circuits 16 in parallel. In the first mode pixel values for one line segment (one Y address) of four successive pixel locations (four X addresses) are output in parallel. In this first mode the window-relative address from control circuit 18 corresponds for example to the window-relative XY address of the leftmost pixel location of the line segment. In the second mode pixel values for two successive line segments (two Y addresses) of two pixel locations each (two X addresses) are output in parallel. In this second mode the window-relative address from control circuit 18 corresponds for example to the window-relative XY address of the leftmost pixel location of the upper line segment.
The first mode can be used for example to process pixel values for blocks that are four pixel locations wide. In this mode, during execution of an instruction, each processing circuit 16 accesses a pixel value for a respective pixel location along a line segment in the block in parallel. The second mode can be used for example to process pixel values for blocks that are two pixel locations wide. In this mode, during execution of an instruction, processing circuits 16 are divided into two groups of two processing circuits 16 each. A first group of two processing circuits 16 accesses pixel values for a first line segment during execution of the instruction as a result of a window-relative address that control circuit 18 applies to addressing circuit 15. A second group of processing circuits 16 accesses pixel values for a second line segment during execution of the same instruction and as a result of the same application of the window-relative address. Thus, all processing circuits 16 can be utilized for both block widths. It is not necessary to leave part of access ports 17 and processing circuits 16 unused if the number of pixel locations along the width of the block is less than the number of access ports 17 and processing circuits 16.
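The two modes for four access ports can be illustrated with a short sketch that lists the window-relative pixel locations accessed in parallel for a given block width. The function name and the ordering of the returned list are assumptions made for illustration; the text does not prescribe an ordering.

```python
def parallel_locations(x, y, block_width, n_ports):
    """List the window-relative (X, Y) locations accessed in one operation.

    (x, y) addresses the leftmost pixel location of the upper line segment.
    The number of line segments is chosen so that all ports are used,
    assuming block_width divides n_ports.
    """
    lines = n_ports // block_width
    return [(x + k, y + j) for j in range(lines) for k in range(block_width)]
```

With four ports, a block width of four yields one line segment of four pixel locations (the first mode), and a block width of two yields two line segments of two pixel locations each (the second mode).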
Figure 3 illustrates an embodiment of buffer memory 14 and addressing circuit 15. For the sake of example a small buffer memory 14 with storage space for pixel values for 4x4 pixel locations is shown, but it should be appreciated that in practice larger buffer memories may be used. In this embodiment buffer memory 14 comprises a plurality of circular shift register circuits 30. Multiplexing circuits 32 are provided, organized in groups that each correspond to a respective access port 17. Each group of multiplexing circuits 32 is coupled between outputs of respective ones of the shift registers 30 and its corresponding access port 17.
Addressing circuit 15 has an input X, Y for receiving an X, Y address and an input M for receiving a mode signal. Addressing circuit 15 has separate control outputs coupled to shift control inputs of respective ones of circular shift registers 30. Furthermore addressing circuit 15 has separate control outputs coupled to the multiplexing circuits for respective access ports 17.
Addressing circuit 15 is arranged to cause shifts in the shift registers 30 dependent on the mode signal from mode signal input M, so that the shifts in different shift registers may be mutually different. The mode signal indicates the width W of the block from which pixel values are needed. The pixel values in successive shift registers 30 are shifted over an optional common amount of shift C plus respective integer multiples of W, modulo N, where N is the number of pixel values stored in each shift register 30 (the width of the window that is stored in buffer memory). In one embodiment:
shift for ith row = C + i*W modulo N
The common amount of shift C, which is the same for all shift registers 30, may be zero (C=0) for example. Thus for the example of a 4x4 buffer memory N=4 and for a mode signal that indicates a block of width W=2, the pixel values in successive shift registers 30 are shifted alternately over 0 and 2 positions.
Figure 3a illustrates an embodiment of a shift control part of addressing circuit 15, comprising a lookup table circuit 34 with inputs that receive the mode signal "M" and outputs for shift control signals for the respective rows of buffer memory 14 (not shown). A lookup table (LUT) circuit 34 can be implemented as a memory that uses the mode signal M as address and stores sets of control signals at addresses for respective mode signal values, for example according to the preceding relation. Alternatively a logic circuit may be used that produces the same input-output relation, or arithmetic circuits to perform computation of the control signals.
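The relation between mode signal and row shifts can be sketched as a small table. This is an illustrative model only: keying the table directly by block width W, and the names `row_shifts` and `SHIFT_LUT`, are assumptions; the actual encoding of the mode signal M is not specified in the text.

```python
def row_shifts(block_width, row_length, n_rows, common=0):
    """Shift for the ith row: C + i*W modulo N."""
    return [(common + i * block_width) % row_length for i in range(n_rows)]


# Hypothetical lookup table for a 4x4 buffer memory (N=4), addressed by the
# block width that the mode signal indicates.
SHIFT_LUT = {w: row_shifts(w, row_length=4, n_rows=4) for w in (1, 2, 4)}
```

For W=2 the table holds the alternating shifts 0 and 2 mentioned above; for W=4 no row needs to be shifted.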
Multiplexing circuits 32 may be implemented as switches that connect the outputs of selected ones of shift registers 30 to access ports 17. Addressing circuit 15 is arranged to use a window-relative XY address signal from address signal input X, Y to generate control signals for multiplexing circuits 32. The control signals are generated so that the outputs of multiplexing circuits that contain pixels from the selected block are coupled to access ports 17. For each access port 17 one multiplexing circuit 32 is selected. It is possible to do so because a plurality of rows of buffer memory has been mutually shifted. For each relevant row of buffer memory W outputs of the shift register 30 for that row are output.
For example, in the embodiment wherein the ith row is shifted over i*W modulo N, if the window-relative address is X, Y, addressing circuit 15 selects W columns in the Yth row of buffer memory 14, starting from column number X+Y*W modulo N, wrapping around to the zeroth column if necessary. If W is smaller than N, addressing circuit 15 also selects W columns in the (Y+1)st row, from column number X+(Y+1)*W modulo N, wrapping around to the zeroth column if necessary. Preferably addressing circuit 15, if possible, selects even more rows: the (Y+j)th row, j=0, 1, ..., at W positions starting from column number X+(Y+j)*W modulo N and wrapping around to the zeroth column if necessary. In this case the j values should satisfy j=0..jmax-1, with jmax*W smaller than or equal to N.
Figure 3b illustrates a column addressing part for generating the addresses for different columns. Lookup-table (LUT) circuits 36 are provided that output required address offsets for successive access ports 17. The mode signal M, the X address and (part of) the Y address are used as inputs to the LUT circuits 36. The LUT circuits 36 are arranged to generate offset values for respective ones of the columns as a function of mode signals M and window-relative X, Y address values. Adder circuits 38 are provided to add the Y address to the outputs of the LUT circuits 36. It will be appreciated that many implementations of addressing circuit 15 are possible to realize addressing. Adder circuits 38 may be integrated with LUT circuits 36 and/or with decoding circuits to generate selection signals for individual multiplexing circuits 32. In another embodiment, a computation circuit is provided for each access port 17, to compute the row of buffer memory that should be coupled to that access port 17.
In the embodiment of the example given above the column addressing part effectively computes integer row addresses Y+j that satisfy the conditions
P=X+k+(Y+j)*W modulo N
where P is the port number, k between 0 and W-1 enumerates the different pixel values along a row in a block and j is between 0 and jmax-1. Thus, for the example of a 4x4 buffer memory 14 (N=4) for a two pixel location wide block W=2, if the window-relative XY address is X=2, Y=2 addressing circuit 15 will control the multiplexing circuits 32 of the second row to couple the shift registers 30 of that row to the zeroth and first access port (P=0 and 1 for k=0 and 1 respectively, corresponding to j=0). Furthermore, addressing circuit 15 will control the multiplexing circuits 32 of the third row to couple the shift registers 30 of that row to the second and third access port (P=2 and 3 for k=0 and 1 respectively, corresponding to j=1).
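The relation P = X+k+(Y+j)*W modulo N can be evaluated directly to see which row feeds which access port. The sketch below is an illustration under stated assumptions: the number of access ports is taken equal to the row length N, W is assumed to divide N, and the function name and returned dictionary layout are introduced for the example only. Rows beyond the stored window are not checked.

```python
def port_assignment(x, y, block_width, n_ports):
    """Return {port P: (row Y+j, k)} for a block addressed at (x, y).

    k enumerates the pixel values along a line of the block and j
    enumerates the lines, with jmax * block_width <= n_ports.
    """
    jmax = n_ports // block_width
    ports = {}
    for j in range(jmax):
        for k in range(block_width):
            p = (x + k + (y + j) * block_width) % n_ports
            ports[p] = (y + j, k)
    return ports
```

For instance, `port_assignment(0, 2, 2, 4)` maps ports 0 and 1 to row 2 and ports 2 and 3 to row 3. Because successive lines are offset by W ports and jmax*W does not exceed N, no port is assigned twice.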
In this embodiment addressing circuit 15 controls the amount of shift in each column dependent only on the width W of the block. This may complicate address computation. Moreover, it has the effect that the access port that provides access to the pixel value for the upper left pixel location of the block depends on the X and Y address of the block. This pixel location is output to the X+Y*Wth one of access ports 17. In a further embodiment a shift circuit (not shown) is coupled between access ports 17 and multiplexing circuits 32 to shift the pixel values circularly over X+Y*W positions so that predetermined access ports 17 provide access to pixel values for predetermined pixel locations relative to the upper left corner of the block. Thus, processing circuits 16 can be adapted to the pixel values.
In alternative embodiments the dependence on the Y address is removed by adapting the amount of shift in the shift registers to the Y address and/or to the X address. LUT circuit 34 may be provided with an input for Y address signals and/or X address signals to realize this for example. In this case addressing circuit 15 may be arranged to shift the pixel values in the ith shift register 30 over (i-Y)*W-X modulo N, where X and Y are the window-relative X and Y address of the upper left pixel location of the block with which buffer memory 14 is addressed. Thus, at the access ports 17, the pixel locations that correspond to predetermined pixel locations within an addressed block correspond to predetermined access ports 17. In this case no further shift circuit is needed to realize this at access ports 17. In alternative embodiments addressing circuit 15 may control the shift of the ith shift register 30 according to (i-Y)*W modulo N. In this case the addresses of the columns can be determined from the Y address plus a Y address independent offset. In another embodiment -X modulo N is added so that a dependence of access ports 17 on only Y position in the block is realized.
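The effect of the address-dependent shift (i-Y)*W-X modulo N can be checked with a small model. The model assumes that a circular shift over s positions makes the pixel value stored in column c of a row available at port (c+s) modulo N, consistent with the relation given earlier; the function names are hypothetical.

```python
def normalized_shift(i, x, y, block_width, n_ports):
    """Shift for the ith row: (i - Y)*W - X modulo N."""
    return ((i - y) * block_width - x) % n_ports


def port_of(column, shift, n_ports):
    """Assumed model: a value stored in `column` appears at this port
    after the row has been shifted circularly over `shift` positions."""
    return (column + shift) % n_ports
```

Under this model the pixel value at block position (k, j) always appears at port k + j*W, whatever the block address X, Y, so that no further shift circuit is needed at the access ports.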
In one embodiment addressing circuit 15 computes decoded selection signals and outputs different selection signals to individual multiplexing circuits 32 to provide on/off control over access ports 17. Alternatively, addressing circuit 15 may supply coded address signals to multiplexing circuits 32 of a column in common, in which case multiplexing circuits 32 should be arranged to do the decoding as well.
Shifting may be controlled by enabling each shift register 30 to shift data for a controlled number of shift cycles. In one embodiment addressing circuit enables the ith shift register 30 for i*W modulo N shift cycles.
Figure 4 shows an alternative shift register circuit 40 that may be used in a row of buffer memory 14. Shift register circuit 40 comprises a circular series of registers 42 and multiplexers 44 inserted between registers 42. Each multiplexer 44 has a first input coupled to an output of a preceding register 42 and a second input coupled to the output of a register 42 a predetermined number of positions further back along the series. Addressing circuit 15 (not shown) controls multiplexers 44 and registers 42. In a shift step, multiplexers 44 can be used to select between shifting over one position in shift register circuit 40 or over a plurality of positions in a single shift cycle. Thus, the number of shift cycles that is needed to realize a large amount of shift can be reduced by controlling multiplexers 44 for selected shift register circuits 40 to pass pixel values from said predetermined number of positions back along the series.
It should be realized that a further reduction of the number of shift cycles can be realized by using multiplexers 44 with more selectable inputs coupled to outputs of registers 42 at respective predetermined distances along the cyclical series of registers 42. In the most extreme form each multiplexer 44 has selectable inputs coupled to outputs of each of the registers 42 from which pixel values may be shifted. In this case addressing circuit 15 controls the multiplexers 44 in each row of buffer memory to pass pixel values from registers 42 at a required distance along the cyclical series (e.g. from a distance i*W, i being the row number of the shift register circuit 40 in buffer memory 14). As a result a single shift cycle suffices to shift data along each shift register circuit 40. In a further embodiment only a limited number of shift amounts is allowed (for example according to the different block sizes) and selectable inputs of multiplexers 44 for all these amounts are provided, but not for other amounts.
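The saving in shift cycles can be illustrated by a behavioral model of a circular shift register in which each cycle moves all values either over one position or over a longer, predetermined step. This is a sketch only; the greedy choice of step per cycle and the function names are assumptions, not a description of the control logic of addressing circuit 15.

```python
def rotate(row, step):
    """One shift cycle: every value moves `step` positions along the
    circular series of registers."""
    step %= len(row)
    return row[-step:] + row[:-step] if step else list(row)


def shift_with_steps(row, amount, long_step):
    """Shift `row` circularly over `amount` positions using cycles of
    either `long_step` positions or one position.

    Returns the shifted row and the number of shift cycles used.
    """
    cycles = 0
    remaining = amount % len(row)
    while remaining:
        step = long_step if remaining >= long_step else 1
        row = rotate(row, step)
        remaining -= step
        cycles += 1
    return row, cycles
```

For a row of eight registers, a shift over six positions takes six cycles with single-position steps only, and three cycles when a step of four positions is also selectable.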
In an embodiment control circuit 18 is arranged to supply a fresh mode control signal for each access operation. In this case, each access instruction that is processed by control circuit preferably contains a field for a mode control signal value that control circuit supplies to addressing circuit 15. In an alternative embodiment, the mode control signal may be set for a plurality of access operations, for example until a new mode control signal value is set. In this case a separate instruction may be used to set the mode control signal value for subsequent access instructions. Preferably, in this alternative embodiment the pixel values in shift registers 30 (or 40) are shifted in response to the instruction for setting the new mode control signal value.
In both embodiments, different types of processing, which use different blocks sizes and/or shapes can be executed during the same task without reloading buffer memory to fit the different block sizes.
Figure 5 shows an embodiment wherein the shift registers 30 or 40 are replaced by sets of registers 50 coupled to a shift circuit 52 (e.g. a barrel shifter) that is controlled by addressing circuit 15. The outputs of shift circuit 52 are coupled to access ports 17 by multiplexing circuits 54 (not shown in detail) similar to multiplexing circuits 32. In operation buffer memory 14 of figure 5 performs row dependent shifting by means of circuit switching in shift circuits 52, that is, without transferring data from one register in a shift register chain to another. This may speed up access, but it requires a more complicated and less compact circuit than when shift registers are used.
In one embodiment addressing circuit 15 causes shift registers 30 (or 40) to shift the pixel values for each access operation by the amounts of shift needed for the access operation and back again over the same amount after the access operation.
Figure 6 shows an embodiment of a shift control part of addressing circuit 15 wherein the pixel values are shifted only before an access operation. In this embodiment a register 60 is provided for storing a last used mode control signal and a LUT circuit 34 (or any other circuit with the same input-output function) for outputting shift control signals corresponding to the difference, if any, between the existing amount of shift (as indicated by the old mode control signal value) and the newly required amount of shift for the new access operation. After the shift has been controlled the stored mode control signal value is updated. Thus, the number of shift operations can be minimized, which saves power and time. When the amount of shift also depends on X and/or Y addresses, these are preferably stored in register 60 as well and supplied to LUT circuit 34.
In another embodiment addressing circuit 15 is arranged to disable shifting in the shift registers that are not addressed. If this embodiment is combined with the embodiment that uses differential shifting, preferably registers are provided to represent the previous amount of shift for respective columns and the shift is controlled dependent on the change in shift with respect to that previous amount of shift.
Interface circuit 12 supports transfer of pixel values between main memory 10 and buffer memory 14. Any mechanism may be used to control transfer. In one embodiment transfer is linked to operation cycles of processing circuits 16. In this embodiment control circuit 18 initially sends a signal to interface circuit 12 that triggers interface circuit 12 to load pixel values for a window of pixel locations from main memory 10 into buffer memory 14. Next control circuit 18 starts a processing cycle and upon completion of access during that operation cycle control circuit 18 sends a signal to interface circuit 12 that triggers interface circuit 12 to load an additional column of pixel values for a window of pixel locations from main memory 10 into buffer memory 14. In this case the shift mechanism of buffer memory 14 may be used to realize that the new pixel values can be written in predetermined registers. It should be noted that this only requires the same amount of shift for all rows, not the mode dependent amounts of shift during operation. Although in one embodiment interface circuit 12 may be arranged to transfer pixel values from main memory 10 to buffer memory 14 only, in an alternative embodiment interface circuit 12 may be arranged to transfer pixel values from buffer memory 14 to main memory 10 only (after processing), or to transfer pixel values from main memory 10 to buffer memory 14 before processing and back from buffer memory 14 to main memory 10 after processing.
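The differential shift control of figure 6 can be modeled as follows. The sketch assumes the width-only shift rule i*W modulo N, assumes that the rows are initially unshifted, and represents the mode by the block width W; the class and attribute names are hypothetical.

```python
class ShiftControl:
    """Behavioral model of a shift control part that remembers the last mode."""

    def __init__(self, row_length, n_rows):
        self.row_length = row_length
        self.n_rows = n_rows
        self.last_width = None  # plays the role of register 60

    def _absolute(self, width):
        # Shift of the ith row for a given block width: i*W modulo N.
        return [(i * width) % self.row_length for i in range(self.n_rows)]

    def update(self, width):
        """Return the additional shift per row needed for the new mode,
        then store the new mode as the last used one."""
        new = self._absolute(width)
        if self.last_width is None:
            old = [0] * self.n_rows  # assumed initial state: no shift applied
        else:
            old = self._absolute(self.last_width)
        self.last_width = width
        return [(n - o) % self.row_length for n, o in zip(new, old)]
```

Repeating an access in the same mode then requires no shifting at all, and a mode change costs only the difference between the old and new amounts of shift.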
It should be noted that other forms of transfer between main memory 10 and buffer memory 14 may be used. Instead of transfer linked to operation cycles (triggered for example by a program counter value), transfer in response to an explicit transfer instruction, or as a result of addressing outside buffer memory may be used. Although the invention has been described for an embodiment wherein control circuit 18 addresses pixel values by means of window-relative X Y addresses, an alternative embodiment may use absolute (image-relative) XY addresses. In this alternative embodiment a translation circuit may be provided to translate the absolute addresses into window relative X Y addresses, given an XY address of the window that is stored in buffer memory. Alternatively, the internal addresses (and optionally shift amounts) in buffer memory may be computed directly from the absolute addresses.
The invention has been described by means of examples wherein there is a one to one correspondence between image lines and rows of buffer memory 14. However, alternatively pixel values for more than one image line may be stored in one row or pixel values for one image line may be stored distributed over a plurality of rows of buffer memory 14. Preferably, in this case the number of pixel locations from an image line that is stored in buffer memory 14 is an integer multiple of the number of memory locations in a row, or conversely the number of memory locations in a row is an integer multiple of the number of pixel locations from an image line that is stored in buffer memory 14. In a further embodiment the circuit is not limited to these integer multiples.
Figure 7a-b show fractional storage of image lines for an example wherein buffer memory has six access ports and rows of six memory locations. The figure shows a matrix of rows and columns that correspond to rows (shift registers 30 or 40) and columns (access ports 17) of memory locations in buffer memory 14. Pixel values for line segments of pixel locations in an image are stored in the memory locations. Memory locations that store the pixel values for the leftmost pixel locations on the line segments are indicated by circles. In the case of fractional storage, as shown, these memory locations are not in the same column. The figures also indicate the location of pixel values for line segments of rectangular blocks. Figure 7a illustrates the storage location of pixel values for a 4x2 block of pixel locations with crosses, triangles and squares. Figure 7b similarly illustrates the storage locations of pixel values for a 3x6 block of pixel locations.
Now when blocks of selectable dimensions must be accessed a mode signal is supplied to addressing circuit 15 to indicate the dimensions of the selected block. In this embodiment addressing circuit 15 also receives information about the length of the line segments that are stored in memory, or about the offsets between the starting memory locations of the line segments. In addition, an X, Y address of the block is supplied. Addressing circuit 15 controls shift amounts and addresses to allow pixel values for different line segments to be output on different access ports 17. For this purpose addressing circuit 15 controls shift amounts and addresses as a function of the mode signal and the length of the stored line segments.
The figures indicate pixel location from respective image lines that will be accessed in parallel on access port 17 by crosses and triangles respectively. As can be seen in the example of figure 7a four pixel values from one line in the block are output in parallel and two pixels of a next line are accessed. As can be seen in the example of figure 7b three pixel values from one line in the block are output in parallel and three pixels of a next line.
The shift amounts and addresses for this type of access are more complicated than for the case wherein the number of stored pixel locations equals the number of memory locations per row of buffer memory 14. In the case of figure 7a addressing circuit 15 may use shift amounts of zero and two for example. The row addresses generated by addressing circuit 15 for the first two access ports address the second row and the addresses for the third to sixth access ports address the first row. In the case of figure 7b addressing circuit 15 may use shift amounts of two, one and one for the first, second and third row for example. The row addresses generated by addressing circuit 15 for the first three access ports address the first row, the addresses for the following two access ports address the second row and the address for the last access port addresses the third row.
As will be appreciated the addressing pattern is less regular in this case than for the example where the image lines start at the same position in each row of buffer memory 14. However, it is straightforward to select combinations of shift amounts and addresses that allow parallel access to pixel values from a plurality of lines for any given configuration of block dimensions, block address and line segment length of image line segments in buffer memory 14. A skilled person may select such combinations of shift amounts subject to the constraint that they have the result that the required pixel locations are shifted to mutually different sets of access ports. The skilled person can then provide LUT circuits or any other convenient circuits that output control signals that effect the selected combination of shift amounts and addresses in response to information about the configuration of block dimension, block address and line segment lengths. As will be appreciated this technique can be used to support blocks of any shape, not just rectangular blocks.
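One way to select such a combination of shift amounts is an exhaustive search over the per-row shifts, subject to the constraint that the required pixel values of different rows end up on mutually different access ports. The sketch below is illustrative only: the column positions used in the usage note are hypothetical (the figures are not reproduced here), and a circular shift over s positions is assumed to move a value from column c to port (c+s) modulo the number of ports.

```python
from itertools import product


def find_shifts(needed_columns, n_ports):
    """Search per-row shift amounts that give non-overlapping port sets.

    needed_columns[r] is the set of columns in row r that hold required
    pixel values. Returns the first suitable tuple of shifts, or None
    if no combination exists.
    """
    for shifts in product(range(n_ports), repeat=len(needed_columns)):
        used = set()
        for cols, s in zip(needed_columns, shifts):
            ports = {(c + s) % n_ports for c in cols}
            if used & ports:
                break  # two rows would compete for the same port
            used |= ports
        else:
            return shifts
    return None
```

For a hypothetical layout with six ports in which one row holds four required values in columns 2 to 5 and the next row holds two required values in columns 4 and 5, `find_shifts([{2, 3, 4, 5}, {4, 5}], 6)` yields shifts of zero and two. The result of such a search could then be stored in a LUT circuit.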
Although the invention has been described by means of specific embodiments it should be appreciated that the invention is not limited to these embodiments. For example it should be realized that a larger buffer memory size than 4x4 pixel locations can be used. Nor is the buffer memory 14 limited to a memory for a square window.
Furthermore, although the invention has been described for a buffer memory 14 that contains one pixel value per memory location, it should be realized that more than one pixel value may be stored in each location, for example a group of four pixel values for successive pixel locations in the X direction. In this case this group preferably defines the basic granularity of access, transfer to the access port 17 being controlled in common for a group and the pixel values of the group being accessed together at the same access port 17. Of course this will limit addressing and block sizes to integer multiples of the granularity; however, this is acceptable for image processing operations in most applications.
Furthermore, although the use of a circular shift register has been described, it should be appreciated that alternatively a non-circular shift register may be used (that is, a register that does not shift in pixel values at the left when these pixel values are shifted out at the right or vice versa). In this case, the non-circular shift register is preferably wider than a row of the window, with additional locations for receiving pixel values from the window when they are shifted left or right, so that no pixel values from the window are lost when the data is shifted back and forth as required for the different modes. It will be appreciated that a smaller shift register can be used when a circular shift register is available. Also split presentation of the blocks is possible in this case.
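The difference between the two register types can be sketched in software as follows. This is only an illustrative model, not the described hardware: the names `circular_shift` and `PaddedShiftRegister`, the `pad` parameter for the additional locations, and the use of `None` for empty locations are assumptions introduced here.

```python
def circular_shift(row, amount):
    """Rotate a row to the right; values shifted out at the right
    re-enter at the left, so the register needs no extra locations."""
    n = len(row)
    amount %= n
    return list(row[-amount:]) + list(row[:-amount]) if amount else list(row)


class PaddedShiftRegister:
    """Model of a non-circular register that is wider than the window row:
    `pad` spare locations on each side catch values shifted out of the row."""

    def __init__(self, row, pad):
        self.pad = pad
        self.cells = [None] * pad + list(row) + [None] * pad
        self.offset = 0  # net shift applied so far

    def shift(self, amount):
        """Shift right (positive) or left (negative) without wrap-around."""
        new_offset = self.offset + amount
        if abs(new_offset) > self.pad:
            raise ValueError("shift would push pixel values out of the register")
        n = len(self.cells)
        shifted = [None] * n
        for i, value in enumerate(self.cells):
            j = i + amount
            if 0 <= j < n:
                shifted[j] = value
        self.cells = shifted
        self.offset = new_offset

    def window(self):
        """Values currently aligned with the access ports of the row."""
        return self.cells[self.pad:len(self.cells) - self.pad]


reg = PaddedShiftRegister([10, 20, 30, 40], pad=2)
reg.shift(2)
after_shift = reg.window()
reg.shift(-2)
restored = reg.window()
```

In this model the padded register needs eight locations to support shifts of up to two positions on a four-wide row, whereas the circular variant needs only four, which mirrors the remark that a smaller shift register suffices when it is circular.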
Addressing circuit 15 may be implemented as part of processing circuits 16, or as part of a larger processing circuit that replaces a set of individual processing circuits. In this way block selection and addressing may be controlled by means of instructions in such a processing circuit. Alternatively, addressing circuit 15 can be integrated with buffer memory 14. Furthermore, part of the function of control circuit 18 may be integrated in buffer memory 14 and/or with addressing circuit 15, for example to detect when new data must be fetched from main memory 10 on the basis of address use and/or information from processing circuits 16 about future addressing.
Although the invention has been explained mainly for embodiments where the access involves reading of pixel values from buffer memory 14, it should be realized that the invention may alternatively be applied to writing only, to reading and writing, or to reading only. This merely affects the direction in which signal flow is supported between the access ports 17 and registers in the shift registers 30, 40 or registers 50. In the case of writing, multiplexing circuits 32, 54 strictly speaking function as demultiplexing circuits, but this makes no difference if these multiplexing circuits are implemented using switched connections.

Claims

CLAIMS:
1. An image processing circuit with a buffer memory (14) for storing pixel values for pixel locations in a two-dimensional moveable window within an image, the buffer memory (14) comprising: a plurality of functional rows of memory circuits (30) for storing pixel values from the window; a plurality of access ports (17), each for providing access to an addressable pixel value from a respective group of pixel values from respective ones of the rows; shift circuits (32), each for a respective row and arranged to shift assignment of pixel values from the respective row to the groups; an addressing circuit (15), comprising inputs for receiving an address of a two-dimensional block of pixel locations and a mode signal indicative of dimensions of the block, the addressing circuit (15) being arranged to control the shift circuits (32) to set respective amounts of the shift for respective ones of the rows dependent on the dimensions indicated by the mode signal, the amounts of shift satisfying a condition that pixel values for respective lines of pixel locations in the block that are stored in different ones of the rows are assigned to mutually non-overlapping groups.
2. An image processing circuit according to Claim 1, wherein the shift circuit (32) and the pixel value memory circuits (30) for each row form a respective shift register circuit.
3. An image processing circuit according to Claim 2, wherein the addressing circuit (15) is arranged to cause shifts back and forth according to the amounts of shifts in the shift registers before and after access respectively.
4. An image processing circuit according to Claim 2, wherein the addressing circuit (15) comprises storage elements (60) for representing last used amounts of shifts for the rows, the addressing circuit being arranged to perform differential shifts shifting pixel values through the shift register over a distance that corresponds to a difference between the amounts for successive accesses.
5. An image processing circuit according to Claim 2, wherein the shift register circuit (40) comprises multiplexers (44), each with an output coupled to a respective one of the memory circuits (42) and with inputs coupled to outputs of memory circuits (42) at respective distances along the shift register circuit (40) from said respective one of the memory circuits (42), the addressing circuit (15) being coupled to the multiplexer circuits (44) to control input selection dependent on the amount of shift for the row.
6. An image processing circuit according to Claim 2, wherein for each row the shift circuit (32) is coupled between the memory circuits (30) and the access port (17), and arranged to provide a plurality of shifted connections between the memory circuits (30) to the access ports (17).
7. An image processing circuit according to Claim 2, wherein the shift registers are circular shift registers each arranged to circularly shift assignment of pixel values from the respective row to the groups.
8. An image processing circuit according to Claim 1, wherein the addressing circuit (15) is arranged to generate respective row addresses for respective ones of the groups, dependent on the mode signal, to select the row of the addressable pixel value from the group to which access is provided via the access port (17) for the group, the addresses meeting a condition that said different ones of the rows are addressed for the non-overlapping groups respectively.
9. An image processing circuit according to Claim 1, comprising a main memory (10) and an interface circuit (12) coupled between the main memory (10) and the buffer memory (14) and arranged to transfer pixel values for the window between the main memory (10) and the buffer memory (14).
10. An image processing circuit according to Claim 1, comprising a plurality of parallel pixel value processing circuits (16), each coupled to a respective one of the access ports (17) or each coupled to a respective subset of the access ports (17).
11. An image processing circuit according to Claim 10, wherein the pixel value processing circuits (16) are programmed to access pixel values for the same pixel locations from the window for blocks of mutually different dimensions, and to send mode signals indicating the dimensions of the blocks between the accesses.
12. A method of processing pixel values for pixel locations in an image, using a buffer memory (14) for storing pixel values for pixel locations in a two-dimensional window within the image, the buffer memory (14) comprising a plurality of access ports (17) for accessing pixel values in parallel, each access port (17) providing access to an addressable pixel value from a respective group of pixel values from respective functional rows of the buffer memory (14), the method comprising: sending a signal to the buffer memory (14) to indicate dimensions of a block of pixel locations for which pixel values will be accessed in parallel; setting, for respective ones of the rows, respective amounts of shift of assignment of pixel values from the respective ones of the rows to the groups, the assignments being set dependent on the indicated dimensions, the assignments satisfying a condition that pixel values for respective lines of pixel locations in the block that are stored in different ones of the rows are assigned to mutually non-overlapping groups as result of the amounts of shift; accessing the pixel values for the respective lines in parallel via the access ports (17).
13. A method according to Claim 12, wherein the functional rows each comprise a respective shift register (40), with memory locations (42) for respective pixel values, the method comprising shifting the pixel values through the shift registers (40) to realize said amounts of shift.
PCT/IB2006/051411 2005-05-10 2006-05-04 Image processing circuit with block accessible buffer memory WO2006120620A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP06765667A EP1882235A2 (en) 2005-05-10 2006-05-04 Image processing circuit with block accessible buffer memory
JP2008510698A JP2008541259A (en) 2005-05-10 2006-05-04 Image processing circuit having buffer memory capable of block access

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05103871.9 2005-05-10
EP05103871 2005-05-10

Publications (2)

Publication Number Publication Date
WO2006120620A2 (en) 2006-11-16
WO2006120620A3 (en) 2007-03-08

Family

ID=37086103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/051411 WO2006120620A2 (en) 2005-05-10 2006-05-04 Image processing circuit with block accessible buffer memory

Country Status (4)

Country Link
EP (1) EP1882235A2 (en)
JP (1) JP2008541259A (en)
CN (1) CN101218604A (en)
WO (1) WO2006120620A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010095944A1 (en) 2009-02-20 2010-08-26 Silicon Hive B.V. Multimode accessible storage facility

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633477B (en) * 2017-10-20 2021-04-20 上海兆芯集成电路有限公司 Image processing method and device
CN110610679B (en) * 2019-09-26 2021-04-16 京东方科技集团股份有限公司 Data processing method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020012055A1 (en) * 1999-12-20 2002-01-31 Osamu Koshiba Digital still camera system and method
US20020135586A1 (en) * 2001-01-18 2002-09-26 Lightsurf Technologies, Inc. Programmable sliding window for image processing
US20020171655A1 (en) * 2001-05-18 2002-11-21 Sun Microsystems, Inc. Dirty tag bits for 3D-RAM SRAM

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010095944A1 (en) 2009-02-20 2010-08-26 Silicon Hive B.V. Multimode accessible storage facility
US8762686B2 (en) 2009-02-20 2014-06-24 Intel Corporation Multimode accessible storage facility

Also Published As

Publication number Publication date
WO2006120620A3 (en) 2007-03-08
CN101218604A (en) 2008-07-09
JP2008541259A (en) 2008-11-20
EP1882235A2 (en) 2008-01-30

Similar Documents

Publication Publication Date Title
US7941634B2 (en) Array of processing elements with local registers
US6205533B1 (en) Mechanism for efficient data access and communication in parallel computations on an emulated spatial lattice
US9268746B2 (en) Architecture for vector memory array transposition using a block transposition accelerator
US20100088475A1 (en) Data processing with a plurality of memory banks
RU2006124538A (en) DATA PROCESSING DEVICE AND METHOD FOR MOVING DATA BETWEEN REGISTERS AND MEMORY
JPH09106342A (en) Rearrangement device
KR100874949B1 (en) Single instruction multiple data processor and memory array structure for it
EP1792258B1 (en) Interconnections in simd processor architectures
KR101412392B1 (en) Multimode accessible storage facility
US7355917B2 (en) Two-dimensional data memory
WO2006120620A2 (en) Image processing circuit with block accessible buffer memory
EP2024928B1 (en) Programmable data processing circuit
US5008852A (en) Parallel accessible memory device
US7945760B1 (en) Methods and apparatus for address translation functions
US5381406A (en) Time switching circuit
US9798550B2 (en) Memory access for a vector processor
US6467020B1 (en) Combined associate processor and memory architecture
WO2022047403A1 (en) Memory processing unit architectures and configurations
US7756207B2 (en) Method for pre-processing block based digital data
CN112712457A (en) Data processing method and artificial intelligence processor
CN116150046B (en) Cache circuit
EP0358374B1 (en) Data transfer between memories
CN117785119A (en) Semiconductor device with a semiconductor device having a plurality of semiconductor chips
JP2647380B2 (en) Color image processing equipment
KR19990019195A (en) Output data rearrangement device according to the method of reading frame memory

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase (Ref document number: 2006765667; Country of ref document: EP)
WWE WIPO information: entry into national phase (Ref document number: 2008510698; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
WWW WIPO information: withdrawn in national office (Country of ref document: DE)
NENP Non-entry into the national phase (Ref country code: RU)
WWW WIPO information: withdrawn in national office (Country of ref document: RU)
WWE WIPO information: entry into national phase (Ref document number: 200680025132.3; Country of ref document: CN)
WWP WIPO information: published in national office (Ref document number: 2006765667; Country of ref document: EP)