US6272257B1 - Decoder of variable length codes

Decoder of variable length codes

Info

Publication number
US6272257B1
Authority
US
United States
Prior art keywords
data
instruction
encoded
length
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US09/025,613
Other languages
English (en)
Inventor
Tomasz Thomas Prokop
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPO6489A (AUPO648997A0)
Priority claimed from AUPO6490A (AUPO649097A0)
Priority claimed from AUPO6482A (AUPO648297A0)
Priority claimed from AUPO6487A (AUPO648797A0)
Priority claimed from AUPO6483A (AUPO648397A0)
Priority claimed from AUPO6485A (AUPO648597A0)
Priority claimed from AUPO6488A (AUPO648897A0)
Priority claimed from AUPO6480A (AUPO648097A0)
Priority claimed from AUPO6492A (AUPO649297A0)
Priority claimed from AUPO6491A (AUPO649197A0)
Priority claimed from AUPO6479A (AUPO647997A0)
Priority claimed from AUPO6481A (AUPO648197A0)
Priority claimed from AUPO6484A (AUPO648497A0)
Priority claimed from AUPO6486A (AUPO648697A0)
Application filed by Canon Inc
Assigned to CANON KABUSHIKI KAISHA reassignment CANON KABUSHIKI KAISHA ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: PROKOP, THOMASZ THOMAS
Assigned to CANON KABUSHIKI KAISHA, CANON INFORMATION SYSTEMS RESEARCH AUSTRALIA PTY. LTD. reassignment CANON KABUSHIKI KAISHA RE-RECORD TO ADD THE SECOND ASSIGNEE, PREVIOUSLY RECORDED ON REEL 9373 FRAME 0419, ASSIGNOR CONFIRMS THE ASSIGNMENT OF THE ENTIRE INTEREST. Assignors: PROKOP, THOMASZ THOMAS
Assigned to CANON KABUSHIKI KAISHA reassignment CANON KABUSHIKI KAISHA ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CANON INFORMATION SYSTEMS RESEARCH AUSTRALIA PTY. LTD.
Publication of US6272257B1
Application granted
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
    • G06F9/3879 Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3893 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
    • G06F9/3895 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
    • G06F9/3897 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1075 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers for multiport memories each having random access ports and serial ports, e.g. video RAM
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Definitions

  • the present invention relates to decoders for decoding codes of variable length that are interleaved with variable-length bit fields not being encoded, some of which are passed unchanged through the decoder.
  • the length of each not-encoded bit field is equal to or greater than zero.
  • variable-length coding such as Huffman coding.
  • Huffman coding was originally disclosed by D. A. Huffman in an article “A Method for the Construction of Minimum Redundancy Codes” Proc. IRE , 40: 1098, 1952.
  • variable-length codes in an encoded bit stream are not contiguous, but are interleaved with other not-encoded bit fields.
  • the bit fields may represent control and/or formatting information, and/or provide additional specification for encoded data including marker headers, marker codes, stuff bytes, padding bits and additional bits in, for example, JPEG encoded data.
  • Variable length encoding allocates codes of different lengths to different input data according to the probability of occurrence of the input data, so that statistically more frequent input codes are allocated shorter codes than the less frequent codes. The less frequent input codes are allocated longer codes.
  • the allocation of codes may be done either statically or adaptively. For the static case, the same output code is provided for a given input datum, no matter what block of data is being processed.
  • output codes are assigned to input data based on a statistical analysis of a particular input block or set of blocks of data, and possibly changes from block to block (or from a set of blocks to a set of blocks).
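The allocation of shorter codes to more frequent input data can be illustrated with a minimal sketch in Python. It is not taken from the patent: the function name `huffman_code_lengths` and the sample block are assumptions chosen for illustration, and the construction is the textbook Huffman procedure. Computing the lengths from the statistics of one block loosely corresponds to the adaptive case; reusing one fixed table for every block would correspond to the static case.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. its code grows by one bit.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical input block: 'a' is the most frequent symbol, 'd' the rarest.
block = "aaaaaaaabbbbccd"
lengths = huffman_code_lengths(block)
```

For this sample block the most frequent symbol receives a 1-bit code and the two rarest symbols receive 3-bit codes.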
  • variable-length bit fields that are not encoded are interleaved with encoded data as in, for example, the JPEG standard.
  • Greater difficulty in fast decoding of such variable-length encoded data occurs when the length of a particular not-encoded bit field can be determined only after the preceding (encoded) datum is fully decoded, as in JPEG standard. This generally excludes direct pipelining from being incorporated into a decoder, because the position of the beginning of the next encoded datum is known only after the preceding one is fully decoded.
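The dependency described above can be sketched in a few lines of Python. This is an illustrative toy, not the decoder of the patent: `CODE_TABLE` is a hypothetical prefix code in which each decoded value gives the number of not-encoded bits that follow, loosely modelled on the additional bits of JPEG. The sketch shows that the start position of each code word becomes known only once the preceding code word has been fully decoded.

```python
# Hypothetical toy prefix code: each code word decodes to the number of
# not-encoded ("additional") bits that immediately follow it.
CODE_TABLE = {"0": 0, "10": 1, "110": 2, "111": 3}

def decode_stream(bits):
    """Decode a string of '0'/'1' characters into (size, raw_bits, start) tuples."""
    out = []
    pos = 0
    while pos < len(bits):
        start = pos
        # Step 1: match one variable-length code word, bit by bit.
        code = ""
        while code not in CODE_TABLE:
            if pos >= len(bits):
                raise ValueError("truncated code word")
            code += bits[pos]
            pos += 1
        size = CODE_TABLE[code]
        # Step 2: only now is the length of the not-encoded field known,
        # and therefore the position of the next code word.
        if pos + size > len(bits):
            raise ValueError("truncated not-encoded field")
        raw = bits[pos:pos + size]
        pos += size
        out.append((size, raw, start))
    return out

# Four code words, each followed by its not-encoded bits.
stream = "110" + "01" + "0" + "10" + "1" + "111" + "101"
result = decode_stream(stream)
```

Because `start` for each entry is derived from the previous entry's decoded `size`, the loop iterations cannot simply be overlapped, which is the pipelining obstacle the text refers to.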
  • a preprocessing logic unit for removing a plurality of not-encoded fields of fixed length and outputting the plurality of variable-length code words interleaved with the not-encoded bit fields of variable length, and outputting signals indicating positions of the plurality of not-encoded fields of fixed length in the blocks of data;
  • the apparatus further comprises: a first processing unit comprising a first set of barrel shifters and a first register, wherein the first processing unit processes the outputted plurality of variable-length code words interleaved with the not-encoded bit fields of variable length; and a second processing unit comprising a second set of barrel shifters and a second register, the second processing unit for processing the outputted signals indicating positions of the not-encoded words of fixed length in the blocks of data; wherein the first and second processing units are identical, and the output of the respective barrel shifters and the units receive identical control signals.
  • the output of the second processing unit for processing signals indicating positions of the not-encoded words of fixed length may be used to determine the size of a not-encoded variable length field to be removed from the data stored in a data register for decoding purposes.
  • the preprocessing logic unit for removing the not-encoded fields of fixed length outputs a plurality of variable-length code words interleaved with the not-encoded bit fields of variable length as words of fixed length composed of bit fields of fixed length such that a single bit field has a corresponding tag indicating if the outputted field is passed or removed by the preprocessing unit, or passed by the preprocessing unit and following or preceding a marker being a not-encoded field of fixed length.
  • the blocks of data are encoded using Huffman coding.
  • the method further comprises the steps of: processing the outputted plurality of variable-length code words interleaved with the not-encoded bit fields of variable length using a first processing unit comprising a first set of barrel shifters and a first register; and processing the outputted signals indicating positions of the not-encoded words of fixed length in the blocks of data using a second processing unit comprising a second set of barrel shifters and a second register; wherein the first and second processing units are identical, and the output of the respective barrel shifters and the units receive identical control signals.
  • the method may comprise the step of determining the size of a not-encoded variable length field to be removed from the data stored in a data register for decoding purposes dependent upon the output of the second processing unit for processing signals indicating positions of the not-encoded words of fixed length.
  • the method may further comprise the step of outputting, by the preprocessing logic unit for removing the not-encoded fields of fixed length, a plurality of variable-length code words interleaved with the not-encoded bit fields of variable length as words of fixed length composed of bit fields of fixed length such that a single bit field has a corresponding tag indicating if the outputted field is passed or removed by the preprocessing unit, or passed by the preprocessing unit and following or preceding a marker being a not-encoded field of fixed length.
  • the blocks of data are encoded using Huffman coding.
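The preprocessing step, which removes fixed-length not-encoded fields and tags the data it passes on, can be sketched as follows. This is a simplified software illustration under stated assumptions, not the hardware of the patent: the tag names, the function names, and the one-tag-per-byte granularity are hypothetical, and the only fixed-length fields handled are JPEG-style stuff bytes (0x00 after 0xFF) and two-byte markers (0xFF followed by a non-zero byte).

```python
# Hypothetical tag values, one per input byte.
PASSED = "passed"
REMOVED = "removed"
PASSED_BEFORE_MARKER = "passed_before_marker"

def strip_fixed_fields(data: bytes):
    """Return a list of (byte, tag) pairs, one per input byte."""
    tagged = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0xFF and i + 1 < len(data):
            nxt = data[i + 1]
            if nxt == 0x00:
                # Stuff byte: keep 0xFF as data, drop the 0x00 after it.
                tagged.append((0xFF, PASSED))
                tagged.append((0x00, REMOVED))
            else:
                # Marker: drop both bytes and flag the last passed byte
                # before it, so a later stage knows where the marker was.
                for j in range(len(tagged) - 1, -1, -1):
                    if tagged[j][1] != REMOVED:
                        tagged[j] = (tagged[j][0], PASSED_BEFORE_MARKER)
                        break
                tagged.append((0xFF, REMOVED))
                tagged.append((nxt, REMOVED))
            i += 2
        else:
            tagged.append((b, PASSED))
            i += 1
    return tagged

def passed_bytes(tagged):
    """Collect the bytes that survive stripping."""
    return bytes(b for b, tag in tagged if tag != REMOVED)

# Sample input: data, a stuffed 0xFF, data, a marker (0xFF 0xD0), data.
sample = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0, 0x56])
tagged = strip_fixed_fields(sample)
clean = passed_bytes(tagged)
```

Keeping a tag for every input position, rather than only for the surviving bytes, mirrors the idea of a second stream of position signals that travels alongside the data.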
  • the reader's attention is directed, in particular, to FIGS. 82 to 91 and their associated description without intending to detract from the disclosure of the remainder of the description.
  • FIG. 1 illustrates the operation of a raster image co-processor within a host computer environment
  • FIG. 2 illustrates the raster image co-processor of FIG. 1 in further detail
  • FIG. 3 illustrates the memory map of the raster image co-processor
  • FIG. 4 shows the relationship between a CPU, instruction queue, instruction operands and results in shared memory, and a co-processor
  • FIG. 5 shows the relationship between an instruction generator, memory manager, queue manager and co-processor
  • FIG. 6 shows the operation of the graphics co-processor reading instructions for execution from the pending instruction queue and placing them on the completed instruction queue
  • FIG. 7 shows a fixed length circular buffer implementation of the instruction queue, indicating the need to wait when the buffer fills
  • FIG. 8 illustrates two instruction execution streams as utilized by the co-processor
  • FIG. 9 illustrates an instruction execution flow chart
  • FIG. 10 illustrates the standard instruction word format utilized by the co-processor
  • FIG. 11 illustrates the instruction word fields of a standard instruction
  • FIG. 12 illustrates the data word fields of a standard instruction
  • FIG. 13 illustrates schematically the instruction controller of FIG. 2
  • FIG. 14 illustrates the execution controller of FIG. 13 in more detail
  • FIG. 15 illustrates a state transition diagram of the instruction controller
  • FIG. 16 illustrates the instruction decoder of FIG. 13
  • FIG. 17 illustrates the instruction sequencer of FIG. 16 in more detail
  • FIG. 18 illustrates a transition diagram for the ID sequencer of FIG. 16
  • FIG. 19 illustrates schematically the prefetch buffer controller of FIG. 13 in more detail
  • FIG. 20 illustrates the standard form of register storage and module interaction as utilized in the co-processor
  • FIG. 21 illustrates the format of control bus transactions as utilized in the co-processor
  • FIG. 22 illustrates the data flow through a portion of the co-processor
  • FIGS. 23-29 illustrate various examples of data reformatting as utilized in the co-processor
  • FIGS. 30 and 31 illustrate the format conversions carried out by the co-processor
  • FIG. 32 illustrates the process of input data transformation as carried out in the co-processor
  • FIGS. 33-41 illustrate various further data transformations as carried out by the co-processor
  • FIG. 42 illustrates various internal to output data transformations carried out by the co-processor
  • FIGS. 43-47 illustrate various further example data transformations carried out by the co-processor
  • FIG. 48 illustrates various fields utilized by internal registers to determine what data transformations should be carried out
  • FIG. 49 depicts a block diagram of a graphics subsystem that uses data normalization.
  • FIG. 50 illustrates a circuit diagram of a data normalization apparatus
  • FIG. 51 illustrates the pixel processing carried out for compositing operations
  • FIG. 52 illustrates the instruction word format for compositing operations
  • FIG. 53 illustrates the data word format for compositing operations
  • FIG. 54 illustrates the instruction word format for tiling operations
  • FIG. 55 illustrates the operation of a tiling instruction on an image
  • FIG. 56 illustrates the process of utilization of interval and fractional tables to re-map color gamuts
  • FIG. 57 illustrates the form of storage of interval and fractional tables within the MUV buffer of the co-processor
  • FIG. 58 illustrates the process of color conversion utilising interpolation as carried out in the co-processor
  • FIG. 59 illustrates the refinements to the rest of the color conversion process at gamut edges as carried out by the co-processor
  • FIG. 60 illustrates the process of color space conversion for one output color as implemented in the co-processor
  • FIG. 61 illustrates the memory storage within a cache of the co-processor when utilising single color output color space conversion
  • FIG. 62 illustrates the methodology utilized for multiple color space conversion
  • FIG. 63 illustrates the process of address re-mapping for the cache when utilized during the process of multiple color space conversion
  • FIG. 64 illustrates the instruction word format for color space conversion instructions
  • FIG. 65 illustrates a method of multiple color conversion
  • FIGS. 66 and 67 illustrate the formation of MCU's during the process of JPEG conversion as carried out in the co-processor
  • FIG. 68 illustrates the structure of the JPEG coder of the co-processor
  • FIG. 69 illustrates the quantizer portion of FIG. 68 in more detail
  • FIG. 70 illustrates the Huffman coder of FIG. 68 in more detail
  • FIGS. 71 and 72 illustrate the Huffman coder and decoder in more detail
  • FIGS. 73-75 illustrate the process of cutting and limiting of JPEG data as utilized in the co-processor
  • FIG. 76 illustrates the instruction word format for JPEG instructions
  • FIG. 77 shows a block diagram of a typical discrete cosine transform apparatus (prior art).
  • FIG. 78 illustrates an arithmetic data path of a prior art DCT apparatus
  • FIG. 79 shows a block diagram of a DCT apparatus utilized in the co-processor
  • FIG. 80 depicts a block diagram of the arithmetic circuit of FIG. 79 in more detail
  • FIG. 81 illustrates an arithmetic data path of the DCT apparatus of FIG. 79
  • FIG. 82 presents a representational stream of Huffman-encoded data units interleaved with not encoded bit fields, both byte aligned and not, as in JPEG format;
  • FIG. 83 illustrates the overall architecture of a Huffman decoder of JPEG data of FIG. 84 in more detail
  • FIG. 84 illustrates the overall architecture of the Huffman decoder of JPEG data
  • FIG. 85 illustrates data processing in the stripper block which removes byte aligned not encoded bit fields from the input data. Examples of the coding of tags corresponding to the data outputted by the stripper are also shown;
  • FIG. 86 shows the organization and the data flow in the data preshifter
  • FIG. 87 shows control logic for the decoder of FIG. 81;
  • FIG. 88 shows the organization and the data flow in the marker preshifter
  • FIG. 89 shows a block diagram of a combinatorial unit decoding Huffman encoded values in JPEG context
  • FIG. 90 illustrates the concept of a padding zone and a block diagram of the decoder of padding bits
  • FIG. 91 shows an example of a format of data outputted by the decoder, the format being used in the co-processor
  • FIG. 92 illustrates methodology utilized in image transformation instructions
  • FIG. 93 illustrates the instruction word format for image transformation instructions
  • FIGS. 94 and 95 illustrate the format of an image transformation kernel as utilized in the co-processor
  • FIG. 96 illustrates the process of utilising an index table for image transformations as utilized in the co-processor
  • FIG. 97 illustrates the data field format for instructions utilising transformations and convolutions
  • FIG. 98 illustrates the process of interpretation of the bp field of instruction words
  • FIG. 99 illustrates the process of convolution as utilized in the co-processor
  • FIG. 100 illustrates the instruction word format for convolution instructions as utilized in the co-processor
  • FIG. 101 illustrates the instruction word format for matrix multiplication as utilized in the co-processor
  • FIGS. 102-105 illustrate the process utilized for hierarchical image manipulation as utilized in the co-processor
  • FIG. 106 illustrates the instruction word coding for hierarchical image instructions
  • FIG. 107 illustrates the instruction word coding for flow control instructions as illustrated in the co-processor
  • FIG. 108 illustrates the pixel organizer in more detail
  • FIG. 109 illustrates the operand fetch unit of the pixel organizer in more detail
  • FIGS. 110-114 illustrate various storage formats as utilized by the co-processor
  • FIG. 115 illustrates the MUV address generator of the pixel organizer of the co-processor in more detail
  • FIG. 116 is a block diagram of a multiple value (MUV) buffer utilized in the co-processor
  • FIG. 117 illustrates a structure of the encoder of FIG. 116
  • FIG. 118 illustrates a structure of the decoder of FIG. 116
  • FIG. 119 illustrates a structure of an address generator of FIG. 116 for generating read addresses when in JPEG mode (pixel decomposition);
  • FIG. 120 illustrates a structure of an address generator of FIG. 116 for generating read addresses when in JPEG mode (pixel reconstruction);
  • FIG. 121 illustrates an organization of memory modules comprising the storage device of FIG. 116;
  • FIG. 122 illustrates a structure of a circuit that multiplexes read addresses to memory modules
  • FIG. 123 illustrates a representation of how lookup table entries are stored in the buffer operating in a single lookup table mode
  • FIG. 124 illustrates a representation of how lookup table entries are stored in the buffer operating in a multiple lookup table mode
  • FIG. 125 illustrates a representation of how pixels are stored in the buffer operating in JPEG mode (pixel decomposition);
  • FIG. 126 illustrates a representation of how single color data blocks are retrieved from the buffer operating in JPEG mode (pixel reconstruction);
  • FIG. 127 illustrates the structure of the result organizer of the co-processor in more detail
  • FIG. 128 illustrates the structure of the operand organizers of the co-processor in more detail
  • FIG. 129 is a block diagram of a computer architecture for the main data path unit utilized in the co-processor;
  • FIG. 130 is a block diagram of an input interface for accepting, storing and rearranging input data objects for further processing
  • FIG. 131 is a block diagram of an image data processor for performing arithmetic operations on incoming data objects
  • FIG. 132 is a block diagram of a color channel processor for performing arithmetic operations on one channel of the incoming data objects
  • FIG. 133 is a block diagram of a multifunction block in a color channel processor
  • FIG. 134 illustrates a block diagram for compositing operations
  • FIG. 135 shows an inverse transform of the scanline
  • FIG. 136 shows a block diagram of the steps required to calculate the value for a destination pixel
  • FIG. 137 illustrates a block diagram of the image transformation engine
  • FIG. 138 illustrates the two formats of kernel descriptions
  • FIG. 139 shows the definition and interpretation of a bp field
  • FIG. 140 shows a block diagram of multiplier-adders that perform matrix multiplication
  • FIG. 141 illustrates the control, address and data flow of the cache and cache controller of the co-processor
  • FIG. 142 illustrates the memory organization of the cache
  • FIG. 143 illustrates the address format for the cache controller of the co-processor
  • FIGS. 144 (A,B) is a block diagram of a multifunction block in a color channel processor
  • FIG. 144 illustrates a block diagram of the cache and cache controller;
  • FIG. 145 illustrates the input interface switch of the co-processor in more detail
  • FIG. 146 illustrates a four-port dynamic local memory controller of the co-processor showing the main address and data paths
  • FIG. 147 illustrates a state machine diagram for the controller of FIG. 146
  • FIG. 148 is a pseudo code listing detailing the function of the arbitrator of FIG. 146;
  • FIG. 149 depicts the structure of the requester priority bits and the terminology used in FIG. 146.
  • FIG. 150 illustrates the external interface controller of the co-processor in more detail
  • FIGS. 151-154 illustrate the process of virtual to/from physical address mapping as utilized by the co-processor
  • FIG. 155 illustrates the IBus receiver unit of FIG. 150 in more detail
  • FIGS. 156 (A,B) illustrate the RBus receiver unit of FIG. 2 in more detail
  • FIG. 157 illustrates the memory management unit of FIG. 150 in more detail
  • FIG. 158 illustrates the peripheral interface controller of FIG. 2 in more detail.
  • Table 14 Huffman and Quantization Tables as stored in Data Cache
  • a substantial advantage is gained in hardware rasterization by means of utilization of two independent instruction streams by a hardware accelerator.
  • the first instruction stream can be preparing a current page for printing
  • a subsequent instruction stream can be preparing the next page for printing.
  • a high utilization of hardware resources is available especially where the hardware accelerator is able to work at a speed substantially faster than the speed of the output device.
  • the preferred embodiment describes an arrangement utilising two instruction streams. However, arrangements having further instruction streams can be provided where the hardware trade-offs dictate that substantial advantages can be obtained through the utilization of further streams.
  • the utilization of two streams allows the hardware resources of the raster image co-processor to be kept fully engaged in preparing subsequent pages or bands, strips, etc., depending on the output printing device while a present page, band, etc is being forwarded to a print device.
  • the arrangement 201 includes a standard host computer system which takes the form of a host CPU 202 interconnected to its own memory store (RAM) 203 via a bridge 204.
  • the host computer system provides all the normal facilities of a computer system including operating systems programs, applications, display of information, etc.
  • the host computer system is connected to a standard PCI bus 206 via a PCI bus interface 207.
  • the PCI standard is a well known industry standard and most computer systems sold today, particularly those running Microsoft Windows (trade mark) operating systems, normally come equipped with a PCI bus 206.
  • the PCI bus 206 allows the arrangement 201 to be expanded by means of the addition of one or more PCI cards, e.g. 209, each of which contains a further PCI bus interface 210 and other devices 211 and local memory 212 for utilization in the arrangement 201.
  • a raster image accelerator card 220 to assist in the speeding up of graphical operations expressed in a page description language.
  • the raster image accelerator card 220 (also having a PCI bus interface 221) is designed to operate in a loosely coupled, shared memory manner with the host CPU 202 in the same manner as other PCI cards 209. It is possible to add further image accelerator cards 220 to the host computer system as required.
  • the raster image accelerator card is designed to accelerate those operations that form the bulk of the execution complexity in raster image processing operations. These can include:
  • the raster image accelerator card 220 further includes its own local memory 223 connected to a raster image co-processor 224 which operates the raster image accelerator card 220 generally under instruction from the host CPU 202.
  • the co-processor 224 is preferably constructed as an Application Specific Integrated Circuit (ASIC) chip.
  • the raster image co-processor 224 includes the ability to control at least one printer device 226 as required via a peripheral interface 225.
  • the image accelerator card 220 may also control any input/output device, including scanners. Additionally, there is provided on the accelerator card 220 a generic external interface 227 connected with the raster image co-processor 224 for its monitoring and testing.
  • the host CPU 202 sends, via PCI bus 206, a series of instructions and data for the creation of images by the raster image co-processor 224.
  • the data can be stored in the local memory 223 in addition to a cache 230 in the raster image co-processor 224 or in registers 229 also located in the co-processor 224.
  • the co-processor 224 is responsible for the acceleration of the aforementioned operations and consists of a number of components generally under the control of an instruction controller 235.
  • a local memory controller 236 for communications with the local memory 223 of FIG. 1.
  • a peripheral interface controller 237 is also provided for the communication with printer devices utilising standard formats such as the Centronics interface standard format or other video interface formats.
  • the peripheral interface controller 237 is interconnected with the local memory controller 236.
  • Both the local memory controller 236 and the external interface controller 238 are connected with an input interface switch 252 which is in turn connected to the instruction controller 235.
  • the input interface switch 252 is also connected to a pixel organizer 246 and a data cache controller 240.
  • the input interface switch 252 is provided for switching data from the external interface controller 238 and local memory controller 236 to the instruction controller 235, the data cache controller 240 and the pixel organizer 246 as required.
  • the external interface controller 238 is provided in the raster image co-processor 224 and is connected to the instruction controller 235.
  • a miscellaneous module 239 which is also connected to the instruction controller 235 and which deals with interactions with the co-processor 224 for purposes of test diagnostics and the provision of clocking and global signals.
  • the data cache 230 operates under the control of the data cache controller 240 with which it is interconnected.
  • the data cache 230 is utilized in various ways, primarily to store recently used values that are likely to be subsequently utilized by the co-processor 224 .
  • the aforementioned acceleration operations are carried out on plural streams of data primarily by a JPEG coder/decoder 241 and a main data path unit 242 .
  • the units 241 , 242 are connected in parallel arrangement to all of the pixel organizer 246 and two operand organizers 247 , 248 .
  • the processed streams from units 241 , 242 are forwarded to a results organizer 249 for processing and reformatting where required. Often, it is desirable to store intermediate results close at hand.
  • a multi-used value buffer 250 is provided, interconnected between the pixel organizer 246 and the result organizer 249 , for the storage of intermediate data.
  • the result organizer 249 outputs to the external interface controller 238 , the local memory controller 236 and the peripheral interface controller 237 as required.
  • a further (third) data path unit 243 can, if required, be connected “in parallel” with the two other data paths in the form of JPEG coder/decoder 241 and the main data path unit 242 .
  • the extension to 4 or more data paths is achieved in the same way. Although the paths are “parallel” connected, they do not operate in parallel. Instead only one path at a time operates.
  • the overall ASIC design of FIG. 2 has been developed in the following manner. Firstly, in printing pages it is necessary that there not be even small or transient artefacts. This is because whilst in video signal creation for example, such small errors if present may not be apparent to the human eye (and hence be unobservable), in printing any small artefact appears permanently on the printed page and can sometimes be glaringly obvious. Further, any delay in the signal reaching the printer can be equally disastrous resulting in white, unprinted areas on a page as the page continues to move through the printer. It is therefore necessary to provide results of very high quality, very quickly and this is best achieved by a hardware rather than a software solution.
  • the first step was the realization that in image manipulation often repetitive calculations of the same basic type were required to be carried out.
  • a calculating unit could be configured to carry out a specific type of calculation, a long stream of data processed and then the calculating unit could be reconfigured for the next type of calculation step required. If the data streams were reasonably long, then the time required for reconfiguration would be negligible compared to the total calculation time and thus throughput would be enhanced.
  • the provision of plural data processing paths means that in the event that one path is being reconfigured whilst the other path is being used, then there is substantially no loss of calculating time due to the necessary reconfiguration.
  • the main data path unit 242 carries out a more general calculation and the other data path(s) carry out more specialized calculation such as JPEG coding and decoding as in unit 241 or, if additional unit 243 is provided, it can provide entropy and/or Huffman coding/decoding.
  • the fetching and presenting of data to the calculating unit can be proceeding. This process can be further speeded up, and hardware resources better utilized, if the various types of data are standardized or normalized in some way. Thus the total overhead involved in fetching and despatching data can be reduced.
  • the co-processor 224 operates under the control of host CPU 202 (FIG. 1 ).
  • the instruction controller 235 is responsible for the overall control of the co-processor 224 .
  • the instruction controller 235 operates the co-processor 224 by means of utilising a control bus 231 , hereinafter known as the CBus.
  • the CBus 231 is connected to each of the modules 236 - 250 inclusive to set registers ( 229 of FIG. 1) within each module so as to achieve overall operation of the co-processor 224 .
  • the interconnection of the control bus 231 to each of the modules 236 - 250 is omitted from FIG. 2 .
  • the layout 260 includes registers 261 dedicated to the overall control of the co-processor 224 and its instruction controller 235 .
  • the co-processor modules 236 - 250 include similar registers 262 .
  • Modern computer systems typically require some method of memory management to provide for dynamic memory allocation.
  • some method is necessary to synchronize between the dynamic allocation of memory and the use of that memory by a co-processor.
  • a computer hardware configuration has both a CPU and a specialized co-processor, each sharing a bank of memory.
  • the CPU is the only entity in the system capable of allocating memory dynamically. Once allocated by the CPU for use by the co-processor, this memory can be used freely by the co-processor until it is no longer required, at which point it is available to be freed by the CPU. This implies that some form of synchronization is necessary between the CPU and the co-processor in order to ensure that the memory is released only after the co-processor is finished using it. There are several possible solutions to this problem but each has undesirable performance implications.
  • statically allocated memory avoids the need for synchronization, but prevents the system from adjusting its memory resource usage dynamically. Similarly, having the CPU block and wait until the co-processor has finished performing each operation is possible, but this substantially reduces parallelism and hence reduces overall system performance.
  • Using interrupts to indicate completion of operations by the co-processor is also possible, but this imposes significant processing overhead if co-processor throughput is very high.
  • the preferred arrangement for synchronising the (host) CPU and the co-processor is illustrated in FIG. 4, where the reference numerals used are those already utilized in the previous description of FIG. 1 .
  • the CPU 202 is responsible for all memory management in the system. It allocates memory 203 both for its own uses, and for use by the co-processor 224 .
  • the co-processor 224 has its own graphics-specific instruction set, and is capable of executing instructions 1022 from the memory 203 which is shared with the host processor 202 . Each of these instructions can also write results 1024 back to the shared memory 203 , and can read operands 1023 from the memory 203 as well.
  • the amount of memory 203 required to store operands 1023 and results 1024 of co-processor instructions varies according to the complexity and type of the particular operation.
  • the CPU 202 is also responsible for generating the instructions 1022 executed by the co-processor 224 .
  • instructions generated by the CPU 202 are queued as indicated at 1022 for execution by the co-processor 224 .
  • Each instruction in the queue 1022 can reference operands 1023 and results 1024 in the shared memory 203 , which has been allocated by the host CPU 202 for use by the co-processor 224 .
  • the method utilizes an interconnected instruction generator 1030 , memory manager 1031 and queue manager 1032 , as shown in FIG. 5 . All these modules execute in a single process on the host CPU 202 .
  • Instructions for execution by the co-processor 224 are generated by the instruction generator 1030 , which uses the services of the memory manager 1031 to allocate space for the operands 1023 and results 1024 of the instructions being generated.
  • the instruction generator 1030 also uses the services of the queue manager 1032 to queue the instructions for execution by the co-processor 224 .
  • the CPU 202 can free the memory which was allocated by the memory manager 1031 for use by the operands of that instruction.
  • the result of one instruction can also become an operand for a subsequent instruction, after which its memory can also be freed by the CPU.
  • the system frees the resources needed by each instruction via a cleanup function which runs at some stage after the co-processor 224 has completed the instruction. The exact time at which these cleanups occur depends on the interaction between the memory manager 1031 and the queue manager 1032 , and allows the system to adapt dynamically according to the amount of system memory available and the amount of memory required by each co-processor instruction.
  • FIG. 6 schematically illustrates the implementation of the co-processor instruction queue 1022 . Instructions are inserted into a pending instruction queue 1040 by the host CPU 202 , and are read by the co-processor 224 for execution. After execution by the co-processor 224 , the instructions remain on a cleanup queue 1041 , so that the CPU 202 can release the resources that the instructions required after the co-processor 224 has finished executing them.
  • the instruction queue 1022 itself can be implemented as a fixed or dynamically sized circular buffer.
  • the instruction queue 1022 decouples the generation of instructions by the CPU 202 from their execution by the co-processor 224 .
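The queue arrangement of FIG. 6 can be sketched as a small model. This is an illustrative sketch only: the class name `InstructionQueue`, its method names and the use of a Python list for the buffer are assumptions, not details taken from the description. It shows a fixed-size circular buffer in which executed instructions stay in place on a cleanup region until the host releases their resources.

```python
class InstructionQueue:
    """Toy model of a fixed-size circular instruction buffer (hypothetical names).

    Slots from cleanup_head up to pending_head hold executed instructions
    awaiting cleanup; slots from pending_head up to tail are still pending.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.cleanup_head = 0   # oldest executed instruction not yet cleaned up
        self.pending_head = 0   # next instruction the co-processor will read
        self.tail = 0           # next free slot for the host CPU
        self.pending_count = 0
        self.cleanup_count = 0

    def insert(self, instruction):
        """Host side: queue an instruction. Returns False when the buffer is full."""
        if self.pending_count + self.cleanup_count == len(self.slots):
            return False
        self.slots[self.tail] = instruction
        self.tail = (self.tail + 1) % len(self.slots)
        self.pending_count += 1
        return True

    def execute_next(self):
        """Co-processor side: take the next pending instruction, or None if idle."""
        if self.pending_count == 0:
            return None
        instruction = self.slots[self.pending_head]
        self.pending_head = (self.pending_head + 1) % len(self.slots)
        self.pending_count -= 1
        self.cleanup_count += 1   # stays in the buffer until the host cleans up
        return instruction

    def cleanup(self, release):
        """Host side: call release() on each executed instruction and free its slot."""
        freed = 0
        while self.cleanup_count:
            release(self.slots[self.cleanup_head])
            self.slots[self.cleanup_head] = None
            self.cleanup_head = (self.cleanup_head + 1) % len(self.slots)
            self.cleanup_count -= 1
            freed += 1
        return freed
```

In this model a slot is reusable only after `cleanup` has run, which mirrors the point that an executed instruction still holds resources until the CPU releases them.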
  • Operand and result memory for each instruction is allocated by the memory manager 1031 (FIG. 5) in response to requests from the instruction generator 1030 during instruction generation. It is the allocation of this memory for newly generated instructions which triggers the interaction between the memory manager 1031 and the queue manager 1032 described below, and allows the system to adapt automatically to the amount of memory available and the complexity of the instructions involved.
  • the instruction queue manager 1032 is capable of waiting for the co-processor 224 to complete the execution of any given instruction which has been generated by the instruction generator 1030 .
  • By providing a sufficiently large instruction queue 1022 and sufficient memory 203 for allocation by the memory manager 1031 , it becomes possible to avoid having to wait for the co-processor 224 at all, or at least until the very end of the entire instruction sequence, which can be several minutes on a very large job.
  • peak memory usage can easily exceed the memory available, and at this point the interaction between the queue manager 1032 and the memory manager 1031 comes into play.
  • the instruction queue manager 1032 can be instructed at any time to “cleanup” the completed instructions by releasing the memory that was dynamically allocated for them. If the memory manager 1031 detects that available memory is either running low or is exhausted, its first recourse is to instruct the queue manager 1032 to perform such a cleanup in an attempt to release some memory which is no longer in use by the co-processor 224 . This can allow the memory manager 1031 to satisfy a request from the instruction generator 1030 for memory required by a newly generated instruction, without the CPU 202 needing to wait for, or synchronize with, the co-processor 224 .
  • the memory manager 1031 can request that the queue manager 1032 wait for a fraction, say half, of the outstanding instructions on the pending instruction queue 1040 to complete. This will cause the CPU 202 processing to block until some of the co-processor 224 instructions have been completed, at which point their operands can be freed, which can release sufficient memory to satisfy the request. Waiting for only a fraction of the outstanding instructions ensures that the co-processor 224 is kept busy by maintaining at least some instructions in its pending instruction queue 1040 . In many cases the cleanup from the fraction of the pending instruction queue 1040 that the CPU 202 waits for, releases sufficient memory for the memory manager 1031 to satisfy the request from the instruction generator 1030 .
  • the final recourse of the memory manager 1031 is to wait until all pending co-processor instructions have completed. This should release sufficient resources to satisfy the request of the instruction generator 1030 , except in the case of extremely large and complex jobs which exceed the system's present memory capacity altogether.
  • the system effectively tunes itself to maximize throughput for the given amount of memory 203 available to the system. More memory results in less need for synchronization and hence greater throughput. Less memory requires the CPU 202 to wait more often for the co-processor 224 to finish using the scarce memory 203 , thereby yielding a system which still functions with minimal memory available, but at a lower performance.
  • the steps taken by the memory manager 1031 when attempting to satisfy a request from the instruction generator 1030 are summarized below. Each step is tried in sequence, after which the memory manager 1031 checks to see if sufficient memory 203 has been made available to satisfy the request. If so, it stops because the request can be satisfied; otherwise it proceeds to the next step in a more aggressive attempt to satisfy the request:
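The escalation described in the preceding paragraphs (clean up completed instructions, then wait for a fraction of the pending instructions, then wait for all of them) can be sketched as follows. This is a minimal simulation under stated assumptions: `Memory`, `QueueManager`, `allocate` and the idea of representing each instruction by the number of bytes it holds are all hypothetical, and "waiting" here completes instantly instead of blocking on real hardware.

```python
import math


class Memory:
    """Hypothetical stand-in for the shared memory pool."""

    def __init__(self, total):
        self.free = total


class QueueManager:
    """Toy queue manager: each queued instruction is just the byte count it holds."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = []     # instructions not yet executed
        self.completed = []   # executed, memory not yet released

    def schedule(self, size):
        self.pending.append(size)

    def cleanup(self):
        # Release the memory held by every completed instruction.
        self.memory.free += sum(self.completed)
        self.completed.clear()

    def wait_for_fraction(self, fraction):
        # Simulate the co-processor finishing the oldest pending instructions.
        n = math.ceil(len(self.pending) * fraction)
        self.completed += self.pending[:n]
        del self.pending[:n]
        self.cleanup()

    def wait_for_all(self):
        self.wait_for_fraction(1.0)


def allocate(size, memory, queue):
    """Try each step in turn; return the step index that succeeded, or None."""
    steps = (
        lambda: None,                          # 0: memory may already be free
        queue.cleanup,                         # 1: clean up completed instructions
        lambda: queue.wait_for_fraction(0.5),  # 2: wait for about half, then clean up
        queue.wait_for_all,                    # 3: wait for everything pending
    )
    for level, step in enumerate(steps):
        step()
        if memory.free >= size:
            memory.free -= size
            return level
    return None
```

Returning the step index makes it easy to see how aggressive the manager had to be; a `None` result corresponds to a job that exceeds the memory capacity altogether.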
  • the queue manager 1032 can also initiate a synchronization with the co-processor 224 in the case where space in a fixed-length instruction queue buffer 1050 is exhausted. Such a situation is depicted in FIG. 7 .
  • the pending instructions queue 1040 is ten instructions in length.
  • the latest instruction to be added to the queue 1040 has the highest occupied number.
  • the next instruction to be input to the co-processor 224 is waiting at position zero.
  • the queue manager 1032 will also wait for, say, half the pending instructions to be completed by the co-processor 224 . This delay normally allows sufficient space in the instruction queue 1040 to be freed for new instructions to be inserted by the queue manager 1032 .
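The full-queue case of FIG. 7 can be sketched as below. This is an assumption-laden illustration: `schedule_instruction` and the `wait_for_completion` callback are hypothetical names, and the queue is modelled with a `deque` whose left end corresponds to position zero (the next instruction to be input to the co-processor).

```python
from collections import deque

QUEUE_LENGTH = 10  # matches the ten-entry pending queue of the example


def schedule_instruction(pending, instruction, wait_for_completion):
    """Append an instruction; if the queue is full, first wait for half of it."""
    if len(pending) >= QUEUE_LENGTH:
        # Wait for the oldest half so the co-processor still has work queued.
        for _ in range(len(pending) // 2):
            wait_for_completion(pending.popleft())
    pending.append(instruction)
```

Waiting for only half of the entries keeps the remaining instructions available to the co-processor while freeing room for new ones.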
  • the method used by the queue manager 1032 when scheduling new instructions is as follows:
  • the method used by the queue manager 1032 when asked to wait for a given instruction is as follows:
  • the method used by the instruction generator 1030 when issuing new instructions is as follows:
  • CALL SCHEDULE_INSTRUCTION submit the co-processor instructions to the queue manager for execution.
  • the co-processor 224 maintains various registers 261 for the execution of each instruction stream.
  • Table 1 sets out the name, type and description of each of the registers utilized by the co-processor 224 while Appendix B sets out the structure of each field of each register.
  • Name | Type | Description
  • eic_mmu_v | Status | Most recent page table physical address fetched by MMU.
  • eic_ip_addr | Status | Physical address for most recent IBus access to the PCI Bus.
  • eic_rp_addr | Status | Physical address for most recent RBus access to the PCI Bus.
  • eic_ig_addr | Status | Address for most recent IBus access to the Generic Bus.
  • eic_rg_data | Status | Address for most recent RBus access to the Generic Bus.
  • pci_external_cfg | Status | 32-bit field downloaded at reset from an external serial ROM. Has no influence on coprocessor operation.
  • Todo Register (ic_tda and ic_tdb). This pair of registers each contains a sequence number counting queued instructions.
  • Interrupt Register (ic_inta and ic_intb). This pair of registers each contains a sequence number at which to interrupt.
  • Interrupt Status Registers (ic_stat.a_primed and ic_stat.b_primed). This pair of registers each contains a primed bit which is a flag enabling the interrupt following a match of the Interrupt and Finished Registers. This bit appears alongside other interrupt enable bits and other status/configuration information in the Interrupt Status (ic_stat) register.
  • Register Access Semaphores (ic_sema and ic_semb).
  • the host CPU 202 must obtain this semaphore before attempting register accesses to the co-processor 224 that requires atomicity, ie. more than one register write. Any register accesses not requiring atomicity can be performed at any time.
  • a side effect of the host CPU 202 obtaining this semaphore is that co-processor execution pauses once the currently executing instruction has completed.
  • the Register Access Semaphore is implemented as one bit of the configuration/status register of the co-processor 224 . These registers are stored in the Instruction Controller's own register area.
  • each sub-module of the co-processor has its own set of configuration and status registers. These registers are set in the course of regular instruction execution. All of these registers appear in the register map and many are modified implicitly as part of instruction execution. These are all visible to the host via the register map.
  • the co-processor 224 in order to maximize the utilization of its resources and to provide for rapid output on any external peripheral device, executes one of two independent instruction streams.
  • one instruction stream is associated with a current output page required by an output device in a timely manner, while the second instruction stream utilizes the modules of the co-processor 224 when the other instruction stream is dormant.
  • the overriding imperatives are to provide the required output data in a timely manner whilst simultaneously attempting to maximize the use of resources for the preparation of subsequent pages, bands, etc.
  • the co-processor 224 is therefore designed to execute two completely independent but identically implemented instruction streams (hereafter termed A and B).
  • the instructions are preferably generated by software running on the host CPU 202 (FIG.
  • One of the instruction streams (stream A) operates at a higher priority than the other instruction stream (stream B) during normal operation.
  • the stream or queue of instructions is written into a buffer or list of buffers within the host RAM 203 (FIG. 1) by the host CPU 202 .
  • the buffers are allocated at start-up time and locked into the physical memory of the host 203 for the duration of the application.
  • Each instruction is preferably stored in the virtual memory environment of the host RAM 203 and the raster image co-processor 224 utilizes a virtual to physical address translation scheme to determine a corresponding physical address within the host RAM 203 for the location of a next instruction.
  • These instructions may alternatively be stored in the local memory of the co-processor 224 .
  • In FIG. 8 there is illustrated the format of two instruction streams A and B 270 , 271 which are stored within the host RAM 203 .
  • the format of each of the streams A and B is substantially identical.
  • the execution model for the co-processor 224 consists of:
  • Either stream can have priority, or priority can be by way of “round robin”.
  • Either stream can be “locked” in, ie. guaranteed to be executed regardless of stream priorities or availability of instructions on the other stream.
  • Either stream can be empty.
  • Either stream can be disabled.
  • Either stream can contain instructions that can be “overlapped”, ie. execution of the instruction can be overlapped with that of the following instruction if the following instruction is not also “overlapped”.
  • Each instruction has a “unique” 32 bit incrementing sequence number.
  • Each instruction can be coded to cause an interrupt, and/or a pause in instruction execution.
  • Instructions can be speculatively prefetched to minimize the impact of external interface latency.
  • the instruction controller 235 is responsible for implementing the co-processor's instruction execution model, maintaining overall executive control of the co-processor 224 and fetching instructions from the host RAM 203 when required. On a per instruction basis, the instruction controller 235 carries out the instruction decoding and configures the various registers within the modules via CBus 231 to force the corresponding modules to carry out that instruction.
  • the instruction execution cycle consists of four main stages 276 - 279 .
  • the first stage 276 is to determine if an instruction is pending on any instruction stream. If this is the case, an instruction is fetched 277 , decoded and executed 278 by means of updating registers 279 .
  • the instruction controller 235 will “spin” or idle until a pending instruction is found.
  • the Instruction Controller 235 fetches the instruction using the address in the corresponding instruction pointer register (ic_ipa or ic_ipb). However, the Instruction Controller 235 does not fetch an instruction if a valid instruction already exists in a prefetch buffer stored within the instruction controller 235 .
  • a valid instruction is in the prefetch buffer if:
  • the prefetch buffer is valid
  • the instruction in the prefetch buffer is from the same stream as the currently active stream.
  • the validity of the contents of the prefetch buffer is indicated by a prefetch bit in the ic_stat register, which is set on a successful instruction prefetch. Any external write to any of the registers of the instruction controller 235 causes the contents of the prefetch buffer to be invalidated.
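The prefetch-buffer rule above can be sketched as a small model. The names `PrefetchBuffer`, `fetch` and `external_register_write` are hypothetical, as is the choice to clear the valid flag once a prefetched instruction has been consumed; the description only states when the buffer is valid and that an external register write invalidates it.

```python
from dataclasses import dataclass


@dataclass
class PrefetchBuffer:
    valid: bool = False        # models the prefetch bit in ic_stat
    stream: str = ""           # "A" or "B"
    instruction: object = None


def fetch(active_stream, prefetch, read_memory, instruction_pointer):
    """Return the next instruction for the active stream.

    The prefetched instruction is used only if the buffer is valid and was
    filled from the currently active stream; otherwise memory is read at the
    address held in that stream's instruction pointer (ic_ipa or ic_ipb).
    """
    if prefetch.valid and prefetch.stream == active_stream:
        prefetch.valid = False  # assumption: a consumed entry is no longer valid
        return prefetch.instruction
    return read_memory(instruction_pointer[active_stream])


def external_register_write(prefetch):
    """Any external write to an instruction controller register invalidates the buffer."""
    prefetch.valid = False
```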
  • the instruction controller 235 decodes it and configures the registers 229 of the co-processor 224 to execute the instruction.
  • the instruction format utilized by the raster image co-processor 224 differs from traditional processor instruction sets in that the instruction generation must be carried out instruction by instruction by the host CPU 202 and as such is a direct overhead for the host. Further, the instructions should be as small as possible as they must be stored in host RAM 203 and transferred over the PCI bus 206 of FIG. 1 to the co-processor 224 . Preferably, the co-processor 224 can be set up for operation with only one instruction. As much flexibility as possible should be maintained by the instruction set to maximize the scope of any future changes. Further, preferably any instruction executed by the co-processor 224 applies to a long stream of operand data to thereby achieve best performance.
  • the co-processor 224 employs an instruction decoding philosophy designed to facilitate simple and fast decoding for “typical instructions” yet still enable the host system to apply a finer control over the operation of the co-processor 224 for “atypical” operations.
  • Each instruction includes an instruction word or opcode 281 , and an operand or result type data word 282 setting out the format of the operands.
  • the addresses 283 - 285 of three operands A, B and C are also provided, in addition to a result address 286 . Further, an area 287 is provided for use by the host CPU 202 for storing information relevant to the instruction.
  • the structure 290 of an instruction opcode 281 of an instruction is illustrated in FIG. 11 .
  • the instruction opcode is 32 bits long and includes a major opcode 291 , a minor opcode 292 , an interrupt (I) bit 293 , a partial decode (Pd) bit 294 , a register length (R) bit 295 , a lock (L) bit 296 and a length 297 .
  • a description of the fields in the instruction word 290 is as provided by the following table.
  • Pd = 1: use the “partial decode” mechanism.
  • Pd = 0: do not use the “partial decode” mechanism.
  • R = 1: length of instruction is specified by the Pixel Organizer's input length register (po_len).
  • R = 0: length of instruction is specified by the opcode length field.
  • L = 1: this instruction stream (A or B) is “locked” in for the next instruction.
  • L = 0: this instruction stream (A or B) is not “locked” in for the next instruction.
  • the instruction can be coded such that instruction execution sets an interrupt and pause on completion of that instruction. This interrupt is called an “instruction completed interrupt”.
  • the partial decode bit 294 provides for a partial decode mechanism such that when the bit is set and also enabled in the ic_cfg register, the various modules can be micro coded prior to the execution of the instruction in a manner which will be explained in more detail hereinafter.
  • the lock bit 296 can be utilized for operations which require more than one instruction to set up. This can involve setting various registers prior to an instruction and provides the ability to “lock” in the current instruction stream for the next instruction.
  • the length field 297 has a natural definition for each instruction and is defined in terms of the number of “input data items” or the number of “output data items” as required.
  • the length field 297 is only 16 bits long. For instructions operating on a stream of input data items greater than 64,000 items the R-bit 295 can be set, in which case the input length is taken from a po_len register within the pixel organizer 246 of FIG. 2 . This register is set immediately before such an instruction.
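The choice between the opcode length field and the po_len register can be sketched as follows. The bit positions below are purely hypothetical: the description names the fields and gives the 16-bit width of the length field, but this section does not give the layout of the 32-bit opcode word, so `LENGTH_MASK` being the low 16 bits and `R_BIT` sitting at bit 17 are assumptions made only so that the example runs.

```python
# Hypothetical layout of the 32-bit opcode word (not taken from the description).
LENGTH_MASK = 0xFFFF   # assumed: length field 297 occupies the low 16 bits
R_BIT = 1 << 17        # assumed position of the register length (R) bit 295


def instruction_length(opcode_word, po_len):
    """Return the number of data items the instruction operates on.

    When the R bit is set the length comes from the pixel organizer's
    po_len register; otherwise it comes from the 16-bit length field.
    """
    if opcode_word & R_BIT:
        return po_len
    return opcode_word & LENGTH_MASK
```

A 16-bit field can express at most 65,535 items, which is why longer streams need the register-based length.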
  • the number of operands 283 - 286 required for a given instruction varies somewhat depending on the type of instruction utilized.
  • the following table sets out the number of operands and length definition for each instruction type:
  • In FIG. 12 there is illustrated, firstly, the data word format 300 of the data word or operand descriptor 282 of FIG. 10 for three operand instructions and, secondly, the data word format 301 for two operand instructions.
  • the details of the encoding of the operand descriptors are provided in the following table:
  • the co-processor 224 is set up to fetch, or otherwise calculate, one internal data item, and use this item for the length of the instruction for that operand.
  • the co-processor 224 is set up to cycle through a small set of data producing a “tiling effect”. When the L-bit of an operand descriptor is zero then the data is immediate, ie. the data items appear literally in the operand word.
  • each of the operand and result words 283 - 286 contains either the value of the operand itself or a 32-bit virtual address to the start of the operand or result where data is to be found or stored.
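How an operand word might be interpreted can be sketched in a few lines. The function name `resolve_operand` and the `read_virtual` callback are hypothetical, and treating any non-zero L-bit as "the word is a virtual address" is an inference from the text, which states only the immediate case explicitly.

```python
def resolve_operand(l_bit, operand_word, read_virtual):
    """Return the operand data for one operand word.

    l_bit == 0: the data is immediate, so the word itself is the value.
    otherwise:  assumed to be a 32-bit virtual address where the data starts.
    """
    if l_bit == 0:
        return operand_word
    return read_virtual(operand_word)
```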
  • the instruction controller 235 of FIG. 2 proceeds to decode the instruction in two stages. It first checks to see whether the major opcode of the instruction is valid, raising an error if the major opcode 291 (FIG. 11) is invalid. Next, the instruction is executed by the instruction controller 235 by means of setting the various registers via CBus 231 to reflect the operation specified by the instruction. Some instructions can require no registers to be set.
  • the registers for each module can be classified into types based on their behavior. Firstly, there is the status register type which is “read only” by other modules and “read/write” by the module including the register. Next, a first type of configuration register, hereinafter called “config 1 ”, is “read/write” externally by the modules and “read only” by the module including the register. These registers are normally used for holding larger type configuration information, such as address values. A second type of configuration register, herein known as “config 2 ”, is readable and writable by any module but is read only by the module including the register. This type of register is utilized where bit by bit addressing of the register is required.
  • A number of control type registers are provided.
  • a first type hereinafter known as “control 1 ” registers, is readable and writable by all modules (including the module which includes the register).
  • the control 1 registers are utilized for holding large control information such as address values.
  • A second type of control register, hereinafter known as “control 2 ”, can be set on a bit by bit basis.
  • a final type of register known as an interrupt register has bits within the register which are settable to 1 by the module including the register and resettable to zero externally by writing a “1” to the bit that has been set. This type of register is utilized for dealing with the interrupts/errors flagged by each of the modules.
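The interrupt register behavior (set by the owning module, cleared externally by writing a 1 to a set bit) can be sketched as a write-one-to-clear register. The class and method names are hypothetical, and a 32-bit width is assumed.

```python
class InterruptRegister:
    """Toy write-one-to-clear register (assumed to be 32 bits wide)."""

    MASK32 = 0xFFFFFFFF

    def __init__(self):
        self.value = 0

    def module_set(self, mask):
        """Owning module flags interrupts or errors by setting bits to 1."""
        self.value |= mask & self.MASK32

    def external_write(self, mask):
        """External agent: each 1 written clears that bit; 0 bits are left alone."""
        self.value &= ~mask & self.MASK32
```

With this scheme an external agent can acknowledge one interrupt without disturbing others that were raised in the meantime.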
  • Each of the modules of the co-processor 224 sets a c_active line on the CBus 231 when it is busy executing an instruction.
  • the instruction controller 235 can then determine when instructions have been completed by “OR-ing” the c_active lines coming from each of the modules over the CBus 231 .
  • the local memory controller module 236 and the peripheral interface controller module 237 are able to execute overlapped instructions and include a c_background line which is activated when they are executing an overlapped instruction.
  • the overlapped instructions are “local DMA” instructions transferring data between the local memory interface and the peripheral interface.
  • the execution cycle for an overlapped local DMA instruction is slightly different from the execution cycle of other instructions. If an overlapped instruction is encountered for execution, the instruction controller 235 checks whether there is already an overlapped instruction executing. If there is, or overlapping is disabled, the instruction controller 235 waits for that instruction to finish before proceeding with execution of that instruction. If there is not, and overlapping is enabled, the instruction controller 235 immediately decodes the overlapped instruction and configures the peripheral interface controller 237 and local memory controller 236 to carry out the instruction. After the register configuration is completed, the instruction controller 235 then goes on to update its registers (including finished register, status register, instruction pointer, etc.) without waiting for the instruction to “complete” in the conventional sense. At this moment, if the finished sequence number equals the interrupt sequence number, the ‘overlapped instruction completed’ interrupt is primed rather than raising the interrupt immediately. The ‘overlapped instruction completed’ interrupt is raised when the overlapped instruction has fully completed.
  • the instruction controller attempts to prefetch the next instruction while the current instruction is executing. Most instructions take considerably longer to execute than they will to fetch and decode.
  • the instruction controller 235 prefetches an instruction if all of the following conditions are met:
  • the currently executing instruction is not set to interrupt and pause
  • the currently executing instruction is not a jump instruction
  • If the instruction controller 235 determines that prefetching is possible, it requests the next instruction, places it in a prefetch buffer and then validates the buffer. At this point there is nothing more for the instruction controller 235 to do until the currently executing instruction has completed.
  • the instruction controller 235 determines the completion of an instruction by examining the c_active and c_background lines associated with the CBus 231 .
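Completion detection from the busy lines can be sketched as below. The function names and the dictionary-of-lines representation are hypothetical; the sketch only illustrates that an instruction is treated as finished once the OR of the relevant lines is zero.

```python
def foreground_complete(c_active):
    """True when no module asserts c_active, i.e. the OR of all lines is 0."""
    return not any(c_active.values())


def overlapped_complete(c_background):
    """True when no module asserts c_background for an overlapped instruction."""
    return not any(c_background.values())
```

For example, a mapping such as `{"main_data_path": False, "pixel_organizer": True}` would still count as busy because one line remains asserted.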
  • the instruction controller 235 Upon completion of an instruction, the instruction controller 235 updates its registers to reflect the new state. This must be done atomically to avoid problems with synchronising with possible external accesses. This atomic update process involves:
  • the instruction pointer (ic_ipa or ic_ipb) is incremented by the size of an instruction, unless the instruction was a successful jump, in which case the target value of the jump is loaded into the instruction pointer.
  • the finished register (ic_fna or ic_fnb), is then incremented if sequence numbering is enabled.
  • the status register (ic_stat) is also updated appropriately to reflect the new state. This includes setting the pause bits if necessary.
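The register update performed on completion can be sketched as a function that produces the whole new register state in one step, which is one way to model atomicity in software. `INSTRUCTION_SIZE` and the function signature are assumptions; the register names ic_ipa, ic_ipb, ic_fna and ic_fnb come from the description.

```python
INSTRUCTION_SIZE = 32  # bytes; hypothetical, the section does not state the size


def complete_instruction(regs, stream, jump_target=None, sequencing_enabled=True):
    """Return the register state after an instruction on `stream` ("a" or "b").

    A new dictionary is returned so that observers see either the old state
    or the fully updated state, never a partial update.
    """
    ip, fn = "ic_ip" + stream, "ic_fn" + stream
    updated = dict(regs)
    if jump_target is not None:
        updated[ip] = jump_target                 # successful jump
    else:
        updated[ip] = regs[ip] + INSTRUCTION_SIZE
    if sequencing_enabled:
        updated[fn] = (regs[fn] + 1) & 0xFFFFFFFF  # 32-bit sequence number
    return updated
```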
  • the Instruction Controller 235 pauses if an interrupt has occurred and pausing is enabled for that interrupt or if any error has occurred. Pausing is implemented by setting the instruction stream pause bits in the status register (a_pause or b_pause bits in ic_stat). To resume instruction execution, these bits should be reset to 0.
  • A “sequence number completed” interrupt occurs. That is, the finished register (ic_fna or ic_fnb) sequence number is the same as the interrupt sequence number while this interrupt is primed and sequence numbering is enabled; or
  • the Register Access Semaphore is a mechanism that provides atomic accesses to multiple instruction controller registers.
  • the registers that can require atomic access are as follows:
  • External agents can read all registers safely at any time. External agents are able to write any registers at any time, however to ensure that the Instruction Controller 235 does not update values in these registers, the external agent must first obtain the Register Access Semaphore. The Instruction Controller does not attempt to update any values in the abovementioned registers if the Register Access Semaphore is claimed externally. The instruction controller 235 updates all of the above mentioned registers in one clock cycle to ensure atomicity.
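  • The interaction between external writes and the Register Access Semaphore can be modelled roughly as below. The class and method names are hypothetical, and returning False so that the caller can retry later is an assumption; the description only states that the Instruction Controller does not attempt an update while the semaphore is claimed externally.

```python
class ControllerRegisters:
    """Hypothetical software model of the semaphore-guarded registers."""

    def __init__(self):
        self.regs = {"ic_ipa": 0, "ic_fna": 0, "ic_stat": 0}
        self.claimed_externally = False

    def claim(self):
        # An external agent obtains the Register Access Semaphore.
        self.claimed_externally = True

    def release(self):
        self.claimed_externally = False

    def external_write(self, name, value):
        # External agents may write at any time; claiming the semaphore first
        # guarantees the controller will not change the registers meanwhile.
        self.regs[name] = value

    def controller_update(self, new_values):
        """Apply every new value in one step, or nothing at all."""
        if self.claimed_externally:
            return False              # assumed behaviour: hold off and retry
        self.regs.update(new_values)  # models the single-clock-cycle update
        return True
```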
  • Each instruction has associated with it a 32 bit “sequence number”. Instruction sequence numbers increment, wrapping from 0xFFFFFFFF to 0x00000000.
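  • A minimal sketch of the wrapping increment, assuming the full 32 bit range so that the value following 0xFFFFFFFF is 0x00000000. The function name is hypothetical.

```python
SEQ_MASK = 0xFFFFFFFF  # sequence numbers are 32 bits wide


def next_sequence_number(seq):
    """Increment a sequence number, wrapping around at 32 bits."""
    return (seq + 1) & SEQ_MASK
```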
  • the instruction controller primes the “sequence number completed” interrupt mechanism by setting the “sequence number completed” primed bit (a_primed or b_primed bit in ic_stat) in the status register.
  • the instruction controller primes the “overlapped instruction sequence number completed” interrupt mechanism by setting the a_ol_primed or b_ol_primed bits in the ic_stat register.
  • If the interrupt sequence number is not “greater” than the finished sequence number, and there is an overlapped instruction in progress in that stream, but the interrupt sequence number does not equal the last overlapped instruction sequence number, then the interrupt sequence number represents a finished instruction, and no interrupt mechanism is primed.
  • If the interrupt sequence number is not “greater” than the finished sequence number, and there is no overlapped instruction in progress in that stream, then the interrupt sequence number must represent a finished instruction, and no interrupt mechanism is primed.
  • External agents can set any of the interrupt primed bits (bits a_primed, a_ol_primed, b_primed or b_ol_primed) in the status register to activate or de-activate this interrupt mechanism independently.
  • the instruction controller 235 includes an execution controller 305 which implements the instruction execution cycle as well as maintaining overall executive control of the co-processor 224 .
  • The functions of the execution controller 305 include maintaining overall executive control of the instruction controller 235, determining instruction sequencing, instigating instruction fetching and prefetching, initiating instruction decoding and updating the instruction controller registers.
  • the instruction controller further includes an instruction decoder 306 .
  • The instruction decoder 306 accepts instructions from a prefetch buffer controller 307 and decodes them according to the aforementioned description.
  • the instruction decoder 306 is responsible for configuring registers in the other co-processor modules to execute the instruction.
  • the prefetch buffer controller 307 manages the reading and writing to a prefetch buffer within the prefetch buffer controller and manages the interfacing between the instruction decoder 306 and the input interface switch 252 (FIG. 2 ).
  • the prefetch buffer controller 307 is also responsible for managing the updating of the two instruction pointer registers (ic_ipa and ic_ipb).
  • Access to the CBus 231 (FIG. 2) by the instruction controller 235, the miscellaneous module 239 (FIG. 2) and the external interface controller 238 (FIG. 2) is controlled by a “CBus” arbitrator 308 which arbitrates between the three modules' requests for access.
  • the requests are transferred by means of a control bus (CBus) 231 to the register units of the various modules.
  • In FIG. 14 there is illustrated the execution controller 305 of FIG. 13 in more detail.
  • the execution controller is responsible for implementing the instruction execution cycle 275 of FIG. 9 and, in particular, is responsible for:
  • the execution controller includes a large core state machine 310 hereinafter known as “the central brain” which implements the overall instruction execution cycle.
  • the execution controller includes an instruction prefetch logic unit 311 . This unit is responsible for determining whether there is an outstanding instruction to be executed and which instruction stream the instruction belongs to. The start 312 and prefetch 313 states of the transition diagram of FIG. 15 utilize this information in obtaining instructions.
  • a register management unit 317 of FIG. 14 is responsible for monitoring the register access semaphores on both instruction streams and updating all necessary registers in each module.
  • the register management unit 317 is also responsible for comparing the finished register (ic_fna or ic_fnb) with the interrupt register (ic_inta or ic_intb) to determine if a “sequence number completed” interrupt is due.
  • the register management unit 317 is also responsible for interrupt priming.
  • An overlapped instructions unit 318 is responsible for managing the finishing off of an overlapped instruction through management of the appropriate status bits in the ic_stat register.
  • the execution controller also includes a decoder interface unit 319 for interfacing between the central brain 310 and the instruction decoder 306 of FIG. 13 .
  • the instruction decoder 306 is responsible for configuring the co-processor to execute the instructions residing in the prefetch buffer.
  • The instruction decoder 306 includes an instruction decoder sequencer 321 which comprises one large state machine broken down into many smaller state machines.
  • the instruction sequencer 321 communicates with a CBus dispatcher 312 which is responsible for setting the registers within each module.
  • the instruction decoder sequencer 321 also communicates relevant information to the execution controller such as instruction validity and instruction overlap conditions.
  • The instruction validity check verifies that the instruction opcode is not one of the reserved opcodes.
  • The instruction dispatch sequencer 321 includes an overall sequencing control state machine 324 and a series of per module configuration sequencer state machines, eg. 325, 326.
  • One per module configuration sequencer state machine is provided for each module to be configured.
  • Collectively the state machines implement the co-processor's microprogramming of the modules.
  • the state machines eg. 325 , instruct the CBus dispatcher to utilize the global CBus to set various registers so as to configure the various modules for processing.
  • a side effect of writing to particular registers is that the instruction execution commences.
  • Instruction execution typically takes much longer than the time it takes for the sequencer 321 to configure the co-processor registers for execution.
  • In Appendix A, attached to the present specification, there are disclosed the microprogramming operations performed by the instruction sequencer of the co-processor, in addition to the form of set up by the instruction sequencer 321.
  • the Instruction Decode Sequencer 321 does not configure all of the modules within the co-processor for every instruction.
  • the table below shows the ordering of module configuration for each class of instruction with the module configured including the pixel organizer 246 (PO), the data cache controller 240 (DCC), the operand organizer B 247 (OOB), the operand organizer C 248 (OOC), main data path 242 (MDP), results organizer 249 (RO), and JPEG encoder 241 (JC).
  • Some of the modules are never configured during the course of instruction decoding. These modules are the External Interface Controller 238 (EIC), the Local Memory Controller 236 (LMC), the Instruction Controller 235 itself (IC), the Input Interface Switch 252 (IIS) and the Miscellaneous Module (MM).
  • Module Setup Order

    Instruction Class                            | Module Configuration Sequence | Sequence ID
    ---------------------------------------------+-------------------------------+------------
    Compositing                                  | PO, DCC, OOB, OOC, MDP, RO    | 1
    CSC                                          | PO, DCC, OOB, OOC, MDP, RO    | 2
    JPEG coding                                  | PO, DCC, OOB, OOC, JC, RO     | 3
    Data coding                                  | PO, DCC, OOB, OOC, JC, RO     | 3
    Transformations and Convolutions             | PO, DCC, OOB, OOC, MDP, RO    | 2
    Matrix Multiplication                        | PO, DCC, OOB, OOC, MDP, RO    | 2
    Halftoning                                   | PO, DCC, OOB, MDP, RO         | 4
    General memory copy                          | PO, JC, RO                    | 8
    Peripheral DMA                               | PIC                           | 5
    Hierarchical Image - Horizontal Interpolation | PO, DCC, OOB, OOC, MDP, RO    | 6
    Hierarchical Image - others                  | PO, DCC, OOB, OOC, MDP, RO    | 4
    Internal access                              | RO, RO, RO, RO                | 7
    others                                       | -                             | -
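  • The module setup order can be viewed as a lookup from instruction class to an ordered list of modules. The sketch below transcribes the table entries into a dictionary; the dictionary and function names are hypothetical, and a software table is only an illustration of what the sequencer state machines implement in hardware.

```python
# Instruction class -> (ordered modules to configure, sequence ID).
MODULE_SETUP_ORDER = {
    "Compositing": (("PO", "DCC", "OOB", "OOC", "MDP", "RO"), 1),
    "CSC": (("PO", "DCC", "OOB", "OOC", "MDP", "RO"), 2),
    "JPEG coding": (("PO", "DCC", "OOB", "OOC", "JC", "RO"), 3),
    "Data coding": (("PO", "DCC", "OOB", "OOC", "JC", "RO"), 3),
    "Transformations and Convolutions": (("PO", "DCC", "OOB", "OOC", "MDP", "RO"), 2),
    "Matrix Multiplication": (("PO", "DCC", "OOB", "OOC", "MDP", "RO"), 2),
    "Halftoning": (("PO", "DCC", "OOB", "MDP", "RO"), 4),
    "General memory copy": (("PO", "JC", "RO"), 8),
    "Peripheral DMA": (("PIC",), 5),
    "Hierarchical Image - Horizontal Interpolation": (("PO", "DCC", "OOB", "OOC", "MDP", "RO"), 6),
    "Hierarchical Image - others": (("PO", "DCC", "OOB", "OOC", "MDP", "RO"), 4),
    "Internal access": (("RO", "RO", "RO", "RO"), 7),
    "others": ((), None),  # no modules are configured
}


def configuration_sequence(instruction_class):
    """Return the modules to configure, in order, for an instruction class."""
    modules, _sequence_id = MODULE_SETUP_ORDER[instruction_class]
    return list(modules)


def sequence_id(instruction_class):
    """Return the sequence ID shared by classes with the same setup order."""
    return MODULE_SETUP_ORDER[instruction_class][1]
```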
  • each of the module configuration sequencers eg. 325 is responsible for carrying out the required register access operations to configure the particular module.
  • the overall sequencing control state machine 324 is responsible for overall operation of the module configuration sequencer in the aforementioned order.
  • each of the modules configuration sequencers is responsible for controlling the CBus dispatcher to alter register details in order to set the various registers in operation of the modules.
  • the prefetch buffer controller consists of a prefetch buffer 335 for the storage of a single co-processor instruction (six times 32 bit words).
  • The prefetch buffer includes one write port controlled by an IBus sequencer 336 and one read port which provides data to the instruction decoder, execution controller and the instruction controller CBus interface.
  • the IBus sequencer 336 is responsible for observing bus protocols in the connection of the prefetch buffer 335 to the input interface switch.
  • An address manager unit 337 is also provided which deals with address generation for instruction fetching.
  • The address manager unit 337 performs the functions of selecting one of ic_ipa or ic_ipb to place on the bus to the input interface switch, incrementing one of ic_ipa or ic_ipb based on which stream the last instruction was fetched from, and channelling jump target addresses back to the ic_ipa and ic_ipb registers.
  • A PBC controller 339 maintains overall control of the prefetch buffer controller 307.
  • each module including the instruction controller module itself, has an internal set of registers 304 as previously defined in addition to a CBus interface controller 303 as illustrated in FIG. 20 and which is responsible for receiving CBus requests and updating internal registers in light of those requests.
  • the module is controlled by writing registers 304 within the module via a CBus interface 302 .
  • a CBus arbitrator 308 (FIG. 13) is responsible for determining which module of the instruction controller 235 , the external interface controller or the miscellaneous module is able to control the CBus 309 for acting as a master of the CBus and for the writing or reading of registers.
  • FIG. 20 illustrates, in more detail, the standard structure of a CBus interface 303 as utilized by each of the modules.
  • the standard CBus interface 303 accepts read and write requests from the CBus 302 and includes a register file 304 which is utilized 341 and updated on 341 by the various submodules within a module. Further, control lines 344 are provided for the updating of any submodule memory areas including reading of the memory areas.
  • the standard CBus interface 303 acts as a destination on the CBus, accepting read and write requests for the register 304 and memory objects inside other submodules.
  • A “c_reset” signal 345 sets every register inside the Standard CBus interface 303 to its default state. However, “c_reset” will not reset the state machine that controls the handshaking of signals between itself and the CBus Master, so even if “c_reset” is asserted in the middle of a CBus transaction, the transaction will still finish, with undefined effects.
  • the signals “c_sdata_in” 345 and “c_svalid_in” are data and valid signals from the previous module in a daisy chain of modules.
  • the signals “c_sdata_out” and “c_svalid_out” 350 are data and valid signals going to the next module in the daisy chain.
  • the functionality of the Standard CBus interface 303 includes:
  • The Standard CBus Interface 303 accepts register read/write and bit set requests that appear on the CBus. There are two types of CBus instructions that the Standard CBus Interface handles:
  • Type A operations allow other modules to read or write 1, 2, 3, or 4 bytes into any register inside Standard CBus Interface 303 .
  • The data cycle occurs in the clock cycle immediately after the instruction cycle. Note that the type fields for register write and read are “1000” and “1001” respectively.
  • the Standard CBus Interface 303 decodes the instruction to check whether the instruction is addressed to the module, and whether it is a read or write operation.
  • The Standard CBus Interface 303 uses the “reg” field of the CBus transaction to select which register output is to be put onto the “c_sdata” bus 350.
  • the Standard CBus Interface 303 uses the “reg” and “byte” fields to write the data into the selected register.
  • the Standard CBus Interface returns the data and asserts “c_svalid” 350 at the same time.
  • the Standard CBus Interface 303 asserts “c_svalid” 350 to acknowledge.
  • Type C operations allow other modules to write one or more bits in one of the bytes in one of the registers. Instruction and data are packed into one word.
  • the Standard CBus Interface 303 decodes the instruction to check whether the instruction is addressed to the module. It also decodes “reg”, “byte” and “enable” fields to generate the required enable signals. It also latches the data field of the instruction, and distributes it to all four bytes of a word so the required bit(s) are written in every enabled bit(s) in every enabled byte(s). No acknowledgment is required for this operation.
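  • The effect of a Type C bit set operation on a 32 bit register can be sketched as follows. The function name and the field widths (a 4 bit byte enable, an 8 bit bit enable and an 8 bit data field) are assumptions made for illustration; the description states only that the data field is distributed to all four bytes and written to the enabled bits of the enabled bytes.

```python
def apply_bit_set(reg_value, byte_enable, bit_enable, data):
    """Write `data` into the enabled bits of every enabled byte.

    reg_value   -- current 32 bit register contents
    byte_enable -- assumed 4 bit mask; bit i selects byte i (byte 0 is least significant)
    bit_enable  -- assumed 8 bit mask of the bits to change within each selected byte
    data        -- assumed 8 bit value supplying the new bits
    """
    result = reg_value & 0xFFFFFFFF
    for byte_index in range(4):
        if byte_enable & (1 << byte_index):
            shift = 8 * byte_index
            old = (result >> shift) & 0xFF
            # Keep the bits that are not enabled, take the enabled bits from data.
            new = (old & ~bit_enable & 0xFF) | (data & bit_enable & 0xFF)
            result = (result & ~(0xFF << shift)) | (new << shift)
    return result & 0xFFFFFFFF
```

  • Because the same data byte is presented to all four byte lanes, one operation can set the same bit pattern in several bytes of a register at once.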
  • The Standard CBus Interface 303 accepts memory read and memory write requests that appear on the CBus. While accepting a memory read/write request, the Standard CBus Interface 303 checks whether the request is addressed to the module. Then, by decoding the address field in the instruction, the Standard CBus Interface generates the appropriate address and address strobe signals 344 to the submodule to which the memory read/write operation is addressed. For write operations the Standard CBus Interface also passes on the byte enable signals from the instruction to the submodules.
  • the operation of the standard CBus interface 303 is controlled by a read/write controller 352 which decodes the type field of a CBus instruction from the CBus 302 and generates the appropriate enable signals to the register file 304 and output selector 353 so that the data is latched on the next cycle into the register file 304 or forwarded to other submodules 344 .
  • If the CBus instruction is a register read operation, the read/write controller 352 enables the output selector 353 to select the correct register output going onto the “c_sdata” bus 345.
  • If the instruction is a register write operation, the read/write controller 352 enables the register file 304 to select the data in the next cycle.
  • the register file 304 contains four parts, being a register select decoder 355 , an output selector 353 , interrupt 356 , error 357 and exception 358 generators, unmasked error generator 359 and the register components 360 which make up the registers of that particular module.
  • the register select decoder 355 decodes the signal “ref_en” (register file enable), “write” and “reg” from the read/write controller 352 and generates the register enable signals for enabling the particular register of interest.
  • the output selector 353 selects the correct register data to be output on c_sdata_out lines 350 for register read operations according to the signal “reg” output from the read/write controller 352 .
  • The exception generators 356-359 generate an output error signal, eg. 347-349, 362, when an error is detected on their inputs.
  • the formula for calculating each output error is as aforementioned.
  • the register components 360 can be defined to be of a number of types in accordance with requirements as previously discussed when describing the structure of the register set with reference to Table 5.
  • The CBus (control bus) is responsible for the overall control of each module by way of transferring information for the setting of registers within each module's standard CBus interface. It will be evident from the description of the standard CBus interface that the CBus serves two main purposes:
  • The CBus uses an instruction-address-data protocol to control modules by setting configuration registers within the modules. In general, registers will be set on a per instruction basis but can be modified at any time.
  • the CBus gathers status and other information, and accesses RAM and FIFO data from the various modules by requesting data.
  • the CBus is driven on a transaction by transaction basis either by:
  • the driving module is considered to be the source module of the CBus, and all other modules possible destinations. Arbitration on this bus is carried out by the Instruction Controller.
  • a CBus c_iad signal contains the addressing data and is driven by the controller in two distinct cycles:
  • the data associated with an instruction is placed on the c_iad bus in the cycle directly following the instruction cycle.
  • the target module of the read operation drives the c_sdata signal until the data cycle completes.
  • the bus includes a 32 bit instruction-address-data field which can be one of three types 370 - 372 :
  • Type A operations are used to read and write registers and the per-module data areas within the co-processor. These operations can be generated by the external interface controller 238 performing target mode PCI cycles, by the instruction controller 235 configuring the co-processor for an instruction, and by the External CBus Interface.
  • the data cycle occurs in the clock cycle immediately following the instruction cycle.
  • The data cycle is acknowledged by the destination module using the c_svalid signal.
  • Type B operations ( 371 ) are used for diagnostic purposes to access any local memory and to generate cycles on the Generic Interface. These operations will be generated by the External Interface Controller performing target mode PCI cycles and by the External CBus Interface. The data cycle can follow at any time after the instruction cycle. The data cycle is acknowledged by the destination module using the c_svalid signal.
  • Type C operations are used to set individual bits within a module's registers. These operations will be generated by the instruction controller 235 configuring the co-processor for an instruction and by the External CBus Interface. There is no data cycle associated with a Type C operation; the data is encoded in the instruction cycle.
  • the byte field is utilized for enabling bits within a register to be set.
  • the module field sets out the particular module to which an instruction on the CBus is addressed.
  • the register field sets out which of the registers within a module is to be updated.
  • the address field is utilized for addressing memory portions where an operation is desired on those memory portions and can be utilized for addressing RAMs, FIFOs, etc.
  • the enable field enables selected bits within a selected byte when a bit set instruction is utilized.
  • The data field contains the bitwise data of the bits to be written to the byte selected for update.
  • The CBus includes a c_active line for each module, which is asserted whenever a module has outstanding activity pending.
  • the instruction controller utilizes these signals to determine when an instruction has completed.
  • The CBus contains a c_background line for each module that can operate in a background mode, in addition to reset, error and interrupt lines, one for each module, for resetting and for detecting errors and interrupts.
  • the co-processor utilizes a data model that differentiates between external formats and internal formats.
  • the external data formats are the formats of data as it appears on the co-processor's external interfaces such as the local memory interface or the PCI bus.
  • the internal data formats are the formats which appear between the main functional modules of the co-processor 224 . This is illustrated schematically in FIG. 22 which shows the various input and output formats.
  • The input external format 381 is the format which is input to the pixel organizer 246, the operand organizer B 247 and the operand organizer C 248. These organizers are responsible for reformatting the input external format data into any of a number of input internal formats 382, which may be inputted to the JPEG coder unit 241 and the main data path unit 242. These two functional units output data in any of a number of output internal formats 383, which are converted by the results organizer 249 to any of a number of required output external formats 384.
  • the external data formats can be divided into three types.
  • the first type is a “packed stream” of data which consists of a contiguous stream of data having up to four channels per data quantum, with each channel consisting of one, two, four, eight or sixteen bit samples.
  • This packed stream can typically represent pixels, data to be turned into pixels, or a stream of packed bits.
  • the co-processor is designed to utilize little endian byte addressing and big endian bit addressing within a byte.
  • In FIG. 23 there is illustrated a first example 386 of the packed stream format. It is assumed that each object 387 is made up of three channels, being channel 0, channel 1 and channel 2, with two bits per channel. The layout of data for this format is as indicated 388.
  • a four channel object 395 having eight bits per channel is illustrated 396 with each data object taking up a 32 bit word.
  • one channel objects 396 are illustrated which each take up eight bits per channel starting at a bit address 397 .
  • the actual width and number of channels of data will vary depending upon the particular application involved.
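  • Reading samples from a packed stream can be sketched as below. The sketch assumes that “big endian bit addressing within a byte” means bit address 0 refers to the most significant bit of a byte, and it is restricted to sample widths of 1, 2, 4 and 8 bits so that a sample never straddles a byte boundary. The function names are hypothetical and the exact layout of FIG. 23 is not reproduced here.

```python
def read_sample(data: bytes, bit_address: int, width: int) -> int:
    """Read one sample of `width` bits starting at `bit_address`."""
    if width not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles samples that fit in one byte")
    if bit_address % width:
        raise ValueError("sample must be aligned to a multiple of its width")
    byte = data[bit_address // 8]   # byte addresses ascend through the stream
    offset = bit_address % 8        # assumed: offset 0 is the most significant bit
    return (byte >> (8 - offset - width)) & ((1 << width) - 1)


def read_objects(data: bytes, channels: int, width: int, count: int):
    """Split a packed stream into `count` objects of `channels` samples each."""
    objects = []
    bit_address = 0
    for _ in range(count):
        samples = []
        for _ in range(channels):
            samples.append(read_sample(data, bit_address, width))
            bit_address += width
        objects.append(samples)
    return objects
```

  • With three channels of two bits each, an object occupies six bits, so successive objects are contiguous and may span a byte boundary even though individual samples do not.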
  • a second type of external data format is the “unpacked byte stream” which consists of a sequence of 32 bit words, exactly one byte within each word being valid.
  • An example of this format is shown in FIG. 26 and designated 399 , in which a single byte 400 is utilized within each word.
  • a further external data format is represented by the objects classified as an “other” format.
  • these data objects are large table-type data representing information such as colour space conversion tables, Huffman coding tables and the like.
  • the co-processor utilizes four different internal data types.
  • A first type is known as a “packed bytes” format which comprises 32 bit words, each consisting of four active bytes, except perhaps for a final 32 bit word.
  • In FIG. 27 there is illustrated one particular example 402 of the packed byte format with 4 bytes per word.
  • the next data type is “pixel” format and comprises 32 bit words 403 , consisting of four active byte channels. This pixel format is interpreted as four channel data.
  • a next internal data type illustrated with reference to FIG. 29 is an “unpacked byte” format, in which each word consists of one active byte channel 405 and three inactive byte channels, the active byte channel being the least significant byte.
  • FIG. 30 illustrates the possible conversions carried out by the various organizers from an external format 410 to an internal format 411 .
  • FIG. 31 illustrates the conversions carried out by the results organizer 249 in the conversion from internal formats 412 to external formats 413 .
  • In FIG. 32 there is shown the methodology utilized by the various organizers in the conversion process.
  • In the case of the external other format 416, the data is merely passed through the various organizers unchanged.
  • the external unpacked byte format 417 undergoes unpacked normalization 418 to produce a format 419 known as internally unpacked bytes.
  • the process of unpacked normalization 418 involves discarding the three inactive bytes from an externally unpacked byte stream.
  • The process of unpacked normalization is illustrated in FIG. 33, wherein the input data 417, having four byte channels of which only one byte channel is valid, results in the output format 419, which merely comprises the bytes themselves.
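  • Unpacked normalization can be sketched as keeping one byte of each 32 bit word and discarding the other three. Which byte lane carries the valid byte is not stated here, so it is a parameter of this hypothetical function, with lane 0 taken to be the least significant byte.

```python
def unpacked_normalize(words, valid_byte=0):
    """Keep the single valid byte of each 32 bit word, dropping the rest.

    valid_byte -- assumed index of the valid byte lane (0 = least significant)
    """
    shift = 8 * valid_byte
    return [(word >> shift) & 0xFF for word in words]
```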
  • the process of packed normalization 421 involves translating each component object in an externally packed stream 422 into a byte stream 423 . If each component of a channel is less than a byte in size then the samples are interpolated up to eight bit values. For example, when translating four bit quantities to byte quantities, the four bit quantity 0xN is translated to the byte value 0xNN. Objects larger than one byte are truncated.
  • the input object sizes supported on the stream 422 are 1, 2, 4, 8 and 16 bit sizes, although again these may be different depending upon the total width of the data objects and words in any particular system to which the invention is applied.
  • In FIG. 34 there is illustrated one form of packed normalization 421 on input data 422 which is in the form of 3 channel objects with two bits per channel (as per the data format 386 of FIG. 23).
  • The output data comprises a byte channel format 423 with each channel “interpolated up” where necessary to comprise an eight bit sample.
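  • The “interpolating up” of a sample to eight bits can be sketched as bit replication, which reproduces the stated example of a four bit quantity 0xN becoming the byte 0xNN. Applying the same replication to one and two bit samples, and truncating sixteen bit samples by keeping the most significant byte, are assumptions of this sketch; the function name is hypothetical.

```python
def normalize_sample(value, width):
    """Convert one sample of `width` bits into an eight bit value."""
    if width == 16:
        return (value >> 8) & 0xFF   # assumed: truncation keeps the high byte
    if width == 8:
        return value & 0xFF
    if width in (1, 2, 4):
        value &= (1 << width) - 1
        out = 0
        for _ in range(8 // width):  # repeat the sample until eight bits are filled
            out = (out << width) | value
        return out
    raise ValueError("supported widths are 1, 2, 4, 8 and 16 bits")
```

  • Replication maps the smallest and largest input values onto 0x00 and 0xFF respectively, so the full eight bit range is used.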
  • The pixel streams are then subjected to either a pack operation 425, an unpack operation 426 or a component selection operation 427.
  • In FIG. 35 there is shown an example of the pack operation 425, which simply involves discarding the inactive byte channels and producing a byte stream packed up with four active bytes per word. Hence, a single valid byte stream 430 is compressed into a format 431 having four active bytes per word.
  • the unpacking operation 426 involves almost the reverse of the packing operation with the unpacked bytes being placed in the least significant byte of a word. This is illustrated in FIG. 36 wherein a packed byte stream 433 is unpacked to produce result 434 .
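  • The pack and unpack operations can be sketched as follows. Placing the first byte of the stream in the least significant byte lane of each packed word is an assumption based on the little endian byte addressing mentioned earlier; the function names are hypothetical.

```python
def pack_bytes(byte_stream):
    """Pack a stream of single bytes into 32 bit words, four bytes per word."""
    words = []
    for start in range(0, len(byte_stream), 4):
        word = 0
        for lane, value in enumerate(byte_stream[start:start + 4]):
            word |= (value & 0xFF) << (8 * lane)   # assumed: lane 0 is filled first
        words.append(word)
    return words


def unpack_bytes(words, count):
    """Unpack `count` bytes, one per output word, in the least significant byte."""
    unpacked = []
    for word in words:
        for lane in range(4):
            if len(unpacked) < count:
                unpacked.append((word >> (8 * lane)) & 0xFF)
    return unpacked
```

  • The count argument is needed because the final packed word may hold fewer than four active bytes.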
  • component selection 427 involves selecting N components from an input stream, where N is the number of input channels per quantum.
  • the unpacking process can be utilized to produce “prototype pixels” eg. 437 , with the pixel channels filled from the least significant byte.
  • In FIG. 37 there is illustrated an example of component selection 440 wherein input data in the form 436 is transformed by the component selection unit 427 to produce prototype pixel format 437.
  • a process of component substitution 440 (FIG. 32) can be utilized.
  • The component substitution process 440 is illustrated in FIG. 38 and comprises replacing selected components with a constant data value stored within an internal data register 441 to produce, as an example, output components 242.
  • The lane swapping process involves a byte-wise multiplexing of any lane to any other lane, including the replication of a first lane onto a second lane.
  • the particular example illustrated in FIG. 39 includes the replacement of channel 3 with channel 1 and the replication of channel 3 to channels 2 and channel 1 .
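  • Component substitution and lane swapping on a 32 bit pixel can be sketched as below. The function names, the per-lane substitution mask and the tuple of source lanes are illustrative assumptions; lane 0 is taken to be the least significant byte.

```python
def substitute_components(pixel, constant, lane_mask):
    """Replace the selected byte lanes of `pixel` with those of `constant`.

    lane_mask -- assumed 4 bit mask; bit i selects lane i for substitution
    """
    out = pixel & 0xFFFFFFFF
    for lane in range(4):
        if lane_mask & (1 << lane):
            field = 0xFF << (8 * lane)
            out = (out & ~field) | (constant & field)
    return out & 0xFFFFFFFF


def swap_lanes(pixel, source_lanes):
    """Byte-wise multiplexer: output lane i is taken from source_lanes[i]."""
    out = 0
    for lane, source in enumerate(source_lanes):
        out |= ((pixel >> (8 * source)) & 0xFF) << (8 * lane)
    return out
```

  • Because each output lane selects its source independently, the same source lane may appear more than once, which gives the replication of one lane onto another.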
  • the data stream can be optionally stored in the multi-used value RAM 250 before being read back and subjected to a replication process 446 .
  • the replication process 446 simply replicates the data object whatever it may be.
  • In FIG. 40 there is illustrated a process of replication 446 as applied to pixel data.
  • the replication factor is one.
  • In FIG. 41 there is illustrated a similar example of the process of replication applied to packed byte data.
  • In FIG. 42 there is illustrated the process utilized by the results organizer 249 for transferral of data in an output internal format 383 to an output external format 384.
  • This process includes equivalent steps 424 , 425 , 426 and 440 to the conversion process described in FIG. 32 .
  • the process 450 includes the steps of component deselection 451 , denormalization 452 , byte addressing 453 and write masking 454 .
  • the component deselection process 451 as illustrated in FIG. 43, is basically the inverse operation of the component selection process 427 of FIG. 37 and involves the discarding of unwanted data. For example, in FIG. 43, only 3 valid channels of the input are taken and packed into data items 456 .
  • the denormalization process 452 is illustrated with reference to FIG. 44 and is loosely the inverse operation of the packed normalization process 421 of FIG. 34 .
  • the denormalization process involves the translation of each object or data item, previously treated as a byte, to a non-byte value.
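  • Denormalization can be sketched as reducing an eight bit value to a narrower sample. Keeping the most significant bits is an assumption chosen so that the result undoes the bit replication sketch of packed normalization; the description says only that a byte is translated to a non-byte value. Sixteen bit outputs are omitted because their construction is not described. The function name is hypothetical.

```python
def denormalize_sample(byte_value, width):
    """Reduce an eight bit value to a sample of `width` bits."""
    if width not in (1, 2, 4, 8):
        raise ValueError("this sketch supports widths of 1, 2, 4 and 8 bits")
    # Assumed: keep the top `width` bits of the byte.
    return (byte_value & 0xFF) >> (8 - width)
```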
  • The byte addressing process 453 of FIG. 42 deals with any byte-wise reorganization that is necessary to deal with byte addressing issues.
  • the byte addressing step 453 is responsible for re-mapping the output stream from one byte channel to another when external unpacked bytes are utilized (FIG. 45 ).
  • the byte addressing module 453 remaps the start address of the output stream as illustrated.
  • The write masking process 454 of FIG. 42 is illustrated in FIG. 47 and is used to mask off a particular channel, eg. 460, of a packed stream which is not to be written out.
  • Operand Organizer B and Operand Organizer C Data Manipulation Registers (oob_dmr, ooc_dmr);
  • Each of the Data Manipulation Registers can be set up for an instruction in one of two ways:
  • the co-processor examines the contents of the Instruction Word and the Data Word of the instruction to determine, amongst other things, how to set up the various Data Manipulation Registers. Not all combinations of the instruction and operands make sense. Several instructions have implied formats for some operands. Instructions that are coded with inconsistent operands may complete without error, although any data so generated is “undefined”. If the ‘S’ bit of the corresponding Data Descriptor is 0, the co-processor sets the Data Manipulation Register to reflect the current instruction.
  • The format of the Data Manipulation Registers is illustrated in FIG. 48.
  • the following table sets out the format of the various bits within the registers as illustrated in FIG. 48 :
  • a plurality of internal and external data types may be utilized with each instruction. All operand, results and instruction type combinations are potentially valid, although typically only a subset of those combinations will lead to meaningful results. Particular operand and result data types that are expected for each instruction are detailed below in a first table (Table 9) summarising the expected data types for external and internal formats:
  • a computer graphics processor having three main functional blocks: a data normalizer 1062 which may be implemented in each of the pixel organizer 246 and operand organizers B and C 247 , 248 , a central graphics engine in the form of the main data path 242 or JPEG units 241 and a programming agent 1064 , in the form of an instruction controller 235 .
  • the operation of the data normalizer 1062 and the central graphics engine 1063 is determined by an instruction stream 1066 that is provided to the programming agent 1064.
  • For each instruction, the programming agent 1064 performs a decoding function and outputs internal control signals 1067 and 1068 to the other blocks in the system.
  • the normalizer 1062 will format the data according to the current instruction and pass the result to the central graphics engine 1063 , where further processing is performed.
  • the data normalizer represents, in a simplified form, the pixel organizer and the operand organizers B and C. Each of these organizers implements the data normalization circuitry, thereby enabling appropriate normalization of the input data prior to it passing to the central graphics engine in the form of the JPEG coder or the main data path.
  • the central graphics engine 1063 operates on data that is in a standard format, which in this case is 32-bit pixels.
  • the normalizer is thus responsible for converting its input data to a 32-bit pixel format.
  • the input data words 1069 to the normalizer are also 32 bits wide, but may take the form of either packed components or unpacked bytes.
  • a packed component input stream consists of consecutive data objects within a data word, the data objects being 1,2,4,8 or 16 bits wide.
  • an unpacked byte input stream consists of 32-bit words of which only one 8-bit byte is valid.
  • the pixel data 11 produced by the normalizer may consist of 1,2,3 or 4 valid channels, where a channel is defined as being 8 bits wide.
  • the data normalization unit 1062 is composed of the following circuits: a First-In-First-Out buffer (FIFO) 1073 , a 32-bit input register (REG 1 ) 1074 , a 32-bit output register (REG 2 ) 1076 , normalization multiplexors 1075 and a control unit 1076 .
  • Each input data word 1069 is stored in the FIFO 1073 and is subsequently latched into REG 1 1074 , where it remains until all its input bits have been converted into the desired output format.
  • the normalization multiplexors 1075 consist of 32 combinatorial switches that produce pixels to be latched into REG 2 by selecting bits from the value in REG 1 1074 and the current output of the FIFO 1073 .
  • the normalization multiplexors 1075 receive two 32-bit input words 1077 , 1078 , denoted as x[ 63 . . . 32 ] and x[ 31 . . . 0 ].
  • the control unit generates enable signals REG 1 _EN 20 and REG 2 _EN[ 3 . . . 0 ] 1081 for updating REG 1 1074 and REG 2 1076 , respectively, as well as signals to control the FIFO 1073 and normalization multiplexors 1075 .
  • the programming agent 1064 in FIG. 49 provides the following configuration signals for the data normalizer 1062 : a FIFO_WR 4 signal, a normalization factor n[ 2 . . . 0 ], a bit offset b[ 2 . . . 0 ], a channel count c[ 1 . . . 0 ] and an external format (E).
  • Input data is written into the FIFO 1073 by asserting the FIFO_WR signal 1085 for each clock cycle that valid data is present.
  • the FIFO asserts a fifo_full status flag 1086 when there is no space available.
  • bit offset determines the position in x[ 31 . . . 0 ], the value stored in REG 1 , from which to begin processing data. Assuming a bit offset relative to the most significant bit of the first input byte, the method for producing an output data byte y[ 7 . . . 0 ] is described by the following set of equations:
  • Corresponding equations may be used to generate output data bytes y[ 15 . . . 8 ], y[ 23 . . . 16 ] and y[ 31 . . . 24 ].
  • the above method may be generalized to produce an output array of any length by taking each component of the input stream and replicating it as many times as necessary to generate output objects of standard width.
  • the order of processing each input component may be defined as little-endian or big-endian.
  • the above example deals with big-endian component ordering since processing always begins from the most significant bit of an input byte. Little-endian ordering requires redefinition of the bit offset to be relative to the least significant bit of an input byte.
  • output components are generated by truncating each input component, typically by removing a suitable number of the least significant bits.
  • truncation of 16-bit input components to form 8-bit wide standard output is performed by selecting the most significant byte of each 16-bit data object.
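The replication and truncation rules above can be sketched in software. This is a minimal illustration, not the circuit of FIG. 50: the function name `normalize_components` is hypothetical, and it assumes big-endian ordering across the whole 32-bit word (the first component sits in the most significant bits).

```python
def normalize_components(word, width, count):
    """Expand `count` packed components of `width` bits into 8-bit channels.

    Assumption: components are taken MSB-first from a 32-bit word.
    """
    if width not in (1, 2, 4, 8, 16):
        raise ValueError("component width must be 1, 2, 4, 8 or 16 bits")
    if not 0 <= count <= 32 // width:
        raise ValueError("count exceeds the components available in one word")
    mask = (1 << width) - 1
    channels = []
    for i in range(count):
        shift = 32 - width * (i + 1)
        component = (word >> shift) & mask
        if width >= 8:
            # Truncation: keep only the most significant byte.
            value = component >> (width - 8)
        else:
            # Replication: repeat the component until 8 bits are filled.
            value = 0
            for _ in range(8 // width):
                value = (value << width) | component
        channels.append(value)
    return channels
```

For example, a 4-bit component `0xA` is replicated to the byte `0xAA`, while a 16-bit component `0x1234` is truncated to `0x12`.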
  • the control unit of FIG. 50 performs the decoding of n[ 2 . . . 0 ] and c[ 1 . . . 0 ], and uses the result along with b[ 2 . . . 0 ] and E to provide the select signals for the normalization multiplexors and the enable signals for REG 1 and REG 2 . Since the FIFO may become empty during the course of an instruction, the control unit also contains counters that record the current bit position, in_bit[ 4 . . . 0 ], in REG 1 from which to select input data, and the current byte, out_byte[ 1 . . . 0 ], in REG 2 to begin writing output data.
  • the control unit detects when it has completed processing each input word by comparing the value of in_bit[ 4 . . . 0 ] to the position of the final object in REG 1 , and initiates a FIFO read operation by asserting the FIFO_RD signal for one clock cycle when the FIFO is not empty.
  • REG 1 _EN is asserted so that new data are captured into REG 1 .
  • the control unit calculates REG 2 _EN[ 3 . . . 0 ] by taking the minimum of the following 3 values: the decoded version of c[ 1 . . . 0 ], the number of valid components remaining to be processed in REG 1 , and the number of unused channels in REG 2 .
  • a complete output word is available when the number of channels that have been filled in REG 2 is equal to the decoded version of c[ 1 . . . 0 ].
  • the circuit area occupied by the apparatus in FIG. 50 can be substantially reduced by applying a truncation function to the bit offset parameter, such that only a restricted set of offsets are used by the control unit and normalization multiplexors.
  • bit offset truncation allows each of the normalization multiplexors, denoted in FIG. 50 by MUX 0, MUX 1 . . . MUX 31, to be reduced from a size of 32-to-1, when no truncation is applied, to a maximum size of 20-to-1.
  • the size reduction in turn leads to an improvement in circuit speed.
  • the preferred embodiment provides an efficient circuit for the transformation of data into one of a few normalized forms.
  • the instruction controller 235 “executes” instructions which result in actions being performed by the co-processor 224 .
  • the instructions executed include a number of instructions for the performance of useful functions by the main data path unit 242 . A first of these useful instructions is compositing.
  • the compositing model 462 generally has three input sources of data and the output data or sink 463 .
  • the input sources can firstly include pixel data 464 from the same destination within the memory as the output 463 is to be written to.
  • the instruction operands 465 can be utilized as a data source which includes the color and opacity information.
  • the color and opacity can be either flat, a blend, pixels or tiled.
  • the flat or blend is generated by the blend generator 467 , as it is quicker to generate them internally than to fetch via input/output.
  • the input data can include attenuation data 466 which attenuates the operand data 465 .
  • the attenuation can be flat, bit map or a byte map.
  • pixel data normally consists of four channels with each channel being one byte wide.
  • the opacity channel is considered to be the byte of highest address.
  • the co-processor can utilize pre-multiplied data.
  • Pre-multiplication can consist of pre-multiplying each of the colored channels by the opacity channel.
  • two optional pre-multiplication units 468 , 469 are provided for pre-multiplying the opacity channel 470 , 471 by the colored data to form, where required, pre-multiplied outputs 472 , 473 .
  • a compositing unit 475 implements a composite of its two inputs in accordance with the current instruction data. The compositing operators are illustrated in Table 11 below:
  • a clamp/wrapping unit 476 is provided to clamp or wrap data around the limit values 0-255. Further, the data can be subjected to an optional “unpre-multiplication” 477 restoring the original pixel values as required. Finally, output data 463 is produced for return to the memory.
  • In FIG. 52 there is illustrated the form of an instruction word directed to the main data path unit for compositing operations.
  • If the X field in the major op-code is 1, this indicates that a plus operator is to be applied in accordance with the aforementioned table.
  • If this field is 0, an operator other than the plus operator is to be applied.
  • the P a field determines whether or not to pre-multiply the first data stream 464 (FIG. 51 ).
  • the P b field determines whether or not to pre-multiply the second data stream 465 .
  • the P r field determines whether or not to “unpre-multiply” the result utilising unit 477 .
  • the C field determines whether to wrap or clamp on overflow or underflow of the range 0-255.
  • the “com-code” field determines which operator is to be applied.
  • the plus operator optionally utilizes an offset register (mdp_por). This offset is subtracted from the result of the plus operation before wrapping or clamping is applied.
  • the com-code field is interpreted as a per channel enablement of the offset register.
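As a rough sketch of the plus operator described above, the following code adds two pixels channel by channel, subtracts the optional offset where enabled, and then clamps or wraps to 0-255. The function names, the tuple representation of a pixel, and the rounding used in `premultiply` are assumptions made for illustration; the source does not specify the arithmetic at this level.

```python
def premultiply(color, opacity):
    """Scale an 8-bit color channel by an 8-bit opacity (255 = opaque).

    Assumption: rounded division by 255; the hardware rounding is not stated.
    """
    return (color * opacity + 127) // 255


def plus_channel(a, b, offset=0, clamp=True):
    """Add two channel values, subtract the offset, then clamp or wrap."""
    total = a + b - offset
    if clamp:
        return max(0, min(255, total))
    return total & 0xFF  # wrap around modulo 256


def plus_pixel(a, b, offset=(0, 0, 0, 0), enable=(True,) * 4, clamp=True):
    """Apply the plus operator per channel; `enable` gates the offset."""
    return tuple(
        plus_channel(x, y, o if e else 0, clamp)
        for x, y, o, e in zip(a, b, offset, enable)
    )
```

With clamping, `200 + 100` saturates at 255; with wrapping, the same sum yields 44.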
  • the standard instruction word encoding 280 of FIG. 10 previously discussed is altered for compositing operands.
  • operand A will always be the same operand as the result word so operand A can be utilized in conjunction with operand B to describe at greater length the operand B.
  • the A descriptor within the instructions still describes the format of the input and the R descriptor defines the format of the output.
  • In FIG. 53 there is illustrated, in a first example 470, the instruction word format of a blend instruction.
  • a blend is defined to have a start 471 and end value 472 for each channel.
  • In FIG. 54 there is illustrated 475 the format of a tile instruction, which is defined by a tile address 476, a start offset 477 and a length 478. All tile addresses and dimensions are specified in bytes. Tiling is applied in a modular fashion and, in FIG. 55, there is shown the interpretation of the fields 476-478 of FIG. 54.
  • the tile address 476 denotes the start address in memory of the tile.
  • a tile start offset 477 designates the first byte to be utilized as a start of the tile.
  • the tile length 478 designates the total length of the tile for wrap around.
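The modular interpretation of the tile fields can be illustrated as follows. This is a sketch under the assumption that the start offset is measured from the tile address and that wrap-around returns to the tile address itself; the function name is hypothetical.

```python
def tile_byte_addresses(tile_address, start_offset, length, count):
    """Return the memory addresses of the first `count` bytes of a tiled operand."""
    if length <= 0:
        raise ValueError("tile length must be positive")
    return [tile_address + (start_offset + i) % length for i in range(count)]
```

For a 4-byte tile at address 0x1000 with a start offset of 2, the byte sequence begins at 0x1002, runs to 0x1003, and then wraps back to 0x1000.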
  • every color component and opacity can be attenuated by an attenuation value 466 .
  • the attenuation value can be supplied in one of three ways:
  • Software can specify a flat attenuation by placing the attenuation factor in the operand C word of the instruction.
  • a bit map attenuation where 1 means fully on and 0 means fully off can be utilized with software specifying the address of the bit map in the operand C word of the instruction.
  • a byte map attenuation can be provided again with the address of the byte map in operand C.
  • the pre-multiplied color channel is multiplied by the attenuation factor by effectively calculating:
  • A is the attenuation and C o is the pre-multiplied color channel.
  • the main data path unit 242 and data cache 230 are also primarily responsible for color conversion.
  • the color space conversion involves the conversion of a pixel stream in a first color space format, for example suitable for RGB color display, to a second color space format, for example suitable for CYM or CYMK printing.
  • the color space conversion is designed to work for all color spaces and can be used for any function from at least one to one or more dimensions.
  • the instruction controller 235 configures, via the Cbus 231, the main data path unit 242, the data cache controller 240, the input interface switch 252, the pixel organizer 246, the MUV buffer 250, the operand organizer B 247, the operand organizer C 248 and the result organizer 249 to operate in the color conversion mode.
  • an input image consisting of a plurality of lines of pixels is supplied, one line of pixels after another, to the main data path unit 242 as a stream of pixels.
  • the main data path unit 242 (FIG. 2) receives the stream of pixels from the input interface switch 252 via the pixel organizer 246 for color space conversion processing one pixel at a time.
  • interval and fractional tables are pre-loaded into the MUV buffer 250 and color conversion tables are loaded into the data cache 230 .
  • the main data path unit 242 accesses these tables via the operand organizers B and C, and converts these pixels, for example from the RGB color space to the CYM or CYMK color space and supplies the converted pixels to the result organizer 249 .
  • the main data path unit 242, the data cache 230, the data cache controller 240 and the other abovementioned devices are able to operate in either of the following two modes under control of the instruction controller 235: a Single Output General Color Space (SOGCS) Conversion mode or a Multiple Output General Color Space (MOGCS) Conversion mode.
  • Accurate color space conversion can be a highly non-linear process.
  • color space conversion of an RGB pixel to a single primary color component (e.g. cyan) of the CYMK color space is theoretically linear; in practice, however, non-linearities are typically introduced by the output device used to display the color components of the pixel.
  • the same applies to the color space conversion of the RGB pixel to the other primary color components (yellow, magenta or black) of the CYMK color space. Consequently, a non-linear color space conversion is typically used to compensate for the non-linearities introduced on each color component.
  • the highly non-linear nature of the color conversion process requires either a complex transfer function to be implemented or a look-up table to be utilized.
  • the main data path 242 uses a look-up table stored in the data cache 230 having sparsely located output color values corresponding to points in the input color space and interpolates between the output color values to obtain an intermediate output.
  • the RGB color space is comprised of 24 bit pixels having 8 bit red, green and blue color components.
  • Each of the RGB dimensions of the RGB color space is divided into 15 intervals, with the length of each interval having a substantially inverse proportionality to the non-linear behavior of the transfer function between the RGB and CYMK color spaces of the printer. That is, where the transfer function has a highly non-linear behavior the interval size is reduced and where the transfer function has a more linear behavior, the size of the interval is increased.
  • the color space of each output printer is accurately measured to determine those non-linear portions of its transfer function.
  • the transfer function can be approximated or modelled based on know-how or measured characteristics of a type of printer (e.g. ink-jet).
  • the color component value defines a position within one of the 15 intervals.
  • Two tables are used by the main data path unit 242 to determine which interval a particular input color component value lies within and also to determine a fraction along the interval in which a particular input color component value lies.
  • different tables may be used for output printers having different transfer functions.
  • each of the RGB dimensions is divided into 15 intervals.
  • the RGB color space forms a 3-dimensional lattice of intervals and the input pixels at the ends of the intervals form sparsely located points in the input color space.
  • the output color values of the output color space corresponding to the endpoints of the intervals are stored in look-up tables.
  • an output color value of an input color pixel can be calculated by determining the output color values corresponding to the endpoints of the intervals within which the input pixel lies and interpolating such output color values utilising the fractional values. This technique reduces the need for large memory storage.
  • In FIG. 56 there is illustrated 480 an example of determining, for a particular input RGB color pixel, the corresponding interval and fractional values.
  • the conversion process relies upon the utilization of an interval table 482 and a fractional table 483 for each 8 bit input color channel of the 24 bit input pixel.
  • the 8 bit input color component 481 shown in a binary form in FIG. 56 having the example decimal number 4, is utilized as a look-up to each of the interval and fractional tables. Hence, the number of entries in each table is 256.
  • the interval table 482 provides a 4 bit output defining one of the intervals numbered 0 to 14 into which the input color component value 481 falls.
  • the fractional table 483 indicates the fraction within an interval that the input color value component 481 falls.
  • the fractional table stores 8 bit values in the range of 0 to 255 which are interpreted as a fraction of 256.
  • this value is utilized to look-up the interval table 482 to produce an output value of 0.
  • the input value 4 is also utilized to look-up the fractional table 483 to produce an output value of 160 which designates the fraction 160/256.
  • the interval lengths are not equal. As noted previously, the lengths of the intervals are chosen according to the non-linear behavior of the transfer function.
  • each of the interval and fractional tables for each color component are loaded in the MUV buffer 250 (FIG. 2) and accessed by the main data path unit 242 when required.
  • the arrangement of the MUV buffer 250 for the color conversion process is as shown in FIG. 57 .
  • the MUV buffer 250 (FIG. 57) is divided into three areas 488 , 489 and 490 , one area for each color component.
  • Each area e.g. 488 is further divided into a 4 bit interval table and a 8 bit fractional table.
  • a 12 bit output 492 is retrieved by the main data path unit 242 from the MUV buffer 250 for each input color channel. In the example given above of a single input color component having a decimal value 4, the 12 bit output will be 000010100000 (the 4 bit interval 0000 followed by the 8 bit fraction 10100000, i.e. 160).
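A minimal sketch of how 256-entry interval and fractional tables might be built and combined into a 12 bit value is given below. The interval boundaries, the function names, and the choice to place the interval in the upper 4 bits are all assumptions made for illustration; real tables would be derived from the measured transfer function of the printer.

```python
def build_tables(boundaries):
    """Build 256-entry interval and fraction tables.

    `boundaries` is an ascending list of 16 values from 0 to 256 that
    delimit 15 intervals (hypothetical values, not taken from the source).
    """
    if len(boundaries) != 16 or boundaries[0] != 0 or boundaries[-1] != 256:
        raise ValueError("expected 16 boundaries running from 0 to 256")
    if any(b >= c for b, c in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    interval_table, fraction_table = [], []
    for value in range(256):
        idx = max(i for i in range(15) if boundaries[i] <= value)
        lo, hi = boundaries[idx], boundaries[idx + 1]
        interval_table.append(idx)
        # Fraction along the interval, expressed as a fraction of 256.
        fraction_table.append((value - lo) * 256 // (hi - lo))
    return interval_table, fraction_table


def lookup(value, interval_table, fraction_table):
    """Return the 12 bit word: interval in bits 11-8, fraction in bits 7-0."""
    return (interval_table[value] << 8) | fraction_table[value]


# Hypothetical non-uniform boundaries: shorter intervals at both ends.
BOUNDARIES = [0, 8, 16, 24, 32, 48, 64, 80, 96, 112, 128, 160, 192, 224, 240, 256]
INTERVALS, FRACTIONS = build_tables(BOUNDARIES)
```

With these boundaries an input of 40 falls halfway along interval 4, so `lookup(40, INTERVALS, FRACTIONS)` returns `0x480`.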
  • the interpolation process consists primarily of interpolation from one three dimensional space 500 , for example RGB color space to an alternative color space, for example CMY or CMYK.
  • the pixels P 0 to P 7 form sparsely located points in the RGB input color space and having corresponding output color values CV(P 0 ) to CV(P 7 ) in the output color space.
  • the output color component value corresponding to the input pixel Pi falling between the pixels P 0 to P 7 is determined by, firstly, determining the endpoints P 0, P 1, . . .
  • the interpolation process includes a one dimensional interpolation in the red (R) direction to calculate the values temp 11 , temp 12 , temp 13 , temp 14 in accordance with the following equations:
  • temp 12 = CV(P 2) + frac_r (CV(P 3) − CV(P 2))
  • the interpolation process includes the calculation of a further one dimensional interpolation in the green (G) direction utilising the following equations to calculate the values temp 21 and temp 22 :
  • temp 21 = temp 11 + frac_g (temp 12 − temp 11)
  • temp 22 = temp 13 + frac_g (temp 14 − temp 13)
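The staged interpolation can be sketched as three rounds of one dimensional linear interpolation. Only the equations for temp 12, temp 21 and temp 22 appear in the text above; the endpoint pairings used here for temp 11, temp 13 and temp 14, and the form of the final blue stage, are assumptions made by analogy. Fractions are taken as values in [0, 1), i.e. the table value divided by 256.

```python
def lerp(a, b, frac):
    """One dimensional interpolation: a + frac * (b - a)."""
    return a + frac * (b - a)


def interpolate(cv, frac_r, frac_g, frac_b):
    """Interpolate among the eight endpoint values CV(P0)..CV(P7).

    `cv` is a sequence of 8 output color values indexed by endpoint number.
    """
    # Stage 1: red direction (pairings other than temp12 are assumed).
    temp11 = lerp(cv[0], cv[1], frac_r)
    temp12 = lerp(cv[2], cv[3], frac_r)
    temp13 = lerp(cv[4], cv[5], frac_r)
    temp14 = lerp(cv[6], cv[7], frac_r)
    # Stage 2: green direction.
    temp21 = lerp(temp11, temp12, frac_g)
    temp22 = lerp(temp13, temp14, frac_g)
    # Stage 3: blue direction (assumed to follow the same form).
    return lerp(temp21, temp22, frac_b)
```

When all eight endpoint values are equal the result is that value, whatever the fractions, which gives a quick sanity check on a table.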
  • FIG. 59 represents a one dimensional mapping of input gamut values to output gamut values. It is assumed that output values are defined for the input values at points 510 and 511 . However, if the greatest output value is clamped at the point 512 then the point 511 must have an output value of this magnitude.
  • the line 515 forms the interpolation line and the input point 516 produces a corresponding output value 517 .
  • this may not be the best color mapping, especially where, without the gamut limitations, the output value would have been at the point 518 .
  • the interpolation line between 510 and 518 would produce an output value of 519 for the input point 516 .
  • the difference between the two output values 517 and 519 can often lead to unsightly artefacts, particularly when printing edge of gamut colors.
  • the interpolation process can be carried out either in the SOGCS conversion mode, which converts RGB pixels to a single output color component (for example, cyan), or in the MOGCS mode, which converts RGB pixels to all the output color components simultaneously.
  • where color conversion is to be carried out for each pixel in an image, many millions of pixels may have to be independently color converted.
  • the main data path unit 242 retrieves for each color input channel, a 12 bit output consisting of a 4 bit interval part and a 8 bit fractional part.
  • the main data path unit 242 concatenates these 4 bit interval parts of the red, green and blue color channels to form a single 12 bit address (I R , I G , I B ), as shown in FIG. 60 as 520 .
  • FIG. 60 shows a data flow diagram illustrating the manner in which a single output color component 563 is obtained in response to the single 12 bit address 520 .
  • the 12 bit address 520 is first fed to an address generator of the data cache controller 240 , such as the generator 1881 (shown in FIG. 141) which generates 8 different 9 bit line and byte addresses 521 for memory banks (B 0 , B 1 , . . . B 7 ).
  • the data cache 230 (FIG. 2) is divided into 8 independent memory banks 522 which can be independently addressed by the respective 8 line and byte addresses.
  • the 12 bit address 520 is mapped by the address generator into the 8 line and byte addresses in accordance with the following table:
  • BIT[ 8 : 6 ], BIT[ 5 : 3 ] and BIT[ 2 : 0 ] represent the sixth to eighth bits, the third to fifth bits and the zero to second bits of the 9 bit bank addresses respectively;
  • R[ 3 : 1 ], G[ 3 : 1 ] and B[ 3 : 1 ] represent the first to third bits of the 4 bit intervals I R , I G and I B of the 12 bit address 520 respectively.
  • bits 1 to 3 of the 4 bit red interval I r of the 12 bit address 520 are mapped to bits 6 to 8 of the 9 bit address B 5 ; bits 1 to 3 and bit 0 of the 4 bit green interval I g are summed and then mapped to bits 3 to 5 of the 9 bit address B 5 ; and bits 1 to 3 of the 4 bit blue interval I b are mapped to bits 0 to 2 of the 9 bit address B 5 .
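The mapping described above for the bank address B 5 can be written out directly. The sketch below covers only B 5, since the full table for the other seven banks is not reproduced here; the function name is hypothetical.

```python
def bank5_address(i_r, i_g, i_b):
    """Form the 9 bit address for bank B5 from three 4 bit interval numbers."""
    for interval in (i_r, i_g, i_b):
        if not 0 <= interval <= 14:
            raise ValueError("interval numbers run from 0 to 14")
    red = i_r >> 1                    # R[3:1] -> BIT[8:6]
    green = (i_g >> 1) + (i_g & 1)    # G[3:1] + G[0] -> BIT[5:3]
    blue = i_b >> 1                   # B[3:1] -> BIT[2:0]
    return (red << 6) | (green << 3) | blue
```

Because the interval numbers stop at 14, the green sum never exceeds 7 and so always fits in its 3 bit field.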
  • Each of the 8 different line and byte addresses 521 is utilized to address a respective memory bank 522 which consists of 512 × 8 bit entries, and the corresponding 8 bit output color component 523 is latched for each of the memory banks 522.
  • the output color values of CV(P 0) to CV(P 7) corresponding to the endpoints P 0 to P 7 may be located at different positions in the memory banks. For example, a 12 bit address of 0000 0000 0000 will result in the same bank address for each bank, i.e. 000 000 000.
  • Each memory bank consists of 128 line entries 531 which are 32 bits long and comprise 4 × 8 bit memories 533-536.
  • the top 7 bits of the memory address 521 are utilized to determine the corresponding row of data within the memory address to latch 542 as the memory bank output.
  • the bottom two bits are a byte address and are utilized as an input to multiplexer 543 to determine which of the 4 × 8 bit entries should be chosen 544 for output.
  • One data item is output for each of the 8 memory banks per clock cycle for return to the main data path unit 242 .
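The split of a 9 bit bank address into a 7 bit line address and a 2 bit byte select can be sketched as below. The bank is modelled as a list of 128 32-bit integers, and byte 0 is assumed to be the least significant byte of a line; the source does not state the byte ordering within a line.

```python
def read_bank(bank, address):
    """Read one 8 bit entry from a 128-line, 32-bit-wide memory bank."""
    if not 0 <= address < 512:
        raise ValueError("bank addresses are 9 bits wide")
    line = address >> 2           # top 7 bits select the row
    byte_select = address & 0b11  # bottom 2 bits drive the output multiplexer
    word = bank[line]
    return (word >> (8 * byte_select)) & 0xFF
```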
  • the data cache controller receives a 12 bit byte address from the operand organizer 248 (FIG. 2) and outputs in return to the operand organizers 247 , 248 , the 8 output color values for interpolation calculation by the main data path unit 242 .
  • the interpolation equations are implemented by the main data path unit 242 (FIG. 2) in three stages.
  • a first stage of multiplier and adder units eg. 550 which take as input the relevant color values output by the corresponding memory banks eg. 522 in addition to the red fractional component 551 and calculate the 4 output values in accordance with stage 1 of the abovementioned equations.
  • the outputs eg. 553 , 554 of this stage are fed to a next stage unit 556 which utilizes the frac_g input 557 to calculate an output 558 in accordance with the aforementioned equation for stage 2 of the interpolation process.
  • the output 558 in addition to other outputs eg. 559 of this stage are utilized 560 in addition to the frac_b input 562 to calculate a final output color 563 in accordance with the aforementioned equations.
  • the process illustrated in FIG. 60 is implemented in a pipelined manner so as to ensure maximum overall throughput. Further, the method of FIG. 60 is utilized when a single output color component 563 is required. For example, the method of FIG. 60 can be utilized to first produce the cyan color components of an output image followed by the magenta, yellow and black components of an output image reloading the cache tables between passes. This is particularly suitable for a four-pass printing process which requires each of the output colors as part of separate pass.
  • the co-processor 224 operates in the MOGCS mode in a substantially similar manner to the SOGCS mode, with a number of notable exceptions.
  • the main data path unit 242 , the data cache controller 240 and data cache of FIG. 2 co-operate to produce multiple color outputs simultaneously with four primary colors components being output simultaneously. This would require the data cache 230 to be four times larger in size.
  • the data cache controller 240 stores only one quarter of all the output color values of the output color space. The remaining output color values of the output color space are stored in a low speed external memory and are retrieved as required.
  • This particular apparatus and method is based upon the surprising revelation that the implementation of sparsely located color conversion tables in a cache system has an extremely low miss rate. This is based on the insight that there is a low deviation in color values from one pixel to the next in most color images. In addition, there is a high probability that the sparsely located output color values will be the same for neighboring pixels.
  • With reference to FIG. 62, there will now be described the method carried out by the co-processor to implement multi-channel cached color conversion.
  • Each input pixel is broken into its color components and a corresponding interval table value (FIG. 56) is determined as previously described resulting in the three 4 bit intervals Ir, Ig, Ib denoted 570 .
  • the combined 12 bit number 570 is utilized in conjunction with the aforementioned table 12 to again derive eight 9-bit addresses.
  • the addresses eg. 572 are then re-mapped as will be discussed below with reference to FIG. 63, and then are utilized to look up a corresponding memory bank 573 to produce four colour output channels 574 .
  • the memory bank 573 stores 128 × 32 bit entries out of a total possible 512 × 32 bit entries.
  • the memory bank 573 forms part of the data cache 230 (FIG. 2) and is utilized as a cache as will now be described with reference to FIG. 63 .
  • the 9 bit bank input 578 is re-mapped as 579 so as to anti-alias memory patterns by re-ordering the bits 580 - 582 as illustrated. This reduces the likelihood of neighboring pixel values aliasing to the same cache elements.
  • the reorganized memory address 579 is then utilized as an address into the corresponding memory bank e.g. 585, which comprises 128 entries each of 32 bits.
  • the 7 bit line address is utilized to access the memory 585 resulting in the corresponding output being latched 586 for each of the memory banks.
  • Each memory bank, e.g. 585, has an associated tag memory which comprises 128 entries each of 2 bits.
  • the 7 bit line address is also utilized to access the corresponding tag in tag memory 587 .
  • the two most significant bits of the address 579 are compared with the corresponding tag in tag memory 587 to determine if the relevant output color value is stored in the cache. These two most significant bits of the 9 bit address correspond to the most significant bits of the red and green data intervals (see Table 12).
  • the RGB input color space is effectively divided into quadrants along the red and green dimensions where the two most significant bits of the 9 bit address designates the quadrant of the RGB input color space.
  • the output color values are effectively divided into four quadrants, each designated by a two bit tag. Consequently the output color values for each tag value for a particular line are highly spaced apart in the output color space, enabling anti-aliasing of memory patterns.
  • Each of the eight 32 bit sets of data 586 are then forwarded to the main data path unit ( 242 ) which carries out the aforementioned interpolation process (FIG. 62) in three stages 590 - 592 to each of the colored channels simultaneously and in a pipelined manner so as to produce four color outputs 595 for sending to a printer device.
  • the instruction encoding for both color space conversion modes (FIG. 10) utilized by the co-processor has the following structure:
  • the instruction field encoding for color space conversion instruction is illustrated in FIG. 64 with the following minor opcode encoding for the color conversion instructions.
  • FIG. 65 shows a method of converting a stream of RGB pixels into CYMK color values according to the MOGCS mode.
  • In step S 1, a stream of 24 bit RGB pixels is received by the pixel organiser 246 (FIG. 2).
  • In step S 2, the pixel organiser 246 determines the 4 bit interval values and the 8 bit fractional values of each input pixel from lookup tables, in the manner previously discussed with respect to FIGS. 56 and 57.
  • the interval and fractional values of the input pixel designate the intervals in which the input pixel lies and the fractions along those intervals.
  • In step S 3, the main data path unit 242 concatenates the 4 bit intervals of the red, green and blue color components of the input pixel to form a 12 bit address word and supplies this 12 bit address word to the data cache controller 240 (FIG. 2).
  • In step S 4, the data cache controller 240 converts this 12 bit address word into 8 different 9 bit addresses, in the manner previously discussed with respect to Table 12 and FIG. 62. These 8 different addresses designate the location of the 8 output color values CV(P 0)-CV(P 7) in the respective memory banks 573 (FIG. 62) of the data cache 230 (FIG. 2).
  • In step S 5, the data cache controller 240 (FIG. 2) remaps the 8 different 9 bit addresses in the manner described previously with respect to FIG. 63. In this way, the most significant bits of the red and green 4 bit intervals are mapped to the two most significant bits of the 9 bit addresses.
  • In step S 6, the data cache controller 240 then compares the two most significant bits of the 9 bit addresses with respective 2 bit tags in memory 587 (FIG. 63). If the 2 bit tag does not correspond to the two most significant bits of the 9 bit addresses, then the output color values CV(P 0)-CV(P 7) do not exist in the cache memory 230. Hence, in step S 7, all the output color values corresponding to the 2 bit tag entry for that line are read from external memory into the data cache 230. If the 2 bit tag corresponds to these two most significant bits of the 9 bit addresses, then the data cache controller 240 retrieves in step S 8 the eight output color values CV(P 0)-CV(P 7) in the manner discussed previously with respect to FIG.
  • step S 7 the main data path unit 242 interpolates the output color values CV(P 0 )-CV(P 7 ) utilising the fractional values determined in step S 2 and outputs the interpolated output color values.
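  • The address formation of step S 3 and the final interpolation step can be sketched as follows. This is a minimal illustration only: the function names, the placement of red in the most significant nibble, the ordering of the corner values CV(P 0 )-CV(P 7 ), and the use of trilinear interpolation are all assumptions made for the example and are not taken from the figures.

```python
def address_word(r_interval, g_interval, b_interval):
    """Concatenate three 4-bit interval values into one 12-bit address word.

    Assumption: red occupies the most significant nibble.
    """
    for v in (r_interval, g_interval, b_interval):
        if not 0 <= v <= 0xF:
            raise ValueError("interval values are 4 bits wide")
    return (r_interval << 8) | (g_interval << 4) | b_interval


def interpolate(cv, r_frac, g_frac, b_frac):
    """Trilinear interpolation between eight corner values.

    cv holds CV(P0)..CV(P7). Assumption: bit 2 of the corner index selects
    the upper red corner, bit 1 the upper green corner, bit 0 the upper blue
    corner. The fractional values are 8-bit quantities (0..255).
    """
    fr, fg, fb = r_frac / 256.0, g_frac / 256.0, b_frac / 256.0
    # Collapse the blue axis first (four edges).
    c00 = cv[0] + (cv[1] - cv[0]) * fb
    c01 = cv[2] + (cv[3] - cv[2]) * fb
    c10 = cv[4] + (cv[5] - cv[4]) * fb
    c11 = cv[6] + (cv[7] - cv[6]) * fb
    # Then the green axis (two edges), then the red axis.
    c0 = c00 + (c01 - c00) * fg
    c1 = c10 + (c11 - c10) * fg
    return c0 + (c1 - c0) * fr
```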
  • the storage space of the data cache storage may be reduced further by dividing the RGB color space and the corresponding output color values into more than four quadrants, for example 32 blocks.
  • the data cache can have the capacity of storing only a 1/32 block of output color values.
  • the data caching arrangement utilized in the MOGCS mode can also be used in a single output general conversion mode. Hence, in the latter mode the storage space of the data cache can also be reduced.
  • JPEG Joint Photographic Experts Group
  • JPEG Still Image Data Compression Standard by Pennebaker and Mitchell published 1993 by Van Nostrand Reinhold.
  • the co-processor 224 utilizes a subset of the JPEG standard in the storage of images.
  • the JPEG standard has the advantage that large factor compression can be gained with the retention of substantial image quality.
  • other standards for storing compressed images could be utilized.
  • the JPEG standard is well-known to those skilled in the art, and various alternative JPEG implementations are readily available in the marketplace from manufacturers, including JPEG core products for incorporation into ASICs.
  • the co-processor 224 implements JPEG compression and decompression of images consisting of 1, 3 or 4 color components.
  • One-color-component images may be meshed or unmeshed. That is, a single-color-component can be extracted from meshed data or extracted from unmeshed data.
  • An example of meshed data is three-color components per pixel datum (i.e., RGB per pixel datum), and an example of unmeshed data is where each color component for an image is stored separately such that each color component can be processed separately.
  • the co-processor 224 utilizes one pixel per word, assuming the three color channels to be encoded in the lowest three bytes.
  • the JPEG standard decomposes an image into small two dimensional units called minimum coded units (MCUs). Each minimum coded unit is processed separately.
  • the JPEG coder 241 (FIG. 2) is able to deal with MCU's which are 16 pixels wide and 8 pixels high for down sampled images or MCU's which are 8 pixels wide and 8 pixels high for images that are not to be down sampled.
  • FIG. 66 there is illustrated the method utilized for down sampling three component images.
  • the original pixel data 600 is stored in the MUV buffer 250 (FIG. 2) in a pixel form wherein each pixel 601 comprises Y, U and V components of the YUV color space.
  • This data is first converted into an MCU which comprises four data blocks 601 - 604 .
  • the data blocks comprise the various color components, with the Y component being directly sampled 601 , 602 and the U and V components being sub-sampled in the particular example of FIG. 13 to form blocks 603 , 604 .
  • Two forms of sub-sampling are implemented by the co-processor 224 , including direct sampling where no filtering is applied and odd pixel data is retained while even pixel data is discarded. Alternatively, filtering of the U and V components can occur with averaging of adjacent values taking place.
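  • The two forms of sub-sampling can be sketched for a single row of U or V samples as follows. The function names are hypothetical. The sketch assumes that pixels are counted from one, so that the first pixel of a row is an odd pixel and is retained, and that averaging truncates towards zero; neither detail is stated in the text.

```python
def subsample_direct(row):
    """Keep the odd-numbered samples (1st, 3rd, ...) and discard the rest.

    Assumption: samples are numbered from 1, so index 0 is an odd pixel.
    """
    return row[0::2]


def subsample_filtered(row):
    """Replace each adjacent pair of samples by its average.

    Assumption: integer average with truncation; an odd trailing sample
    is dropped.
    """
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]
```

  • Applied to the U and V rows of a block 16 pixels wide, either function yields 8 samples per row, which matches the 8 pixel wide sub-sampled blocks described above.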
  • JPEG sub-sampling is four color channel sub-sampling as illustrated in FIG. 67 .
  • pixel data blocks of 16×8 pixels 610 each have four components 611 including an opacity component (O) in addition to the usual Y, U, V components.
  • This pixel data 610 is sub-sampled in a similar manner to that depicted in FIG. 66 .
  • the opacity channel is utilized to form data blocks 612 , 613 .
  • the JPEG encoder/decoder 241 is utilized for both JPEG encoding and decoding.
  • the encoding process receives block data via bus 620 from the pixel organizer 246 (FIG. 2 ).
  • the block data is stored within the MUV buffer 250 which is utilized as a block staging area.
  • the JPEG encoding process is broken down into a number of well defined stages. These stages include:
  • DCT discrete cosine transform
  • variable length encoding the output of the coefficients coder stage, carried out by Huffman coder unit 624 .
  • the output is fed via multiplexer 625 and Rbus 626 to the result organizer 629 (FIG. 2 ).
  • the JPEG decoding process is the inverse of JPEG encoding with the order of operations reversed. Hence, the JPEG decoding process comprises the steps of inputting on Bus 620 a JPEG block of compressed data.
  • the compressed data is transferred via Bus 630 to the Huffman coder unit 624 which Huffman decodes data into DC differences and AC run lengths.
  • the data is forwarded to the co-efficients coder 623 which decodes the AC and DC coefficients and puts them into their natural order.
  • the quantizer unit 622 dequantizes the DCT coefficients by multiplying them by a corresponding quantization value.
  • the DCT unit 621 applies an inverse discrete cosine transform to restore the original data which is then transferred via Bus 631 to the multiplexer 625 for output via Bus 626 to the Result Organizer.
  • the JPEG coder 241 operates in the usual manner via standard CBus interface 632 which contains the registers set by the instructions controller in order to begin operation of the JPEG coder.
  • both the quantizer unit 622 and the Huffman coder 624 require certain tables which are loaded in the data cache 230 as required.
  • the table data is accessed via an OBus interface unit 634 which connects to the operand organizer B unit 247 (FIG. 2) which in turn interacts with the data cache controller 240 .
  • the DCT unit 621 implements forward and inverse discrete cosine transforms on pixel data. Although many different types of DCT transforming implementations are known and discussed in the Still Image Data Compression Standard (ibid), the DCT 621 implements a high speed form of transform more fully discussed in the section herein entitled A Fast DCT Apparatus, which may implement a DCT transform operation in accordance with the article entitled A Fast DCT-SQ Scheme for Images by Arai et. al., published in The Transactions of the IEICE, Vol E71, No. 11, November 1988 at page 1095.
  • the quantizer 622 implements quantization and dequantization of DCT components and operates via fetching relevant values from corresponding tables stored in the data cache via the OBus interface unit 634 .
  • the incoming data stream is divided by values read from quantization tables stored in the data cache. The division is implemented as a fixed point multiply.
  • the data stream is multiplied by values kept in the dequantization table.
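  • Quantization by a fixed point multiply, and dequantization by a plain multiply, can be sketched as follows. This is an illustration under stated assumptions: the function names, the choice of 16 fractional bits and the round-to-nearest behaviour are hypothetical and do not describe the actual format of the table entries held in the data cache.

```python
FRAC_BITS = 16  # assumption: number of fractional bits in a reciprocal entry


def reciprocal(q):
    """Fixed-point approximation of 1/q, as a quantization table might hold."""
    return ((1 << FRAC_BITS) + q // 2) // q


def quantize(coefficient, recip):
    """Divide by q using one multiply and one shift, rounding to nearest."""
    product = coefficient * recip
    half = 1 << (FRAC_BITS - 1)
    if product >= 0:
        return (product + half) >> FRAC_BITS
    # Round symmetrically about zero for negative coefficients.
    return -((-product + half) >> FRAC_BITS)


def dequantize(level, q):
    """Dequantization is an ordinary multiply by the table value."""
    return level * q
```

  • Storing reciprocals in the quantization table lets a single multiplier serve both directions, which is consistent with the shared multiplier 643 described below.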
  • the quantizer 622 includes a DCT interface 640 responsible for passing data to and receiving data from the DCT module 621 via a local Bus. During quantization, the quantizer 622 receives two DCT coefficients per clock cycle. These values are written to one of the quantizer's internal buffers 641 , 642 .
  • the buffers 641 , 642 are dual ported buffers used to buffer incoming data. During quantization, co-efficient data from the DCT sub-module 621 is placed into one of the buffers 641 , 642 .
  • the data is read from the buffer in a zig zag order and multiplied by multiplier 643 with the quantization values received via OBus interface unit 634 .
  • the output is forwarded to the co-efficient coder 623 (FIG. 68) via co-efficient coder interface 645 . While this is happening, the next block of coefficients is being written to the other buffer.
  • the quantizer module dequantizes decoded DCT co-efficients by multiplying them by values stored in the table.
  • the multiplier 643 is utilized during quantization and dequantization. The position of the co-efficient within the block of 8×8 values is used as the index into the dequantization table.
  • the two buffers 641 , 642 are utilized to buffer incoming co-efficient data from the co-efficient coder 623 (FIG. 68 ).
  • the data is multiplied with its quantization value and written into the buffers in reverse zig zag order.
  • the dequantized co-efficients are read out of the utilized buffer in natural order, two at a time, and passed via DCT interface 640 to the DCT sub-module 621 (FIG. 68 ).
  • the co-efficients coder interface module 645 is responsible for interfacing to the co-efficients coder and passes data and receives data from the coder via a local Bus.
  • This module also reads data from buffers in zig zag order during compression and writes data to the buffers in reverse zig zag order during decompression.
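  • The zig zag read during compression and the reverse zig zag write during decompression can be sketched as follows. The function names are hypothetical, and the scan is the usual JPEG zig zag sequence over an 8×8 block held in natural (row-major) order; how the hardware generates buffer addresses is not described here.

```python
def zigzag_order(n=8):
    """Return row-major indices of an n x n block in zig zag scan order."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + col == s, top to bottom.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals are walked bottom to top
        order.extend(r * n + c for r, c in diagonal)
    return order


def read_zigzag(block):
    """Compression: read a natural-order block out in zig zag order."""
    return [block[i] for i in zigzag_order()]


def write_reverse_zigzag(stream):
    """Decompression: place zig zag ordered data back into natural order."""
    block = [0] * 64
    for position, index in enumerate(zigzag_order()):
        block[index] = stream[position]
    return block
```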
  • Both the DCT interface module 640 and the CC interface module 645 are able to read and write from buffers 641 , 642 .
  • address and control multiplexer 647 is provided to select which buffer each of these interfaces is interacting with under the control of a control module 648 , which comprises a state machine for controlling all the various modules in the quantizer.
  • the multiplier 643 can be a 16×8, 2's complement multiplier which multiplies DCT coefficients by quantization table values.
  • the co-efficient coder 623 performs the functions of:
  • the co-efficient coder 623 is also able to be utilized for predictive encoding/decoding of pixels and memory copy operations as required independently of JPEG mode operation.
  • the co-efficient coder 623 implements predictive and run length encoding and decoding of DC and AC coefficients as specified in the Pink Book.
  • a standard implementation of predictive encoding and predictive decoding, in addition to JPEG AC coefficient run-length encoding and decoding as specified in the JPEG standard, is implemented.
  • the Huffman coder 624 is responsible for Huffman encoding and decoding of the JPEG data stream.
  • the run length encoded data is received from the coefficients coder 623 and utilized to produce a Huffman stream of packed bytes.
  • the Huffman stream is read from the PBus interface 620 in the form of packed bytes and the Huffman decoded coefficients are presented to the co-efficient coder module 623 .
  • the Huffman coder 624 utilizes Huffman tables stored in the data cache and accessed via OBus interface 634 . Alternatively, the Huffman table can be hardwired for maximum speed.
  • Bank 0: This bank holds the 256, 16 bit entries of an EHUFCO_DC_1 or EHUFCO table. The least significant bit of the index chooses between the two 16 bit items in the 32 bit word. All 128 lines of this bank of memory are used.
  • Bank 1: This bank holds the 256, 16 bit entries of an EHUFCO_DC_2 table. The least significant bit of the index chooses between the two 16 bit items in the 32 bit word. All 128 lines of this bank of memory are used.
  • Bank 2: This bank holds the 256, 16 bit entries of an EHUFCO_AC_1 table. The least significant bit of the index chooses between the two 16 bit items in the 32 bit word.
  • Bank 3: This bank holds the 256, 16 bit entries of an EHUFCO_AC_2 table. The least significant bit of the index chooses between the two 16 bit items in the 32 bit word. All 128 lines of this bank of memory are used.
  • Bank 4: This bank holds the 256, 4 bit entries of an EHUFSI_DC_1 or EHUFSI table, as well as the 256, 4 bit entries of an EHUFSI_DC_2 table. All 128 lines of this bank of memory are used.
  • Bank 5: This bank holds the 256, 4 bit entries of an EHUFSI_AC_1 table, as well as the 256, 4 bit entries of an EHUFSI_AC_2 table. All 128 lines of this bank of memory are used.
  • Bank 6: Not used.
  • Bank 7: This bank holds the 128, 24 bit entries of the quantization table. It occupies the least significant 3 bytes of all 128 lines of this bank of memory.
  • the Huffman coder 624 consists primarily of two independent blocks being an encoder 660 and a decoder 661 . Both blocks 660 , 661 share the same OBus interface via a multiplexer module 662 . Each block has its own input and output with only one block active at a time, depending on the function performed by the JPEG encoder.
  • Huffman tables are used to assign codes of varying lengths (up to 16 bits per code) to the DC difference values and to the AC run-length values, which are passed to the HC submodule from the CC submodule. These tables have to be preloaded into the data cache before the start of the operation.
  • the variable length code words are then concatenated with the additional bits for DC and AC coefficients (also passed from the CC submodule), then packed into bytes.
  • a X′00 byte is stuffed in if an X′FF byte is obtained as a result of packing. If there is a need for an RST m marker it is inserted.
  • the need for an RST m marker is signalled by the CC submodule.
  • the HC submodule inserts the EOI marker at the end of image, signalled by the “final” signal on the PBus-CC slave interface.
  • the insertion procedure of the EOI marker requires similar packing, padding and stuffing operations as for RST m markers.
  • the output stream is finally passed as packed bytes to the Result Organizer 249 for writing to external memory.
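  • The table lookup, concatenation with additional bits, packing and byte stuffing described above can be sketched as follows. The function name and the tuple layout of the input items are hypothetical, the two tables stand in for an EHUFCO (code) and EHUFSI (size) pair, and padding the final byte with 1 bits is an assumption carried over from the non-JPEG mode description. Marker insertion is omitted.

```python
def huffman_pack(items, ehufco, ehufsi):
    """Encode (value, additional_bits, additional_length) items to bytes.

    ehufco maps a value to its Huffman code, ehufsi to the code length.
    An X'00 byte is stuffed after every X'FF byte produced by packing.
    """
    out = bytearray()
    acc, nbits = 0, 0  # bit accumulator and number of valid bits in it
    for value, extra, extra_len in items:
        code, size = ehufco[value], ehufsi[value]
        acc = (acc << size) | code
        acc = (acc << extra_len) | (extra & ((1 << extra_len) - 1))
        nbits += size + extra_len
        while nbits >= 8:  # emit every completed byte
            nbits -= 8
            byte = (acc >> nbits) & 0xFF
            out.append(byte)
            if byte == 0xFF:
                out.append(0x00)  # stuff byte
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:  # pad the last partial byte with 1 bits
        pad = 8 - nbits
        byte = ((acc << pad) | ((1 << pad) - 1)) & 0xFF
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)
    return bytes(out)
```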
  • non-JPEG mode data is passed to the encoder from the CC submodule (PBus-CC slave interface) as unpacked bytes.
  • Each byte is separately encoded using tables preloaded into the cache (similarly to JPEG mode), the variable length symbols are then assembled back into packed bytes and passed to the Results Organizer 249 .
  • the very last byte in the output stream is padded with 1's.
  • the fast JPEG Huffman decoding algorithm maps Huffman symbols to either DC difference values or AC run-length values. It is specifically tuned for JPEG and assumes that the example Huffman tables (K3, K4, K5 and K6) were used during compression. The same tables are hard wired in to the algorithm allowing decompression without references to the cache memory.
  • This decoding style is intended to be used when decompressing images to be printed where certain data rates need to be guaranteed.
  • the data rate for the HC submodule decompressing a band is almost one DC/AC co-efficient per clock cycle.
  • One clock cycle delay between the HC submodule and CC sub-module may happen for each X′00 stuff byte being removed from the data stream, however this is strongly data dependent.
  • the Huffman decoder operates in a faster mode for the extraction of one Huffman symbol per clock cycle.
  • the fast Huffman decoder is described in the section herein entitled Decoder of Variable Length Codes.
  • the Huffman decoder 661 also implements a heap-based slow decoding algorithm and has a structure 670 as illustrated in FIG. 71 .
  • the STRIPPER 671 removes the X′00 stuff bytes, the X′FF fill bytes and RST m markers, passing Huffman symbols with concatenated additional bits to the SHIFTER 672 . This stage is bypassed for Huffman-only coded streams.
  • the first step in decoding a Huffman symbol is to look up the 256 entries HUFVAL table stored in the cache addressing it with the first 8 bits of the Huffman data stream. If this yields a value (and the true length of the corresponding Huffman symbol), the value is passed on to the OUTPUT FORMATTER 676 , and the length of the symbol and the number of the additional bits for the decoded value are fed back to the SHIFTER 672 enabling it to pass the relevant additional bits to the OUTPUT FORMATTER 676 and align the new front of the Huffman stream presented to the decoding unit 673 .
  • the number of the additional bits is a function of the decoded value.
  • the heap address is calculated and successive heap (located in the cache, too) accesses are performed following the algorithm until a match is found or an “illegal Huffman symbol” condition met.
  • a match results in identical behavior as in case of the first match and “illegal Huffman symbol” generates an interrupt condition.
  • the heap-based decoding algorithm is as follows:
  • the STRIPPER 671 removes any X′00 stuff bytes, X′FF fill bytes and RST m markers from the incoming JPEG coded stream and passes “clean” Huffman symbols with concatenated additional bits to the shifter 672 . There are no additional bits in Huffman-only encoding, so in this mode the passed stream consists of Huffman symbols only.
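  • The stripping operation can be sketched in software as follows. The function name and the returned marker list are hypothetical. The sketch relies on the JPEG convention that RST m markers are the byte pairs X′FFD0 to X′FFD7, and it assumes that processing stops at any other marker; the hardware behaviour in that case is not described here.

```python
def strip_stream(data):
    """Remove stuff bytes, fill bytes and RSTm markers from a coded stream.

    Returns the cleaned bytes and a list of (position, m) pairs giving the
    offset in the cleaned stream at which each RSTm marker was removed.
    """
    clean = bytearray()
    markers = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != 0xFF:
            clean.append(byte)
            i += 1
            continue
        nxt = data[i + 1] if i + 1 < len(data) else None
        if nxt == 0x00:
            clean.append(0xFF)  # genuine X'FF data byte; drop the stuff byte
            i += 2
        elif nxt == 0xFF:
            i += 1  # fill byte preceding a marker
        elif nxt is not None and 0xD0 <= nxt <= 0xD7:
            markers.append((len(clean), nxt & 0x07))
            i += 2
        else:
            break  # any other marker, or a truncated stream: stop
    return bytes(clean), markers
```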
  • the shifter 672 block has a 16 bit output register in which it presents the next Huffman symbol to the decoding unit 673 (bitstream running from MSB to LSB). Often the symbol is shorter than 16 bits, but it is up to the decoding unit 673 to decide how many bits are currently being analysed.
  • the shifter 672 receives a feedback 678 from the decoding unit 673 , namely the length of the current symbol and the length of the following additional bits for the current symbol (in JPEG mode), which allows for a shift and proper alignment of the beginning of the next symbol in the shifter 672 .
  • the decoding unit 673 implements the core of the heap based algorithm and interfaces to the data cache via the OBus 674 . It incorporates a Data Cache fetch block, lookup value comparator, symbol length counter, heap index adder and a decoder of the number of the additional bits (the decoding is based on the decoded value).
  • the fetch address is interpreted as follows:
  • the OUTPUT FORMATTER block 676 packs decoded 8-bit values (standalone Huffman mode), or packs 24-bit value+additional bits+RST m marker information (JPEG mode) into 32-bit words.
  • the additional bits are passed to the OUTPUT FORMATTER 676 by the shifter 672 after the decoding unit 673 decides on the start position of the additional bits for the current symbol.
  • the OUTPUT FORMATTER 676 also implements a 2 deep FIFO buffer using a one word delay for prediction of the final value word. During the decoding process, it may happen that the shifter 672 (either fast or slow) tries to decode the trailing padding bits at the end of the input bitstream.
  • the Huffman encoder 660 of FIG. 70 is illustrated in FIG. 72 in more detail.
  • the Huffman encoder 660 maps byte data into Huffman symbols via look up tables and includes an encoding unit 681 , a shifter 682 and an OUTPUT FORMATTER 683 with the lookup tables being accessed from the cache.
  • Each submitted value 685 is coded by the encoding unit 681 using coding tables stored in the data cache.
  • One access to the cache 230 is needed to encode a symbol, although each value being encoded requires two tables, one that contains the corresponding code and the other that contains the code length.
  • a separate set of tables is needed for AC and DC coefficients. If subsampling is performed, separate tables are required for subsampled and non subsampled components. For non-JPEG compression, only two tables (code and size) are needed.
  • the code is then handled by the shifter 682 which assembles the outgoing stream on bit level.
  • the Shifter 682 also performs RST m and EOI markers insertion which implies byte padding, if necessary.
  • Bytes of data are then passed to the OUTPUT FORMATTER 683 which does stuffing (with X′00 bytes), filling with X′FF bytes, also the FF bytes leading the marker codes and formatting to packed bytes. In the non-JPEG mode, only formatting of packed bytes is required.
  • Insertion of X′FF bytes is handled by the shifter 682 , which means that the output formatter 683 needs to tell which bytes passed from the shifter 682 represent markers, in order to insert an X′FF byte before them. This is done by having a register of tags which correspond to bytes in the shifter 682 . Each marker, which must be on byte boundaries anyway, is tagged by the shifter 682 during marker insertion. The packer 683 does not insert stuff bytes after the X′FF bytes preceding the markers. The tags are shifted synchronously with the main shift register.
  • the Huffman encoder uses four or eight tables during JPEG compression, and two tables for straight Huffman encoding.
  • the tables utilized are as follows:
  • EHUFSI 256 Huffman code sizes Used during straight Huffman encoding. Uses the coded value as an index.
  • Uses magnitude category Used for subsampled blocks.
  • Huffman tables are stored locally by the co-processor data cache 230 .
  • the data cache is utilized for storing various Huffman tables.
  • the format of the data cache is as follows:
  • This bank holds the most significant 4 bits of both the DC and AC Huffman decode tables. The least significant 2 bits of each index chooses between the 4 respective nibbles within each word.
  • Bank 7: This bank holds the 128, 24 bit entries of the quantization table. It occupies the least significant 3 bytes of all 128 lines of this bank of memory.
  • the appropriate image width value in the image dimensions register (PO_IDR) or (RO_IDR) must be set.
  • the length of the instruction refers to the number of input data items to be processed. This includes any padding data and accounts for any sub-sampling options utilized and for the number of color channels used.
  • All instructions issued by the co-processor 224 may utilize two facilities for limiting the amount of output data produced. These facilities are most useful for instructions where the input and output data sizes are not the same and in particular where the output data size is unknown, such as for JPEG coding and decoding. The facilities determine whether the output data is written out or merely discarded with everything else being as if the instruction was properly processed. By default, these facilities are normally disabled and can be enabled by enabling the appropriate bits in the RO_CFG register. JPEG instructions, however, include a specific option for setting these bits. Preferably, when utilising JPEG compression, the co-processor 224 provides facilities for “cutting” and “limiting” of output data.
  • An input image 690 may be of a certain height 691 and a certain width 692 . Often, only a portion of the image is of interest with other portions being irrelevant for the purposes of printing out. However, the JPEG encoding system deals with 8×8 blocks of pixels. It may be the case that, firstly, the image width is not an exact multiple of 8 and additionally, the section of interest comprising MCU 695 does not fit across exact boundaries.
  • An output cut register, RO_CUT, specifies the number of output bytes 696 at the beginning of the output data stream to discard. Further, an output limit register, RO_LMT, specifies the maximum number of output bytes to be produced. This count includes any bytes that do not get written to memory as a result of the cut register. Hence, it is possible to target a final output byte 698 beyond which no data is to be outputted.
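  • The combined effect of the output cut and output limit registers on a stream of output bytes can be sketched as follows. The function and parameter names are hypothetical; the sketch only models which bytes reach memory, following the statement that the limit count includes the bytes discarded by the cut.

```python
def apply_cut_and_limit(output, cut, limit):
    """Return the bytes actually written to memory.

    cut   -- number of leading output bytes to discard
    limit -- maximum number of output bytes produced, counted from the
             start of the stream and therefore including the cut bytes
    """
    written = bytearray()
    for position, byte in enumerate(output):
        if position >= limit:
            break  # beyond the final output byte: nothing more is written
        if position >= cut:
            written.append(byte)
    return bytes(written)
```

  • With a cut of 3 and a limit of 8, for example, only the bytes at stream positions 3 to 7 are written, so at most limit minus cut bytes reach memory.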
  • the first case is the extraction or decompression of a sub-section 700 of one strip 701 of a decompressed image.
  • the second useful case is illustrated in FIG. 75 wherein the extraction or decompression of a number of complete strips (eg. 711 , 712 and 713 ) is required from an overall image 714 .
  • the instruction format and field encoding for JPEG instructions is as illustrated in FIG. 76 .
  • the minor opcode fields are interpreted as follows:
  • the co-processor 224 provides for the ability to utilize portions of the JPEG coder 241 of FIG. 2 in other ways.
  • Huffman coding is utilized for both JPEG and many other methods of compression.
  • data coding instructions for manipulating the Huffman coding unit only for hierarchical image decompression.
  • the run length coder and decoder and the predictive coder can also be separately utilized with similar instructions.
  • a discrete cosine transform (DCT) apparatus as shown in FIG. 77 performs a full two-dimensional (2-D) transformation of a block of 8×8 pixels by first performing a 1-D DCT on the rows of the 8×8 pixel block. It then performs another 1-D DCT on the columns of the 8×8 pixel block.
  • Such an apparatus typically consists of an input circuit 1096 , an arithmetic circuit 1104 , a control circuit 1098 , a transpose memory circuit 1090 , and an output circuit 1092 .
  • the input circuit 1096 accepts 8-bit pixels from the 8×8 block.
  • the input circuit 1096 is coupled by intermediate multiplexers 1100 , 1102 to the arithmetic circuit 1104 .
  • the arithmetic circuit 1104 performs mathematical operations on either a complete row or column of the 8×8 block.
  • the control circuit 1098 controls all the other circuits, and thus implements the DCT algorithm.
  • the output of the arithmetic circuit is coupled to the transpose memory 1090 , register 1095 and output circuit 1092 .
  • the transpose memory is in turn connected to multiplexer 1100 , which provides output to the next multiplexer 1102 .
  • the multiplexer 1102 also receives input from the register 1094 .
  • the transpose circuit 1090 accepts 8×8 block data in rows and produces that data in columns.
  • the output circuit 1092 provides the coefficients of the DCT performed on an 8×8 block of pixel data.
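  • The two-pass, row then column structure of the transform can be sketched in software as follows. This is a direct, unoptimised orthonormal DCT-II chosen for clarity, with hypothetical function names; it illustrates the role of the transpose memory only and does not represent the fast algorithm or the arithmetic circuit of the apparatus.

```python
import math


def dct_1d(vector):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(vector)
    result = []
    for k in range(n):
        total = sum(
            vector[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        result.append(scale * total)
    return result


def transpose(block):
    """Model of the transpose memory: rows in, columns out."""
    return [list(column) for column in zip(*block)]


def dct_2d(block):
    """2-D DCT as a 1-D pass over rows followed by a 1-D pass over columns."""
    first_pass = [dct_1d(row) for row in block]
    second_pass = [dct_1d(column) for column in transpose(first_pass)]
    return transpose(second_pass)  # restore row-major coefficient layout
```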
  • the arithmetic circuit 1104 of FIG. 77 is typically implemented by breaking the arithmetic process down into several stages as described hereinafter with reference to FIG. 78. A single circuit is then built that implements each of these stages 1144 , 1148 , 1152 , 1156 using a pool of common resources, such as adders and multipliers. Such a circuit 1104 is mainly disadvantageous due to it being slower than optimal, because a single, common circuit is used to implement the various stages of circuit 1104 . This includes a storage means used to store intermediate results. Since the time allocated for the clock cycle of such a circuit must be greater than or equal to the time of the slowest stage of the circuit, the overall time is potentially longer than the sum of all the stages.
  • FIG. 78 depicts a typical arithmetic data path, in accordance with the apparatus of FIG. 77, as part of a DCT with four stages.
  • the drawing does not reflect the actual implementation, but instead reflects the functionality.
  • Each of the four stages 1144 , 1148 , 1152 , and 1156 is implemented using a single, reconfigurable circuit. It is reconfigured on a cycle-by-cycle basis to implement each of the four arithmetic stages 1144 , 1148 , 1152 , and 1156 of the 1-D DCT.
  • each of the four stages 1144 , 1148 , 1152 , and 1156 uses a pool of common resources (e.g. adders and multipliers) and thus minimises hardware.
  • the four stages 1144 , 1148 , 1152 , and 1156 are each implemented from the same pool of adders and multipliers.
  • the period of the clock is therefore determined by the speed of the slowest stage, which in this example is 20 ns (for block 1144 ).
  • the total time is 27 ns.
  • the fastest this DCT implementation can run at is 27 ns.
  • Pipelined DCT implementations are also well known.
  • the drawback with such implementations is that they require large amounts of hardware to implement. Whilst the present invention does not offer the same performance in terms of throughput, it offers an extremely good performance/size compromise, and good speed advantages over most of the current DCT implementations.
  • FIG. 79 shows a block diagram of the preferred form of discrete cosine transform unit utilized in the JPEG coder 241 (FIG. 2) where pixel data is inputted to an input circuit 1126 which captures an entire row of 8-bit pixel data.
  • the transpose memory 1118 converts row formatted data into column formatted data for the second pass of the two dimensional discrete cosine transform algorithm.
  • Data from the input circuit 1126 and the transpose memory 1118 is multiplexed by multiplexer 1124 , with the output data from multiplexer 1124 presented to the arithmetic circuit 1122 .
  • Results data from the arithmetic circuit 1122 is presented to the output circuit 1120 after the second pass of the process.
  • the control circuit 1116 controls the flow of data through the discrete cosine transform apparatus.
  • row data from the image to be transformed, or transformed image coefficients to be transformed back to pixel data is presented to the input circuit 1126 .
  • the multiplexer 1124 is configured by the control circuit 1116 to pass data from the input circuit 1126 to the arithmetic circuit 1122 .
  • FIG. 80 there is shown the structure of the arithmetic circuit 1122 in more detail.
  • the results from the forward circuit 1138 which is utilized to calculate the forward discrete cosine transform is selected via the multiplexer 1142 , which is configured in this way by the control circuit 1116 .
  • the output from the inverse circuit 1140 is selected via the multiplexer 1142 , as controlled by the control circuit 1116 .
  • the first pass after each row vector has been processed by the arithmetic circuit 1122 (configured in the appropriate way by control circuit 1116 ), that vector is written into the transpose memory 1118 . Once all eight row vectors in an 8×8 block have been processed and written into the transpose memory 1118 , the second pass of the discrete cosine transform begins.
  • the multiplexer 1124 is configured by the control circuit to ignore data from the input circuit 1126 and pass column vector data from the transpose memory 1118 to the arithmetic circuit 1122 .
  • the multiplexer 1142 in the arithmetic circuit 1122 is configured by the control circuit 1116 to pass results data from the inverse circuit 1140 to the output of the arithmetic circuit 1122 . When results from the arithmetic circuit 1122 are available, they are captured by the output circuit 1120 under direction from the control circuit 1116 to be outputted sometime later.
  • the arithmetic circuit 1122 is completely combinatorial, in that there are no storage elements in the circuit storing intermediate results.
  • the control circuit 1116 knows how long it takes for data to flow from the input circuit 1126 , through the multiplexer 1124 and through the arithmetic circuit 1122 , and so knows exactly when to capture the results vector from the outputs of the arithmetic circuit 1122 into the output circuit 1120 .
  • the advantage of having no intermediate stages in the arithmetic circuit 1122 is not only that no time is wasted getting data in and out of intermediate storage elements, but also that the total time taken for data to flow through the arithmetic circuit 1122 is equal to the sum of all the internal stages and not N times the delay of the longest stage (as with conventional discrete cosine transform implementations), where N is the number of stages in the arithmetic circuit.
  • the advantage of this circuit is that it provides an opportunity to reduce the overall system's clock period. Assuming that four clock cycles are allocated to getting a result from the circuit depicted in FIG. 81, the fastest run time for the entire DCT system would be 57/4 ns (14.25 ns), which is a significant improvement over the circuit in FIG. 78 which only allows for a DCT clock period of substantially 27 ns.
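  • The timing comparison can be worked through numerically as follows. The function names are hypothetical, and the figures of 27 ns, 57 ns and four stages are simply those quoted in the text; the sketch assumes that the conventional circuit spends one clock period on each stage.

```python
def conventional_latency_ns(num_stages, clock_period_ns):
    """Staged circuit: every stage costs one full clock period."""
    return num_stages * clock_period_ns


def combinatorial_clock_period_ns(total_delay_ns, cycles_allocated):
    """Combinatorial circuit: the total delay is spread over the cycles."""
    return total_delay_ns / cycles_allocated


# Conventional four-stage data path clocked at 27 ns.
staged_total = conventional_latency_ns(4, 27.0)

# Combinatorial data path with 57 ns total delay and four cycles allocated.
fast_period = combinatorial_clock_period_ns(57.0, 4)
```

  • On these figures the staged data path needs 108 ns to produce a result vector, against 57 ns for the combinatorial data path, and the system clock period falls from 27 ns to 14.25 ns.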
  • An exemplary implementation of the present DCT apparatus might, but need not, use the DCT algorithm proposed in the paper entitled A Fast DCT-SQ Scheme for Images by Yukihiro Arai, Takeshi Agui and Masayuki Nakajima, published in The Transactions of the IEICE, Vol. E71, No. 11, November 1988, at page 1095.
  • By implementing this algorithm in hardware, it can then easily be placed in the current DCT apparatus in the arithmetic circuit 1122 .
  • other DCT algorithms may be implemented in hardware in place of arithmetic circuit 1122 .
  • the embodiments of the invention provide efficient and fast, single stage (clock cycle) decoding of variable-length coded data in which byte aligned and not variable length encoded data is removed from the encoded data stream in a separate pre-processing block. Further, information about positions of the removed byte-aligned data is passed to the output of the decoder in a way which is synchronous with the data being decoded. In addition, it provides fast detection and removal of not byte-aligned and not variable length encoded bit fields that are still present in the pre-processed input data.
  • the preferred embodiment of the present invention preferably provides for a fast Huffman decoder capable of decoding a JPEG encoded data at a rate of one Huffman symbol per clock cycle between marker codes. This is accomplished by means of separation and removal of byte aligned and not Huffman encoded marker headers, marker codes and stuff bytes from the input data first in a separate pre-processing block. After the byte aligned data is removed, the input data is passed to a combinatorial data-shifting block, which provides continuous and contiguous filling up of the data decode register that consequently presents data to a decoding unit. Positions of markers removed from the original input data stream are passed on to a marker shifting block, which provides shifting of marker position bits synchronously with the input data being shifted in the data shifting block.
  • the decoding unit provides combinatorial decoding of the encoded bit field presented to its input by the data decode register.
  • the bit field is of a fixed length of n bits.
  • the output of the decoding unit provides the decoded value (v) and the actual length (m) of the input code, where m is less than or equal to n. It also provides the length (a) of a variable length bit field, where (a) is greater than or equal to 0.
  • the variable-length bit field is not Huffman encoded and follows immediately the Huffman code.
  • the n-long bit field presented to the input of the decoding unit may be longer than or equal to the actual code.
  • the decoding unit determines the actual length of the code (m) and passes it together with the length of the additional bits (a) to a control block.
  • the control block calculates a shift value (a+m) driving the data and marker shifting blocks to shift the input data for the next decoding cycle.
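  • A minimal sketch of one decode cycle as described above, assuming a toy prefix code rather than a real JPEG table. The names `TOY_TABLE`, `decode_unit` and `decode_cycle` are hypothetical, and bits are modelled as a string of '0'/'1' characters instead of a hardware register. The sketch only illustrates how a decoding unit can return (v, m, a) from a fixed n-bit field and how the shift value a+m follows from it.

```python
N = 16  # width of the bit field presented to the decoding unit

# Hypothetical toy code: bit pattern -> (decoded value v, additional-bit count a)
TOY_TABLE = {"0": (0x00, 0), "10": (0x02, 2), "110": (0x03, 3)}

def decode_unit(field):
    """Return (v, m, a) for the code at the head of an N-bit field, or None."""
    bits = format(field, "0{}b".format(N))
    for m in range(1, N + 1):
        entry = TOY_TABLE.get(bits[:m])
        if entry is not None:
            v, a = entry
            return v, m, a
    return None

def decode_cycle(bitstring):
    """Decode one symbol; return (v, additional bits, remaining bit string)."""
    # Present a fixed-length field, zero-filled if fewer than N bits remain.
    field = int(bitstring[:N].ljust(N, "0"), 2)
    result = decode_unit(field)
    if result is None:
        raise ValueError("no code matches the head of the field")
    v, m, a = result
    shift = m + a  # shift value computed by the control block
    return v, bitstring[m:shift], bitstring[shift:]
```

  • In this sketch the field is always N bits wide even when the code is shorter, which mirrors the statement that the field presented to the decoding unit may be longer than the actual code.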
  • the apparatus of the invention can comprise any combinatorial decoding unit, including ROM, RAM, PLA or anything else based as long as it provides a decoded value, the actual length of the input code, and the length of the following not Huffman encoded bit field within a given time frame.
  • the decoding unit outputs predictively encoded DC difference values and AC run-length values as defined in JPEG standard.
  • the not Huffman encoded bit fields which are extracted from the input data simultaneously with decoded values, represent additional bits determining the value of the DC and AC coefficients as defined in JPEG standard.
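  • The relationship between a decoded JPEG symbol and its additional bits can be sketched as follows. This follows the baseline JPEG convention (size category in the low four bits of an AC symbol, run length in the high four bits, and the standard sign-extension rule); the function names are hypothetical and are not taken from the apparatus described here.

```python
def additional_bits_length(symbol, is_dc):
    """Number of additional bits that follow a decoded Huffman symbol."""
    # A DC symbol is the size category itself; an AC symbol carries it
    # in its low nibble.
    return symbol if is_dc else symbol & 0x0F

def run_length(ac_symbol):
    """Zero run length carried in the high nibble of an AC symbol."""
    return ac_symbol >> 4

def extend(bits, size):
    """Convert 'size' additional bits into a signed coefficient value."""
    if size == 0:
        return 0
    if bits < (1 << (size - 1)):
        # A leading 0 bit denotes a negative value.
        return bits - (1 << size) + 1
    return bits
```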
  • the padding zone comprises up to k most significant bits of the data register and is indicated by the presence of a marker bit within k most significant bits of the marker register, position of said marker bit limiting the length of the padding zone. If all the bits in the padding zone are identical (and equal to 1 s in case of JPEG standard), they are considered as padding bits and are removed from the data register accordingly without being decoded. The contents of the data and marker registers are then adjusted for the next decoding cycle.
  • the exemplary apparatus comprises an output block that handles formatting of the outputted data according to the requirements of the preferred embodiment of the invention. It outputs the decoded values together with the corresponding not variable length encoded bit fields, such as additional bits in JPEG, and a signal indicating position of any inputted byte aligned and not encoded bit fields, such as markers in JPEG, with respect to the decoded values.
  • Data being decoded by the JPEG coder 241 is JPEG compatible and comprises variable length Huffman encoded codes interleaved with variable length not encoded bit fields called “additional bits”, variable length not encoded bit fields called “padding bits” and fixed length, byte aligned and not encoded bit fields called “markers”, “stuff bytes” and “fill bytes”.
  • FIG. 82 shows a representative example of input data.
  • FIG. 83 illustrates the architecture of the Huffman decoder of the JPEG data in more detail.
  • the stripper 1171 removes marker codes (code FFXX hex , XX being non zero), fill bytes (code FF hex ) and stuff bytes (code 00 hex following code FF hex ), that is all byte aligned components of the input data, which are presented to the stripper as 32 bit words. The most significant bit of the first word to be processed is the head of the input bit stream.
  • the byte aligned bit fields are removed from each input data word before the actual decoding of Huffman codes takes place in the downstream parts of the decoder.
  • the input data arrives at the stripper's 1171 input as 32-bit words, one word per clock cycle. Numbering of the input bytes 1211 from 0 to 3 is shown in FIG. 85. If a byte of number (i) is removed because it is a fill byte, a stuff byte or belongs to a marker, the remaining bytes of numbers (i−1) down to 0 are shifted to the left on the output of the stripper 1171 and take numbers (i) down to 1, with byte 0 becoming a “don't care” byte. Validity of bytes outputted by the stripper 1171 is also coded by means of separate output tags 1212 as shown in FIG. 85.
  • the bytes which are not removed by the stripper 1171 are left aligned on the stripper's output.
  • Each byte on the output has a corresponding tag indicating if the corresponding byte is valid (i.e. passed on by the stripper 1171 ), or invalid (i.e. removed by the stripper 1171 ) or valid and following a removed marker.
  • the tags 1212 control loading of the data bytes into the data register 1182 through the data shifter and loading of marker positions into the marker register 1183 through the marker shifter. The same scheme applies if more than one byte is removed from the input word: all the remaining valid bytes are shifted to the left and the corresponding output tags indicate validity of the output bytes.
  • FIG. 85 provides examples 1213 of output bytes and output tags for various example combinations of input bytes.
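  • The behaviour of the stripper can be sketched in software as below. This is an illustration under several assumptions: the two-bit tag values are hypothetical (the text only requires that the "valid and following a marker" tag has 1 in its most significant position), marker segments with payloads are ignored, a trailing FF with no following byte is treated as a fill byte, and list index 0 stands for the leftmost byte (byte 3 in FIG. 85).

```python
INVALID, VALID, VALID_AFTER_MARKER = 0b00, 0b01, 0b11  # assumed encoding

def classify(stream):
    """Return one tag per input byte: kept, removed, or kept after a marker."""
    tags = []
    after_marker = False
    i = 0
    while i < len(stream):
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        keep_tag = VALID_AFTER_MARKER if after_marker else VALID
        if stream[i] != 0xFF:
            tags.append(keep_tag)          # ordinary encoded data byte
            after_marker = False
            i += 1
        elif nxt == 0x00:
            tags += [keep_tag, INVALID]    # FF is data, 00 is a stuff byte
            after_marker = False
            i += 2
        elif nxt == 0xFF or nxt is None:
            tags.append(INVALID)           # fill byte
            i += 1
        else:
            tags += [INVALID, INVALID]     # marker code FFXX, XX non-zero
            after_marker = True
            i += 2
    return tags

def strip_word(word, tags):
    """Left-align the kept bytes of one word; pad with don't-care bytes."""
    kept = [(b, t) for b, t in zip(word, tags) if t != INVALID]
    pad = len(word) - len(kept)
    out_bytes = [b for b, _ in kept] + [0x00] * pad
    out_tags = [t for _, t in kept] + [INVALID] * pad
    return out_bytes, out_tags
```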
  • the role of the preshifter and postshifter blocks 1172 , 1173 , 1180 , 1181 is to assure loading of the data into the corresponding data register 1182 and marker register 1183 in a contiguous way whenever there is enough room in the data register and the marker register.
  • the data shifter and the marker shifter blocks which consist of the respective pre- and postshifters, are identical and identically controlled. The difference is that while the data shifter handles data passed by the stripper 1171 , the marker shifter handles the tags only and its role is to pass marker positions to the output of the decoder in a way synchronous with the decoded Huffman values.
  • the outputs of the postshifters 1180 , 1181 feed directly to the respective registers 1182 , 1183 , as shown in FIG. 83 .
  • data arriving from the stripper 1171 is firstly extended to 64 bits by appending 32 zeroes to the least significant bit 1251 . Then the extended data is shifted in a 64 bit wide barrel shifter 1252 to the right by a number of bits currently present in the data register 1182 . This number is provided by the control logic 1185 which keeps track of how many valid bits are there in the data 1182 and marker 1183 registers.
  • the barrel shifter 1252 then presents 64 bits to the multiplexer block 1253, which consists of 64 2×1 elementary multiplexers 1254. Each elementary 2×1 multiplexer 1254 takes as inputs one bit from the barrel shifter 1252 and one bit from the data register 1182.
  • the control signals to all the elementary multiplexers 1254 are decoded from a control block's shift control 1 signals as shown in FIG. 86, which are also shown in FIG. 87 as preshifter control bits 0 . . . 5 of register 1223 .
  • the outputs of the elementary multiplexers 1254 drive a barrel shifter 1255 . It shifts left by the number of bits provided on a 5 bit control signal shift control 2 as shown in FIG. 86 .
  • These bits represent the number of bits consumed from the data register 1182 by the decoding of the current data, which can be either the length of the currently decoded Huffman code plus the number of the following additional bits, or the number of padding bits to be removed if padding bits are currently being detected, or zero if the number of valid data bits in the data register 1182 is less than the number of bits to be removed.
  • the data appearing on the output of barrel shifter 1255 contains new data to be loaded into the data register 1182 after a single decoding cycle.
  • the contents of the data register 1182 changes in such a way that the leading (most significant) bits are shifted out of the register as being decoded, and 0, 8, 16, 24 or 32 bits from the stripper 1171 are added to the contents of the data register 1182 .
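  • The preshifter, multiplexer and postshifter path described above can be modelled on integers as follows. This is a sketch only: `load_and_shift` is a hypothetical name, the register is modelled as a 64-bit integer with the head of the bit stream in the most significant bit, and the sketch assumes the caller never loads more bits than fit (nob + nos ≤ 64) and never removes more bits than are valid.

```python
MASK64 = (1 << 64) - 1

def load_and_shift(reg, nob, word, nos, nor):
    """Return (new register contents, new number of valid bits).

    reg  : 64-bit data register, valid bits left aligned
    nob  : number of valid bits currently in reg
    word : 32-bit word from the stripper, valid bytes left aligned
    nos  : number of valid bits in word (0, 8, 16, 24 or 32)
    nor  : number of bits removed from the head of the register
    """
    assert nob + nos <= 64 and nor <= nob
    # Ignore the don't-care bytes of the incoming word.
    word &= (0xFFFFFFFF << (32 - nos)) & 0xFFFFFFFF
    extended = word << 32                  # append 32 zeroes at the LSB end
    shifted = extended >> nob              # preshift right by the valid bit count
    keep = (MASK64 << (64 - nob)) & MASK64 # positions still owned by the register
    merged = (reg & keep) | (shifted & ~keep & MASK64)  # 2x1 multiplexers
    new_reg = (merged << nor) & MASK64     # postshift left by consumed bits
    return new_reg, nob + nos - nor
```

  • In hardware the same control values would drive the marker path in parallel; the sketch models only the data path.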
  • the marker preshifter 1173 , postshifter 1181 and the marker register 1183 are units identical to the data preshifter 1172 , data postshifter 1180 and the data register 1182 , respectively.
  • the data flow inside units 1173, 1181 and 1183 and among them is also identical to the data flow among units 1172, 1180 and 1182.
  • the same control signals are provided to both sets of units by the control unit 1185 . The difference is only in the type of data on the inputs of the marker preshifter 1173 and data preshifter 1172 , as well as in how the contents of the marker register 1183 and the data register 1182 are used. As shown in FIG.
  • tags 1261 from the stripper 1171 come as eight bit words, which provide two bits for each corresponding byte of data going to the data register 1182 .
  • an individual two bit tag indicating valid and following a marker byte has 1 on the most significant position. Only this most significant position of each of the four tags delivered by the stripper 1171 simultaneously is driven to the input 1262 of the marker preshifter 1173 . In this way, on the input to the marker preshifter there may be bits set to 1 indicating positions of the first encoded data bits following markers. At the same time, they mark the positions of the first encoded data bits in the data register 1182 which follow a marker.
  • This synchronous behavior of the marker position bits in the marker register 1183 and the data bits in the data register 1182 is used in the control block 1185 for detection and removal of padding bits, as well as for passing marker positions to the output of the decoder in a way synchronous with the decoded data.
  • the two preshifters (data 1172 and marker 1173 ), postshifters (data 1180 and marker 1181 ) and registers (data 1182 and marker 1183 ) get the same control signals which facilitates fully parallel and synchronous operation.
  • the decoding unit 1184 gets the sixteen most significant bits of the data register 1182 which are driven to a combinatorial decoding unit 1184 for extraction of a decoded Huffman value, the length of the present input code being decoded and the length of the additional bits following immediately the input code (which is a function of the decoded value).
  • the length of the additional bits is known after the corresponding preceding Huffman symbol is decoded, so is the starting position of the next Huffman symbol. This effectively requires, if speed of one value decoded per clock cycle is to be maintained, that decoding of a Huffman value is done in a combinatorial block.
  • the decoding unit comprises four PLA style decoding tables hardwired as a combinatorial block taking a 16-bit token on input from the data register 1182 and producing a Huffman value (8 bits), the length of the corresponding Huffman-encoded symbol (4 bits) and the length of the additional bits (4 bits) as illustrated in FIG. 89.
  • Removal of padding bits takes place during the actual decoding when a sequence of padding bits is detected in the data register 1182 by a decoder of padding bits which is part of the control unit 1185 .
  • the decoder of padding bits operates as shown in FIG. 90 .
  • Eight most significant bits of the marker register 1183 , 1242 are monitored for presence of a marker position bit. If a marker position bit is detected, all the bits in the data register 1182 , 1241 which correspond to, that is have the same positions as, the bits preceding the marker bit in the marker register 1242 are recognized as belonging to a current padding zone. The content of the current padding zone is checked by the detector of padding bits 1243 for 1's.
  • If all the bits in the current padding zone are 1's, they are recognized as padding bits and are removed from the data register. Removal is done by means of shifting of the contents of the data register 1182, 1241 (and at the same time the marker register 1183, 1242) to the left using the respective shifters 1172, 1173, 1180, 1181 in one clock cycle, as in normal decode mode with the difference that no decoded value is outputted. If not all the bits in the current padding zone are 1's, a normal decode cycle is performed rather than a padding bits removal cycle. Detection of padding bits takes place each cycle as described, in case there are some padding bits in the data register 1182 to be removed.
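  • The padding bit detection described above can be sketched as below. The function name `padding_length` is hypothetical; both registers are modelled as 64-bit integers with the head of the stream in the most significant bit, and k is the number of leading marker register bits that are monitored (eight in the text).

```python
def padding_length(data_reg, marker_reg, k=8, width=64):
    """Return the number of padding bits to remove, or 0 for a normal decode."""
    top_markers = marker_reg >> (width - k)
    if top_markers == 0:
        return 0                      # no marker position bit in the top k bits
    # Bits preceding the first marker position bit form the padding zone.
    zone_len = k - top_markers.bit_length()
    if zone_len == 0:
        return 0                      # marker bit is at the head: empty zone
    zone = data_reg >> (width - zone_len)
    all_ones = (1 << zone_len) - 1
    return zone_len if zone == all_ones else 0
```

  • A non-zero result corresponds to a padding bits removal cycle, in which both registers are shifted left by that amount and no decoded value is produced.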
  • the control unit 1185 is shown in detail in FIG. 87 .
  • the central part of the control unit is the register 1223 holding the current number of valid bits in the data register 1182 .
  • the number of valid bits in the marker register 1183 is always equal to the number of valid bits in the data register 1182 .
  • the control unit performs three functions. Firstly, it calculates a new number of bits in the data register 1182 to be stored in the register 1223. Secondly, it determines control signals for the shifters 1172, 1173, 1180, 1181, 1186, 1187, decoding unit 1184, and the output formatter 1188. Finally, it detects padding bits in the data register 1182, as described above.
  • the new number of bits in the data register 1182 (new_nob) is calculated as the current number of bits in the data register 1182 (nob) plus the number of bits (nos) available for loading from the stripper 1171 in the current cycle, less the number of bits (nor) removed from the data register 1182 in the current cycle, which is either a decode cycle or a padding bits removal cycle.
  • the new number of bits is calculated as follows: new_nob = nob + nos − nor.
  • The respective arithmetic operations are done in adder 1221 and subtractor 1222. It should be noted that (nos) can be 0 if there is no data available from the stripper 1171 in the current cycle. Also, (nor) can be 0 if there is no decoding done in the current cycle because of shortage of bits in the data register 1182, which means there are fewer bits in the data register than the sum of the current code length and the following additional bits length as delivered by the control unit 1185. The value (new_nob) may exceed 64 and block 1224 checks for this condition. In such a case, the stripper 1171 is stalled and no new data is loaded. Multiplexer 1233 is used for zeroing the number of bits to be loaded from the stripper 1171.
  • Signal “padding cycle” driven by decoder 1231 controls multiplexer 1234 to select either the number of padding bits or the number of decoded bits (that is the length of code bits plus additional bits) as number of bits to be removed (nor). If the number of the decoded bits is greater than the number (nob) of the bits in the data register, which is checked in comparator 1228 , the effective number of bits to shift as provided for multiplexer 1234 is set to zero by a complex NAND gate 1230 . As a result, (nor) is set to zero and no bits are removed from the data register.
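  • The bookkeeping performed by the control unit can be sketched as a single function. `control_step` and its parameter names are hypothetical, and the sketch models only the arithmetic described above (selection of nor, zeroing on shortage of bits, and stalling the stripper when the result would exceed 64), not the gate-level structure.

```python
REG_WIDTH = 64  # width of the data and marker registers

def control_step(nob, nos, decoded_bits, padding_bits, padding_cycle):
    """Return (new_nob, bits actually loaded, bits actually removed)."""
    # Select either the padding length or code length plus additional bits.
    nor = padding_bits if padding_cycle else decoded_bits
    # Not enough valid bits to complete the decode: remove nothing this cycle.
    if not padding_cycle and decoded_bits > nob:
        nor = 0
    # Stall the stripper if the new contents would not fit in the register.
    if nob + nos - nor > REG_WIDTH:
        nos = 0
    return nob + nos - nor, nos, nor
```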
  • the output of multiplexer 1234 is also used to control postshifters 1182 and 1183 .
  • the width of the data register 1182 must be chosen in a way that prevents a deadlock situation. This means that at any time there must be either room in the data register to accommodate the maximum number of bits available from the stripper 1171, or a sufficient number of valid bits to be removed as a result of a decode cycle or a padding bits removal cycle.
  • Block 1229 is used for detection of EOI (End Of Image) marker position.
  • the EOI marker itself is removed by the stripper 1171 , but there can be some padding bits which are the very last bits of the data and which used to precede the EOI marker before its removal in the stripper 1171 .
  • the comparator 1229 checks if the number of bits in the data register 1182, stored in register 1223, is less than eight. If it is, and there is no more data to come from the stripper 1171 (that is, the data register 1182 holds all the remaining bits of the data unit being decoded), the remaining bits define the size of the padding zone before the removed EOI marker. Further handling of the padding zone and possible removal of padding bits is identical to the procedure applied in case of padding bits before RST markers, which has been described before.
  • Barrel shifters 1186 , 1187 and output formatter 1188 play a support role and depending on the embodiment may have a different implementation or may not be implemented at all. Control signals to them come from the control unit 1185 , as described above.
  • the ab_preshifter (additional bits preshifter) 1186 takes 32 bits from the data register as input and shifts them to the left by the length of the Huffman code being presently decoded. In this way, all the additional bits following the code being presently decoded appear left aligned on the output of the barrel shifter 1186 which is also the input to the barrel shifter 1187 .
  • the ab_post shifter (additional bits postshifter) 1187 adjusts the position of the additional bits from left aligned to right aligned in an 11 bit field, as used in the output format of the data and shown in FIG. 91 .
  • the additional bits field extends from bit 8 to bit 18 in the output word format 1196 and some of the most significant bits may be invalid, depending on the actual number of the additional bits. This number is encoded on bits 0 to 3 of 1196, as specified by the JPEG standard. If a different format of the output data is adopted, the barrel shifters 1186 and 1187 and their functionality may change accordingly.
  • the output formatter block 1188 packs the decoded values, which in JPEG standard are DC and AC coefficients, ( 1196 , bits 0 to 7 ) and a DC coefficient indicator ( 1196 , bit 19 ) passed by the control unit 1185 together with the additional bits ( 1196 , bits 8 to 18 ) passed by the ab_postshifter 1187 and the marker position bit ( 1196 , bit 23 ) from the marker register 1183 into words according to the format presented in FIG. 91 .
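  • The additional bits path and the output word layout described above can be sketched as follows. The function names are hypothetical; bit positions follow the text (decoded value in bits 0 to 7, additional bits in bits 8 to 18, DC indicator in bit 19, marker position in bit 23), and the remaining bits are simply left at zero in this sketch.

```python
def extract_additional_bits(reg32, m, a):
    """Right-align the 'a' additional bits that follow an m-bit code.

    reg32 holds the 32 most significant bits of the data register.
    """
    left_aligned = (reg32 << m) & 0xFFFFFFFF        # ab_preshifter: drop the code
    return left_aligned >> (32 - a) if a else 0     # ab_postshifter: right-align

def pack_output(value, additional, is_dc, marker):
    """Pack one decoded symbol into the output word format."""
    assert 0 <= value < (1 << 8) and 0 <= additional < (1 << 11)
    return (value
            | (additional << 8)
            | (int(is_dc) << 19)
            | (int(marker) << 23))
```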
  • the output formatter 1188 also handles any particular requirements as to the output interface of the decoder.
  • the implementation of the output formatter is normally expected to change if the output interface changes as a result of different requirements.
  • the foregoing described Huffman decoder provides a highly effective form of decoding providing a high speed decoding operation.
  • These instructions implement general affine transformations of source images.
  • the operation to construct a portion of a transformed image falls generally into two broad areas. These include firstly working out which parts of the source image are relevant to constructing the current output scanline and, if necessary, decompressing them.
  • the second step normally comprises necessary sub-sampling and/or interpolation to construct the output image on a pixel by pixel basis.
  • In FIG. 92 there is illustrated a flow chart of the steps required 720 to calculate the value of a destination pixel, assuming that the appropriate sections of the source image have been decompressed.
  • the relevant sub-sampling if present, must be taken into account 721 .
  • two processes are normally implemented, one involving interpolation 722 and the other being sub-sampling. Normally interpolation and sub-sampling are alternative steps, however in some circumstances interpolation and sub-sampling may be used together.
  • the first step is to find the four surrounding pixels 722 , then determine if pre-multiplication is required 723 , before performing bilinear interpolation 724 .
  • the bilinear interpolation step 724 is often computationally intensive and limits the operation of the image transformation process.
  • the final step in calculating a destination pixel value is to add together the possibly bilinear interpolated sub-samples from the source image.
  • the added together pixel values can be accumulated 727 in different possible ways to produce destination image pixels 728.
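  • The bilinear interpolation step 724 can be illustrated with a short sketch. It assumes a single channel image stored as a list of rows, floating point arithmetic, and clamping at the right and bottom edges; none of these choices is taken from the hardware described here, and `bilinear` is a hypothetical name.

```python
import math

def bilinear(image, x, y):
    """Interpolate a value at fractional (x, y); (0, 0) is the top left pixel."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Find the four surrounding pixels, clamping at the image border.
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on both rows, then along y between the rows.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```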
  • the instruction word encoding for image transformation instructions is as illustrated in FIG. 93 with the following interpretation being placed on the minor opcode fields.
  • Operand A points to a data structure known as a “kernel descriptor” that describes all the information required to define the actual transformation.
  • This data structure has one of two formats (as defined by the L bit in the A descriptor).
  • FIG. 94 illustrates the long form of kernel descriptor coding
  • FIG. 95 illustrates the short form of encoding.
  • the kernel descriptor describes:
  • Source image start co-ordinates 730 (unsigned fixed point, 24.24 resolution). Location (0,0) is at the top left of the image.
  • a 3 bit bp field 733 defining the location of the binary point within the fixed point matrix co-efficients as described hereinafter.
  • Accumulation matrix coefficients 735 are of “variable” point resolution of 20 binary places (2's complement), with the location of the binary point implicitly specified by the bp field.
  • An rl field 736 that indicates the remaining number of words in the kernel descriptor. This value is equal to the number of rows times the number of columns minus 1.
  • the kernel coefficients in the descriptor are listed row by row, with elements of alternate rows listed in reverse direction, thereby forming a zig zag pattern.
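  • The zig zag ordering of kernel coefficients can be undone as sketched below. The sketch assumes that the first row is listed left to right and that every second row after it is reversed; the text does not state which rows are reversed, so this is an assumption, as are the function names.

```python
def unpack_kernel(coeffs, rows, cols):
    """Rebuild a rows x cols matrix from coefficients listed in zig zag order."""
    assert len(coeffs) == rows * cols
    kernel = []
    for r in range(rows):
        row = list(coeffs[r * cols:(r + 1) * cols])
        # Assumed convention: odd numbered rows were listed right to left.
        kernel.append(row[::-1] if r % 2 else row)
    return kernel

def remaining_words(rows, cols):
    """Value of the rl field: number of rows times number of columns, minus 1."""
    return rows * cols - 1
```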
  • the operand B consists of a pointer to an index table indexing into scan lines of a source image.
  • the structure of the index table is as illustrated in FIG. 96, with the operand B 740 pointing to an index table 741 which in turn points to scan lines (eg. 742 ) of the required source image pixels.
  • the index table and the source image pixels are cacheable and possibly located in the local memory.
  • the operand C stores the horizontal and vertical sub-sample rate.
  • the horizontal and vertical sub-sample rates are defined by the dimensions of the sub-sample weight matrix which are specified if the C descriptor is present.
  • the dimensions of the matrix r and c are encoded in the data word of the image transformation instruction as illustrated in FIG. 97 .
  • the accumulated value is kept to 36 binary places per channel.
  • the location of the binary point within this field is specified by the BP field.
  • the BP field indicates the number of leading bits in the accumulated result to discard.
  • the 36 bit accumulated value is treated as a signed 2's complement number and is clamped or wrapped as specified.
  • In FIG. 98 there is illustrated an example of the interpretation of the BP field in co-efficient encoding.
  • Convolution, as applied to rendering images, involves applying a two dimensional convolution kernel to a source image to produce a resultant image. Convolving is normally used for such matters as edge sharpening or indeed any image filter. Convolutions are implemented by the co-processor 224 in a similar manner to image transformations. The difference is that, in the case of transformations, the kernel is translated by the width of the kernel for each output pixel, whereas in the case of convolutions the kernel is moved by one source pixel for each output pixel.
  • $$H(x,y)[n] = \left(I.\mathrm{offset}[n]\ \text{mdp por}: 0000\right) + \sum_{i}\sum_{j} S(x+i,\, y+j)\, C(i,j)[n]$$
  • In FIG. 99 there is illustrated an example of how a convolution kernel 750 is applied to a source image 751 to produce a resultant image 752.
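  • The sum of products that defines a convolution can be sketched for a single channel as follows. The sketch assumes the output is computed only where the kernel lies fully inside the source image and that the offset is a plain additive constant; border handling and the exact role of the offset term are not specified in the text, and the function names are hypothetical.

```python
def convolve_pixel(source, kernel, x, y, offset=0):
    """Offset plus the sum over i, j of S(x+i, y+j) * C(i, j)."""
    total = offset
    for j, kernel_row in enumerate(kernel):
        for i, coeff in enumerate(kernel_row):
            total += source[y + j][x + i] * coeff
    return total

def convolve(source, kernel, offset=0):
    """Move the kernel by one source pixel for each output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[convolve_pixel(source, kernel, x, y, offset)
             for x in range(len(source[0]) - kw + 1)]
            for y in range(len(source) - kh + 1)]
```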
  • Source image address generation and output pixel calculations are performed in a similar manner to that for image transformation instructions.
  • the instruction operands take a similar form to image transformations.
  • In FIG. 100 there is illustrated the instruction word encoding for convolution instructions with the following interpretation being applied to the various fields.
  • Matrix multiplication is utilized for many things including being utilized for color space conversion where an affine relationship exists between two color spaces.
  • the matrix multiplication instruction operands and results have the following format:
  • the co-processor 224 implements a multi-level dither for halftoning. Anything from 2 to 255 is a meaningful number of halftone levels.
  • Data to be halftoned can be either bytes (ie. unmeshed or one channel from meshed data) or pixels (ie. meshed) as long as the screen is correspondingly meshed or unmeshed. Up to four output channels (or four bytes from the same channel) can be produced per clock, either packed bits (for bi-level halftoning) or codes (for more than two output levels) which are either packed together in bytes or unpacked in one code per byte.
  • the output half-toned value is calculated using the following formula:
  • the minor op code specifies a number of halftone levels.
  • the operand B encoding is for the halftone screen and is encoded in the same way as a compositing tile.
  • Hierarchical image format decompression involves several stages. These stages include horizontal interpolation, vertical interpolation, Huffman decoding and residual merging. Each stage is a separate instruction.
  • the residual values to be added to the interpolated values from the interpolation steps are Huffman coded.
  • the JPEG decoder is utilized for Huffman decoding.
  • In FIG. 102 there is illustrated the process of horizontal interpolation.
  • the output stream 761 consists of twice as much data as the input stream 762 with the last data value 763 being replicated 764 .
  • FIG. 103 illustrates horizontal interpolation by a factor of 4.
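  • Horizontal interpolation by a factor of two or four can be sketched as below. The sketch assumes integer samples, linear interpolation between neighbouring input values with truncating division, and replication of the last input value at the end of the row; the exact sample placement and rounding used by the hardware are not given in the text, and `interpolate_horizontal` is a hypothetical name.

```python
def interpolate_horizontal(row, factor=2):
    """Upsample a row by 'factor', replicating the last input value."""
    out = []
    for i, value in enumerate(row):
        # The last value has no right neighbour, so it is replicated.
        nxt = row[i + 1] if i + 1 < len(row) else value
        for step in range(factor):
            out.append((value * (factor - step) + nxt * step) // factor)
    return out
```

  • With a factor of two the output contains twice as many values as the input, which matches the description of the output stream 761.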
  • rows of pixels are up sampled by a factor of two or four vertically by linear interpolation.
  • one row of pixels is on operand A and the other row is on operand B.
  • the output data stream contains the same number of pixels as each input stream.
  • In FIG. 104 there is illustrated an example of vertical interpolation wherein two input data streams 770, 771 are utilized to produce a first output stream 772 having a factor of two interpolation or a second output stream 773 having a factor of 4 interpolation.
  • interpolation occurs separately on each of the four channels of four channel pixels.
  • the residual merging process involves the bytewise addition of two streams of data.
  • the first stream (operand A) is a stream of base values and the second stream (operand B) is a stream of residual values.
  • In FIG. 105 there are illustrated two input streams 780, 781 and a corresponding output stream 782 for utilising the process of residual merging.
  • In FIG. 106 there is illustrated the instruction word encoding for hierarchical image format instructions with the following table providing the relevant details of the minor op code fields.
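  • Residual merging can be sketched as a bytewise addition of the base stream (operand A) and the residual stream (operand B). The sketch assumes that each sum wraps modulo 256; whether the hardware wraps or clamps is not stated in the text, and `merge_residuals` is a hypothetical name.

```python
def merge_residuals(base, residual):
    """Add two equally long byte streams, byte by byte."""
    assert len(base) == len(residual)
    # Assumption: sums wrap around at 8 bits rather than saturating.
    return bytes((b + r) & 0xFF for b, r in zip(base, residual))
```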
  • These instructions utilize the normal data flow path through the co-processor 224 , comprising the input interface module, input interface switch 252 , pixel organizer 246 , JPEG coder 241 , result organizer 249 and then the output interface module.
  • the JPEG coder module sends data straight through without applying any operation.
  • the data manipulation operation is carried out by a combination of the pixel organizer (on input) and the result organizer (on output). In many cases, these instructions can be combined with other instructions.
  • operand A represents the data to be copied and the result operand represents the target address of the memory copy instructions.
  • the particular data manipulation operation is specified by the operand B for input and operand C for output operand words.
  • the flow control instructions are a family of instructions that provide control over various aspects of the instruction execution model as described with reference to FIG. 9.
  • the flow control instructions include both conditional and unconditional jumps enabling the movement from one virtual address to another when executing a stream of instructions.
  • a conditional jump instruction is determined by taking a co-processor or register, masking off any relevant fields and comparing it to given value. This provides for reasonable generality of instructions.
  • flow control instructions include wait instructions which are typically used to synchronize between overlapped and non-overlapped instructions or as part of micro-programming.
  • In FIG. 107 there is illustrated the instruction word encoding for flow control instructions with the minor opcodes being interpreted as follows:
  • the operand A word specifies the target address of the jump instruction. If the S bit of the Minor Opcode is set to 0, then operand B specifies a co-processor register to use as the source of the condition.
  • the value of the operand B descriptor specifies the address of the register, and the value of the operand B word defines a value to compare the contents of the register against.
  • the operand C word specifies a bitwise mask to apply to the result. That is, the Jump Instruction's condition is true if the bitwise operation:
  • the pixel organizer 246 addresses and buffers data streams from the input interface switch 252 .
  • the input data is stored in the pixel organizer's internal memory or buffered to the MUV buffer 250 . Any necessary data manipulation is performed upon the input stream before it is delivered to the main data path 242 or JPEG coder 241 as required.
  • the operating modes of the pixel organizer are configurable by the usual CBus interface.
  • the pixel organizer 246 operates in one of five modes, as specified by a PO_CFG control register. These modes include:
  • the MUV buffer 250 is therefore utilized by the pixel organizer 246 for both main data path 242 and JPEG coder 241 operations.
  • the MUV RAM 250 stores the interval and fractional tables and they are accessed as 36 bits of data (four color channels) × (4 bit interval values and 8 bit fractional values).
  • the MUV RAM 250 stores matrix co-efficients and related configuration data.
  • the co-efficient matrix is limited to 16 rows × 16 columns with each co-efficient being at a maximum 20 bits wide. Only one co-efficient per clock cycle is required from the MUV RAM 250.
  • control information such as binary point, source start coordinates and sub-sample deltas must be passed to the main data path 242 . This control information is fetched by the pixel organizer 246 before any of the matrix coefficients are fetched.
  • the MUV buffer 250 is utilized by the pixel organizer 246 to double buffer MCU's.
  • the technique of double buffering is employed to increase the performance of JPEG compression.
  • One half of the MUV RAM 250 is written to using data from the input interface switch 252 while the other half is read by the pixel organizer to obtain data to send to the JPEG coder 241 .
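  • The double buffering scheme described above can be sketched as a simple ping-pong buffer. The class and method names are hypothetical, and the sketch models only the alternation of the two halves, not the timing or addressing of the MUV RAM.

```python
class DoubleBuffer:
    """Two halves: one is written while the other is read."""

    def __init__(self, half_size):
        self.halves = [[0] * half_size, [0] * half_size]
        self.write_half = 0  # index of the half currently being filled

    def write(self, data):
        """Store incoming data (for example one MCU) in the write half."""
        assert len(data) <= len(self.halves[self.write_half])
        self.halves[self.write_half][:len(data)] = data

    def read(self):
        """Return a copy of the half that is not being written."""
        return list(self.halves[1 - self.write_half])

    def swap(self):
        """Exchange the roles of the two halves."""
        self.write_half = 1 - self.write_half
```

  • Usage follows the pattern write, swap, then write the next block while the previous one is read out.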
  • the pixel organizer 246 is also responsible for performing horizontal sub-sampling of color components where required and to pad MCU's where an input image does not have a size equal to an exact integral number of MCUs.
  • the pixel organizer 246 is also responsible for formatting input data including byte lane swapping, normalization, byte substitution, byte packing and unpacking and replication operations as hereinbefore discussed with reference to FIG. 32 of the accompanying drawings. The operations are carried out as required by setting the pixel organizer's registers.
  • the pixel organizer 246 operates under the control of its own set of registers contained within a CBus interface controller 801 which is interconnected to the instruction controller 235 via the global CBus.
  • the pixel organizer 246 includes an operand fetch unit 802 responsible for generating requests from the input interface switch 252 for operand data needed by the pixel organizer 246 .
  • the start address for operand data is given by the PO_SAID register which must be set immediately before execution.
  • the PO_SAID register may also hold immediate data, as specified by the L bit in the PO_DMR register.
  • the current address pointer is stored in the PO_CDP register and is incremented by the burst length of any input interface switch request.
  • the current offset for data is concatenated with a base address for the MUV RAM 250 as given by the PL_MUV register.
  • a FIFO 803 is utilized to buffer sequential input data fetched by the operand fetch unit 802 .
  • the data manipulation unit 804 is responsible for implementing the various manipulations as described with reference to FIG. 32 .
  • the output of the data manipulation unit is passed to the MUV address generator 805 which is responsible for passing data to the MUV RAM 250 , main data path 242 or JPEG coder 241 in accordance with configuration registers.
  • a pixel organizer control unit 806 is a state machine that generates the required control signals for all the sub-modules in the pixel organizer 246 . Included in these signals are those for controlling communication on the various Bus interfaces.
  • the pixel organizer control unit outputs diagnostic information as required to the miscellaneous module 239 according to its status register settings.
  • the operand fetch unit 802 includes an Instruction Bus address generator (IAG) 810 which contains a state machine for generating requests to fetch operand data. These requests are sent to a request arbiter 811 which arbitrates between requests from the address generator 810 and those from the MUV address generator 805 (FIG. 108) and sends the winning requests to the input (MAG) interface switch 252 .
  • the request arbiter 811 contains a state machine to handle requests. It monitors the state of the FIFO via FIFO count unit 814 to decide when it should dispatch the next request.
  • a byte enable generator 812 takes information from the IAG 810 and generates byte enable patterns 816 specifying the valid bytes within each operand data word returned by the input interface switch 252 .
  • the byte enable pattern is stored along with the associated operand data in the FIFO.
  • the request arbiter 811 handles MAG requests before IAG requests when both requests arrive at the same time.
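  • A minimal sketch of the fixed-priority arbitration described above follows. The function name and arguments are hypothetical, and gating every dispatch on free FIFO space is an assumption: the text says only that the arbiter monitors the FIFO count to decide when to dispatch the next request.

```python
def arbitrate(mag_request, iag_request, fifo_count, fifo_depth, burst_length):
    """Pick the next request to send to the input interface switch, or None."""
    # Assumption: hold off while the FIFO cannot absorb another burst.
    if fifo_count + burst_length > fifo_depth:
        return None
    # MAG requests win when both requests arrive at the same time.
    if mag_request is not None:
        return ("MAG", mag_request)
    if iag_request is not None:
        return ("IAG", iag_request)
    return None
```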
  • the MUV address generator 805 operates in a number of different modes.
  • a first of these modes is the JPEG (compression) mode.
  • input data for JPEG compression is supplied by the data manipulation unit 804 with the MUV buffer 250 being utilized as a double buffer.
  • the MUV address generator 805 is responsible for generating the right addresses to the MUV buffer to store incoming data processed by the data manipulation unit 804 .
  • the MAG 805 is also responsible for generating read addresses to retrieve color component data from the stored pixels to form 8×8 blocks for JPEG compression.
  • the MAG 805 is also responsible for dealing with the situation when a MCU lies partially on the image.
  • In FIG. 110 there is illustrated an example of a padding operation carried out by the MAG 805 .
  • For normal pixel data, the MAG 805 stores the four color components at the same address within the MUV RAM 250 in four 8 bit RAMs. To facilitate retrieval of data from the same color channel simultaneously, the MCU data is barrel shifted to the left before it is stored in the MUV RAM 250 . The number of bytes the data is shifted to the left is determined by the lowest two bits of the write address. For example, in FIG. 111 there is illustrated the data organization within the MUV RAM 250 for 32 bit pixel data when no sub-sampling is needed. Sub-sampling of input data may be selected for three or four channel interleaved JPEG mode. In multichannel JPEG compression mode with subsampling operating, the MAG 805 (FIG.
  • In FIG. 112 there is illustrated an example of MCU data organization for multi-channel sub-sampling mode.
  • the MAG treats all single channel unpacked data exactly the same as multi-channel pixel data.
  • An example of single channel packed data as read from the MUV RAM is illustrated in FIG. 113 .
  • the reading process reads 8×8 blocks out of the MUV RAM.
  • the blocks are generated by the MAG 805 by reading the data for each channel sequentially, four coefficients at a time.
  • the stored data is organized as illustrated in FIG. 111 . Therefore, to compose one 8×8 block of non-sampled pixel data, the reading process reads data diagonally from the MUV RAM.
  • FIG. 114 shows the reading sequence for four channel data, the form of storage in the MUV RAM 250 assisting to read multiple values for the same channel simultaneously.
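  • The barrel-shifted storage and diagonal read described above can be illustrated with the following sketch. It models the four 8 bit RAMs as four Python lists; the mapping of channel to RAM lane, the function names and the requirement that the read base address be a multiple of four are assumptions made for the illustration, not details confirmed by the text.

```python
NUM_LANES = 4  # four byte-wide RAMs, one byte of each pixel per RAM

def store_pixel(rams, address, pixel):
    """Store a 4-channel pixel, rotated by the low two bits of the address."""
    shift = address & 0x3
    for channel, value in enumerate(pixel):
        # Channel c of the pixel at `address` lands in lane (c + shift) mod 4.
        rams[(channel + shift) % NUM_LANES][address] = value

def read_channel_quad(rams, base_address, channel):
    """Read one channel of four consecutive pixels, one byte from each RAM.

    base_address is assumed to be a multiple of four, so pixel k of the
    group was rotated by k and its wanted byte sits in lane (channel + k).
    The four accesses hit four different lanes: a diagonal read.
    """
    return [rams[(channel + k) % NUM_LANES][base_address + k]
            for k in range(NUM_LANES)]
```

  • Because each of the four accesses targets a different RAM, all four bytes of the same channel could be fetched in a single cycle in hardware, which is the stated purpose of the rotation.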
  • When operating in color conversion mode, the MUV RAM 250 is used as a cache to hold the interval and fractional values and the MAG 805 operates as a cache controller.
  • the MUV RAM 250 caches values for three color channels with each color channel containing 256 pairs of four bit interval and fractional values. For each pixel output via the DMU, the MAG 805 is utilized to get the values from the MUV RAM 250 . Where the value is not available, the MAG 805 generates a memory read request to fetch the missing interval and fractional values. Instead of fetching one entry in each request, multiple entries are fetched simultaneously for better utilization of bandwidth.
  • the MUV RAM 250 stores the matrix coefficients for the MDP.
  • the MAG cycles through all the matrix co-efficients stored in the MUV RAM 250 .
  • At the start of an image transformation and convolution instruction, the MAG 805 generates a request to the operand fetch unit to fetch the kernel description “header” (FIG. 94) and the first matrix co-efficient in a burst request.
  • the MAG 805 includes an IBus request module 820 which multiplexes IBus requests generated by an image transformation controller (ITX) 821 and a color space conversion (CSC) controller 822 .
  • the requests are sent to the operand fetch unit which services the request.
  • the pixel organizer 246 operates in either image transformation mode or color space conversion mode, but not both at once. Hence, there is no arbitration required between the two controllers 821 , 822 .
  • the IBus request module 820 derives the information for generating a request to the operand fetch unit including the burst address and burst length from the relevant pixel organizer registers.
  • a JPEG controller 824 is utilized when operating in JPEG mode and comprises two state machines, being a JPEG write controller and a JPEG read controller. The two controllers operate simultaneously and synchronize with each other through the use of internal registers.
  • In a JPEG compression operation, the DMU outputs the MCU data which is stored into the MUV RAM.
  • the JPEG Write Controller is responsible for horizontal padding and control of pixel subsampling, while the JPEG Read Controller is responsible for vertical padding. Horizontal padding is achieved by stalling the DMU output, and vertical padding is achieved by reading the previously read 8×8 block line.
  • the JPEG Write Controller keeps track of the position of the current MCU and DMU output pixel on the source image, and uses this information to decide when the DMU has to be stalled for horizontal padding.
  • the JPEG Write Controller sets/resets a set of internal registers which indicate whether the MCU is on the right edge of the image, or is at the bottom edge of the image.
  • the JPEG Read Controller then uses the content of these registers to decide if it is required to perform vertical padding, and if it has read the last MCU on the image.
  • the JPEG Write Controller keeps track of DMU output data, and stores the DMU output data into the MUV RAM 250 .
  • the controller uses a set of registers to record the current position of the input pixel. This information is used to perform horizontal padding by stalling the DMU output.
  • When a complete MCU has been written into the MUV RAM 250 , the controller writes the MCU information into JPEG-RW-IPC registers, which are later used by the JPEG Read Controller.
  • the controller enters the SLEEP state after the last MCU has been written into the MUV RAM 250 .
  • the controller stays in this state until the current instruction completes.
  • the JPEG Read Controller reads the 8×8 blocks from the MCUs stored in the MUV RAM 250 .
  • the controller reads the MCU several times, each time extracting a different byte from each pixel stored in the MUV RAM.
  • the controller detects if it needs to perform vertical padding using the information provided by the JPEG-RW-IPC. Vertical padding is achieved by re-reading the last 8 bytes read from the MUV RAM 250 .
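  • The combined effect of horizontal and vertical padding on a partial block can be sketched as follows. This is an illustration under an assumption: stalling the DMU output is taken to have the effect of repeating the last valid sample of a row, which the text does not state explicitly. The function name is hypothetical.

```python
def pad_to_8x8(block):
    """Pad a partial block (1..8 rows of 1..8 samples) to a full 8x8 block."""
    # Horizontal padding: repeat the last valid sample of each row
    # (assumed effect of stalling the DMU output).
    padded = [row + [row[-1]] * (8 - len(row)) for row in block]
    # Vertical padding: re-read the last 8-byte line until 8 rows exist.
    while len(padded) < 8:
        padded.append(list(padded[-1]))
    return padded
```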
  • the Image Transformation Controller 821 is responsible for reading the kernel descriptor from the IBus, passing the kernel header to the MDP 242 , and cycling through the matrix coefficients as many times as specified in the po.len register. All data output by the PO 246 in an image transformation and convolution instruction are fetched directly from the IBus and not passed through the DMU.
  • the top eight bits of the first matrix co-efficient fetched immediately after the kernel header contain the number of remaining matrix coefficients to be fetched.
  • the kernel header is passed to the MDP directly without modifications, whilst the matrix coefficients are sign extended before they are passed to the MDP.
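  • The coefficient handling described above can be sketched as below. The 20 bit coefficient width comes from the earlier description of the co-efficient matrix; the 32 bit fetched word width and both function names are assumptions made for the example.

```python
def sign_extend(value, bits=20):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def remaining_coefficients(first_word):
    """Top eight bits of the first coefficient word (assumed 32 bits wide)."""
    return (first_word >> 24) & 0xFF
```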
  • the pixel sub-sampler 825 comprises two identical channel sub-samplers, each operating on a byte from the input word. When the relevant configuration register is not asserted, the pixel sub-sampler copies its input to its output. When the configuration register is asserted, the sub-sampler sub-samples the input data either by taking the average or by decimation.
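  • A minimal sketch of one channel sub-sampler follows. It assumes 2:1 horizontal sub-sampling over pairs of adjacent samples and truncating integer averaging; neither the ratio nor the rounding rule is given in the text, and the function and parameter names are hypothetical.

```python
def subsample(samples, enabled, mode="average"):
    """Sub-sample one channel of a row of samples by a factor of two."""
    if not enabled:
        # Configuration register not asserted: pass the input through.
        return list(samples)
    if mode == "decimate":
        # Decimation: keep every second sample.
        return list(samples[0::2])
    # Averaging: mean of each adjacent pair (truncating, by assumption).
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]
```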
  • An MUV multiplexer module 826 selects the MUV read and write signals from the currently active controller. Internal multiplexers are used to select the read addresses output via the various controllers that utilize the MUV RAM 250 .
  • An MUV RAM write address is held in an 8 bit register in an MUV multiplexer module. The controllers utilizing the MUV RAM 250 load the write address register in addition to providing control for determining a next MUV RAM address.
  • a MUV valid access module 827 is utilized by the color space conversion controller to determine if the interval and fractional values for a current pixel output by the data manipulation unit are available in the MUV RAM 250 . When one or more color channels are missing, the MUV valid access module 827 passes the relevant address to the IBus request module 820 for loading, in burst mode, the interval and fractional values. Upon servicing a cache miss, the MUV valid access module 827 sets internal validity bits which map the set of interval and fractional values fetched so far.
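  • The validity-bit scheme described above can be sketched for a single color channel as follows. The burst length of eight entries, the use of one validity bit per burst, and the `fetch_burst` callback are all assumptions made for the illustration; the text states only that several entries are fetched per request and that validity bits map what has been fetched so far.

```python
class IntervalFractionCache:
    """Cache of (interval, fraction) entries for one color channel."""

    def __init__(self, fetch_burst, entries=256, burst_length=8):
        self.fetch_burst = fetch_burst    # hypothetical memory read callback
        self.burst_length = burst_length  # assumed burst size
        self.table = [None] * entries
        # One validity bit per burst-sized group of entries (assumption).
        self.valid = [False] * (entries // burst_length)
        self.misses = 0

    def lookup(self, index):
        line = index // self.burst_length
        if not self.valid[line]:
            # Cache miss: fetch the whole group in one burst request.
            start = line * self.burst_length
            self.table[start:start + self.burst_length] = self.fetch_burst(
                start, self.burst_length)
            self.valid[line] = True
            self.misses += 1
        return self.table[index]
```

  • Neighbouring indices then hit the cache without a further memory request, which is the bandwidth benefit the description attributes to fetching multiple entries at once.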
  • a replicate module 829 replicates the incoming data the number of times specified by an internal pixel register. The input stream is stalled while the replication module is replicating the current input word.
  • a PBus interface module 630 is utilized to re-time the output signals of the pixel organizer 246 to the main data path 242 and JPEG coder 241 and vice versa.
  • a MAG controller 831 generates signals for initiating and shutting down the various sub-modules. It also performs multiplexing of incoming PBus signals from the main data path 242 and JPEG coder 241 .
  • the reconfigurable MUV buffer 250 is able to support a number of operating modes including the single lookup table mode (mode 0 ), multiple lookup table mode (mode 1 ), and JPEG mode (mode 2 ).
  • a different type of data object is stored in the buffer in each mode.
  • the data objects that are stored in the buffer can be data words, values of a multiplicity of lookup tables, single channel data and multiple channel pixel data.
  • the data objects can have different sizes.
  • the data objects stored in the reconfigurable MUV buffer 250 can be accessed in substantially different ways, depending on the operating mode of the buffer.
  • the data objects are often encoded before they are stored.
  • the coding scheme applied to a data object is determined by the size of the data object, the format in which the data objects are to be presented, how the data objects are retrieved from the buffer, and also the organization of the memory modules that comprise the buffer.
  • FIG. 116 is a block diagram of the components used to implement the reconfigurable MUV buffer 250 .
  • the reconfigurable MUV buffer 250 comprises an encoder 1290 , a storage device 1293 , a decoder 1291 , and a read address and rotate signal generator 1292 .
  • the data object may be encoded into an internal data format and placed on the encoded input data stream 1296 by the encoder 1290 .
  • the encoded data object is stored in the storage device 1293 .
  • an encoded data object is read out of the storage device via encoded output data stream 1297 .
  • the encoded data object in the encoded output data stream 1297 is decoded by a decoder 1291 .
  • the decoded data object is then presented at the output data stream 1298 .
  • the write addresses 1305 to the storage device 1293 are provided by the MAG 805 (FIG. 108 ).
  • the read addresses 1299 , 1300 and 1301 are also provided by the MAG 805 (FIG. 108 ), and translated and multiplexed to the storage device 1293 by the Read Address and Rotate Signal Generator 1292 , which also generates input and output rotate control signals 1303 and 1304 to the encoder and decoder respectively.
  • the write enable signals 1306 and 1307 are provided by an external source.
  • An operating mode signal 1302 which is provided by means of the controller 801 (FIG. 108 ), is connected to the encoder 1290 , the decoder 1291 , the Read Address and Rotate Signal Generator 1292 , and the storage device 1293 .
  • An increment signal 1308 increments internal counter(s) in the read address and rotate signal generator and may be utilized in JPEG mode (mode 2 ).
  • the buffer behaves substantially like a single memory module.
  • Data objects may be stored into and retrieved from the buffer in substantially the same way used to access memory modules.
  • When the reconfigurable MUV buffer 250 is operating in the multiple lookup table mode (mode 1 ), the buffer 250 is divided into a plurality of tables, and up to three lookup tables may be stored in the storage device 1293 .
  • the lookup tables may be accessed separately and simultaneously. For instance, in one example, interval and fraction values are stored in the storage device 1293 in the multiple lookup table mode, and the tables are indexed utilizing the lower three bytes of the input data stream 1295 . Each of the three bytes is used to access a separate lookup table stored in the storage device 1293 .
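  • The indexing of three tables by the lower three bytes of an input word can be sketched as below. The assignment of byte 0 to the first table, byte 1 to the second and byte 2 to the third is an assumption, as is the function name; each table entry is modelled as an (interval, fraction) pair.

```python
def lookup_three(tables, word):
    """Index three lookup tables with the lower three bytes of `word`."""
    # Byte i (bits 8*i+7 .. 8*i) indexes table i; byte 3 is ignored.
    return [tables[i][(word >> (8 * i)) & 0xFF] for i in range(3)]
```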
  • When an image undergoes JPEG compression, the image is converted into an encoded data stream.
  • the pixels are retrieved in the form of MCUs from the original image.
  • the MCUs are read from left to right, and top to bottom from the image.
  • Each MCU is decomposed into a number of single component 8×8 blocks.
  • the number of 8×8 blocks that can be extracted from a MCU depends on several factors including: the number of color components in the source pixels, and for a multiple channel JPEG mode, whether subsampling is needed.
  • the 8×8 blocks are then subjected to forward DCT (FDCT), quantization, and entropy encoding.
  • In JPEG decompression, the encoded data are read sequentially from a data stream.
  • the data stream undergoes entropy decoding, dequantization and inverse DCT (IDCT).
  • the outputs of the IDCT operation are 8×8 blocks.
  • a number of single component 8×8 blocks are combined to reconstruct a MCU.
  • the number of single component 8×8 blocks is dependent on the same factors mentioned above.
  • the reconfigurable MUV buffer 250 may be used in the process to decompose MCUs into a multiplicity of single component 8×8 blocks, or to reconstruct MCUs from a multiplicity of single component 8×8 blocks.
  • the input data stream 1295 to the buffer 250 comprises pixels for a JPEG compression operation, or single component data in a JPEG decompression operation.
  • the output data stream 1298 of the buffer 250 comprises single channel data blocks for a JPEG compression operation, or pixel data in a JPEG decompression operation.
  • an input pixel may comprise up to four channels denoted Y, U, V and O.
  • Each single component data block comprises data from the like channel of each pixel stored in the buffer.
  • up to four single component data blocks may be extracted from one pixel data block.
  • a multiplicity of Minimum Coded Units (MCUs) each containing 64 single or 64 multiple channel pixels may be stored in the buffer, and a multiplicity of 64-byte long single channel component data blocks are extracted from each MCU stored in the buffer.
  • the output data stream contains output pixels that have up to four components Y, U, V and O.
  • FIG. 117 illustrates the encoder 1290 of FIG. 116 in more detail.
  • each input data object is encoded using a byte-wise rotation before it is stored into the storage device 1293 (FIG. 129 ).
  • the amount of rotation is specified by the input rotate control signal 1303 .
  • a 32-bit 4-to-1 multiplexer 1320 and output 1325 is used to select one of the four possible rotated versions of the input pixel.
  • if the four bytes in a pixel are labelled ( 3 , 2 , 1 , 0 ), the four possible rotated versions of this pixel are ( 3 , 2 , 1 , 0 ), ( 0 , 3 , 2 , 1 ), ( 1 , 0 , 3 , 2 ) and ( 2 , 1 , 0 , 3 ).
  • the four encoded bytes are output 1296 for storage in the storage device.
  • When the buffer is placed in an operating mode other than the JPEG mode (mode 2 ), for example, single lookup table mode (mode 0 ) and multiple lookup table mode (mode 1 ), byte-wise rotation may not be necessary and may not be performed on the input data objects.
  • the input data object is prevented from being rotated in the latter cases by overriding the input rotate control signal with a no-operation value.
  • This value 1323 can be zero.
  • a 2-to-1 multiplexer 1321 produces control signals 1326 by selecting between the input rotate control signal 1303 and the no-operation value 1323 .
  • the current operating mode 1302 is compared with the value assigned to the pixel block decomposition mode to produce the multiplexer select signal 1322 .
  • the 4-to-1 multiplexer 1320 , which is controlled by signal 1326 , selects one of the four rotated versions of the input data object on the input data stream 1325 , and produces an encoded input data object on the encoded input data stream 1326 .
  • FIG. 118 illustrates a schematic of a combinatorial circuit which implements the decoder 1291 for the decoding of the encoded output data stream 1297 .
  • the decoder 1291 operates in a substantially similar manner to the encoder. The decoder only operates on the data when the data buffer is in the JPEG mode (mode 2 ).
  • the lower 32 bits of an encoded output data object in the encoded output data stream 1297 are passed to the decoder.
  • the data is decoded using a byte-wise rotation with an opposite sense of rotation to the rotation performed by the encoder 1290 .
  • a 32-bit 4-to-1 multiplexer 1330 is used to select one of the four possible rotated versions of the encoded data.
  • if the four bytes in an input pixel are labelled ( 3 , 2 , 1 , 0 ), the four possible rotated versions of this pixel are ( 3 , 2 , 1 , 0 ), ( 2 , 1 , 0 , 3 ), ( 1 , 0 , 3 , 2 ) and ( 0 , 3 , 2 , 1 ).
  • the output rotate control signal 1304 is utilized only when the buffer is in a pixel block decomposition mode, and is overridden by a no-operation value in other operating modes.
  • the no-operation value utilized 1333 is zero.
  • a 2-to-1 multiplexer 1331 produces signal 1334 by selecting between the output rotate control signal 1304 and the no-operation value 1333 .
  • the current operating mode 1302 is compared with the value assigned to the pixel block decomposition mode to produce the multiplexer select signal 1332 .
  • the 4-to-1 multiplexer 1330 , which is controlled by signal 1334 , selects one of the four rotated versions of the encoded output data object on the encoded output data stream 1297 , and produces an output data object on the output data stream 1298 .
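  • The encoder and decoder rotations can be sketched together as follows. The two tables copy the byte orders listed in the description for the encoder and the decoder; representing a pixel as a tuple in label order ( 3 , 2 , 1 , 0 ), the function names, and the assumption that a given control value selects the table entry at the same position are choices made for the illustration.

```python
JPEG_MODE = 2  # mode 2 in the description

# Byte orders selected by rotate control values 0..3 (taken from the text).
ENCODER_ORDERS = [(3, 2, 1, 0), (0, 3, 2, 1), (1, 0, 3, 2), (2, 1, 0, 3)]
DECODER_ORDERS = [(3, 2, 1, 0), (2, 1, 0, 3), (1, 0, 3, 2), (0, 3, 2, 1)]

def _select(word, order):
    # `word` is a tuple in label order (3, 2, 1, 0): label L sits at index 3 - L.
    return tuple(word[3 - label] for label in order)

def encode(pixel, rotate, mode):
    # Outside JPEG mode the control is overridden by the no-operation value 0.
    control = rotate if mode == JPEG_MODE else 0
    return _select(pixel, ENCODER_ORDERS[control & 0x3])

def decode(encoded, rotate, mode):
    control = rotate if mode == JPEG_MODE else 0
    return _select(encoded, DECODER_ORDERS[control & 0x3])
```

  • Under these assumptions, decoding with the same control value that was used for encoding restores the original byte order, consistent with the decoder rotating in the opposite sense to the encoder.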
  • the method of internal read address generation used by the circuit is selected by the operating mode 1302 of the reconfigurable MUV buffer 250 .
  • the read addresses are provided by the MAG 805 (FIG. 108) in the form of external read addresses 1299 , 1300 , and 1301 .
  • the memory modules 1380 , 1381 , 1382 , 1383 , 1384 and 1385 (FIG. 121) of the storage device 1293 operate together.
  • the read address and the write address supplied to the memory modules 1380 to 1385 (FIG. 121) are substantially the same.
  • the storage device 1293 only needs the external circuits to supply one read address and one write address, and uses internal logic to multiplex these addresses to the memory modules 1380 to 1385 (FIG. 121 ).
  • the read address is supplied by the external read address 1299 (FIG. 116) and is multiplexed to the internal read address 1348 (FIG. 121) without substantial changes.
  • the external read addresses 1300 and 1301 (FIG. 116 ), and the internal read addresses 1349 , 1350 and 1351 (FIG. 121 ), are not used in mode 0 .
  • the write address is supplied by the external write address 1305 (FIG. 116 ), and is connected to the write address of each memory module 1380 to 1385 (FIG. 121) without substantial modification.
  • a design that provides three lookup tables in the multiple lookup table mode (mode 1 ) is presented.
  • the encoded input data is written simultaneously into all memory modules 1380 to 1385 (FIG. 121 ), while the three tables are accessed independently, and thus require one index to each of the three tables.
  • Three indices, that is, read addresses to the memory modules 1380 to 1385 (FIG. 121 ) are supplied to the storage device 1293 . These read addresses are multiplexed to the appropriate memory modules 1380 to 1385 using internal logic.
  • the write address supplied externally is connected to the write address of each of the memory modules 1380 to 1385 without substantial modifications.
  • the external read addresses 1299 , 1300 and 1301 are multiplexed to internal read addresses 1348 , 1349 and 1350 respectively.
  • the internal read address 1351 is not used in mode 1 .
  • the method of generating the internal read addresses needed in the JPEG mode (mode 2 ) is different to the method described above.
  • FIG. 119 illustrates a schematic of a combinatorial circuit which implements the read address and rotate control signals generation circuit 1292 (FIG. 116 ), for the reconfigurable data buffer operating in the JPEG mode (mode 2 ) for JPEG compression.
  • the generator 1292 uses the output of a component block counter 1340 and the output of a data byte counter 1341 to compute the internal read addresses to the memory modules comprising the storage device 1293 .
  • the component block counter 1340 gives the number of component blocks extracted from a pixel data block, which is stored in the storage device. The number of like components extracted from the pixel data block is given by multiplying the output of the data byte counter 1341 by four.
  • an internal read address 1348 , 1349 , 1350 or 1351 for the pixel data block decomposition mode is computed as follows.
  • the output of the component block counter is used to generate an offset value 1343 , 1344 , 1345 , 1346 or 1347
  • the output of the data byte counter 1341 is used to generate a base read address 1354 .
  • the offset value 1343 is added 1358 to the base read address 1354 and the sum is an internal read address 1348 (or 1349 , 1350 or 1351 ).
  • the offset values for the memory modules are in general different for simultaneous read operations performed on multiple memory modules, but the offset value to each memory module is in general substantially the same during the extraction of one component data block.
  • the base addresses 1354 used to compute the four internal read addresses in the pixel data block decomposition mode are substantially the same.
  • the increment signal 1308 is used as the component byte counter increment signal.
  • the counter is incremented after every successful read operation has been performed.
  • a component block counter increment signal 1356 is used to increment the component block counter 1340 , after a complete single component data block has been retrieved from the buffer.
  • the output rotate control signal 1304 (FIG. 116) is derived from the output of the component block counter, and the output of the data byte counter, in substantially similar manner to the generation of an internal read address.
  • the output of the component block counter is used to compute a rotation offset 1347 .
  • the output rotate control signal 1304 is given by the lowest two bits of the sum of the base read address 1354 and the rotation offset 1355 .
  • the input rotate control signal 1303 is simply given by the lowest two bits of the external write addresses 1305 in this example of the address and rotate control signals generator.
  • FIG. 120 shows another example of the address generator 1292 for reassembling multiple channel pixel data from single component data stored in the reconfigurable MUV buffer 250 .
  • the buffer is operating in the JPEG mode (mode 2 ) for a JPEG decompression operation.
  • single component data blocks are stored in the buffer, and pixel data blocks are retrieved from the buffer.
  • the write addresses to the memory modules are provided by the external write address 1305 without substantial changes.
  • the single component blocks are stored in contiguous memory locations.
  • the input rotate control signal 1303 in this example is simply set to the lowest two bits of the write address.
  • a pixel counter 1360 is used to keep track of the number of pixels extracted from the single component blocks stored in the buffer.
  • the output of the pixel counter is used to generate the read addresses 1348 , 1349 , 1350 and 1351 , and the output rotate control signal 1304 .
  • the read addresses are in general different for each memory module comprised in the storage device 1293 .
  • a read address comprises two parts, a single component block index 1362 , 1363 , 1364 or 1365 , and a byte index 1361 .
  • An offset is added to bits 3 and 4 of the output of the pixel counter to calculate the single component block index for a particular block.
  • the offsets 1366 , 1367 , 1368 and 1369 are in general different for each read address. Bit 2 to bit 0 of the output of the pixel counter are used as the byte index 1361 of a read address.
  • a read address is the result of the concatenation of a single component block index 1362 , 1363 , 1364 or 1365 and a byte index 1361 , as illustrated in FIG. 120 .
  • the output rotate control signal 1304 is generated using bit 4 and bit 3 of the output of the pixel counter without substantial change.
  • the increment signal 1308 is used as the pixel counter increment signal to increment the pixel counter 1360 .
  • the pixel counter 1360 is incremented after a pixel has been successfully retrieved from the buffer.
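  • The read address generation for JPEG decompression described above can be sketched as follows. The bit fields (bits 4 and 3 for the block index, bits 2 to 0 for the byte index) and the concatenation follow the description; the particular offset values, the wrap of the block index to two bits, and the function name are assumptions made for the example.

```python
def decompression_read_addresses(pixel_count, offsets=(0, 1, 2, 3)):
    """Return the four read addresses and the output rotate control."""
    byte_index = pixel_count & 0x7         # bits 2..0 of the pixel counter
    block_bits = (pixel_count >> 3) & 0x3  # bits 4..3 of the pixel counter
    addresses = []
    for offset in offsets:                 # hypothetical per-module offsets
        # Block index = bits 4..3 plus an offset (assumed to wrap modulo 4).
        block_index = (block_bits + offset) & 0x3
        # Read address = block index concatenated with the byte index.
        addresses.append((block_index << 3) | byte_index)
    rotate = block_bits                    # output rotate control signal
    return addresses, rotate
```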
  • FIG. 121 illustrates an example of a structure of the storage device 1293 .
  • the storage device 1293 can comprise three 4-bit wide memory modules 1383 , 1384 and 1385 , and three 8-bit wide memory modules 1380 , 1381 and 1382 .
  • the memory modules can be combined together to store 36-bit words in the single lookup table mode (mode 0 ), 3×12-bit words in the multiple lookup table mode (mode 1 ), and 32-bit pixels or 4×8-bit single component data in JPEG mode (mode 2 ).
  • each memory module is associated with a different part of the encoded input and output data streams ( 1296 and 1297 ).
  • memory module 1380 has its data input port connected to bit 0 to bit 7 of the encoded input data stream 1296 , and its data output port connected to bit 0 to bit 7 of the encoded output data stream 1297 .
  • the write addresses to all the memory modules are connected together, and share substantially the same value.
  • the read addresses 1386 , 1387 , 1388 , 1389 , 1390 and 1391 to the memory modules of the example illustrated in FIG. 121 are supplied by the read address generator 1292 , and are in general different.
  • a common write enable signal is used to provide the write enable signals to all three 8-bit memory modules
  • a second common write enable signal is used to provide the write enable signals to all three 4-bit memory modules.
  • FIG. 122 illustrates a schematic of a combinatorial circuit used for generating read addresses 1386 , 1387 , 1388 , 1389 , 1390 and 1391 for accessing the memory modules contained in a storage device 1293 .
  • Each encoded input data object is broken up into parts, and each part is stored into a separate memory module in the storage device.
  • the write addresses to all memory modules for all operating modes are substantially the same and thus substantially no logic is required to compute the write address to the memory modules.
  • the read addresses in this example are typically different for different operations, and are also different to each memory module within each operating mode.
  • All bytes in the output data stream 1298 of the reconfigurable MUV buffer 250 must contain single component data extracted from the pixel data stored in the buffer in the JPEG mode (mode 2 ) for JPEG compression, or pixel data extracted from the single component data blocks stored in the buffer in the JPEG mode for JPEG decompression.
  • the requirements on the output data stream are achieved by providing four read addresses 1348 , 1349 , 1350 and 1351 to the buffer.
  • In the multiple lookup table mode (mode 1 ), up to three lookup tables are stored in the buffer, and thus only up to three read addresses 1348 , 1349 and 1350 are needed to index the three lookup tables.
  • the read addresses to all memory modules are substantially the same in the single lookup table mode (mode 0 ), and only read address 1348 is used in this mode.
  • the example controller circuit shown in FIG. 122 uses the operating mode signals to the buffer, and up to four read addresses, to compute the read address 1386 - 1391 to each of the six memory modules comprising the storage device 1293 .
  • the read address generator 1292 takes, as its inputs, the external read addresses 1299 , which comprise external address buses 1348 , 1349 , 1350 and 1351 , and generates the internal read addresses 1386 , 1387 , 1388 , 1389 , 1390 and 1391 to the memory modules that comprise the storage device 1293 . No manipulation on the external write addresses 1305 is required in the operation of this example.
  • FIG. 123 illustrates a representation of an example of how 20-bit matrix coefficients may be stored in the buffer 250 when the buffer 250 is operating in single lookup table mode (mode 0 ).
  • the matrix coefficients are stored in the 8-bit memory modules 1380 , 1381 and 1382 .
  • Bit 7 to bit 0 of the matrix coefficient are stored in memory module 1380
  • bit 15 to bit 8 of the matrix coefficient are stored in memory module 1381
  • bit 19 to bit 16 of the matrix coefficient are stored in the lower 4 bits of memory module 1382 .
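  • By way of illustration only, the byte split described above may be sketched in software as follows. The function names are hypothetical and are not part of the disclosed hardware; the sketch merely shows how a 20-bit coefficient maps onto the three 8-bit memory modules and back.

```python
def split_coefficient(coeff):
    """Split a 20-bit matrix coefficient into the values held by
    memory modules 1380, 1381 and 1382 (hypothetical helper)."""
    if not 0 <= coeff < (1 << 20):
        raise ValueError("coefficient must fit in 20 bits")
    low = coeff & 0xFF            # bits 7..0   -> module 1380
    mid = (coeff >> 8) & 0xFF     # bits 15..8  -> module 1381
    high = (coeff >> 16) & 0x0F   # bits 19..16 -> lower 4 bits of module 1382
    return low, mid, high


def join_coefficient(low, mid, high):
    """Rebuild the 20-bit coefficient from the three module values."""
    return ((high & 0x0F) << 16) | ((mid & 0xFF) << 8) | (low & 0xFF)
```

  • For example, the coefficient 0xABCDE would be held as 0xDE, 0xBC and 0x0A in the three modules respectively.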
  • the data objects stored in the buffer may be retrieved as many times as required for the rest of the instruction.
  • the write and read addresses to all memory modules involved in the single lookup table mode are substantially the same.
  • FIG. 124 illustrates a representation of how the table entries are stored in the buffer in the multiple lookup table mode (mode 1 ).
  • mode 1 multiple lookup table mode
  • up to three lookup tables may be stored in the buffer, and each lookup table entry comprises a 4-bit interval value and an 8-bit fraction value.
  • the interval values are stored in the 4-bit memory modules, and the fraction values are stored in the 8-bit memory modules.
  • the three lookup tables 1410 , 1411 and 1412 are stored in the memory banks 1380 and 1383 , 1381 and 1384 , 1382 and 1385 in the example.
  • the separate write enable control signals 1306 and 1307 (FIG. 121) allow the interval values to be written into the storage device 1293 without affecting the fraction values already stored in the storage device. In substantially the same manner, the fraction values may be written into storage device without affecting the interval values already stored in the storage device.
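  • As a non-limiting software model of the separate write enables, one lookup table of mode 1 may be sketched as below. The class and its method names are hypothetical; passing None for a value stands in for a de-asserted write enable and is an assumption of this sketch.

```python
class LookupTable:
    """Model of one mode 1 lookup table: a 4-bit interval module paired
    with an 8-bit fraction module (hypothetical software model)."""

    def __init__(self, size):
        self.interval = [0] * size   # contents of the 4-bit memory module
        self.fraction = [0] * size   # contents of the 8-bit memory module

    def write(self, addr, interval=None, fraction=None):
        # None models a de-asserted write enable for that module, so the
        # value already stored there is left untouched.
        if interval is not None:
            self.interval[addr] = interval & 0x0F
        if fraction is not None:
            self.fraction[addr] = fraction & 0xFF

    def read(self, addr):
        return self.interval[addr], self.fraction[addr]
```

  • Writing the fraction values first and the interval values afterwards leaves each entry holding both values, since neither write disturbs the other module.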
  • FIG. 125 illustrates a representation of how pixel data is stored in the reconfigurable MUV buffer 250 when the buffer is operating in the JPEG mode (mode 2 ) for decomposing pixel data blocks into single component data blocks.
  • the storage device 1293 is organized as four 8-bit memory banks, which comprise the memory modules 1380 , 1381 , 1382 , 1383 and 1384 , with 1383 and 1384 used together to operate substantially in the same manner as an 8-bit memory module.
  • Memory module 1385 is not used in the JPEG mode (mode 2 ).
  • a 32-bit encoded pixel is broken up into four bytes, and each is stored into a different 8-bit memory module.
  • FIG. 126 illustrates a representation of how the single component data blocks are stored in the storage device 1293 in single component mode.
  • the storage device 1293 is organized as four 8-bit memory banks, which comprise the memory modules 1380 , 1381 , 1382 , 1383 and 1384 , with 1383 and 1384 used together to operate substantially in the same manner as an 8-bit memory module.
  • a single component block in this example comprises 64 bytes.
  • a different amount of byte rotation can be applied to each single component block when it is written into the buffer.
  • 32-bit encoded pixel data is retrieved by reading from the different single component data blocks stored in the buffer.
  • a reconfigurable data buffer may be used to handle data involved in different instructions.
  • a reconfigurable data buffer that provides three operating modes has been disclosed. Different address generation techniques may be needed in each operating mode of the buffer.
  • the single look-up table mode (mode 0 ) may be used to store matrix coefficients in the buffer for an image transformation operation.
  • the multiple look-up table mode (mode 1 ) may be used to store a multiplicity of interval and fraction lookup tables in the buffer in a multiple channel color space conversion (CSC) operation.
  • the JPEG mode (mode 2 ) may be used either to decompose MCU data into single component 8×8 blocks, or to reconstruct MCU data from single-component 8×8 blocks, in JPEG compression and decompression operation respectively.
  • the MUV buffer 250 is also utilized by the result organizer 249 .
  • the result organizer 249 buffers and formats the data stream from either the main data path 242 or the JPEG coder 241 .
  • the result organizer 249 also is responsible for data packing and unpacking, denormalization, byte lane swapping and realignment of result data as previously discussed with reference to FIG. 42 . Additionally the result organizer 249 transmits its results to the external interface controller 238 , the local memory controller 236 , and the peripheral interface controller 237 as required.
  • When operating in JPEG decompression mode, the result organizer 249 utilizes the MUV RAM 250 to double buffer image data produced by the JPEG coder 241 . Double buffering increases the performance of the JPEG decompression by allowing data from the JPEG coder 241 to be written to one half of the MUV RAM 250 while at the same time image data presently in the other half of the MUV RAM 250 is output to a desired destination.
  • the 1, 3 and 4 channel image data is passed to the result organizer 249 during JPEG decompression in the form of 8×8 blocks with each block consisting of 8 bit components from the same channel.
  • the result organizer stores these blocks in the MUV RAM 250 in the order provided and then, for multi-channel interleaved images, meshing of the channels is performed when reading data from the MUV RAM 250 .
  • the JPEG coder 241 outputs three 8×8 blocks, the first consisting of Y components, the second made up of the U components and the third made up of the V components.
  • Meshing is accomplished by taking one component from each block and constructing the pixel in the form of (YUVX) where X represents an unused channel.
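  • Purely as an illustrative sketch, meshing of three single component blocks into (YUVX) pixels may be modelled as below. The function name and the value placed in the unused X channel are assumptions of this sketch and are not taken from the disclosure.

```python
def mesh_yuv(y_block, u_block, v_block, unused=0):
    """Take one component from each block in turn and build (Y, U, V, X)
    pixels. The filler for the unused X channel is an assumed value."""
    if not (len(y_block) == len(u_block) == len(v_block)):
        raise ValueError("component blocks must be the same length")
    return [(y, u, v, unused) for y, u, v in zip(y_block, u_block, v_block)]
```

  • In the arrangement described, each block would hold the 64 components of one 8×8 block, so one call would yield 64 interleaved pixels.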
  • Byte swapping may be applied to each output to swap the channels as desired.
  • the result organizer 249 must also do any required sub-sampling to reconstruct chroma-data from decompressed output. This can involve replicating each program channel to produce and an one.
  • the result organizer 249 of FIG. 2 is based around the usual standard CBus interface 840 which includes a register file of registers to be set for operation of the result organizer 249 .
  • the operation of the result organizer 249 is similar to that of the pixel organizer 246 , however the reverse data manipulation operations take place.
  • a data manipulation unit 842 performs byte lane swapping, component substitution, component deselection and denormalization operations on data provided by the MUV address generator (MAG) 805 .
  • the operations carried out are those previously described with reference to FIG. 42 and operate in accordance with various fields set in internal registers.
  • the FIFO queue 843 provides buffering of output data before it is output via RBus control unit 844 .
  • the RBus control unit 844 is composed of an address decoder and state machines for address generation.
  • the address for the destination module is stored in an internal register in addition to data on the number of output bytes required.
  • an internal RO_CUT register specifies how many output bytes to discard before sending a byte stream on the output bus.
  • a RO_LMT register specifies the maximum number of data items to be output with subsequent data bytes after the output limit being ignored.
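  • The combined effect of the RO_CUT and RO_LMT registers may be sketched, by way of example only, as follows. The function name is hypothetical, and the sketch assumes that the limit is counted in the same units as the cut and is applied after the cut bytes have been discarded; the description does not state this ordering explicitly.

```python
def apply_cut_and_limit(stream, ro_cut, ro_lmt):
    """Discard the first ro_cut items, then pass at most ro_lmt items and
    ignore the rest (assumed ordering: cut first, limit second)."""
    items = list(stream)
    return items[ro_cut:ro_cut + ro_lmt]
```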
  • the MAG 805 generates addresses for the MUV RAM 250 during JPEG decompression.
  • the MUV RAM 250 is utilized to double buffer output from the JPEG decoder.
  • the MAG 805 performs any appropriate meshing of components in the MUV RAM 250 in accordance with an internal configuration register and outputs single channel, three channel or four channel interleaved pixels.
  • the data obtained from the MUV RAM 250 is then passed through the data manipulation unit 842 , since byte lane swapping may need to be applied before pixel data is sent to the appropriate destination.
  • the MAG 805 simply forwards data from the PBus receiver 845 straight through to the data manipulation unit 842 .
  • the two identical operand organizers 247 , 248 perform the function of buffering data from the data cache control 240 and forwarding the data to the JPEG coder 241 or the main data path 242 .
  • the operand organizers 247 , 248 are operated in a number of modes:
  • a number of modes of operation of the main data path 242 require at least one of the operand organizers 247 , 248 to operate in sequential mode. These modes include compositing wherein operand organizer B 247 is required to buffer pixels which are to be composited with another image. Operand organizer C 248 is used for compositing operations for attenuation of values for each data channel. In halftoning mode, operand organizer B 247 buffers 8 bit matrix coefficients and in hierarchical image format decompression mode the operand organizer B 247 buffers data for both vertical interpolation and residual merging instructions.
  • an operand organizer B constructs a single internal data word and replicates this word a number of times as given by an internal register.
  • an operand organizer B buffers data that comprises a pixel tile.
  • An internal length register specifies the number of items to be generated by individual operand organizers 247 , 248 when operated in sequential/tiling/constant mode.
  • Each operand organizer 247 , 248 keeps account of the number of data items processed so far and stops when the count reaches the value specified in its internal register.
  • Each operand organizer is further responsible for formatting input data via byte lane swapping, component substitution, packed/unpacked and normalization functions. The desired operations are configured utilising internal registers. Further, each operand organizer 247 , 248 may also be configured to constrict data items.
  • the operand organizer 247 , 248 includes the usual standard CBus interface and registers 850 responsible for the overall control of the operand organizer. Further, an OBus control unit 851 is provided for connection to the data cache controller 240 and is responsible for performing address generation for sequential/tile/constant modes, generating control signals to enable communications on the OBus interface to each operand organizer 247 , 248 and controlling data manipulation unit operations such as normalization and replication, that require the state to be saved from previous clock cycles of the input stream. When an operand organizer 247 , 248 is operating in sequential or tiling mode, the OBus control unit 851 sends requests for data to the data cache controller 240 , the addresses being determined by internal registers.
  • Each operand organizer further contains a 36 bit wide FIFO buffer 852 used to buffer data from the data cache controller 240 in various modes of operation.
  • a data manipulation unit 853 performs the same functions as the corresponding data manipulation unit 804 of the pixel organizer 246 .
  • a main data path/JPEG coder interface 854 multiplexes address and data to and from the main data path and JPEG coder modules 242 , 241 in normal operating mode.
  • the MDP/JC interface 854 passes input data from the data manipulation units 853 to the main data path and in the process may be configured to replicate this data.
  • the units 851 , 854 are bypassed in order to ensure high speed access to the data cache controller 240 and the color conversion tables.
  • the aspects of the following embodiment relate to an image processor providing a low cost computer architecture capable of performing a number of image processing operations at high speed. Still further, the image processor seeks to provide a flexible computer architecture capable of being configured to perform image processing operations that are not originally specified. The image processor also seeks to provide a computer architecture having a large amount of identical logic, which simplifies the design process and lowers the cost of designing such an architecture.
  • the computer architecture comprises a control register block, a decoding block, a data object processor, and flow control logic.
  • the control register block stores all the relevant information about the image processing operation.
  • the decoding block decodes the information into configuration signals, which configure an input data object interface.
  • the input data object interface accepts and stores data objects from outside, and distributes these data objects to the data object processor. For some image processing operations, the input data object interface may also generate addresses for data objects, so that the source of these data objects can provide the correct data objects.
  • the data object processor performs arithmetic operations on the data objects received.
  • the flow control logic controls the flow of data objects within the data object processing logic.
  • the data object processor can comprise a number of identical data object sub-processors, each of which processes part of an incoming data object.
  • the data object sub-processor includes a number of identical multifunctional arithmetic units that perform arithmetic operations on these parts of data objects, post processing logic that processes the outgoing data objects, and multiplexer logic that connects the multifunctional arithmetic units and the post-processing unit together.
  • the multifunctional arithmetic units contain storage for parts of the calculated data objects. The storage is enabled or disabled by the flow control logic.
  • the multifunctional arithmetic units and multiplexer logic are configured by the configuration signals generated by the decoding logic.
  • configuration signals from the decoding logic can be overridden by an external programming agent.
  • any multifunctional blocks and multiplexer logic can be individually configured by an external programming agent, allowing it to configure the image processor to perform image processing operations that are not specified beforehand.
  • the main data path unit 242 performs all data manipulation operations and instructions other than JPEG data coding. These instructions include compositing, color space conversion, image transformations, convolution, matrix multiplication, halftoning, memory copying and hierarchical image format decompression.
  • the main data path 242 receives pixel and operand data from the pixel organizer 246 , and operand organizers 247 , 248 and feeds the resultant output to the result organizer 249 .
  • FIG. 129 illustrates a block diagram of the main data path unit 242 .
  • the main data path unit 242 is a general image processor and includes input interface 1460 , image data processor 1462 , instruction word register 1464 , instruction word decoder 1468 , control signal register 1470 , register file 1472 , and a ROM 1475 .
  • the instruction controller 235 transfers instruction words to the instruction word register 1464 via bus 1454 .
  • Each instruction word contains information such as the kind of image processing operation to be executed, and flags to enable or disable various options in that image processing operation.
  • the instruction word is then transferred to the instruction word decoder 1468 via bus 1465 .
  • Instruction controller 235 can then indicate to the instruction word decoder 1468 to decode the instruction word.
  • the instruction decoder 1468 decodes the instruction word into control signals. These control signals are then transferred via bus 1469 to the control signal register 1470 .
  • the output of the control signal register is then connected to the input interface 1460 and image data processor 1462 via bus 1471 .
  • the instruction controller 235 can also write into the control signal register 1470 . This allows anyone who is familiar with the structure of the main data path unit 242 to micro-configure the main data path unit 242 so that the main data path unit 242 will execute image processing operations that are not described by any instruction word.
  • the instruction controller 235 can write all the other information necessary to perform the desired image processing operation into some of the selected registers in register file 1472 .
  • the information is then transferred to the input interface 1460 and the image data processor 1462 via bus 1473 .
  • the input interface 1460 may update the contents of selected registers in the register file 1472 to reflect the current status of the main data path unit 242 . This feature helps the instruction controller 235 to find out what the problem is when there is a problem in executing an image processing operation.
  • the instruction controller 235 can indicate to the main data path unit 242 to start performing the desired image processing operation.
  • the input interface 1460 begins to accept data objects coming from bus 1451 .
  • the input interface 1460 may also begin to accept operand data coming from operand bus 1452 and/or operand bus 1453 , or generate addresses for operand data and receive the operand data from operand bus 1452 and/or operand bus 1453 .
  • the input interface 1460 then stores and rearranges the incoming data in accordance with the output of the control signal register 1470 .
  • the input interface 1460 also generates coordinates to be fetched via buses 1452 and 1453 when calculating such functions as affine image transformation operations and convolution.
  • the image data processor 1462 performs the major arithmetic operations on the rearranged data objects from the input interface 1460 .
  • the image processor 1462 can: interpolate between two data objects with a provided interpolation factor; multiply two data objects and divide the product by 255; multiply and add two data objects in general; round off fraction parts of a data object which may have various resolutions; clamp overflow of a data object to some maximum value and underflow of a data object to some minimum value; and perform scaling and clamping on a data object.
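  • Several of the arithmetic operations listed above may be illustrated with the following minimal sketch for 8-bit channel values. The function names and the round-to-nearest behaviour are assumptions made for illustration; the description does not specify the rounding used by the hardware.

```python
def mul_div255(a, b):
    """Multiply two values and divide the product by 255,
    rounding to nearest (rounding mode is an assumption)."""
    return (a * b + 127) // 255


def interpolate(a, b, factor):
    """Interpolate from a (factor 0) to b (factor 255)."""
    return a + ((b - a) * factor + 127) // 255


def clamp(value, minimum=0, maximum=255):
    """Clamp overflow to maximum and underflow to minimum."""
    return max(minimum, min(maximum, value))
```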
  • the control signals on bus 1471 control which of the above arithmetic operations are performed on the data objects, and the order of the operations.
  • a ROM 1475 contains the quotients 255/x, where x is from 0 to 255, rounded to 8.8 fixed-point format.
  • the ROM 1475 is connected to the input interface 1460 and the image data processor 1462 via bus 1476 .
  • the ROM 1475 is used to generate blends of short lengths, and to multiply one data object by 255 and divide the product by another data object.
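  • One possible way of modelling such a table in software is sketched below. The function names are hypothetical, and the content of the entry for x equal to 0, for which the quotient is undefined, is an assumption since the description does not give it. In 8.8 format the upper 8 bits hold the integer part and the lower 8 bits the fraction, so each entry is 255/x scaled by 256.

```python
def build_reciprocal_rom():
    """Build a 256-entry table of 255/x in 8.8 fixed point."""
    rom = [0] * 256                        # entry 0 left as 0: an assumption
    for x in range(1, 256):
        # round(255 * 256 / x) using integer arithmetic only
        rom[x] = (255 * 256 * 2 + x) // (2 * x)
    return rom


def mul255_div(value, divisor, rom):
    """Approximate value * 255 / divisor with one multiply by the ROM entry
    followed by a rounding shift that removes the 8 fraction bits."""
    return (value * rom[divisor] + 128) >> 8
```

  • Replacing the division with a table lookup and a multiply is the design choice being illustrated; the result is an approximation whose accuracy is limited by the 8 fraction bits.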
  • the number of operand buses eg 1452 is limited to 2, which is sufficient for most image processing operations.
  • FIG. 130 illustrates the input interface 1460 in further detail.
  • Input interface 1460 includes data object interface unit 1480 , operand interface units 1482 and 1484 , address generation state machine 1486 , blend generation state machine 1488 , matrix multiplication state machine 1490 , interpolation state machine 1494 , data synchronizer 1500 , arithmetic unit 1496 , miscellaneous register 1498 , and data distribution logic 1505 .
  • Data object interface unit 1480 and operand interface units 1482 and 1484 are responsible for receiving data objects and operands from outside. These interface units 1482 , 1484 are all configured by control signals from control bus 1515 . These interface units 1482 , 1484 have data registers within them to contain the data objects/operands that they have just received, and they all produce a VALID signal which is asserted when the data within the data register is valid. The outputs of the data registers in these interface units 1482 , 1484 are connected to data bus 1505 . The VALID signals of these interface units 1482 , 1484 are connected to flow bus 1510 .
  • When configured to fetch operands, operand interface units 1482 and 1484 accept addresses from arithmetic unit 1496 , matrix multiplication state machine 1490 and/or the output of the data register in data object interface unit 1480 , and select amongst them the required address in accordance with the control signals from control bus 1515 .
  • the data registers in operand interface units 1482 and 1484 can be configured to store data from the output of data register in data object interface unit 1480 or arithmetic unit 1496 , especially when they are not needed to accept and store data from outside.
  • Address generation state machine 1486 is responsible for controlling arithmetic unit 1496 so that it calculates the next coordinates to be accessed in the source image in affine image transformation operations and convolution operations.
  • the address generation state machine 1486 waits for START signal on control bus 1515 to be set. When the START signal on control bus 1515 is set, address generation state machine 1486 then de-asserts the STALL signal to data object interface unit 1480 , and waits for data objects to arrive. It also sets a counter to be the number of data objects in a kernel descriptor that address generation state machine 1486 needs to fetch. The output of the counter is decoded to become enable signals for data registers in operand interface units 1482 and 1484 and miscellaneous register 1498 . When the VALID signal from data object interface unit 1480 is asserted, address generation state machine 1486 decrements the counter, so the next piece of data object is latched into a different register.
  • address generation state machine 1486 tells operand interface unit 1482 to start fetching index table values and pixels from operand interface unit 1484 . Also, it loads two counters, one with the number of rows, another with the number of columns. At every clock edge, when it is not paused by STALL signals from the operand interface unit 1482 or others, the counters are decremented to give the remaining rows and columns, and the arithmetic unit 1496 calculates the next coordinates to be fetched from. When both counters have reached zero, the counters reload themselves with the number of rows and columns again, and arithmetic unit 1496 is configured to find the top left hand corner of the next matrix.
  • address generation state machine 1486 decrements the number of rows and columns after every second clock cycle. This is implemented using a 1-bit counter, with the output used as the enable of the row and column counter. After the matrix is traversed around once, the state machine sends a signal to decrement the count in the length counter. When the counter reaches 1, and the final index table address is sent to the operand interface unit 1482 , the state machine asserts a final signal, and resets the start bit.
  • Blend generation state machine 1488 is responsible for controlling arithmetic unit 1496 to generate a sequence of numbers from 0 to 255 for the length of a blend. This sequence of numbers is then used as the interpolation factor to interpolate between the blend start value and blend end value.
  • Blend generation state machine 1488 determines which mode it should run in (jump mode or step mode). If the blend length is less than or equal to 256, then jump mode is used, otherwise step mode is used.
  • the blend generation state machine 1488 calculates the following and puts them in registers (reg 0 , reg 1 , reg 2 ). If a blend ramp is in step mode for a predetermined length, then latch 511 - length in reg 0 (24 bits), 512 - 2*length in reg 1 (24 bits), and end - start in reg 2 (4×9 bits). If the ramp is in jump mode, then latch 0 into reg 0 , 255/(length - 1) into reg 1 , and end - start into reg 2 (4×9 bits).
  • In step mode, the following operations are performed for every cycle:
  • If reg 0 >0, then add reg 1 to reg 0 and store the result in reg 0 .
  • the least 8 bits of the integer part of the incrementor is the ramp value.
  • the ramp value, the output of reg 2 , and the blend start value are then fed into the image data processor 1462 to produce the ramp.
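  • A minimal sketch of ramp generation in jump mode is given below by way of example. It assumes that the incrementor is held in 8.8 fixed point, that 255/(length - 1) is rounded to nearest, and that the increment is added once per output value; the function name is hypothetical. Step mode is not sketched, since its per-cycle operations are only partly listed above.

```python
def jump_mode_ramp(length):
    """Generate the interpolation factors for a blend of the given length
    in jump mode (assumed 8.8 fixed-point incrementor)."""
    if not 2 <= length <= 256:
        raise ValueError("this sketch covers blend lengths 2..256 only")
    d = length - 1
    increment = (255 * 256 * 2 + d) // (2 * d)   # 255/(length - 1) in 8.8
    accumulator = 0                              # reg 0 starts at 0
    ramp = []
    for _ in range(length):
        # the least 8 bits of the integer part form the ramp value
        ramp.append((accumulator >> 8) & 0xFF)
        accumulator += increment                 # add reg 1 each cycle
    return ramp
```

  • Under these assumptions a blend of length 4 yields the factors 0, 85, 170 and 255, and a blend of length 256 yields every value from 0 to 255 once.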
  • Matrix multiplication state machine 1490 is responsible for performing linear color space conversion on input data objects using a conversion matrix.
  • the conversion matrix is of the dimension 4×5. The first four columns multiply with the 4 channels in the data object, while the last column contains constant coefficients to be added to the sum of products.
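  • The arithmetic implied by such a conversion matrix may be sketched as follows. The function name is hypothetical, and the sketch uses plain integers; any fixed-point scaling, rounding or clamping applied by the hardware is outside the scope of this illustration.

```python
def convert_pixel(matrix, pixel):
    """Apply a 4x5 conversion matrix to a 4-channel data object.
    Columns 0..3 multiply the channels; column 4 is an additive constant."""
    if len(matrix) != 4 or any(len(row) != 5 for row in matrix):
        raise ValueError("matrix must be 4 rows by 5 columns")
    if len(pixel) != 4:
        raise ValueError("pixel must have 4 channels")
    result = []
    for row in matrix:
        total = sum(coeff * channel for coeff, channel in zip(row[:4], pixel))
        result.append(total + row[4])
    return tuple(result)
```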
  • matrix multiplication state machine does the following:
  • Interpolation state machine 1494 is responsible for performing horizontal interpolation of data objects. During horizontal interpolation, main data path unit 242 accepts a stream of data objects from bus 1451 , and interpolates between adjacent data objects to output a stream of data objects which is twice or 4 times as long as the original stream. Since the data objects can be packed bytes or pixels, interpolation state machine 1494 operates differently in each case to maximize the throughput. Interpolation state machine 1494 does the following:
  • Arithmetic unit 1496 contains circuitry for performing arithmetic calculations. It is configured by control signals on control bus 1515 . It is used by two instructions only: affine image transformation and convolution, and blend generation in compositing.
  • arithmetic unit 1496 is responsible for:
  • arithmetic unit 1496 uses an adder/subtractor to add/subtract the x part of horizontal and vertical delta to/from the current x coordinate.
  • arithmetic unit 1496 uses an adder/subtractor to add/subtract the y part of the horizontal or vertical delta to/from the current y coordinate.
  • arithmetic unit 1496 does the following:
  • In step mode, one of the ramp adders is used to calculate an internal variable in the ramp generation algorithm, while the other adder is used to increment the ramp value when the internal variable is greater than 0.
  • Miscellaneous register 1498 provides extra storage space apart from the data registers in data object interface unit 1480 and operand interface units 1482 and 1484 . It is usually used to store internal variables or as a buffer of past data objects from data object interface unit 1480 . It is configured by control signals on control bus 1515 .
  • Data synchronizer 1500 is configured by control signals on control bus 1515 . It provides STALL signals to data object interface unit 1480 and operand interface units 1482 and 1484 so that if one of the interface units receives a piece of data that the others have not yet received, that interface unit is stalled until all the other interface units have received their pieces of data.
  • Data distribution logic 1505 rearranges data objects from data bus 1510 and register file 1472 via bus 1530 in accordance with control signals on control bus 1515 , including a MAT_SEL signal from matrix multiplication state machine 1490 and an INT_SEL signal from interpolation state machine 1494 .
  • the rearranged data is output onto bus 1461 .
  • FIG. 131 illustrates image data processor 1462 of FIG. 129 in further detail.
  • Image data processor 1462 includes a pipeline controller 1540 , and a number of color channel processors 1545 , 1550 , 1555 and 1560 . All color channel processors accept inputs from bus 1565 , which is driven by the input interface 1460 (FIG. 131 ). All color channel processors and pipeline controller 1540 are configured by control signals from control signal register 1470 via bus 1472 . All the color channel processors also accept inputs from register file 1472 and ROM 1475 of FIG. 129 via bus 1580 . The outputs of all the color channel processors and pipeline controller are grouped together to form bus 1570 , which forms the output 1455 of image data processor 1462 .
  • Pipeline controller 1540 controls the flow of data objects within all the color channel processors by enabling and disabling registers within all the color channel processors.
  • Within pipeline controller 1540 there is a pipeline of registers. The shape and depth of the pipeline are configured by the control signals from bus 1471 , and the pipeline in pipeline controller 1540 has the same shape as the pipeline in the color channel processors.
  • the pipeline controller accepts VALID signals from bus 1565 . For each pipeline stage within pipeline controller 1540 , if the incoming VALID signal is asserted and the pipeline stage is not stalled, then the pipeline stage asserts the register enable signals to all color channel processors, and latches the incoming VALID signal. The output of the latch is then the VALID signal going to the next pipeline stage. In this way the movement of data objects in the pipeline is simulated and controlled, without storage of any data.
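  • The propagation of VALID signals may be modelled, purely for illustration, by the following one-clock update. The function name is hypothetical. The behaviour of a stage whose condition is not met is not spelled out in the description, so the sketch assumes that a stalled stage holds its latched VALID and that an unstalled stage with no valid input clears it.

```python
def pipeline_step(latched, incoming_valid, stalled):
    """Advance the VALID pipeline by one clock.

    latched        -- list of VALID bits currently held by each stage
    incoming_valid -- VALID signal presented to the first stage
    stalled        -- list of per-stage stall flags
    Returns (next_latched, enables), where enables are the register
    enable signals sent to the color channel processors.
    """
    # each stage sees the VALID latched by the stage before it
    inputs = [incoming_valid] + latched[:-1]
    enables = [vin and not stall for vin, stall in zip(inputs, stalled)]
    next_latched = [
        True if enable else (held if stall else False)  # assumed hold/clear rule
        for enable, held, stall in zip(enables, latched, stalled)
    ]
    return next_latched, enables
```

  • Only single-bit VALID flags move through this model, which reflects the point made above that the controller tracks the movement of data objects without storing any data itself.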
  • Color channel processors 1545 , 1550 , 1555 and 1560 perform the main arithmetic operations on incoming data objects, with each of them responsible for one of the channels of the output data object.
  • the number of color channel processors is limited to 4, since most pixel data objects have a maximum of 4 channels.
  • One of the color channel processors processes the opacity channel of a pixel.
  • There is additional circuitry (not shown in FIG. 131 ), connected to the control bus 1471 , which transforms the control signals from the control bus 1471 so that the color channel processor processes the opacity channel correctly, as for some image processing operations the operations on the opacity channel are slightly different from the operations on the color channels.
  • FIG. 132 illustrates color channel processor 1545 , 1550 , 1555 or 1560 (generally denoted by 1600 in FIG. 132) in further detail.
  • Each color channel processor 1545 , 1550 , 1555 or 1560 includes processing block A 1610 , processing block B 1615 , big adder 1620 , fraction rounder 1625 , clamp-or-wrapper 1630 , and output multiplexer 1635 .
  • the color channel processor 1600 accepts control signals from control signal register 1470 via bus 1602 , enable signals from pipeline controller 1540 via bus 1604 , information from register file 1472 via bus 1605 , data objects from the other color channel processors via bus 1603 , and data objects from input interface 1460 via bus 1601 .
  • Processing block A 1610 performs some arithmetic operations on the data objects from bus 1601 , and produces partially computed data objects on bus 1611 . The following illustrates what processing block A 1610 does for designated image processing operations.
  • processing block A 1610 pre-multiplies data objects from data object bus 1451 with opacity, interpolates between a blend start value and a blend end value with an interpolation factor from input interface 1460 in FIG. 129, pre-multiplies operands from operand bus 1452 in FIG. 129 or multiplies blend color by opacity, and attenuates multiplication on pre-multiplied operand or blend color data.
  • the processing block A 1610 interpolates between 4 color table values using two fraction values from bus 1451 in FIG. 129 .
  • the processing block A 1610 pre-multiplies the color of the source pixel by opacity, and interpolates between pixels on the same row using the fraction part of current x-coordinate.
  • the processing block A 1610 pre-multiplies color of the source pixel by opacity, and multiplies pre-multiplied color data with conversion matrix coefficients.
  • the processing block A 1610 interpolates between two data objects.
  • the processing block A 1610 adds two data objects.
  • Processing block A 1610 includes a number of multifunction blocks 1640 and processing block A glue logic 1645 .
  • the multifunction blocks 1640 are configured by control signals, and may perform any one of the following functions:
  • Processing block A glue logic 1645 accepts data objects from bus 1601 and data objects from bus 1603 , and the outputs of some of the multifunction blocks 1640 , and routes them to inputs of other selected multifunction blocks 1640 .
  • Processing block A glue logic 1645 is also configured by control signals from bus 1602 .
  • Processing block B 1615 performs arithmetic operations on the data objects from bus 1601 , and partially computed data objects from bus 1611 , to produce partially computed data objects on bus 1616 .
  • the following description illustrates what processing block B 1615 does for designated image processing operations.
  • the processing block B 1615 multiplies pre-processed data objects from data object bus 1451 and operands from operand bus 1452 with compositing multiplicands from bus 1603 , and multiplies clamped/wrapped data objects by output of the ROM, which is 255/opacity in 8.8 format.
  • the processing block B 1615 adds two pre-processed data objects. In the opacity channel, it also subtracts 255 from the sum, multiplies an offset with the difference, and divides the product by 255.
  • the processing block B 1615 interpolates between 4 color table values using 2 of the fraction values from bus 1451 , and interpolates between partially interpolated color value from processing block A 1610 and the result of the previous interpolation using the remaining fraction value.
  • the processing block B 1615 interpolates between partially interpolated pixels using the fraction part of current y-coordinate, and multiplies interpolated pixels with coefficients in a sub-sample weight matrix.
  • the processing block B 1615 pre-multiplies the color of the source pixel by opacity, and multiplies pre-multiplied color with conversion matrix coefficients.
  • Processing block B 1615 again includes a number of multifunction blocks and processing block B glue logic 1650 .
  • the multifunction blocks are exactly the same as those in processing block A 1610 , but the processing block B glue logic 1650 accepts data objects from buses 1601 , 1603 , 1611 , 1631 and the outputs of selected multifunction blocks and routes them to the inputs of selected multifunction blocks.
  • Processing block B glue logic 1650 is also configured by control signals from bus 1602 .
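  • The split of work between the two blocks during image interpolation (block A along the row using the x fraction, block B between rows using the y fraction) can be pictured with the following minimal sketch. It is an illustration only: the function names, the floating-point fractions in [0, 1], and the four-neighbor argument layout are assumptions, and the hardware's fixed-point formats are not modeled.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with factor t in [0, 1]."""
    return a + (b - a) * t


def bilinear(p00, p10, p01, p11, fx, fy):
    """Two-stage bilinear interpolation (hypothetical helper, not from the patent).

    p00, p10 are neighboring pixels on the upper row; p01, p11 on the lower row.
    fx and fy are the fraction parts of the current x- and y-coordinates.
    """
    # Stage 1 (the role described for processing block A):
    # interpolate between pixels on the same row using the x fraction.
    upper = lerp(p00, p10, fx)
    lower = lerp(p01, p11, fx)
    # Stage 2 (the role described for processing block B):
    # interpolate between the partially interpolated pixels using the y fraction.
    return lerp(upper, lower, fy)
```

  • Calling bilinear(0, 100, 200, 300, 0.5, 0.5) first forms the row values 50 and 250, then blends them into a single result.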
  • Big adder 1620 is responsible for combining some of the partial results from processing block A 1610 and processing block B 1615 . It accepts inputs from input interface 1460 via bus 1601 , processing block A 1610 via bus 1611 , processing block B 1615 via bus 1616 , and register file 1472 via bus 1605 , and it produces the combined result on bus 1621 . It is also configured by control signals on bus 1602 .
  • big adder 1620 may be configured differently. The following description illustrates its operation during designated image processing operations.
  • the big adder 1620 adds two partial products from processing block B 1615 together.
  • the big adder 1620 subtracts the sum of pre-processed data objects with offset from the opacity channel, if an offset enable is on.
  • the big adder 1620 accumulates the products from processing block B 1615 .
  • in the first cycle, the big adder 1620 adds two matrix coefficient/data object products and the constant coefficient together.
  • in the second cycle, it adds the sum from the first cycle to another two matrix coefficient/data object products.
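  • The two-cycle accumulation described for the matrix operation can be sketched as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the figure of four coefficient/data object products per output value is inferred from "two products per cycle over two cycles" rather than stated explicitly in the text.

```python
def accumulate_matrix_row(coeffs, constant, data):
    """Combine four coefficient/data products and a constant in two cycles.

    coeffs and data each hold four values (an assumption inferred from the
    two-cycle description); constant is the constant coefficient of the row.
    """
    if len(coeffs) != 4 or len(data) != 4:
        raise ValueError("this sketch assumes exactly four products per output")
    # The products themselves are described as coming from processing block B.
    products = [m * x for m, x in zip(coeffs, data)]
    # Cycle 1: two products plus the constant coefficient.
    acc = products[0] + products[1] + constant
    # Cycle 2: the previous sum plus another two products.
    acc = acc + products[2] + products[3]
    return acc
```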
  • Fraction rounder 1625 accepts input from the big adder 1620 via bus 1621 and rounds off the fraction part of the output.
  • the number of bits representing the fraction part is described by a BP signal on bus 1605 from register file 1472 .
  • the following table shows how the BP signal is interpreted.
  • the rounded output is provided on bus 1626 .
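  • As a rough illustration of rounding off a fraction part whose width is given by a control value, consider the sketch below. The BP interpretation table is not reproduced in this text, so the sketch simply assumes that the number of fraction bits is passed in directly as frac_bits and that rounding is to nearest with halves rounded up; both are assumptions, and the function name is hypothetical.

```python
def round_fraction(value, frac_bits):
    """Round a fixed-point integer to the nearest whole number.

    value is an integer with frac_bits fractional bits. The rounding mode
    (add half, then drop the fraction bits) is an assumption of this sketch.
    """
    if frac_bits < 0:
        raise ValueError("frac_bits must be non-negative")
    if frac_bits == 0:
        return value  # no fraction part to round off
    half = 1 << (frac_bits - 1)
    return (value + half) >> frac_bits
```

  • With 8 fraction bits, 0x0180 (1.5) rounds up to 2, while 0x017F (just under 1.5) rounds to 1.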
  • fraction rounder 1625 also does two things:
  • Clamp-or-wrapper 1630 accepts inputs from fraction rounder 1625 via bus 1626 and does the following in the order described:
  • Output multiplexer 1635 selects the final output from the output of processing block B on bus 1616 and the output of clamp-or-wrapper on bus 1631 . It also performs some final processing on the data object.
  • the following description illustrates its operation for designated image processing operations.
  • the multiplexer 1635 combines some of the outputs of processing block B 1615 to form the un-pre-multiplied data object.
  • the multiplexer 1635 passes on the output of clamp-or-wrapper 1630 .
  • the multiplexer 1635 combines some of the outputs of processing block B 1615 to form the resultant data object.
  • the multiplexer 1635 applies the translate-and-clamp function on the output data object.
  • the multiplexer 1635 passes on the output of clamp-or-wrapper 1630 .
  • FIG. 133 illustrates a single multifunction block (e.g. 1640 ) in further detail.
  • Multifunction block 1640 includes mode detector 1710 , two addition operand logic units 1660 and 1670 , 3 multiplexing logic units 1680 , 1685 and 1690 , a 2-input adder 1675 , a 2-input multiplier with 2 addends 1695 , and register 1705 .
  • Mode detector 1710 accepts one input, the MODE signal 1711, from control signal register 1470 in FIG. 129, and two inputs, the SUB signal 1712 and the SWAP signal 1713, from input interface 1460 in FIG. 129. Mode detector 1710 decodes these signals into control signals for addition operand logic units 1660 and 1670 and multiplexing logic units 1680, 1685 and 1690; these control signals configure multifunction block 1640 to perform various operations. There are 8 modes in multifunction block 1640:
  • Add/sub mode adds input 1655 to, or subtracts it from, input 1665, in accordance with the SUB signal 1712. The inputs can also be swapped in accordance with the SWAP signal 1713.
  • Interpolate mode interpolates between inputs 1655 and 1665 using input 1675 as the interpolation factor. Inputs 1655 and 1665 can be swapped in accordance with the SWAP signal 1713 .
  • Pre-multiply mode multiplies input 1655 with input 1675 and divides the product by 255.
  • the output of the INC register 1708 tells the next stage whether to increment the result of this stage in bus 1707 to obtain the correct result.
  • Multiply mode multiplies input 1655 with 1675 .
  • Add/subtract-and-pre-multiply mode adds/subtracts input 1665 to/from input 1655, multiplies the sum/difference with input 1675, and then divides the product by 255.
  • the output of the INC register 1708 tells the next stage whether to increment the result of this stage in bus 1707 to obtain the correct result.
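  • The arithmetic of the modes described above can be summarized in a short sketch. It is not the circuit of FIG. 133: the function names are hypothetical, 8-bit operands (0 to 255) are assumed, the direction of interpolation (factor 0 selects input 1655) is an assumption, and the hand-off through the INC register 1708 is not modeled. The shift-and-add division by 255 is a commonly used software technique shown here for illustration; the text does not say that the hardware uses it.

```python
def round_div255(x):
    """Reference: x / 255 rounded to nearest, halves rounded up."""
    return (2 * x + 255) // 510


def div255_shift(x):
    """Shift-and-add form of the same division, exact for 0 <= x <= 255 * 255."""
    t = x + 128
    return (t + (t >> 8)) >> 8


def add_sub(in_1655, in_1665, sub=False, swap=False):
    """Add/sub mode: add input 1655 to, or subtract it from, input 1665."""
    if swap:
        in_1655, in_1665 = in_1665, in_1655
    return in_1665 - in_1655 if sub else in_1665 + in_1655


def interpolate(in_1655, in_1665, factor, swap=False):
    """Interpolate mode with an 8-bit factor (0 gives in_1655, 255 gives in_1665)."""
    if swap:
        in_1655, in_1665 = in_1665, in_1655
    return div255_shift(in_1655 * (255 - factor) + in_1665 * factor)


def pre_multiply(value, opacity):
    """Pre-multiply mode: multiply, then divide the product by 255."""
    return div255_shift(value * opacity)


def multiply(value, factor):
    """Multiply mode: plain product, no division."""
    return value * factor


def add_sub_pre_multiply(in_1655, in_1665, factor, sub=False):
    """Add/subtract input 1665 to/from input 1655, multiply, divide by 255."""
    total = in_1655 - in_1665 if sub else in_1655 + in_1665
    # The sum can exceed 255, so the reference division is used here.
    return round_div255(total * factor)
```

  • For example, pre_multiply(255, 128) yields 128, and interpolate(10, 200, 0) returns the first input unchanged.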
US09/025,613 1997-04-30 1998-02-18 Decoder of variable length codes Expired - Lifetime US6272257B1 (en)

Applications Claiming Priority (28)

Application Number Priority Date Filing Date Title
AUPO6487A AUPO648797A0 (en) 1997-04-30 1997-04-30 Page table pointers for co-processor virtual memory systems
AUPO6483A AUPO648397A0 (en) 1997-04-30 1997-04-30 Improvements in multiprocessor architecture operation
AUPO6492 1997-04-30
AUPO6485A AUPO648597A0 (en) 1997-04-30 1997-04-30 Reconfigurable data cache controller
AUPO6482 1997-04-30
AUPO6481 1997-04-30
AUPO6479 1997-04-30
AUPO6489 1997-04-30
AUPO6488A AUPO648897A0 (en) 1997-04-30 1997-04-30 Data normalization technique
AUPO6488 1997-04-30
AUPO6484 1997-04-30
AUPO6491 1997-04-30
AUPO6487 1997-04-30
AUPO6480A AUPO648097A0 (en) 1997-04-30 1997-04-30 General image processor
AUPO6486 1997-04-30
AUPO6490 1997-04-30
AUPO6480 1997-04-30
AUPO6484A AUPO648497A0 (en) 1997-04-30 1997-04-30 A fast dct apparatus
AUPO6491A AUPO649197A0 (en) 1997-04-30 1997-04-30 Multi-instruction stream processor
AUPO6489A AUPO648997A0 (en) 1997-04-30 1997-04-30 Instruction encoding format
AUPO6482A AUPO648297A0 (en) 1997-04-30 1997-04-30 Reconfigurable data buffer
AUPO6492A AUPO649297A0 (en) 1997-04-30 1997-04-30 Register setting-micro programming system
AUPO6486A AUPO648697A0 (en) 1997-04-30 1997-04-30 Decoder of variable length codes
AUPO6481A AUPO648197A0 (en) 1997-04-30 1997-04-30 Data normalisation circuit
AUPO6483 1997-04-30
AUPO6479A AUPO647997A0 (en) 1997-04-30 1997-04-30 Memory controller architecture
AUPO6485 1997-04-30
AUPO6490A AUPO649097A0 (en) 1997-04-30 1997-04-30 Cached colour conversion method and apparatus

Publications (1)

Publication Number Publication Date
US6272257B1 true US6272257B1 (en) 2001-08-07

Family

ID=27584624

Family Applications (5)

Application Number Title Priority Date Filing Date
US09/025,771 Expired - Lifetime US6246396B1 (en) 1997-04-30 1998-02-18 Cached color conversion method and apparatus
US09/025,613 Expired - Lifetime US6272257B1 (en) 1997-04-30 1998-02-18 Decoder of variable length codes
US09/025,506 Expired - Lifetime US6195674B1 (en) 1997-04-30 1998-02-18 Fast DCT apparatus
US09/025,770 Expired - Fee Related US6507898B1 (en) 1997-04-30 1998-02-18 Reconfigurable data cache controller
US09/025,194 Expired - Lifetime US6349379B2 (en) 1997-04-30 1998-02-18 System for executing instructions having flag for indicating direct or indirect specification of a length of operand data

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/025,771 Expired - Lifetime US6246396B1 (en) 1997-04-30 1998-02-18 Cached color conversion method and apparatus

Family Applications After (3)

Application Number Title Priority Date Filing Date
US09/025,506 Expired - Lifetime US6195674B1 (en) 1997-04-30 1998-02-18 Fast DCT apparatus
US09/025,770 Expired - Fee Related US6507898B1 (en) 1997-04-30 1998-02-18 Reconfigurable data cache controller
US09/025,194 Expired - Lifetime US6349379B2 (en) 1997-04-30 1998-02-18 System for executing instructions having flag for indicating direct or indirect specification of a length of operand data

Country Status (2)

Country Link
US (5) US6246396B1
EP (5) EP0875855B1

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6393545B1 (en) 1919-04-30 2002-05-21 Canon Kabushiki Kaisha Method apparatus and system for managing virtual memory with virtual-physical mapping
US6408421B1 (en) * 1998-09-15 2002-06-18 The Trustees Of Columbia University High-speed asynchronous decoder circuit for variable-length coded data
US6674536B2 (en) 1997-04-30 2004-01-06 Canon Kabushiki Kaisha Multi-instruction stream processor
US20040105500A1 (en) * 2002-04-05 2004-06-03 Koji Hosogi Image processing system
US20040223608A1 (en) * 2001-09-25 2004-11-11 Oommen B. John Cryptosystem for data security
US20040252891A1 (en) * 2001-09-14 2004-12-16 Daigo Sasaki Image processing apparatus, image transmission apparatus, image reception apparatus, and image processing method
US20050050341A1 (en) * 2003-08-28 2005-03-03 Sunplus Technology Co., Ltd. Device of applying protection bit codes to encrypt a program for protection
US20060015648A1 (en) * 2004-06-30 2006-01-19 Nokia Inc. Chaining control marker data structure
US20060181724A1 (en) * 2005-02-14 2006-08-17 Stmicroelectronics Sa Image processing method and device
US20060267996A1 (en) * 2005-05-27 2006-11-30 Jiunn-Shyang Wang Apparatus and method for digital video decoding
US20080218387A1 (en) * 2007-03-07 2008-09-11 Industrial Technology Research Institute Variable length decoder utilizing reordered index decoding look-up-table (lut) and method of using the same
US20080270429A1 (en) * 2007-04-27 2008-10-30 Nec Electronics Corporation Data development device and data development method
US20080320223A1 (en) * 2006-02-27 2008-12-25 Fujitsu Limited Cache controller and cache control method
US20090002765A1 (en) * 2007-06-29 2009-01-01 Konica Minolta Systems Laboratory, Inc. Systems and Methods of Trapping for Print Devices
US20090244563A1 (en) * 2008-03-27 2009-10-01 Konica Minolta Systems Laboratory, Inc. Systems and methods for color conversion
US20090310151A1 (en) * 2008-06-12 2009-12-17 Kurt Nathan Nordback Systems and Methods for Multi-Mode Color Blending
US20100098151A1 (en) * 2004-06-07 2010-04-22 Nahava Inc. Method and Apparatus for Cached Adaptive Transforms for Compressing Data Streams, Computing Similarity, and Recognizing Patterns
US7873947B1 (en) * 2005-03-17 2011-01-18 Arun Lakhotia Phylogeny generation
CN102147766A (zh) * 2010-12-17 2011-08-10 曙光信息产业股份有限公司 一种维护tcp流表结构和乱序缓冲区的方法
US8527849B2 (en) * 2011-08-19 2013-09-03 Stec, Inc. High speed hard LDPC decoder
US8570340B2 (en) 2008-03-31 2013-10-29 Konica Minolta Laboratory U.S.A., Inc. Systems and methods for data compression
US20140189304A1 (en) * 2012-12-31 2014-07-03 Tensilica Inc. Bit-level register file updates in extensible processor architecture
US9448801B2 (en) 2012-12-31 2016-09-20 Cadence Design Systems, Inc. Automatic register port selection in extensible processor architecture
US9478312B1 (en) * 2014-12-23 2016-10-25 Amazon Technologies, Inc. Address circuit
US10171107B2 (en) 2014-01-31 2019-01-01 Hewlett-Packard Development Company, L.P. Groups of phase invariant codewords

Families Citing this family (196)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015780A1 (en) * 1995-10-25 2006-01-19 Cityu Research Limited Specifying data timeliness requirement and trap enabling on instruction operands of a processor
US7266725B2 (en) * 2001-09-03 2007-09-04 Pact Xpp Technologies Ag Method for debugging reconfigurable architectures
DE19651075A1 (de) 1996-12-09 1998-06-10 Pact Inf Tech Gmbh Einheit zur Verarbeitung von numerischen und logischen Operationen, zum Einsatz in Prozessoren (CPU's), Mehrrechnersystemen, Datenflußprozessoren (DFP's), digitalen Signal Prozessoren (DSP's) oder dergleichen
DE19654595A1 (de) * 1996-12-20 1998-07-02 Pact Inf Tech Gmbh I0- und Speicherbussystem für DFPs sowie Bausteinen mit zwei- oder mehrdimensionaler programmierbaren Zellstrukturen
DE19654846A1 (de) * 1996-12-27 1998-07-09 Pact Inf Tech Gmbh Verfahren zum selbständigen dynamischen Umladen von Datenflußprozessoren (DFPs) sowie Bausteinen mit zwei- oder mehrdimensionalen programmierbaren Zellstrukturen (FPGAs, DPGAs, o. dgl.)
ATE243390T1 (de) * 1996-12-27 2003-07-15 Pact Inf Tech Gmbh Verfahren zum selbständigen dynamischen umladen von datenflussprozessoren (dfps) sowie bausteinen mit zwei- oder mehrdimensionalen programmierbaren zellstrukturen (fpgas, dpgas, o.dgl.)
US6542998B1 (en) 1997-02-08 2003-04-01 Pact Gmbh Method of self-synchronization of configurable elements of a programmable module
US6311258B1 (en) 1997-04-03 2001-10-30 Canon Kabushiki Kaisha Data buffer apparatus and method for storing graphical data using data encoders and decoders
AU707898B2 (en) * 1997-08-29 1999-07-22 Canon Kabushiki Kaisha Load balancing and image rendering system
US8686549B2 (en) 2001-09-03 2014-04-01 Martin Vorbach Reconfigurable elements
DE19861088A1 (de) * 1997-12-22 2000-02-10 Pact Inf Tech Gmbh Verfahren zur Reparatur von integrierten Schaltkreisen
WO2000043883A1 (fr) * 1999-01-25 2000-07-27 Mitsubishi Denki Kabushiki Kaisha Unite peripherique de controleur programmable
US7003660B2 (en) 2000-06-13 2006-02-21 Pact Xpp Technologies Ag Pipeline configuration unit protocols and communication
US6449679B2 (en) * 1999-02-26 2002-09-10 Micron Technology, Inc. RAM controller interface device for RAM compatibility (memory translator hub)
US7643481B2 (en) * 1999-03-17 2010-01-05 Broadcom Corporation Network switch having a programmable counter
US6707818B1 (en) * 1999-03-17 2004-03-16 Broadcom Corporation Network switch memory interface configuration
US6499028B1 (en) * 1999-03-31 2002-12-24 International Business Machines Corporation Efficient identification of candidate pages and dynamic response in a NUMA computer
AU5805300A (en) * 1999-06-10 2001-01-02 Pact Informationstechnologie Gmbh Sequence partitioning in cell structures
US6449619B1 (en) 1999-06-23 2002-09-10 Datamirror Corporation Method and apparatus for pipelining the transformation of information between heterogeneous sets of data sources
JP3934290B2 (ja) * 1999-09-30 2007-06-20 株式会社東芝 離散コサイン変換処理装置、逆離散コサイン変換処理装置及び離散コサイン変換処理装置・逆離散コサイン変換処理装置
US6848029B2 (en) * 2000-01-03 2005-01-25 Dirk Coldewey Method and apparatus for prefetching recursive data structures
JP4251748B2 (ja) * 2000-03-09 2009-04-08 コニカミノルタビジネステクノロジーズ株式会社 色変換装置
US7106347B1 (en) * 2000-05-31 2006-09-12 Intel Corporation Transforming pixel data and addresses
US6870523B1 (en) 2000-06-07 2005-03-22 Genoa Color Technologies Device, system and method for electronic true color display
US8058899B2 (en) 2000-10-06 2011-11-15 Martin Vorbach Logic cell array and bus system
US7352488B2 (en) * 2000-12-18 2008-04-01 Genoa Color Technologies Ltd Spectrally matched print proofer
US6510506B2 (en) * 2000-12-28 2003-01-21 Intel Corporation Error detection in cache tag array using valid vector
JP4791637B2 (ja) * 2001-01-22 2011-10-12 キヤノンアネルバ株式会社 Cvd装置とこれを用いた処理方法
US7444531B2 (en) * 2001-03-05 2008-10-28 Pact Xpp Technologies Ag Methods and devices for treating and processing data
US20070299993A1 (en) * 2001-03-05 2007-12-27 Pact Xpp Technologies Ag Method and Device for Treating and Processing Data
US20090300262A1 (en) * 2001-03-05 2009-12-03 Martin Vorbach Methods and devices for treating and/or processing data
US7844796B2 (en) * 2001-03-05 2010-11-30 Martin Vorbach Data processing device and method
US7581076B2 (en) * 2001-03-05 2009-08-25 Pact Xpp Technologies Ag Methods and devices for treating and/or processing data
US9037807B2 (en) * 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US6816276B2 (en) 2001-03-08 2004-11-09 Electronics For Imaging, Inc. Efficiently scheduled multiple raster image processors
US20020184612A1 (en) * 2001-06-01 2002-12-05 Hunt Joseph R. Runtime configurable caching for component factories
US7436996B2 (en) 2001-06-07 2008-10-14 Genoa Color Technologies Ltd Device, system and method of data conversion for wide gamut displays
US7714824B2 (en) * 2001-06-11 2010-05-11 Genoa Color Technologies Ltd. Multi-primary display with spectrally adapted back-illumination
US8289266B2 (en) * 2001-06-11 2012-10-16 Genoa Color Technologies Ltd. Method, device and system for multi-color sequential LCD panel
AU2002304276A1 (en) * 2001-06-11 2002-12-23 Moshe Ben-Chorin Device, system and method for color display
WO2002103532A2 (de) * 2001-06-20 2002-12-27 Pact Xpp Technologies Ag Verfahren zur bearbeitung von daten
US7107464B2 (en) 2001-07-10 2006-09-12 Telecom Italia S.P.A. Virtual private network mechanism incorporating security association processor
WO2003007074A1 (en) * 2001-07-12 2003-01-23 Genoa Technologies Ltd. Sequential projection color display using multiple imaging panels
EP1423839A4 (en) * 2001-07-23 2007-03-07 Genoa Color Technologies Ltd SYSTEM AND METHOD FOR DISPLAYING AN IMAGE
US7996827B2 (en) * 2001-08-16 2011-08-09 Martin Vorbach Method for the translation of programs for reconfigurable architectures
EP1423972A1 (en) * 2001-08-27 2004-06-02 Koninklijke Philips Electronics N.V. Cache method
US7434191B2 (en) * 2001-09-03 2008-10-07 Pact Xpp Technologies Ag Router
US8686475B2 (en) 2001-09-19 2014-04-01 Pact Xpp Technologies Ag Reconfigurable elements
TW594743B (en) * 2001-11-07 2004-06-21 Fujitsu Ltd Memory device and internal control method therefor
JP3916953B2 (ja) * 2001-12-28 2007-05-23 日本テキサス・インスツルメンツ株式会社 可変時分割多重伝送システム
AU2003208563A1 (en) * 2002-01-07 2003-07-24 Moshe Ben-Chorin Electronic color display for soft proofing
WO2003071418A2 (en) * 2002-01-18 2003-08-28 Pact Xpp Technologies Ag Method and device for partitioning large computer programs
EP1483682A2 (de) * 2002-01-19 2004-12-08 PACT XPP Technologies AG Reconfigurierbarer prozessor
US8914590B2 (en) * 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
AU2003223892A1 (en) * 2002-03-21 2003-10-08 Pact Xpp Technologies Ag Method and device for data processing
CN1659620B (zh) * 2002-04-11 2010-04-28 格诺色彩技术有限公司 具有增强的属性的彩色显示装置和方法
JP2005309474A (ja) * 2002-06-28 2005-11-04 Nokia Corp 離散コサイン変換(dct)を実行するために用いるdctプロセッサ
US7086034B2 (en) * 2002-06-28 2006-08-01 Canon Kabushiki Kaisha Method, program, and storage medium for acquiring logs
JP5226931B2 (ja) * 2002-07-24 2013-07-03 三星ディスプレイ株式會社 高輝度広色域ディスプレイ装置および画像生成方法
US20110238948A1 (en) * 2002-08-07 2011-09-29 Martin Vorbach Method and device for coupling a data processing unit and a data processing array
US7657861B2 (en) * 2002-08-07 2010-02-02 Pact Xpp Technologies Ag Method and device for processing data
AU2003286131A1 (en) * 2002-08-07 2004-03-19 Pact Xpp Technologies Ag Method and device for processing data
AU2003289844A1 (en) 2002-09-06 2004-05-13 Pact Xpp Technologies Ag Reconfigurable sequencer structure
GB2395307A (en) * 2002-11-15 2004-05-19 Quadrics Ltd Virtual to physical memory mapping in network interfaces
SG110006A1 (en) * 2002-12-05 2005-04-28 Oki Techno Ct Singapore Pte A method of calculating internal signals for use in a map algorithm
US7191318B2 (en) * 2002-12-12 2007-03-13 Alacritech, Inc. Native copy instruction for file-access processor with copy-rule-based validation
US7093099B2 (en) * 2002-12-12 2006-08-15 Alacritech, Inc. Native lookup instruction for file-access processor searching a three-level lookup cache for variable-length keys
US7254696B2 (en) * 2002-12-12 2007-08-07 Alacritech, Inc. Functional-level instruction-set computer architecture for processing application-layer content-service requests such as file-access requests
US7460252B2 (en) * 2003-01-13 2008-12-02 Axiohm Transaction Solutions, Inc. Graphical printing system and method using text triggers
CN1742304A (zh) 2003-01-28 2006-03-01 皇家飞利浦电子股份有限公司 用于具有多于三种原色的显示器的最佳子像素排列
US7133553B2 (en) * 2003-02-18 2006-11-07 Avago Technologies Sensor Ip Pte. Ltd. Correlation-based color mosaic interpolation adjustment using luminance gradients
US20040164101A1 (en) * 2003-02-20 2004-08-26 Valois Sas Fluid dispenser
US7366352B2 (en) * 2003-03-20 2008-04-29 International Business Machines Corporation Method and apparatus for performing fast closest match in pattern recognition
US7412569B2 (en) * 2003-04-10 2008-08-12 Intel Corporation System and method to track changes in memory
US7149867B2 (en) * 2003-06-18 2006-12-12 Src Computers, Inc. System and method of enhancing efficiency and utilization of memory bandwidth in reconfigurable hardware
WO2006082091A2 (en) * 2005-02-07 2006-08-10 Pact Xpp Technologies Ag Low latency massive parallel data processing device
WO2005013193A2 (en) * 2003-08-04 2005-02-10 Genoa Color Technologies Ltd. Multi-primary color display
EP1676208A2 (en) * 2003-08-28 2006-07-05 PACT XPP Technologies AG Data processing device and method
US20050122532A1 (en) * 2003-12-04 2005-06-09 Realtek Semiconductor Corp. Apparatus for color processing and method thereof
US7162092B2 (en) * 2003-12-11 2007-01-09 Infocus Corporation System and method for processing image data
US7495722B2 (en) 2003-12-15 2009-02-24 Genoa Color Technologies Ltd. Multi-color liquid crystal display
CN103177701A (zh) 2003-12-15 2013-06-26 格诺色彩技术有限公司 多基色液晶显示器
US20050182884A1 (en) * 2004-01-22 2005-08-18 Hofmann Richard G. Multiple address two channel bus structure
US20050188172A1 (en) * 2004-02-20 2005-08-25 Intel Corporation Reduction of address aliasing
US20050196055A1 (en) * 2004-03-04 2005-09-08 Sheng Zhong Method and system for codifying signals that ensure high fidelity reconstruction
US7636489B2 (en) * 2004-04-16 2009-12-22 Apple Inc. Blur computation algorithm
US7278122B2 (en) * 2004-06-24 2007-10-02 Ftl Systems, Inc. Hardware/software design tool and language specification mechanism enabling efficient technology retargeting and optimization
US7167971B2 (en) * 2004-06-30 2007-01-23 International Business Machines Corporation System and method for adaptive run-time reconfiguration for a reconfigurable instruction set co-processor architecture
US7589738B2 (en) * 2004-07-14 2009-09-15 Integrated Device Technology, Inc. Cache memory management system and method
US7126613B2 (en) * 2004-08-05 2006-10-24 Destiny Technology Corporation Graphic processing method
US7151709B2 (en) * 2004-08-16 2006-12-19 Micron Technology, Inc. Memory device and method having programmable address configurations
US20060041609A1 (en) * 2004-08-20 2006-02-23 Pellar Ronald J System and method for multi-dimensional lookup table interpolation
US7340105B2 (en) * 2004-09-10 2008-03-04 Marvell International Technology Ltd. Method and apparatus for image processing
US7957016B2 (en) * 2004-09-20 2011-06-07 Marvell International Technology Ltd. Method and apparatus for image processing
US7340582B2 (en) * 2004-09-30 2008-03-04 Intel Corporation Fault processing for direct memory access address translation
US7508397B1 (en) * 2004-11-10 2009-03-24 Nvidia Corporation Rendering of disjoint and overlapping blits
TW200625097A (en) * 2004-11-17 2006-07-16 Sandbridge Technologies Inc Data file storing multiple date types with controlled data access
US8380686B2 (en) * 2005-03-14 2013-02-19 International Business Machines Corporation Transferring data from a primary data replication appliance in a primary data facility to a secondary data replication appliance in a secondary data facility
US7499060B2 (en) * 2005-03-21 2009-03-03 Microsoft Corporation Robust interactive color editing
CN1882103B (zh) * 2005-04-04 2010-06-23 三星电子株式会社 实现改进的色域对映演算的系统及方法
US7802028B2 (en) * 2005-05-02 2010-09-21 Broadcom Corporation Total dynamic sharing of a transaction queue
US7707387B2 (en) 2005-06-01 2010-04-27 Microsoft Corporation Conditional execution via content addressable memory and parallel computing execution model
US7856523B2 (en) * 2005-06-01 2010-12-21 Microsoft Corporation Random Access Memory (RAM) based Content Addressable Memory (CAM) management
US7793040B2 (en) * 2005-06-01 2010-09-07 Microsoft Corporation Content addressable memory architecture
US7539916B2 (en) 2005-06-28 2009-05-26 Intel Corporation BIST to provide phase interpolator data and associated methods of operation
US7903306B2 (en) * 2005-07-22 2011-03-08 Samsung Electronics Co., Ltd. Sensor image encoding and/or decoding system, medium, and method
WO2007060672A2 (en) * 2005-11-28 2007-05-31 Genoa Color Technologies Ltd. Sub-pixel rendering of a multiprimary image
US7716100B2 (en) * 2005-12-02 2010-05-11 Kuberre Systems, Inc. Methods and systems for computing platform
US8593474B2 (en) * 2005-12-30 2013-11-26 Intel Corporation Method and system for symmetric allocation for a shared L2 mapping cache
US7564466B2 (en) * 2006-01-10 2009-07-21 Kabushiki Kaisha Toshiba System and method for managing memory for color transforms
US20070162531A1 (en) * 2006-01-12 2007-07-12 Bhaskar Kota Flow transform for integrated circuit design and simulation having combined data flow, control flow, and memory flow views
US8250503B2 (en) * 2006-01-18 2012-08-21 Martin Vorbach Hardware definition method including determining whether to implement a function as hardware or software
US7522173B1 (en) * 2006-02-23 2009-04-21 Nvidia Corporation Conversion of data in an sRGB format to a compact floating point format
US7549022B2 (en) * 2006-07-21 2009-06-16 Microsoft Corporation Avoiding cache line sharing in virtual machines
US8189683B2 (en) * 2006-11-28 2012-05-29 General Instrument Corporation Method and system for providing single cycle context weight update leveraging context address look ahead
JP2008140124A (ja) * 2006-12-01 2008-06-19 Matsushita Electric Ind Co Ltd データ処理装置
US7639263B2 (en) * 2007-01-26 2009-12-29 Microsoft Corporation Fast filtered YUV to RGB conversion
US20080215849A1 (en) * 2007-02-27 2008-09-04 Thomas Scott Hash table operations with improved cache utilization
US8213499B2 (en) * 2007-04-04 2012-07-03 General Instrument Corporation Method and apparatus for context address generation for motion vectors and coefficients
US20080247459A1 (en) * 2007-04-04 2008-10-09 General Instrument Corporation Method and System for Providing Content Adaptive Binary Arithmetic Coder Output Bit Counting
US8051124B2 (en) * 2007-07-19 2011-11-01 Itt Manufacturing Enterprises, Inc. High speed and efficient matrix multiplication hardware module
US8819095B2 (en) * 2007-08-28 2014-08-26 Qualcomm Incorporated Fast computation of products by dyadic fractions with sign-symmetric rounding errors
US7996620B2 (en) * 2007-09-05 2011-08-09 International Business Machines Corporation High performance pseudo dynamic 36 bit compare
US7936917B2 (en) * 2007-11-30 2011-05-03 Xerox Corporation Systems and methods for image data encoding and decoding
AU2007249117B2 (en) * 2007-12-19 2010-11-18 Canon Kabushiki Kaisha Variable-length encoding for image data compression
US7979451B2 (en) * 2008-03-19 2011-07-12 International Business Machines Corporation Data manipulation command method and system
US7979470B2 (en) * 2008-03-19 2011-07-12 International Business Machines Corporation Data manipulation process method and system
JP5173547B2 (ja) * 2008-04-15 2013-04-03 キヤノン株式会社 画像復号装置及びその制御方法
DE102008052421A1 (de) * 2008-10-21 2010-04-22 Giesecke & Devrient Gmbh Vorrichtung und Verfahren zum Bedrucken eines Banderolenstreifens
US8335256B2 (en) * 2008-11-14 2012-12-18 General Instrument Corporation Motion compensation in video coding
US8856463B2 (en) * 2008-12-16 2014-10-07 Frank Rau System and method for high performance synchronous DRAM memory controller
US8099555B2 (en) * 2009-01-23 2012-01-17 Konica Minolta Laboratory U.S.A., Inc. Systems and methods for memory management on print devices
JP2010198567A (ja) * 2009-02-27 2010-09-09 Fuji Xerox Co Ltd 情報処理装置及びプログラム
GB2474250B (en) 2009-10-07 2015-05-06 Advanced Risc Mach Ltd Video reference frame retrieval
CN102045559B (zh) * 2009-10-22 2013-03-20 鸿富锦精密工业(深圳)有限公司 视频解码装置及视频解码方法
KR20110070468A (ko) * 2009-12-18 2011-06-24 삼성전자주식회사 인스트루먼테이션 실행 장치 및 방법
US8400678B2 (en) * 2010-04-16 2013-03-19 Xerox Corporation FIFO methods, systems and apparatus for electronically registering image data
CN101888405B (zh) * 2010-06-07 2013-03-06 北京高森明晨信息科技有限公司 一种云计算的文件系统和数据处理方法
US8615645B2 (en) 2010-06-23 2013-12-24 International Business Machines Corporation Controlling the selectively setting of operational parameters for an adapter
US9342352B2 (en) 2010-06-23 2016-05-17 International Business Machines Corporation Guest access to address spaces of adapter
US8626970B2 (en) 2010-06-23 2014-01-07 International Business Machines Corporation Controlling access by a configuration to an adapter function
US8468284B2 (en) 2010-06-23 2013-06-18 International Business Machines Corporation Converting a message signaled interruption into an I/O adapter event notification to a guest operating system
US8621112B2 (en) 2010-06-23 2013-12-31 International Business Machines Corporation Discovery by operating system of information relating to adapter functions accessible to the operating system
US8635430B2 (en) 2010-06-23 2014-01-21 International Business Machines Corporation Translation of input/output addresses to memory addresses
US8650335B2 (en) 2010-06-23 2014-02-11 International Business Machines Corporation Measurement facility for adapter functions
US8572635B2 (en) 2010-06-23 2013-10-29 International Business Machines Corporation Converting a message signaled interruption into an I/O adapter event notification
US8549182B2 (en) * 2010-06-23 2013-10-01 International Business Machines Corporation Store/store block instructions for communicating with adapters
US8566480B2 (en) * 2010-06-23 2013-10-22 International Business Machines Corporation Load instruction for communicating with adapters
US8650337B2 (en) 2010-06-23 2014-02-11 International Business Machines Corporation Runtime determination of translation formats for adapter functions
US8639858B2 (en) 2010-06-23 2014-01-28 International Business Machines Corporation Resizing address spaces concurrent to accessing the address spaces
US8505032B2 (en) 2010-06-23 2013-08-06 International Business Machines Corporation Operating system notification of actions to be taken responsive to adapter events
US8510599B2 (en) 2010-06-23 2013-08-13 International Business Machines Corporation Managing processing associated with hardware events
US9195623B2 (en) 2010-06-23 2015-11-24 International Business Machines Corporation Multiple address spaces per adapter with address translation
US9213661B2 (en) 2010-06-23 2015-12-15 International Business Machines Corporation Enable/disable adapters of a computing environment
US8600159B2 (en) 2010-08-31 2013-12-03 Apple Inc. Color converting images
WO2012108411A1 (ja) * 2011-02-10 2012-08-16 日本電気株式会社 符号化/復号化処理プロセッサ、および無線通信装置
US9378560B2 (en) * 2011-06-17 2016-06-28 Advanced Micro Devices, Inc. Real time on-chip texture decompression using shader processors
US20130027416A1 (en) * 2011-07-25 2013-01-31 Karthikeyan Vaithianathan Gather method and apparatus for media processing accelerators
JP2013161376A (ja) * 2012-02-07 2013-08-19 Hakko Denki Kk プログラマブル表示器、そのプログラム、表示/制御システム
JP5834997B2 (ja) * 2012-02-23 2015-12-24 株式会社ソシオネクスト ベクトルプロセッサ、ベクトルプロセッサの処理方法
US20130311753A1 (en) * 2012-05-19 2013-11-21 Venu Kandadai Method and device (universal multifunction accelerator) for accelerating computations by parallel computations of middle stratum operations
US9075735B2 (en) * 2012-06-21 2015-07-07 Breakingpoint Systems, Inc. Systems and methods for efficient memory access
US9146808B1 (en) * 2013-01-24 2015-09-29 Emulex Corporation Soft error protection for content addressable memory
US9262318B1 (en) * 2013-03-13 2016-02-16 Marvell International Ltd. Serial flash XIP with caching mechanism for fast program execution in embedded systems
EP3014866B1 (en) 2013-06-28 2017-12-27 Hewlett-Packard Development Company, L.P. Color image processing
US10649775B2 (en) * 2013-07-15 2020-05-12 Texas Instruments Incorporated Converting a stream of data using a lookaside buffer
US10423596B2 (en) * 2014-02-11 2019-09-24 International Business Machines Corporation Efficient caching of Huffman dictionaries
CN103885587A (zh) * 2014-02-21 2014-06-25 联想(北京)有限公司 Information processing method and electronic device
US10181098B2 (en) * 2014-06-06 2019-01-15 Google Llc Generating representations of input sequences using neural networks
US9785565B2 (en) * 2014-06-30 2017-10-10 Microunity Systems Engineering, Inc. System and methods for expandably wide processor instructions
US10222989B1 (en) 2015-06-25 2019-03-05 Crossbar, Inc. Multiple-bank memory device with status feedback for subsets of memory banks
US9921763B1 (en) 2015-06-25 2018-03-20 Crossbar, Inc. Multi-bank non-volatile memory apparatus with high-speed bus
US10141034B1 (en) * 2015-06-25 2018-11-27 Crossbar, Inc. Memory apparatus with non-volatile two-terminal memory and expanded, high-speed bus
US10560608B2 (en) * 2016-01-20 2020-02-11 Hewlett-Packard Development Company, L.P. Imaging pipeline processing
US10360177B2 (en) * 2016-06-22 2019-07-23 Ati Technologies Ulc Method and processing apparatus for gating redundant threads
US20170366819A1 (en) * 2016-08-15 2017-12-21 Mediatek Inc. Method And Apparatus Of Single Channel Compression
CN106773954B (zh) * 2016-12-15 2019-05-28 深圳市博巨兴实业发展有限公司 Operating mode control system in a microcontroller chip
US10228937B2 (en) * 2016-12-30 2019-03-12 Intel Corporation Programmable matrix processing engine
JP6821183B2 (ja) 2017-02-23 2021-01-27 株式会社シキノハイテック Image decoding device
US11157287B2 (en) 2017-07-24 2021-10-26 Tesla, Inc. Computational array microprocessor system with variable latency memory access
US10671349B2 (en) * 2017-07-24 2020-06-02 Tesla, Inc. Accelerated mathematical engine
US11157441B2 (en) 2017-07-24 2021-10-26 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
CN111628845B (zh) * 2017-09-01 2022-12-06 惠州市德赛西威汽车电子股份有限公司 Method for improving data transmission efficiency
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US20190265976A1 (en) * 2018-02-23 2019-08-29 Yuly Goryavskiy Additional Channel for Exchanging Useful Information
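JP7139719B2 (ja) * 2018-06-26 2022-09-21 富士通株式会社 Information processing device, arithmetic processing device, and control method for an information processing device
KR102576443B1 (ko) * 2018-10-26 2023-09-07 삼성에스디에스 주식회사 Computing apparatus and job scheduling method thereof
CN112559397A (zh) 2019-09-26 2021-03-26 阿里巴巴集团控股有限公司 Device and method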
US11379261B2 (en) 2019-11-12 2022-07-05 Tata Consultancy Services Limited Systems and methods for automatically creating an image processing pipeline
US20210312325A1 (en) * 2020-04-01 2021-10-07 Samsung Electronics Co., Ltd. Mixed-precision neural processing unit (npu) using spatial fusion with load balancing
US10915836B1 (en) * 2020-07-29 2021-02-09 Guy B. Olney Systems and methods for operating a cognitive automaton
US20220100513A1 (en) * 2020-09-26 2022-03-31 Intel Corporation Apparatuses, methods, and systems for instructions for loading data and padding into a tile of a matrix operations accelerator
TWI753732B (zh) * 2020-12-31 2022-01-21 新唐科技股份有限公司 Counting circuit and operating system
CN117492702B (zh) * 2023-12-29 2024-04-02 成都凯迪飞研科技有限责任公司 Method for converting big-endian and little-endian data streams

Citations (168)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3883847A (en) 1974-03-28 1975-05-13 Bell Telephone Labor Inc Uniform decoding of minimum-redundancy codes
US3971927A (en) 1975-11-03 1976-07-27 The United States Of America As Represented By The Secretary Of The Navy Modular discrete cosine transform system
US4296476A (en) 1979-01-08 1981-10-20 Atari, Inc. Data processing system with programmable graphics generator
US4330833A (en) 1978-05-26 1982-05-18 Vicom Systems, Inc. Method and apparatus for improved digital image processing
US4385363A (en) 1978-12-15 1983-05-24 Compression Labs, Inc. Discrete cosine transformer
US4460958A (en) 1981-01-26 1984-07-17 Rca Corporation Window-scanned memory
EP0115179A2 (en) 1982-12-30 1984-08-08 International Business Machines Corporation Virtual memory address translation mechanism with combined hash address table and inverted page table
US4475174A (en) 1981-09-08 1984-10-02 Nippon Telegraph & Telephone Public Corporation Decoding apparatus for codes represented by code tree
USRE31736E (en) 1977-06-13 1984-11-13 Rockwell International Corporation Reactive computer system adaptive to a plurality of program inputs
US4535320A (en) 1984-06-22 1985-08-13 Digital Recording Research Limited Partnership Method and apparatus for digital Huffman decoding
US4550368A (en) 1982-07-02 1985-10-29 Sun Microsystems, Inc. High-speed memory and memory management system
US4587610A (en) 1984-02-10 1986-05-06 Prime Computer, Inc. Address translation systems for high speed computer memories
US4622545A (en) 1982-09-30 1986-11-11 Apple Computer, Inc. Method and apparatus for image compression and manipulation
US4646061A (en) 1985-03-13 1987-02-24 Racal Data Communications Inc. Data communication with modified Huffman coding
EP0205712A3 (en) 1985-05-31 1987-04-15 Schlumberger Technologies, Inc. Video stream processing system
EP0218287A1 (en) 1985-09-27 1987-04-15 Océ-Nederland B.V. Front-end system
US4680700A (en) 1983-12-07 1987-07-14 International Business Machines Corporation Virtual memory address translation mechanism with combined hash address table and inverted page table
USRE32493E (en) 1980-05-19 1987-09-01 Hitachi, Ltd. Data processing unit with pipelined operands
US4718091A (en) 1984-01-19 1988-01-05 Hitachi, Ltd. Multifunctional image processor
US4718024A (en) 1985-11-05 1988-01-05 Texas Instruments Incorporated Graphics data processing apparatus for graphic image operations upon data of independently selectable pitch
US4720871A (en) 1986-06-13 1988-01-19 Hughes Aircraft Company Digital image convolution processor method and apparatus
US4736440A (en) 1985-06-10 1988-04-05 Commissariat A L'energie Atomique Process for the processing of digitized signals representing an original image
US4754491A (en) 1985-05-03 1988-06-28 Thomson Grand Public Cosine transform computing devices, and image coding devices and decoding devices comprising such computing devices
US4779223A (en) 1985-01-07 1988-10-18 Hitachi, Ltd. Display apparatus having an image memory controller utilizing a barrel shifter and a mask controller preparing data to be written into an image memory
US4780761A (en) 1987-06-02 1988-10-25 Eastman Kodak Company Digital image compression and transmission system visually weighted transform coefficients
US4791598A (en) 1987-03-24 1988-12-13 Bell Communications Research, Inc. Two-dimensional discrete cosine transform processor
EP0154340B1 (fr) 1984-03-09 1988-12-28 Alcatel Cit Processor for computing an inverse discrete cosine transform
EP0154341B1 (fr) 1984-03-09 1988-12-28 Alcatel Cit Processor for computing a discrete cosine transform
US4797850A (en) 1986-05-12 1989-01-10 Advanced Micro Devices, Inc. Dynamic random access memory controller with multiple independent control channels
US4813056A (en) 1987-12-08 1989-03-14 General Electric Company Modified statistical coding of digital signals
EP0086380B1 (en) 1982-02-12 1989-04-05 Hitachi, Ltd. Data processing apparatus for virtual memory system
US4823286A (en) 1987-02-12 1989-04-18 International Business Machines Corporation Pixel data path for high performance raster displays with all-point-addressable frame buffers
US4839826A (en) 1986-04-30 1989-06-13 Kabushiki Kaisha Toshiba Affine conversion apparatus using a raster generator to reduce cycle time
US4853696A (en) 1987-04-13 1989-08-01 University Of Central Florida Code converter for data compression/decompression
US4907182A (en) 1985-09-27 1990-03-06 Elettronica San Giorgio-Elsag S.P.A. System enabling high-speed convolution processing of image data
US4920480A (en) 1987-03-05 1990-04-24 Mitsubishi Denki Kabushiki Kaisha Digital signal processor
US4920426A (en) 1986-11-10 1990-04-24 Kokusai Denshin Denwa Co., Ltd. Image coding system coding digital image signals by forming a histogram of a coefficient signal sequence to estimate an amount of information
US4935821A (en) 1987-08-13 1990-06-19 Ricoh Company, Ltd. Image processing apparatus for multi-media copying machine
US4937774A (en) 1988-11-03 1990-06-26 Harris Corporation East image processing accelerator for real time image processing applications
US4956771A (en) 1988-05-24 1990-09-11 Prime Computer, Inc. Method for inter-processor data transfer
US4965722A (en) 1984-06-11 1990-10-23 Nec Corporation Dynamic memory refresh circuit with a flexible refresh delay dynamic memory
US4975976A (en) 1988-09-20 1990-12-04 Oki Electric Industry Co., Ltd. Image transformation method and device
EP0274376A3 (en) 1987-01-08 1990-12-05 Ezel Inc. Image processing system
US4982343A (en) 1988-10-11 1991-01-01 Next, Inc. Method and apparatus for displaying a plurality of graphic images
US4983958A (en) 1988-01-29 1991-01-08 Intel Corporation Vector selectable coordinate-addressable DRAM array
US4991112A (en) 1987-12-23 1991-02-05 U.S. Philips Corporation Graphics system with graphics controller and DRAM controller
US5025482A (en) 1989-05-24 1991-06-18 Mitsubishi Denki Kabushiki Kaisha Image transformation coding device with adaptive quantization characteristic selection
US5029122A (en) 1988-12-27 1991-07-02 Kabushiki Kaisha Toshiba Discrete cosine transforming apparatus
EP0348703A3 (en) 1988-06-09 1991-07-03 Ezel Inc. Image processing method
US5051840A (en) 1988-12-14 1991-09-24 Fuji Photo Film Co., Ltd. Device for coding a picture signal by compression
US5053985A (en) 1989-10-19 1991-10-01 Zoran Corporation Recycling dct/idct integrated circuit apparatus using a single multiplier/accumulator and a single random access memory
EP0343992A3 (en) 1988-05-25 1991-10-02 Nec Corporation Multiprocessor system
US5060242A (en) 1989-02-24 1991-10-22 General Electric Company Non-destructive lossless image coder
EP0184547B1 (en) 1984-12-07 1991-11-21 Dainippon Screen Mfg. Co., Ltd. Processing method of image data and system therefor
US5109496A (en) 1989-09-27 1992-04-28 International Business Machines Corporation Most recently used address translation system with least recently used (LRU) replacement
US5109333A (en) 1988-04-15 1992-04-28 Hitachi, Ltd. Data transfer control method and apparatus for co-processor system
US5125042A (en) 1989-06-16 1992-06-23 Eastman Kodak Company Digital image interpolator using a plurality of interpolation kernals
US5125085A (en) 1989-09-01 1992-06-23 Bull Hn Information Systems Inc. Least recently used replacement level generating apparatus and method
EP0286183B1 (en) 1987-04-10 1992-07-01 Koninklijke Philips Electronics N.V. Television transmission system using transform coding
US5142380A (en) 1989-10-23 1992-08-25 Ricoh Company, Ltd. Image data processing apparatus
US5163103A (en) 1988-12-27 1992-11-10 Kabushiki Kaisha Toshiba Discrete cosine transforming apparatus
EP0275979B1 (en) 1987-01-20 1992-11-19 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples
US5181183A (en) 1990-01-17 1993-01-19 Nec Corporation Discrete cosine transform circuit suitable for integrated circuit implementation
US5185661A (en) 1991-09-19 1993-02-09 Eastman Kodak Company Input scanner color mapping and input/output color gamut transformation
US5185694A (en) 1989-06-26 1993-02-09 Motorola, Inc. Data processing system utilizes block move instruction for burst transferring blocks of data entries where width of data blocks varies
US5185856A (en) 1990-03-16 1993-02-09 Hewlett-Packard Company Arithmetic and logic processing unit for computer graphics system
US5195050A (en) 1990-08-20 1993-03-16 Eastman Kodak Company Single chip, mode switchable, matrix multiplier and convolver suitable for color image processing
US5197021A (en) 1989-07-13 1993-03-23 Telettra-Telefonia Elettronica E Radio S.P.A. System and circuit for the calculation of the bidimensional discrete transform
US5196946A (en) 1990-03-14 1993-03-23 C-Cube Microsystems System for compression and decompression of video data using discrete cosine transform and coding techniques
US5204830A (en) 1992-02-13 1993-04-20 Industrial Technology Research Institute Fast pipelined matrix multiplier
US5212559A (en) 1989-11-13 1993-05-18 Lasermaster Corporation Duty cycle technique for a non-gray scale anti-aliasing method for laser printers
US5216516A (en) 1990-04-27 1993-06-01 Ricoh Company, Inc. Orthogonal transformation arithmetic unit
US5223926A (en) 1991-01-11 1993-06-29 Sony Broadcast & Communications Limited Compression of video signals
US5227789A (en) 1991-09-30 1993-07-13 Eastman Kodak Company Modified huffman encode/decode system with simplified decoding for imaging systems
US5233348A (en) 1992-03-26 1993-08-03 General Instrument Corporation Variable length code word decoder for use in digital communication systems
US5237655A (en) 1990-07-05 1993-08-17 Eastman Kodak Company Raster image processor for all points addressable printer
US5241222A (en) 1991-12-20 1993-08-31 Eastman Kodak Company Dram interface adapter circuit
US5243414A (en) 1991-07-29 1993-09-07 Tektronix, Inc. Color processing system
US5249146A (en) 1991-03-27 1993-09-28 Mitsubishi Denki Kabushiki Kaisha Dct/idct processor and data processing method
US5253078A (en) 1990-03-14 1993-10-12 C-Cube Microsystems, Inc. System for compression and decompression of video data using discrete cosine transform and coding techniques
US5253053A (en) 1990-12-31 1993-10-12 Apple Computer, Inc. Variable length decoding using lookup tables
US5254991A (en) 1991-07-30 1993-10-19 Lsi Logic Corporation Method and apparatus for decoding Huffman codes
US5258941A (en) 1991-12-13 1993-11-02 Edward Newberger Apparatus for utilizing a discrete fourier transformer to implement a discrete cosine transformer
US5262968A (en) 1992-06-25 1993-11-16 The United States Of America As Represented By The Secretary Of The Air Force High performance architecture for image processing
US5268769A (en) 1990-08-01 1993-12-07 Hitachi, Ltd. Image signal decoding system for decoding modified Huffman codes at high speeds
EP0335990B1 (en) 1988-04-02 1993-12-08 International Business Machines Corporation Processor-processor synchronization
US5270832A (en) 1990-03-14 1993-12-14 C-Cube Microsystems System for compression and decompression of video data using discrete cosine transform and coding techniques
US5283866A (en) 1987-07-09 1994-02-01 Ezel, Inc. Image processing system
EP0523764A3 1991-06-24 1994-02-16 Philips Nv
EP0311034B1 (en) 1987-10-07 1994-03-16 Hitachi, Ltd. Cache memory control apparatus for a virtual memory data-processing system
US5299027A (en) 1990-11-30 1994-03-29 Hitachi, Ltd. Method and appratus for decoding and printing coded image, and facsimile apparatus, filing apparatus and communication apparatus using the same
US5303349A (en) 1990-06-06 1994-04-12 Valitek, Inc. Interface for establishing a number of consecutive time frames of bidirectional command and data block communication between a Host's standard parallel port and a peripheral device
US5303058A (en) 1990-10-22 1994-04-12 Fujitsu Limited Data processing apparatus for compressing and reconstructing image data
US5307451A (en) 1992-05-12 1994-04-26 Apple Computer, Inc. Method and apparatus for generating and manipulating graphical data for display on a computer output device
US5313577A (en) 1991-08-21 1994-05-17 Digital Equipment Corporation Translation of virtual addresses in a computer graphics system
US5317717A (en) 1987-07-01 1994-05-31 Digital Equipment Corp. Apparatus and method for main memory unit protection using access and fault logic signals
EP0600112A1 (de) 1992-11-30 1994-06-08 Siemens Nixdorf Informationssysteme Aktiengesellschaft Data processing system with virtual memory addressing and key-controlled memory access
EP0272705B1 (en) 1986-12-29 1994-06-08 Matsushita Electric Industrial Co., Ltd. Loosely coupled pipeline processor
US5321806A (en) 1991-08-21 1994-06-14 Digital Equipment Corporation Method and apparatus for transmitting graphics command in a computer graphics system
US5325215A (en) 1990-12-26 1994-06-28 Hitachi, Ltd. Matrix multiplier and picture transforming coder using the same
US5325092A (en) 1992-07-07 1994-06-28 Ricoh Company, Ltd. Huffman decoder architecture for high speed operation and reduced memory
US5333297A (en) 1989-11-09 1994-07-26 International Business Machines Corporation Multiprocessor system having multiple classes of instructions for purposes of mutual interruptibility
US5337319A (en) 1990-10-10 1994-08-09 Fuji Xerox Co., Ltd. Apparatus and method for reconfiguring an image processing system to bypass hardware
US5341318A (en) 1990-03-14 1994-08-23 C-Cube Microsystems, Inc. System for compression and decompression of video data using discrete cosine transform and coding techniques
US5349348A (en) 1991-08-15 1994-09-20 International Business Machines Corporation Multi-mode data stream generator
US5349651A (en) 1989-02-03 1994-09-20 Digital Equipment Corporation System for translation of virtual to physical addresses by operating memory management processor for calculating location of physical address in memory concurrently with cache comparing virtual addresses for translation
US5351067A (en) 1991-07-22 1994-09-27 International Business Machines Corporation Multi-source image real time mixing and anti-aliasing
EP0612007A3 (en) 1993-02-15 1994-10-12 Tokyo Electric Co Ltd Parallel interface and data transmission system for pushers with this interface.
EP0623799A1 (de) 1993-04-03 1994-11-09 SECOTRON ELEKTROGERÄTEBAU GmbH Interactive video system
EP0626661A1 (en) 1993-05-24 1994-11-30 Societe D'applications Generales D'electricite Et De Mecanique Sagem Digital image processing circuitry
US5379394A (en) 1989-07-13 1995-01-03 Kabushiki Kaisha Toshiba Microprocessor with two groups of internal buses
US5388216A (en) 1989-08-17 1995-02-07 Samsung Electronics Co., Ltd. Circuit for controlling generation of an acknowledge signal and a busy signal in a centronics compatible parallel interface
US5392038A (en) 1992-07-13 1995-02-21 Sony United Kingdom Ltd. Serial data decoding for variable length code words
US5394515A (en) 1991-07-08 1995-02-28 Seiko Epson Corporation Page printer controller including a single chip superscalar microprocessor with graphics functional units
US5414666A (en) 1991-07-31 1995-05-09 Ezel Inc. Memory control device
EP0335306B1 (en) 1988-03-30 1995-05-24 Alcatel N.V. Method and device for obtaining in real time the two-dimensional discrete cosine transform
US5424733A (en) 1993-02-17 1995-06-13 Zenith Electronics Corp. Parallel path variable length decoding for video signals
US5428356A (en) 1992-09-24 1995-06-27 Sony Corporation Variable length code decoder utilizing a predetermined prioritized decoding arrangement
EP0660247A1 (en) 1993-10-27 1995-06-28 Winbond Electronics Corporation Method and apparatus for performing discrete cosine transform and its inverse
US5436734A (en) 1991-11-18 1995-07-25 Fuji Xerox Co., Ltd. Image-edit processing apparatus
US5440404A (en) 1993-01-18 1995-08-08 Matsushita Electric Industrial Co., Ltd. Image signal compression apparatus and method using variable length encoding
US5446854A (en) 1993-10-20 1995-08-29 Sun Microsystems, Inc. Virtual memory computer apparatus and address translation mechanism employing hashing scheme and page frame descriptor that support multiple page sizes
US5450557A (en) 1989-11-07 1995-09-12 Loral Aerospace Corp. Single-chip self-configurable parallel processor
US5453786A (en) 1990-07-30 1995-09-26 Mpr Teltech Ltd. Method and apparatus for image data processing
US5467088A (en) 1992-10-13 1995-11-14 Nec Corporation Huffman code decoding circuit
US5479527A (en) 1993-12-08 1995-12-26 Industrial Technology Research Inst. Variable length coding system
US5481487A (en) 1994-01-28 1996-01-02 Industrial Technology Research Institute Transpose memory for DCT/IDCT circuit
US5483475A (en) 1993-09-15 1996-01-09 Industrial Technology Research Institute Fast pipelined 2-D discrete cosine transform architecture
US5485589A (en) 1991-12-31 1996-01-16 Dell Usa, L.P. Predictive addressing architecture
US5485557A (en) 1985-12-13 1996-01-16 Canon Kabushiki Kaisha Image processing apparatus
US5485568A (en) 1993-10-08 1996-01-16 Xerox Corporation Structured image (SI) format for describing complex color raster images
EP0655712A3 (en) 1993-11-29 1996-01-17 Canon Kk Image processing method and apparatus.
US5502824A (en) 1992-12-28 1996-03-26 Ncr Corporation Peripheral component interconnect "always on" protocol
US5502804A (en) 1990-08-08 1996-03-26 Peerless Systems Corporation Method and apparatus for displaying a page with graphics information on a continuous synchronous raster output device
US5504842A (en) 1992-11-10 1996-04-02 Adobe Systems, Inc. Method and apparatus for processing data for a visual-output device with reduced buffer memory requirements
US5509137A (en) 1991-01-08 1996-04-16 Mitsubishi Denki Kabushiki Kaisha Store processing method in a pipelined cache memory
US5509115A (en) 1990-08-08 1996-04-16 Peerless Systems Corporation Method and apparatus for displaying a page with graphics information on a continuous synchronous raster output device
EP0383678B1 (en) 1989-02-14 1996-04-24 Fujitsu Limited Method and system for writing and reading coded data
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5515296A (en) 1993-11-24 1996-05-07 Intel Corporation Scan path for encoding and decoding two-dimensional signals
US5528238A (en) 1993-11-24 1996-06-18 Intel Corporation Process, apparatus and system for decoding variable-length encoded signals
US5528628A (en) 1994-11-26 1996-06-18 Samsung Electronics Co., Ltd. Apparatus for variable-length coding and variable-length-decoding using a plurality of Huffman coding tables
US5528764A (en) 1992-12-24 1996-06-18 Ncr Corporation Bus system with cache snooping signals having a turnaround time between agents driving the bus for keeping the bus from floating for an extended period
US5530944A (en) 1991-02-27 1996-06-25 Vlsi Technology, Inc. Intelligent programmable dram interface timing controller
US5530823A (en) 1992-05-12 1996-06-25 Unisys Corporation Hit enhancement circuit for page-table-look-aside-buffer
US5535291A (en) 1994-02-18 1996-07-09 Martin Marietta Corporation Superresolution image enhancement for a SIMD array processor
US5544342A (en) 1993-06-30 1996-08-06 International Business Machines Corporation System and method for prefetching information in a processing system
US5557733A (en) 1993-04-02 1996-09-17 Vlsi Technology, Inc. Caching FIFO and method therefor
US5561690A (en) * 1993-11-29 1996-10-01 Daewoo Electronics Co., Ltd. High speed variable length code decoding apparatus
US5561772A (en) 1993-02-10 1996-10-01 Elonex Technologies, Inc. Expansion bus system for replicating an internal bus as an external bus with logical interrupts replacing physical interrupt lines
US5561761A (en) 1993-03-31 1996-10-01 Vlsi Technology, Inc. Central processing unit data entering and interrogating device and method therefor
US5570432A (en) 1992-08-04 1996-10-29 Matsushita Electric Industrial Co., Ltd. Image controller
EP0714166A3 (en) 1994-11-21 1997-04-16 Sican Gmbh Method and circuit for reading code words of varying lengths from a memory for fixed-language code words
US5625355A (en) * 1994-01-28 1997-04-29 Matsushita Electric Industrial Co., Ltd. Apparatus and method for decoding variable-length code
US5652583A (en) * 1995-06-30 1997-07-29 Daewoo Electronics Co. Ltd Apparatus for encoding variable-length codes and segmenting variable-length codewords thereof
EP0486154B1 (en) 1990-11-13 1997-07-30 International Computers Limited Method of operating a virtual memory system
US5686915A (en) * 1995-12-27 1997-11-11 Xerox Corporation Interleaved Huffman encoding and decoding method
EP0472961B1 (en) 1990-08-31 1997-11-26 Samsung Electronics Co., Ltd. Coding method for increasing compressing efficiency of data in transmitting or storing picture signals
EP0674266A3 (en) 1994-03-24 1997-12-03 Discovision Associates Method and apparatus for interfacing with ram
US5736946A (en) * 1995-08-31 1998-04-07 Daewoo Electronics Co., Ltd. High speed apparatus and method for decoding variable length code
EP0380720B1 (en) 1989-01-30 1998-08-12 Yozan Inc. Image processing method
EP0692913A3 (en) 1994-07-13 1998-10-07 Matsushita Electric Industrial Co., Ltd. Digital coding/decoding apparatus using variable length codes
EP0708563A3 (en) 1994-10-19 1999-05-06 Matsushita Electric Industrial Co., Ltd. Image encoding/decoding device
EP0535749B1 (en) 1991-10-04 1999-07-07 D2B Systems Co. Ltd. Local communication bus system
EP0655854B1 (en) 1993-11-30 1999-07-21 Canon Kabushiki Kaisha Image communicating apparatus
EP0535893B1 (en) 1991-09-30 1999-11-17 Sony Corporation Transform processing apparatus and method and medium for storing compressed digital signals
EP0675632B1 (en) 1994-03-30 2000-02-09 Canon Kabushiki Kaisha Image recording apparatus and method therefor
EP0588726B1 (en) 1992-09-17 2001-02-28 Sony Corporation Discrete cosine transformation system and method and inverse discrete cosine transformation system and method, having simple structure and operable at high speed

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4547849A (en) * 1981-12-09 1985-10-15 Glenn Louie Interface between a microprocessor and a coprocessor
US4829465A (en) * 1986-06-19 1989-05-09 American Telephone And Telegraph Company, At&T Bell Laboratories High speed cosine transform
KR0136594B1 (ko) * 1988-09-30 1998-10-01 미다 가쓰시게 Single-chip microcomputer
US5261064A (en) 1989-10-03 1993-11-09 Advanced Micro Devices, Inc. Burst access memory
US5345577A (en) 1989-10-13 1994-09-06 Chips & Technologies, Inc. Dram refresh controller with improved bus arbitration scheme
US5175863A (en) * 1989-10-23 1992-12-29 International Business Machines Corporation Signal data processing system having independently, simultaneously operable alu and macu
US5197140A (en) * 1989-11-17 1993-03-23 Texas Instruments Incorporated Sliced addressing multi-processor and method of operation
US5276798A (en) * 1990-09-14 1994-01-04 Hughes Aircraft Company Multifunction high performance graphics rendering processor
DE4102340A1 (de) 1991-01-26 1992-07-30 Bayer Ag Optical fibers and method for their production
JPH06163851A (ja) 1991-06-07 1994-06-10 Texas Instr Japan Ltd Semiconductor device and method of manufacturing the same
GB9118312D0 (en) * 1991-08-24 1991-10-09 Motorola Inc Real time cache implemented by dual purpose on-chip memory
US5452101A (en) * 1991-10-24 1995-09-19 Intel Corporation Apparatus and method for decoding fixed and variable length encoded data
US5786908A (en) * 1992-01-15 1998-07-28 E. I. Du Pont De Nemours And Company Method and apparatus for converting image color values from a first to a second color space
EP0566184A3 (en) * 1992-04-13 1994-11-17 Philips Electronics Nv Image transformer as well as television system comprising a transmitter and a receiver provided with a transformer.
US5450532A (en) * 1993-04-29 1995-09-12 Hewlett-Packard Company Cache memory system for a color ink jet printer
KR960010199B1 (ko) * 1993-07-16 1996-07-26 배순훈 Digital signal processing chip control apparatus
US5583803A (en) * 1993-12-27 1996-12-10 Matsushita Electric Industrial Co., Ltd. Two-dimensional orthogonal transform processor
US5550765A (en) * 1994-05-13 1996-08-27 Lucent Technologies Inc. Method and apparatus for transforming a multi-dimensional matrix of coefficents representative of a signal
JPH0856292A (ja) * 1994-08-12 1996-02-27 Fuji Xerox Co Ltd Image processing apparatus
AU3412295A (en) 1994-09-01 1996-03-22 Gary L. Mcalpine A multi-port memory system including read and write buffer interfaces
US5825676A (en) * 1994-09-30 1998-10-20 Canon Kabushiki Kaisha Orthogonal converting apparatus
JPH08205027A (ja) * 1995-01-31 1996-08-09 Sony Corp Multipoint interpolation circuit
GB2301203B (en) * 1995-03-18 2000-01-12 United Microelectronics Corp Real time two dimensional discrete cosine transform/inverse discrete cosine transform circuit
US5737756A (en) * 1995-04-28 1998-04-07 Unisys Corporation Dual bus computer network using dual busses with dual spy modules enabling clearing of invalidation queue for processor with store through cache while providing retry cycles for incomplete accesses to invalidation queue
US5694141A (en) * 1995-06-07 1997-12-02 Seiko Epson Corporation Computer system with double simultaneous displays showing differing display images
GB2302743B (en) * 1995-06-26 2000-02-16 Sony Uk Ltd Processing apparatus
US5701263A (en) * 1995-08-28 1997-12-23 Hyundai Electronics America Inverse discrete cosine transform processor for VLSI implementation
US5710905A (en) * 1995-12-21 1998-01-20 Cypress Semiconductor Corp. Cache controller for a non-symetric cache system
US5999958A (en) * 1996-04-24 1999-12-07 National Science Council Device for computing discrete cosine transform and inverse discrete cosine transform
US5818364A (en) * 1996-06-19 1998-10-06 Hewlett-Packard Company High bit-rate huffman decoding
US5895487A (en) * 1996-11-13 1999-04-20 International Business Machines Corporation Integrated processing and L2 DRAM cache
US5860158A (en) * 1996-11-15 1999-01-12 Samsung Electronics Company, Ltd. Cache control unit with a cache request transaction-oriented protocol
US6043804A (en) * 1997-03-21 2000-03-28 Alliance Semiconductor Corp. Color pixel format conversion incorporating color look-up table and post look-up arithmetic operation

Patent Citations (181)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3883847A (en) 1974-03-28 1975-05-13 Bell Telephone Labor Inc Uniform decoding of minimum-redundancy codes
US3971927A (en) 1975-11-03 1976-07-27 The United States Of America As Represented By The Secretary Of The Navy Modular discrete cosine transform system
USRE31736E (en) 1977-06-13 1984-11-13 Rockwell International Corporation Reactive computer system adaptive to a plurality of program inputs
US4330833A (en) 1978-05-26 1982-05-18 Vicom Systems, Inc. Method and apparatus for improved digital image processing
US4385363A (en) 1978-12-15 1983-05-24 Compression Labs, Inc. Discrete cosine transformer
US4296476A (en) 1979-01-08 1981-10-20 Atari, Inc. Data processing system with programmable graphics generator
USRE32493E (en) 1980-05-19 1987-09-01 Hitachi, Ltd. Data processing unit with pipelined operands
US4460958A (en) 1981-01-26 1984-07-17 Rca Corporation Window-scanned memory
US4475174A (en) 1981-09-08 1984-10-02 Nippon Telegraph & Telephone Public Corporation Decoding apparatus for codes represented by code tree
EP0086380B1 (en) 1982-02-12 1989-04-05 Hitachi, Ltd. Data processing apparatus for virtual memory system
US4550368A (en) 1982-07-02 1985-10-29 Sun Microsystems, Inc. High-speed memory and memory management system
US4622545A (en) 1982-09-30 1986-11-11 Apple Computer, Inc. Method and apparatus for image compression and manipulation
EP0115179A2 (en) 1982-12-30 1984-08-08 International Business Machines Corporation Virtual memory address translation mechanism with combined hash address table and inverted page table
US4680700A (en) 1983-12-07 1987-07-14 International Business Machines Corporation Virtual memory address translation mechanism with combined hash address table and inverted page table
US4718091A (en) 1984-01-19 1988-01-05 Hitachi, Ltd. Multifunctional image processor
EP0150060B1 (en) 1984-01-19 1990-09-19 Hitachi, Ltd. Multifunctional image processor
US4587610A (en) 1984-02-10 1986-05-06 Prime Computer, Inc. Address translation systems for high speed computer memories
EP0154341B1 (fr) 1984-03-09 1988-12-28 Alcatel Cit Processor for computing a discrete cosine transform
EP0154340B1 (fr) 1984-03-09 1988-12-28 Alcatel Cit Processor for computing an inverse discrete cosine transform
US4965722A (en) 1984-06-11 1990-10-23 Nec Corporation Dynamic memory refresh circuit with a flexible refresh delay dynamic memory
US4535320A (en) 1984-06-22 1985-08-13 Digital Recording Research Limited Partnership Method and apparatus for digital Huffman decoding
EP0184547B1 (en) 1984-12-07 1991-11-21 Dainippon Screen Mfg. Co., Ltd. Processing method of image data and system therefor
US4779223A (en) 1985-01-07 1988-10-18 Hitachi, Ltd. Display apparatus having an image memory controller utilizing a barrel shifter and a mask controller preparing data to be written into an image memory
US4700175A (en) 1985-03-13 1987-10-13 Racal Data Communications Inc. Data communication with modified Huffman coding
US4646061A (en) 1985-03-13 1987-02-24 Racal Data Communications Inc. Data communication with modified Huffman coding
US4754491A (en) 1985-05-03 1988-06-28 Thomson Grand Public Cosine transform computing devices, and image coding devices and decoding devices comprising such computing devices
EP0205712A3 (en) 1985-05-31 1987-04-15 Schlumberger Technologies, Inc. Video stream processing system
US4736440A (en) 1985-06-10 1988-04-05 Commissariat A L'energie Atomique Process for the processing of digitized signals representing an original image
EP0206892B1 (fr) 1985-06-10 1991-12-11 Commissariat A L'energie Atomique Process for the processing of digitized signals representing an original image
EP0218287A1 (en) 1985-09-27 1987-04-15 Océ-Nederland B.V. Front-end system
US4907182A (en) 1985-09-27 1990-03-06 Elettronica San Giorgio-Elsag S.P.A. System enabling high-speed convolution processing of image data
US4718024A (en) 1985-11-05 1988-01-05 Texas Instruments Incorporated Graphics data processing apparatus for graphic image operations upon data of independently selectable pitch
US5485557A (en) 1985-12-13 1996-01-16 Canon Kabushiki Kaisha Image processing apparatus
US4839826A (en) 1986-04-30 1989-06-13 Kabushiki Kaisha Toshiba Affine conversion apparatus using a raster generator to reduce cycle time
US4797850A (en) 1986-05-12 1989-01-10 Advanced Micro Devices, Inc. Dynamic random access memory controller with multiple independent control channels
US4720871A (en) 1986-06-13 1988-01-19 Hughes Aircraft Company Digital image convolution processor method and apparatus
US4920426A (en) 1986-11-10 1990-04-24 Kokusai Denshin Denwa Co., Ltd. Image coding system coding digital image signals by forming a histogram of a coefficient signal sequence to estimate an amount of information
EP0272705B1 (en) 1986-12-29 1994-06-08 Matsushita Electric Industrial Co., Ltd. Loosely coupled pipeline processor
EP0274376A3 (en) 1987-01-08 1990-12-05 Ezel Inc. Image processing system
EP0275979B1 (en) 1987-01-20 1992-11-19 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Circuit for computing the quantized coefficient discrete cosine transform of digital signal samples
US4823286A (en) 1987-02-12 1989-04-18 International Business Machines Corporation Pixel data path for high performance raster displays with all-point-addressable frame buffers
USRE34850E (en) 1987-03-05 1995-02-07 Mitsubishi Denki Kabushiki Kaisha Digital signal processor
US4920480A (en) 1987-03-05 1990-04-24 Mitsubishi Denki Kabushiki Kaisha Digital signal processor
US4791598A (en) 1987-03-24 1988-12-13 Bell Communications Research, Inc. Two-dimensional discrete cosine transform processor
EP0286183B1 (en) 1987-04-10 1992-07-01 Koninklijke Philips Electronics N.V. Television transmission system using transform coding
US4853696A (en) 1987-04-13 1989-08-01 University Of Central Florida Code converter for data compression/decompression
US4780761A (en) 1987-06-02 1988-10-25 Eastman Kodak Company Digital image compression and transmission system visually weighted transform coefficients
US5317717A (en) 1987-07-01 1994-05-31 Digital Equipment Corp. Apparatus and method for main memory unit protection using access and fault logic signals
US5283866A (en) 1987-07-09 1994-02-01 Ezel, Inc. Image processing system
US4935821A (en) 1987-08-13 1990-06-19 Ricoh Company, Ltd. Image processing apparatus for multi-media copying machine
EP0311034B1 (en) 1987-10-07 1994-03-16 Hitachi, Ltd. Cache memory control apparatus for a virtual memory data-processing system
US4813056A (en) 1987-12-08 1989-03-14 General Electric Company Modified statistical coding of digital signals
US4991112A (en) 1987-12-23 1991-02-05 U.S. Philips Corporation Graphics system with graphics controller and DRAM controller
US4983958A (en) 1988-01-29 1991-01-08 Intel Corporation Vector selectable coordinate-addressable DRAM array
EP0335306B1 (en) 1988-03-30 1995-05-24 Alcatel N.V. Method and device for obtaining in real time the two-dimensional discrete cosine transform
EP0335990B1 (en) 1988-04-02 1993-12-08 International Business Machines Corporation Processor-processor synchronization
US5109333A (en) 1988-04-15 1992-04-28 Hitachi, Ltd. Data transfer control method and apparatus for co-processor system
US4956771A (en) 1988-05-24 1990-09-11 Prime Computer, Inc. Method for inter-processor data transfer
EP0343992A3 (en) 1988-05-25 1991-10-02 Nec Corporation Multiprocessor system
EP0348703A3 (en) 1988-06-09 1991-07-03 Ezel Inc. Image processing method
EP0360155B1 (en) 1988-09-20 1996-01-10 Oki Electric Industry Co., Ltd. Image transformation method and device
US4975976A (en) 1988-09-20 1990-12-04 Oki Electric Industry Co., Ltd. Image transformation method and device
US4982343A (en) 1988-10-11 1991-01-01 Next, Inc. Method and apparatus for displaying a plurality of graphic images
US4937774A (en) 1988-11-03 1990-06-26 Harris Corporation Fast image processing accelerator for real time image processing applications
US5051840A (en) 1988-12-14 1991-09-24 Fuji Photo Film Co., Ltd. Device for coding a picture signal by compression
US5163103A (en) 1988-12-27 1992-11-10 Kabushiki Kaisha Toshiba Discrete cosine transforming apparatus
US5029122A (en) 1988-12-27 1991-07-02 Kabushiki Kaisha Toshiba Discrete cosine transforming apparatus
EP0380720B1 (en) 1989-01-30 1998-08-12 Yozan Inc. Image processing method
US5349651A (en) 1989-02-03 1994-09-20 Digital Equipment Corporation System for translation of virtual to physical addresses by operating memory management processor for calculating location of physical address in memory concurrently with cache comparing virtual addresses for translation
EP0383678B1 (en) 1989-02-14 1996-04-24 Fujitsu Limited Method and system for writing and reading coded data
US5060242A (en) 1989-02-24 1991-10-22 General Electric Company Non-destructive lossless image coder
US5025482A (en) 1989-05-24 1991-06-18 Mitsubishi Denki Kabushiki Kaisha Image transformation coding device with adaptive quantization characteristic selection
US5125042A (en) 1989-06-16 1992-06-23 Eastman Kodak Company Digital image interpolator using a plurality of interpolation kernals
US5185694A (en) 1989-06-26 1993-02-09 Motorola, Inc. Data processing system utilizes block move instruction for burst transferring blocks of data entries where width of data blocks varies
US5197021A (en) 1989-07-13 1993-03-23 Telettra-Telefonia Elettronica E Radio S.P.A. System and circuit for the calculation of the bidimensional discrete transform
US5379394A (en) 1989-07-13 1995-01-03 Kabushiki Kaisha Toshiba Microprocessor with two groups of internal buses
US5388216A (en) 1989-08-17 1995-02-07 Samsung Electronics Co., Ltd. Circuit for controlling generation of an acknowledge signal and a busy signal in a centronics compatible parallel interface
US5125085A (en) 1989-09-01 1992-06-23 Bull Hn Information Systems Inc. Least recently used replacement level generating apparatus and method
US5109496A (en) 1989-09-27 1992-04-28 International Business Machines Corporation Most recently used address translation system with least recently used (LRU) replacement
US5053985A (en) 1989-10-19 1991-10-01 Zoran Corporation Recycling dct/idct integrated circuit apparatus using a single multiplier/accumulator and a single random access memory
US5142380A (en) 1989-10-23 1992-08-25 Ricoh Company, Ltd. Image data processing apparatus
US5450557A (en) 1989-11-07 1995-09-12 Loral Aerospace Corp. Single-chip self-configurable parallel processor
US5333297A (en) 1989-11-09 1994-07-26 International Business Machines Corporation Multiprocessor system having multiple classes of instructions for purposes of mutual interruptibility
US5212559A (en) 1989-11-13 1993-05-18 Lasermaster Corporation Duty cycle technique for a non-gray scale anti-aliasing method for laser printers
US5181183A (en) 1990-01-17 1993-01-19 Nec Corporation Discrete cosine transform circuit suitable for integrated circuit implementation
US5270832A (en) 1990-03-14 1993-12-14 C-Cube Microsystems System for compression and decompression of video data using discrete cosine transform and coding techniques
US5196946A (en) 1990-03-14 1993-03-23 C-Cube Microsystems System for compression and decompression of video data using discrete cosine transform and coding techniques
US5341318A (en) 1990-03-14 1994-08-23 C-Cube Microsystems, Inc. System for compression and decompression of video data using discrete cosine transform and coding techniques
US5253078A (en) 1990-03-14 1993-10-12 C-Cube Microsystems, Inc. System for compression and decompression of video data using discrete cosine transform and coding techniques
US5185856A (en) 1990-03-16 1993-02-09 Hewlett-Packard Company Arithmetic and logic processing unit for computer graphics system
US5216516A (en) 1990-04-27 1993-06-01 Ricoh Company, Inc. Orthogonal transformation arithmetic unit
US5303349A (en) 1990-06-06 1994-04-12 Valitek, Inc. Interface for establishing a number of consecutive time frames of bidirectional command and data block communication between a Host's standard parallel port and a peripheral device
US5237655A (en) 1990-07-05 1993-08-17 Eastman Kodak Company Raster image processor for all points addressable printer
US5453786A (en) 1990-07-30 1995-09-26 Mpr Teltech Ltd. Method and apparatus for image data processing
US5268769A (en) 1990-08-01 1993-12-07 Hitachi, Ltd. Image signal decoding system for decoding modified Huffman codes at high speeds
US5502804A (en) 1990-08-08 1996-03-26 Peerless Systems Corporation Method and apparatus for displaying a page with graphics information on a continuous synchronous raster output device
US5509115A (en) 1990-08-08 1996-04-16 Peerless Systems Corporation Method and apparatus for displaying a page with graphics information on a continuous synchronous raster output device
US5195050A (en) 1990-08-20 1993-03-16 Eastman Kodak Company Single chip, mode switchable, matrix multiplier and convolver suitable for color image processing
EP0472961B1 (en) 1990-08-31 1997-11-26 Samsung Electronics Co., Ltd. Coding method for increasing compressing efficiency of data in transmitting or storing picture signals
US5337319A (en) 1990-10-10 1994-08-09 Fuji Xerox Co., Ltd. Apparatus and method for reconfiguring an image processing system to bypass hardware
US5303058A (en) 1990-10-22 1994-04-12 Fujitsu Limited Data processing apparatus for compressing and reconstructing image data
EP0482864B1 (en) 1990-10-22 1997-09-24 Fujitsu Limited An image data processing apparatus
EP0486154B1 (en) 1990-11-13 1997-07-30 International Computers Limited Method of operating a virtual memory system
US5299027A (en) 1990-11-30 1994-03-29 Hitachi, Ltd. Method and appratus for decoding and printing coded image, and facsimile apparatus, filing apparatus and communication apparatus using the same
US5325215A (en) 1990-12-26 1994-06-28 Hitachi, Ltd. Matrix multiplier and picture transforming coder using the same
US5253053A (en) 1990-12-31 1993-10-12 Apple Computer, Inc. Variable length decoding using lookup tables
US5509137A (en) 1991-01-08 1996-04-16 Mitsubishi Denki Kabushiki Kaisha Store processing method in a pipelined cache memory
US5223926A (en) 1991-01-11 1993-06-29 Sony Broadcast & Communications Limited Compression of video signals
US5530944A (en) 1991-02-27 1996-06-25 Vlsi Technology, Inc. Intelligent programmable dram interface timing controller
EP0506111B1 (en) 1991-03-27 2000-04-12 Mitsubishi Denki Kabushiki Kaisha DCT/IDCT processor and data processing method
US5249146A (en) 1991-03-27 1993-09-28 Mitsubishi Denki Kabushiki Kaisha Dct/idct processor and data processing method
EP0523764A3 1991-06-24 1994-02-16 Philips Nv
US5394515A (en) 1991-07-08 1995-02-28 Seiko Epson Corporation Page printer controller including a single chip superscalar microprocessor with graphics functional units
US5351067A (en) 1991-07-22 1994-09-27 International Business Machines Corporation Multi-source image real time mixing and anti-aliasing
US5243414A (en) 1991-07-29 1993-09-07 Tektronix, Inc. Color processing system
US5254991A (en) 1991-07-30 1993-10-19 Lsi Logic Corporation Method and apparatus for decoding Huffman codes
US5414666A (en) 1991-07-31 1995-05-09 Ezel Inc. Memory control device
US5349348A (en) 1991-08-15 1994-09-20 International Business Machines Corporation Multi-mode data stream generator
US5313577A (en) 1991-08-21 1994-05-17 Digital Equipment Corporation Translation of virtual addresses in a computer graphics system
US5321806A (en) 1991-08-21 1994-06-14 Digital Equipment Corporation Method and apparatus for transmitting graphics command in a computer graphics system
US5185661A (en) 1991-09-19 1993-02-09 Eastman Kodak Company Input scanner color mapping and input/output color gamut transformation
EP0535893B1 (en) 1991-09-30 1999-11-17 Sony Corporation Transform processing apparatus and method and medium for storing compressed digital signals
US5227789A (en) 1991-09-30 1993-07-13 Eastman Kodak Company Modified huffman encode/decode system with simplified decoding for imaging systems
EP0535749B1 (en) 1991-10-04 1999-07-07 D2B Systems Co. Ltd. Local communication bus system
US5436734A (en) 1991-11-18 1995-07-25 Fuji Xerox Co., Ltd. Image-edit processing apparatus
US5258941A (en) 1991-12-13 1993-11-02 Edward Newberger Apparatus for utilizing a discrete fourier transformer to implement a discrete cosine transformer
US5241222A (en) 1991-12-20 1993-08-31 Eastman Kodak Company Dram interface adapter circuit
US5485589A (en) 1991-12-31 1996-01-16 Dell Usa, L.P. Predictive addressing architecture
US5204830A (en) 1992-02-13 1993-04-20 Industrial Technology Research Institute Fast pipelined matrix multiplier
US5233348A (en) 1992-03-26 1993-08-03 General Instrument Corporation Variable length code word decoder for use in digital communication systems
US5530823A (en) 1992-05-12 1996-06-25 Unisys Corporation Hit enhancement circuit for page-table-look-aside-buffer
US5307451A (en) 1992-05-12 1994-04-26 Apple Computer, Inc. Method and apparatus for generating and manipulating graphical data for display on a computer output device
US5262968A (en) 1992-06-25 1993-11-16 The United States Of America As Represented By The Secretary Of The Air Force High performance architecture for image processing
US5325092A (en) 1992-07-07 1994-06-28 Ricoh Company, Ltd. Huffman decoder architecture for high speed operation and reduced memory
US5392038A (en) 1992-07-13 1995-02-21 Sony United Kingdom Ltd. Serial data decoding for variable length code words
US5570432A (en) 1992-08-04 1996-10-29 Matsushita Electric Industrial Co., Ltd. Image controller
EP0588726B1 (en) 1992-09-17 2001-02-28 Sony Corporation Discrete cosine transformation system and method and inverse discrete cosine transformation system and method, having simple structure and operable at high speed
US5428356A (en) 1992-09-24 1995-06-27 Sony Corporation Variable length code decoder utilizing a predetermined prioritized decoding arrangement
EP0589682B1 (en) 1992-09-24 1999-01-13 Sony Corporation Variable length code decoder
EP0593046B1 (en) 1992-10-13 2000-07-26 Nec Corporation Huffman code decoding circuit
US5467088A (en) 1992-10-13 1995-11-14 Nec Corporation Huffman code decoding circuit
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5544290A (en) 1992-11-10 1996-08-06 Adobe Systems, Inc. Method and apparatus for processing data for a visual-output device with reduced buffer memory requirements
US5506944A (en) 1992-11-10 1996-04-09 Adobe Systems, Inc. Method and apparatus for processing data for a visual-output device with reduced buffer memory requirements
US5539865A (en) 1992-11-10 1996-07-23 Adobe Systems, Inc. Method and apparatus for processing data for a visual-output device with reduced buffer memory requirements
US5504842A (en) 1992-11-10 1996-04-02 Adobe Systems, Inc. Method and apparatus for processing data for a visual-output device with reduced buffer memory requirements
EP0600112A1 (de) 1992-11-30 1994-06-08 Siemens Nixdorf Informationssysteme Aktiengesellschaft Data processing system with virtual memory addressing and key-controlled memory access
US5528764A (en) 1992-12-24 1996-06-18 Ncr Corporation Bus system with cache snooping signals having a turnaround time between agents driving the bus for keeping the bus from floating for an extended period
US5502824A (en) 1992-12-28 1996-03-26 Ncr Corporation Peripheral component interconnect "always on" protocol
US5440404A (en) 1993-01-18 1995-08-08 Matsushita Electric Industrial Co., Ltd. Image signal compression apparatus and method using variable length encoding
US5561772A (en) 1993-02-10 1996-10-01 Elonex Technologies, Inc. Expansion bus system for replicating an internal bus as an external bus with logical interrupts replacing physical interrupt lines
EP0612007A3 (en) 1993-02-15 1994-10-12 Tokyo Electric Co Ltd Parallel interface and data transmission system for printers with this interface.
US5424733A (en) 1993-02-17 1995-06-13 Zenith Electronics Corp. Parallel path variable length decoding for video signals
US5561761A (en) 1993-03-31 1996-10-01 Vlsi Technology, Inc. Central processing unit data entering and interrogating device and method therefor
US5557733A (en) 1993-04-02 1996-09-17 Vlsi Technology, Inc. Caching FIFO and method therefor
EP0623799A1 (de) 1993-04-03 1994-11-09 SECOTRON ELEKTROGERÄTEBAU GmbH Interactive video system
US5524075A (en) 1993-05-24 1996-06-04 Sagem S.A. Digital image processing circuitry
EP0626661A1 (en) 1993-05-24 1994-11-30 Societe D'applications Generales D'electricite Et De Mecanique Sagem Digital image processing circuitry
US5544342A (en) 1993-06-30 1996-08-06 International Business Machines Corporation System and method for prefetching information in a processing system
US5483475A (en) 1993-09-15 1996-01-09 Industrial Technology Research Institute Fast pipelined 2-D discrete cosine transform architecture
US5485568A (en) 1993-10-08 1996-01-16 Xerox Corporation Structured image (SI) format for describing complex color raster images
US5446854A (en) 1993-10-20 1995-08-29 Sun Microsystems, Inc. Virtual memory computer apparatus and address translation mechanism employing hashing scheme and page frame descriptor that support multiple page sizes
EP0660247A1 (en) 1993-10-27 1995-06-28 Winbond Electronics Corporation Method and apparatus for performing discrete cosine transform and its inverse
US5515296A (en) 1993-11-24 1996-05-07 Intel Corporation Scan path for encoding and decoding two-dimensional signals
US5528238A (en) 1993-11-24 1996-06-18 Intel Corporation Process, apparatus and system for decoding variable-length encoded signals
US5561690A (en) * 1993-11-29 1996-10-01 Daewoo Electronics Co., Ltd. High speed variable length code decoding apparatus
EP0655712A3 (en) 1993-11-29 1996-01-17 Canon Kk Image processing method and apparatus.
EP0655854B1 (en) 1993-11-30 1999-07-21 Canon Kabushiki Kaisha Image communicating apparatus
US5479527A (en) 1993-12-08 1995-12-26 Industrial Technology Research Inst. Variable length coding system
US5625355A (en) * 1994-01-28 1997-04-29 Matsushita Electric Industrial Co., Ltd. Apparatus and method for decoding variable-length code
US5481487A (en) 1994-01-28 1996-01-02 Industrial Technology Research Institute Transpose memory for DCT/IDCT circuit
US5535291A (en) 1994-02-18 1996-07-09 Martin Marietta Corporation Superresolution image enhancement for a SIMD array processor
EP0674266A3 (en) 1994-03-24 1997-12-03 Discovision Associates Method and apparatus for interfacing with ram
EP0675632B1 (en) 1994-03-30 2000-02-09 Canon Kabushiki Kaisha Image recording apparatus and method therefor
EP0692913A3 (en) 1994-07-13 1998-10-07 Matsushita Electric Industrial Co., Ltd. Digital coding/decoding apparatus using variable length codes
EP0708563A3 (en) 1994-10-19 1999-05-06 Matsushita Electric Industrial Co., Ltd. Image encoding/decoding device
EP0714166A3 (en) 1994-11-21 1997-04-16 Sican Gmbh Method and circuit for reading code words of varying lengths from a memory for fixed-length code words
US5528628A (en) 1994-11-26 1996-06-18 Samsung Electronics Co., Ltd. Apparatus for variable-length coding and variable-length-decoding using a plurality of Huffman coding tables
US5652583A (en) * 1995-06-30 1997-07-29 Daewoo Electronics Co. Ltd Apparatus for encoding variable-length codes and segmenting variable-length codewords thereof
US5736946A (en) * 1995-08-31 1998-04-07 Daewoo Electronics Co., Ltd. High speed apparatus and method for decoding variable length code
US5686915A (en) * 1995-12-27 1997-11-11 Xerox Corporation Interleaved Huffman encoding and decoding method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Arai et al., "A Fast DCT-SQ Scheme for Images," Trans. IEICE, vol. E 71, No. 11, Nov. 1988, pp. 1095-1097.

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6393545B1 (en) 1997-04-30 2002-05-21 Canon Kabushiki Kaisha Method apparatus and system for managing virtual memory with virtual-physical mapping
US6674536B2 (en) 1997-04-30 2004-01-06 Canon Kabushiki Kaisha Multi-instruction stream processor
US6408421B1 (en) * 1998-09-15 2002-06-18 The Trustees Of Columbia University High-speed asynchronous decoder circuit for variable-length coded data
US7483574B2 (en) * 2001-09-14 2009-01-27 Nec Corporation Image processing apparatus, image transmission apparatus, image reception apparatus, and image processing method
US20040252891A1 (en) * 2001-09-14 2004-12-16 Daigo Sasaki Image processing apparatus, image transmission apparatus, image reception apparatus, and image processing method
US20040223608A1 (en) * 2001-09-25 2004-11-11 Oommen B. John Cryptosystem for data security
US7508935B2 (en) * 2001-09-25 2009-03-24 3927296 Canada, Inc. Cryptosystem for data security
US20040105500A1 (en) * 2002-04-05 2004-06-03 Koji Hosogi Image processing system
US20050050341A1 (en) * 2003-08-28 2005-03-03 Sunplus Technology Co., Ltd. Device of applying protection bit codes to encrypt a program for protection
US7707431B2 (en) * 2003-08-28 2010-04-27 Sunplus Technology Co., Ltd. Device of applying protection bit codes to encrypt a program for protection
US20100098151A1 (en) * 2004-06-07 2010-04-22 Nahava Inc. Method and Apparatus for Cached Adaptive Transforms for Compressing Data Streams, Computing Similarity, and Recognizing Patterns
US8175144B2 (en) * 2004-06-07 2012-05-08 Nahava Inc. Method and apparatus for cached adaptive transforms for compressing data streams, computing similarity, and recognizing patterns
WO2006006005A2 (en) * 2004-06-30 2006-01-19 Nokia Inc. Chaining control marker data structure
WO2006006005A3 (en) * 2004-06-30 2006-07-13 Nokia Inc Chaining control marker data structure
US20060015648A1 (en) * 2004-06-30 2006-01-19 Nokia Inc. Chaining control marker data structure
US8576246B2 (en) * 2005-02-14 2013-11-05 St-Ericsson Sa Image processing method and device
US20060181724A1 (en) * 2005-02-14 2006-08-17 Stmicroelectronics Sa Image processing method and device
US7873947B1 (en) * 2005-03-17 2011-01-18 Arun Lakhotia Phylogeny generation
US7423652B2 (en) * 2005-05-27 2008-09-09 Via Technologies Inc. Apparatus and method for digital video decoding
US20060267996A1 (en) * 2005-05-27 2006-11-30 Jiunn-Shyang Wang Apparatus and method for digital video decoding
US20080320223A1 (en) * 2006-02-27 2008-12-25 Fujitsu Limited Cache controller and cache control method
US8312218B2 (en) * 2006-02-27 2012-11-13 Fujitsu Limited Cache controller and cache control method
US20080218387A1 (en) * 2007-03-07 2008-09-11 Industrial Technology Research Institute Variable length decoder utilizing reordered index decoding look-up-table (lut) and method of using the same
US7460036B2 (en) * 2007-03-07 2008-12-02 Industrial Technology Research Institute Variable length decoder utilizing reordered index decoding look-up-table (LUT) and method of using the same
US7598891B2 (en) * 2007-04-27 2009-10-06 Nec Electronics Corporation Data development device and data development method
US20080270429A1 (en) * 2007-04-27 2008-10-30 Nec Electronics Corporation Data development device and data development method
US8432572B2 (en) 2007-06-29 2013-04-30 Konica Minolta Laboratory U.S.A., Inc. Systems and methods of trapping for print devices
US20090002765A1 (en) * 2007-06-29 2009-01-01 Konica Minolta Systems Laboratory, Inc. Systems and Methods of Trapping for Print Devices
US20090244563A1 (en) * 2008-03-27 2009-10-01 Konica Minolta Systems Laboratory, Inc. Systems and methods for color conversion
US7903286B2 (en) 2008-03-27 2011-03-08 Konica Minolta Systems Laboratory, Inc. Systems and methods for color conversion
US8570340B2 (en) 2008-03-31 2013-10-29 Konica Minolta Laboratory U.S.A., Inc. Systems and methods for data compression
US20090310151A1 (en) * 2008-06-12 2009-12-17 Kurt Nathan Nordback Systems and Methods for Multi-Mode Color Blending
US8699042B2 (en) 2008-06-12 2014-04-15 Konica Minolta Laboratory U.S.A., Inc. Systems and methods for multi-mode color blending
CN102147766A (zh) * 2010-12-17 2011-08-10 曙光信息产业股份有限公司 Method for maintaining a TCP flow table structure and an out-of-order buffer
US8527849B2 (en) * 2011-08-19 2013-09-03 Stec, Inc. High speed hard LDPC decoder
US20140189304A1 (en) * 2012-12-31 2014-07-03 Tensilica Inc. Bit-level register file updates in extensible processor architecture
US9448801B2 (en) 2012-12-31 2016-09-20 Cadence Design Systems, Inc. Automatic register port selection in extensible processor architecture
US9477473B2 (en) * 2012-12-31 2016-10-25 Cadence Design Systems, Inc. Bit-level register file updates in extensible processor architecture
US10171107B2 (en) 2014-01-31 2019-01-01 Hewlett-Packard Development Company, L.P. Groups of phase invariant codewords
US10560117B2 (en) 2014-01-31 2020-02-11 Hewlett-Packard Development Company, L.P. Groups of phase invariant codewords
US9478312B1 (en) * 2014-12-23 2016-10-25 Amazon Technologies, Inc. Address circuit
US9805819B1 (en) 2014-12-23 2017-10-31 Amazon Technologies, Inc. Address circuit

Also Published As

Publication number Publication date
EP0875859B1 (en) 2006-09-13
EP0875859A3 (en) 2000-10-18
EP0875853A2 (en) 1998-11-04
EP0875855A2 (en) 1998-11-04
US6246396B1 (en) 2001-06-12
EP1553523A2 (en) 2005-07-13
US6195674B1 (en) 2001-02-27
EP0875859A2 (en) 1998-11-04
US6507898B1 (en) 2003-01-14
US6349379B2 (en) 2002-02-19
AU6369598A (en) 1998-11-19
US20010021971A1 (en) 2001-09-13
EP1553523A3 (en) 2010-10-06
AU717168B2 (en) 2000-03-16
EP0875854A3 (en) 2003-08-13
EP0875855A3 (en) 2003-05-28
EP0875855B1 (en) 2006-08-02
EP0875854A2 (en) 1998-11-04
EP0875853A3 (en) 2003-05-28

Similar Documents

Publication Publication Date Title
US6272257B1 (en) Decoder of variable length codes
US6311258B1 (en) Data buffer apparatus and method for storing graphical data using data encoders and decoders
US6336180B1 (en) Method, apparatus and system for managing virtual memory with virtual-physical mapping
US6707463B1 (en) Data normalization technique
US6674536B2 (en) Multi-instruction stream processor
US6118724A (en) Memory controller architecture
US6289138B1 (en) General image processor
US6237079B1 (en) Coprocessor interface having pending instructions queue and clean-up queue and dynamically allocating memory
US7230633B2 (en) Method and apparatus for image blending
US7681013B1 (en) Method for variable length decoding using multiple configurable look-up tables
JP4101253B2 (ja) Compression apparatus and method therefor
US7467287B1 (en) Method and apparatus for vector table look-up
US6822654B1 (en) Memory controller chipset
JP4227218B2 (ja) Dynamic memory management apparatus and control method therefor
AU739533B2 (en) Graphics processor architecture
JP4298006B2 (ja) Image processor and image processing method therefor
AU728882B2 (en) Compression
AU727990B2 (en) Graphics processing system
AU717336B2 (en) Graphics processor architecture
AU760297B2 (en) Memory controller architecture
AU766467B2 (en) Graphics processing system
JPH11167627A (ja) Image processing apparatus and method therefor
US7558947B1 (en) Method and apparatus for computing vector absolute differences

Legal Events

Date Code Title Description
AS Assignment

Owner name: CANON KABUSHIKI KAISHA, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PROKOP, THOMASZ THOMAS;REEL/FRAME:009373/0419

Effective date: 19980615

AS Assignment

Owner name: CANON INFORMATION SYSTEMS RESEARCH AUSTRALIA PTY.

Free format text: RE-RECORD TO ADD THE SECOND ASSIGNEE, PREVIOUSLY RECORDED ON REEL 9373 FRAME 0419, ASSIGNOR CONFIRMS THE ASSIGNMENT OF THE ENTIRE INTEREST.;ASSIGNOR:PROKOP, THOMASZ THOMAS;REEL/FRAME:010198/0397

Effective date: 19980615

Owner name: CANON KABUSHIKI KAISHA, JAPAN

Free format text: RE-RECORD TO ADD THE SECOND ASSIGNEE, PREVIOUSLY RECORDED ON REEL 9373 FRAME 0419, ASSIGNOR CONFIRMS THE ASSIGNMENT OF THE ENTIRE INTEREST.;ASSIGNOR:PROKOP, THOMASZ THOMAS;REEL/FRAME:010198/0397

Effective date: 19980615

AS Assignment

Owner name: CANON KABUSHIKI KAISHA, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CANON INFORMATION SYSTEMS RESEARCH AUSTRALIA PTY. LTD.;REEL/FRAME:011036/0686

Effective date: 20000202

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

CC Certificate of correction

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12