WO2005025230A1 - Image processing device - Google Patents


Info

Publication number
WO2005025230A1
Authority
WO
WIPO (PCT)
Prior art keywords
simd
image processing
processors
type computer
computers
Prior art date
Application number
PCT/JP2003/010977
Other languages
French (fr)
Japanese (ja)
Inventor
Hiroshi Takayanagi
Nobuhiro Seki
Osamu Mouri
Akihiro Makino
Masahiro Miura
Original Assignee
Hitachi Ulsi Systems Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Hitachi Ulsi Systems Co., Ltd.
Priority to PCT/JP2003/010977 (WO2005025230A1)
Priority to JP2005508742A (JP4516020B2)
Publication of WO2005025230A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Definitions

  • The present invention relates to an image processing apparatus, and more particularly to a technique that is effective when applied to MPEG image compression/decompression.
  • A digitized image is divided into blocks. Motion vector detection, discrete cosine transform (DCT), quantization, and AC/DC prediction are performed for each block, and Huffman coding is then applied to compress the image data.
  • DCT: discrete cosine transform
  • AC/DC prediction
  • For each 16 × 16-pixel macroblock of the current frame, motion vector detection searches a range of the temporally preceding or following frame for the 16 × 16-pixel position (shift block) whose difference from the current macroblock is smallest. High compression of moving images becomes possible when that position vector and the difference (frame difference) are used for the subsequent DCT, quantization, AC/DC prediction, and Huffman coding.
  • Decompression of the compressed data follows the reverse procedure: Huffman decoding, AC/DC prediction, inverse quantization, inverse DCT, and generation of a compensated image from the motion vector information.
  • SIMD: Single Instruction, Multiple Data stream
  • One example of an implementation of the SIMD parallel computing architecture is the general-purpose neurocomputer.
  • The architecture of this computer consists of multiple processors and one control system. The control system operates by broadcasting common instructions and data to all processors.
  • Each processor has a local memory and an arithmetic unit (multiplier, ALU, shifter, etc.).
  • The control system has a rewritable program memory and a global memory.
  • Data are exchanged between the control system and the processors over two buses. A broadcast data bus carries data from the control system to all processors. A common bus lets any one processor, selected by an address signal, send data to the control system through a tri-state buffer.
  • The data that the control system sends to all processors over the broadcast data bus can be either data in the control system's memory or data that the control system received from any one processor over the common bus.
  • An instruction from the control system can also be executed by only the one processor specified by the address signal.
  • This allows the control system to initialize the local memory of each processor, and enables 1-to-N data transmission (N is any natural number) between the control system and the processors.
  • In a general-purpose neurocomputer, each of the processors controlled by a single instruction broadcast from the control system serves as a unit that mimics a neuron (nerve cell). The operation of a neural network is imitated as each processor operates on the weight data in its own local memory against the input data broadcast to all processors via the control system.
  • By rewriting the control system program, general-purpose parallel computation becomes possible for neuro algorithms typified by back propagation (error back-propagation), for other neuro algorithms, and for non-neuro algorithms.
  • When the general-purpose neurocomputer is applied to motion vector detection in image processing, the processors can serve as computation units for shift blocks. That is, the shift blocks (multiple 16 × 16-pixel blocks) within the motion vector search range of the image one frame earlier are loaded into the local memories of the processors as initial values.
  • The image for motion vector detection in the current frame (a 16 × 16-pixel macroblock) is broadcast from the control system to all processors.
  • Each processor calculates the frame difference between the broadcast data and the data in its local memory.
  • By comparing the frame differences of all processors, the control system can detect the image position of the shift block assigned to the processor with the smallest difference as the motion vector position.
  • However, if motion detection and DCT processing are implemented on a single SIMD parallel computer, the difference block results of the processor that detected the motion must be moved to other processors for DCT processing. This lowers overall processing performance.
  • Motion detection requires an absolute-difference calculator in each processor, and DCT requires a multiplier. Therefore, if motion detection and DCT are processed by a single SIMD parallel computer, both the absolute-difference calculator and the multiplier must be built into each processor, increasing the overall gate count.
  • An object of the present invention is to provide an image processing apparatus that has a small circuit configuration and can operate at high speed in image processing such as MPEG image compression/decompression.
  • The image processing apparatus according to the present invention connects a first SIMD computer to one or more second SIMD computers. In the first SIMD computer, a plurality of processors, each including at least a difference calculator and a local memory, operate on a single instruction from a first control unit, and a broadcast data path transmits data from the first control unit to all the processors. In each second SIMD computer, a plurality of processors, each including at least a multiplier, operate on a single instruction from a second control unit.
  • The first SIMD computer performs the motion detection processing of the image processing. The one or more second SIMD computers perform the DCT, inverse DCT, quantization, or inverse quantization processing.
  • The operation results of the first SIMD computer are transmitted in parallel through buffers to the processors of the second SIMD computer, and those processors process the image in parallel, block by block.
  • Header information indicating the attributes of each block is also provided. Each processor of the second SIMD computer can therefore evaluate the header information and efficiently perform the processing suited to that block.
  • Performance improves because motion detection and DCT/quantization can be executed as a pipeline, one stage per process.
  • SIMD computers with a relatively small number of processors make real-time image processing possible, such as the video compression and decompression typified by MPEG.
  • Semiconductor integrated circuits (electronic components) that can be mounted in low-power portable home appliances can realize image processing functions with higher pixel counts than before.
  • FIG. 1 is a block diagram illustrating a configuration of an image processing apparatus according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing an internal configuration of the SIMD type computer 100 included in the image processing apparatus according to one embodiment of the present invention.
  • FIG. 3 is an explanatory diagram showing a motion detection processing procedure in the SIMD computer 100 in the image processing apparatus according to the embodiment of the present invention.
  • FIG. 4 is an explanatory diagram showing a DCT / quantization processing procedure in the SIMD type computer 200 in the image processing apparatus according to the embodiment of the present invention.
  • FIG. 5 is an explanatory diagram showing an inverse DCT / inverse quantization procedure in the SIMD type computer 300 in the image processing apparatus according to the embodiment of the present invention.
  • FIG. 6 is a block diagram showing a configuration of a viewer V to which the image processing device according to one embodiment of the present invention is applied.
  • The image processing apparatus according to the present embodiment is, for example, an image compression system. It includes a SIMD computer 100, a SIMD computer 200, a SIMD computer 300, and buffers 401 to 406, 501 to 506, and 601 to 606.
  • The SIMD computer 100 consists of a control unit 140 and a processor array 130. The processor array 130 is made up of a plurality of processors 101 to 116, each including an arithmetic unit (for example, a difference calculator) and a local memory.
  • The SIMD computer 200 consists of a control unit 240 and a processor array 230. The processor array 230 is made up of a plurality of processors 201 to 206, each including an arithmetic unit (for example, a multiplier) and a local memory.
  • The SIMD computer 300 consists of a control unit 340 and a processor array 330. The processor array 330 is made up of a plurality of processors 301 to 306, each including an arithmetic unit (for example, a multiplier) and a local memory.
  • The SIMD computer 100 and the SIMD computer 200 are electrically connected through the buffers 401 to 406. The output of the control unit 140 in the SIMD computer 100 is input in parallel to the buffers 401 to 406, and the outputs of the buffers 401 to 406 are input in parallel to the processors 201 to 206 in the SIMD computer 200.
  • The SIMD computer 200 and the SIMD computer 300 are electrically connected through the buffers 501 to 506. The outputs of the processors 201 to 206 are input in parallel to the buffers 501 to 506, and the outputs of the buffers 501 to 506 are input in parallel to the processors 301 to 306 in the SIMD computer 300.
  • The outputs of the processors 301 to 306 in the SIMD computer 300 are input to the buffers 601 to 606.
  • The outputs of the buffers 501 to 506 are also passed to AC/DC prediction and Huffman processing, and the outputs of the buffers 601 to 606 are passed to compensated image generation processing.
  • The processors 101 to 116 in the processor array 130 and the control unit 140 are electrically connected by an instruction bus 150, a broadcast data bus 160, a processor data output common bus 170, and the like.
  • The processors 201 to 206 and the control unit 240 are electrically connected by an instruction bus 250 and the like.
  • The processors 301 to 306 and the control unit 340 are electrically connected by an instruction bus 350 and the like.
  • The SIMD computers 100 to 300 form three stages here, but two stages, or four or more stages, may also be used.
  • The SIMD computer 100 has 16 processors (101 to 116) here, but any number of processors may be used.
  • The processors 201 to 206 and 301 to 306 of the SIMD computers 200 and 300, and the buffers 401 to 406, 501 to 506, and 601 to 606, are each six in parallel here, but any degree of parallelism may be used.
  • FIG. 2 shows a detailed configuration of the SIMD type computer 100.
  • In addition to the control unit 140 and the processor array 130, the SIMD computer 100 includes a plurality of memory units 121 to 129.
  • The local memories in the processors 101 to 116 and the memory units 121 to 129 are composed of RAM.
  • The processors 101 to 116 are arranged in a matrix. The local memory in each of the processors 101 to 116 is connected to the neighboring processors above, below, left, and right, so that operation data can be shifted in any of those four directions.
  • The local memories of the processors 104, 108, 112, 113, 114, 115, and 116, located at the edge of the processor array 130, are connected to the memory units 121 to 129 arranged around the processor array 130. Operation data can therefore be shifted between those processors and the memory units 121 to 129.
  • The control unit 140 and the arithmetic units in all the processors 101 to 116 are connected by the instruction bus 150 and the broadcast data bus 160. Instructions and data are output over these buses from the control unit 140 to all the processors 101 to 116.
  • The outputs of the arithmetic units in all the processors 101 to 116 are connected to the control unit 140 through tri-state buffers and the processor data output common bus 170. The operation data of the arithmetic unit in each of the processors 101 to 116 is output to the control unit 140 over this bus.
  • Each of the memory units 121 to 129 is connected to its adjacent memory units, so that data can be shifted between memory units. In addition, each of the memory units 121 to 129 is connected to the control unit 140 through a memory common bus 180.
  • The control unit 140 is connected to an external controller (main CPU) and an external memory (image data).
  • The SIMD computer 100 performs the motion detection processing of the image processing. It outputs the results of the motion detection processing, the difference information and motion vector information for each block, to the buffers 401 to 406 in block units. It then proceeds to motion detection for the next macroblock.
  • In the SIMD computer 200, the processors 201 to 206 perform the DCT operation of the image processing in parallel.
  • The processors 201 to 206 take in the difference information for each block from the buffers 401 to 406 and perform the DCT operation.
  • The SIMD computer 200 then performs quantization in parallel on the processors 201 to 206.
  • The processors 201 to 206 output the processing results, together with the motion vector information, to the buffers 501 to 506 in parallel.
  • The motion vector information and quantized data of each block in the buffers 501 to 506 undergo AC/DC prediction and Huffman processing and are output as compressed data.
  • The motion vector information and quantized data of each block in the buffers 501 to 506 are also output to the SIMD computer 300 to generate the compensated image.
  • The processors 301 to 306 perform inverse quantization on each block in parallel.
  • The processors 301 to 306 then perform the inverse DCT operation in parallel.
  • The processors 301 to 306 output the processing results, together with the motion vector information, to the buffers 601 to 606 in parallel.
  • The motion vector information and inverse-DCT data of each block in the buffers 601 to 606 are used for compensated image generation.
  • Among the processes of motion detection, DCT, quantization, inverse quantization, and inverse DCT, the DCT and quantization are performed by the SIMD computer 200, and the inverse quantization and inverse DCT are performed by the SIMD computer 300.
  • Alternatively, the DCT, quantization, inverse quantization, and inverse DCT may all be performed by one SIMD computer, or the processing may be shared among three or more SIMD computers.
  • The buffers 401 to 406, 501 to 506, and 601 to 606 can hold more than the operation results and motion vector information of each block. They can also hold attribute information for each block, such as whether difference processing against the comparison image was performed. Each processor can evaluate this information and execute different arithmetic processing.
  • FIG. 3(a) shows the order of motion detection processing in macroblock units over the entire image.
  • FIG. 3(b) shows the processing flow for each macroblock.
  • The whole image (current image) is divided into macroblocks (16 × 16 pixels) and processed macroblock by macroblock.
  • Each macroblock consists of luminance blocks (Y0, Y1, Y2, Y3) and color-difference blocks (U, V).
  • Each of Y0, Y1, Y2, Y3, U, and V consists of 8 × 8 pixels.
  • Motion detection is performed for each macroblock by detecting the difference from the comparison image.
  • The information produced by motion detection against the comparison image consists of motion vector information and the difference value information (Y0', Y1', Y2', Y3', U', V') for the blocks Y0, Y1, Y2, Y3, U, V.
  • This block information is output to the buffers 401 to 406.
  • In the SIMD computer 200, the difference value information (Y0', Y1', Y2', Y3', U', V') and motion vector information of the blocks Y0, Y1, Y2, Y3, U, V are input in parallel from the buffers 401 to 406 to the processors 201 to 206. They are processed in parallel, and the processing results are output in parallel to the buffers 501 to 506.
  • In the SIMD computer 300, the processing results and motion vector information of the blocks Y0, Y1, Y2, Y3, U, V are input in parallel from the buffers 501 to 506 to the processors 301 to 306. They are processed in parallel, and the processing results are output in parallel to the buffers 601 to 606.
  • FIG. 6 shows an example in which the image processing apparatus according to the present embodiment is applied to a viewer system.
  • The system includes, for example, the image processing device 700 of the present embodiment, an AC/DC prediction/Huffman unit 701, an image memory 702, a display circuit 703, a monitor 704, a ROM 705, a RAM 706, a CPU 707, and an IF (interface) circuit 708.
  • The image processing device 700 is connected to the image memory 702 and to the AC/DC prediction/Huffman unit 701. The image memory 702 is connected to the display circuit 703, and the display circuit 703 is connected to the monitor 704.
  • The AC/DC prediction/Huffman unit 701, ROM 705, RAM 706, CPU 707, and IF circuit 708 are connected to one another via a bus.
  • The IF circuit 708 is connected to a memory card 709. This system displays MPEG images captured with a digital movie camera or the like on a monitor or TV.
  • Because this system performs only MPEG decompression, it uses only the SIMD computer 300 of the image processing device shown in FIG. 1, for inverse quantization and inverse DCT. Motion detection, DCT, quantization, inverse quantization, and inverse DCT are handled by separate SIMD computers. The image processing device can therefore be built from only the SIMD computers that are needed, which allows smaller size and lower power consumption.
  • Image processing functions such as the video compression and decompression typified by MPEG can be realized by a semiconductor integrated circuit device (electronic component). Such a device can be mounted in portable home appliances driven at low power, such as digital video cameras.
  • The present invention is not limited to this embodiment.
  • The image processing apparatus is suitable for electronic devices that perform moving-image compression/decompression, such as digital video cameras, VCRs, and information terminals.
  • The present invention can be applied to any electronic device that processes calculation algorithms involving matrix operations, such as image processing and audio processing.
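The buffer hand-off in the embodiment above can be sketched as follows. One macroblock is split into Y0 to Y3, U, and V, one block per buffer 401 to 406 and processor 201 to 206, with the motion vector and attribute information carried alongside. This is a minimal Python illustration under stated assumptions, not the patent's implementation. All of the following are hypothetical choices made for the example: the Y0 to Y3 raster order, 4:2:0 chroma (one 8 × 8 U and V block per macroblock), the `BufferEntry` layout, and the header-dependent quantizer steps in `process_entry`, which are placeholders rather than the MPEG quantizer.

```python
from dataclasses import dataclass
from typing import List, Tuple

Block = List[List[int]]


@dataclass
class BufferEntry:
    """Hypothetical contents of one buffer (401-406) for one block."""
    name: str                       # "Y0".."Y3", "U" or "V"
    data: Block                     # 8 x 8 difference (or raw) values
    motion_vector: Tuple[int, int]
    differenced: bool               # attribute header: difference taken vs. comparison image?


def split_macroblock(y16: Block, u8: Block, v8: Block,
                     mv: Tuple[int, int], differenced: bool = True) -> List[BufferEntry]:
    """Split a 16 x 16 luminance macroblock into Y0..Y3 (assumed raster order), then append U and V."""
    def quadrant(row: int, col: int) -> Block:
        return [r[col:col + 8] for r in y16[row:row + 8]]

    blocks = [("Y0", quadrant(0, 0)), ("Y1", quadrant(0, 8)),
              ("Y2", quadrant(8, 0)), ("Y3", quadrant(8, 8)),
              ("U", u8), ("V", v8)]
    return [BufferEntry(name, data, mv, differenced) for name, data in blocks]


def process_entry(entry: BufferEntry, q_inter: int = 8, q_intra: int = 4) -> Block:
    """Per-processor work: pick a (hypothetical) quantizer step from the block header."""
    step = q_inter if entry.differenced else q_intra
    return [[round(v / step) for v in row] for row in entry.data]


def process_macroblock(entries: List[BufferEntry]) -> List[Block]:
    """One entry per processor 201-206; sequential here, simultaneous in hardware."""
    return [process_entry(e) for e in entries]
```

The list comprehension in `process_macroblock` stands in for the six processors each handling one entry at the same time. The `differenced` flag plays the role of the block attribute information that lets each processor choose its processing.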

Abstract

A first SIMD computer (100) includes a plurality of processors (101 to 116) and a broadcast bus (160). Each processor has at least a difference calculator and a local memory, and the processors operate on a single instruction from a first control unit (140). The broadcast bus (160) transmits data from the first control unit (140) to all the processors (101 to 116). The first SIMD computer (100) is connected to one or more second SIMD computers (200), each including a plurality of processors (201 to 206) that have at least a multiplier and operate on a single instruction from a second control unit (240). The first SIMD computer (100) executes motion detection in the image processing, and the one or more second SIMD computers (200) execute DCT, inverse DCT, quantization, or dequantization in the image processing.

Description

Technical field
The present invention relates to an image processing apparatus, and more particularly to a technique that is effective when applied to MPEG image compression/decompression.
Background art
The present inventors studied the following technique for MPEG (Moving Picture Experts Group) image compression/decompression, for example.
Video compression standards such as ISO/IEC 14496-2 (MPEG-4), ISO/IEC 13818-2 (MPEG-2), and ISO/IEC 11172-2 (MPEG-1) divide a digitized image into blocks and perform motion vector detection, discrete cosine transform (DCT), quantization, and AC/DC prediction for each block. They then apply Huffman coding to compress the image data.
Of these steps, motion vector detection handles each 16 × 16-pixel macroblock in an image frame. It searches a range of the temporally preceding or following frame for the 16 × 16-pixel position (shift block) whose difference from the current frame is smallest. The subsequent DCT, quantization, AC/DC prediction, and Huffman coding then use that position vector and the difference (frame difference), which makes high compression of moving images possible.
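The following minimal full-search block-matching sketch in Python illustrates the search described above. It assumes the sum of absolute differences (SAD) as the difference measure. The text here does not specify the measure, although the later discussion says motion detection needs an absolute-difference calculator. The function names, the ±15-pixel default radius, and the choice to skip candidates that fall outside the reference frame are also assumptions for the example. Frames are plain lists of rows of integer luminance values.

```python
def block_sad(cur, ref, bx, by, dx, dy, n=16):
    """Sum of absolute differences between the n x n macroblock of `cur` at
    (bx, by) and the candidate (shift) block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, radius=15, n=16):
    """Exhaustive block matching over displacements in [-radius, radius]^2.

    Candidates that would fall outside `ref` are skipped (an assumption).
    Returns the motion vector (dx, dy) with the smallest SAD, and that SAD.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def difference_block(cur, ref, bx, by, mv, n=16):
    """Frame difference handed on to the DCT stage: current block minus matched block."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
            for y in range(n)]
```

`full_search` returns the motion vector and its SAD. `difference_block` produces the frame difference that the DCT, quantization, AC/DC prediction, and Huffman coding steps would then work on.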
Decompression of the compressed data follows the reverse of the compression procedure: Huffman decoding, AC/DC prediction, inverse quantization, inverse DCT, and generation of a compensated image from the motion vector information.
Image processing, including video compression, involves many repetitions of relatively simple calculation algorithms and has a high degree of data parallelism for the same instruction. SIMD (Single Instruction, Multiple Data stream) parallel computing is therefore often well suited to speeding up image processing.
One example of an implementation of the SIMD parallel computing architecture is the general-purpose neurocomputer. Its architecture consists of multiple processors and one control system, and the control system operates by broadcasting common instructions and data to all processors. Each processor has a local memory and an arithmetic unit (multiplier, ALU, shifter, etc.). The control system has a rewritable program memory and a global memory. Data are exchanged between the control system and the processors over two buses. A broadcast data bus carries data from the control system to all processors. A common bus lets any one processor, specified by an address signal, send data to the control system through a tri-state buffer.
The data that the control system sends to all processors over the broadcast data bus can be either data in memory within the control system or data that the control system has received from any one processor over the common bus.
An instruction from the control system can also be executed by only the one processor specified by the address signal. This allows the control system to initialize the local memory of each processor, and enables 1-to-N (N is any natural number) data transmission between the control system and the processors.
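The interplay between the control system and the processors described above can be modeled in a few lines. This is a hypothetical Python sketch, not the neurocomputer's actual instruction set. `Processor`, `ControlSystem`, `issue`, `read_common_bus`, and the two example instructions are invented for illustration, and a Python function stands in for a broadcast instruction. The model shows the three mechanisms named in the text:

* broadcast execution by all processors,
* execution by a single address-selected processor, used here for 1-to-N local-memory initialization,
* reading one processor over the common bus.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Processor:
    """Hypothetical processing element: a small local memory and an accumulator."""
    local: List[int] = field(default_factory=lambda: [0] * 8)
    acc: int = 0


# A hypothetical "instruction": a function applied to one processor together
# with the datum that accompanies it on the broadcast data bus.
Instruction = Callable[[Processor, int], None]


class ControlSystem:
    def __init__(self, n_procs: int) -> None:
        self.procs = [Processor() for _ in range(n_procs)]

    def issue(self, instr: Instruction, datum: int, address: Optional[int] = None) -> None:
        """Broadcast one instruction and one datum.

        With address=None every processor executes it (SIMD operation); with an
        address, only the selected processor does (e.g. to initialize its memory).
        """
        targets = self.procs if address is None else [self.procs[address]]
        for p in targets:  # sequential here, simultaneous in hardware
            instr(p, datum)

    def read_common_bus(self, address: int) -> int:
        """Only the addressed processor drives the common bus (tri-state)."""
        return self.procs[address].acc


def store_local0(p: Processor, d: int) -> None:
    p.local[0] = d


def mul_local0(p: Processor, d: int) -> None:
    p.acc = p.local[0] * d
```

For example, `issue(store_local0, value, address=i)` first initializes each processor's `local[0]` individually. A single broadcast `issue(mul_local0, 3)` then makes every processor multiply its own value by 3. Reading one result with `read_common_bus` and issuing it again corresponds to the "data received from one processor" option for the broadcast bus.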
In a general-purpose neurocomputer, each of the processors controlled by a single instruction broadcast from the control system serves as a unit that mimics a neuron (nerve cell). The operation of a neural network is imitated as each processor operates on the weight data held in its own local memory against input data broadcast to all processors via the control system. In other words, rewriting the control system program makes general-purpose parallel computation possible for neuro algorithms typified by back propagation (error back-propagation), for other neuro algorithms, and for non-neuro algorithms.
When the general-purpose neurocomputer is applied to motion vector detection in image processing, the processors can serve as computation units for the shift blocks. That is, the shift blocks (multiple 16 × 16-pixel blocks) within the motion vector search range of the image one frame earlier are loaded into the local memories of the processors as initial values. The image for motion vector detection in the current frame (a 16 × 16-pixel macroblock) is broadcast from the control system to all processors. Each processor calculates the frame difference between the broadcast data and the data in its local memory. By comparing the frame differences of all processors, the control system can detect the image position of the shift block assigned to the processor with the smallest difference as the motion vector position.
Disclosure of the invention
The present inventors studied the MPEG image compression/decompression technique described above, and the following became clear.
Motion vector detection and DCT/inverse DCT in MPEG image compression/decompression require an enormous amount of computation. In "motion-compensated interframe coding", the simplest form of motion vector detection, the search range is ±15 pixels around the macroblock of the previous frame. There are therefore 961 shift blocks to be compared with each macroblock of the current frame. For VGA (640 × 480 pixels) this amounts to more than 278 mega-operations per frame, or about 8 giga-operations per second at 30 frames per second. DCT and inverse DCT likewise require 0.8 giga-operations per second for VGA at 30 frames per second. Performing this processing with a single arithmetic unit of a semiconductor integrated circuit device would require an operating frequency on the order of 10 GHz. That makes it very difficult to implement this processing as-is in low-power portable home appliances such as digital video cameras.
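The order of magnitude quoted above can be reproduced with a short calculation. A ±15-pixel range gives (2 × 15 + 1)² = 961 shift blocks, and a 640 × 480 VGA frame holds 40 × 30 = 1,200 macroblocks of 16 × 16 pixels. The sketch below assumes one absolute-difference operation per pixel per candidate and counts all 961 candidates for every macroblock. The patent does not state its counting convention, so this simple count does not reproduce the 278 M figure exactly. It gives about 2.95 × 10^8 operations per frame, or about 8.9 × 10^9 per second at 30 frames/s. That is consistent with "more than 278 M" and of the same order as the 8 G/s figure.

```python
def full_search_ops(width, height, radius=15, mb=16, fps=30):
    """Count absolute-difference operations for full-search motion estimation.

    Assumptions: one |a - b| per pixel per candidate, every candidate evaluated
    for every macroblock (no clipping at frame edges), and frame dimensions that
    are multiples of the macroblock size `mb`.
    """
    candidates = (2 * radius + 1) ** 2            # shift blocks per macroblock
    macroblocks = (width // mb) * (height // mb)  # macroblocks per frame
    per_frame = macroblocks * candidates * mb * mb
    return candidates, macroblocks, per_frame, per_frame * fps
```

`full_search_ops(640, 480)` returns `(961, 1200, 295219200, 8856576000)`.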
When motion detection is performed with the general-purpose neurocomputer architecture described above, each processor's local memory stores the comparison image information for a macroblock. The macroblock information is then broadcast, and each processor computes the difference against its local memory contents, so the processors operate in parallel. Likewise, in DCT/inverse DCT processing, each processor's local memory stores block-wise difference information, and the DCT/inverse DCT coefficients are broadcast, which parallelizes the computation.
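Both parallelization schemes described above can be emulated sequentially to show the data flow. The Python sketch makes these illustrative assumptions:

* For motion detection, each candidate shift block occupies one processor's local memory, and the current macroblock is broadcast one pixel per step.
* For the DCT, each processor holds one 8 × 8 difference block while the controller broadcasts products of orthonormal DCT-II basis values, using a direct (non-separable) 2-D transform.

All names are hypothetical. The inner loops over processors run sequentially here but would run simultaneously in hardware.

```python
import math


def simd_motion_search(cur, ref, bx, by, radius=15, n=16):
    """One hypothetical processor per shift block; the macroblock is broadcast.

    Each processor's local memory holds one candidate block from `ref`; every
    broadcast pixel makes all processors accumulate |pixel - local|. The
    controller then reads each accumulator over the common bus and keeps the minimum.
    """
    h, w = len(ref), len(ref[0])
    offsets, local_mems = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if 0 <= x0 and x0 + n <= w and 0 <= y0 and y0 + n <= h:
                offsets.append((dx, dy))
                local_mems.append([ref[y0 + y][x0 + x] for y in range(n) for x in range(n)])
    acc = [0] * len(local_mems)
    for k in range(n * n):                       # one broadcast per pixel
        pixel = cur[by + k // n][bx + k % n]
        for i, mem in enumerate(local_mems):     # all processors, same instruction
            acc[i] += abs(pixel - mem[k])
    best = min(range(len(acc)), key=acc.__getitem__)  # comparison via common bus
    return offsets[best], acc[best]


N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2n))."""
    return [[(math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n))
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]


def simd_dct(blocks, inverse=False):
    """2-D DCT (or inverse) of every 8 x 8 block, driven by broadcast coefficients.

    The outer loops play the controller, issuing one coefficient per step; the
    innermost loop over `blocks` stands in for all processors doing the same
    multiply-accumulate on their own block.
    """
    c = dct_matrix()
    out = [[[0.0] * N for _ in range(N)] for _ in blocks]
    for u in range(N):
        for v in range(N):
            for x in range(N):
                for y in range(N):
                    # forward: Y[u][v] = sum C[u][x] C[v][y] X[x][y]
                    # inverse: X[u][v] = sum C[x][u] C[y][v] Y[x][y]
                    coeff = c[x][u] * c[y][v] if inverse else c[u][x] * c[v][y]
                    for b, blk in enumerate(blocks):
                        out[b][u][v] += coeff * blk[x][y]
    return out
```

With a ±15 radius and 16 × 16 blocks, `simd_motion_search` would occupy 961 processors for one macroblock. `simd_dct(..., inverse=True)` undoes the forward transform because the DCT-II basis matrix is orthonormal.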
し力 し、 上記の動き検出と DCT処理を 1つの S I MD型並列計算機のみで実 現する場合、 動きを検出したプロセッサの差分ブロック情報結果を DC T処理の ために他のプロセッサへ移動する必要が生じ、 全体の処理性能が低下する。 また、 動き検出ではプロセッサ内に差分絶対値演算器が必要であり、 D C Tでは乗算器 が必要となる。 したがって、 動き検出と DCTを 1つの S IMD型並列計算機で 処理した場合、 差分絶対値演算器と乗算器の両方をプロセッサに内蔵して構成す る必要があり、 全体のゲート規模が増加する。  However, if the above-mentioned motion detection and DCT processing are realized by only one SIMD type parallel computer, it is necessary to move the difference block information result of the processor that detected the motion to another processor for DCT processing. Occurs, and the overall processing performance decreases. In addition, motion detection requires a difference absolute value calculator in the processor, and DCT requires a multiplier. Therefore, if motion detection and DCT are processed by a single SIMD parallel computer, both the absolute difference calculator and the multiplier must be built into the processor, increasing the overall gate size.
Accordingly, an object of the present invention is to provide an image processing apparatus that has a small circuit configuration and operates at high speed in image processing such as MPEG image compression/decompression.
The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.
A brief outline of the representative inventions disclosed in the present application is as follows.
That is, the image processing apparatus according to the present invention connects two kinds of SIMD-type computer:
- A first SIMD-type computer, in which a plurality of processors, each including at least a difference unit and a local memory, operate on single instructions from a first control unit. It has a broadcast data path that transmits data from the first control unit to all processors.
- One or more second SIMD-type computers, in which a plurality of processors, each including at least a multiplier, operate on single instructions from a second control unit.

The first SIMD-type computer performs motion detection in image processing. The one or more second SIMD-type computers perform DCT, inverse DCT, quantization, or inverse quantization.
In the above configuration, the operation results of the first SIMD-type computer are transmitted in parallel, via buffers, to the processors in the second SIMD-type computer. Those processors each work in parallel in units of image blocks.
In addition, header information is attached to each block-unit data transfer. It indicates the attributes of the block, such as the block displacement vector and the I/P/B frame type. Each processor of the second SIMD-type computer reads this header information and can efficiently perform the processing appropriate for that block.
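A per-block header of this kind might be represented as follows. The header carries the attributes mentioned here and later in the description of the buffers: the motion vector, the I/P/B frame type, and whether a difference against the comparison image was taken. The sketch shows how a processor could read the header and branch. The field names (`motion_vector`, `frame_type`, `is_difference`) and the dispatch rule in `select_mode` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BlockHeader:
    motion_vector: Tuple[int, int]  # (dx, dy) displacement found by motion detection
    frame_type: str                 # "I", "P" or "B"
    is_difference: bool             # True if the block holds a difference vs. the comparison image


@dataclass
class BlockPacket:
    header: BlockHeader
    samples: List[int] = field(default_factory=lambda: [0] * 64)  # one 8x8 block, row-major


def select_mode(packet: BlockPacket) -> str:
    """Hypothetical dispatch rule a second-stage processor might apply."""
    if packet.header.frame_type == "I" or not packet.header.is_difference:
        return "intra"  # raw pixel data, e.g. handled with an intra quantizer
    return "inter"      # residual data, e.g. handled with a non-intra quantizer
```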
The effects obtained by representative aspects of the invention disclosed in the present application are briefly described below.
(1) In MPEG video compression, motion detection and DCT/quantization are each handled by a separate SIMD-type computer. Each computer can therefore be built with only the arithmetic units and local memory its processing minimally requires. Image compression is achieved with a number of processors matched to the performance each process needs.
(2) Motion detection, DCT/quantization, and so on run as a pipeline, one stage per process, so performance is improved.
(3) SIMD-type computers with a relatively small number of processors can perform image processing such as MPEG video compression/decompression in real time. As a result, image processing functions with higher pixel density than before can be realized in semiconductor integrated circuits (electronic components). These circuits can be mounted in low-power portable consumer electronics such as digital video cameras.
Brief Description of Drawings
FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the internal configuration of the SIMD-type computer 100 included in the image processing apparatus of that embodiment.
FIG. 3 is an explanatory diagram showing the motion detection procedure in the SIMD-type computer 100 of that apparatus.
FIG. 4 is an explanatory diagram showing the DCT/quantization procedure in the SIMD-type computer 200 of that apparatus.
FIG. 5 is an explanatory diagram showing the inverse DCT/inverse quantization procedure in the SIMD-type computer 300 of that apparatus.
FIG. 6 is a block diagram showing the configuration of a viewer system to which that apparatus is applied.
Best Mode for Carrying Out the Invention
Embodiments of the present invention are described in detail below with reference to the drawings. Throughout the drawings, identical members carry the same reference numerals, and their description is not repeated.
First, an example configuration of the image processing apparatus according to an embodiment of the present invention is described with reference to FIG. 1, which is a block diagram of that configuration. The image processing apparatus of this embodiment is, for example, an image compression system. It comprises:
- SIMD-type computers 100, 200, and 300;
- buffers 401 to 406, 501 to 506, and 601 to 606;
- and other components.
The SIMD-type computer 100 comprises a control unit 140 and a processor array 130, among other components. The processor array 130 consists of processors 101 to 116, each including an arithmetic unit (for example, a difference unit) and a local memory.
The SIMD-type computer 200 comprises a control unit 240 and a processor array 230, among other components. The processor array 230 consists of processors 201 to 206, each including an arithmetic unit (for example, a multiplier) and a local memory.
The SIMD-type computer 300 comprises a control unit 340 and a processor array 330, among other components. The processor array 330 consists of processors 301 to 306, each including an arithmetic unit (for example, a multiplier) and a local memory.
The three SIMD-type computers are linked through banks of buffers:
- Computers 100 and 200 are electrically connected via the buffers 401 to 406. The output of the control unit 140 in computer 100 is input in parallel to the buffers 401 to 406. Their outputs are input in parallel to the processors 201 to 206 in computer 200.
- Computers 200 and 300 are electrically connected via the buffers 501 to 506. The outputs of the processors 201 to 206 are input in parallel to the buffers 501 to 506. Their outputs are input in parallel to the processors 301 to 306 in computer 300.
- The outputs of the processors 301 to 306 are input to the buffers 601 to 606.

The outputs of the buffers 501 to 506 are also sent to AC/DC prediction and Huffman processing. The outputs of the buffers 601 to 606 are sent to compensated-image generation processing.
Within each SIMD-type computer, the processors and the control unit are electrically connected by buses:
- In SIMD-type computer 100, the processors 101 to 116 in the processor array 130 and the control unit 140 are connected by an instruction bus 150, a broadcast data bus 160, a processor data output common bus 170, and so on.
- In SIMD-type computer 200, the processors 201 to 206 and the control unit 240 are connected by an instruction bus 250 and so on.
- In SIMD-type computer 300, the processors 301 to 306 and the control unit 340 are connected by an instruction bus 350 and so on.
FIG. 1 shows SIMD-type computers 100 to 300 arranged in three stages, but two stages, or four or more, may be used. Likewise, SIMD-type computer 100 contains 16 processors (101 to 116), but any number may be used. The processors 201 to 206 and 301 to 306 in computers 200 and 300, and the buffers 401 to 406, 501 to 506, and 601 to 606, are each six in parallel, but any degree of parallelism may be used.
FIG. 2 shows the detailed configuration of SIMD-type computer 100. In addition to the control unit 140 and the processor array 130, it includes memory units 121 to 129 and other components. The local memories in the processors 101 to 116 and the memory units 121 to 129 are RAM.
Within the processor array 130, the processors 101 to 116 are arranged in a matrix. The local memory in each processor is connected to the neighboring processors above, below, left, and right, so that operation data can be shifted in any of these four directions.

The processors 104, 108, 112, 113, 114, 115, and 116 lie at the edges of the processor array 130. Their local memories are connected to the memory units 121 to 129 arranged around the array, so operation data can also be shifted to and from those memory units.

The control unit 140 is connected to the arithmetic units in all the processors 101 to 116 via the instruction bus 150 and the broadcast data bus 160. Instructions and data are output from the control unit 140 to all the processors over these buses. In the other direction, the arithmetic-unit outputs of all the processors are connected to the control unit 140 via tri-state buffers and the processor data output common bus 170. The operation data of each processor's arithmetic unit is output to the control unit 140 over this path.
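The neighbor-shift connections can be sketched as whole-array shift operations. In this hypothetical Python model:
- The array is a grid of values, one per processor's local memory.
- On each shift, the column or row entering at one edge is supplied by the surrounding memory units, and the one leaving the opposite edge is returned.

Which edge connects to which memory unit is an assumption; the text only says that the edge processors exchange data with the memory units 121 to 129. One possible use of such shifts, also an assumption here, is sliding the reference-image window across the array during motion search without reloading every local memory.

```python
from typing import List, Tuple

Grid = List[List[int]]


def shift_left(grid: Grid, incoming: List[int]) -> Tuple[Grid, List[int]]:
    """Every PE takes its right-hand neighbour's value; `incoming` (one value
    per row) enters the rightmost column from the edge memory units.
    Returns the new grid and the column shifted out of the left edge."""
    outgoing = [row[0] for row in grid]
    new_grid = [row[1:] + [value] for row, value in zip(grid, incoming)]
    return new_grid, outgoing


def shift_up(grid: Grid, incoming: List[int]) -> Tuple[Grid, List[int]]:
    """Every PE takes the value of the PE below it; `incoming` (one value per
    column) enters the bottom row from the edge memory units.
    Returns the new grid and the row shifted out of the top edge."""
    outgoing = list(grid[0])
    new_grid = [list(row) for row in grid[1:]] + [list(incoming)]
    return new_grid, outgoing
```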
Each memory unit 121 to 129 is connected to its adjacent memory units, so data can be shifted between memory units. The memory units 121 to 129 are also connected to the control unit 140 via a memory common bus 180.
The control unit 140 is also connected to external control (the main CPU) and to external memory (image data).
Next, the operation of the image processing apparatus of this embodiment is described with reference to FIG. 1. First, SIMD-type computer 100 performs motion detection. The results are the difference information and motion vector information for each block. Computer 100 outputs them, block by block, to the buffers 401 to 406, and then proceeds to motion detection for the next macroblock.
In SIMD-type computer 200, the processors 201 to 206 perform the DCT in parallel, as follows.
1. They fetch the per-block difference information from the buffers 401 to 406 and perform the DCT.
2. Based on the DCT results, they perform quantization in parallel.
3. When both steps are complete for each block, they output the results, together with the motion vector information, to the buffers 501 to 506 in parallel.

The motion vector information and quantized data of each block in the buffers 501 to 506 then undergo AC/DC prediction and Huffman coding and are output as compressed data.
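The per-block arithmetic of each processor 201 to 206 can be sketched as an 8 × 8 two-dimensional DCT-II followed by quantization. This direct, unoptimized form shows why the second-stage processors need multipliers: every coefficient is a sum of products.

Two simplifications apply:
- The orthonormal scaling is an assumption.
- The single uniform step size `qstep` is a simplification; MPEG quantization uses a weighting matrix and a quantizer scale.

```python
import math
from typing import List

N = 8  # block size


def _c(k: int) -> float:
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 8x8 DCT-II; block[y][x] -> coeffs[u][v] (u vertical, v horizontal)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    # Multiply-accumulate: the operation the multipliers serve.
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N)))
            out[u][v] = _c(u) * _c(v) * s
    return out


def quantize(coeffs: List[List[float]], qstep: float) -> List[List[int]]:
    """Uniform quantization with one step size (simplified, see lead-in)."""
    return [[int(round(cf / qstep)) for cf in row] for row in coeffs]
```

A constant block of value 10 has all its energy in the DC coefficient, which equals 8 × 10 = 80 under this scaling. With `qstep = 16` it quantizes to 5, and every AC level is 0.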
The motion vector information and quantized data of each block in the buffers 501 to 506 are also sent to SIMD-type computer 300 for compensated-image generation. There, the processors 301 to 306 work as follows.
1. They perform inverse quantization on the blocks in parallel.
2. Based on those results, they perform the inverse DCT in parallel.
3. When both steps are complete for each block, they output the results, together with the motion vector information, to the buffers 601 to 606 in parallel.

The motion vector information and inverse-DCT data of each block in the buffers 601 to 606 are then used for compensated-image generation.
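The matching inverse path in SIMD-type computer 300, inverse quantization followed by an 8 × 8 inverse DCT, can be sketched the same way. Again, the uniform step size `qstep` is a simplification. The orthonormal scaling is an assumption chosen so that this inverse exactly undoes an orthonormal forward DCT-II.

```python
import math
from typing import List

N = 8  # block size


def _c(k: int) -> float:
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dequantize(levels: List[List[int]], qstep: float) -> List[List[float]]:
    """Inverse of a uniform quantizer with one step size (simplified)."""
    return [[lv * qstep for lv in row] for row in levels]


def idct_2d(coeffs: List[List[float]]) -> List[List[float]]:
    """Orthonormal 8x8 inverse DCT (DCT-III); coeffs[u][v] -> block[y][x]."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (_c(u) * _c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N)))
            out[y][x] = s
    return out
```

For example, a block whose only nonzero level is a DC level of 5, dequantized with `qstep = 16`, gives a DC coefficient of 80. That reconstructs to a flat 8 × 8 block of value 10.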
In SIMD-type computers 100, 200, and 300, each process (motion detection, DCT, quantization, inverse quantization, inverse DCT) hands its results off through the buffers 401 to 406, 501 to 506, and 601 to 606. Pipelined parallel processing is therefore possible, and overall processing is faster.
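The buffered hand-off between stages behaves like a producer-consumer pipeline. The Python sketch below is a hypothetical software model, not the hardware:
- Each stage runs in its own thread.
- Bounded `queue.Queue` objects stand in for the buffer banks.
- A `None` sentinel shuts the pipeline down.

While stage 2 works on block k, stage 1 can already work on block k + 1, which is the kind of overlap the text credits for the speed-up. The default queue depth of 6 (one macroblock's six blocks) is an assumption.

```python
import queue
import threading
from typing import Callable, Iterable, List


def _stage(work: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply `work` to each item until the None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def _feed(items: Iterable, q: queue.Queue) -> None:
    """Producer: push every input item, then the shutdown sentinel."""
    for item in items:
        q.put(item)
    q.put(None)


def run_pipeline(items: Iterable, stages: List[Callable], depth: int = 6) -> list:
    """Run items through `stages` concurrently; FIFO queues preserve order."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_stage, args=(work, queues[i], queues[i + 1]))
               for i, work in enumerate(stages)]
    threads.append(threading.Thread(target=_feed, args=(items, queues[0])))
    for t in threads:
        t.start()
    results = []
    while (item := queues[-1].get()) is not None:  # drain the last buffer
        results.append(item)
    for t in threads:
        t.join()
    return results
```

As a usage illustration, three stage functions standing for motion detection, DCT/quantization, and inverse quantization/inverse DCT would be passed as the `stages` list. Any such function names are placeholders.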
In this embodiment, SIMD-type computer 200 performs DCT and quantization, and SIMD-type computer 300 performs inverse quantization and inverse DCT. Alternatively, a single SIMD-type computer may perform all four, or three or more SIMD-type computers may share them.
The buffers 401 to 406, 501 to 506, and 601 to 606 can hold more than operation results and block vector information. They can also hold per-block attributes, such as whether difference processing against the comparison image was performed. Each processor can examine that information and carry out different operations accordingly.
Next, the motion detection procedure in SIMD-type computer 100 is described with reference to FIG. 3.
- FIG. 3(a) shows the order in which motion detection proceeds, macroblock by macroblock, over the whole image.
- FIG. 3(b) shows the processing flow for each macroblock.
As shown in FIG. 3(a), the whole image (current image) is divided into macroblocks of 16 × 16 pixels, which are processed one at a time.
As shown in FIG. 3(b), a macroblock consists of luminance blocks (Y0, Y1, Y2, Y3) and color-difference blocks (U, V). Each of Y0, Y1, Y2, Y3, U, and V consists of 8 × 8 samples. After the image is divided into macroblocks, motion detection is performed for each macroblock by detecting the difference from the comparison image.
The information produced by motion detection against the comparison image therefore has two parts:
- difference-value information (Y0', Y1', Y2', Y3', U', V') for the blocks Y0, Y1, Y2, Y3, U, and V;
- motion vector information.
This block information is output to the buffers 401 to 406.
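The partitioning described for FIG. 3 can be sketched as follows. The code enumerates 16 × 16 macroblocks in raster order and cuts each into the six 8 × 8 blocks Y0 to Y3, U, and V.

Two points are assumptions, not statements from the source:
- The chroma planes are 4:2:0, at half resolution horizontally and vertically, so that one 8 × 8 U block and one 8 × 8 V block cover a macroblock.
- Y0 to Y3 follow the usual MPEG order of top-left, top-right, bottom-left, bottom-right.

```python
from typing import List, Tuple

MB = 16   # macroblock size (luma samples)
BLK = 8   # block size

Plane = List[List[int]]


def macroblock_origins(width: int, height: int) -> List[Tuple[int, int]]:
    """Top-left corners of the macroblocks in raster order (left to right, top to bottom)."""
    return [(x, y) for y in range(0, height, MB) for x in range(0, width, MB)]


def _cut(plane: Plane, x0: int, y0: int) -> Plane:
    """Copy the 8x8 block whose top-left sample is (x0, y0)."""
    return [row[x0:x0 + BLK] for row in plane[y0:y0 + BLK]]


def split_macroblock(y_plane: Plane, u_plane: Plane, v_plane: Plane,
                     mx: int, my: int) -> List[Plane]:
    """Return [Y0, Y1, Y2, Y3, U, V] for the macroblock at luma position (mx, my)."""
    cx, cy = mx // 2, my // 2  # co-sited position in the half-resolution chroma planes
    return [
        _cut(y_plane, mx, my),              # Y0: top-left
        _cut(y_plane, mx + BLK, my),        # Y1: top-right
        _cut(y_plane, mx, my + BLK),        # Y2: bottom-left
        _cut(y_plane, mx + BLK, my + BLK),  # Y3: bottom-right
        _cut(u_plane, cx, cy),              # U
        _cut(v_plane, cx, cy),              # V
    ]
```

For a VGA frame, `macroblock_origins(640, 480)` yields 1200 macroblocks. Each produces six blocks, matching the six parallel buffers and processors in FIG. 1.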
Next, the DCT and quantization procedure in SIMD-type computer 200 is described with reference to FIG. 4.
The difference-value information (Y0', Y1', Y2', Y3', U', V') of the blocks Y0, Y1, Y2, Y3, U, and V, together with the motion vector information, is input in parallel from the buffers 401 to 406 to the processors 201 to 206. The processors work on it in parallel and output the results in parallel to the buffers 501 to 506.
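Distributing one macroblock's six blocks across the six processors can be mimicked with a thread pool. There is one worker per block, and all workers run the same kernel. This is a hypothetical software analogue: `process_macroblock` and the dictionary layout of its results are illustrative. As the text describes for the buffers, the shared motion vector travels with each block result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

BLOCK_NAMES = ["Y0", "Y1", "Y2", "Y3", "U", "V"]


def process_macroblock(blocks: List[Any], motion_vector: Tuple[int, int],
                       kernel: Callable[[Any], Any]) -> List[Dict[str, Any]]:
    """Apply `kernel` to all six blocks at once, one worker per block,
    loosely mirroring processors 201-206 each reading one of buffers 401-406."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        results = list(pool.map(kernel, blocks))  # map() keeps input order
    # Each output slot carries its block result plus the shared motion vector.
    return [{"block": name, "data": r, "mv": motion_vector}
            for name, r in zip(BLOCK_NAMES, results)]
```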
Next, the inverse quantization and inverse DCT procedure in SIMD-type computer 300 is described with reference to FIG. 5.
The processing results and motion vector information of the blocks Y0, Y1, Y2, Y3, U, and V are input in parallel from the buffers 501 to 506 to the processors 301 to 306. The processors work on them in parallel and output the results in parallel to the buffers 601 to 606.
Next, an application example of the image processing apparatus of this embodiment is described with reference to FIG. 6. FIG. 6 shows the apparatus applied to a viewer system. The system comprises, for example:
- the image processing apparatus 700 of this embodiment;
- an AC/DC prediction and Huffman unit 701;
- an image memory 702;
- a display circuit 703;
- a monitor 704;
- a ROM 705;
- a RAM 706;
- a CPU 707;
- an I/F (interface) circuit 708;
- and other components.
The image processing apparatus 700 is connected to the image memory 702 and the AC/DC prediction and Huffman unit 701. The image memory 702 is connected to the display circuit 703, which is connected to the monitor 704. The AC/DC prediction and Huffman unit 701, the ROM 705, the RAM 706, the CPU 707, and the I/F circuit 708 are connected to one another via a bus. The I/F circuit 708 is connected to a memory card 709.
This system displays MPEG images captured with a digital movie camera or similar device on a monitor or TV.
Because this system only performs MPEG decompression, it uses only SIMD-type computer 300 from the image processing apparatus of FIG. 1, which handles inverse quantization and inverse DCT. Motion detection, DCT, quantization, inverse quantization, and inverse DCT are each handled by separate SIMD-type computers. The apparatus can therefore be built from only the SIMD-type computers actually needed, which allows a smaller size and lower power consumption.
In summary, the image processing apparatus of the above embodiment handles MPEG video compression with separate SIMD-type computers for motion detection and for DCT/quantization. Each computer can be built with only the arithmetic units and local memory its processing needs, and image compression is achieved with a number of processors matched to the performance each process requires. In addition, motion detection, DCT/quantization, and so on run as a pipeline, one stage per process, which improves performance.
Furthermore, SIMD-type computers with a relatively small number of processors can perform image processing such as MPEG video compression/decompression in real time. Image processing functions with higher pixel density than before can therefore be realized in a semiconductor integrated circuit device (electronic component). Such a device can be mounted in low-power portable consumer electronics such as digital video cameras.
The invention made by the present inventors has been described specifically on the basis of an embodiment. However, the present invention is not limited to that embodiment, and various modifications can of course be made without departing from its gist.
For example, the above embodiment describes MPEG video compression/decompression, but the invention is not limited to MPEG and can also be applied to other image compression/decompression schemes.
The above description has mainly dealt with image processing, the technical field to which the invention belongs. However, the invention is not limited to this field. It can also be applied, for example, to electronic equipment in general that executes computational algorithms involving matrix operations, such as other image processing and audio processing.
Industrial Applicability
As described above, the image processing apparatus according to the present invention is suitable for electronic equipment that performs video compression/decompression, such as digital video cameras, video decks, and information terminals. It can also be applied to electronic equipment in general that executes computational algorithms involving matrix operations, such as other image processing and audio processing.

Claims

1. An image processing apparatus comprising:
a first SIMD-type computer having a plurality of first processors each including a difference unit and a local memory, a first control unit that controls the first processors, and a broadcast data path that transmits data from the first control unit to all of the first processors; and
one or more second SIMD-type computers each having a plurality of second processors each including a multiplier, and a second control unit that controls the second processors,
wherein, in the first SIMD-type computer, the plurality of first processors operate in parallel on single instructions from the first control unit and perform motion detection processing in image processing, and
in the second SIMD-type computer, the plurality of second processors operate in parallel on single instructions from the second control unit and perform discrete cosine transform, inverse discrete cosine transform, quantization, or inverse quantization processing in image processing.
2. The image processing apparatus according to claim 1, wherein
the first SIMD-type computer and the second SIMD-type computer are connected via a plurality of buffers, and the first SIMD-type computer and the second SIMD-type computer each perform pipelined parallel processing.
3. The image processing apparatus according to claim 2, wherein
when there are a plurality of second SIMD-type computers, the second SIMD-type computers are connected to one another via respective pluralities of buffers, and the first SIMD-type computer and the plurality of second SIMD-type computers each perform pipelined parallel processing.
4. The image processing apparatus according to claim 2, wherein
the plurality of buffers transfer the operation results of the first SIMD-type computer in parallel to the plurality of second processors in the second SIMD-type computer.
5. The image processing apparatus according to claim 3, wherein
the plurality of buffers transfer the operation results of a preceding second SIMD-type computer in parallel to the second processors in a succeeding second SIMD-type computer.
6. The image processing apparatus according to any one of claims 1 to 5, wherein
header information indicating the attributes of a block is attached, for each block-unit data transfer, to the data transferred between the first SIMD-type computer and the one or more second SIMD-type computers.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2003/010977 WO2005025230A1 (en) 2003-08-28 2003-08-28 Image processing device
JP2005508742A JP4516020B2 (en) 2003-08-28 2003-08-28 Image processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2003/010977 WO2005025230A1 (en) 2003-08-28 2003-08-28 Image processing device

Publications (1)

Publication Number Publication Date
WO2005025230A1 true WO2005025230A1 (en) 2005-03-17

Family

ID=34260082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2003/010977 WO2005025230A1 (en) 2003-08-28 2003-08-28 Image processing device

Country Status (2)

Country Link
JP (1) JP4516020B2 (en)
WO (1) WO2005025230A1 (en)


Also Published As

Publication number Publication date
JP4516020B2 (en) 2010-08-04
JPWO2005025230A1 (en) 2006-11-16

Similar Documents

Publication Publication Date Title
US8369419B2 (en) Systems and methods of video compression deblocking
US20100321579A1 (en) Front End Processor with Extendable Data Path
JP4704333B2 (en) Image encoding device, image decoding device, and integrated circuit used in the same
JP3401823B2 (en) Processor for image codec
JPH05300494A (en) Moving image coder and control system therefor
US8462848B2 (en) Method and system for intra-mode selection without using reconstructed data
Jilani et al. JPEG image compression using FPGA with Artificial Neural Networks
WO2005025230A1 (en) Image processing device
US7330595B2 (en) System and method for video data compression
CN109117945B (en) Processor and processing method thereof, chip packaging structure and electronic device
US7756351B2 (en) Low power, high performance transform coprocessor for video compression
Asbun et al. Real-time error concealment in digital video streams using digital signal processors
Baldev et al. A directional and scalable streaming deblocking filter hardware architecture for HEVC decoder
Gupta et al. An efficient modified lifting based 2-D discrete wavelet transform architecture
Vece et al. PK tool 2.0: a SystemC environment for high level power estimation
Barina et al. Single-loop approach to 2-D wavelet lifting with JPEG 2000 compatibility
Siddiqui et al. Investigation of a novel common subexpression elimination method for low power and area efficient DCT architecture
US9819951B2 (en) Image processing method, devices and system
US6374280B1 (en) Computationally efficient inverse discrete cosine transform method and apparatus
JP3653799B2 (en) Image encoding device
Furht Processor architectures for multimedia
CN114531600A (en) Transformation unit, field programmable gate array, chip, electronic device and system on chip
Ryuta Exploration of Divided DCNN for Object Recognition using both Cloud and Edge Computing
JP4476032B2 (en) Image compression apparatus, image expansion apparatus, and image processing apparatus
TW202244719A (en) Image transmission method and image transmission system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR SG US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2005508742

Country of ref document: JP

122 Ep: pct application non-entry in european phase