WO2005025230A1

WO2005025230A1 - Image processing device

Info

Publication number: WO2005025230A1
Application number: PCT/JP2003/010977
Authority: WO
Inventors: Hiroshi Takayanagi; Nobuhiro Seki; Osamu Mouri; Akihiro Makino; Masahiro Miura
Original assignee: Hitachi Ulsi Systems Co., Ltd.
Priority date: 2003-08-28
Filing date: 2003-08-28
Publication date: 2005-03-17
Also published as: JP4516020B2; JPWO2005025230A1

Abstract

A first SIMD computer (100) includes a plurality of processors (101 to 116), each having at least a differential calculator and a local memory and operated by a single instruction from a first control unit (140), and a broadcast bus (160) for transmitting data from the first control unit (140) to all the processors (101 to 116). The first SIMD computer (100) is connected to one or more second SIMD computers (200), each including a plurality of processors (201 to 206) that have at least a multiplier and are operated by a single instruction from a second control unit (240). The first SIMD computer (100) executes motion detection in the image processing, and the one or more second SIMD computers (200) execute DCT, inverse DCT, quantization, or de-quantization in the image processing.

Description

Technical field

The present invention relates to an image processing apparatus, and more particularly to a technique that is effective when applied to MPEG image compression / decompression.

Background art

As a technique studied by the present inventor, the following can be considered, for example, with regard to MPEG (Moving Picture Experts Group) compression and decompression.

In video compression standards such as ISO/IEC 14496-2 (MPEG-4), ISO/IEC 13818-2 (MPEG-2), and ISO/IEC 11172-2 (MPEG-1), a digital image is divided into blocks, and the image data is compressed by performing motion vector detection, discrete cosine transform (DCT), quantization, AC/DC prediction, and Huffman coding for each block.

Of these, motion vector detection takes a macroblock of 16 x 16 pixels in the current image frame and, within a search range in the previous or next image frame, detects the position of the 16 x 16 pixel block (shift block) whose difference from the macroblock is smallest. Then, by using that position vector and the difference (frame difference) to perform DCT, quantization, AC/DC prediction, and Huffman coding, a high degree of moving image compression becomes possible.
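The search described above can be sketched as a full-search block match. This is only an illustration: the sum of absolute differences (SAD) is assumed as the difference measure, and the function names `sad`, `crop`, and `find_motion_vector` are hypothetical, not taken from the specification.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(frame, top, left, size):
    """Return the size x size block of `frame` whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_motion_vector(current, reference, top, left, size=16, search=15):
    """Try every shift in [-search, +search] and keep the one with the smallest SAD."""
    target = crop(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # this shift block would fall outside the reference frame
            cost = sad(target, crop(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best  # (minimum SAD, vertical shift, horizontal shift)
```

The returned shift corresponds to the position vector, and the per-pixel difference at that shift corresponds to the frame difference passed on to the DCT stage.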

Further, the expansion of the compressed data is realized by the reverse of the above compression procedure, that is, by Huffman decoding, AC/DC prediction, inverse quantization, inverse DCT, and generation of a compensation image from the motion vector information.

Also, image processing including video compression involves many iterations of relatively simple calculation algorithms and has high data parallelism for the same instruction. Therefore, to speed up image processing, a SIMD (Single Instruction, Multiple Data stream) parallel computing method is suitable in many cases.

One example of an implementation of the SIMD-type parallel computing architecture is a general-purpose neurocomputer. The architecture of this computer consists of multiple processors and one control system, and the control system operates by broadcasting common instructions and data to all processors. Each processor has a local memory and an arithmetic unit (multiplier, ALU, shifter, etc.). The control system has rewritable program memory and global memory. A broadcast data bus for transmitting data from the control system to all processors, and a common bus through which any one processor specified by an address signal transmits data to the control system via a tri-state buffer, are used to transmit and receive data between the control system and the processors.

The data that the control system transmits to all processors via the broadcast data bus can be selected from either data in the control system's memory or data that the control system has received from any one processor via the common bus.

An instruction from the control system can also be executed by only the one processor specified by the address signal. As a result, the control system can initialize the local memory of each processor, and 1-to-N (N is an arbitrary natural number) data transmission between the control system and the processors is possible.

In a general-purpose neurocomputer, each of the multiple processors controlled by a single instruction broadcast from the control system is a unit that mimics a neuron (nerve cell). The operation of a neural network is imitated by having each processor operate on the weight value data in its local memory and the input data broadcast to all processors via the control system. In other words, by rewriting the control system program, general-purpose parallel computation becomes possible for neuro-algorithms, as typified by back propagation (error back propagation), and also for non-neuro-algorithms.

When the general-purpose neurocomputer is applied to motion vector detection in image processing, the plurality of processors can be used as calculation units for the shift blocks. That is, the shift blocks (a plurality of 16 x 16 pixel blocks) in the motion vector detection range of the image one frame before are initialized in the local memories of the plurality of processors. The motion vector detection image (a macroblock of 16 x 16 pixels) of the current frame is broadcast from the control system to all processors. Each processor calculates the frame difference between the broadcast data and the data in its local memory. By comparing the frame differences of all processors, the control system can detect the image position of the shift block set in the processor with the minimum difference as the motion vector position.

Disclosure of the invention

By the way, as a result of the present inventor's study on the above-mentioned technique of MPEG image compression / expansion, the following has become clear.

The amount of calculation required for motion vector detection and for DCT / inverse DCT in MPEG image compression / decompression is extremely large. In motion vector detection, the simplest “motion-compensated inter-frame coding” uses as its detection range the shift blocks within 15 pixels around the macroblock position in the image one frame before, so there are 961 shift blocks to be compared with the macroblock of the current frame. Therefore, the amount of calculation for VGA (image size: 640 x 480 pixels) is more than 278 mega-operations, which amounts to 8 giga-operations per second at 30 frames per second. Also, DCT and inverse DCT require 0.8 giga-operations per second for VGA size at 30 frames per second. When the above processing is performed by a single arithmetic unit of a semiconductor integrated circuit device, an operating frequency on the order of 10 GHz is required, and it is very difficult to implement this directly in a low-power portable home appliance such as a digital video camera.
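As a rough check on the order of magnitude, the count of shift positions and the resulting workload can be recomputed as follows. This sketch assumes one difference operation per pixel for every shift position and ignores frame edges; the counting method behind the figures quoted in the text is not stated, so the totals below are only expected to agree in order of magnitude.

```python
SEARCH_RANGE = 15          # pixels searched in each direction
MACROBLOCK = 16            # macroblock edge length in pixels
WIDTH, HEIGHT = 640, 480   # VGA
FPS = 30

# Shifts from -15 to +15 in both directions give 31 x 31 positions.
candidates = (2 * SEARCH_RANGE + 1) ** 2
macroblocks = (WIDTH // MACROBLOCK) * (HEIGHT // MACROBLOCK)

# Assumption: one difference operation per pixel per shift position.
ops_per_frame = macroblocks * candidates * MACROBLOCK * MACROBLOCK
ops_per_second = ops_per_frame * FPS

print(candidates)       # 961
print(macroblocks)      # 1200
print(ops_per_frame)    # 295219200
print(ops_per_second)   # 8856576000
```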

In addition, when motion detection is performed using the architecture of the general-purpose neurocomputer, parallel operation can be realized in each processor by storing the comparative image information for a macroblock in the local memory of each processor, broadcasting the macroblock information, and performing a difference operation against the information in the local memory. In the DCT / inverse DCT processing, parallel processing can be realized by storing the difference information for each block in the local memory of each processor and broadcasting the DCT / inverse DCT coefficients.
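The broadcast-and-compare scheme for motion detection can be modelled in software as below. This is a sequential sketch of what the hardware does in parallel; the class `Processor`, the function `detect_motion`, and the use of a running absolute-difference accumulator are assumptions made for illustration.

```python
class Processor:
    """One processing element: local memory plus an absolute-difference accumulator."""

    def __init__(self, shift, local_block):
        self.shift = shift          # (dy, dx) of the shift block held in local memory
        self.local = local_block    # flat list of pixels
        self.acc = 0

    def step(self, index, broadcast_pixel):
        # Every processor executes this same instruction on its own local data.
        self.acc += abs(self.local[index] - broadcast_pixel)


def detect_motion(macroblock, shift_blocks):
    """macroblock: flat pixel list; shift_blocks: {(dy, dx): flat pixel list}."""
    array = [Processor(shift, block) for shift, block in shift_blocks.items()]
    for index, pixel in enumerate(macroblock):   # control unit broadcasts one pixel
        for pe in array:                         # simultaneous in real hardware
            pe.step(index, pixel)
    best = min(array, key=lambda pe: pe.acc)     # control unit compares the results
    return best.shift, best.acc
```

For example, `detect_motion([10, 20, 30, 40], {(0, 0): [0, 0, 0, 0], (0, 1): [10, 21, 30, 38]})` selects the shift `(0, 1)`, whose accumulated difference is 3.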

However, if the above-mentioned motion detection and DCT processing are realized by only one SIMD-type parallel computer, the difference block information obtained by the processor that detected the motion must be moved to another processor for DCT processing, and the overall processing performance decreases. In addition, motion detection requires an absolute difference calculator in the processor, and DCT requires a multiplier. Therefore, if motion detection and DCT are processed by a single SIMD-type parallel computer, both the absolute difference calculator and the multiplier must be built into each processor, increasing the overall gate count.

Therefore, an object of the present invention is to provide an image processing apparatus which has a small circuit configuration and can be operated at high speed in image processing such as MPEG image compression / expansion.

The above and other objects and novel features of the present invention will become apparent from the description of the present specification and the accompanying drawings.

The outline of typical inventions disclosed in the present application will be briefly described as follows.

That is, in the image processing apparatus according to the present invention, a first SIMD-type computer, in which a plurality of processors each including at least a difference calculator and a local memory operate with a single instruction from a first control unit and which has a broadcast data bus for transmitting data from the first control unit to all the processors, is connected to one or more second SIMD-type computers, in which a plurality of processors each including at least a multiplier operate with a single instruction from a second control unit. The first SIMD-type computer performs motion detection processing in image processing, and the one or more second SIMD-type computers perform DCT, inverse DCT, quantization, or inverse quantization processing in image processing.

Further, in the above configuration, the operation results of the first SIMD-type computer are transmitted in parallel, via buffers, to the processors in the second SIMD-type computer, and those processors perform the processing in parallel for each image block.

Also, by adding header information indicating the attributes of the block (block shift vector information, I, P, or B frame information, etc.) to each data transmission in block units, each processor of the second SIMD-type computer can examine the header information and efficiently perform the processing suited to that block.

The effects obtained by typical aspects of the invention disclosed in the present application will be briefly described as follows.

(1) In MPEG video compression, motion detection processing and DCT / quantization processing are performed by separate SIMD-type computers, so image compression processing can be realized with the number of processors corresponding to the performance required for each processing.

(2) Performance can be improved because motion detection and DCT / quantization can be performed by pipeline operation for each process.

(3) A SIMD-type computer configuration with a relatively small number of processors enables real-time image processing such as video compression and decompression, as represented by MPEG. Semiconductor integrated circuits (electronic components) that can be mounted on low-power-consumption portable home appliances can therefore realize image processing functions with a higher pixel density than before.

Brief Description of Drawings

FIG. 1 is a block diagram illustrating a configuration of an image processing apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing an internal configuration of the SIMD type computer 100 included in the image processing apparatus according to one embodiment of the present invention.

FIG. 3 is an explanatory diagram showing a motion detection processing procedure in the SIMD computer 100 in the image processing apparatus according to the embodiment of the present invention.

FIG. 4 is an explanatory diagram showing a DCT / quantization processing procedure in the SIMD type computer 200 in the image processing apparatus according to the embodiment of the present invention.

FIG. 5 is an explanatory diagram showing an inverse DCT / inverse quantization procedure in the SIMD type computer 300 in the image processing apparatus according to the embodiment of the present invention.

FIG. 6 is a block diagram showing a configuration of a viewer system to which the image processing device according to one embodiment of the present invention is applied.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In all the drawings for describing the embodiments, the same members are denoted by the same reference numerals, and the description thereof will not be repeated.

First, an example of the configuration of an image processing apparatus according to an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram illustrating a configuration of an image processing apparatus according to an embodiment of the present invention. The image processing apparatus according to the present embodiment is, for example, an image compression system, and includes a SIMD-type computer 100, a SIMD-type computer 200, a SIMD-type computer 300, buffers 401 to 406, 501 to 506, and 601 to 606, and the like.

The SIMD-type computer 100 is composed of a processor array 130, which consists of a plurality of processors 101 to 116 each including an arithmetic unit (for example, a difference arithmetic unit) and a local memory, a control unit 140, and the like.

The SIMD-type computer 200 is composed of a processor array 230, which consists of a plurality of processors 201 to 206 each including an arithmetic unit (for example, a multiplier) and a local memory, and a control unit 240.

The SIMD-type computer 300 is composed of a processor array 330, which consists of a plurality of processors 301 to 306 each including an arithmetic unit (for example, a multiplier) and a local memory, and a control unit 340.

The SIMD-type computer 100 and the SIMD-type computer 200 are electrically connected to each other via a plurality of buffers 401 to 406. The output of the control unit 140 in the SIMD-type computer 100 is input in parallel to the buffers 401 to 406, and the outputs of the buffers 401 to 406 are input in parallel to the processors 201 to 206 in the SIMD-type computer 200. The SIMD-type computer 200 and the SIMD-type computer 300 are electrically connected to each other via a plurality of buffers 501 to 506. The outputs of the processors 201 to 206 are input in parallel to the buffers 501 to 506, and the outputs of the buffers 501 to 506 are input in parallel to the processors 301 to 306 in the SIMD-type computer 300. The outputs of the processors 301 to 306 in the SIMD-type computer 300 are input to a plurality of buffers 601 to 606. The outputs of the buffers 501 to 506 are also transmitted to the AC/DC prediction and Huffman processing, and the outputs of the buffers 601 to 606 are transmitted to the compensation image generation processing.

In the SIMD-type computer 100, the processors 101 to 116 in the processor array 130 and the control unit 140 are electrically connected by an instruction bus 150, a broadcast data bus 160, a processor data output common bus 170, and the like. In the SIMD-type computer 200, the processors 201 to 206 and the control unit 240 are electrically connected by an instruction bus 250 and the like. In the SIMD-type computer 300, the processors 301 to 306 and the control unit 340 are electrically connected by an instruction bus 350 and the like.

In FIG. 1, the SIMD-type computers 100 to 300 form three stages, but there may be two stages or four or more stages. The number of processors 101 to 116 in the SIMD-type computer 100 is 16, but any number is acceptable. The processors 201 to 206 and 301 to 306 in the SIMD-type computers 200 and 300 and the buffers 401 to 406, 501 to 506, and 601 to 606 are each arranged six in parallel, but any degree of parallelism may be used.

FIG. 2 shows a detailed configuration of the SIMD-type computer 100. The SIMD-type computer 100 includes a plurality of memory units 121 to 129 in addition to the control unit 140 and the processor array 130. The local memories in the processors 101 to 116 and the memory units 121 to 129 are composed of RAM.

Within the processor array 130, the processors 101 to 116 are arranged in a matrix, and the local memory in each of the processors 101 to 116 is connected to the other processors above, below, to the left, and to the right, so that operation data can be shifted up, down, left, and right. The local memories of the processors 104, 108, 112, 113, 114, 115, and 116 located at the edge of the processor array 130 are connected to the memory units 121 to 129 arranged around the processor array 130, so that operation data can be shifted to and from the memory units 121 to 129.

The control unit 140 and the arithmetic units in all the processors 101 to 116 are connected by the instruction bus 150 and the broadcast data bus 160, and instructions and data are output from the control unit 140 to all the processors 101 to 116. The outputs of the arithmetic units in all the processors 101 to 116 are connected to the control unit 140 via tri-state buffers and the processor data output common bus 170, and the operation data of the arithmetic unit in each of the processors 101 to 116 is output to the control unit 140.

Each of the memory units 121 to 129 is connected to the adjacent memory units, so that data can be shifted between the memory units. Further, each of the memory units 121 to 129 and the control unit 140 are connected via a memory common bus 180.

Further, the control unit 140 is connected to an external control (main CPU) and an external memory (image data).

Next, the operation of the image processing apparatus according to the present embodiment will be described with reference to FIG. 1. First, the SIMD-type computer 100 performs motion detection processing in image processing. Then, the SIMD-type computer 100 outputs the difference information and the motion vector information for each block, which are the result of the motion detection processing, to the buffers 401 to 406 in units of blocks. After outputting the difference information and the motion vector information to the buffers 401 to 406, the SIMD-type computer 100 performs the motion detection processing for the next macroblock.

In the SIMD-type computer 200, the processors 201 to 206 perform the DCT operation in image processing in parallel. At this time, the processors 201 to 206 take in the difference information for each block from the buffers 401 to 406 and perform the DCT operation. Next, based on the result of that operation, the processors 201 to 206 perform the quantization processing in parallel. After the DCT operation and quantization processing of each block are completed, the processors 201 to 206 output the processing results, together with the motion vector information, to the buffers 501 to 506 in parallel. The motion vector information of each block in the buffers 501 to 506 and the quantized data are subjected to AC/DC prediction processing and Huffman processing, and are output as compressed data.
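The per-block work of this stage can be sketched as follows. This is an unoptimized illustration, not the arithmetic actually used by the processors 201 to 206: it assumes the orthonormal 8 x 8 DCT-II and a single uniform quantization step, and the names `dct_2d`, `quantize`, and `process_blocks` are hypothetical.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct (unoptimized) orthonormal 2-D DCT-II of an N x N block."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization (assumption: one step size for all coefficients)."""
    return [[round(value / step) for value in row] for row in coeffs]


def process_blocks(blocks, step=16):
    """Same routine applied to every block, as the six processors would do in parallel."""
    return [quantize(dct_2d(block), step) for block in blocks]
```

A real encoder would use a quantization matrix rather than a single step; the uniform step keeps the sketch short.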

The motion vector information of each block in the buffers 501 to 506 and the quantized data are also output to the SIMD-type computer 300 in order to generate a compensation image. In the SIMD-type computer 300, the processors 301 to 306 perform inverse quantization on each block in parallel. Next, based on the result, the processors 301 to 306 perform the inverse DCT operation in parallel. After the inverse quantization processing and the inverse DCT operation of each block are completed, the processors 301 to 306 output the processing results, together with the motion vector information, to the buffers 601 to 606 in parallel. The motion vector information of each block in the buffers 601 to 606 and the data after the inverse DCT operation are used for the compensation image generation processing.
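The reverse path can be sketched in the same style. Again this is an illustration under assumptions (orthonormal 8 x 8 inverse DCT, a single uniform quantization step, hypothetical names `dequantize`, `idct_2d`, and `reconstruct_blocks`), not the arithmetic actually used by the processors 301 to 306.

```python
import math

N = 8  # block size


def dequantize(levels, step=16):
    """Inverse of uniform quantization (assumption: one step size for all coefficients)."""
    return [[level * step for level in row] for row in levels]


def idct_2d(coeffs):
    """Direct (unoptimized) orthonormal 2-D inverse DCT of an N x N block."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[y][x] = total
    return out


def reconstruct_blocks(quantized_blocks, step=16):
    """Same routine applied to every block, as the six processors would do in parallel."""
    return [idct_2d(dequantize(block, step)) for block in quantized_blocks]
```

With a step of 16, a block whose only non-zero level is a DC value of 4 reconstructs to a flat block of value 8, up to floating-point rounding.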

In the SIMD-type computers 100, 200, and 300, each process (motion detection, DCT operation, quantization, inverse quantization, inverse DCT operation) is performed via the buffers 401 to 406, 501 to 506, and 601 to 606, so pipeline parallel processing is possible and the overall processing is accelerated.
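The overlap that the buffers make possible can be visualized with a small schedule generator. It assumes, purely for illustration, that every stage takes one equal time slot per macroblock; the function name `pipeline_schedule` and the stage labels are hypothetical.

```python
def pipeline_schedule(num_macroblocks, stages=("motion", "dct_quant", "inverse")):
    """Return, for each time slot, the macroblock index each stage is working on."""
    slots = []
    for t in range(num_macroblocks + len(stages) - 1):
        row = {}
        for depth, name in enumerate(stages):
            mb = t - depth  # a stage at depth d lags the first stage by d slots
            row[name] = mb if 0 <= mb < num_macroblocks else None
        slots.append(row)
    return slots


for slot in pipeline_schedule(3):
    print(slot)
```

Under this assumption, three macroblocks finish in five slots instead of the nine that strictly sequential processing would need, because while one stage works on macroblock n the following stage works on macroblock n - 1.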

In the present embodiment, the DCT operation and quantization processing are performed by the SIMD-type computer 200, and the inverse quantization processing and inverse DCT operation are performed by the SIMD-type computer 300. However, the DCT operation, quantization processing, inverse quantization processing, and inverse DCT operation may all be performed by a single SIMD-type computer, or the processing may be shared among three or more SIMD-type computers.

In addition to the calculation results and the vector information of each block, attribute information for each block, such as whether or not difference processing against the comparison image was performed, can be written to the buffers 401 to 406, 501 to 506, and 601 to 606, so that each processor can examine this information and execute different arithmetic processing accordingly.
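One way to picture such a buffer entry is a small record carrying the block data plus its attributes. The field names, the class `BlockPacket`, and the selection rule in `select_processing` are all hypothetical; the specification only states that attributes are written and that each processor can branch on them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockPacket:
    frame_type: str                  # "I", "P" or "B" (hypothetical encoding)
    motion_vector: Tuple[int, int]   # (dy, dx) found by motion detection
    is_difference: bool              # True if data holds differences from the comparison image
    data: List[List[int]]            # 8 x 8 block contents


def select_processing(packet):
    """Pick a processing variant from the header (illustrative rule only)."""
    if packet.frame_type == "I" or not packet.is_difference:
        return "intra"   # block carries original pixel values
    return "inter"       # block carries a frame difference


packet = BlockPacket("P", (1, -2), True, [[0] * 8 for _ in range(8)])
print(select_processing(packet))  # inter
```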

Next, referring to FIG. 3, the procedure of the motion detection processing in the SIMD computer 100 will be described. Fig. 3 (a) shows the order of motion detection processing in macroblock units for the entire image, and Fig. 3 (b) shows the processing flow for each macroblock.

As shown in Fig. 3 (a), the whole image (current image) is divided into macroblocks (16 x 16 pixels) and processed for each macroblock.

As shown in Fig. 3 (b), a macroblock is composed of luminance blocks (Y0, Y1, Y2, Y3) and color difference blocks (U, V). Each of Y0, Y1, Y2, Y3, U, and V is composed of 8 × 8 pixels. After the image is divided into macroblocks, motion detection processing is performed for each macroblock. The motion detection processing is performed by detecting the difference from the comparative image.
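The luminance part of this division can be sketched as below. The placement of Y0 to Y3 (top-left, top-right, bottom-left, bottom-right) is an assumption based on common MPEG practice and is not stated in the text; the function name `split_luminance` is hypothetical.

```python
def split_luminance(macroblock):
    """Split a 16 x 16 luminance macroblock into four 8 x 8 blocks.

    Assumed layout: Y0 top-left, Y1 top-right, Y2 bottom-left, Y3 bottom-right.
    """
    def block(top, left):
        return [row[left:left + 8] for row in macroblock[top:top + 8]]

    return {
        "Y0": block(0, 0),
        "Y1": block(0, 8),
        "Y2": block(8, 0),
        "Y3": block(8, 8),
    }
```

Together with the two 8 × 8 color difference blocks U and V, this gives the six blocks that are handed to the six parallel buffers and processors.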

Therefore, the information after the motion detection processing against the comparative image consists of the difference value information (Y0', Y1', Y2', Y3', U', V') for each of the blocks Y0, Y1, Y2, Y3, U, V, and the motion vector information.

The above block information is output to the buffers 401 to 406.

Next, referring to FIG. 4, a description will be given of a procedure of DCT operation and quantization processing in the SIMD type computer 200.

The difference value information (Y0', Y1', Y2', Y3', U', V') and motion vector information of each block Y0, Y1, Y2, Y3, U, V are input in parallel from the buffers 401 to 406 to the processors 201 to 206 and processed in parallel, and the processing results are output in parallel to the buffers 501 to 506.

Next, the procedure of the inverse quantization process and the inverse DCT operation in the SIMD-type computer 300 will be described with reference to FIG. 5.

The processing results and motion vector information of each block Y0, Y1, Y2, Y3, U, V are input in parallel from the buffers 501 to 506 to the processors 301 to 306 and processed in parallel, and the processing results are output in parallel to the buffers 601 to 606.

Next, an application example of the image processing apparatus according to the present embodiment will be described with reference to FIG. 6. FIG. 6 shows an example in which the image processing apparatus according to the present embodiment is applied to a viewer system. The system includes, for example, the image processing device 700 of the present embodiment, an AC/DC prediction and Huffman unit 701, an image memory 702, a display circuit 703, a monitor 704, a ROM 705, a RAM 706, a CPU 707, and an IF (interface) circuit 708.

The image processing device 700 is connected to the image memory 702 and the AC/DC prediction and Huffman unit 701, the image memory 702 is connected to the display circuit 703, and the display circuit 703 is connected to the monitor 704. The AC/DC prediction and Huffman unit 701, ROM 705, RAM 706, CPU 707, and IF circuit 708 are connected to one another via a bus. The IF circuit 708 is connected to a memory card 709. This system displays MPEG images taken with a digital movie camera or the like on a monitor or TV.

Because this system handles only MPEG expansion, it performs inverse quantization and inverse DCT using only the SIMD-type computer 300 of the image processing apparatus of the embodiment shown in FIG. 1. Since motion detection, DCT, quantization, inverse quantization, and inverse DCT are handled separately by the respective SIMD-type computers, the image processing device can be configured with only the necessary SIMD-type computers, which allows smaller size and lower power consumption.

Therefore, according to the image processing apparatus of the above-described embodiment, in MPEG moving image compression, processes such as motion detection and DCT / quantization are handled by separate SIMD-type computers, so image compression processing can be realized with the number of processors that matches the performance required for each process. In addition, since motion detection, DCT, quantization, and the like can be performed by a pipeline operation for each process, performance can be improved.

In addition, with the configuration of a SIMD type computer with a relatively small number of processors, image processing such as video compression and decompression, such as MPEG, can be processed in real time. The processing function can be realized by a semiconductor integrated circuit device (electronic component) that can be mounted on a portable home appliance driven by low power consumption such as a digital video camera.

As described above, the invention made by the inventor has been specifically described based on the embodiment. However, it goes without saying that the present invention is not limited to the embodiment, and various changes can be made without departing from the gist of the invention.

For example, in the above-described embodiment, the description has been given of the moving image compression / expansion of MPEG. However, the present invention is not limited to this.

In the above description, the invention made by the present inventor has mainly been applied to image processing, which is the technical field to which the invention belongs. However, the present invention is not limited to this. For example, the present invention can also be applied to general electronic devices that process calculation algorithms including matrix operations, such as other image processing and audio processing.

Industrial applicability

As described above, the image processing apparatus according to the present invention is suitable for use in electronic devices that perform moving image compression / expansion, such as digital video cameras, VCRs, and information terminals. In addition, the present invention can be applied to all electronic devices that process calculation algorithms including matrix operations such as image processing and audio processing.

Claims

1. An image processing apparatus comprising:

a first SIMD-type computer having a plurality of first processors each including a difference arithmetic unit and a local memory, a first control unit for controlling the first processors, and a broadcast data bus for transmitting data from the first control unit to all the first processors; and

one or a plurality of second SIMD-type computers each having a plurality of second processors each including a multiplier, and a second control unit for controlling the second processors, wherein

in the first SIMD-type computer, the plurality of first processors operate in parallel by a single instruction from the first control unit and perform motion detection processing in image processing, and

in the second SIMD-type computer, the plurality of second processors operate in parallel by a single instruction from the second control unit and perform discrete cosine transform, inverse discrete cosine transform, quantization, or inverse quantization processing in image processing.

2. The image processing apparatus according to claim 1, wherein

the first SIMD-type computer and the second SIMD-type computer are connected via a plurality of buffers, and the first SIMD-type computer and the second SIMD-type computer each perform pipeline parallel processing.

3. The image processing apparatus according to claim 2, wherein

when there are a plurality of the second SIMD-type computers, the plurality of second SIMD-type computers are connected to one another via a plurality of buffers, and each of the plurality of second SIMD-type computers performs pipeline parallel processing.

4. The image processing apparatus according to claim 2, wherein

the plurality of buffers transfer the calculation results of the first SIMD-type computer in parallel to the plurality of second processors in the second SIMD-type computer.

5. The image processing apparatus according to claim 3, wherein

the plurality of buffers transfer, in parallel, the calculation results of the second SIMD-type computer in the preceding stage to the second processors in the second SIMD-type computer in the following stage.

6. The image processing apparatus according to any one of claims 1 to 5, wherein

header information indicating block attributes is added, for each data transfer in block units, to the data transferred between the first SIMD-type computer and the one or more second SIMD-type computers.