JP2008004983A - Image processing apparatus and method, program, and recording medium

Info

Publication number: JP2008004983A
Application number: JP2006169646A
Authority: JP
Inventors: Junichi Tanaka; 潤一田中; Kazufumi Sato; 数史佐藤; Toru Okazaki; 透岡崎; Yoichi Yagasaki; 陽一矢ヶ崎
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-06-20
Filing date: 2006-06-20
Publication date: 2008-01-10
Anticipated expiration: 2026-06-20
Also published as: JP4775132B2

Abstract

PROBLEM TO BE SOLVED: To enhance the satisfaction of a user viewing a reproduced image while suppressing the bit rate of coded image data to a low rate.

SOLUTION: A caption detection section 155 determines whether a macroblock of image data supplied from an image rearrangement buffer 142 includes a caption, and informs a rate control section 154, a motion prediction/compensation section 153, and a quantization section 145 of information associated with the macroblock including the caption. Upon receipt of a notice from the caption detection section 155, each of the rate control section 154, the motion prediction/compensation section 153, and the quantization section 145 operates so as not to deteriorate the images of characters when the macroblock of the caption is coded, and the image data are coded.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an image processing apparatus and method, a program, and a recording medium, and in particular to an image processing apparatus and method, a program, and a recording medium that can increase the satisfaction of a user viewing a reproduced image while keeping the bit rate of the encoded image data low.

In recent years, apparatuses that handle image information digitally have been spreading both in information distribution by broadcasting stations and in information reception in ordinary households. To transmit and store the information efficiently, these apparatuses conform to schemes such as MPEG (Moving Picture Coding Experts Group), which exploit the redundancy peculiar to image information and compress it by an orthogonal transform, such as the discrete cosine transform, and by motion compensation.

In particular, MPEG2 (ISO/IEC 13818-2) is defined as a general-purpose image coding scheme. It is a standard that covers both interlaced and progressive scanning images, as well as standard-resolution and high-definition images, and it is currently used in a wide range of professional and consumer applications. With the MPEG2 coding scheme, a high compression ratio and good image quality can be achieved by allocating a code amount (bit rate) of, for example, 4 to 8 Mbps to a standard-resolution interlaced image of 720 × 480 pixels, or 18 to 22 Mbps to a high-resolution interlaced image of 1920 × 1088 pixels.

MPEG2 was aimed mainly at high-quality coding suitable for broadcasting, and it did not support code amounts (bit rates) lower than those of MPEG1, that is, coding at higher compression ratios. With the spread of portable terminals and similar devices, the need for such lower-bit-rate coding was expected to grow, and the MPEG4 coding scheme was standardized in response. The MPEG4 image coding scheme was approved as the international standard ISO/IEC 14496-2 in December 1998.

Furthermore, a standard called H.26L (ITU-T Q6/16 VCEG), originally developed for image coding for video conferencing, has attracted attention in recent years. H.26L is known to achieve higher coding efficiency than conventional coding schemes such as MPEG2 and MPEG4, although it requires more computation for encoding and decoding. In addition, as part of the MPEG4 activities, a coding scheme that is based on H.26L, also incorporates functions not supported by H.26L, and achieves still higher coding efficiency has been standardized as the Joint Model of Enhanced-Compression Video Coding. In March 2003, the international standard H.264/AVC (Advanced Video Coding) was established.

When images are encoded with a coding scheme such as MPEG2, MPEG4, or H.264/AVC, the bit rate is generally adjusted to obtain higher coding efficiency. That is, a given picture or macroblock in the image is normally encoded so that the number of bits allocated to that picture or macroblock is small.

A typical method of such bit rate adjustment (rate control) is MPEG-2 Test Model 5 (TM5). The TM5 rate control method consists of three layers: step 1, which distributes bits to each picture; step 2, which performs rate control using virtual buffer control; and step 3, which performs adaptive quantization that takes visual characteristics into account.

In step 1, the bit amount allocated to each picture in a GOP (Group of Pictures) is distributed on the basis of the bit amount available to the pictures in the GOP that have not yet been encoded, including the picture to which bits are being allocated.
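As a rough illustration of step 1, the sketch below follows the target-bit formulas commonly given for TM5. The text above does not state these formulas: the constants `Kp = 1.0` and `Kb = 1.4`, the complexity measures `Xi`, `Xp`, `Xb`, and every function and parameter name are assumptions made for this example.

```python
def tm5_target_bits(ptype, R, Np, Nb, Xi, Xp, Xb, bit_rate, picture_rate,
                    Kp=1.0, Kb=1.4):
    """Target number of bits for the next picture of type 'I', 'P' or 'B'.

    R          : bits still available to the not-yet-encoded pictures of the GOP
    Np, Nb     : P and B pictures remaining in the GOP (current picture included)
    Xi, Xp, Xb : complexity of the most recent I/P/B picture
                 (generated bits times average quantization scale)
    Kp, Kb     : assumed TM5 constants weighting P and B pictures
    """
    if ptype == "I":
        target = R / (1 + Np * Xp / (Xi * Kp) + Nb * Xb / (Xi * Kb))
    elif ptype == "P":
        target = R / (Np + Nb * Kp * Xb / (Kb * Xp))
    elif ptype == "B":
        target = R / (Nb + Np * Kb * Xp / (Kp * Xb))
    else:
        raise ValueError("ptype must be 'I', 'P' or 'B'")
    # Lower bound so that no picture is starved of bits.
    return max(target, bit_rate / (8 * picture_rate))
```

With the same remaining budget, an I picture receives the largest share and a B picture the smallest, which is the distribution the step is meant to produce.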

In step 2, to make the actual generated code amount match the bit amount allocated to each picture in step 1, the quantization scale is obtained by feedback control in units of macroblocks, based on the capacity of three virtual buffers set independently for each picture type.
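The feedback control of step 2 can be sketched as follows, using the virtual-buffer formulas commonly given for TM5 rather than anything stated in this document. The reaction parameter `r`, the clipping to the range 1 to 31 at this stage, and all names are assumptions of the example.

```python
def tm5_reference_qscale(d0, bits_so_far, target_bits, mbs_done, mb_count,
                         bit_rate, picture_rate):
    """Reference quantization scale for the next macroblock.

    d0          : virtual buffer fullness at the start of the picture
    bits_so_far : bits generated by the macroblocks already encoded
    target_bits : bits allocated to this picture in step 1
    mbs_done    : number of macroblocks already encoded in this picture
    mb_count    : total number of macroblocks in the picture
    """
    r = 2.0 * bit_rate / picture_rate            # reaction parameter
    # The buffer fills when more bits than planned have been spent so far.
    fullness = d0 + bits_so_far - target_bits * mbs_done / mb_count
    q = fullness * 31.0 / r
    return min(max(q, 1.0), 31.0)                # assumed clipping range
```

When the encoder has spent more bits than planned, the fullness rises and the returned scale becomes coarser; when it has spent fewer, the scale becomes finer.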

In step 3, the quantization scale obtained in step 2 is varied according to the activity of each macroblock, so that flat areas, where degradation is visually conspicuous, are quantized more finely, and areas with complex patterns, where degradation is relatively inconspicuous, are quantized more coarsely. That is, for a high-activity macroblock, which tends to require a large number of bits when encoded, the quantization scale is changed so that a large quantization scale is set. As a result, the encoded image data are controlled so that the number of bits is as small as possible (so that the bit rate is low).
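A minimal sketch of the activity-based adjustment in step 3 is shown below. It uses the normalized-activity formula commonly given for TM5, which is not spelled out in this document. Here `act` is assumed to be a spatial activity measure of the macroblock (for example, one plus the smallest luminance variance of its sub-blocks), `avg_act` is the average activity of the previous picture, and the function name is hypothetical.

```python
def tm5_adaptive_qscale(q_ref, act, avg_act):
    """Scale the step-2 quantization scale by normalized macroblock activity.

    The normalized activity lies between 0.5 and 2.0: flat macroblocks
    (low activity) get a finer scale, busy macroblocks a coarser one.
    """
    n_act = (2.0 * act + avg_act) / (act + 2.0 * avg_act)
    return min(max(q_ref * n_act, 1.0), 31.0)
```

For a reference scale of 10, a macroblock with ten times the average activity is quantized with a scale of 17.5, while one with a tenth of the average activity gets a scale below 6. A caption macroblock, being rich in edges, falls into the first group.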

It has also been proposed to raise the compression ratio efficiently by changing the compression ratio of an image according to the objects contained in the input image (see, for example, Patent Document 1).

In addition, with the spread of these coding schemes, a technique called transcoding, which converts data encoded with one coding scheme into data encoded with another coding scheme, is also becoming important.

Patent Document 1: JP 2005-109606 A

Caption portions of an image, such as telops (on-screen text), display images that contain many edges (characters and the like), so the activity of a caption portion is high. When an image is encoded with a coding scheme such as MPEG2, MPEG4, or H.264/AVC under TM5 rate control, high activity is detected in the caption portions of the image, and a large quantization scale is set for them.

With a large quantization scale, it becomes difficult for the image obtained by decoding the encoded image data to reproduce the original image accurately. However, human vision is generally more sensitive to low-frequency components with few edges, so when an image containing many edges is encoded, setting a large quantization scale is an effective way to reduce the number of bits in the encoded data.

However, a caption portion displays characters and the like, and a user viewing the decoded image normally looks at the caption portion more attentively than at other portions, so image degradation in the caption portion is easily noticed. For this reason, even when a conventional encoder encodes an image so that a natural, attractive image can be reproduced at a low bit rate, the user viewing the reproduced image may be left with the impression that the image quality is poor.

The present invention has been made in view of this situation, and makes it possible to increase the satisfaction of a user viewing a reproduced image while keeping the bit rate of the encoded image data low.

One aspect of the present invention is an image processing apparatus that encodes image data in the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) scheme, comprising: image data acquisition means for acquiring the image data to be encoded; determination means for determining whether the image of the image data acquired by the image data acquisition means contains a caption; and rate control means for controlling the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image. When the determination means determines that the image of the image data contains a caption, the rate control means sets the quantization parameter for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.
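The override described in this aspect can be pictured with the following sketch. It is only an illustration under stated assumptions: the function names, the per-block caption flags, and the value `caption_qp=20` are hypothetical, and the document does not give a concrete predetermined value.

```python
def select_qp(activity_qp, is_caption_block, caption_qp=20):
    """Return the quantization parameter for one block.

    activity_qp      : QP the rate control would choose from the image features
    is_caption_block : result of the caption determination for this block
    caption_qp       : hypothetical predetermined QP used for caption blocks
    """
    return caption_qp if is_caption_block else activity_qp


def assign_qps(activity_qps, caption_flags, caption_qp=20):
    """Apply the caption override to every block of a picture."""
    return [select_qp(qp, flag, caption_qp)
            for qp, flag in zip(activity_qps, caption_flags)]
```

For example, `assign_qps([30, 38, 26], [False, True, False])` leaves the first and third blocks at their feature-dependent values and replaces the second with the fixed caption value.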

The apparatus may further comprise predicted image data generation means for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may set the motion vector for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a value within a predetermined range.

The apparatus may further comprise predicted image data generation means for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may make the size of the block for which a motion vector is set, the block being made up of a plurality of pixels in the portion of the image where the caption is displayed, larger than a preset size.

The apparatus may further comprise predicted image data generation means for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may use, as the field of the image from which it generates the predicted image data containing the block that corresponds to the block made up of a plurality of pixels in the portion of the image where the caption is displayed, the same field as that of the image of the image data to be encoded.

The apparatus may further comprise predicted image data generation means for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may set the macroblock mode for a macroblock made up of a plurality of pixels in the portion of the image where the caption is displayed to skipped macroblock.

The apparatus may further comprise predicted image data generation means for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may use, as the image from which it generates the predicted image data corresponding to the block made up of a plurality of pixels in the portion of the image where the caption is displayed, only one of an image temporally preceding the image of the image data to be encoded and an image temporally following it.

The apparatus may further comprise predicted image data generation means for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may set the pixel precision of the predicted image data to integer-pixel precision or 1/2-pixel precision.

The apparatus may further comprise orthogonal transform processing means for applying an orthogonal transform to the difference data between the image data to be encoded and the predicted image data corresponding to that image data. When the determination means determines that the image of the image data contains a caption, the orthogonal transform processing means may apply the orthogonal transform to the data in frame coding mode.

The apparatus may further comprise orthogonal transform processing means for applying an orthogonal transform to the difference data between the image data to be encoded and the predicted image data corresponding to that image data. When the determination means determines that the image of the image data contains a caption, the orthogonal transform processing means may set the orthogonal transform size, which is the unit in which the orthogonal transform is applied to the data, to a value smaller than a preset size.

An image processing method according to one aspect of the present invention is an image processing method for an image processing apparatus that encodes image data in the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) scheme. The method includes the steps of: acquiring the image data to be encoded; determining whether the image of the acquired image data contains a caption; and, when it is determined that the image of the image data contains a caption, causing rate control means, which controls the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image, to set the quantization parameter for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.

A program according to one aspect of the present invention is a computer-readable program that causes an image processing apparatus, which encodes image data in the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) scheme, to execute image processing. The program includes the steps of: controlling acquisition of the image data to be encoded; controlling the determination of whether the image of the acquired image data contains a caption; and, when it is determined that the image of the image data contains a caption, performing control so that rate control means, which controls the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image, sets the quantization parameter for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.

In one aspect of the present invention, the image data to be encoded are acquired, and it is determined whether the image of the acquired image data contains a caption. When it is determined that the image of the image data contains a caption, rate control means, which controls the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image, sets the quantization parameter for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.

According to the present invention, the satisfaction of a user viewing a reproduced image can be increased while the bit rate of the encoded image data is kept low.

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the specification or drawings is exemplified as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or drawings. Accordingly, even if an embodiment is described in the specification or drawings but is not described here as corresponding to a constituent feature of the present invention, that does not mean the embodiment does not correspond to that constituent feature. Conversely, even if an embodiment is described here as corresponding to a constituent feature, that does not mean the embodiment does not correspond to constituent features other than that one.

An image processing apparatus according to one aspect of the present invention is an image processing apparatus that encodes image data in the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) scheme, comprising: image data acquisition means (for example, the screen rearrangement buffer 142 in FIG. 1) for acquiring the image data to be encoded; determination means (for example, the caption detection unit 155 in FIG. 1) for determining whether the image of the image data acquired by the image data acquisition means contains a caption; and rate control means (for example, the rate control unit 154 in FIG. 1) for controlling the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image. When the determination means determines that the image of the image data contains a caption, the rate control means sets the quantization parameter for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.

This image processing apparatus may further comprise predicted image data generation means (for example, the motion prediction/compensation unit 153 in FIG. 1) for generating, according to the motion of the image of the image data to be encoded, image data of a predicted image corresponding to that image. When the determination means determines that the image of the image data contains a caption, the predicted image data generation means may set the motion vector for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a value within a predetermined range.

This image processing apparatus may further comprise orthogonal transform processing means (for example, the orthogonal transform unit 144 in FIG. 1) for applying an orthogonal transform to the difference data between the image data to be encoded and the predicted image data corresponding to that image data. When the determination means determines that the image of the image data contains a caption, the orthogonal transform processing means may apply the orthogonal transform to the data in frame coding mode.

An image processing method according to one aspect of the present invention is an image processing method for an image processing apparatus that encodes image data in the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) scheme. The method includes the steps of: acquiring the image data to be encoded (for example, the processing of step S101 in FIG. 14); determining whether the image of the acquired image data contains a caption (for example, the processing of step S103 in FIG. 14); and, when it is determined that the image of the image data contains a caption, causing rate control means (for example, the rate control unit 154 in FIG. 1), which controls the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image, to set the quantization parameter for a block made up of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image (for example, step S201 in FIG. 15).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of an embodiment of an image processing apparatus 100 to which the present invention is applied. The image processing apparatus 100 converts, for example, an input image signal into image data compressed and encoded in the H.264/AVC (Advanced Video Coding) scheme.

In FIG. 1, the input image signal is first converted into digital data by an A/D conversion unit 141.

Next, a screen rearrangement buffer 142 rearranges the frames according to the GOP (Group of Pictures) structure of the compressed image information to be output.
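The rearrangement can be illustrated with a GOP that contains B pictures, where each B picture must be encoded after the later reference picture it depends on. The sketch below assumes such a GOP and a simple hold-and-release policy; the document does not specify the GOP pattern or the reordering algorithm, and the function name is hypothetical.

```python
def display_to_coding_order(frames):
    """Reorder frames from display order to coding order.

    frames : list of (picture_type, display_index) tuples, with
             picture_type one of 'I', 'P', 'B'.
    B pictures are held back until the next I or P picture (their
    future reference) has been emitted.
    """
    coding_order, pending_b = [], []
    for frame in frames:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            coding_order.append(frame)       # reference picture goes first
            coding_order.extend(pending_b)   # then the B pictures that need it
            pending_b = []
    coding_order.extend(pending_b)           # flush trailing B pictures
    return coding_order
```

For a display sequence of types I B B P B B P, this yields the display indices in the order 0, 3, 1, 2, 6, 4, 5.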

For the image data supplied via the screen rearrangement buffer 142, an adder 143 computes difference information between the pixel values of the image data and the pixel values supplied from an intra prediction unit 152 or a motion prediction/compensation unit 153, and the difference information is input to an orthogonal transform unit 144.

When the image data corresponding to the input image signal are to be intra (within-image) coded, the adder 143 computes difference information between the pixel values of the image data supplied via the screen rearrangement buffer 142 and the pixel values that the intra prediction unit 152 generates from the image data stored in a frame memory 151. The difference information is input to the orthogonal transform unit 144, where it undergoes an orthogonal transform such as the discrete cosine transform (DCT) or the Karhunen-Loeve transform.

The transform coefficients output from the orthogonal transform unit 144 are quantized in a quantization unit 145. The following description mainly covers the case in which the orthogonal transform unit 144 performs DCT processing as the orthogonal transform.

A rate control unit 154 controls the bit rate of the output data by controlling, and changing as necessary, the quantization scale and other parameters used in the quantization performed by the quantization unit 145.

The quantized transform coefficients output from the quantization unit 145 are input to a lossless encoding unit 146, which applies lossless coding such as variable-length coding or arithmetic coding. The result is accumulated in an accumulation buffer 147 and output as image data encoded in the H.264/AVC scheme.

The quantized transform coefficients output from the quantization unit 145 are also supplied to an inverse quantization unit 148, where they are inversely quantized, and are then subjected to an inverse orthogonal transform in an inverse orthogonal transform unit 149 to become decoded image data.
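The forward path (orthogonal transform and quantization) and the local decoding path (inverse quantization and inverse orthogonal transform) can be sketched in plain Python as below. This is a simplified illustration, not the H.264/AVC integer transform: it uses a floating-point orthonormal DCT-II and a single uniform quantization step `qstep`, and all function names are assumptions of the example.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def encode_block(block, qstep):
    """2-D DCT (C * X * C^T) followed by uniform quantization."""
    c = dct_matrix(len(block))
    coeff = matmul(matmul(c, block), transpose(c))
    return [[round(v / qstep) for v in row] for row in coeff]


def decode_block(levels, qstep):
    """Inverse quantization followed by the inverse DCT (C^T * Y * C)."""
    c = dct_matrix(len(levels))
    coeff = [[v * qstep for v in row] for row in levels]
    return matmul(matmul(transpose(c), coeff), c)


def max_abs_error(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Running a 4 × 4 block with a sharp vertical edge, such as four rows of `[0, 0, 100, 100]`, through `encode_block` and `decode_block` with a small and a large `qstep` shows the effect described earlier for caption areas: the coarser step leaves a visibly larger reconstruction error around the edge.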

逆直交変換部１４９から出力される復号された画像データは、デブロックフィルタ１５０においてブロック歪の除去が施された後、フレームメモリ１５１に蓄積される。 The decoded image data output from the inverse orthogonal transform unit 149 is stored in the frame memory 151 after the block distortion is removed by the deblocking filter 150.

なお、イントラ予測部１５２においては、符号化するマクロブロックに応じて、時間軸において前方向（過去側）のフレーム画像データのみを参照画像とする前方向予測モード、時間軸において後ろ方向（未来側）のフレーム画像データのみを参照画像とする後ろ方向予測モード、上記２枚のフレーム画像データの両方を参照画像とする双方向予測モードなどのモードを適用することが可能である。イントラ予測部１５２において、当該マクロブロックに対して適用されたイントラ予測モードに関する情報は、可逆符号化部１４６に伝送され、H.２６４/ＡＶＣ方式で符号化された画像データにおけるヘッダ情報の一部として符号化される。 In the intra prediction unit 152, depending on the macroblock to be encoded, modes such as the following can be applied: a forward prediction mode that uses only frame image data in the forward direction (past side) on the time axis as the reference image, a backward prediction mode that uses only frame image data in the backward direction (future side) on the time axis as the reference image, and a bidirectional prediction mode that uses both of these frames as reference images. Information on the intra prediction mode applied to the macroblock in the intra prediction unit 152 is transmitted to the lossless encoding unit 146 and encoded as part of the header information of the image data encoded in the H.264/AVC format.

一方、入力される画像信号に対応する画像データがインター（画像間）符号化される画像データである場合、画面並べ替えバッファ１４２を介して供給される画像データは、まず、動き予測・補償部１５３に入力される。このとき、動き予測・補償部１５３は、フレームメモリ１５１から参照画像の画像データを取り出し、その画像データに対して動き予測・補償処理を施すことで予測画像データを生成する。 On the other hand, when the image data corresponding to the input image signal is image data that is inter (inter-image) encoded, the image data supplied via the screen rearrangement buffer 142 is first a motion prediction / compensation unit. 153 is input. At this time, the motion prediction / compensation unit 153 retrieves the image data of the reference image from the frame memory 151, and generates predicted image data by performing motion prediction / compensation processing on the image data.

動き予測・補償部１５３から出力される予測画像データは、加算器１４３に入力され、加算器１４３は、画面並べ替えバッファ１４２を介して供給される画像データの画素値と、予測画像データの画素値との差分情報を演算する。なお、図１においては、イントラ予測部１５２と加算器１４３とが接続されるように示されているが、入力される画像信号に対応する画像データがインター（画像間）符号化される画像データである場合、動き予測・補償部１５３と加算器１４３とが接続されるものとする。 The predicted image data output from the motion prediction/compensation unit 153 is input to the adder 143, which computes difference information between the pixel values of the image data supplied via the screen rearrangement buffer 142 and the pixel values of the predicted image data. Although FIG. 1 shows the intra prediction unit 152 connected to the adder 143, when the image data corresponding to the input image signal is to be inter (between-picture) encoded, the motion prediction/compensation unit 153 is assumed to be connected to the adder 143 instead.

その後、イントラ符号化の場合と同様に、加算器１４３から出力されたデータが直交変換部１４４に入力され、量子化部１４５、可逆符号化部１４６による処理を経て蓄積バッファ１４７に蓄積され、H.２６４/ＡＶＣ方式で符号化された画像データとして出力される。 Thereafter, as in the intra coding case, the data output from the adder 143 is input to the orthogonal transform unit 144, processed by the quantization unit 145 and the lossless encoding unit 146, accumulated in the storage buffer 147, and output as image data encoded in the H.264/AVC format.

また、量子化部１４５から出力される量子化された変換係数は、やはり逆量子化部１４８にも供給され、逆直交変換部１４９の処理を経て復号された画像データが、デブロックフィルタ１５０の処理を経てフレームメモリ１５１に蓄積される。 The quantized transform coefficients output from the quantization unit 145 are again also supplied to the inverse quantization unit 148, and the image data decoded through the processing of the inverse orthogonal transform unit 149 is passed through the deblocking filter 150 and stored in the frame memory 151.

なお、動き予測・補償部１５３は、画面並べ替えバッファ１４２を介して供給される画像データに基づいて生成される動きベクトルに関する情報を可逆符号化部１４６に供給し、可逆符号化部１４６により、その情報に対して可変長符号化、算術符号化などの可逆符号化処理が施され、H.２６４/ＡＶＣ方式で符号化された画像データにおけるヘッダ情報の一部として符号化される。 Note that the motion prediction / compensation unit 153 supplies information on motion vectors generated based on the image data supplied via the screen rearrangement buffer 142 to the lossless encoding unit 146, and the lossless encoding unit 146 The information is subjected to lossless encoding processing such as variable length encoding and arithmetic encoding, and is encoded as part of header information in image data encoded by the H.264 / AVC format.

また、画像処理装置１００においては、画面並べ替えバッファ１４２を介して供給される画像データにおいて、画像に含まれる、例えば、テロップ、字幕などのキャプションを検出するキャプション検出部１５５が設けられている。 Further, the image processing apparatus 100 is provided with a caption detection unit 155 that detects, for example, captions such as telops and captions included in the image in the image data supplied via the screen rearrangement buffer 142.

キャプション検出部１５５は、例えば、画像の所定のマクロブロックのエッジ数を検出するなどして、文字の画像が含まれるマクロブロックを検出（特定）することでマクロブロック単位にキャプションの検出を行う。そして、キャプション検出部１５５は、キャプションが検出された場合、キャプションが検出されたことを表す制御信号を、キャプションが検出されたマクロブロックを特定する情報と対応付けて直交変換部１４４、動き予測・補償部１５３、およびレート制御部１５４に供給するようになされている。 The caption detection unit 155 detects captions on a per-macroblock basis by detecting (identifying) macroblocks that contain character images, for example by counting the edges in a given macroblock of the image. When a caption is detected, the caption detection unit 155 supplies a control signal indicating that a caption has been detected, associated with information identifying the macroblock in which it was detected, to the orthogonal transform unit 144, the motion prediction/compensation unit 153, and the rate control unit 154.
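The edge-count test described above can be sketched as follows. This is a minimal illustration only: the text does not say how edges are counted or what threshold is used, so the neighbor-difference edge measure, the `threshold` and `min_edges` values, and the function names are all assumptions.

```python
def count_edges(block, threshold=48):
    """Count adjacent pixel pairs (horizontal or vertical) whose
    difference exceeds threshold. The edge measure is an assumption."""
    edges = 0
    rows, cols = len(block), len(block[0])
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols and abs(block[y][x] - block[y][x + 1]) > threshold:
                edges += 1
            if y + 1 < rows and abs(block[y][x] - block[y + 1][x]) > threshold:
                edges += 1
    return edges


def detect_caption_macroblocks(frame, mb_size=16, min_edges=20, threshold=48):
    """Return (mb_row, mb_col) positions of macroblocks flagged as captions."""
    flagged = []
    for top in range(0, len(frame) - mb_size + 1, mb_size):
        for left in range(0, len(frame[0]) - mb_size + 1, mb_size):
            block = [row[left:left + mb_size] for row in frame[top:top + mb_size]]
            if count_edges(block, threshold) >= min_edges:
                flagged.append((top // mb_size, left // mb_size))
    return flagged


# One flat macroblock followed by one with sharp vertical strokes.
frame = [[16] * 16 + [235 if (x // 2) % 2 else 16 for x in range(16)]
         for _ in range(16)]
print(detect_caption_macroblocks(frame))
```

The returned positions play the role of the "information identifying the macroblock" that is passed to the other units.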

ここでマクロブロックは、符号化を行う画像データの中で、例えば、１６×１６個の画素で構成されるブロックであり、H.２６４/ＡＶＣ方式で符号化処理を行う場合の処理単位とされる。 Here, a macroblock is a block within the image data to be encoded, composed of, for example, 16×16 pixels, and is the processing unit when encoding in the H.264/AVC format.

本発明の画像処理装置１００においては、キャプション検出部１５５により検出された場合、画像の中のキャプション部分の劣化をできるだけ抑制するように、画像データの符号化の処理が行われる。すなわち、画像処理装置１００は、符号化された画像データが復号されて表示されるとき、画像の中でユーザが注目するキャプション部分ができるだけ劣化しないように画像を処理するようになされている。 In the image processing apparatus 100 of the present invention, when a caption is detected by the caption detection unit 155, the image data is encoded so as to suppress degradation of the caption portion of the image as much as possible. In other words, the image processing apparatus 100 processes the image so that, when the encoded image data is decoded and displayed, the caption portion that the user focuses on is degraded as little as possible.

H.264/AVCなどの符号化方式による画像の符号化にあたっては、より高い符号化効率を得るためにビットレートの調整を行うことが一般的である。すなわち、画像の中の所定のピクチャ、またはマクロブロックを符号化する場合、通常、そのピクチャ、またはマクロブロックに割り当てられるビット数が少なくなるように符号化される。 When encoding an image using an encoding method such as H.264 / AVC, it is common to adjust the bit rate in order to obtain higher encoding efficiency. That is, when a predetermined picture or macroblock in an image is encoded, encoding is usually performed so that the number of bits allocated to the picture or macroblock is reduced.

ステップ１では、ＧＯＰ内の各ピクチャに対する割当ビット量を、割当対象ピクチャを含めてＧＯＰ内で、未だ符号化が行われていないピクチャに対して割り当てられるビット量を元にして配分する。 In step 1, the allocated bit amount for each picture in the GOP is distributed based on the bit amount allocated to the picture that has not yet been encoded in the GOP including the allocation target picture.

ステップ３では、ステップ２で求められた量子化スケールについて、視覚的に劣化の目立ちやすい平坦部ではより細かく量子化され、劣化の比較的目立ちにくい絵柄の複雑な部分でより粗く量子化されるように、各マクロブロックのアクティビティによって変化させる。 In step 3, the quantization scale obtained in step 2 is varied according to the activity of each macroblock, so that flat areas, where degradation is visually conspicuous, are quantized more finely, and areas with complex patterns, where degradation is relatively inconspicuous, are quantized more coarsely.

例えば、ｊ番目のマクロブロックのアクティビティact_jは、式（１）により求められる。 For example, the activity act _j of the j-th macroblock is obtained by Expression (1).

ここで、var sblkは、１個のマクロブロックを、８×８個の画素で構成される４個のサブブロックに分割し、その分割されたサブブロックの画素値の分散値を表す値であり、式（２）および式（３）により求められる。 Here, var sblk is the variance of the pixel values of a sub-block obtained by dividing one macroblock into four sub-blocks of 8×8 pixels each, and is obtained by equations (2) and (3).

ここで、Pkは１つのマクロブロック内の画素値を表す値とされる。 Here, Pk denotes a pixel value within one macroblock.

すなわち、式（１）により、マクロブロックが、８×８個の画素で構成される４個のサブブロックに分割され、その分割されたサブブロックのそれぞれについてフレームDCT符号化モードの場合と、フィールドDCT符号化モードの場合の２通りの場合についての画素値の分散値が求められ、それにより得られた８通りのサブブロックの画素値の分散値（var sblk）のうちの最小のものが選択されることになる。 That is, according to equation (1), the macroblock is divided into four sub-blocks of 8×8 pixels each, the variance of the pixel values is computed for each sub-block in both the frame DCT coding mode and the field DCT coding mode, and the smallest of the eight resulting sub-block variances (var sblk) is selected.
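Equations (1) to (3) are referred to above but do not appear in this text. In the published MPEG-2 Test Model 5 (TM5) rate control, which the surrounding description matches, the corresponding expressions take the following form. The numbering is assumed to follow the references in the text, and P_k here ranges over the 64 pixels of one 8×8 sub-block.

```latex
\begin{align}
act_j &= 1 + \min_{sblk = 1,\dots,8} \left( var\_sblk \right) \tag{1} \\
var\_sblk &= \frac{1}{64} \sum_{k=1}^{64} \left( P_k - \bar{P} \right)^2 \tag{2} \\
\bar{P} &= \frac{1}{64} \sum_{k=1}^{64} P_k \tag{3}
\end{align}
```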

そして、式（４）により、その値が０．５〜２の範囲をとる正規化アクティビティNactjが求められる。 Then, the normalized activity Nactj whose value is in the range of 0.5 to 2 is obtained by the equation (4).

ここで、avgactは直前に符号化したピクチャでのactjの平均値である。 Here, avgact is an average value of actj in the picture encoded immediately before.

そして、最終的に求められる量子化スケールコードmquantjは、ステップ２で得られた量子化スケールコードQjを元に、式（５）により与えられる。 The finally obtained quantization scale code mquantj is given by equation (5) based on the quantization scale code Qj obtained in step 2.
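Equations (4) and (5) are likewise referred to but not shown in this text. In the published TM5 formulation they are commonly written as below. This form is consistent with the stated range of 0.5 to 2 for the normalized activity, since the ratio tends to 0.5 as act_j approaches 0 and to 2 as act_j grows large. The symbol names follow the text, and the numbering is assumed.

```latex
\begin{align}
Nact_j &= \frac{2 \cdot act_j + avg\_act}{act_j + 2 \cdot avg\_act} \tag{4} \\
mquant_j &= Q_j \cdot Nact_j \tag{5}
\end{align}
```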

すなわち、TM5のレート制御のステップ３においては、符号化されたときの割り当てビット量が大きくなりやすいアクティビティの高いマクロブロックにおいては、大きい量子化スケールが設定されるように、量子化スケールを変化させる。 That is, in step 3 of TM5 rate control, the quantization scale is changed so that a large quantization scale is set in a macroblock having a high activity in which the allocated bit amount when encoded is likely to be large. .
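The effect of step 3 on the quantization scale can be illustrated with a short sketch. It assumes the commonly published TM5 form of the normalization, Nact = (2·act + avg_act) / (act + 2·avg_act), and a final clip of the result to the range 1 to 31. The clip and the function names are assumptions, not taken from this text.

```python
def normalized_activity(act_j, avg_act):
    """Normalized activity in the open range (0.5, 2) for positive inputs."""
    return (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)


def mquant(q_j, act_j, avg_act):
    """Quantization scale code after activity modulation (clip is assumed)."""
    value = round(q_j * normalized_activity(act_j, avg_act))
    return max(1, min(31, value))


# A macroblock far busier than average gets a scale close to 2 * Q_j,
# a very flat one gets a scale close to Q_j / 2.
print(mquant(10, 10000, 100))  # busy macroblock
print(mquant(10, 1, 1000))     # flat macroblock
```

With Q_j = 10, the busy macroblock ends up near 20 and the flat one near 5, which is the behavior the paragraph above describes.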

このように、アクティビティに基づいて量子化スケールを変化させた場合、符号化された画像のデータにおいてビット数ができるだけ少なくなるように（ビットレートが低くなるように）制御することは可能となるが、キャプション部分においては、ユーザに画像の劣化が意識されやすくなってしまう。 As described above, when the quantization scale is changed based on the activity, it is possible to control the encoded image data so that the number of bits is as small as possible (to reduce the bit rate). In the caption portion, it is easy for the user to be aware of image degradation.

図２は、キャプション部分に含まれる文字の画像の例を示している。同図においては、「あ」の文字が示されており、マクロブロック（MB）A乃至Dの４個のマクロブロックの中に１文字が表示されている。また、同図において、マクロブロックA乃至Dは、１６×１６個の画素で構成されているものとし、上述したアクティビティの算出において、マクロブロックA乃至Dのそれぞれが、８×８個の画素で構成される４個のサブブロックに分割されるものとする。 FIG. 2 shows an example of a character image included in the caption portion. In the drawing, the character “A” is shown, and one character is displayed in four macroblocks of macroblocks (MB) A to D. In the same figure, it is assumed that macroblocks A to D are composed of 16 × 16 pixels, and in the above-described activity calculation, each of macroblocks A to D is composed of 8 × 8 pixels. Assume that it is divided into four sub-blocks.

いま、図２のマクロブロックBについて符号化する場合を考える。マクロブロックBは、サブブロックB-1乃至B-4のサブブロックに分割されており、サブブロックB-1、B-3およびB-4には、文字の画像の一部が含まれているが、サブブロックB-2には文字の画像が含まれていない。 Consider a case where the macro block B in FIG. 2 is encoded. The macroblock B is divided into subblocks B-1 to B-4, and the subblocks B-1, B-3, and B-4 include part of the character image. However, the sub-block B-2 does not include a character image.

上述したように、TM5のレート制御のステップ３では、式（１）により、マクロブロックが、８×８個の画素で構成される４個のサブブロックに分割され、その分割されたサブブロックのそれぞれの画素値の分散値のうちの最小のものが選択されることになる。画素値の分散値は、例えば、エッジの多い画像ほど大きくなり、エッジの少ない画像ほど小さくなるので、マクロブロックBにおいては、量子化スケールを特定するためのサブブロックの分散値としてサブブロックB-2の分散値が選択されることになる。 As described above, in step 3 of TM5 rate control, according to equation (1), the macroblock is divided into four sub-blocks of 8×8 pixels each, and the smallest of the variances of the pixel values of those sub-blocks is selected. The variance of the pixel values is, for example, larger for an image with many edges and smaller for an image with few edges, so in macroblock B the variance of sub-block B-2 is selected as the sub-block variance used to determine the quantization scale.

サブブロックB-2には、文字の画像が含まれていないので、エッジが存在せず、画素の分散値（アクティビティ）も低いものとなる。従って、TM5のレート制御のステップ３において、マクロブロックBは、アクティビティが低いマクロブロックと見なされ、大きい量子化スケールが設定されることはない。 Since the sub-block B-2 does not include a character image, there is no edge and the pixel dispersion value (activity) is low. Therefore, in step 3 of the rate control of TM5, the macroblock B is regarded as a macroblock with low activity, and a large quantization scale is not set.

これに対して、マクロブロックA、C、およびDにおいては、どのサブブロックにも文字の画像の一部がふくまれているので、TM5のレート制御のステップ３において、マクロブロックA、C、およびDは、アクティビティが高いマクロブロックと見なされ、大きい量子化スケールが設定されることになる。 In contrast, in macroblocks A, C, and D, every sub-block contains part of the character image, so in step 3 of TM5 rate control, macroblocks A, C, and D are regarded as high-activity macroblocks and a large quantization scale is set for them.
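The difference between macroblock B and macroblocks A, C, and D can be reproduced with a small sketch of the activity computation. The pixel patterns below are hypothetical stand-ins for the character strokes in FIG. 2, and the activity is taken as 1 plus the minimum sub-block variance, as in the published TM5 description. The field-organized sub-blocks are assumed to be formed from the even and odd rows of the macroblock.

```python
def variance(pixels):
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def sub_blocks(mb):
    """Yield eight 8x8 sub-blocks of a 16x16 macroblock as flat lists:
    four frame-organized and four field-organized (even/odd rows)."""
    top_field, bottom_field = mb[0::2], mb[1::2]
    for rows in (mb[:8], mb[8:], top_field, bottom_field):
        for left in (0, 8):
            yield [p for row in rows for p in row[left:left + 8]]


def activity(mb):
    return 1 + min(variance(sb) for sb in sub_blocks(mb))


# "Strokes" everywhere (like macroblocks A, C, D): a coarse checker pattern.
busy = [[235 if ((x // 2) + (y // 2)) % 2 else 16 for x in range(16)]
        for y in range(16)]
# Same pattern, but one quadrant is blank background (like sub-block B-2).
one_flat = [[16 if (y < 8 and x >= 8) else busy[y][x] for x in range(16)]
            for y in range(16)]

print(activity(busy), activity(one_flat))
```

Because the minimum is taken over the sub-blocks, a single blank quadrant is enough to pull the activity of the whole macroblock down to its floor value, even though the other three quadrants contain strokes.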

量子化スケールが大きい場合、符号化された画像データを復号して得られる画像において、符号化される前の画像を正確に再生することが難しくなる。図３は、図２の画像を、通常のTM5のレート制御を用いて符号化し、その符号化された画像データを復号して得られた画像を示している。 When the quantization scale is large, it is difficult to accurately reproduce an image before encoding in an image obtained by decoding encoded image data. FIG. 3 shows an image obtained by encoding the image of FIG. 2 using normal TM5 rate control and decoding the encoded image data.

同図に示されるように、復号して得られた画像において、マクロブロックBの位置では、画像が明瞭に表示されているが、マクロブロックA、C、およびDの位置では、画像が不明瞭（ぼけて）表示されている。このように、文字の一部がぼけて表示されるなどした場合、表示された画像をみているユーザに、画質が低いという印象を与えてしまう可能性が高い。 As shown in the figure, in the decoded image, the image is displayed clearly at the position of macroblock B, but is unclear (blurred) at the positions of macroblocks A, C, and D. When part of a character is displayed blurred in this way, the user viewing the displayed image is likely to get the impression that the image quality is low.
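Why a larger quantization scale blurs detail can be seen from a toy round trip through a quantizer. The sketch below uses a plain uniform scalar quantizer on a handful of hypothetical transform coefficients. It is a simplification chosen for illustration and not the quantizer defined by H.264/AVC.

```python
def quantize(coeffs, scale):
    """Uniform scalar quantization (illustrative, not the H.264/AVC rule)."""
    return [round(c / scale) for c in coeffs]


def dequantize(levels, scale):
    return [level * scale for level in levels]


def max_error(coeffs, scale):
    restored = dequantize(quantize(coeffs, scale), scale)
    return max(abs(c - r) for c, r in zip(coeffs, restored))


coeffs = [620, -75, 33, -12, 7, 3]   # hypothetical transform coefficients
print(quantize(coeffs, 4), max_error(coeffs, 4))
print(quantize(coeffs, 32), max_error(coeffs, 32))
```

At the larger scale the small high-frequency coefficients, which carry the sharp edges of character strokes, are rounded to zero, so the reconstruction error grows.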

そこで、本発明の画像処理装置１００においては、キャプションが検出された場合、キャプションが含まれるマクロブロックに対しては、TM5のレート制御のステップ３の処理が施されないようになされている。 Therefore, in the image processing apparatus 100 of the present invention, when a caption is detected, the macro block including the caption is not subjected to the process of step 3 of TM5 rate control.

すなわち、キャプション検出部１５５が画面並べ替えバッファ１４２から供給された画像の中のキャプションを検出すると、キャプションが検出されたことを表す情報とともに、検出されたキャプションのマクロブロックを特定する情報を、レート制御部１５４に出力する。 That is, when the caption detection unit 155 detects a caption in the image supplied from the screen rearrangement buffer 142, it outputs to the rate control unit 154 information indicating that a caption has been detected, together with information identifying the macroblocks of the detected caption.

例えば、図２の画像を含む画像がキャプション検出部１５５に入力された場合、キャプション検出部１５５は、キャプションが検出されたことを表す情報と、マクロブロックA乃至Dのそれぞれを特定する情報（例えば、位置の情報など）をレート制御部１５４に出力することで、キャプション検出部１５５は、マクロブロックA乃至Dがキャプションのマクロブロックであることをレート制御部１５４に通知することになる。このとき、キャプションの検出と通知は、サブブロック単位ではなく、マクロブロック単位で行われるので、文字の画像を含まないサブブロックB-2を有するマクロブロックBも、当然、キャプションのマクロブロックとしてレート制御部１５４に通知されることになる。 For example, when an image including the image of FIG. 2 is input to the caption detection unit 155, the caption detection unit 155 outputs to the rate control unit 154 information indicating that a caption has been detected and information identifying each of macroblocks A to D (for example, position information), thereby notifying the rate control unit 154 that macroblocks A to D are caption macroblocks. Because caption detection and notification are performed per macroblock rather than per sub-block, macroblock B, which contains sub-block B-2 with no character image, is naturally also reported to the rate control unit 154 as a caption macroblock.

レート制御部１５４は、キャプション検出部１５５から、キャプションのマクロブロックの通知を受けた場合、キャプションのマクロブロックに対しては、TM5のレート制御のステップ３の処理が施されないようにする。すなわち、マクロブロックA乃至Dについては、そのマクロブロックのアクティビティの高低に係らず、TM5のレート制御のステップ２の処理で求められた量子化スケールがそのまま（ステップ３の処理を施されずに）適用される。 When the rate control unit 154 receives notification of a caption macroblock from the caption detection unit 155, it ensures that step 3 of TM5 rate control is not applied to that macroblock. That is, for macroblocks A to D, the quantization scale obtained in step 2 of TM5 rate control is applied as is (without the step 3 processing), regardless of whether the activity of the macroblock is high or low.

その結果、キャプションのマクロブロック（文字の画像が含まれるマクロブロック）は、アクティビティが高いが、大きい量子化スケールが設定されることがなく、符号化された画像データを復号して得られる画像において、符号化される前の画像を、ほぼ正確に再生することが可能となる。 As a result, caption macroblocks (macroblocks containing character images) have high activity, but a large quantization scale is not set, and in an image obtained by decoding encoded image data. The image before encoding can be reproduced almost accurately.

また、レート制御部１５４は、キャプション検出部１５５から、キャプションのマクロブロックの通知を受けていない場合、それらのマクロブロックに対しては、通常のTM5のレート制御の処理（ステップ３の処理を含む処理）が施されるようにする。すなわち、文字の画像が含まれないマクロブロックについては、TM5のレート制御のステップ１乃至３の処理が施される。 When the rate control unit 154 has not received a caption notification for a macroblock from the caption detection unit 155, it applies the normal TM5 rate control processing (including step 3) to that macroblock. That is, steps 1 to 3 of TM5 rate control are applied to macroblocks that do not contain a character image.

これにより、文字の画像が含まれないマクロブロックについては、レート制御部１５４がアクティビティに基づいて量子化部１４５を制御して量子化スケールを変化させて符号化の処理が行われるので、符号化された画像のデータにおいてビット数ができるだけ少なくなるように（ビットレートが低くなるように）制御することが可能となる。 As a result, for macroblocks that do not include a character image, the rate control unit 154 controls the quantization unit 145 based on the activity to change the quantization scale, so that encoding processing is performed. It is possible to perform control so that the number of bits in the image data is reduced as much as possible (to reduce the bit rate).
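The selective bypass of step 3 described above amounts to a single branch in the rate control. The sketch below assumes the commonly published TM5 normalization and a clip of the result to 1 to 31. The function name, the `is_caption` flag, and the clip are illustrative assumptions.

```python
def select_mquant(q_j, act_j, avg_act, is_caption):
    """Quantization scale for one macroblock.

    q_j        -- scale from step 2 of the rate control
    act_j      -- activity of this macroblock
    avg_act    -- average activity of the previously encoded picture
    is_caption -- True if the caption detector flagged this macroblock
    """
    if is_caption:
        # Caption macroblock: keep the step 2 result, skip step 3.
        return q_j
    n_act = (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)
    return max(1, min(31, round(q_j * n_act)))


print(select_mquant(10, 10000, 100, is_caption=True))
print(select_mquant(10, 10000, 100, is_caption=False))
```

For the same high-activity input, the flagged macroblock keeps the step 2 scale while the unflagged one is quantized about twice as coarsely.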

一般に人間の視覚特性は、エッジの少ない低周波成分に敏感であるため、TM５のレート制御のステップ３の処理のように、エッジが多く含まれる画像を符号化する場合、量子化スケールを大きく設定することは、符号化されたデータのビット数を少なくする上で効果的な方式と言える。 In general, human vision is sensitive to low-frequency components with few edges. Setting a large quantization scale when encoding an image that contains many edges, as in step 3 of TM5 rate control, is therefore an effective way to reduce the number of bits of the encoded data.

しかしながら、キャプション部分には、文字などが表示されており、復号された画像を見るユーザは、通常、他の部分と比較してキャプション部分を、より注意して見ることになり、キャプション部分における画像の劣化は、ユーザに意識されやすい。このため、従来、エンコーダにおいて、ビットレートを低く抑えながら、復号した画像がより自然で美しい画像となるように符号化しても、実際に復号された画像を見たユーザに、画質が低いという印象を与えてしまう場合があった。 However, the caption portion displays characters and the like, and a user viewing the decoded image usually looks at the caption portion more attentively than at other portions, so image degradation in the caption portion is easily noticed. For this reason, even when a conventional encoder encoded the image so that the decoded image would look natural and beautiful while keeping the bit rate low, the user who actually viewed the decoded image could still be left with the impression that the image quality was low.

これに対して、本発明の画像処理装置１００においては、キャプション部分のマクロブロックを符号化する場合、TM５のレート制御のステップ３の処理が行われないので、復号された画像を見たユーザに違和感を与えることがない。また、画像の中のキャプション部分における符号化効率は下がるものの、キャプション部分以外の部分では、TM５のレート制御のステップ３の処理が行われるので、画像全体としては、人間の視覚特性を考慮した効率的な符号化が行われることになる。その結果、本発明によれば、符号化された画像データのビットレートを低く抑えながら、再生された画像を見たユーザの満足度を高めることができる。 In contrast, in the image processing apparatus 100 of the present invention, step 3 of TM5 rate control is not performed when encoding the macroblocks of the caption portion, so the user viewing the decoded image is not given a sense of incongruity. Although the coding efficiency in the caption portion of the image is lower, step 3 of TM5 rate control is still performed in the portions other than the caption, so the image as a whole is encoded efficiently with human visual characteristics taken into account. As a result, according to the present invention, the satisfaction of the user viewing the reproduced image can be increased while the bit rate of the encoded image data is kept low.

ここまで、レート制御部１５４によるレート制御の方式を変更することで、キャプション部分の劣化をできるだけ小さくするように符号化する例について説明したが、他の方式により、キャプション部分の劣化をできるだけ小さくするように符号化することもできる。 Up to this point, an example has been described in which encoding is performed so as to minimize degradation of the caption portion by changing the rate control method by the rate control unit 154. However, degradation of the caption portion is minimized by other methods. It can also be encoded as follows.

最初に、動き予測・補償部１５３による予測画像データの生成を、適切に制御することによりキャプション部分の劣化をできるだけ小さくするように符号化する例について説明する。 First, an example will be described in which the generation of predicted image data by the motion prediction / compensation unit 153 is encoded so as to minimize the deterioration of the caption portion by appropriately controlling.

図４ａと図４ｂは、動き予測・補償部１５３による動き補償予測の処理を説明する図である。H.264/AVCで符号化された画像データにおいては、画像データのフレーム構造がフレームストラクチャの場合、マクロブロックはトップフィールドとボトムフィールドがインターレースされた１６画素×１６ライン（輝度信号）のフレームブロックで構成され、フレーム動き補償予測、またはフィールド動き補償予測という動き補償予測が用いられる。 FIG. 4a and FIG. 4b are diagrams explaining the motion compensated prediction processing performed by the motion prediction/compensation unit 153. In image data encoded with H.264/AVC, when the image data has a frame structure, a macroblock consists of a frame block of 16 pixels × 16 lines (luminance signal) in which the top field and the bottom field are interlaced, and either frame motion compensated prediction or field motion compensated prediction is used.

フレーム動き補償予測は、インターレースされた２つのフィールドが合成されたフレームで動き補償予測を行うもので、輝度信号はインターレースされた１６画素×１６ラインブロックごとに予測される。インターレース信号においては、１フレームを構成する２つのフィールドのうち、空間的に上にあるフィールドがトップフィールドと呼ばれ、空間的に下にあるフィールドがボトムフィールドと呼ばれる。 In the frame motion compensation prediction, motion compensation prediction is performed in a frame in which two interlaced fields are combined, and a luminance signal is predicted for each interlaced 16 pixel × 16 line block. In an interlaced signal, of two fields constituting one frame, a spatially upper field is called a top field, and a spatially lower field is called a bottom field.

図４aは、例えば、１フレーム離れた参照フレーム（動き予測・補償部１５３から出力される予測画像データのフレーム）から前方向の動き補償予測を行う例を示す図である。同図においては、トップフィールドの画素が円で示され、またボトムフィールドの画素が四角形で示されており、「MV」で示される動きベクトルに従って、参照フレームに対応する入力フレーム（画面並べ替えバッファ１４２から出力される画像データのフレーム）の画素位置が特定される。 FIG. 4A is a diagram illustrating an example in which forward motion compensation prediction is performed from a reference frame (a frame of predicted image data output from the motion prediction / compensation unit 153) separated by one frame, for example. In the figure, pixels in the top field are indicated by circles, and pixels in the bottom field are indicated by rectangles, and an input frame (screen rearrangement buffer) corresponding to the reference frame according to the motion vector indicated by “MV”. The pixel position of the frame of the image data output from 142 is specified.

フレーム動き補償予測は、例えば、比較的ゆっくりした動きで、フレーム内での相関が高いまま等速度で動いている場合に有効な予測方式である。 The frame motion compensated prediction is an effective prediction method when, for example, the motion is relatively slow and the motion in the frame is moving at a constant speed with a high correlation.

一方、フィールド動き補償予測とは、フィールドごとに動き補償を行うもので、図４bに示されるように、トップフィールドに動きベクトル「ＭＶ１」、ボトムフィールドには動きベクトル「ＭＶ２」がそれぞれ設定され、「MV１」、または「MV２」のそれぞれの動きベクトルに従って、参照フレームに対応する入力フレームの画素位置が特定される。 On the other hand, the field motion compensation prediction performs motion compensation for each field. As shown in FIG. 4b, the motion vector “MV1” is set in the top field, and the motion vector “MV2” is set in the bottom field. According to each motion vector of “MV1” or “MV2”, the pixel position of the input frame corresponding to the reference frame is specified.

また、入力フレーム中の画素に対応する参照フレーム中のフィールドはトップフィールドとされるようにすることもできるし、ボトムフィールドとされるようにすることもできる。図４bの例では、入力フレーム中のトップフィールドの画素、ボトムフィールドの画素のいずれに対しても参照フレーム中のトップフィールドが参照されている。なお、同図においてもトップフィールドの画素が円で示され、またボトムフィールドの画素が四角形で示されており、フィールド動き補償予測では、マクロブロック内の各フィールド別に予測されるため、１６画素ｘ８ラインのフィールドブロック単位で予測されることになる。 The field in the reference frame that corresponds to a pixel in the input frame can be either the top field or the bottom field. In the example of FIG. 4b, the top field of the reference frame is referenced for both the top-field pixels and the bottom-field pixels of the input frame. In this figure as well, top-field pixels are shown as circles and bottom-field pixels as squares. In field motion compensated prediction, each field within the macroblock is predicted separately, so prediction is performed in units of field blocks of 16 pixels × 8 lines.
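The 16 pixel × 8 line field blocks mentioned above can be obtained from a frame macroblock by separating alternate lines. A minimal sketch, assuming the top field occupies the even-numbered lines starting at line 0 (function names are illustrative):

```python
def split_fields(mb):
    """Split a 16x16 frame macroblock into 16x8 top and bottom field blocks.
    Assumes the top field holds lines 0, 2, 4, ... of the frame."""
    return mb[0::2], mb[1::2]


def merge_fields(top, bottom):
    """Re-interleave two field blocks into one frame macroblock."""
    mb = []
    for top_line, bottom_line in zip(top, bottom):
        mb.append(top_line)
        mb.append(bottom_line)
    return mb


mb = [[y * 16 + x for x in range(16)] for y in range(16)]
top, bottom = split_fields(mb)
print(len(top), len(top[0]))          # lines and pixels per field block
print(merge_fields(top, bottom) == mb)
```

In field prediction each of the two field blocks would then receive its own motion vector, as with MV1 and MV2 in FIG. 4b.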

画像の中に、文字が表示されている場合、通常、時間の経過に伴って表示された文字が画面内で動いていく可能性は低いと考えられ、また、例えば、文字「あ」の形状は、時間的に前後する画像においても同じ形状とされる。そこで、本発明の画像処理装置１００においては、キャプションが検出された場合、キャプションが含まれるマクロブロックに対しては、そのマクロブロックに設定される動きベクトルを、例えば、「０」に固定する。 When characters are displayed in an image, it is usually unlikely that they move within the screen as time passes, and the shape of a character such as 「あ」 is the same in temporally adjacent images. Therefore, in the image processing apparatus 100 of the present invention, when a caption is detected, the motion vector set for each macroblock that contains the caption is fixed to, for example, "0".

文字または文字の一部が含まれるマクロブロックに対して、通常の動き補償予測の処理を行うと、個々のマクロブロックに対して異なる動きベクトルが設定されてしまう場合がある。すなわち、マクロブロックに含まれる文字ではなく、その背景の画像などに基づいて動きベクトルが設定されてしまう場合がある。なお、H.264/AVC方式の符号化では、１個のマクロブロックを、８×８個の画素で構成される４個のサブブロックに分割し、個々のサブブロックに対する動きベクトルが設定される。 When normal motion compensation prediction processing is performed on a macroblock including a character or a part of a character, a different motion vector may be set for each macroblock. That is, the motion vector may be set based on the background image or the like instead of the character included in the macroblock. In H.264 / AVC encoding, one macroblock is divided into four subblocks each composed of 8 × 8 pixels, and a motion vector for each subblock is set. .

図５は、図２の画像を、通常の動き補償予測の処理を行って符号化し、その符号化された画像データを復号して得られた画像の例を示している。同図において、個々の枠（サブブロック）に示される矢印のそれぞれが動きベクトルを表している。この例では、例えば、背景の画像の動きに伴って、個々のサブブロックのそれぞれに、別々の動きベクトルが設定されており、その結果、図５の画像に示される「あ」の文字が歪みのある状態で表示されている。 FIG. 5 shows an example of an image obtained by encoding the image of FIG. 2 with normal motion compensated prediction and then decoding the encoded image data. In the figure, each arrow shown in an individual frame (sub-block) represents a motion vector. In this example, a separate motion vector has been set for each sub-block, for instance following the motion of the background image, and as a result the character 「あ」 in the image of FIG. 5 is displayed in a distorted state.

これに対して、本発明の画像処理装置１００においては、キャプションが含まれるマクロブロックに対して設定される動きベクトルが、例えば、「０」に固定される。 On the other hand, in the image processing apparatus 100 of the present invention, the motion vector set for the macroblock including the caption is fixed to “0”, for example.

例えば、図２の画像を含む画像がキャプション検出部１５５に入力された場合、キャプション検出部１５５は、キャプションが検出されたことを表す情報と、マクロブロックA乃至Dのそれぞれを特定する情報（例えば、位置の情報など）を動き予測・補償部１５３に出力することで、キャプション検出部１５５は、マクロブロックA乃至Dがキャプションのマクロブロックであることを動き予測・補償部１５３に通知することになる。 For example, when an image including the image of FIG. 2 is input to the caption detection unit 155, the caption detection unit 155 outputs to the motion prediction/compensation unit 153 information indicating that a caption has been detected and information identifying each of macroblocks A to D (for example, position information), thereby notifying the motion prediction/compensation unit 153 that macroblocks A to D are caption macroblocks.

動き予測・補償部１５３は、キャプション検出部１５５から、キャプションのマクロブロックの通知を受けた場合、キャプションのマクロブロックに対しては、設定される動きベクトルが「０」に固定されるようにする。すなわち、マクロブロックA乃至Dについては、予測画像データにおける動きの補償が行われないことになる。 When receiving the notification of the caption macroblock from the caption detection unit 155, the motion prediction / compensation unit 153 fixes the set motion vector to “0” for the caption macroblock. . That is, for macroblocks A to D, motion compensation in the predicted image data is not performed.

その結果、キャプションのマクロブロック（文字の画像が含まれるマクロブロック）は、背景の画像が動いたとしても、動きベクトルが「０」に固定され、符号化された画像データを復号して得られる画像において、文字（いまの場合「あ」）の画像を、ほぼ正確に再生することが可能となる。 As a result, a caption macroblock (a macroblock including a character image) is obtained by decoding the encoded image data with the motion vector fixed to “0” even if the background image moves. In the image, it is possible to reproduce the image of the character (in this case, “A”) almost accurately.

また、動き予測・補償部１５３は、キャプション検出部１５５から、キャプションのマクロブロックの通知を受けていない場合、それらのマクロブロックに対しては、通常通り動きベクトルを設定する。すなわち、文字の画像が含まれないマクロブロックについては、画像の動きに応じた適切な予測画像が生成される。 Also, when the motion prediction / compensation unit 153 has not received a caption macroblock notification from the caption detection unit 155, the motion prediction / compensation unit 153 sets a motion vector for the macroblocks as usual. That is, for a macroblock that does not include a character image, an appropriate prediction image corresponding to the motion of the image is generated.

これにより、復号された画像を見たユーザに違和感を与えることがなく、また、画像全体としては、動きを考慮した美しい画像を復号できるように、符号化が行われることになる。 As a result, the user who sees the decoded image does not feel uncomfortable, and the entire image is encoded so that a beautiful image in consideration of motion can be decoded.

なお、ここでは、動きベクトルを「０」に固定すると説明したが、固定される値は、「０」に限られるものではなく、例えば、「０」に近い予め設定された所定の比較的小さい範囲の値に固定されるようにしてもよい。 Here, it has been described that the motion vector is fixed to “0”, but the fixed value is not limited to “0”, for example, a predetermined relatively small value close to “0”. You may make it fix to the value of a range.
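Fixing the motion vector for caption macroblocks can be sketched as a guard in front of an ordinary block-matching search. The full search with a sum-of-absolute-differences cost used below is a common textbook method chosen for illustration. The text does not specify the search algorithm, and all names and parameters here are assumptions.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the current block and the
    reference block displaced by (dy, dx)."""
    return sum(
        abs(cur[top + y][left + x] - ref[top + y + dy][left + x + dx])
        for y in range(size)
        for x in range(size)
    )


def motion_vector(cur, ref, top, left, is_caption, size=16, search=2):
    """Return (dy, dx) for the block at (top, left)."""
    if is_caption:
        return (0, 0)  # caption macroblock: vector fixed to zero
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside_rows = 0 <= top + dy and top + dy + size <= len(ref)
            inside_cols = 0 <= left + dx and left + dx + size <= len(ref[0])
            if not (inside_rows and inside_cols):
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


# The current picture is the reference shifted by one line and two pixels.
ref = [[16 * y + x for x in range(12)] for y in range(12)]
cur = [[16 * (y + 1) + (x + 2) for x in range(12)] for y in range(12)]
print(motion_vector(cur, ref, 4, 4, is_caption=False, size=4))
print(motion_vector(cur, ref, 4, 4, is_caption=True, size=4))
```

Allowing a small nonzero range instead of exactly zero, as the paragraph above mentions, would correspond to calling the same search with a reduced `search` value for caption macroblocks.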

あるいはまた、動き予測・補償部１５３が、キャプション検出部１５５から、キャプションのマクロブロックの通知を受けた場合、キャプションが含まれるマクロブロックに対しては、動きベクトルが設定されるサブブロックのサイズを、大きく設定するようにしてもよい。 Alternatively, when the motion prediction/compensation unit 153 receives notification of a caption macroblock from the caption detection unit 155, it may set a larger size for the sub-blocks to which motion vectors are assigned in macroblocks that contain the caption.

すなわち、動きベクトルの設定にあたっては、上述したように、通常、１つのマクロブロックが、８×８個の画素で構成される４個のサブブロックに分割され、個々のサブブロックに対する動きベクトルが設定されるが、キャプションが検出された場合、キャプションが含まれるマクロブロックに対しては、サブブロックのサイズを、例えば、１６×１６の画素で構成されるブロックとするようにしてもよい。このようにすることで、マクロブロックA乃至Dに対して、それぞれ１つの動きベクトルのみが設定されることになり、図５を参照して上述したような、例えば、背景の画像の動きに伴って、個々のサブブロックのそれぞれに、別々の動きベクトルが設定され、画像の文字が歪みのある状態で表示されることを抑止することが可能となる。 That is, as described above, when setting motion vectors, one macroblock is normally divided into four sub-blocks of 8×8 pixels each and a motion vector is set for each sub-block. When a caption is detected, however, the sub-block size for macroblocks that contain the caption may be set to, for example, a block of 16×16 pixels. In this way only one motion vector is set for each of macroblocks A to D, which prevents the situation described above with reference to FIG. 5, in which separate motion vectors are set for the individual sub-blocks, for instance following the motion of the background image, and the characters in the image are displayed in a distorted state.
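The choice of partition size can be expressed as a small helper that lists the blocks to which a motion vector is assigned. The function name and the (top, left, size) tuple layout are illustrative assumptions.

```python
def motion_partitions(is_caption, mb_size=16, sub_size=8):
    """List (top, left, size) of the blocks that each get one motion vector.
    Caption macroblocks use a single 16x16 block, others four 8x8 blocks."""
    size = mb_size if is_caption else sub_size
    return [(top, left, size)
            for top in range(0, mb_size, size)
            for left in range(0, mb_size, size)]


print(motion_partitions(is_caption=True))
print(motion_partitions(is_caption=False))
```

With one partition per caption macroblock, all pixels of the macroblock are displaced together, so strokes of a character cannot be pulled apart by differing sub-block vectors.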

あるいはまた、キャプションが検出された場合、キャプションが含まれるマクロブロックに対しては、図４ａに示されるフレーム動き補償予測のみが行われるようにし、フィールド動き補償予測が行われないようにしてもよい。 Alternatively, when a caption is detected, only the frame motion compensated prediction shown in FIG. 4a may be performed on the macroblock including the caption, and the field motion compensated prediction may not be performed. .

図６（ａ）乃至（ｃ）は、文字「あ」の画像について、それぞれ全体を表示させた場合の図、インターレースのトップフィールドのみを表示させた場合の図、インターレースのボトムフィールドのみを表示させた場合の図である。同図に示されるように、文字の画像は、通常、動かないので、符号化を行うときには、文字の中のトップフィールドに属する部分は、トップフィールドを参照することが望ましく、文字の中のボトムフィールドに属する部分は、ボトムフィールドを参照することが望ましい。そのようにすることで、符号化された画像が復号されたとき、文字「あ」の画像が正確に表示されるようにすることが可能となる。 FIGS. 6(a) to 6(c) show the image of the character 「あ」 displayed, respectively, in full, with only the interlaced top field, and with only the interlaced bottom field. As shown in the figure, a character image does not normally move, so when encoding, the part of the character that belongs to the top field should preferably reference the top field, and the part that belongs to the bottom field should preferably reference the bottom field. Doing so makes it possible for the image of the character 「あ」 to be displayed accurately when the encoded image is decoded.

In field motion-compensated prediction, however, as described above with reference to FIG. 4(b), a top-field pixel can refer to a bottom-field pixel, and a bottom-field pixel can refer to a top-field pixel. Consequently, if field motion-compensated prediction is performed on a macroblock containing a caption, the image of the character "あ" may not be displayed accurately when the encoded image is decoded.

Therefore, only the frame motion-compensated prediction shown in FIG. 4(a) is performed for a macroblock containing a caption. In this case, when the motion prediction/compensation unit 153 is notified of a caption macroblock by the caption detection unit 155, it generates predicted image data for that macroblock using frame motion-compensated prediction only. In H.264/AVC encoding, frame motion-compensated prediction alone can be enforced by setting the reference picture number (ref_idx) for the macroblock to "0" or to a preset value.

As a result, as in the case where the motion vector is fixed to "0", the image of the character (here, "あ") in a caption macroblock (a macroblock containing an image of a character) can be reproduced almost accurately in the image obtained by decoding the encoded image data.

For macroblocks for which it has received no caption notification from the caption detection unit 155, the motion prediction/compensation unit 153 generates predicted image data by adaptively performing frame or field motion-compensated prediction as usual. That is, for a macroblock containing no image of a character, an appropriate predicted image is generated according to the motion of the image.
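The selection between the two prediction types might be sketched as below. This is a simplified illustration: `frame_cost` and `field_cost` are hypothetical matching costs (for example, sums of absolute differences) that an encoder would compute elsewhere, and the function name is likewise an assumption.

```python
def choose_prediction_type(is_caption: bool, frame_cost: float, field_cost: float) -> str:
    """Pick frame or field motion-compensated prediction for one macroblock.

    Caption macroblocks always use frame prediction; all other macroblocks
    adaptively take whichever type has the lower (hypothetical) cost.
    """
    if is_caption:
        return "frame"
    return "frame" if frame_cost <= field_cost else "field"


print(choose_prediction_type(True, frame_cost=900, field_cost=400))   # frame
print(choose_prediction_type(False, frame_cost=900, field_cost=400))  # field
```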

Here too, encoding is thus performed so that the decoded image does not look unnatural to the user, while the image as a whole can be decoded as a clean image that takes motion into account.

Alternatively, when the motion prediction/compensation unit 153 is notified of a caption macroblock by the caption detection unit 155, the macroblock mode of a macroblock containing the caption may be fixed to the "skipped macroblock" mode.

For a macroblock set to the "skipped macroblock" mode, no difference from the reference-plane image (the image of the predicted image data) is extracted, so the image obtained by decoding the encoded image data is identical to the reference-plane image.

The image of a character in a caption normally hardly moves, and the shape of, for example, the character "あ" is the same in temporally preceding and following images. Displaying the reference-plane image as it is therefore often reproduces the image of the character more accurately.

FIG. 7(a) shows an example of the image of the predicted image data (the reference-plane image) generated by the motion prediction/compensation unit 153 when the image of the character "あ" is encoded. FIG. 7(b) shows an example of the image obtained by decoding data that was encoded by extracting the difference from the predicted image data of FIG. 7(a). As shown in the figure, the image of FIG. 7(b) is distorted compared with the image of FIG. 7(a).

Because no difference from the reference-plane image (the image of the predicted image data) is extracted for a macroblock set to the "skipped macroblock" mode, setting that mode for macroblocks containing images of characters, that is, caption macroblocks, makes it possible to display an image with little distortion such as that shown in FIG. 7(a).

Note that the "skipped macroblock" mode need not always be set for a caption macroblock; it suffices to make "skipped macroblock" more likely to be selected than the other macroblock modes.
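One way to make the "skipped macroblock" mode more likely, rather than mandatory, is to scale its cost in the mode decision. The following is a sketch under assumptions: the cost values, the mode names, and the `skip_bias` factor are all hypothetical and are not specified in the embodiment.

```python
def choose_mb_mode(mode_costs: dict, is_caption: bool, skip_bias: float = 0.5) -> str:
    """Return the macroblock mode with the lowest cost.

    For caption macroblocks, the cost of the "skip" mode is multiplied by
    skip_bias (< 1), which favours skip without forcing it.
    """
    adjusted = dict(mode_costs)
    if is_caption and "skip" in adjusted:
        adjusted["skip"] *= skip_bias
    return min(adjusted, key=adjusted.get)


costs = {"skip": 120.0, "inter16x16": 100.0, "intra": 150.0}
print(choose_mb_mode(costs, is_caption=False))  # inter16x16
print(choose_mb_mode(costs, is_caption=True))   # skip
```

A bias closer to 1 leaves the ordinary decision nearly unchanged, while a smaller bias approaches the fixed-mode behaviour described earlier.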

Alternatively, when the motion prediction/compensation unit 153 is notified of a caption macroblock by the caption detection unit 155, the reference direction (prediction direction) for a macroblock containing the caption may be restricted to one direction.

For example, when the macroblock to be encoded in H.264/AVC belongs to a B slice, three kinds of predictive coding are possible: forward prediction, backward prediction, and bidirectional prediction. In forward prediction, the motion prediction/compensation unit 153 generates predicted image data from an image temporally preceding the image of the macroblock to be encoded. In backward prediction, it generates predicted image data from a temporally following image. In bidirectional prediction, it generates predicted image data from both a preceding image and a following image. Of these, only forward prediction or backward prediction may be performed for a macroblock containing a caption.

FIGS. 8(a) to 8(c) show examples of the image of the predicted image data (the reference-plane image) generated by the motion prediction/compensation unit 153 when the image of the character "あ" is encoded. FIG. 8(a) shows the reference-plane image for forward prediction, FIG. 8(b) for backward prediction, and FIG. 8(c) for bidirectional prediction. As shown in the figure, the image of FIG. 8(c) is distorted compared with the image of FIG. 8(a) or FIG. 8(b).

In bidirectional prediction, the predicted image data is generated by averaging the pixels of the corresponding macroblocks in the images temporally preceding and following the image of the macroblock to be encoded. Therefore, if there is even a slight spatial displacement between the preceding and following images, the image of the generated predicted image data is distorted as shown in FIG. 8(c).
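The effect of averaging can be seen on a single row of pixels. The sketch below assumes a plain rounded average of the two references (no weighted prediction); the sample rows are invented for illustration, with the stroke of a character shifted by one pixel between the preceding and following images.

```python
def bi_predict(past, future):
    """Bidirectional prediction as a rounded average of two reference rows."""
    return [(p + f + 1) >> 1 for p, f in zip(past, future)]


past   = [0, 0, 255, 255, 0, 0]   # stroke at columns 2-3
future = [0, 0, 0, 255, 255, 0]   # same stroke shifted right by one pixel

print(past)                      # forward prediction keeps the sharp edge
print(bi_predict(past, future))  # [0, 0, 128, 255, 128, 0]: edges smeared
```

Using only the preceding row (or only the following row) preserves the two-pixel stroke, whereas the averaged row widens it and introduces intermediate grey values.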

By performing only forward prediction or backward prediction on a macroblock containing a caption, the image of the generated predicted image data can be kept to one with little distortion, as shown in FIG. 8(a) or FIG. 8(b). As a result, the image of the character can also be displayed accurately in the image obtained by decoding the encoded data.

Alternatively, when the motion prediction/compensation unit 153 is notified of a caption macroblock by the caption detection unit 155, the motion compensation accuracy for a macroblock containing the caption may be restricted to integer-pixel accuracy or 1/2-pixel accuracy.

For example, H.264/AVC encoding permits motion compensation with integer-pixel, 1/2-pixel, or 1/4-pixel accuracy. With 1/2-pixel accuracy, when no pixel exists at the position in the reference-plane image specified by the motion vector, the value of a pixel located midway between two pixels is virtually generated from the values of neighboring pixels.

FIG. 9 illustrates an example of motion compensation accuracy. Assume that the reference-plane image contains six pixels E, F, G, H, I, and J. In H.264/AVC motion compensation, for example, pixel values at 1/2-pixel positions are generated by a filtering process using a 6-tap FIR filter. To generate the value of pixel b in the figure, for example, the values of pixels E to J are each multiplied by preset coefficients in this filtering process.

To generate the value of pixel a or c, on the other hand, motion compensation with 1/4-pixel accuracy is performed. In this case, the pixel value cannot be generated solely from pixels E to J, which actually exist in the reference-plane image, so the value of pixel b generated by 1/2-pixel motion compensation is also used.

That is, with 1/4-pixel motion compensation, virtual pixels are generated from pixels that do not actually exist in the reference-plane image.
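The interpolation can be sketched for one row of luma pixels as follows. The coefficients (1, -5, 20, 20, -5, 1) and the rounding are those of the H.264/AVC luma interpolation as generally documented; the assumption here is that pixel b lies midway between G and H, with a between G and b and c between b and H. The function names are introduced only for this sketch.

```python
def clip1(v: int) -> int:
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(e, f, g, h, i, j) -> int:
    """Half-sample b between G and H, from the six real pixels E..J."""
    return clip1((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pels(e, f, g, h, i, j):
    """Quarter-samples a and c, each averaged from a real pixel and the virtual b."""
    b = half_pel(e, f, g, h, i, j)
    a = (g + b + 1) >> 1
    c = (h + b + 1) >> 1
    return a, b, c


print(quarter_pels(0, 0, 0, 200, 200, 200))  # (50, 100, 150) across a sharp edge
print(half_pel(0, 0, 200, 200, 200, 200))    # 225: overshoots both neighbours
```

The second call shows that the filter can produce a value brighter than either adjacent real pixel near an edge, which is one way such virtual pixels can alter the outline of a character.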

FIGS. 10(a) to 10(c) show examples of the image of the predicted image data (the reference-plane image) generated by the motion prediction/compensation unit 153 when the image of the character "あ" is encoded. FIG. 10(a) shows the reference-plane image for motion compensation with integer-pixel accuracy, FIG. 10(b) for 1/2-pixel accuracy, and FIG. 10(c) for 1/4-pixel accuracy. As shown in the figure, the image of FIG. 10(c) is distorted compared with the image of FIG. 10(a) or FIG. 10(b).

The image of a character in a caption normally hardly moves, and the shape of, for example, the character "あ" is the same in temporally preceding and following images. Generating predicted image data from virtually generated pixels therefore often results in a distorted image.

By restricting the motion compensation accuracy for a macroblock containing a caption to integer-pixel or 1/2-pixel accuracy, the image of the generated predicted image data can be kept to one with little distortion, as shown in FIG. 10(a) or FIG. 10(b). As a result, the image of the character can also be displayed accurately in the image obtained by decoding the encoded data.
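A possible way to apply this restriction is to snap a motion vector, expressed in quarter-pixel units, to the nearest allowed position toward zero. This is only a sketch: the function name and the truncation rule are assumptions, and an actual encoder would more likely restrict the positions evaluated during the motion search.

```python
def restrict_precision(mv_qpel, is_caption: bool, allow_half: bool = True):
    """Limit a (dx, dy) motion vector given in quarter-pixel units.

    For caption macroblocks, each component is truncated toward zero to a
    multiple of 2 (1/2-pixel accuracy) or of 4 (integer-pixel accuracy).
    """
    if not is_caption:
        return mv_qpel
    step = 2 if allow_half else 4

    def snap(c: int) -> int:
        magnitude = (abs(c) // step) * step
        return magnitude if c >= 0 else -magnitude

    return tuple(snap(c) for c in mv_qpel)


print(restrict_precision((5, -3), True))                    # (4, -2)
print(restrict_precision((7, -6), True, allow_half=False))  # (4, -4)
```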

Next, an example will be described in which the processing related to the orthogonal transform by the orthogonal transform unit 144 is appropriately controlled so that encoding keeps the deterioration of the caption portion as small as possible.

In the MPEG2 coding scheme, when a DCT is performed as the orthogonal transform by the orthogonal transform unit 144, two DCT coding modes are available. FIGS. 11(a) and 11(b) illustrate these DCT coding modes.

In the frame DCT coding mode, when the luminance signal of a macroblock is divided into, for example, four sub-blocks, each sub-block is formed so as to contain lines of both the top field and the bottom field, as shown in FIG. 11(a).

In the field DCT coding mode, by contrast, when the luminance signal of a macroblock is divided into, for example, four sub-blocks, each sub-block is formed from lines of only the top field or only the bottom field, as shown in FIG. 11(b).
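The two ways of dividing a 16 × 16 luminance macroblock into four 8 × 8 blocks can be sketched as follows. The function name `split_luma` and the convention that even rows belong to the top field and odd rows to the bottom field are assumptions made for this illustration.

```python
def split_luma(mb, field_dct: bool):
    """Split a 16 x 16 luma macroblock (list of 16 rows) into four 8 x 8 blocks.

    Frame DCT: rows are taken in display order, so each block mixes both fields.
    Field DCT: top-field rows (0, 2, ...) feed the upper two blocks and
    bottom-field rows (1, 3, ...) feed the lower two blocks.
    """
    rows = (mb[0::2] + mb[1::2]) if field_dct else mb
    blocks = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            blocks.append([row[c0:c0 + 8] for row in rows[r0:r0 + 8]])
    return blocks


# Each row is filled with its own row index so the origin of a line is visible.
mb = [[r] * 16 for r in range(16)]
print([row[0] for row in split_luma(mb, False)[0]])  # rows 0..7
print([row[0] for row in split_luma(mb, True)[0]])   # top-field rows only
print([row[0] for row in split_luma(mb, True)[2]])   # bottom-field rows only
```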

In the H.264/AVC coding scheme, frame coding or field coding can be selected adaptively for each macroblock pair, which consists of two vertically adjacent macroblocks in the image. In an H.264/AVC bitstream, the sequence parameter set RBSP (Raw Byte Sequence Payload) contains a parameter called mb_adaptive_frame_field_flag (macroblock-adaptive frame/field flag), and the slice header contains a parameter called field_pic_flag (field picture flag). The settings of these flags determine the coding method (frame coding or field coding) per frame and per macroblock.

When the image data to be encoded is in an interlaced format, the coding process can be made adaptive at the picture level or at the macroblock level (that is, each picture or each macroblock pair is either frame coded or field coded). For example, when mb_adaptive_frame_field_flag in the sequence parameter set of the H.264/AVC bitstream is set to "1" and field_pic_flag in the slice header is set to "0", the picture as a whole is frame coded, and field coding or frame coding can be selected for each macroblock pair.
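How the two flags combine can be summarized in a small lookup. This sketch is deliberately simplified: it considers only the two flags named in the text and does not model other syntax elements of the standard (such as frame_mbs_only_flag) that also govern whether these flags are present. The function name and the returned descriptions are assumptions.

```python
def coding_structure(mb_adaptive_frame_field_flag: int, field_pic_flag: int) -> str:
    """Describe the picture structure implied by the two flags."""
    if field_pic_flag == 1:
        # The slice belongs to a picture coded as an individual field.
        return "field picture"
    if mb_adaptive_frame_field_flag == 1:
        # Frame picture; each macroblock pair may be frame or field coded.
        return "frame picture, frame/field chosen per macroblock pair"
    return "frame picture, frame macroblocks only"


print(coding_structure(mb_adaptive_frame_field_flag=1, field_pic_flag=0))
```

The combination printed above is the one in which a caption macroblock pair can be forced to frame coding while other pairs remain adaptive.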

Incidentally, when a macroblock or sub-block is compressed and encoded by orthogonal transform and quantization, mosquito noise caused by quantization error appears around edges in the image obtained by decoding the encoded data.

In the MPEG2 coding scheme, the quantization error that arises when image data is encoded occurs per frame in the frame DCT coding mode described above and per field in the field DCT coding mode described above. That is, with the frame DCT coding mode, mosquito noise due to quantization error appears almost equally in the top field and the bottom field, whereas with the field DCT coding mode, mosquito noise due to quantization error may appear more prominently in one of the two fields.

FIGS. 12(a) and 12(b) show examples of images obtained by decoding encoded data of the image of the character "あ". FIG. 12(a) is an example for the frame DCT coding mode, and FIG. 12(b) is an example for the field DCT coding mode; in FIG. 12(b), line-shaped mosquito noise running horizontally in the figure appears. When more pronounced mosquito noise appears (regularly) in only one of the top field and the bottom field within the image of a character in this way, the distortion is more visible than in the image of FIG. 12(a), where mosquito noise is spread over the whole character.

Therefore, in the image processing apparatus 100 of the present invention, when the orthogonal transform unit 144 is notified of a caption macroblock by the caption detection unit 155, only frame coding is performed on the macroblock pair containing the caption macroblock.

This suppresses the occurrence of mosquito noise that is easily visible to the user, such as that shown in FIG. 12(b), in the image obtained by decoding the encoded data of the caption image.

When the orthogonal transform unit 144 performs frame coding on a macroblock pair, the motion prediction/compensation unit 153 may at the same time perform the frame motion-compensated prediction described above with reference to FIG. 4(a).

Alternatively, when the orthogonal transform unit 144 is notified of a caption macroblock by the caption detection unit 155, the size of the sub-blocks into which a macroblock containing the caption is divided for the orthogonal transform (the so-called orthogonal transform size) may be restricted to sub-blocks of 4 × 4 pixels.

As described above, mosquito noise due to quantization error appears in an image obtained by decoding data that was compressed and encoded by orthogonal transform and quantization. This mosquito noise propagates in units of the sub-blocks on which the orthogonal transform is performed.

FIGS. 13(a) and 13(b) show examples of images obtained by decoding encoded data of the image of the character "あ". FIG. 13(a) is an example with an orthogonal transform size of 4 × 4 pixels, and FIG. 13(b) is an example with an orthogonal transform size of 8 × 8 pixels. The distortion in the image of FIG. 13(b) is more visible than that in the image of FIG. 13(a).

That is, when the orthogonal transform size is large, a quantization error, once it occurs, produces mosquito noise over a wide area of the image. When the orthogonal transform size is small, the mosquito noise is confined to a relatively narrow area of the image even if a quantization error occurs. When an image of a character is encoded, therefore, a smaller orthogonal transform size makes mosquito noise less visible in the image obtained by decoding the encoded data.

Restricting the orthogonal transform size to 4 × 4 in this way suppresses the occurrence of mosquito noise that is easily visible to the user, such as that shown in FIG. 13(b), in the image obtained by decoding the encoded data of the caption image.

Next, the encoding process performed by the image processing apparatus 100 will be described with reference to the flowchart of FIG. 14.

In step S101, the caption detection unit 155 acquires the image data of the image to be encoded next. At this time, for example, the image data supplied via the screen rearrangement buffer 142 is acquired by the caption detection unit 155.

In step S102, the caption detection unit 155 analyzes the image of the image data acquired in step S101. At this time, whether a given macroblock of the image contains characters is checked by, for example, detecting the number of edges in that macroblock.

In step S103, the caption detection unit 155 determines, based on the result of the analysis in step S102, whether a caption has been detected in the macroblock of the image.
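Steps S102 and S103 might be sketched as an edge count followed by a threshold test. The edge definition used here (a large luminance difference between horizontally adjacent pixels) and both threshold values are assumptions chosen for illustration; the text specifies only that the number of edges is detected.

```python
def count_edges(mb, grad_threshold: int = 64) -> int:
    """Count horizontally adjacent pixel pairs whose luma difference is large."""
    return sum(1
               for row in mb
               for left, right in zip(row, row[1:])
               if abs(left - right) > grad_threshold)


def is_caption_macroblock(mb, grad_threshold: int = 64, edge_threshold: int = 32) -> bool:
    """Step S103: treat a macroblock with many strong edges as containing text."""
    return count_edges(mb, grad_threshold) >= edge_threshold


flat = [[128] * 16 for _ in range(16)]              # uniform background
strokes = [[0, 0, 255, 255] * 4 for _ in range(16)]  # dense vertical strokes
print(is_caption_macroblock(flat))     # False
print(is_caption_macroblock(strokes))  # True
```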

If it is determined in step S103 that a caption has been detected, the process proceeds to step S104. At this time, the caption detection unit 155 supplies, for example, a control signal indicating that a caption has been detected, in association with information identifying the macroblock in which the caption was detected, to the orthogonal transform unit 144, the motion prediction/compensation unit 153, and the rate control unit 154.

In step S104, caption handling processing is executed, as described later with reference to FIGS. 15 to 23.

FIG. 15 is a flowchart illustrating a detailed example of the caption handling processing in step S104 of FIG. 14.

In step S201 of the figure, the rate control unit 154, having been notified of a caption macroblock by the caption detection unit 155, performs only step 1 and step 2 of the TM5 rate control on the caption macroblock. That is, step 3 of the TM5 rate control is not applied to the caption macroblock.

As described above with reference to FIG. 3, this prevents part of a character from being displayed blurred in the image obtained by decoding the encoded image data.
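Step 3 of TM5 is the adaptive quantization stage, in which the reference quantization parameter obtained in step 2 is scaled by the normalized spatial activity of the macroblock. Omitting it for caption macroblocks can be sketched as below; the function and argument names are assumptions, and the activity values are taken as given rather than computed.

```python
def tm5_mquant(q_ref: float, act: float, avg_act: float, is_caption: bool) -> int:
    """Quantizer scale for one macroblock.

    q_ref   : reference quantization parameter from TM5 step 2
    act     : spatial activity of this macroblock
    avg_act : average activity of the previously coded picture
    Caption macroblocks use q_ref directly (step 3 skipped); other
    macroblocks are modulated by the normalized activity as in step 3.
    """
    if is_caption:
        mquant = q_ref
    else:
        n_act = (2 * act + avg_act) / (act + 2 * avg_act)
        mquant = q_ref * n_act
    return max(1, min(31, round(mquant)))


print(tm5_mquant(10, act=400, avg_act=100, is_caption=False))  # 15 (coarser)
print(tm5_mquant(10, act=400, avg_act=100, is_caption=True))   # 10
```

A macroblock with high activity, as text over a background tends to have, would otherwise be quantized more coarsely than the reference value.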

FIG. 16 is a flowchart illustrating another example of the details of the caption handling processing in step S104 of FIG. 14.

In step S221 of the figure, the motion prediction/compensation unit 153, having been notified of a caption macroblock by the caption detection unit 155, sets the motion vector for the caption macroblock to "0" or to a value within a predetermined range close to "0".

As described above with reference to FIG. 5, this prevents characters from being displayed in a distorted state, in the image obtained by decoding the encoded image data, as a result of a different motion vector being set for each sub-block.
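Step S221 can be sketched as a clamp on each motion vector component. The parameter `max_abs`, which stands for the "predetermined range close to 0", is an assumption; with its default of 0 the vector is fixed to zero.

```python
def constrain_mv(mv, is_caption: bool, max_abs: int = 0):
    """Clamp each component of a (dx, dy) motion vector for caption macroblocks."""
    if not is_caption:
        return mv
    return tuple(max(-max_abs, min(max_abs, c)) for c in mv)


print(constrain_mv((5, -3), True))             # (0, 0)
print(constrain_mv((5, -3), True, max_abs=2))  # (2, -2)
print(constrain_mv((5, -3), False))            # (5, -3)
```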

FIG. 17 is a flowchart illustrating yet another example of the details of the caption handling processing in step S104 of FIG. 14.

In step S241 of the figure, the motion prediction/compensation unit 153, having been notified of a caption macroblock by the caption detection unit 155, sets the size of the sub-block for which a motion vector is set in the caption macroblock to a block of 16 × 16 pixels.

Here too, as described above with reference to FIG. 5, this prevents characters from being displayed in a distorted state, in the image obtained by decoding the encoded image data, as a result of a different motion vector being set for each sub-block.

FIG. 18 is a flowchart illustrating yet another example of the details of the caption handling processing in step S104 of FIG. 14.

In step S261 of the figure, the motion prediction/compensation unit 153, having been notified of a caption macroblock by the caption detection unit 155, sets the reference picture number (ref_idx) for the caption macroblock to "0" or to a preset value.

As a result, only the frame motion-compensated prediction shown in FIG. 4(a) is performed for a macroblock containing a caption, and distortion in the image of the predicted image data caused by field motion-compensated prediction is suppressed.

FIG. 19 is a flowchart illustrating yet another example of the details of the caption handling processing in step S104 of FIG. 14.

In step S281 of the figure, the motion prediction/compensation unit 153, having been notified of a caption macroblock by the caption detection unit 155, gives priority to "skipped macroblock" when setting the macroblock mode for the caption macroblock.

As described above with reference to FIGS. 7(a) and 7(b), this makes it more likely that the image of the predicted image data is displayed as it is, rather than an image obtained by decoding data encoded by extracting the difference from the predicted image data, so that an image with little distortion can be displayed.

FIG. 20 is a flowchart illustrating yet another example of the details of the caption handling processing in step S104 of FIG. 14.

In step S301 of the figure, the motion prediction/compensation unit 153, having been notified of a caption macroblock by the caption detection unit 155, performs forward prediction or backward prediction on the caption macroblock.

As described above with reference to FIGS. 8(a) to 8(c), this prevents the image of the predicted image data from being distorted by bidirectional prediction.

FIG. 21 is a flowchart illustrating yet another example of the details of the caption handling processing in step S104 of FIG. 14.

In step S321 of the figure, the motion prediction/compensation unit 153, having been notified of a caption macroblock by the caption detection unit 155, sets the motion compensation accuracy for the caption macroblock to integer-pixel accuracy or 1/2-pixel accuracy.

As described above with reference to FIGS. 10(a) to 10(c), this prevents the image of the predicted image data from being distorted by motion compensation with 1/4-pixel accuracy.

図２２は、図１４のステップS１０４のキャプション対応処理の詳細についてのさらに別の例を説明するフローチャートである。 FIG. 22 is a flowchart illustrating yet another example of the details of the caption handling process in step S104 of FIG.

In step S341 of the figure, the orthogonal transform unit 144, having been notified of the caption macroblock by the caption detection unit 155, encodes the macroblock pair containing the caption macroblock by frame coding (frame DCT coding mode).

This prevents the caption image from being encoded by field coding, which, as described above with reference to FIGS. 12A and 12B, would produce mosquito noise that is easily noticed by the user in the image obtained by decoding the encoded data.

FIG. 23 is a flowchart illustrating yet another example of the details of the caption handling process in step S104 of FIG. 14.

In step S361 of the figure, the orthogonal transform unit 144, having been notified of the caption macroblock by the caption detection unit 155, sets the orthogonal transform size for the macroblock pair containing the caption macroblock to 4 × 4.

This prevents the caption image from being encoded with a large orthogonal transform size, which, as described above with reference to FIGS. 13A and 13B, would produce mosquito noise that is easily noticed by the user in the image obtained by decoding the encoded data.

The caption handling process is executed as described above. In step S104 of FIG. 14, only one of the processes described above with reference to FIGS. 15 to 23 may be executed, or all of them may be executed. Alternatively, several of the processes described above with reference to FIGS. 15 to 23 may be selected as appropriate and executed in step S104 of FIG. 14.
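The freedom to run one, several, or all of the caption handling processes can be sketched as a set of selectable constraints. The Python sketch below covers only the variations of FIGS. 20 to 23; the measure names, the dictionary keys, and the function `caption_constraints` are hypothetical and are not taken from the specification.

```python
# Hypothetical labels for the caption handling variations of FIGS. 20 to 23.
ALL_MEASURES = (
    "unidirectional_prediction",  # FIG. 20, step S301
    "coarse_mv_precision",        # FIG. 21, step S321
    "frame_dct_mode",             # FIG. 22, step S341
    "transform_4x4",              # FIG. 23, step S361
)


def caption_constraints(selected=ALL_MEASURES):
    """Return the encoder constraints to apply to a caption macroblock."""
    constraints = {}
    if "unidirectional_prediction" in selected:
        constraints["allowed_prediction"] = ("forward", "backward")
    if "coarse_mv_precision" in selected:
        constraints["allowed_mv_precision"] = ("integer", "half")
    if "frame_dct_mode" in selected:
        constraints["dct_mode"] = "frame"
    if "transform_4x4" in selected:
        constraints["transform_size"] = 4
    return constraints
```

Calling `caption_constraints()` applies all four measures, while passing a subset such as `("transform_4x4",)` applies only the selected ones.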

After the process of step S104 in FIG. 14, the image data is encoded in the H.264/AVC format in step S105. At this time, the adder 143 through the rate control unit 154 operate as described above with reference to FIG. 1. However, if it was determined in step S103 that a caption was detected, the rate control unit 154, the motion prediction/compensation unit 153, or the quantization unit 145 operates in accordance with the processes described above with reference to FIGS. 15 to 23.

In step S106, it is determined whether or not all of the image data has been encoded. If it is determined that not all of the image data has been encoded yet, the process returns to step S101, and the subsequent processes are repeated.

If it is determined in step S106 that all of the image data has been encoded, the encoding process ends.
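The overall flow of FIG. 14 can be summarized as a loop over the image data. In the Python sketch below, the function `encode_all` and its callback parameters are hypothetical stand-ins, and the roles given to steps S101 and S102 (acquiring the next image data and running caption detection) are assumptions; steps S103 to S106 follow the description above.

```python
def encode_all(pictures, detect_caption, handle_caption, encode):
    """Run the encoding loop with caller-supplied stand-ins for each step."""
    outputs = []
    for picture in pictures:                    # S101 (assumed): next image data
        caption_mbs = detect_caption(picture)   # S102 (assumed): caption detection
        constraints = {}
        if caption_mbs:                         # S103: was a caption detected?
            constraints = handle_caption(caption_mbs)  # S104: caption handling
        outputs.append(encode(picture, constraints))   # S105: encoding
    return outputs                              # S106: done when all data is encoded
```

Pictures without captions reach the encoding step with no extra constraints, which corresponds to the ordinary operation of the encoder.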

The above describes an example in which the image processing apparatus 100 performs encoding by H.264/AVC, but the present invention can also be applied when encoding is performed by another encoding method (compression encoding method) such as MPEG4.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a network or a recording medium onto a computer built into dedicated hardware, or onto a machine that can execute various functions when various programs are installed, such as the general-purpose personal computer 700 shown in FIG. 24.

In FIG. 24, a CPU (Central Processing Unit) 701 executes various processes according to a program stored in a ROM (Read Only Memory) 702 or a program loaded from a storage unit 708 into a RAM (Random Access Memory) 703. The RAM 703 also stores, as appropriate, data that the CPU 701 needs in order to execute the various processes.

The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.

Connected to the input/output interface 705 are an input unit 706 composed of a keyboard, a mouse, and the like; an output unit 707 composed of a display such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display), a speaker, and the like; a storage unit 708 composed of a hard disk or the like; and a communication unit 709 composed of a modem, a network interface card such as a LAN card, and the like. The communication unit 709 performs communication processing via networks including the Internet.

A drive 710 is also connected to the input/output interface 705 as necessary. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive as appropriate, and a computer program read from it is installed in the storage unit 708 as necessary.

When the series of processes described above is executed by software, the program constituting the software is installed from a network such as the Internet or from a recording medium such as the removable medium 711.

Note that this recording medium is not limited to the removable medium 711 shown in FIG. 24, which is distributed separately from the apparatus main body in order to deliver the program to the user and on which the program is recorded, such as a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a CD-ROM (Compact Disk-Read Only Memory) and a DVD (Digital Versatile Disk)), a magneto-optical disk (including an MD (Mini-Disk) (registered trademark)), or a semiconductor memory. It also includes media on which the program is recorded and which are delivered to the user already incorporated in the apparatus main body, such as the ROM 702 or a hard disk included in the storage unit 708.

In this specification, the steps for executing the series of processes described above include not only processes performed in time series in the described order but also processes that are executed in parallel or individually without necessarily being processed in time series.

FIG. 1 is a block diagram showing a configuration example according to an embodiment of an image processing apparatus to which the present invention is applied.
FIG. 2 is a diagram showing an example of a macroblock containing an image of characters.
FIG. 3 is a diagram showing an example of an image obtained by decoding image data in which the image of FIG. 2 was encoded.
FIG. 4 is a diagram explaining frame motion-compensated prediction and field motion-compensated prediction.
FIG. 5 is a diagram showing an example of a macroblock containing an image of characters for which separate motion vectors are set.
FIG. 6 is a diagram showing examples of an interlaced image, a top field image, and a bottom field image.
FIG. 7 is a diagram showing examples of an image of predicted image data and an image encoded by extracting the difference from the image of the predicted image data.
FIG. 8 is a diagram showing examples of images of predicted image data generated by forward prediction, backward prediction, and bidirectional prediction.
FIG. 9 is a diagram explaining the accuracy of motion compensation.
FIG. 10 is a diagram showing examples of images of predicted image data generated by motion compensation with integer-pixel accuracy, 1/2-pixel accuracy, and 1/4-pixel accuracy.
FIG. 11 is a diagram explaining the frame DCT coding mode and the field DCT coding mode.
FIG. 12 is a diagram showing examples of images obtained by decoding data encoded in the frame DCT coding mode and in the field DCT coding mode.
FIG. 13 is a diagram showing examples of images obtained by decoding data encoded with orthogonal transform sizes of 4 × 4 and 8 × 8.
FIG. 14 is a flowchart explaining the encoding process.
FIG. 15 is a flowchart explaining the caption handling process.
FIG. 16 is a flowchart explaining another example of the caption handling process.
FIG. 17 is a flowchart explaining yet another example of the caption handling process.
FIG. 18 is a flowchart explaining yet another example of the caption handling process.
FIG. 19 is a flowchart explaining yet another example of the caption handling process.
FIG. 20 is a flowchart explaining yet another example of the caption handling process.
FIG. 21 is a flowchart explaining yet another example of the caption handling process.
FIG. 22 is a flowchart explaining yet another example of the caption handling process.
FIG. 23 is a flowchart explaining yet another example of the caption handling process.
FIG. 24 is a block diagram showing a configuration example of a personal computer.

Explanation of symbols

100 image processing apparatus, 142 screen rearrangement buffer, 143 adder, 144 orthogonal transform unit, 145 quantization unit, 146 lossless encoding unit, 147 accumulation buffer, 153 motion prediction/compensation unit, 154 rate control unit, 155 caption detection unit

Claims

An image processing apparatus that encodes image data by the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) system, the apparatus comprising:
image data acquisition means for acquiring the image data to be encoded;
determination means for determining whether or not a caption is included in the image of the image data acquired by the image data acquisition means; and
rate control means for controlling the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image,
wherein, when the determination means determines that a caption is included in the image of the image data, the rate control means sets the quantization parameter for a block composed of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.

The image processing apparatus according to claim 1, further comprising predicted image data generation means for generating image data of a predicted image corresponding to the image of the image data to be encoded, according to motion in the image of the image data to be encoded,
wherein, when the determination means determines that a caption is included in the image of the image data, the predicted image data generation means sets the motion vector for a block composed of a plurality of pixels in the portion of the image where the caption is displayed to a value within a predetermined range.

The image processing apparatus according to claim 1, further comprising predicted image data generation means for generating image data of a predicted image corresponding to the image of the image data to be encoded, according to motion in the image of the image data to be encoded,
wherein, when the determination means determines that a caption is included in the image of the image data, the predicted image data generation means makes the size of a block for which a motion vector is set, the block being composed of a plurality of pixels in the portion of the image where the caption is displayed, larger than a preset size.

The image processing apparatus according to claim 1, further comprising predicted image data generation means for generating image data of a predicted image corresponding to the image of the image data to be encoded, according to motion in the image of the image data to be encoded,
wherein, when the determination means determines that a caption is included in the image of the image data, the predicted image data generation means uses, as the field of the image used to generate the predicted image data corresponding to a block composed of a plurality of pixels in the portion of the image where the caption is displayed, the same field as that of the image of the image data to be encoded.

The image processing apparatus according to claim 1, further comprising predicted image data generation means for generating image data of a predicted image corresponding to the image of the image data to be encoded, according to motion in the image of the image data to be encoded,
wherein, when the determination means determines that a caption is included in the image of the image data, the predicted image data generation means sets the macroblock mode for a macroblock composed of a plurality of pixels in the portion of the image where the caption is displayed to skipped macroblock.

The image processing apparatus according to claim 1, further comprising predicted image data generation means for generating image data of a predicted image corresponding to the image of the image data to be encoded, according to motion in the image of the image data to be encoded,
wherein, when the determination means determines that a caption is included in the image of the image data, the predicted image data generation means uses, as the image for generating the predicted image data corresponding to a block composed of a plurality of pixels in the portion of the image where the caption is displayed, only one of an image temporally preceding the image of the image data to be encoded and an image temporally following it.

The image processing apparatus according to claim 1, further comprising predicted image data generation means for generating image data of a predicted image corresponding to the image of the image data to be encoded, according to motion in the image of the image data to be encoded,
wherein, when the determination means determines that a caption is included in the image of the image data, the predicted image data generation means sets the pixel accuracy of the predicted image data to integer-pixel accuracy or 1/2-pixel accuracy.

Further comprising orthogonal transform processing means for performing orthogonal transform processing on difference data between the image data to be encoded and predicted image data corresponding to the image data;
When the determination means determines that a caption is included in the image of the image data, the orthogonal transform processing means performs the orthogonal transform processing on the data in a frame coding mode. The image processing apparatus described.

The image processing apparatus according to claim 1, further comprising orthogonal transform processing means for performing orthogonal transform processing on difference data between the image data to be encoded and predicted image data corresponding to the image data,
wherein, when the determination means determines that a caption is included in the image of the image data, the orthogonal transform processing means sets the orthogonal transform size, which is the unit in which the orthogonal transform processing is performed on the data, to a value smaller than a preset size.

An image processing method for an image processing apparatus that encodes image data by the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) system, the method comprising the steps of:
acquiring the image data to be encoded;
determining whether or not a caption is included in the image of the acquired image data; and
when it is determined that a caption is included in the image of the image data, causing rate control means, which controls the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image, to set the quantization parameter for a block composed of a plurality of pixels in the portion of the image where the caption is displayed to a predetermined value regardless of the feature amount of the image.

A computer-readable program for causing an image processing apparatus that encodes image data by the MPEG (Moving Picture Coding Experts Group) 4 or H.264/AVC (Advanced Video Coding) system to execute image processing, the program comprising the steps of:
controlling acquisition of the image data to be encoded;
controlling determination of whether or not a caption is included in the image of the acquired image data; and
when it is determined that a caption is included in the image of the image data, controlling rate control means, which controls the bit rate of the encoded image data by changing a quantization parameter for quantizing the image data according to a feature amount of the image, so that the quantization parameter for a block composed of a plurality of pixels in the portion of the image where the caption is displayed is set to a predetermined value regardless of the feature amount of the image.

A recording medium on which the program according to claim 11 is recorded.