JP2010169822A - Image output device and method for outputting image

Info

Publication number: JP2010169822A
Application number: JP2009011111A
Authority: JP
Inventors: Takaaki Noguchi; 高明野口
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2009-01-21
Filing date: 2009-01-21
Publication date: 2010-08-05

Abstract

PROBLEM TO BE SOLVED: To provide an image output device having a circuit that generates and outputs interpolated images in addition to original images, the device minimizing image quality deterioration caused by disturbance of a stationary title portion, thereby achieving a circuit for higher image quality and eliminating the need to add a special buffer memory to mount the circuit.

SOLUTION: When the interpolation image generation circuit 6 generates an interpolated image from interpolation vectors based on an original image, the interpolation vector of each pixel determined to belong to a stationary title area detected by the stationary title detection circuit 3 is treated as 0. This prevents the stationary title from being eroded and cut away by the surrounding moving picture portion. Further, when an interpolated image is generated for a pixel outside the stationary title area and the pixel specified by its interpolation vector lies inside the stationary title area, the interpolated image is generated with that interpolation vector set to 0. This prevents pixels in the stationary title area from being placed outside the stationary title area.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an image output device and an image output method, and belongs to the art of displaying images on a video display device such as a liquid crystal television.

Conventionally, television broadcasting transmits 60 image frames per second, and the television receiver receives them and outputs 60 image frames per second to present moving pictures. On hold-type displays such as liquid crystal televisions, as described in Non-Patent Document 1, this has been one cause of image quality degradation such as edge blur, owing to the characteristics of human vision.

One technique for reducing this image quality degradation is double-speed display, in which the number of images displayed per second is doubled (to 120 frames per second).
In this method, an interpolated image generated from the original images is inserted between successive original images of the received 60 frames per second video, and the original image and the interpolated image are displayed alternately, one frame each.

There are various methods for generating an interpolated image. In one example, the motion vector between two original images, namely the original image at a certain time (frame N) and the original image 1/60 second later (frame N+1), is obtained, and the interpolated image for 1/120 second after the original image (frame N) is generated from that motion vector.

FIG. 4 shows an example configuration of a display circuit that generates such interpolated images and displays the original and interpolated images alternately. In this configuration example, the input data of the original image, supplied in some format, is converted to RGB format by the image input unit 1 and used in that form.
The display circuit comprises: an RGB buffer memory 9 for storing RGB format images; an RGB-Y conversion circuit 2 that converts the same RGB format images into a luminance (Y) signal; a Y buffer memory 8 for storing the luminance images produced by the RGB-Y conversion circuit 2; a motion vector detection circuit 4 that detects motion vectors from the luminance images of two consecutive original frames; an interpolation vector evaluation circuit 5 that determines and selects the optimal motion vector from those obtained; an interpolation image generation circuit 6 that generates the interpolated image from the resulting interpolation vectors and the original image; and an image output unit 7 that outputs the original and interpolated images alternately together with a synchronization signal.

The operation of this configuration example is as follows.
The RGB format image of the input original image data is written to the RGB buffer memory 9, because it is both output as an original image frame and used to generate the interpolated image. At the same time, the RGB format image is converted into a luminance image by the RGB-Y conversion circuit 2 so that motion vectors can be obtained. The converted luminance image is stored in the Y buffer memory 8 and simultaneously sent, together with the luminance image of the previous frame, to the motion vector detection circuit 4.
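As an illustration of the RGB-to-luminance step, the following is a minimal sketch only. The text does not state which conversion the RGB-Y conversion circuit 2 uses; the ITU-R BT.601 luma weights, the integer rounding, and the function name `rgb_to_luma` are assumptions made for this example.

```python
def rgb_to_luma(frame):
    """Convert a frame of (R, G, B) tuples (0..255) into a luminance image.

    Assumption: BT.601 weights (0.299, 0.587, 0.114), evaluated in integer
    arithmetic with rounding. The patent text does not specify the formula.
    """
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
        for row in frame
    ]


if __name__ == "__main__":
    rgb_frame = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
    print(rgb_to_luma(rgb_frame))
```

Working on a single luminance plane instead of three color planes is what keeps the downstream motion vector detection small.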

The motion vector detection circuit 4 obtains motion vectors from the luminance images of the two frames. Various methods exist for obtaining motion vectors, and any of them may be used. With any method, however, the detection result may contain large errors depending on the image content. One example is when an object present at the time of the original image (frame N-1) moves behind another object and is no longer present in the original image (frame N), so that its motion vector cannot be detected correctly.

Next, the obtained motion vectors are sent to the interpolation vector evaluation circuit 5, which mainly determines and selects the optimal motion vector from the candidates obtained. When a given pixel (x, y) is evaluated, two or more motion vectors may cross it, or conversely no motion vector may point to it at all. In such cases the circuit selects the vector judged most plausible, or assigns a vector interpolated from the surrounding motion vectors, for example their average, and thereby finally determines the interpolation vector used to generate the interpolated image from the original image. If the interpolation vector is determined with poor accuracy, the generated interpolated image loses continuity with the preceding and following original images, which degrades image quality.
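The selection rule described for the interpolation vector evaluation circuit 5 can be sketched roughly as follows. This is not the circuit's actual algorithm: the per-candidate `cost` used as a plausibility measure (lower meaning more plausible), the fallback to (0, 0), and the function name are all assumptions introduced for illustration.

```python
def select_interpolation_vector(candidates, neighbours):
    """Pick an interpolation vector for one pixel.

    candidates: list of ((dx, dy), cost) for motion vectors crossing the pixel;
                `cost` is a hypothetical plausibility measure, lower is better.
    neighbours: list of (dx, dy) vectors already assigned around the pixel.
    """
    if candidates:
        # Two or more vectors may cross the pixel: keep the most plausible one.
        return min(candidates, key=lambda c: c[1])[0]
    if neighbours:
        # No vector points at the pixel: fill in with the neighbourhood average.
        n = len(neighbours)
        return (round(sum(v[0] for v in neighbours) / n),
                round(sum(v[1] for v in neighbours) / n))
    # Assumed fallback when nothing is known about the pixel.
    return (0, 0)


if __name__ == "__main__":
    print(select_interpolation_vector([((1, 0), 5), ((2, 0), 3)], []))
    print(select_interpolation_vector([], [(2, 0), (4, 2)]))
```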

Finally, the obtained interpolation vectors are sent to the interpolation image generation circuit 6. The interpolated image, generated from the original image by interpolation with these vectors, is output by the image output unit 7 alternately with the original image, together with a synchronization signal.

In this way, the 60 frames per second video is expanded to double-speed 120 frames per second video and displayed on a display such as a liquid crystal television. To further reduce image quality degradation, the number of interpolated images generated between the original image (frame N) and the original image (frame N-1) can be increased to two, displayed 1/180 second and 2/180 second after the original image (frame N), giving triple-speed display at 180 frames per second; quadruple-speed display at 240 frames per second is possible in the same way.
In this case, the magnitude of the interpolation vector generated from the motion vector must be adjusted according to the time of the image to be interpolated.
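The adjustment of the interpolation vector to the time of the interpolated image can be illustrated with a small sketch. It assumes simple linear scaling of the frame-to-frame motion vector by k/factor for the k-th interpolated frame; the function name and the use of exact fractions are choices made for this example, not details from the text.

```python
from fractions import Fraction


def interpolation_vectors(motion_vector, factor):
    """Scale a frame-to-frame motion vector for each interpolated frame.

    factor: frame-rate multiplier (2 for 120 fps, 3 for 180 fps, 4 for 240 fps
    from a 60 fps source). The k-th interpolated frame lies k/factor of the way
    between the two originals, so its vector is scaled by k/factor.
    """
    dx, dy = motion_vector
    return [
        (dx * Fraction(k, factor), dy * Fraction(k, factor))
        for k in range(1, factor)
    ]


if __name__ == "__main__":
    print(interpolation_vectors((6, -12), 2))  # one frame at the midpoint
    print(interpolation_vectors((6, -12), 3))  # two frames at 1/3 and 2/3
```

For double-speed display this reduces to halving the motion vector, which matches the 1/120 second case described earlier.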

Motion vector detection could also be performed directly on the RGB format images, but since that would increase the circuit scale, this configuration example uses the RGB-Y conversion circuit 2.

Separately, techniques exist for detecting captions such as text in images, including the methods described in Patent Documents 1 to 3.
Patent Document 1 discloses a method of extracting still caption areas (telop areas) from moving pictures for the purpose of creating digest video. That invention determines still caption areas using the difference in luminance data between two temporally adjacent frames.
However, with still caption areas detected as in Patent Document 1, when, for example, image content other than the still caption shows almost no motion across the two frames, image portions that are not actually still captions may also be erroneously determined to be still caption areas.

The method of Patent Document 2 proposes extracting telop characters from the luminance distribution using only the image within a single frame. Because that invention uses no information between temporally adjacent frames, captions are detected regardless of whether they move between frames or remain stationary.

The method of Patent Document 3 proposes extracting portions of uniform luminance within one frame, extracting caption boundary portions by high-frequency region detection, and extracting portions that remain unchanged for a fixed time using a number of frame memories, and then extracting the still caption area from these results. That invention performs detection using the following characteristics of still captions (telops).
(1) A telop is composed of pixels of the same signal level.
(2) A telop is normally displayed for 3 to 4 seconds or more, so the pixels composing it hold the same signal level for at least a predetermined time.
(3) To make the telop stand out, its signal level differs greatly from that of the background picture.

In other words, a telop area is a region of relatively high spatial frequency in the image plane. According to the invention of Patent Document 3, however, the extracted still caption area is no finer than a rectangle enclosing the entire telop. Moreover, because every luminance difference is computed between adjacent pixels, detection is accurate for images whose still captions are sharp, but fails for images in which the caption boundary is blurred.

Patent Document 1: JP-A-12-23062
Patent Document 2: JP-A-12-182053
Patent Document 3: JP-A-10-233994

Non-Patent Document 1: IEICE Technical Report, Electronic Display (EID2001-102)

According to the background art above, double-speed display can improve image quality, but unavoidable motion vector detection errors and the interpolation of interpolation vectors can produce interpolated images that break the continuity of the moving picture, so that video unpleasant for viewers is output. In particular, disturbance of image content that has sharp outlines and is stationary, such as a still caption, is easily noticed in moving pictures and has been a cause of image quality degradation.

As for caption detection techniques, some cannot distinguish moving captions from still captions, and some, even when they detect still captions, also falsely detect still image portions other than captions. In addition, even algorithms that detect still captions accurately in sharp images cannot detect them correctly in images whose still caption boundaries have been blurred, for example by scaling of the original picture.
Furthermore, a large amount of memory is needed to store consecutive data over several frames, and the detected still caption area is only a rough rectangle.

The present invention has been made in view of the above circumstances. Its object is to provide an image output device having a circuit that generates and outputs interpolated images in addition to original images, in which image quality degradation caused by disturbance of still caption portions is minimized and a circuit for higher image quality is realized without adding a special buffer memory, and to provide an image output method executed by the image output device.

To solve the above problems, the first technical means of the present invention is an image output device that generates and outputs one or more interpolated frames (interpolation frames) between two temporally consecutive input image frames, comprising: an image input unit for inputting images; an original image storage buffer memory that stores each input image as is, without converting its image format; an image format conversion circuit that sequentially converts the input images into a luminance-only image format to generate luminance images; a luminance image storage buffer memory that stores the luminance images; a motion vector detection circuit that detects motion vectors between the luminance images of two temporally consecutive frames; an interpolation vector evaluation circuit that uses the motion vectors to select an interpolation vector for each pixel, or each block of a plurality of pixels, of the interpolated frame to be inserted between the two temporally consecutive frames; a still caption detection circuit that detects caption images that remain stationary between the luminance images of the two input temporally consecutive frames; an interpolation image generation circuit that generates the desired interpolated frame from the interpolation vectors and the still caption detection result, based on the original image stored in the original image storage buffer memory; and an image output unit that outputs the original image frames stored in the original image storage buffer memory and the interpolated frames generated by the interpolation image generation circuit, arranged in temporal order. The interpolation image generation circuit generates the interpolated frame by setting to 0 the interpolation vector assigned to each pixel, or block of pixels, that the still caption detection circuit has determined to be a caption, and, for a pixel or block not determined to be a caption, by also setting the assigned interpolation vector to 0 when the pixel indicated by that interpolation vector has been determined to be a caption.
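The two zero-vector rules of the first technical means can be sketched as follows. This is a simplified illustration under stated assumptions: interpolation is per pixel, each output pixel is fetched from a single original frame at the position its vector indicates, positions are clamped at the frame border, and the names `effective_vector` and `interpolate_frame` are hypothetical. A real circuit would typically blend two frames and may work on blocks.

```python
def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def effective_vector(x, y, vector, caption_mask):
    """Apply the two zero-vector rules to the vector assigned to pixel (x, y)."""
    h, w = len(caption_mask), len(caption_mask[0])
    # Rule 1: a pixel judged to be caption keeps its position (vector 0).
    if caption_mask[y][x]:
        return (0, 0)
    # Rule 2: a non-caption pixel whose vector points at a caption pixel
    # also gets vector 0, so caption pixels are not pulled out of the area.
    tx = clamp(x + vector[0], 0, w - 1)
    ty = clamp(y + vector[1], 0, h - 1)
    if caption_mask[ty][tx]:
        return (0, 0)
    return vector


def interpolate_frame(original, vectors, caption_mask):
    """Build an interpolated frame by fetching each pixel along its vector."""
    h, w = len(original), len(original[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = effective_vector(x, y, vectors[y][x], caption_mask)
            row.append(original[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)])
        out.append(row)
    return out


if __name__ == "__main__":
    original = [[10, 20, 99, 40]]            # 99 is a caption pixel
    mask = [[False, False, True, False]]
    vectors = [[(1, 0)] * 4]                 # everything "moves" one pixel
    print(interpolate_frame(original, vectors, mask))
```

In the example, the caption value 99 stays where it is and is not copied into the neighbouring non-caption pixel, even though that pixel's vector points at it.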

The second technical means is the first technical means in which the still caption detection circuit determines the evaluation target pixel to be a caption boundary pixel when the absolute luminance difference between two pixels located at point-symmetric positions about the evaluation target pixel exceeds a threshold in both of two temporally consecutive frames and that absolute luminance difference is almost the same in the two frames, and determines the pixels in a block composed of a plurality of pixels to be a caption when the number of caption boundary pixels in the block exceeds a certain threshold.
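A minimal sketch of this boundary test follows. The text here does not fix which symmetric pixel pairs are compared, the block size, or any threshold values, so the single horizontal pair at distance `offset`, the 4x4 block, and the defaults `edge_thresh`, `same_thresh`, and `count_thresh` are all assumptions chosen for illustration.

```python
def is_boundary_pixel(cur, prev, x, y, offset=1, edge_thresh=64, same_thresh=4):
    """Test pixel (x, y) against the caption boundary rule.

    cur, prev: luminance images (lists of rows) of two consecutive frames.
    Only one symmetric pair is used here: (x - offset, y) and (x + offset, y).
    """
    w = len(cur[0])
    if x - offset < 0 or x + offset >= w:
        return False
    d_cur = abs(cur[y][x - offset] - cur[y][x + offset])
    d_prev = abs(prev[y][x - offset] - prev[y][x + offset])
    # Strong edge in both frames, and nearly the same strength in both.
    return (d_cur > edge_thresh and d_prev > edge_thresh
            and abs(d_cur - d_prev) <= same_thresh)


def caption_blocks(cur, prev, block=4, count_thresh=2, **kw):
    """Return {(block_x, block_y): is_caption} by counting boundary pixels."""
    h, w = len(cur), len(cur[0])
    result = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            count = sum(
                is_boundary_pixel(cur, prev, x, y, **kw)
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            )
            result[(bx // block, by // block)] = count > count_thresh
    return result


if __name__ == "__main__":
    edge = [[0, 0, 255, 255] for _ in range(4)]   # static vertical edge
    flat = [[0, 0, 0, 0] for _ in range(4)]       # edge absent in previous frame
    print(caption_blocks(edge, edge))
    print(caption_blocks(edge, flat))
```

Because the compared pixels sit on either side of the target pixel rather than next to each other, a blurred caption edge can still produce a large difference, which is the weakness of adjacent-pixel differences noted in the background discussion.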

The third technical means is the second technical means in which, for pixels determined to be caption boundary pixels, when the number of pixels determined to be caption boundary pixels in the row (line) to which the evaluation target pixel belongs and the number of pixels determined to be caption boundary pixels in the line immediately below that line both exceed a certain threshold, the still caption detection circuit determines all pixels of the line to which the evaluation target pixel belongs to be a black band boundary and cancels the determination of the pixels determined to be caption boundary pixels.
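The black band rule can be sketched as a pass over a boundary map. The sketch assumes the map is a list of rows of booleans, that line counts are taken from the map before any cancellation, and that the last line (which has no line below it) is left unchanged; `line_thresh` and the function name are hypothetical.

```python
def cancel_black_band_lines(boundary, line_thresh):
    """Clear boundary flags on lines that look like a black band edge.

    boundary: list of rows of booleans (True = caption boundary pixel).
    A line is cleared when both it and the line directly below contain more
    than `line_thresh` boundary pixels.
    """
    counts = [sum(row) for row in boundary]      # counts before cancellation
    cleaned = [list(row) for row in boundary]
    for y in range(len(boundary) - 1):
        if counts[y] > line_thresh and counts[y + 1] > line_thresh:
            cleaned[y] = [False] * len(boundary[y])
    return cleaned


if __name__ == "__main__":
    boundary = [
        [True, True, True, True],
        [True, True, True, True],
        [True, False, False, False],
    ]
    for row in cancel_black_band_lines(boundary, line_thresh=2):
        print(row)
```

A long horizontal run of boundary pixels spanning two adjacent lines is typical of a letterbox edge rather than caption text, which is why such lines are excluded.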

The fourth technical means is the second or third technical means in which, for the evaluation target pixel in each of two temporally consecutive original image frames, the still caption detection circuit determines the evaluation target pixel to be a still pixel when the absolute luminance difference of the evaluation target pixel exceeds a certain threshold, determines the still pixel to be a caption when a caption boundary pixel exists among the pixels surrounding the still pixel, and determines the pixels in a block composed of a plurality of pixels to be a caption when the sum of the number of caption boundary pixels and the number of caption pixels in the block exceeds a certain threshold.

The fifth technical means is any one of the second to fourth technical means in which, when two temporally consecutive frames among the input image frames are judged to show almost no difference over the entire screen, one of the two frames is discarded, no interpolated frame is generated between the two frames, and the caption detection processing in the still caption detection circuit is not performed.
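One way to picture the "almost no difference over the entire screen" check is the sketch below. The text does not define the criterion, so counting pixels whose luminance changed by more than `pixel_thresh` and comparing that count with a fraction `ratio_thresh` of the frame are assumptions, as is the function name.

```python
def frames_nearly_identical(cur, prev, pixel_thresh=2, ratio_thresh=0.01):
    """Return True when two luminance frames differ in very few pixels.

    A pixel counts as changed when its luminance differs by more than
    `pixel_thresh`; the frames are treated as duplicates when the changed
    pixels make up at most `ratio_thresh` of the frame.
    """
    total = 0
    changed = 0
    for row_c, row_p in zip(cur, prev):
        for c, p in zip(row_c, row_p):
            total += 1
            if abs(c - p) > pixel_thresh:
                changed += 1
    return changed <= ratio_thresh * total


if __name__ == "__main__":
    a = [[10, 20], [30, 40]]
    b = [[11, 20], [30, 41]]     # only noise-level changes
    c = [[200, 20], [30, 40]]    # one pixel really changed
    print(frames_nearly_identical(a, b), frames_nearly_identical(a, c))
```

Repeated frames of this kind arise, for example, in 3-2 pulled-down video, where running caption detection on a duplicate pair would make everything look stationary.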

The sixth technical means is an image output method executed by an image output device that generates and outputs one or more interpolated frames (interpolation frames) between two temporally consecutive input image frames, comprising: an image input step of inputting images to the image output device; an original image storage step of storing each input image in a buffer memory as is, without converting its image format; an image format conversion step of sequentially converting the input images into a luminance-only image format to generate luminance images; a luminance image storage step of storing the luminance images in a buffer memory; a motion vector detection step of detecting motion vectors between the luminance images of two temporally consecutive frames; an interpolation vector evaluation step of using the motion vectors to select an interpolation vector for each pixel, or each block of a plurality of pixels, of the interpolated frame to be inserted between the two temporally consecutive frames; a still caption detection step of detecting caption images that remain stationary between the luminance images of the two input temporally consecutive frames; an interpolation image generation step of generating the desired interpolated frame from the interpolation vectors and the still caption detection result, based on the original image stored in the buffer memory in the original image storage step; and an image output step of outputting the original image frames stored in the buffer memory in the original image storage step and the interpolated frames generated in the interpolation image generation step, arranged in temporal order. The interpolation image generation step generates the interpolated frame by setting to 0 the interpolation vector assigned to each pixel, or block of pixels, determined to be a caption in the still caption detection step, and, for a pixel or block not determined to be a caption, by also setting the assigned interpolation vector to 0 when the pixel indicated by that interpolation vector has been determined to be a caption.

The seventh technical means is the sixth technical means in which the still caption detection step determines the evaluation target pixel to be a caption boundary pixel when the absolute luminance difference between two pixels located at point-symmetric positions about the evaluation target pixel exceeds a threshold in both of two temporally consecutive frames and that absolute luminance difference is almost the same in the two frames, and determines the pixels in a block composed of a plurality of pixels to be a caption when the number of caption boundary pixels in the block exceeds a certain threshold.

The eighth technical means is the seventh technical means in which, for pixels determined to be caption boundary pixels, when the number of pixels determined to be caption boundary pixels in the row (line) to which the evaluation target pixel belongs and the number of pixels determined to be caption boundary pixels in the line immediately below that line both exceed a certain threshold, the still caption detection step determines all pixels of the line to which the evaluation target pixel belongs to be a black band boundary and cancels the determination of the pixels determined to be caption boundary pixels.

The ninth technical means is the seventh or eighth technical means in which, for the evaluation target pixel in each of two temporally consecutive original image frames, the still caption detection step determines the evaluation target pixel to be a still pixel when the absolute luminance difference of the evaluation target pixel exceeds a certain threshold, determines the still pixel to be a caption when a caption boundary pixel exists among the pixels surrounding the still pixel, and determines the pixels in a block composed of a plurality of pixels to be a caption when the sum of the number of caption boundary pixels and the number of caption pixels in the block exceeds a certain threshold.

The tenth technical means is an image output method according to any one of the seventh to ninth technical means in which, when two temporally consecutive frames among the input image frames are judged to show almost no difference over the entire screen, one of the two frames is discarded, no interpolated frame is generated between the two frames, and the caption detection processing in the still caption detection step is not performed.

According to the present invention, in a display capable of double-speed display, such as a liquid crystal television, a circuit that minimizes image quality degradation caused by disturbance of still caption portions and thereby improves image quality can be realized and implemented without adding a special buffer memory.
The same applies to displays other than double-speed displays whenever interpolated images are generated in addition to the original images.

FIG. 1 is an explanatory diagram schematically showing the configuration of one form of the frame rate conversion processing circuit provided in the image output device of the present invention.
FIG. 2 is a diagram for explaining the procedure by which the still caption detection circuit of FIG. 1 detects a still caption area.
FIG. 3 is an explanatory diagram schematically showing the configuration of one form of the frame rate conversion processing circuit provided in the image output device of the present invention.
FIG. 4 is an explanatory diagram schematically showing the configuration of one form of the frame rate conversion processing circuit provided in a conventional image output device.
FIG. 5 is an explanatory diagram schematically showing video frames that have undergone 3-2 pulldown.
FIG. 6 is an explanatory diagram schematically showing, for video frames that have undergone 3-2 pulldown, one form of the frame structure in which interpolated images are generated and of the update timing of still caption detection.

In the embodiments of the present invention, to prevent the disturbance of still caption portions that occurs when interpolated images are generated by double-speed display techniques and the like, a still caption detection circuit that detects still captions with high accuracy is configured so that it can be implemented with a buffer memory of two frames.
For the still caption area detected by the caption detection circuit, when the interpolation image generation circuit generates an interpolated image from the interpolation vectors obtained from the original image, the interpolation vector of each pixel determined to be in the still caption area is treated as 0. This prevents the still caption from being eroded and cut away by the surrounding moving picture portion. In addition, when an interpolated pixel outside the still caption area has an interpolation vector that points to a pixel inside the still caption area, the interpolated image is generated with that interpolation vector set to 0. This prevents pixels inside the still caption area from spilling outside it, which prevents image quality degradation and achieves higher image quality.
Embodiments of the present invention having the above features are described more specifically below with reference to the drawings.

(Embodiment 1)
Embodiment 1 describes the case where video with captions is input to a circuit that acquires input images at 60 frames per second and performs frame rate conversion (converting 60 frames-per-second video into 120 frames-per-second video).

(Embodiment 1: Configuration)
FIG. 1 is an explanatory diagram schematically showing the configuration of one form of the frame rate conversion processing circuit provided in the image output device of the present invention.
In FIG. 1, input image data 11 is input to the image input unit 1, and the original image data 17 in RGB format is distributed to two signal paths. The first signal path feeds the RGB-Y conversion circuit 2, which converts the data into a luminance image, and the luminance image data is then stored in the Y buffer memory 8. The Y buffer memory 8 is a memory with a capacity of two frames and is connected to the still caption detection circuit 3 and the motion vector detection circuit 4.

The still caption detection circuit 3 detects still caption areas in the image from the luminance image data (frame N) 12 and the luminance image data (frame N-1) 13 of two consecutive frames. The detected caption area information 16 is sent to the interpolated image generation circuit 6.
The motion vector detection circuit 4 detects motion vectors between the images from the luminance image data 12 and the luminance image data 13 of the two consecutive frames.

The result detected by the motion vector detection circuit 4 is sent as the motion vector 14 to the interpolation vector evaluation circuit 5, which evaluates whether the detected result is valid and obtains the final interpolation vector 15 by the same method as in the background art. The interpolation vector 15 is sent to the interpolated image generation circuit 6 together with the caption area information 16 generated by the still caption detection circuit 3.

The second signal path is connected to the RGB buffer memory 9, which stores the original image data 17 in RGB format. The stored original image data 17 is output to the interpolated image generation circuit 6 as the original image data 18 for interpolated image generation, at a timing synchronized with the interpolation vector 15 and the caption area information 16 generated in the first signal path.

The interpolated image generation circuit 6 generates the interpolated image 20 from the original image data 18 for interpolated image generation, the interpolation vector 15, and the caption area information 16. The original image data 18 that is referenced may be either the original image data of frame N or that of frame N-1; when interpolating from the original image data of frame N, the direction of the vector is reversed.

The interpolated image generation circuit 6 outputs the generated interpolated image 20 to the image output unit 7 alternately with the original image data 19 for output. The output order is: the (N-1)th original image data, the interpolated image data, and then the Nth original image data.
The image output unit 7 adds the synchronization signals required for display and outputs the output image data 21 to the display device at a predetermined timing.

(Embodiment 1: Operation)
Next, the operation of this embodiment is described.
The Nth input image data 11 at a given time is input to the image input unit 1, and the original image data 17 in RGB format is sent to the first and second signal paths.
In the first signal path, the original image data 17 in RGB format is input to the RGB-Y conversion circuit 2, and format conversion produces the luminance image (frame N) 12. The luminance image (frame N) 12 is written to the Y buffer memory 8 and, at the same time, the temporally preceding luminance image (frame N-1) 13 is read from the Y buffer memory 8. The luminance images of these two frames (frame N and frame N-1) are sent to the still caption detection circuit 3 and the motion vector detection circuit 4.

The motion vector detection circuit 4 obtains the motion vector 14 from the two input luminance frames. Various methods for obtaining the motion vector 14 have been proposed; one example is motion vector detection by block matching. In this method, pixels are grouped into blocks, the degree of match between frame N and frame N-1 is evaluated block by block, and the motion vector is obtained from the positional relationship with the best-matching block. The resulting motion vector 14 is sent to the interpolation vector evaluation circuit 5.
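The block matching mentioned above can be sketched as follows. This is only an illustrative sketch, not the circuit of the embodiment: the sum of absolute differences (SAD) as the match criterion, the function names `sad` and `block_motion_vector`, and the `size` and `search` parameters are all assumptions introduced here. Frames are lists of rows of luminance values, indexed as `frame[y][x]`.

```python
def sad(prev, curr, bx, by, dx, dy, size):
    """Sum of absolute luminance differences between the block at (bx, by) in
    frame N and the block displaced by (-dx, -dy) in frame N-1."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(curr[y][x] - prev[y - dy][x - dx])
    return total


def block_motion_vector(prev, curr, bx, by, size=2, search=2):
    """Return the displacement (dx, dy) from frame N-1 to frame N that gives
    the best match for the block whose top-left corner is (bx, by) in frame N."""
    h, w = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            sx, sy = bx - dx, by - dy
            # Skip candidates whose source block would leave frame N-1.
            if sx < 0 or sy < 0 or sx + size > w or sy + size > h:
                continue
            cost = sad(prev, curr, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

A real implementation would evaluate every block of the frame and would typically add a tie-breaking rule for flat regions, where many candidates have the same cost.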

The interpolation vector evaluation circuit 5 generates the interpolation vector 15 needed to generate the interpolated image. Based on the motion vector 14, it takes the motion vector that points to an evaluation target pixel (or an evaluation target block made up of several pixels) as the interpolation vector of that pixel. However, several motion vectors may point to the same pixel, or no motion vector may point to it at all. In such cases a plausible interpolation vector is assigned as appropriate: for example, an evaluation target pixel (or block) to which no motion vector points may receive a vector interpolated from the interpolation vectors of its neighboring pixels, or may be assigned the zero vector on the assumption that there is no motion. The interpolation vector 15 assigned to the evaluation target pixel is sent to the interpolated image generation circuit 6.

Meanwhile, the still caption detection circuit 3 detects still caption areas from the two temporally consecutive luminance images, (frame N) 12 and (frame N-1) 13.
The still caption detection circuit 3 detects still caption areas by the procedure shown in FIG. 2. In step S1, the caption boundary detection unit 101 detects caption boundary pixels. For a given evaluation target pixel (x, y), the caption boundary detection unit 101 obtains the absolute luminance difference between two pixels located point-symmetrically about (x, y), in both the past luminance image (frame N-1) 13 and the current luminance image (frame N) 12.

Using the pixel position offsets p and q from the evaluation target pixel and the frame number N, these values are expressed by Equations (1) and (2).
$$Y_{N}(p, q) = \left| Y_{N}(x - p,\, y - q) - Y_{N}(x + p,\, y + q) \right| \tag{1}$$
$$Y_{N-1}(p, q) = \left| Y_{N-1}(x - p,\, y - q) - Y_{N-1}(x + p,\, y + q) \right| \tag{2}$$

In this example, the absolute luminance differences are obtained over a 3 × 3 tap window centered on the evaluation target pixel (x, y). The absolute luminance differences Y_N(p, q) and Y_{N-1}(p, q) are computed for each of the combinations (p, q) = (1, 1), (0, 1), (-1, 1), (1, 0).
When the same values are obtained over a P × Q tap window centered on the evaluation target pixel (x, y), the combinations are all those satisfying [(Q-1)/2 ≥ q > 0, (P-1)/2 ≥ p ≥ -(P-1)/2] together with all those satisfying [q = 0, (P-1)/2 ≥ p ≥ 0]. Note that (p, q) = (0, 0) compares the pixel with itself, always yields 0, and does not appear in the 3 × 3 list above.
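The enumeration of (p, q) combinations for a P × Q window can be sketched as below. The function name `tap_offsets` is hypothetical, and the sketch assumes odd P and Q. It also omits (0, 0), so that the 3 × 3 case reproduces the four combinations listed in the text; that omission is an interpretation, since the stated range for q = 0 nominally includes p = 0.

```python
def tap_offsets(P, Q):
    """Offsets (p, q) for a P x Q tap window (P and Q odd), one offset per
    point-symmetric pixel pair around the evaluation target pixel."""
    half_p, half_q = (P - 1) // 2, (Q - 1) // 2
    offsets = []
    # Rows with q > 0: every p from -(P-1)/2 to (P-1)/2.
    for q in range(1, half_q + 1):
        for p in range(-half_p, half_p + 1):
            offsets.append((p, q))
    # Row q = 0: only p > 0 here, because (0, 0) would compare the
    # evaluation target pixel with itself (assumption, see lead-in).
    for p in range(1, half_p + 1):
        offsets.append((p, 0))
    return offsets
```

Each offset stands for one pair of pixels, (x - p, y - q) and (x + p, y + q), so a P × Q window yields (P·Q - 1) / 2 pairs.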

Each absolute luminance difference obtained is then evaluated against the conditions of Equations (3) to (5). Equation (3) takes the absolute value of the difference between the absolute luminance differences at the same position in the two temporally consecutive frames and tests whether it is smaller than a threshold T_diff; this evaluates whether the pixels are still. Equations (4) and (5) test whether the absolute luminance differences Y_N(p, q) and Y_{N-1}(p, q) are each larger than a threshold T_power; this evaluates the characteristic of a caption boundary that its signal level differs greatly from that of the background picture.

$$\left| Y_{N}(p, q) - Y_{N-1}(p, q) \right| < T_{\mathrm{diff}} \tag{3}$$
$$Y_{N}(p, q) > T_{\mathrm{power}} \tag{4}$$
$$Y_{N-1}(p, q) > T_{\mathrm{power}} \tag{5}$$

The number of (p, q) combinations that satisfy all of the conditions (Equations (3), (4), and (5)) is counted, and if this number exceeds a threshold, the pixel (x, y) is judged to be a still caption boundary.
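Step S1 for a single pixel can be sketched as follows, applying Equations (1) to (5) to each (p, q) combination and counting those that pass. The function name `is_caption_boundary` and the parameter `min_count` (the unnamed count threshold in the text) are assumptions made for illustration. Frames are indexed as `frame[y][x]`, and the caller is assumed to keep (x ± p, y ± q) inside the frame.

```python
def is_caption_boundary(y_prev, y_curr, x, y, offsets, t_diff, t_power, min_count):
    """Judge whether pixel (x, y) is a still caption boundary pixel."""
    count = 0
    for p, q in offsets:
        # Equation (1): point-symmetric luminance difference in frame N.
        d_curr = abs(y_curr[y - q][x - p] - y_curr[y + q][x + p])
        # Equation (2): the same difference in frame N-1.
        d_prev = abs(y_prev[y - q][x - p] - y_prev[y + q][x + p])
        still = abs(d_curr - d_prev) < t_diff              # Equation (3)
        strong = d_curr > t_power and d_prev > t_power     # Equations (4), (5)
        if still and strong:
            count += 1
    return count > min_count
```

Note that the luminance of (x, y) itself is never read, which matches the property described in the text.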

With this method, a still caption boundary is detected without using the luminance value of the evaluation target pixel (x, y) itself. Depending on the number of taps, absolute luminance differences between pixels that are some distance apart are also taken into account. If, before being input to this circuit, the input image data 11 has been scaled, for example from SD resolution (640 × 480) to Full HD resolution (1920 × 1080), the still caption boundary may have been interpolated with intermediate luminance values and blurred across adjacent pixels. In that case, an attempt to detect the caption boundary between adjacent pixels fails, because the intermediate-luminance pixels make the luminance change in steps. With this method, however, detection can use the absolute luminance difference between pixels several pixels apart, so the caption boundary can be detected accurately.

In addition, caption boundary detection for the pixel (x, y) applies the threshold test to the results of multiple (p, q) combinations. This is based on the observation that, at a caption boundary, many pixel pairs in the neighborhood also show the characteristics of caption boundary pixels, and it is effective in preventing false detection outside caption boundaries when the number of taps is increased.

In step S2, the black band boundary detection unit 102 performs black band boundary detection on the detected caption boundary pixels 111 and applies the necessary filtering when a black band is detected.
Some video content is shown in letterbox format, with black bands at the top and bottom. In this case a horizontal boundary line appears between the black band and the captured picture. Depending on the picture content, this line can show characteristics very similar to a caption boundary, and pixels on it may be misjudged as caption boundary pixels.

To avoid this misjudgment in letterbox display, when the number of caption boundary pixels detected on one line exceeds the number on the lines above and below it by a threshold or more, filtering is applied that removes all caption boundary pixels on that line from the set of caption boundary pixels.
In letterbox display, however, captions are sometimes superimposed on the boundary line between the black band and the captured picture. In this case, pixels that should be detected as caption boundary pixels may be removed by the filtering triggered by black band detection.
Because it is difficult to distinguish whether such a pixel is a caption boundary pixel or part of the boundary line between the black band and the captured picture, these pixels are recovered in step S4.
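The line filter of step S2 might look like the sketch below. The text does not say exactly how the counts on the lines above and below are combined, so this sketch assumes that a line is cleared only when its count exceeds both neighbors by at least the threshold, and that a missing neighbor at the frame edge counts as 0. The function name `filter_black_band_rows` is hypothetical. The input is a boolean map indexed as `boundary[y][x]`.

```python
def filter_black_band_rows(boundary, threshold):
    """Clear every row whose number of caption boundary pixels exceeds the
    counts of both adjacent rows by at least `threshold`."""
    counts = [sum(row) for row in boundary]
    filtered = [list(row) for row in boundary]
    for r, count in enumerate(counts):
        above = counts[r - 1] if r > 0 else 0
        below = counts[r + 1] if r + 1 < len(counts) else 0
        if count - above >= threshold and count - below >= threshold:
            filtered[r] = [False] * len(boundary[r])
    return filtered
```

The decision for every row uses the counts from the unfiltered map, so clearing one row does not change the verdict for its neighbors.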

In step S3, the still pixel detection unit 103 detects still pixels. Because caption pixels are still and have high luminance, pixels with these characteristics are extracted around the caption boundary pixels. For the evaluation target pixel (x, y) in the past luminance image (frame N-1) 13 and the current luminance image (frame N) 12, the pixel (x, y) is judged to be a still pixel when the absolute difference between its luminance values in the two frames is smaller than the threshold T_diff, as in Equation (6).
$$\left| Y_{N-1}(x, y) - Y_{N}(x, y) \right| < T_{\mathrm{diff}} \tag{6}$$
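Equation (6) applied to a whole frame can be sketched in a few lines. The function name `still_pixel_map` is an assumption; frames are lists of rows of luminance values with identical dimensions.

```python
def still_pixel_map(y_prev, y_curr, t_diff):
    """Boolean map that is True where Equation (6) holds, i.e. where the
    luminance changed by less than t_diff between frame N-1 and frame N."""
    return [
        [abs(a - b) < t_diff for a, b in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(y_prev, y_curr)
    ]
```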

In step S4, using the filtered caption boundary pixels 112 obtained in step S2 and the still pixels 113 obtained in step S3, each pixel (x, y) judged to be a still pixel is checked against the caption boundary detection results of the pixels within ±1 pixel of it, excluding itself. If a caption boundary pixel exists among them, the pixel (x, y) is judged to be a caption pixel.
With this processing, even when a pixel that is not actually part of a caption has been falsely detected as a caption boundary pixel in step S1, no caption pixels are detected around it, so the processing from step S5 onward acts to remove it as noise. Furthermore, where caption boundary pixels overlap the black band boundary line, pixels that were judged to be black band boundary and removed from the caption boundary pixels are detected again as caption pixels, which improves reliability.
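Step S4 can be sketched as a neighborhood test over two boolean maps. The function name `caption_pixel_map` is hypothetical, and treating positions outside the frame as "no boundary pixel" is an assumption. Both maps are indexed as `map[y][x]`.

```python
def caption_pixel_map(still, boundary):
    """Mark a still pixel as a caption pixel when at least one of its eight
    neighbors (within +/-1 pixel, excluding itself) is a boundary pixel."""
    h, w = len(still), len(still[0])
    result = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not still[y][x]:
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue  # the pixel's own boundary flag is ignored
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and boundary[ny][nx]:
                        result[y][x] = True
    return result
```

An isolated false boundary pixel therefore produces caption pixels only around itself, never at its own position, which keeps its contribution to the block-level count small.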

In step S5, blocks made up of multiple pixels, in units of m × n pixels, are formed (these are called detection blocks). Based on the filtered caption boundary pixels 112 obtained in step S2 and the caption pixels 114 obtained in step S4, the sum of the numbers of caption boundary pixels and caption pixels detected in a detection block is used as an index of how caption-like the block is (hereinafter, the value of this index is called the caption reliability).
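The per-block caption reliability of step S5 can be sketched as below. The function name `block_reliability` is hypothetical. Taking m as the block width and n as the block height is an assumption, as is ignoring any partial blocks at the right and bottom edges. A pixel that is flagged in both maps contributes 2, since the text defines the index as the sum of the two detection counts.

```python
def block_reliability(boundary, caption, m, n):
    """Caption reliability per detection block of m x n pixels: the number of
    caption boundary pixels plus the number of caption pixels in the block."""
    h, w = len(boundary), len(boundary[0])
    rows, cols = h // n, w // m
    score = [[0] * cols for _ in range(rows)]
    for y in range(rows * n):
        for x in range(cols * m):
            score[y // n][x // m] += int(boundary[y][x]) + int(caption[y][x])
    return score
```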

In step S6, using the caption reliability 115 obtained in step S5, the caption reliability of the detection block under evaluation and the caption reliabilities of the surrounding detection blocks are each weighted. When the weighted sum exceeds a threshold T_block, the detection block under evaluation is taken to be a caption block, and all pixels in it are judged to be caption pixels.

The method of weighting the detection blocks around the block under evaluation is not limited to the one shown below; here, a method that weights according to the following characteristics of captions is described.
(Characteristic 1) A detection block adjacent to a caption block is likely to be a caption block.
(Characteristic 2) Around a caption block, caption blocks are likely to continue for several blocks in at least one direction.
Based on these characteristics, the detection blocks around the block under evaluation are weighted by the following procedure.

(a) The caption reliability S(x, y) of the detection block under evaluation is weighted (multiplied by α).
(b) The sum of the caption reliabilities S(x±1, y±1) of the detection blocks within one block of the block under evaluation is weighted (multiplied by β).
(c) The caption reliabilities S(x±t, y) and S(x, y±t) of the detection blocks lying up to several (t) blocks away horizontally or vertically from the block under evaluation are weighted (multiplied by γ).

For detection blocks that overlap between steps (a), (b), and (c), priority is given in the order (a), (b), (c), and a caption reliability that has been used once is not used again in a later step. For the values weighted in step (c), a sum is computed for each of the four directions (up, down, left, right), and the direction with the largest sum is used as the representative value.
The sum of the values obtained in steps (a), (b), and (c) is compared with the threshold T_block. If it exceeds the threshold, the detection block under evaluation is judged to be a caption block, all pixels in that block are judged to belong to the caption area, and the result is sent to the interpolated image generation circuit 6 as the caption area information 16.
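Steps (a) to (c) can be sketched as a single weighted sum. This is one possible reading of the procedure, not a definitive implementation: the function name `weighted_reliability` is hypothetical, blocks outside the frame are assumed to contribute 0, and step (c) is assumed to cover distances 2 to t in each direction, because the blocks at distance 1 have already been used in step (b).

```python
def weighted_reliability(score, bx, by, alpha, beta, gamma, t):
    """Weighted caption reliability for the detection block at (bx, by)."""
    rows, cols = len(score), len(score[0])

    def s(x, y):
        # Blocks outside the frame contribute nothing (assumption).
        return score[y][x] if 0 <= x < cols and 0 <= y < rows else 0

    # Step (a): the block under evaluation.
    centre = alpha * s(bx, by)
    # Step (b): the eight blocks within one block of it.
    ring = beta * sum(
        s(bx + dx, by + dy)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Step (c): sum along each of the four directions over distances 2..t,
    # keeping only the direction with the largest sum.
    directions = ((1, 0), (-1, 0), (0, 1), (0, -1))
    line = max(
        gamma * sum(s(bx + dx * k, by + dy * k) for k in range(2, t + 1))
        for dx, dy in directions
    )
    return centre + ring + line
```

The block is then judged to be a caption block when `weighted_reliability(...) > t_block`, where `t_block` stands for the threshold T_block.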

Next, the interpolated image generation circuit 6 receives the interpolation vector 15 generated by the interpolation vector evaluation circuit 5 and the caption area information 16 generated by the still caption detection circuit 3, and generates the interpolated image from the original image data 18 for interpolated image generation stored in the RGB buffer memory 9. The original image data 18 may be the original image data of frame N or that of frame N-1; however, when frame N is used, the interpolation vector (vx, vy) for the pixel (x, y) must be negated to (-vx, -vy). In the operation example below, the original image data of frame N-1 is used as the original image data 18 for interpolated image generation.

The interpolated image generation circuit 6 refers to the interpolation vector (vx, vy) at the position of the pixel (x, y) to be generated and assigns the value of the pixel (x-vx, y-vy) indicated by the interpolation vector in the original image data 18 to the pixel (x, y) of the interpolated image 20. However, when the detection block containing the pixel (x, y) is judged by the caption area information 16 to be a caption area (mask condition 1), or when the detection block containing the pixel (x-vx, y-vy) indicated by the interpolation vector is judged by the caption area information 16 to be a caption area (mask condition 2), the pixel (x, y) is generated with its interpolation vector set to 0 (vx = vy = 0).

Generating the pixel (x, y) under mask condition 1 prevents a different pixel from intruding, because of a falsely detected motion vector, into a position that should lie inside the caption area and destroying the caption pixel. Generating the pixel (x, y) under mask condition 2 prevents a pixel from inside the caption area from intruding, because of a falsely detected motion vector, into a position that is not part of the caption area and destroying the captured picture.
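The two mask conditions can be sketched as follows, using frame N-1 as the source image as in the operation example. The function name `interpolate_frame` is hypothetical. For brevity the sketch assumes that the block-level caption area information has already been expanded into a per-pixel boolean mask, that vectors are integer pixel displacements, and that a vector pointing outside the frame also falls back to the zero vector; none of these details come from the text.

```python
def interpolate_frame(prev_img, vectors, caption_mask):
    """Build an interpolated frame from frame N-1.

    prev_img[y][x]     : pixel value of frame N-1
    vectors[y][x]      : interpolation vector (vx, vy) for pixel (x, y)
    caption_mask[y][x] : True where the pixel belongs to a caption area
    """
    h, w = len(prev_img), len(prev_img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vx, vy = vectors[y][x]
            sx, sy = x - vx, y - vy
            inside = 0 <= sx < w and 0 <= sy < h
            # Mask condition 1: the generated pixel lies in a caption area.
            # Mask condition 2: the referenced pixel lies in a caption area.
            if not inside or caption_mask[y][x] or caption_mask[sy][sx]:
                sx, sy = x, y  # treat the interpolation vector as 0
            out[y][x] = prev_img[sy][sx]
    return out
```

With a one-row image `[10, 20, 255, 40]`, a caption at index 2, and every vector set to (1, 0), the masked result keeps 255 at index 2 and leaves index 3 at 40, whereas unmasked interpolation would drag the caption value to index 3.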

Finally, the image output unit 7 receives the original image data 19 for output from the RGB buffer memory 9 and the interpolated image 20 from the interpolated image generation circuit 6, and outputs them alternately to the display device as the output image data 21. The output order is controlled to be: original image data (frame N-1), interpolated image data, original image data (frame N).

In this way, in an image output device equipped with a frame rate conversion circuit that generates interpolated frames to improve image quality, a circuit that achieves higher image quality by preventing the degradation caused by broken-up caption portions can be realized without adding special memory for caption area detection.
The present invention is equally effective for any device that converts the frame rate, for example a device that outputs video content shot at 24 frames per second at 60 to 120 frames per second.

(Embodiment 2)
Embodiment 2 describes the case where video with captions is input to a circuit that acquires images converted from 24 frames per second to 60 frames per second, that is, 3-2 pulled-down input images at 60 frames per second, and performs frame rate conversion (converting 60 frames-per-second video into 120 frames-per-second video).

(Embodiment 2: Configuration)
FIG. 3 is an explanatory diagram schematically showing the configuration of one form of the frame rate conversion processing circuit provided in the image output device of the present invention. The configuration of this embodiment is the same as that of Embodiment 1 shown in FIG. 1, but the functions and operation of the Y buffer memory 8, the still caption detection circuit 3, the motion vector detection circuit 4, the interpolation vector evaluation circuit 5, the interpolated image generation circuit 6, and so on differ from Embodiment 1 as described below.

In FIG. 3, input image data 11 is input to the image input unit 1, and the original image data 17 in RGB format is distributed to two signal paths. The first signal path feeds the RGB-Y conversion circuit 2, which converts the data into a luminance image, and the luminance image data is then stored in the Y buffer memory 8. The Y buffer memory 8 is a memory with a capacity of two frames or more and is connected to the still caption detection circuit 3 and the motion vector detection circuit 4.

The still caption detection circuit 3 detects still caption areas in the image from two frames: the luminance image data (frame N) 12 and the luminance image data (frame N-1) 13, which is offset in time from the luminance image data (frame N) by 1/24 second (that is, a frame not captured at the same time). The detected caption area information 16 is sent to the interpolated image generation circuit 6.
The motion vector detection circuit 4 detects motion vectors between the images from the luminance image data 12 and the luminance image data 13.

The interpolated image generation circuit 6 outputs the generated interpolated images 20 to the image output unit 7 for four consecutive frames, and then outputs the original image data 19 for output for one frame. Taking the output original image data at a given time as the reference, the output order is controlled to be: the (N-1)th original image data, four frames of interpolated image data, and then the Nth original image data.
The image output unit 7 adds the synchronization signals required for display and outputs the output image data 21 to the display device at a predetermined timing.

In this way, in an image output device equipped with a frame rate conversion circuit that generates interpolated frames to improve image quality, a circuit that achieves higher image quality by preventing the degradation caused by broken-up caption portions can be realized without adding special memory for caption area detection.

(Embodiment 2: Operation)
Next, the operation of this embodiment is described.
The Nth input image data 11 at a given time is input to the image input unit 1, and the original image data 17 in RGB format is sent to the first and second signal paths.
In the first signal path, the original image data 17 in RGB format is input to the RGB-Y conversion circuit 2, and format conversion produces the luminance image (frame N) 12. The luminance image (frame N) 12 is written to the Y buffer memory 8 and, at the same time, the past luminance image (frame N-1) 13, offset in time by 1/24 second, is read from the Y buffer memory 8. The luminance images of these two frames (frame N and frame N-1) are sent to the still caption detection circuit 3 and the motion vector detection circuit 4.

The input image is so-called 3-2 pulled-down video, in which frames captured at the same instant arrive in alternating runs of three frames, two frames, three frames, and so on. FIG. 5 shows how this video is input. Input frame N−1 is input as two frames (frame N−1(1) and frame N−1(2)), input frame N is then input as three frames (frame N(1) to frame N(3)), input frame N+1 is input as two frames (frame N+1(1) and frame N+1(2)), and the three-frame and two-frame inputs repeat thereafter. FIG. 6 shows how interpolated images are generated and output for input frames pulled down in this way. Four interpolated images are generated between frame N−1 and frame N. The only frame image data needed for this is the image data of frame N−1 and the image data of frame N; multiple copies of frame data representing the same frame are not required. Therefore, the Y buffer memory 8 may be given a capacity of three or more frames, with frames written to it in input order and the read side controlled so that the frame at a given time (frame N) and the past image 1/24 second earlier (frame N−1) are read out. Alternatively, frames of the same instant may be kept from being written to the Y buffer memory 8 in the first place, or the area of the Y buffer memory 8 that already holds a frame of the same instant may be overwritten, so that the duplicate frame data is discarded. The present invention is not limited to these control methods or to this capacity of the Y buffer memory.
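The second control option above (not writing frames of the same instant) can be sketched as follows. This is only an illustration under stated assumptions: frames are modeled as flat, non-empty lists of luminance values, repeats are recognized by a mean absolute difference below a hypothetical `threshold`, and the names `mean_abs_diff` and `write_unique_frames` do not appear in the source.

```python
def mean_abs_diff(a, b):
    """Mean absolute luminance difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def write_unique_frames(frames, threshold=1.0):
    """Return the frames that would be written to the buffer.

    A frame that barely differs from the last stored frame is treated as a
    repeat of the same instant and is skipped (assumed criterion and
    threshold; the source does not fix either).
    """
    stored = []
    for frame in frames:
        if stored and mean_abs_diff(stored[-1], frame) < threshold:
            continue  # repeat of the same instant: do not write it
        stored.append(frame)
    return stored


# 3-2 pulled-down input: N-1 twice, N three times, N+1 twice.
pulled_down = [
    [10, 10], [10, 10],
    [50, 60], [50, 60], [50, 60],
    [90, 20], [90, 20],
]
unique = write_unique_frames(pulled_down)
print(len(unique))  # 3
```

With repeats dropped at write time, the buffer only ever holds one copy per instant, so frame N and frame N−1 are simply the two most recent entries.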

As noted above, for 3-2 pulled-down input, four interpolated frames must be generated between two frames that are 1/24 second apart. Ideally, each interpolated frame is generated with a motion vector corresponding to a successive time offset of 1/120 second. This can be achieved, for example, by taking the motion vector obtained by the motion vector detection circuit 4 and performing the interpolation vector evaluation with vectors of 1/5, 2/5, 3/5, and 4/5 of its magnitude. The interpolated frames generated between frame N−1 and frame N are denoted interpolated frame N−1' (vector × 1/5), interpolated frame N−1'' (vector × 2/5), interpolated frame N−1''' (vector × 3/5), and interpolated frame N−1'''' (vector × 4/5). The method of generating the interpolation vectors is not, however, limited to this example.
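The scaling in this example can be written out as a short sketch. It assumes a motion vector given as an (x, y) pair and uses exact fractions to avoid rounding; the function name `scaled_vectors` and the sample vector are illustrative only and are not taken from the source.

```python
from fractions import Fraction


def scaled_vectors(motion_vector, num_interpolated=4):
    """Scale one motion vector by k/(n+1) for k = 1..n.

    For n = 4 this gives the 1/5, 2/5, 3/5 and 4/5 vectors used for the
    interpolated frames N-1', N-1'', N-1''' and N-1''''.
    """
    vx, vy = motion_vector
    steps = num_interpolated + 1
    return [(Fraction(k, steps) * vx, Fraction(k, steps) * vy)
            for k in range(1, steps)]


# The frame interval of 1/24 s split into five steps gives 1/120 s each.
step_seconds = Fraction(1, 24) / 5
print(step_seconds)  # 1/120

for k, (dx, dy) in enumerate(scaled_vectors((10, -5)), start=1):
    print(k, dx, dy)
```

For the sample vector (10, −5) the four scaled vectors are (2, −1), (4, −2), (6, −3), and (8, −4).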

Meanwhile, the still caption detection circuit 3 detects the still caption area from the two temporally consecutive luminance images (not images of the same instant), namely luminance image (frame N) 12 and luminance image (frame N−1) 13. The method for detecting the still caption area is the same as in (Embodiment 1: Operation).

Four frames of interpolated image data are generated between the original image data (frame N−1) and the original image data (frame N), and the same caption area information 16, used as the mask condition, is applied to all four frames. Accordingly, as shown in FIG. 6, the still caption area is detected and the still caption area information is updated only at the switch from frame N−1 to frame N.
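The following sketch illustrates how one caption mask could be reused for all four interpolated frames. It follows the masking rule stated in the abstract and claims (vector set to 0 for caption pixels, and for non-caption pixels whose vector points at a caption pixel), but the data layout is an assumption: per-pixel integer vectors stored as `vectors[y][x] = (dx, dy)`, a boolean `caption_mask[y][x]`, and the "pixel indicated by the vector" taken to be `(x + dx, y + dy)`. The name `mask_vectors` is hypothetical.

```python
def mask_vectors(vectors, caption_mask):
    """Zero interpolation vectors according to the still caption mask."""
    height, width = len(caption_mask), len(caption_mask[0])
    masked = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = vectors[y][x]
            tx, ty = x + dx, y + dy  # pixel the vector points at (assumed convention)
            in_frame = 0 <= tx < width and 0 <= ty < height
            if caption_mask[y][x]:
                row.append((0, 0))  # caption pixel: keep it in place
            elif in_frame and caption_mask[ty][tx]:
                row.append((0, 0))  # would pull caption pixels outside the caption area
            else:
                row.append((dx, dy))
        masked.append(row)
    return masked


# One mask per pair of original frames, reused for every interpolated frame.
caption_mask = [[False, True, False, False]]
fields = [
    [[(1, 0), (2, 0), (1, 0), (-1, 0)]],   # vectors for one interpolated frame
    [[(2, 0), (2, 0), (1, 0), (-2, 0)]],   # vectors for another interpolated frame
]
masked_fields = [mask_vectors(field, caption_mask) for field in fields]
print(masked_fields[0])  # [[(0, 0), (0, 0), (1, 0), (-1, 0)]]
```

Because the mask is computed once per frame switch, the per-frame cost is only the lookup shown in the loop.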

Finally, the image output unit 7 receives the output original image data 19 from the RGB buffer memory 9 and the interpolated images 20 from the interpolated image generation circuit 6, and outputs them to the display device as output image data 21. As shown in FIG. 6, the output order is controlled to be: original image data (frame N−1), interpolated image data (interpolated frame N−1'), interpolated image data (interpolated frame N−1''), interpolated image data (interpolated frame N−1'''), interpolated image data (interpolated frame N−1''''), and original image data (frame N).
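The output ordering can be expressed as a small sketch. Frames are represented only by string labels, and the function name `output_order` is an illustrative assumption; the prime notation mirrors the naming of the interpolated frames in the description.

```python
def output_order(original_labels, num_interpolated=4):
    """Interleave each original frame with the interpolated frames that follow it."""
    if not original_labels:
        return []
    order = []
    for label in original_labels[:-1]:
        order.append(label)
        # Interpolated frames are named after the earlier original frame.
        order.extend(label + "'" * k for k in range(1, num_interpolated + 1))
    order.append(original_labels[-1])
    return order


print(output_order(["N-1", "N"]))
# ['N-1', "N-1'", "N-1''", "N-1'''", "N-1''''", 'N']
```

Each 1/24-second interval thus yields five output frames (one original and four interpolated), which corresponds to 24 × 5 = 120 output frames per second.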

As described above, in an image output device equipped with a frame rate conversion circuit that generates interpolated image frames to improve image quality, the circuit that achieves higher image quality by preventing the degradation caused by breakdown of the caption portion is also applicable when multiple images of the same instant are input among consecutive input images, as with 3-2 pulled-down input.

DESCRIPTION OF SYMBOLS: 1 ... image input unit, 2 ... Y conversion circuit, 3 ... still caption detection circuit, 4 ... motion vector detection circuit, 5 ... interpolation vector evaluation circuit, 6 ... interpolated image generation circuit, 7 ... image output unit, 8 ... Y buffer memory, 9 ... RGB buffer memory, 11 ... input image data, 12 ... luminance image, 13 ... luminance image, 14 ... motion vector, 15 ... interpolation vector, 16 ... caption area information, 17 ... original image data, 18 ... original image data for interpolated image generation, 19 ... output original image data, 20 ... interpolated image, 21 ... output image data, 101 ... caption boundary detection unit, 102 ... black band boundary detection unit, 103 ... still pixel detection unit, 111 ... caption boundary pixel, 112 ... caption boundary pixel, 113 ... still pixel, 114 ... caption pixel, 115 ... caption reliability.

Claims

1. An image output device that generates and outputs, as image frames to be displayed, one or more interpolated frames between two temporally consecutive input image frames, the device comprising:
an image input unit for inputting an image; an original image storage buffer memory for storing the input image without converting its image format; an image format conversion circuit for sequentially converting the input image into an image format containing only luminance information to generate a luminance image; a luminance image storage buffer memory for storing the luminance image; a motion vector detection circuit for detecting a motion vector between two temporally consecutive luminance images; an interpolation vector evaluation circuit for using the motion vector to select an interpolation vector for each pixel, or each block composed of a plurality of pixels, of an interpolated frame inserted between two temporally consecutive frames; a still caption detection circuit for detecting a still caption image between the two input temporally consecutive luminance images; an interpolated image generation circuit for generating a desired interpolated frame from the interpolation vector and the still caption detection result, based on the original image stored in the original image storage buffer memory; and an image output unit that arranges the original image frames stored in the original image storage buffer memory and the interpolated frames generated by the interpolated image generation circuit so as to be temporally continuous and outputs them,
wherein the interpolated image generation circuit generates the interpolated frame by setting to 0 the value of the interpolation vector assigned to a pixel, or a block composed of a plurality of pixels, that the still caption detection circuit has determined to be a caption, and, when the pixel indicated by the interpolation vector of a pixel or block that the still caption detection circuit has not determined to be a caption is itself determined to be a caption, by setting to 0 the interpolation vector assigned to that pixel or block.

2. The image output device according to claim 1, wherein the still caption detection circuit determines the evaluation target pixel to be a caption boundary pixel when the luminance difference value between two pixels at point-symmetric positions centered on the evaluation target pixel exceeds a certain threshold in both of two temporally consecutive frames and the luminance difference values are almost the same in the two frames, and determines the pixels in a block composed of a plurality of pixels to be a caption when the number of caption boundary pixels in the block exceeds a certain number.

3. The image output device according to claim 2, wherein, for a pixel determined to be a caption boundary pixel, when the number of pixels determined to be caption boundary pixels in the line to which the evaluation target pixel belongs and the number of such pixels in the line immediately below that line both exceed a certain threshold, the still caption detection circuit determines all pixels of the line to which the evaluation target pixel belongs to be a black band boundary and cancels their determination as caption boundary pixels.

4. The image output device according to claim 2, wherein, for each evaluation target pixel of two temporally consecutive original image frames, the still caption detection circuit determines whether the evaluation target pixel is a still pixel by comparing the luminance difference value of the evaluation target pixel with a certain threshold, determines the still pixel to be a caption pixel when a caption boundary pixel is present among the pixels surrounding the still pixel, and determines the pixels in a block composed of a plurality of pixels to be a caption when the sum of the number of caption boundary pixels and the number of caption pixels in the block exceeds a certain threshold.

5. The image output device according to any one of claims 2 to 4, wherein, when two temporally consecutive input image frames are determined to have almost no difference over the entire screen, one of the frames is discarded, no interpolated frame is generated between the two frames, and the still caption detection circuit does not perform caption detection processing.

6. An image output method executed by an image output device that generates and outputs, as image frames to be displayed, one or more interpolated frames between two temporally consecutive input image frames, the method comprising:
an image input step of inputting an image to the image output device;
an original image storage step of storing the input image in a buffer memory as it is, without converting its image format;
an image format conversion step of sequentially converting the input image into an image format containing only luminance information to generate a luminance image;
a luminance image storage step of storing the luminance image in a buffer memory;
a motion vector detection step of detecting a motion vector between two temporally consecutive luminance images;
an interpolation vector evaluation step of using the motion vector to select an interpolation vector for each pixel, or each block composed of a plurality of pixels, of an interpolated frame inserted between two temporally consecutive frames;
a still caption detection step of detecting a still caption image between the two input temporally consecutive luminance images;
an interpolated image generation step of generating a desired interpolated frame from the interpolation vector and the still caption detection result, based on the original image stored in the buffer memory in the original image storage step; and
an image output step of arranging the original image frames stored in the buffer memory in the original image storage step and the interpolated frames generated in the interpolated image generation step so as to be temporally continuous and outputting them,
wherein, in the interpolated image generation step, the interpolated frame is generated by setting to 0 the value of the interpolation vector assigned to a pixel, or a block composed of a plurality of pixels, determined to be a caption in the still caption detection step, and, when the pixel indicated by the interpolation vector of a pixel or block not determined to be a caption in the still caption detection step is itself determined to be a caption, by setting to 0 the interpolation vector assigned to that pixel or block.

7. The image output method according to claim 6, wherein, in the still caption detection step, the evaluation target pixel is determined to be a caption boundary pixel when the luminance difference value between two pixels at point-symmetric positions centered on the evaluation target pixel exceeds a certain threshold in both of two temporally consecutive frames and the luminance difference values are almost the same in the two frames, and the pixels in a block composed of a plurality of pixels are determined to be a caption when the number of caption boundary pixels in the block exceeds a certain number.

8. The image output method according to claim 7, wherein, in the still caption detection step, for a pixel determined to be a caption boundary pixel, when the number of pixels determined to be caption boundary pixels in the line to which the evaluation target pixel belongs and the number of such pixels in the line immediately below that line both exceed a certain threshold, all pixels of the line to which the evaluation target pixel belongs are determined to be a black band boundary and their determination as caption boundary pixels is canceled.

9. The image output method according to claim 7, wherein, in the still caption detection step, for each evaluation target pixel of two temporally consecutive original image frames, whether the evaluation target pixel is a still pixel is determined by comparing the luminance difference value of the evaluation target pixel with a certain threshold, the still pixel is determined to be a caption pixel when a caption boundary pixel is present among the pixels surrounding the still pixel, and the pixels in a block composed of a plurality of pixels are determined to be a caption when the sum of the number of caption boundary pixels and the number of caption pixels in the block exceeds a certain threshold.

10. The image output method according to any one of claims 7 to 9, wherein, when two temporally consecutive input image frames are determined to have almost no difference over the entire screen, one of the frames is discarded, no interpolated frame is generated between the two frames, and the caption detection processing in the still caption detection step is not performed.