JP4850123B2 - Image data processing device

Info

Publication number: JP4850123B2
Application number: JP2007121867A
Authority: JP
Inventors: 信寛正賀
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2007-05-02
Filing date: 2007-05-02
Publication date: 2012-01-11
Anticipated expiration: 2027-05-02
Also published as: JP2008278380A

Description

The present invention relates to an image data processing device that generates a video signal to be output to a video display device based on video data and caption data.

In television broadcasting, it is known to broadcast not only video and audio but also captions that render the audio content as text. Captions are displayed on a video display device, such as a liquid crystal television, together with the video so that the viewer can follow what a person appearing in the video is saying even when the audio cannot be heard. To broadcast video, audio, and captions, the broadcaster transmits video data, audio data, and caption data.

To make the relationship between a speaker (a person appearing in the video) and a caption easier to understand, it has been proposed to display on the video display device a graphic that extends from a part of the person and covers the caption, that is, a speech balloon (see, for example, Patent Document 1). To display such balloons on the receiver side of a television broadcast, the content creator creates balloon data used for displaying the balloons, and the broadcaster broadcasts video, audio, and captions with balloons. More specifically, the content creator determines the display time of a balloon based on the audio data, determines the display position of the balloon by detecting the face or mouth of a person in the video using known image processing techniques, and further determines the display shape of the balloon. The content creator then creates balloon data that specifies the display time, display position, and display shape of the balloon and includes the caption data, and the broadcaster transmits this balloon data together with the video data and audio data using a transmitter.
Patent Document 1: JP 2005-124169 A

However, with the conventional balloon display described above, the sender side of the television broadcast (that is, the content creator and the broadcaster) must create and broadcast the balloon data. Because creating balloon data is optional for the sender side, balloon data may not be transmitted, and in that case the receiver side cannot display a balloon on the video display device.

The present invention has been made in view of these circumstances, and its object is to provide an image data processing device that can display a balloon on a video display device based on video data and caption data even when balloon data is not transmitted from the sender side.

The invention according to claim 1 is an image data processing device that generates a video signal based on video data and caption data and outputs the video signal to a video display device, the device comprising: a video data analysis unit that analyzes the video data to detect a part of a person displayed on the video display device and to acquire information on the display position of the part of the person; a caption data analysis unit that analyzes the caption data to detect a caption displayed on the video display device and to acquire information on the display position of the caption; a video signal generation unit that uses the information on the display position of the part of the person and the information on the display position of the caption to generate a video signal for displaying, on the video display device, a graphic that covers the caption with the position of the part of the person as a reference point; and a control unit that determines whether the video data analysis unit has detected a part of a person, determines whether the caption data analysis unit has detected a caption, and controls the generation of the video signal by the video signal generation unit.

With this configuration, the control unit determines whether the video data analysis unit has detected a part of a person and whether the caption data analysis unit has detected a caption, and controls the generation of the video signal by the video signal generation unit so that a graphic covering the caption with the position of the part of the person as a reference point (that is, a balloon) is displayed on the video display device. A balloon can therefore be displayed on the video display device based on the video data and the caption data. Consequently, even when balloon data is not transmitted from the sender side in a television broadcast, for example, the receiver side can display a balloon on the video display device based on the video data and the caption data.

The invention according to claim 2 is the image data processing device according to claim 1, wherein the control unit controls the video signal generation unit to generate the video signal when the control unit determines that the video data analysis unit has detected a part of exactly one person and that the caption data analysis unit has detected a caption.

With this configuration, when the control unit determines that a part of one person has been detected and that a caption has been detected, it controls the video signal generation unit to generate the video signal so that a graphic covering the caption with the position of the part of the person as a reference point (that is, a balloon) is displayed on the video display device. A balloon can therefore be displayed on the video display device easily.

The invention according to claim 3 is the image data processing device according to claim 1, wherein the control unit further controls the generation of the video signal by the video signal generation unit by determining which of the persons displayed on the video display device the caption belongs to.

With this configuration, the control unit further determines which of the persons displayed on the video display device the caption belongs to, and controls the generation of the video signal by the video signal generation unit so that a graphic covering the caption with the position of the part of that person as a reference point (that is, a balloon) is displayed on the video display device. A balloon can therefore be displayed on the video display device effectively.

The invention according to claim 4 is the image data processing device according to claim 3, wherein the caption data analysis unit further acquires information on the color of the caption by analyzing the caption data, and the control unit controls the generation of the video signal by the video signal generation unit by using the information on the color of the caption to determine which of the persons displayed on the video display device the caption belongs to.

With this configuration, the control unit uses the caption color information acquired by the caption data analysis unit to determine which of the persons displayed on the video display device the caption belongs to, and controls the generation of the video signal by the video signal generation unit so that a graphic covering the caption with the position of the part of that person as a reference point (that is, a balloon) is displayed on the video display device. A balloon can therefore be displayed on the video display device easily and effectively.

The invention according to claim 5 is the image data processing device according to claim 3 or claim 4, wherein, after the video signal generation unit has generated the video signal, the control unit controls the video signal generation unit to continue generating the video signal when the control unit determines that a caption newly detected by the caption data analysis unit for display on the video display device belongs to the same person who is displayed on the video display device.

With this configuration, after the video signal generation unit has generated the video signal, the control unit controls the video signal generation unit to continue generating the video signal when it determines that a caption newly detected by the caption data analysis unit belongs to the same person who is displayed on the video display device. The balloon can therefore continue to be displayed effectively on the video display device while the captions belong to the same displayed person.

According to the present invention, a balloon can be displayed on a video display device based on video data and caption data.

Specific embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a television broadcasting system that uses a broadcast receiving device incorporating the image data processing device of the present invention, and FIG. 2 is a block diagram showing the configuration of a broadcast receiving device incorporating an image data processing device according to an embodiment of the present invention. This embodiment is described using, as an example of television broadcasting, ISDB-T (Integrated Services Digital Broadcasting - Terrestrial) digital terrestrial television broadcasting, which is broadcast in accordance with the standards (STD: Standard) and technical reports (TR: Technical Report) of ARIB (Association of Radio Industries and Businesses) or the standards they reference.

(Broadcast receiving device)
As shown in FIG. 1, an antenna 2, an audio output device 3, and a video display device 4 are connected to the broadcast receiving device 1. The antenna 2 receives a broadcast signal from the transmitter 5 on the sender side and outputs it to the broadcast receiving device 1. The broadcast receiving device 1 applies various kinds of signal processing to the input broadcast signal and, based on the data contained in it, outputs an audio signal to the audio output device 3 and a video signal to the video display device 4, so that the audio output device 3 reproduces the audio and the video display device 4 displays images such as moving images of people and captions.

As shown in FIG. 1, the broadcast receiving device 1 includes a channel selection unit 11, a transmission path decoding unit 12, a demultiplexing unit 13, an audio data processing unit 14, and an image data processing device 15. A communication interface for connecting to the Internet or the like and a remote control interface for accepting input from the viewer may also be provided. Each unit of the broadcast receiving device 1 is controlled by a CPU (Central Processing Unit, not shown) executing a program stored in a ROM (Read Only Memory, not shown) using a RAM (Random Access Memory, not shown). The ROM also stores the characters indicated by the character codes of the 8-unit character code data described later, and a CLUT (Color Look-Up Table) that indirectly specifies luminance signal, color difference signal, and transparency values.

The channel selection unit 11 extracts only the broadcast signal corresponding to the desired channel from the broadcast signals of the multiple channels received by the antenna 2, converts it to a lower-frequency signal so that the transmission path decoding unit 12 can demodulate it easily, and converts it to a digital signal. The channel selection unit 11 then outputs the digitized broadcast signal to the transmission path decoding unit 12.

The transmission path decoding unit 12 applies various kinds of signal processing (for example, demodulation, Viterbi decoding, deinterleaving, and Reed-Solomon decoding) to the input digital broadcast signal, in the reverse order of the signal processing performed on the sender side, and converts the broadcast signal into a signal in MPEG-2 TS (Transport Stream) format (hereinafter, "TS format"). The transmission path decoding unit 12 then outputs the TS-format signal to the demultiplexing unit 13.

The demultiplexing unit 13 separates, from the TS-format signal multiplexed in accordance with MPEG-2 Systems, the MPEG-2 TS packets (hereinafter, "TS packets") that make up the PES (Packetized Elementary Stream) format data of each data type, and thereby extracts the individual data (that is, video data, caption data, and audio data). The demultiplexing unit 13 outputs the extracted audio data to the audio data processing unit 14 and the extracted video data and caption data to the image data processing device 15.
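
As an illustration of the separation step, the following is a minimal sketch of grouping TS packets by packet identifier (PID). The 188-byte packet length, the 0x47 sync byte, and the 13-bit PID field come from MPEG-2 Systems; the function names `split_by_pid` and `make_packet` and the PID values in the usage note are hypothetical and are not taken from this document.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188  # fixed TS packet length in MPEG-2 Systems
SYNC_BYTE = 0x47      # first byte of every TS packet


def split_by_pid(stream: bytes) -> dict:
    """Group the TS packets in `stream` by their 13-bit PID."""
    packets = defaultdict(list)
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            continue  # skip packets that do not start with the sync byte
        # PID: low 5 bits of byte 1 followed by all 8 bits of byte 2.
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        packets[pid].append(packet)
    return dict(packets)


def make_packet(pid: int) -> bytes:
    """Build a dummy header-only TS packet (for illustration only)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))
```

For example, `split_by_pid(make_packet(0x100) + make_packet(0x110))` returns a dictionary with one packet list per PID. A real demultiplexer would additionally consult the program tables to learn which PID carries video, audio, or caption data.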

The audio data processing unit 14 decodes the audio data, which is encoded with MPEG-2 AAC (Advanced Audio Coding), and outputs the decoded audio signal to the audio output device 3, such as a speaker with a built-in amplifier.

(Image data processing device)
To generate a video signal based on the input video data and caption data and output it to the video display device 4, such as a liquid crystal television, the image data processing device 15 includes, as shown in FIG. 2, a video decoder unit 151, a video data analysis unit 152, a caption decoder unit 153, a caption data analysis unit 154, a control unit 155, and a video signal generation unit 156 having frame memories. The video data and the caption data are input to the video decoder unit 151 and the caption decoder unit 153 of the image data processing device 15, respectively.

The video decoder unit 151 decodes the video data, which is encoded in accordance with MPEG-2 Video, by performing variable-length decoding, inverse quantization, inverse discrete cosine transform, motion compensation, and so on, and writes luminance and color difference signals to the moving image frame memory 156a of the video signal generation unit 156 based on the decoded video data. By writing the luminance and color difference signals to the moving image frame memory 156a successively, the video decoder unit 151 generates a plane representing the moving image (that is, the moving image plane).

The video data analysis unit 152 analyzes, out of the decoded video data, the video data representing one frame as a still image, thereby detecting the mouth of a person displayed on the video display device 4 at a given time and acquiring at least information on the display position of that mouth. Specifically, it performs image processing such as pattern matching on the one-frame still image represented by the video data (that is, the luminance and color difference signals written to the moving image frame memory 156a) and detects the mouth of a person displayed on the video display device 4. The video data analysis unit 152 then acquires information on the detected mouths (such as the display position of each detected mouth, the number of detected mouths, and features of the detected mouths) and outputs it to the control unit 155. Although in this embodiment the video data analysis unit 152 detects a person's mouth and acquires information on it, it may instead detect another part of a person displayed on the video display device 4, such as the face or head, and acquire information on that part.
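
The text names pattern matching only as one example of the image processing used. As a rough sketch of that idea, the following compares a small luminance template against every position of a frame using the sum of absolute differences (SAD). The SAD criterion, the function name `find_template`, and the `max_sad` threshold are assumptions made for illustration; a practical mouth detector would be considerably more elaborate.

```python
def find_template(frame, template, max_sad=0):
    """Return the (x, y) positions where `template` matches `frame`.

    `frame` and `template` are 2-D lists of luminance values. A position
    matches when the sum of absolute differences (SAD) is <= max_sad.
    """
    frame_h, frame_w = len(frame), len(frame[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    matches = []
    for y in range(frame_h - tmpl_h + 1):
        for x in range(frame_w - tmpl_w + 1):
            sad = sum(
                abs(frame[y + dy][x + dx] - template[dy][dx])
                for dy in range(tmpl_h)
                for dx in range(tmpl_w)
            )
            if sad <= max_sad:
                matches.append((x, y))
    return matches
```

The length of the returned list corresponds to the number of detected mouths, and each entry to a display position, which is the information passed on to the control unit 155.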

The caption decoder unit 153 generates a plane representing the caption (that is, the caption plane) by writing, based on the caption data, values that indirectly specify luminance signal, color difference signal, and transparency values (hereinafter, "color indices") to the caption frame memory 156b of the video signal generation unit 156. More specifically, caption data in independent PES format, that is, caption data carried in its own PES rather than inside the PES-format video or audio data, consists of a PES header area and a PES data area as shown in FIG. 3, and the PES data area contains caption management data or caption text data as a data group. As also shown in FIG. 3, the data unit area of the data group contains a text data unit. The data unit data area of the text data unit contains 8-unit character code data, which includes not only the character codes representing the caption characters but also control codes (for example, SDP, SDF, SSZ, SZX, and COL) that specify the display position, size, color, and so on of the caption. Color indices are written to the locations in the caption frame memory 156b that correspond to the position where the caption is displayed on the video display device 4, so that the characters stored in the ROM and indicated by the character codes are displayed on the video display device 4 as specified by the control codes.

The caption data analysis unit 154 analyzes the caption data, thereby detecting a caption displayed on the video display device 4 and acquiring at least information on the display position of that caption. Specifically, when a color index for displaying a caption (a color index whose transparency value is not 0) is written to the caption frame memory 156b, the caption data analysis unit 154 detects a caption displayed on the video display device 4. The caption data analysis unit 154 acquires the display position of the caption by detecting the locations in the caption frame memory 156b to which color indices are written, and acquires the color of the caption by detecting the luminance and color difference signals indicated by those color indices. It then outputs the information on the detected caption (for example, its display position and color) to the control unit 155. Although in this embodiment the caption data written to the caption frame memory 156b (that is, the color indices written to the caption frame memory 156b) is analyzed, the caption data containing the 8-unit character code data may be analyzed instead to detect the caption displayed on the video display device 4 and acquire its information.
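
The frame-memory analysis described above can be sketched as a scan over the caption plane. In this minimal example the plane is a 2-D list of color indices and the CLUT is a dictionary mapping each index to a `(Y, Cb, Cr, alpha)` tuple, where `alpha` stands for the document's transparency value (0 means nothing is drawn). The data layout, the bounding-box representation, and the name `analyze_caption_plane` are assumptions for illustration.

```python
def analyze_caption_plane(plane, clut):
    """Return (bounding_box, colors) for the visible caption pixels, or None.

    bounding_box is (left, top, right, bottom), inclusive. colors is the
    set of (Y, Cb, Cr) values used by the visible pixels.
    """
    xs, ys, colors = [], [], set()
    for y, row in enumerate(plane):
        for x, index in enumerate(row):
            lum, cb, cr, alpha = clut[index]
            if alpha != 0:  # a pixel that will actually be displayed
                xs.append(x)
                ys.append(y)
                colors.add((lum, cb, cr))
    if not xs:
        return None  # no caption detected
    return (min(xs), min(ys), max(xs), max(ys)), colors
```

A `None` result corresponds to "no caption detected"; otherwise the bounding box gives the display position and the color set gives the caption color information used for the later judgments.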

The control unit 155 determines whether the video data analysis unit 152 has detected a person's mouth and whether the caption data analysis unit 154 has detected a caption. Using the caption color information, the control unit 155 also makes various other determinations, such as which of the persons displayed on the video display device 4 each successively displayed caption belongs to. These determinations can be made using the information on the detected mouths and the information on the detected caption, and based on them the control unit 155 controls the generation of the video signal by the video signal generation unit 156. In other words, the control unit 155 controls the video signal generation unit 156, which generates the video signal for displaying on the video display device 4 a graphic that extends from a part of a person and covers the caption (that is, a balloon).

To display a balloon on the video display device 4, the control unit 155 controls the video signal generation unit 156 so that it generates a video signal for displaying the balloon. In this case, based on the display position of the detected mouth and the display position of the caption, the control unit 155 writes color indices whose transparency value is not 0 to the balloon frame memory 156c of the video signal generation unit 156. More specifically, the control unit 155 writes such color indices to the locations in the balloon frame memory 156c that correspond to the position where the graphic extending from the person's mouth and covering the caption (that is, the balloon) is displayed on the video display device 4, thereby generating a plane representing the balloon graphic (that is, the graphic plane).

To not display a balloon on the video display device 4 (that is, to erase the balloon), on the other hand, the control unit 155 controls the video signal generation unit 156 so that it does not generate a video signal for displaying the balloon. In this case, the control unit 155 writes a color index whose transparency value is 0 to the entire balloon frame memory 156c of the video signal generation unit 156.

As described above, by writing color indices to the balloon frame memory 156c, the control unit 155 controls the generation of the video signal by the video signal generation unit 156 and thereby controls the balloon displayed on the video display device 4.
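
The two cases above (drawing and erasing the balloon) can be sketched as follows. The balloon shape used here, a rectangle around the caption plus a one-pixel straight tail from the mouth to the center of that rectangle, is an assumption chosen for brevity, since the text does not fix a shape. The index values `TRANSPARENT` and `BALLOON` and both function names are likewise hypothetical, and the mouth position is assumed to lie inside the plane.

```python
TRANSPARENT = 0  # hypothetical index whose CLUT entry has transparency 0
BALLOON = 1      # hypothetical index whose CLUT entry has non-zero transparency


def draw_balloon(width, height, mouth, caption_box, margin=1):
    """Return a balloon plane: a box around the caption plus a tail to the mouth."""
    plane = [[TRANSPARENT] * width for _ in range(height)]
    left, top, right, bottom = caption_box
    left, top = max(left - margin, 0), max(top - margin, 0)
    right = min(right + margin, width - 1)
    bottom = min(bottom + margin, height - 1)
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            plane[y][x] = BALLOON
    # Tail: step along a straight line from the mouth to the box center.
    mouth_x, mouth_y = mouth
    center_x, center_y = (left + right) // 2, (top + bottom) // 2
    steps = max(abs(center_x - mouth_x), abs(center_y - mouth_y), 1)
    for i in range(steps + 1):
        x = mouth_x + round((center_x - mouth_x) * i / steps)
        y = mouth_y + round((center_y - mouth_y) * i / steps)
        plane[y][x] = BALLOON
    return plane


def clear_balloon(width, height):
    """Return a fully transparent plane, which erases the balloon."""
    return [[TRANSPARENT] * width for _ in range(height)]
```

Because the caption plane is composited in front of the graphic plane, filling the whole box with an opaque index does not hide the caption text itself.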

To generate a composite image plane that combines the moving image plane, the graphic plane, and the caption plane, the video signal generation unit 156 writes the luminance and color difference signals of the composited image to the compositing frame memory 156d. More specifically, based on the luminance and color difference signals written to the moving image frame memory 156a and the color indices written to the caption frame memory 156b and the balloon frame memory 156c, it writes the luminance and color difference signals of the image to be displayed on the video display device 4 to the compositing frame memory 156d. As shown in FIG. 4, the composited image is obtained by first superimposing the graphic plane in front of the moving image plane and then superimposing the caption plane in front of both the moving image plane and the graphic plane. When the transparency of the entire graphic plane used for compositing is 0, no balloon is displayed on the video display device 4. The video signal generation unit 156 then outputs the luminance and color difference signals written to the compositing frame memory 156d to the video display device 4 as the video signal. In this way, the video signal generation unit 156 uses the mouth display position acquired by the video data analysis unit 152 and the caption display position acquired by the caption data analysis unit 154 to generate a video signal for displaying on the video display device 4 a graphic that extends from a part of a person and covers the caption (that is, a balloon), as shown in FIG. 4.
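
The plane ordering described above (moving image at the back, then the graphic plane, then the caption plane in front) can be sketched as below. The text only states that the planes are superimposed; the linear alpha blend, the 0 to 255 alpha range, and the name `composite` are assumptions for illustration.

```python
def composite(video, balloon, caption, clut):
    """Overlay the balloon plane, then the caption plane, on the video plane.

    `video` holds (Y, Cb, Cr) tuples. `balloon` and `caption` hold color
    indices that `clut` resolves to (Y, Cb, Cr, alpha), alpha in 0..255.
    """
    def blend(under, over, alpha):
        # alpha 0 keeps the pixel underneath; alpha 255 replaces it.
        return tuple(
            (o * alpha + u * (255 - alpha)) // 255 for u, o in zip(under, over)
        )

    out = []
    for y, row in enumerate(video):
        out_row = []
        for x, pixel in enumerate(row):
            for plane in (balloon, caption):  # back to front
                *color, alpha = clut[plane[y][x]]
                pixel = blend(pixel, color, alpha)
            out_row.append(pixel)
        out.append(out_row)
    return out
```

With a graphic plane whose entries all resolve to alpha 0, the output equals the video plane with only the caption overlaid, which matches the case in which no balloon is displayed.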

(Procedure for displaying a balloon)
Next, the procedure for displaying a balloon on the video display device 4 is described with reference to FIGS. 5 and 6. First, the viewer instructs that an image including a balloon be displayed on the video display device 4, and the balloon display mode, in which balloons are displayed on the video display device 4, is started (step S1).

Next, it is determined whether there is a caption to be displayed on the video display device 4 at a predetermined time T1 (step S2). The control unit 155 makes this determination by judging whether the caption data analysis unit 154 was able to detect a caption. That is, in step S2, if the control unit 155 judges that the caption data analysis unit 154 detected a caption to be displayed on the video display device 4, it determines that there is a caption; if it judges that no such caption was detected, it determines that there is no caption.

Next, if it is determined in step S2 that there is a caption to be displayed on the video display device 4, the control unit 155 determines whether the video data analysis unit 152 was able to detect a person's mouth (step S3). Here, the video data analysis unit 152 detects the person's mouth by analyzing the video data representing the video displayed on the video display device 4 at the same time as the predetermined time T1 at which the caption detected in step S2 is displayed.

Next, if it is determined in step S3 that a person's mouth was detected, it is determined whether exactly one person was detected in step S3 (step S4). In step S4, the control unit 155 judges whether exactly one mouth was detected by the video data analysis unit 152 in step S3, and thereby determines whether exactly one person was detected. In other words, in steps S3 and S4 the control unit 155 determines whether the video data analysis unit 152 detected the mouth of a single person.

If it is determined in steps S2 to S4 that there is no caption to be displayed on the video display device 4, that no person's mouth could be detected, or that the number of detected persons is not one, the processing from step S2 onward is performed again without displaying a balloon. That is, when there is no caption to display, no balloon is needed, and when no mouth can be detected, or when the number of detected persons is not one (that is, two or more), the person to whom the balloon should be attached cannot be identified. For this reason, no balloon is displayed on the video display device 4, and the processing from step S2 onward is performed again after a predetermined time.

On the other hand, if, after steps S2 and S3, it is determined in step S4 that exactly one person was detected, a graphic that extends from the detected person's mouth and covers the caption (that is, a balloon) is displayed on the video display device 4 (step S5). That is, when it is determined in steps S2 to S4 that there is a caption to be displayed on the video display device 4 and that the mouth of one person was detected, the caption detected by the caption data analysis unit 154 in step S2 is judged to be what person A, the person whose mouth was detected by the video data analysis unit 152 in step S3, is saying. Accordingly, with person A as the target of the balloon, a graphic that extends from the mouth of person A detected in step S3 and covers the caption detected in step S2 is displayed on the video display device 4. After the balloon for person A has been displayed in step S5, the process proceeds, after a predetermined time, to step S6 described below.
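The decision made in steps S2 to S5 can be sketched as a single function. This is only an illustration under stated assumptions: the function name `select_balloon_target`, the use of `None` to mean "no caption detected", and the representation of detected mouths as a list of person labels are all hypothetical and do not come from the embodiment.

```python
from typing import Optional, Sequence


def select_balloon_target(
    caption: Optional[str], mouths: Sequence[str]
) -> Optional[str]:
    """Return the person to attach a balloon to, or None if no balloon is shown.

    caption: caption detected at time T1, or None if none was detected.
    mouths:  hypothetical labels of the persons whose mouths were detected
             in the video frame for the same time T1.
    """
    if caption is None:   # step S2: no caption, so no balloon is needed
        return None
    if len(mouths) == 0:  # step S3: no mouth detected
        return None
    if len(mouths) != 1:  # step S4: two or more persons, target is ambiguous
        return None
    return mouths[0]      # step S5: display the balloon for this person
```

A `None` result corresponds to repeating the processing from step S2 after the predetermined time; any other result corresponds to displaying the balloon in step S5.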

After the balloon for person A has been displayed on the video display device 4, the balloon is controlled according to the procedure shown in FIG. 6. First, it is determined whether there is a caption to be displayed on the video display device 4 at a predetermined time T2 (step S6). The control unit 155 makes this determination by judging whether the caption data analysis unit 154 was able to detect a caption to be displayed on the video display device 4.

Next, if it is determined in step S6 that there is a caption to be displayed on the video display device 4, the control unit 155 determines whether the video data analysis unit 152 was able to detect a person's mouth (step S7). Here, the video data analysis unit 152 detects the person's mouth by analyzing the video data representing the video displayed on the video display device 4 at the same time as the predetermined time T2 at which the caption detected in step S6 is displayed.

If it is determined in step S6 or step S7 that there is no caption to be displayed on the video display device 4 or that no person's mouth could be detected, the balloon for person A that was displayed on the video display device 4 is erased so that no balloon is displayed (step S8). The processing from step S2 onward is then performed again after a predetermined time.

On the other hand, if, after step S6, it is determined in step S7 that a person's mouth was detected, it is determined whether the speaker has changed (step S9). That is, it is determined whether the caption detected in step S6 is what person A, the target of the balloon displayed in step S5, is saying. The control unit 155 makes this determination by using the caption color information acquired by the caption data analysis unit 154 to judge whether the colors of the captions detected in step S2 and step S6 differ. More specifically, if the color of the caption detected in step S2 (that is, the color of the caption displayed on the video display device 4 at the predetermined time T1) differs from the color of the caption detected in step S6 (that is, the color of the caption displayed on the video display device 4 at the predetermined time T2), it is determined that the speaker has changed. If the two colors are the same, it is determined that the speaker has not changed.
Determining in step S9 that the speaker has not changed amounts to determining that the caption detected by the caption data analysis unit in step S6 is a caption for the same person A who is displayed on the video display device 4 at the predetermined times T1 and T2.
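The color comparison used in step S9 reduces to an equality test on one caption attribute. The sketch below assumes a hypothetical `CaptionInfo` record with a `color` field held as a string; how the caption data analysis unit 154 actually represents color is not stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionInfo:
    """Hypothetical record of what the caption analysis yields for one caption."""
    text: str
    color: str  # assumed representation, e.g. a color name or palette key


def speaker_changed(at_t1: CaptionInfo, at_t2: CaptionInfo) -> bool:
    """Step S9: the speaker is taken to have changed when the caption colors differ."""
    return at_t1.color != at_t2.color
```

Only the color is compared, so two captions with different text but the same color are treated as coming from the same speaker.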

If it is determined in step S9 that the speaker has not changed, a graphic that extends from the mouth of person A detected in step S7 and covers the caption detected in step S6 is displayed, so that the balloon for person A continues to be displayed (step S10). The processing from step S6 onward is then performed again after a predetermined time.

On the other hand, if it is determined in step S9 that the speaker has changed, it is determined whether the number of persons detected in step S7 is two or fewer (step S11). In step S11, the control unit 155 judges whether the number of mouths detected by the video data analysis unit 152 in step S7 is two or fewer, and thereby determines whether the number of detected persons is two or fewer.

If it is determined in step S11 that the number of detected persons is not two or fewer (that is, three or more), the balloon for person A that was displayed on the video display device 4 is erased so that no balloon is displayed (step S12). That is, the balloon is erased because it cannot be determined which of the three or more persons detected in step S7 is speaking the caption detected in step S6. The processing from step S2 onward is then performed again after a predetermined time.

On the other hand, if it is determined in step S11 that the number of detected persons is two or fewer, the caption detected in step S6 can be judged to be what person B, a person detected in step S7 who is not person A, is saying. In this case, therefore, the balloon for person A that was displayed on the video display device 4 is erased (step S13), and another balloon, for person B, is displayed on the video display device 4 (step S14). The mouths of persons A and B displayed on the video display device 4 may be distinguished by the control unit 155 using the information on mouth features acquired by the video data analysis unit 152 and the information on the display positions of the mouths over a predetermined time.

Next, the control unit 155 determines whether the viewer has given an instruction to end the balloon display mode (step S15). If it is determined in step S15 that no such instruction has been given, the processing from step S6 onward is performed again after a predetermined time. If it is determined in step S15 that the instruction has been given, the balloon display mode is ended so that no balloon is displayed on the video display device 4.
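One pass through steps S6 to S14 can be summarized as a function that maps the detection results to an action. The sketch below is illustrative only. The `Action` enum, the `next_action` function, and the person labels are hypothetical names, and the branch for "speaker changed but no person other than A was detected" is an assumption added to keep the function total, since the embodiment does not address that case.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class Action(Enum):
    ERASE = "erase"        # steps S8 and S12: remove the balloon, restart from S2
    CONTINUE = "continue"  # step S10: keep the balloon on the current person
    SWITCH = "switch"      # steps S13 and S14: move the balloon to another person


def next_action(
    caption_found: bool,
    mouths: Sequence[str],
    speaker_changed: bool,
    current: str,
) -> Tuple[Action, Optional[str]]:
    """Decide what to do with the balloon currently attached to `current`."""
    if not caption_found or not mouths:  # steps S6 and S7 failed
        return Action.ERASE, None        # step S8
    if not speaker_changed:              # step S9: same speaker
        return Action.CONTINUE, current  # step S10
    if len(mouths) > 2:                  # step S11: three or more persons
        return Action.ERASE, None        # step S12
    others = [m for m in mouths if m != current]
    if not others:
        # Assumption: this case is not covered by the source; erase the balloon.
        return Action.ERASE, None
    return Action.SWITCH, others[0]      # steps S13 and S14
```

In a receiver, a loop would call such a function once per predetermined interval until the viewer ends the balloon display mode (step S15).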

The image data processing device 15 of the above embodiment provides the following effects.
(1) The control unit 155 determines whether the video data analysis unit 152 was able to detect a person's mouth and whether the caption data analysis unit 154 was able to detect a caption. Based on these determinations, the control unit 155 controls the generation of the video signal by the video signal generation unit 156 so that a graphic that extends from the person's mouth and covers the caption (that is, a balloon) is displayed on the video display device 4. A balloon can therefore be displayed on the video display device 4 based on the video data and the caption data. Consequently, even when, for example, no balloon data is transmitted from the sender side in a television broadcast, the receiver side can display a balloon on the video display device 4 based on the video data and the caption data.

(2) When the control unit 155 determines in steps S3 and S4 that the mouth of one person was detected and determines in step S2 that a caption was detected, it controls the video signal generation unit 156 to generate the video signal so that a graphic that extends from the mouth of person A and covers the caption (that is, a balloon) is displayed on the video display device 4. This makes it easy to display a balloon on the video display device 4.

(3) The control unit 155 further uses the caption color information acquired by the caption data analysis unit 154 to determine which person displayed on the video display device 4 the caption belongs to. Based on this determination, it controls the generation of the video signal by the video signal generation unit 156 so that a graphic that extends from the person's mouth and covers the caption (that is, a balloon) is displayed on the video display device 4. A balloon can therefore be displayed on the video display device 4 easily and effectively.

(4) In step S9, which follows the generation by the video signal generation unit 156 of a video signal for displaying a balloon on the video display device 4, the control unit 155 determines whether the caption detected again by the caption data analysis unit 154 in step S6 is a caption for the same person displayed on the video display device 4. If the control unit 155 determines that it is a caption for the same person, it controls the video signal generation unit 156 in step S10 to continue generating the video signal so that a graphic that extends from the mouth of person A and covers the caption (that is, a balloon) is displayed on the video display device 4. The balloon can therefore continue to be displayed effectively on the video display device 4.

The present invention is not limited to the above embodiment. Various design changes are possible based on the spirit of the present invention, and such changes are not excluded from its scope. For example, the above embodiment may be modified as follows.

・In the above embodiment, the caption data is in the independent PES format; however, the caption data may instead be data carried in a multiplexable area of the video data or audio data, or data in the section format.

・In the above embodiment, the caption color is specified by a control code included in the PES-format caption data; however, the caption data need not include a caption color. In that case, whereas step S9 above uses the caption color information to determine whether the speaker has changed, other information (for example, the caption size) may be used in step S9 to determine whether the speaker has changed, that is, which person displayed on the video display device 4 the caption belongs to. A balloon can still be displayed effectively on the video display device 4 in this way.

・In the above embodiment, a balloon is displayed on the video display device 4 when it is determined in step S4 that exactly one person was detected; however, another method may be used to determine that the caption detected in step S2 is what the person detected in step S3 is saying.

・In the above embodiment, the caption display position is specified by a control code included in the PES-format caption data; however, the caption data need not include a caption display position. That is, the caption decoder unit 153 of the image data processing device 15 may determine the position at which the caption is displayed on the video display device 4.

・The above embodiment has been described using terrestrial digital television broadcasting as an example; however, in BS digital broadcasting, CS digital broadcasting, or analog broadcasting as well, the image data processing device of the present invention can be used to display a balloon on the video display device 4 based on video data and caption data.

One application of the present invention is an image data processing device that is built into a broadcast receiving device to which a broadcast signal including video data and caption data is input, and that generates a video signal based on the video data and the caption data and outputs it to a video display device.

FIG. 1 is a block diagram showing a television broadcast system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an image data processing device according to an embodiment of the present invention.
FIG. 3 is a diagram showing the structure of caption data according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram showing the compositing of the moving-image plane, the graphic plane, and the caption plane.
FIG. 5 is a flowchart showing the procedure for displaying a balloon according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the procedure for displaying a balloon according to an embodiment of the present invention.

Explanation of symbols

1: broadcast receiving device, 4: video display device, 15: image data processing device, 152: video data analysis unit, 154: caption data analysis unit, 155: control unit, 156: video signal generation unit.

Claims

An image data processing device that generates a video signal based on video data and caption data and outputs the video signal to a video display device, comprising:
a video data analysis unit that detects a part of a person displayed on the video display device by analyzing the video data, and acquires information on the display position of the part of the person;
a caption data analysis unit that detects a caption displayed on the video display device by analyzing the caption data, and acquires information on the display position of the caption;
a video signal generation unit that generates the video signal for displaying on the video display device a graphic that covers the caption with the part of the person as a base point, using the information on the display position of the part of the person and the information on the display position of the caption; and
a control unit that determines whether a part of a person has been detected by the video data analysis unit and whether a caption has been detected by the caption data analysis unit, and controls the generation of the video signal by the video signal generation unit.

The image data processing device according to claim 1, wherein the control unit controls the video signal generation unit to generate the video signal when it determines that a part of one person has been detected by the video data analysis unit and that a caption has been detected by the caption data analysis unit.

The image data processing device according to claim 1, wherein the control unit further controls the generation of the video signal by the video signal generation unit by determining which person displayed on the video display device the caption belongs to.

The image data processing device according to claim 3, wherein the caption data analysis unit further acquires color information of the caption by analyzing the caption data, and
the control unit controls the generation of the video signal by the video signal generation unit by using the color information of the caption to determine which person displayed on the video display device the caption belongs to.

The image data processing device according to claim 3, wherein, after the video signal has been generated by the video signal generation unit, the control unit controls the video signal generation unit to continue generating the video signal when it determines that a caption detected again by the caption data analysis unit is a caption for the same person displayed on the video display device.