JP2005124169A

JP2005124169A - Video image contents forming apparatus with balloon title, transmitting apparatus, reproducing apparatus, provisioning system, and data structure and record medium used therein

Info

Publication number: JP2005124169A
Application number: JP2004268644A
Authority: JP
Inventors: Koji Kobayashi; 浩二小林
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-09-26
Filing date: 2004-09-15
Publication date: 2005-05-12

Abstract

PROBLEM TO BE SOLVED: To provide a video content creation apparatus with which the relation between a speaker and a subtitle is easy to understand and the whole screen is easy to see, together with a transmitting apparatus, a reproducing apparatus, a providing system, and a data structure and a recording medium used in them.

SOLUTION: The content creation apparatus 1 creates the balloon data required to offer video content with subtitles in speech balloons. The balloon data stores at least one of: information on the time during which a balloon should be displayed, information on the region in which the balloon is displayed, information on the shape of the balloon, and information on the subtitle characters to be inserted in the balloon. The content transmitting apparatus 2 multiplexes the balloon data with the content data and has a broadcast apparatus 3 broadcast them. The content reproducing apparatus 4 analyzes the balloon data, generates a balloon image signal and a subtitle character signal, combines these signals with the video picture, and has a content display apparatus 5 display the video with balloon subtitles.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a video content creation device, transmission device, playback device, and providing system, and to a data structure and recording medium used therein. More specifically, it relates to a creation device, transmission device, playback device, and providing system for video content with subtitles, and to a data structure and recording medium used therein.

Conventionally, to help viewers understand foreign-language films, the characters' dialogue has been displayed in the viewers' native language near the top, bottom, left, or right edge of the screen. This lets viewers follow the dialogue even when the characters speak a foreign language. More recently, some television broadcasts display what the characters say as text near the edges of the screen for production reasons, even when the characters are speaking the viewers' native language. Text that explains the scene, rather than reproducing dialogue, may also be displayed near the edges of the screen. Text displayed near the top, bottom, left, or right edge of the screen in this way is called a subtitle. Subtitles let viewers understand both the dialogue of the characters and the content of the video.

In recent years, various methods have been proposed to clarify the relationship between a speaker on screen and the subtitles. For example, subtitles may be shown in a warm color when the speaker is female and in a cool color when the speaker is male, or the speaker's name may be displayed.

A method of displaying subtitles at the speaker's mouth has also been proposed to make the relationship between the on-screen speaker and the subtitles visually clear (see Patent Document 1). The device of Patent Document 1 calculates, in three dimensions, the position of the speaker on the screen, the position of the speaker's mouth, and the orientation of the speaker's body. It further calculates, in three dimensions, the direction in which the speaker is speaking. The device then projects this utterance direction onto the two-dimensional display screen as a reference line and displays the spoken characters along that line.

Patent Document 1: JP-T 9-505671

In general, even when subtitles are present, viewers keep the audio on and identify the speaker from vocal characteristics such as pitch, together with the subtitles. With conventional subtitles, therefore, a viewer who mutes the audio completely cannot tell who on screen is speaking. This is a particular problem when several speakers appear on screen.

Changing the color of the text, as in the conventional approach, can suggest who the speaker is, but it gives the viewer no more than a hint. With the audio muted, the viewer may be unable to identify the speaker with certainty.

Displaying the speaker's name can also indicate who is speaking, but it has significant drawbacks, such as increasing the number of characters on screen.

Furthermore, the method of Patent Document 1, which displays subtitles along a reference line starting at the speaker's mouth, has problems of its own: the text may hide the faces of characters other than the speaker or cover important parts of the picture.

Thus, conventional methods of displaying video with subtitles make the relationship between speaker and subtitle hard to understand, and even where that relationship is made clear, the screen as a whole becomes hard to view.

Therefore, an object of the present invention is to provide a video content creation device, transmission device, playback device, and providing system, and a data structure and recording medium used therein, with which the relationship between speaker and subtitle is easy to understand and the whole screen is easy to view.

A further object of the present invention is to provide a video content creation device, transmission device, playback device, and providing system, and a data structure and recording medium used therein, with which the relationship between speaker and subtitle is easy to understand and the whole screen is easy to view even when the audio is muted.

To solve the above problems, the present invention has the following features. The present invention is a content creation device for creating the data needed to provide video content with subtitles in speech balloons. The content creation device comprises balloon time extracting means, balloon area determining means, balloon image determining means, subtitle character determining means, and balloon data creating means. The balloon time extracting means extracts the time at which a balloon should be displayed in the video based on the video content data serving as the original data. The balloon area determining means determines a balloon area suitable for displaying a balloon in the video at the time extracted by the balloon time extracting means. The balloon image determining means determines the balloon image to be combined with the balloon area determined by the balloon area determining means. The subtitle character determining means determines the subtitle characters to be combined with the balloon image determined by the balloon image determining means. The balloon data creating means creates balloon data by encoding at least one of: information on the time at which the balloon should be displayed, information on the balloon area, information on the balloon image, and information on the subtitle characters. When reproduced together with the video content data, the balloon data created by the balloon data creating means provides video content with subtitles in speech balloons.

Preferably, the balloon area determining means detects changes in color tone within the image based on the video content data, extracts a portion where the color tone is flat, and sets a frame contained in that flat portion as the balloon area. The balloon image determining means preferably uses, as the balloon image, an image sized so that the subtitle characters are displayed within the frame.
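The flat-tone search described above can be illustrated with a minimal sketch. It assumes a grayscale image given as a list of rows and treats a frame as "flat" when the spread between its brightest and darkest pixel stays within a tolerance; the function name `find_flat_frame`, the fixed frame size, and the tolerance test are illustrative assumptions, not the method the patent specifies.

```python
def find_flat_frame(image, frame_h, frame_w, tolerance):
    """Return (top, left) of the first frame whose tonal spread is within
    `tolerance`, or None if no such frame exists.

    `image` is a list of rows of grayscale values (an assumption made for
    this sketch; the source only speaks of "color tone").
    """
    rows, cols = len(image), len(image[0])
    for top in range(rows - frame_h + 1):
        for left in range(cols - frame_w + 1):
            window = [
                image[r][c]
                for r in range(top, top + frame_h)
                for c in range(left, left + frame_w)
            ]
            # A small max-min spread stands in for "flat color tone".
            if max(window) - min(window) <= tolerance:
                return top, left
    return None


image = [
    [10, 200, 50, 50, 50, 90],
    [30, 180, 50, 51, 50, 20],
    [70, 10, 49, 50, 50, 60],
    [5, 120, 200, 10, 90, 30],
]
frame = find_flat_frame(image, frame_h=3, frame_w=3, tolerance=2)
```

A brute-force scan like this is quadratic in the image size per frame; it is meant only to make the idea concrete, and a practical tool would likely use a faster measure of local variation.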

More preferably, the balloon area determining means determines the balloon area by modifying the extracted frame in accordance with an instruction from the user. The balloon image determining means preferably changes the shape of the balloon image in accordance with an instruction from the user. The subtitle character determining means preferably changes the subtitle characters in accordance with an instruction from the user.

The subtitle character determining means preferably also judges whether the number of subtitle characters per unit time, over the period in which the balloon is to be displayed, is equal to or greater than a predetermined number and, if so, notifies the user to change the subtitle characters.
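The characters-per-unit-time check can be sketched as follows. The function name `needs_shortening`, the choice of seconds as the unit of time, and the example limit are assumptions made for illustration; the source only states that the user is notified when the rate is at or above a predetermined number.

```python
def needs_shortening(subtitle, duration_seconds, max_chars_per_second):
    """Return True when the subtitle's character rate is at or above the limit.

    The comparison is inclusive because the source says "equal to or greater
    than a predetermined number".
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(subtitle) / duration_seconds >= max_chars_per_second


# A 20-character subtitle shown for 2 seconds runs at 10 characters per second.
too_dense = needs_shortening("a" * 20, 2.0, 10)
comfortable = needs_shortening("hello", 2.0, 10)
```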

Preferably, the subtitle character determining means changes the attributes of the subtitle characters in accordance with an instruction from the user.

The content creation device preferably further comprises multiplexing means for multiplexing the video content data with the balloon data created by the balloon data creating means. The content creation device preferably further comprises multiplexed data transmitting means for transmitting the data multiplexed by the multiplexing means over a network. The content creation device may also comprise package media storing means for storing the data multiplexed by the multiplexing means on package media.

The content creation device preferably further comprises volume judging means for judging the volume of the audio during playback of the video content data. In that case, the subtitle character determining means preferably changes the attributes of the subtitle characters according to the volume judged by the volume judging means.

The content creation device preferably further comprises face size extracting means for extracting the size of a person's face in the video based on the video content data. In that case, the balloon image determining means preferably determines the start point of the balloon image according to the face size extracted by the face size extracting means.

Preferably, the video content data is encoded by the MPEG (Moving Picture Experts Group) method, and the balloon data is described in XML (eXtensible Markup Language).
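To make the XML description concrete, the following sketch serializes one balloon using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`balloon`, `time`, `area`, `shape`, `text`, `start`, `duration`, and so on) are hypothetical; this section of the source does not define a schema, so the layout below is only one plausible arrangement of the four kinds of information.

```python
import xml.etree.ElementTree as ET


def build_balloon_xml(start, duration, x, y, width, height, shape, text):
    """Serialize one balloon to an XML string (hypothetical element names)."""
    balloon = ET.Element("balloon")
    # Time information: when the balloon appears and for how long.
    ET.SubElement(balloon, "time", start=str(start), duration=str(duration))
    # Area information: where in the picture the balloon is drawn.
    ET.SubElement(
        balloon, "area", x=str(x), y=str(y), width=str(width), height=str(height)
    )
    # Shape information and the subtitle characters themselves.
    ET.SubElement(balloon, "shape").text = shape
    ET.SubElement(balloon, "text").text = text
    return ET.tostring(balloon, encoding="unicode")


xml_text = build_balloon_xml(12.5, 3.0, 40, 20, 200, 80, "rounded", "Hello")
parsed = ET.fromstring(xml_text)
```

Parsing the string back with `ET.fromstring` recovers the same fields, which is the property a playback device would rely on when analyzing the balloon data.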

The present invention is also a content transmission device for transmitting the data needed to provide video content with subtitles in speech balloons. The content transmission device comprises balloon data acquiring means, video content data acquiring means, multiplexing means, and transmitting means. The balloon data acquiring means acquires balloon data in which at least one of the following is encoded: information on the time at which a balloon should be displayed in the video based on the video content data serving as the original data, information on the area of the video in which the balloon is displayed, information on the shape of the balloon within that area, and information on the subtitle characters to be inserted in the balloon. The video content data acquiring means acquires the video content data. The multiplexing means multiplexes the balloon data acquired by the balloon data acquiring means with the video content data acquired by the video content data acquiring means. The transmitting means transmits the data multiplexed by the multiplexing means.
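As a rough illustration of multiplexing, the sketch below interleaves two timestamp-sorted packet lists into a single stream ordered by timestamp. This is a deliberate simplification: real MPEG multiplexing operates on transport or program stream packets, and the packet representation and the function name `multiplex` here are assumptions made only for this example.

```python
import heapq


def multiplex(video_packets, balloon_packets):
    """Merge two timestamp-sorted lists of (timestamp, payload) pairs.

    Each output item is (timestamp, stream_name, payload), so a receiver can
    separate the streams again by looking at stream_name.
    """
    tagged_video = [(ts, "video", payload) for ts, payload in video_packets]
    tagged_balloon = [(ts, "balloon", payload) for ts, payload in balloon_packets]
    # heapq.merge assumes each input is already sorted by the key.
    return list(
        heapq.merge(tagged_video, tagged_balloon, key=lambda packet: packet[0])
    )


video = [(0.0, "frame0"), (1.0, "frame1"), (2.0, "frame2")]
balloons = [(0.5, "balloon0")]
stream = multiplex(video, balloons)
```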

For example, the transmitting means may transmit the multiplexed data to a broadcast device for wireless broadcasting, or may transmit it over a network to a content playback device that plays back the video content data and the balloon data.

The present invention is also a content packaging device for putting the data needed to provide video content with subtitles in speech balloons onto package media. The content packaging device comprises balloon data acquiring means, video content data acquiring means, multiplexing means, and storing means. The balloon data acquiring means acquires balloon data in which at least one of the following is encoded: information on the time at which a balloon should be displayed in the video based on the video content data serving as the original data, information on the area of the video in which the balloon is displayed, information on the shape of the balloon within that area, and information on the subtitle characters to be inserted in the balloon. The video content data acquiring means acquires the video content data. The multiplexing means multiplexes the balloon data acquired by the balloon data acquiring means with the video content data acquired by the video content data acquiring means. The storing means stores the data multiplexed by the multiplexing means on the package media.

The present invention is also a content playback device for playing back video content with subtitles in speech balloons. The content playback device comprises balloon data acquiring means, video content data acquiring means, balloon signal generating means, subtitle character signal generating means, video signal generating means, and composition transfer means. The balloon data acquiring means acquires balloon data in which at least one of the following is encoded: information on the time at which a balloon should be displayed in the video based on the video content data serving as the original data, information on the area of the video in which the balloon is displayed, information on the shape of the balloon within that area, and information on the subtitle characters to be inserted in the balloon. The video content data acquiring means acquires the video content data. The balloon signal generating means generates a signal for the balloon image based on the balloon data. The subtitle character signal generating means generates a signal for the subtitle characters based on the balloon data. The video signal generating means generates a signal for the video based on the video content data. The composition transfer means combines the balloon signal generated by the balloon signal generating means, the subtitle character signal generated by the subtitle character signal generating means, and the video signal generated by the video signal generating means, and transfers the composite signal to a display device.
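The composition step can be pictured as layering three images of equal size. The sketch below assumes each signal is a grid of pixels in which `None` marks a transparent pixel in the balloon and subtitle layers; this representation and the function name `composite` are illustrative assumptions, since the source does not say how the signals are combined internally.

```python
def composite(video, balloon, text):
    """Overlay the balloon layer, then the text layer, on the video layer.

    All three arguments are equally sized grids (lists of rows). A value of
    None in the balloon or text layer means "transparent at this pixel".
    """
    result = []
    for video_row, balloon_row, text_row in zip(video, balloon, text):
        row = []
        for v, b, t in zip(video_row, balloon_row, text_row):
            if t is not None:
                row.append(t)  # subtitle characters are drawn on top
            elif b is not None:
                row.append(b)  # balloon image covers the video
            else:
                row.append(v)  # untouched video pixel
        result.append(row)
    return result


video_layer = [["v", "v"], ["v", "v"]]
balloon_layer = [[None, "b"], ["b", "b"]]
text_layer = [[None, None], [None, "t"]]
frame = composite(video_layer, balloon_layer, text_layer)
```

Passing fully transparent balloon and text layers returns the video unchanged, which corresponds to transferring only the video signal.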

The content playback device preferably further comprises composition command means for commanding the composition transfer means whether or not to combine the balloon signal and the subtitle character signal with the video signal. In that case, the composition transfer means transfers the composite signal to the display device when it receives a command from the composition command means to combine the balloon signal and the subtitle character signal with the video signal, and transfers only the video signal to the display device when it receives a command not to combine them.

The content playback device preferably further comprises volume measuring means for measuring the loudness of the ambient sound, and volume threshold judging means for judging whether the loudness measured by the volume measuring means exceeds a threshold. In that case, the composition command means commands the composition transfer means whether or not to combine the signals according to the judgment of the volume threshold judging means.

Preferably, when the volume threshold judging means judges that the loudness of the ambient sound does not exceed the threshold, the composition command means gives the composition transfer means a command to combine the signals and, in addition, prevents the audio output device from outputting audio.

The composition command means may also give the composition transfer means a command to combine the signals when the volume threshold judging means judges that the loudness of the ambient sound exceeds the threshold.
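One reading of the ambient-sound preferences above is sketched below: in a quiet place the balloons are shown and the audio is suppressed, while in a noisy place the audio stays on and the balloons may also be shown. The names `PlaybackDecision` and `decide_playback`, and the `compose_when_noisy` switch that models the optional ("may") behavior, are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackDecision:
    compose_balloons: bool  # combine balloon and subtitle signals with video
    output_audio: bool      # let the audio output device play sound


def decide_playback(ambient_level, threshold, compose_when_noisy=True):
    """Choose balloon composition and audio output from the ambient loudness."""
    if ambient_level > threshold:
        # Noisy surroundings: audio stays on; balloons are optional.
        return PlaybackDecision(
            compose_balloons=compose_when_noisy, output_audio=True
        )
    # Quiet surroundings: show balloons and suppress the audio output.
    return PlaybackDecision(compose_balloons=True, output_audio=False)


quiet = decide_playback(ambient_level=30, threshold=50)
noisy = decide_playback(ambient_level=80, threshold=50)
```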

The content playback device preferably further comprises moving speed measuring means for measuring the moving speed of the device itself. In that case, the composition command means judges whether the moving speed measured by the moving speed measuring means exceeds a predetermined threshold and, if it does, gives the composition transfer means a command to combine the signals.

The composition command means may also command the composition transfer means whether or not to combine the balloon signal and the subtitle character signal with the video signal in accordance with an instruction from the user.

The subtitle character signal generating means may also, in accordance with an instruction from the user, generate from the balloon data a normal subtitle character signal for displaying the subtitle characters at the top, bottom, left, or right of the screen. In that case, when the subtitle character signal generating means has generated the normal subtitle character signal, the composition transfer means combines only the normal subtitle character signal and the video signal and transfers the composite signal to the display device.

Preferably, the composition transfer means combines the balloon signal, the subtitle character signal, and the video signal frame by frame.

More preferably, the content playback device comprises display means for displaying the combined video based on the composite signal transferred from the composition transfer means.

The present invention is also a computer-readable recording medium on which data is recorded that has a structure for causing a computer device to display video content with subtitles in speech balloons. The recording medium preferably stores data having: a structure for storing information on the time at which a balloon should be displayed in the video based on the video content data serving as the original data; a structure for storing, in association with the time information, information on the area of the video in which the balloon is displayed; a structure for storing, in association with the time information, information on the shape of the balloon within that area; and a structure for storing, in association with the time information, information on the subtitle characters to be inserted in the balloon.

Preferably, the structure for storing the time information consists of a structure for storing information indicating the start time of a subtitle and a structure for storing information indicating how long the subtitle continues.
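The data structure described above, with time information split into a start time and a duration and with area, shape, and subtitle text tied to that time, might be modeled as in the sketch below. The class name `BalloonEntry`, the field names, and the half-open visibility interval are assumptions for illustration rather than the layout defined by the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BalloonEntry:
    start: float                      # subtitle start time, in seconds
    duration: float                   # how long the subtitle continues
    area: Tuple[int, int, int, int]   # (x, y, width, height) in the picture
    shape: str                        # balloon shape identifier
    text: str                         # subtitle characters inside the balloon

    def is_visible(self, t: float) -> bool:
        """True while start <= t < start + duration (half-open, by assumption)."""
        return self.start <= t < self.start + self.duration


def visible_entries(entries: List[BalloonEntry], t: float) -> List[BalloonEntry]:
    """Return the balloons that should be on screen at playback time t."""
    return [entry for entry in entries if entry.is_visible(t)]


entries = [
    BalloonEntry(1.0, 2.0, (40, 20, 200, 80), "rounded", "Hello"),
    BalloonEntry(2.5, 1.5, (300, 40, 180, 60), "cloud", "I wonder..."),
]
on_screen = visible_entries(entries, 2.75)
```

Storing a start time and a duration, rather than a start and an end, keeps each entry valid if its start is shifted during editing.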

The present invention is also a data structure as described above for causing a computer device to display video content with subtitles in speech balloons.

The present invention is also a content providing system for providing video content with subtitles in speech balloons. The system comprises: a content creation device that creates balloon data in which at least one of the following is encoded: information on the time at which a balloon should be displayed in the video based on the video content data serving as the original data, information on the area of the video in which the balloon is displayed, information on the shape of the balloon within that area, and information on the subtitle characters to be inserted in the balloon; content providing means that multiplexes the balloon data created by the content creation device with the video content data and provides the multiplexed data as video content; and a content playback device that plays back video content with subtitles in speech balloons based on the multiplexed data provided by the content providing means.

The content providing means may transmit the multiplexed data to the content playback device by wireless broadcasting or by network distribution, or may provide the multiplexed data to the content playback device on package media.

According to the present invention, subtitle characters in video content can be inserted into and displayed in a balloon, which makes the relationship between speaker and subtitle easy to understand. Because the subtitle characters are displayed inside the balloon, the screen is also easier to view. A balloon has a start point, and that start point indicates who the speaker is, so the user can associate the speaker with the subtitle characters and follow the content of the video even when no audio is output. This is especially useful in quiet places where audio cannot be played and, conversely, in places where the ambient sound is so loud that the speaker output is hard to hear. When the invention is built into a mobile communication terminal, the content of the video can be understood without listening to the audio through headphones or the like.

Because the balloon is placed in a portion where the color tone is flat, important parts of the picture are kept from being hidden. Because the area in which the balloon image is displayed can be changed on the user's instruction, the user can deliberately prevent important parts from being hidden. Because the shape of the balloon image can be changed, a balloon suited to what the speaker says can be chosen. For example, a cloud-shaped balloon can be used for something the character is only thinking. Because the subtitle characters can be changed, they can be edited where emphasis is wanted.

When there are too many subtitle characters, the user is notified automatically, so appropriate subtitle characters can be created.

Using MPEG data as the video content data and XML-compliant data as the balloon data improves compatibility between the two and can contribute to standardization.

Because the content playback device can control audio output and subtitle display according to the loudness of the ambient sound, output suited to the location is provided automatically.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of a broadcast system for broadcasting video content with balloon subtitles according to an embodiment of the present invention. In FIG. 1, the broadcast system comprises a content creation device 1, a content transmission device 2, a broadcast device 3, a content playback device 4, and a content display device 5. For simplicity, FIG. 1 shows only one each of the content creation device 1, content transmission device 2, broadcast device 3, content playback device 4, and content display device 5, but there may be two or more of each.

The content creation device 1 creates data representing a list of subtitle characters that match the video of pre-stored content data (hereinafter, subtitle list data), and balloon data used to combine balloon subtitles with the video of the content data.

The content transmission device 2 acquires the content data and the balloon data from the content creation device 1, multiplexes them, and transmits the resulting multiplexed data to the broadcast device 3 over a network such as a private line, the public network, the Internet, or a radio network. The content creation device 1 and the content transmission device 2 are located on the content creator's side, for example at a production company. Although the multiplexed data is transmitted to the broadcast device 3 over a network here, it may instead be stored on a recording medium such as a DVD and read by the broadcast device 3.

放送装置３は、コンテンツ送信装置２から送られてくる多重化データを受信し、アンテナを介して放送する。放送装置３は、たとえば、テレビ局等の放送事業者側に存在する。 The broadcast device 3 receives the multiplexed data sent from the content transmission device 2 and broadcasts it via the antenna. The broadcast device 3 is present on the broadcast provider side such as a television station, for example.

コンテンツ再生装置４は、放送装置３から送信されてくる多重化データを受信して解析し、吹き出しを用いた字幕付きの映像をコンテンツ表示装置５に表示させる。コンテンツ表示装置５は、コンテンツ再生装置４から送られてくる信号に応じて、吹き出しを用いた字幕付きの映像を表示する。コンテンツ再生装置４およびコンテンツ表示装置５は、たとえば、視聴者等の自宅内に存在する。 The content reproduction device 4 receives and analyzes the multiplexed data transmitted from the broadcasting device 3 and causes the content display device 5 to display a video with captions using a balloon. The content display device 5 displays a video with captions using a balloon in response to a signal sent from the content playback device 4. The content reproduction device 4 and the content display device 5 exist in the home of a viewer or the like, for example.

図２は、コンテンツ作成装置１の機能的構成を示すブロック図である。図２において、コンテンツ作成装置１は、データ作成制御部１１と、入力部１２と、表示出力部１３と、タイムカウント部１４と、記憶部１５とを含む。 FIG. 2 is a block diagram illustrating a functional configuration of the content creation device 1. In FIG. 2, the content creation device 1 includes a data creation control unit 11, an input unit 12, a display output unit 13, a time count unit 14, and a storage unit 15.

入力部１２は、マウスやキーボード、タッチパネル、ジョイスティック等の入力デバイスであって、これらを操作することによってユーザが入力した操作情報をデータ作成制御部１１に入力する。 The input unit 12 is an input device such as a mouse, a keyboard, a touch panel, and a joystick, and inputs operation information input by the user to the data creation control unit 11 by operating these devices.

記憶部１５は、ハードディスク等の記録装置である。記憶部１５には、コンテンツデータと、字幕リストデータと、吹き出し形状データと、吹き出しデータとが格納されている。 The storage unit 15 is a recording device such as a hard disk. The storage unit 15 stores content data, subtitle list data, balloon shape data, and balloon data.

コンテンツデータは、映像および音声がＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）等の符号化方式によって符号化された結果得られた符号化後のストリームデータである。 Content data is encoded stream data obtained as a result of encoding video and audio by an encoding method such as MPEG (Moving Picture Experts Group).

字幕リストデータには、字幕文字とそれを表示する時間に関する情報とが格納されている。図３は、字幕リストデータのデータ構成の例を示す図である。図３に示すように、字幕リストデータには、たとえば、字幕開始時間と、字幕継続時間と、字幕文字とが登録されている。ここで、字幕開始時間は、コンテンツの先頭からどれだけ経過した時点で、対応する字幕文字を表示するかを示す情報である。字幕継続時間は、対応する字幕文字を表示し続ける時間を示す情報である。図３では、「君のアイデアに賛成する」という字幕文字を、コンテンツの開始から０時間２４分３０秒１５フレーム目を経過した時点から、０分２秒間表示するための字幕リストデータの例を示している。なお、図３に示した字幕リストデータは一例であって、これに限定されるものではない。また、１秒間当たりフレーム数も、特に限定されるものではない。 The subtitle list data stores subtitle characters and information on the time at which they are displayed. FIG. 3 is a diagram illustrating an example of the data structure of the subtitle list data. As illustrated in FIG. 3, the subtitle list data registers, for example, a subtitle start time, a subtitle duration, and subtitle characters. Here, the subtitle start time is information indicating how long after the beginning of the content the corresponding subtitle characters are to be displayed. The subtitle duration is information indicating how long the corresponding subtitle characters remain displayed. FIG. 3 shows an example of subtitle list data for displaying the subtitle characters "I agree with your idea" for 0 minutes 2 seconds, starting 0 hours 24 minutes 30 seconds and 15 frames after the start of the content. The subtitle list data shown in FIG. 3 is only an example, and the present invention is not limited to it. The number of frames per second is also not particularly limited.
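
As an illustration of the subtitle list entry described above, the following is a minimal sketch rather than the format actually used in FIG. 3. The `CaptionEntry` class, the `HH:MM:SS:FF` notation, the `to_frames` helper, and the rate of 30 frames per second are all assumptions introduced here; the description leaves the frame rate open.

```python
from dataclasses import dataclass

FPS = 30  # assumed frame rate; the description does not fix one


@dataclass
class CaptionEntry:
    start: str     # subtitle start time from the head of the content, "HH:MM:SS:FF" (assumed notation)
    duration: str  # subtitle duration, same notation
    text: str      # subtitle characters


def to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert an "HH:MM:SS:FF" timecode into a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# The example of FIG. 3: shown for 2 seconds, starting at 0 h 24 min 30 s 15 frames.
entry = CaptionEntry(start="00:24:30:15", duration="00:00:02:00", text="I agree with your idea")
first_frame = to_frames(entry.start)
last_frame = first_frame + to_frames(entry.duration)
```

Working in frame counts makes it straightforward to compare a subtitle interval with the frame currently being reproduced.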

吹き出し形状データは、吹き出しの形状を定義したデータである。たとえば、吹き出し形状データには、吹き出し形状の名称と、吹き出し形状に関する情報とが対応付けられている。 The balloon shape data is data defining the shape of the balloon. For example, the balloon shape data is associated with the name of the balloon shape and information regarding the balloon shape.

図４は、吹き出しデータのデータ構成の例を示す図である。図４に示すように、吹き出しデータには、たとえば、コンテンツデータ名に対応して、字幕開始時間毎に、字幕継続時間、字幕文字展開スピード、字幕文字属性、吹き出し範囲、吹き出し開始点、吹き出し形状、および字幕文字が記述されている。字幕開始時間、および字幕継続時間は、吹き出しを表示すべき時間に関する情報である。字幕文字展開スピード、字幕文字属性、および字幕文字は、字幕文字に関する情報である。吹き出し範囲、および吹き出し開始点は、吹き出しの表示に適した映像内における吹き出し領域に関する情報である。吹き出し形状は、吹き出し領域に合成する吹き出し画像に関する情報である。吹き出しデータは、吹き出しを表示すべき時間に関する情報、吹き出し領域に関する情報、吹き出し画像に関する情報、および字幕文字に関する情報の内、少なくとも一つの情報をデータ化されたものであればよい。たとえば、吹き出しデータは、メタ言語で記述されている。ここで、字幕開始時間、字幕継続時間、および字幕文字は、字幕リストデータのものと同様である。字幕文字展開スピードは、字幕継続時間内に字幕文字を文章の先頭から順次表示する速度を示す情報である。字幕文字属性は、字幕文字のフォント種類や、字幕文字の色、字幕文字の背景と透過率、字幕文字の枠の種類等を示す情報である。吹き出し範囲は、吹き出しを合成する画面上の位置を示す情報である。吹き出し開始点は、吹き出しを開始する画面上の位置を示す情報である。吹き出し形状は、吹き出しデータに登録されている吹き出しの名称を示す情報である。 FIG. 4 is a diagram illustrating an example of the data structure of the balloon data. As shown in FIG. 4, the balloon data describes, for example, in association with the content data name and for each subtitle start time, the subtitle duration, subtitle character expansion speed, subtitle character attributes, balloon range, balloon start point, balloon shape, and subtitle characters. The subtitle start time and the subtitle duration are information on the time during which a balloon should be displayed. The subtitle character expansion speed, the subtitle character attributes, and the subtitle characters are information on the subtitle characters. The balloon range and the balloon start point are information on the balloon area in the video that is suitable for displaying the balloon. The balloon shape is information on the balloon image to be combined with the balloon area. The balloon data only needs to contain, as data, at least one of the information on the time during which the balloon should be displayed, the information on the balloon area, the information on the balloon image, and the information on the subtitle characters. For example, the balloon data is described in a meta language. Here, the subtitle start time, subtitle duration, and subtitle characters are the same as those of the subtitle list data. The subtitle character expansion speed is information indicating the speed at which the subtitle characters are displayed in order from the beginning of the sentence within the subtitle duration. The subtitle character attributes are information indicating the font type of the subtitle characters, their color, their background and transparency, the type of their frame, and the like. The balloon range is information indicating the position on the screen where the balloon is composited. The balloon start point is information indicating the position on the screen from which the balloon starts. The balloon shape is information indicating the name of the balloon registered in the balloon data.

このように、吹き出しデータは、コンピュータ装置に吹き出しによる字幕付きの映像コンテンツを表示させるための構造を有している。このような構造は、元データとなる映像コンテンツデータに基づく映像内において吹き出しを表示すべき時間に関する情報（たとえば、上述の字幕開始時間や字幕継続時間）を格納するための構造と、時間に関する情報と対応して、映像内において吹き出しを表示する領域に関する情報（たとえば、上述の吹き出し範囲や吹き出し開始点）を格納するための構造と、時間に関する情報と対応して、領域内における吹き出しの形状に関する情報（たとえば、上述の吹き出し形状）を格納するための構造と、時間に関する情報と対応して、吹き出しに挿入する字幕文字に関する情報（たとえば、上述の字幕文字展開スピードや字幕文字属性、字幕文字）を格納するための構造とを有している。本実施形態では、時間に関する情報を格納するための構造は、字幕の開始時間を示す情報を格納するための構造と、字幕を継続する時間を示す情報を格納するための構造とからなるとしている。このような構造を有するデータは、コンピュータ読み取り可能な記録媒体に格納可能である。 As described above, the balloon data has a structure for causing a computer device to display video content with subtitles in balloons. This structure includes: a structure for storing information on the time during which a balloon should be displayed in the video based on the original video content data (for example, the subtitle start time and subtitle duration described above); a structure for storing, in association with the time information, information on the area in which the balloon is displayed in the video (for example, the balloon range and balloon start point described above); a structure for storing, in association with the time information, information on the shape of the balloon within that area (for example, the balloon shape described above); and a structure for storing, in association with the time information, information on the subtitle characters to be inserted into the balloon (for example, the subtitle character expansion speed, subtitle character attributes, and subtitle characters described above). In the present embodiment, the structure for storing the time information consists of a structure for storing information indicating the subtitle start time and a structure for storing information indicating how long the subtitle continues. Data having such a structure can be stored in a computer-readable recording medium.
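
The four groups of information in the balloon data (time, area, shape, subtitle characters) could be held in a record such as the one below. This is only a sketch under stated assumptions: the `BalloonEntry` class, its field names, the use of frame counts for time, and the `active_entry` lookup are hypothetical and are not taken from FIG. 4.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) screen coordinates, assumed to be in pixels


@dataclass
class BalloonEntry:
    # Time information
    start_frame: int       # subtitle start time, expressed in frames (assumption)
    duration_frames: int   # subtitle duration, expressed in frames (assumption)
    # Area information
    area: Tuple[Point, Point, Point, Point]  # balloon range: four corners of the square frame
    start_point: Point                       # balloon start point
    # Shape information
    shape: str             # name of a balloon shape registered in the balloon shape data
    # Subtitle character information
    expansion_speed: int   # characters newly shown per frame
    text: str              # subtitle characters


def active_entry(entries: List[BalloonEntry], frame: int) -> Optional[BalloonEntry]:
    """Return the entry whose display interval contains the given frame, if any."""
    for entry in entries:
        if entry.start_frame <= frame < entry.start_frame + entry.duration_frames:
            return entry
    return None
```

Keying every other field to the time information mirrors the description, in which the area, shape, and subtitle character information are each stored in association with the time information.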

タイムカウント部１４は、時間を計測する。表示出力部１３は、データ作成制御部１１からの信号に応じて、映像および吹き出し作成のための画像を表示し、音声を出力する。 The time count unit 14 measures time. The display output unit 13 displays a video and an image for creating a speech balloon according to a signal from the data creation control unit 11, and outputs a sound.

データ作成制御部１１は、コンテンツデータを再生して、音声の立ち上がり時間および立ち下がり時間を検出して字幕開始時間および字幕継続時間を認識する。データ作成制御部１１は、認識した字幕開始時間および字幕継続時間と入力部１２を介してユーザが入力した字幕文字と対応させて、字幕リストデータを作成し、記憶部１５に格納する。データ作成制御部１１は、字幕リストデータを参照して、音声の立ち上がり時間を検出し、当該表示時間における映像および音声を表示出力部１３に表示・出力させる。データ作成制御部１１は、表示している映像に吹き出し形状を合成させ、当該吹き出し形状の中に字幕文字を合成する。データ作成制御部１１は、上記合成の結果、ユーザによる最終決定がなされれば、当該字幕開始時間における吹き出しデータを作成する。データ作成制御部１１は、字幕開始時間毎に作成された吹き出しデータを総合して、最終的な吹き出しデータを生成する。データ作成制御部１１は、生成した最終的な吹き出しデータを記憶部１５に格納する。 The data creation control unit 11 reproduces the content data, detects the rise time and fall time of the audio, and recognizes the caption start time and the caption duration time. The data creation control unit 11 creates subtitle list data in association with the recognized subtitle start time and subtitle duration and the subtitle characters input by the user via the input unit 12 and stores the subtitle list data in the storage unit 15. The data creation control unit 11 refers to the caption list data, detects the sound rise time, and causes the display output unit 13 to display and output the video and audio at the display time. The data creation control unit 11 synthesizes a balloon shape with the displayed video and synthesizes a subtitle character in the balloon shape. If the final decision is made by the user as a result of the synthesis, the data creation control unit 11 creates balloon data at the caption start time. The data creation control unit 11 generates final speech balloon data by combining speech balloon data created at each caption start time. The data creation control unit 11 stores the generated final balloon data in the storage unit 15.

図５は、コンテンツ送信装置２の機能的構成を示すブロック図である。図５において、コンテンツ送信装置２は、多重化制御部２１と、操作部２２と、誤り訂正符号付加部２３と、デジタル変調部２４と、送信部２５とを含む。 FIG. 5 is a block diagram illustrating a functional configuration of the content transmission device 2. In FIG. 5, the content transmission apparatus 2 includes a multiplexing control unit 21, an operation unit 22, an error correction code addition unit 23, a digital modulation unit 24, and a transmission unit 25.

操作部２２は、マウスやキーボード等の入力デバイスであって、ユーザの指示に応じて、放送したいコンテンツデータに関する情報を多重化制御部２１に入力する。 The operation unit 22 is an input device such as a mouse or a keyboard, and inputs information related to content data to be broadcast to the multiplexing control unit 21 in accordance with a user instruction.

多重化制御部２１は、操作部２２からの情報に基づいて、コンテンツ作成装置１の記憶部１５からユーザが所望するコンテンツデータとそれに対応する吹き出しデータとを読み出し、二つのデータを多重化する。以下、この多重化されたデータを多重化データという。 Based on the information from the operation unit 22, the multiplexing control unit 21 reads out the content data desired by the user from the storage unit 15 of the content creation device 1 and the corresponding balloon data and multiplexes the two data. Hereinafter, the multiplexed data is referred to as multiplexed data.

誤り訂正符号付加部２３は、多重化制御部２１によって多重化された多重化データに誤り訂正用の符号を付加する。デジタル変調部２４は、誤り訂正用の符号が付加された多重化データをデジタル変調する。送信部２５は、デジタル変調された多重化データを放送装置３へ送信する。なお、コンテンツデータと吹き出しデータとは、予めコンテンツ作成装置１が多重化してもよい。また、多重化データの送信機能は、コンテンツ作成装置に含まれていてもよい。 The error correction code adding unit 23 adds an error correction code to the multiplexed data multiplexed by the multiplexing control unit 21. The digital modulation unit 24 digitally modulates the multiplexed data to which the error correction code is added. The transmission unit 25 transmits the digitally modulated multiplexed data to the broadcasting device 3. The content data and the balloon data may be multiplexed in advance by the content creation device 1. Moreover, the transmission function of multiplexed data may be included in the content creation device.

放送装置３は、コンテンツ送信装置２から送られてくる多重化データを電波に変換して放射する。放送装置３の内部構成は、公知のものと同様であるので、詳しい説明を省略する。 The broadcast device 3 converts the multiplexed data sent from the content transmission device 2 into a radio wave and radiates it. Since the internal configuration of the broadcasting device 3 is the same as a known one, detailed description thereof is omitted.

図６は、コンテンツ再生装置４の機能的構成を示すブロック図である。図６において、コンテンツ再生装置４は、再生制御部４１と、操作部４２と、合成転送部４３と、タイムカウント部４４と、吹き出し形状記憶部４５と、受信部４６と、復調部４７と、誤り訂正部４８とを含む。 FIG. 6 is a block diagram showing a functional configuration of the content reproduction apparatus 4. In FIG. 6, the content reproduction apparatus 4 includes a reproduction control unit 41, an operation unit 42, a composition transfer unit 43, a time count unit 44, a balloon shape storage unit 45, a reception unit 46, a demodulation unit 47, And an error correction unit 48.

受信部４６は、放送装置３が放送した電波を受信する。復調部４７は、受信部４６が受信した電波を復調する。誤り訂正部４８は、復調部４７によって復調された多重化データに含まれている誤り訂正符号を参照して、誤りを訂正する。 The receiving unit 46 receives radio waves broadcast by the broadcasting device 3. The demodulator 47 demodulates the radio wave received by the receiver 46. The error correction unit 48 refers to the error correction code included in the multiplexed data demodulated by the demodulation unit 47 and corrects the error.

操作部４２は、リモートコントローラやボタンスイッチ等、ユーザがコンテンツ再生装置４の動作を制御するための入力デバイスである。タイムカウント部４４は、コンテンツデータを再生する際の時間をカウントする。吹き出し形状記憶部４５は、コンテンツ作成装置１の記憶部１５と同様、吹き出し形状データを格納している。 The operation unit 42 is an input device such as a remote controller or button switch for the user to control the operation of the content reproduction apparatus 4. The time count unit 44 counts the time for reproducing the content data. The balloon shape storage unit 45 stores balloon shape data, like the storage unit 15 of the content creation device 1.

再生制御部４１は、誤り訂正部４８によって誤り訂正された多重化データの中から、コンテンツデータを読み出し、映像および音声に関する信号（以下、映像信号および音声信号という）を一フレーム毎に合成転送部４３に転送する。また、再生制御部４１は、誤り訂正部４８によって誤り訂正された多重化データの中から、吹き出しデータを読み出し、吹き出しデータに含まれている吹き出し形状に関する情報にしたがって、吹き出し形状記憶部４５から吹き出し形状に関するデータを読み出し、吹き出し画像に関する信号（以下、吹き出し信号という）を作成し、合成転送部４３に送る。なお、吹き出し信号は、複数のフレームに渡って同一である場合があるが、ここでは、再生制御部４１は、フレーム毎に、吹き出し信号を合成転送部４３に送ることとする。再生制御部４１は、吹き出しデータに含まれている字幕文字にしたがって、吹き出しの中に挿入すべき字幕文字に関する信号（以下、字幕文字信号という）をフレーム毎に作成して、合成転送部４３に送る。なお、受信部４６は、コンテンツ再生装置４の外部にあってもよい。 The reproduction control unit 41 reads the content data from the multiplexed data that has been error-corrected by the error correction unit 48, and transfers the video and audio signals (hereinafter referred to as the video signal and the audio signal) to the composition transfer unit 43 frame by frame. The reproduction control unit 41 also reads the balloon data from the multiplexed data that has been error-corrected by the error correction unit 48, reads the data on the balloon shape from the balloon shape storage unit 45 according to the balloon shape information included in the balloon data, creates a signal for the balloon image (hereinafter referred to as the balloon signal), and sends it to the composition transfer unit 43. The balloon signal may be the same over a plurality of frames, but here the reproduction control unit 41 sends the balloon signal to the composition transfer unit 43 for each frame. In accordance with the subtitle characters included in the balloon data, the reproduction control unit 41 creates, for each frame, a signal for the subtitle characters to be inserted into the balloon (hereinafter referred to as the subtitle character signal) and sends it to the composition transfer unit 43. The receiving unit 46 may be located outside the content reproduction device 4.

合成転送部４３は、再生制御部４１から送られてくる信号を一フレーム毎に合成して、コンテンツ表示装置５に転送する。 The composition transfer unit 43 synthesizes the signals sent from the reproduction control unit 41 for each frame, and transfers them to the content display device 5.

図７は、コンテンツ表示装置５の機能的構成を示すブロック図である。図７において、コンテンツ表示装置５は、表示出力デバイス部５１と、駆動回路部５２とを含む。表示出力デバイス部５１は、ブラウン管や液晶ディスプレイ、スピーカ等である。駆動回路部５２は、コンテンツ再生装置４から送信されてくる合成された信号および音声信号にしたがって、表示出力デバイス部５１に映像および音声を再生させる。 FIG. 7 is a block diagram showing a functional configuration of the content display device 5. In FIG. 7, the content display device 5 includes a display output device unit 51 and a drive circuit unit 52. The display output device unit 51 is a cathode ray tube, a liquid crystal display, a speaker, or the like. The drive circuit unit 52 causes the display output device unit 51 to reproduce video and audio in accordance with the synthesized signal and audio signal transmitted from the content reproduction device 4.

図８は、コンテンツ作成装置１の動作を示すフローチャートである。図９Ａ〜図９Ｄは、コンテンツ作成装置１における表示内容の一例を示す図である。以下、図８および図９Ａ〜図９Ｄを参照しながら、コンテンツ作成装置１の動作について説明する。 FIG. 8 is a flowchart showing the operation of the content creation device 1. 9A to 9D are diagrams illustrating examples of display contents in the content creation device 1. Hereinafter, the operation of the content creation device 1 will be described with reference to FIG. 8 and FIGS. 9A to 9D.

まず、コンテンツ作成装置１のデータ作成制御部１１は、入力部１２を介して入力されるユーザからの指示に応じて、記憶部１５に格納されている所望のコンテンツデータを読み出して、表示出力部１３に映像を表示させ、音声を出力させる（ステップＳ１０１）。 First, in accordance with an instruction from the user input via the input unit 12, the data creation control unit 11 of the content creation device 1 reads the desired content data stored in the storage unit 15 and causes the display output unit 13 to display the video and output the audio (step S101).

次に、データ作成制御部１１は、音声認識によって、音声の立ち上がりタイミングが到来するか否かを判断する（ステップＳ１０２）。音声の立ち上がりタイミングが到来しない場合、データ作成制御部１１は、ステップＳ１０４の動作に進む。一方、音声の立ち上がりタイミングが到来した場合、データ作成制御部１１は、音声の立ち上がりタイミングを字幕開始時間とし、音声が立ち下がる時間までの経過間隔を字幕継続時間として、この間の音声に対応する字幕文字をユーザに入力させる。データ作成制御部１１は、当該字幕開始時間、当該字幕継続時間、および当該字幕文字を字幕リストデータの一部として、記憶部１５に格納し（ステップＳ１０３）、ステップＳ１０４の動作に進む。この際、ユーザは、分かち書きによって、字幕文字を入力するとよい。 Next, the data creation control unit 11 determines, by voice recognition, whether or not a voice rise timing has arrived (step S102). If no voice rise timing has arrived, the data creation control unit 11 proceeds to the operation of step S104. If a voice rise timing has arrived, the data creation control unit 11 takes the voice rise timing as the subtitle start time and the interval until the voice falls as the subtitle duration, and has the user input the subtitle characters corresponding to the voice in that interval. The data creation control unit 11 stores the subtitle start time, the subtitle duration, and the subtitle characters in the storage unit 15 as part of the subtitle list data (step S103), and proceeds to the operation of step S104. At this time, the user preferably inputs the subtitle characters with spaces between words (wakachigaki).

ステップＳ１０４において、データ作成制御部１１は、当該コンテンツデータの再生が全て終了したか否かを判断する。全て終了していない場合、データ作成制御部１１は、ステップＳ１０２の動作に戻って、次の音声立ち上がりタイミングからの字幕文字の作成を継続する。一方、全て終了した場合、データ作成制御部１１は、ステップＳ１０３で随時作成された字幕リストデータを一つにまとめて、当該コンテンツに対する字幕リストデータを最終的に作成し、記憶部１５に格納し（ステップＳ１０５）、ステップＳ１０６の動作に進む。 In step S104, the data creation control unit 11 determines whether or not the reproduction of the content data has been completed. If not all have been completed, the data creation control unit 11 returns to the operation of step S102 and continues to create subtitle characters from the next voice rising timing. On the other hand, when all the processes are completed, the data creation control unit 11 combines the caption list data created at any time in step S103 into one, finally creates caption list data for the content, and stores it in the storage unit 15. (Step S105), the process proceeds to Step S106.
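
The loop of steps S102 to S105 can be pictured with the sketch below. It is a simplification made for illustration: the description detects the rise and fall of the voice by voice recognition, whereas this sketch assumes a per-frame loudness value and a fixed threshold, and the `detect_caption_spans` function is hypothetical.

```python
def detect_caption_spans(levels, threshold):
    """Return (start_frame, duration_frames) for each run of frames at or above the threshold.

    levels: per-frame loudness values (an assumed stand-in for voice recognition).
    """
    spans = []
    start = None
    for frame, level in enumerate(levels):
        if level >= threshold and start is None:
            start = frame                          # voice rise: subtitle start time
        elif level < threshold and start is not None:
            spans.append((start, frame - start))   # voice fall: subtitle duration is known
            start = None
    if start is not None:                          # voice still present when the content ends
        spans.append((start, len(levels) - start))
    return spans


spans = detect_caption_spans([0, 0, 5, 6, 7, 0, 0, 4, 4], threshold=3)
```

Each returned pair would then be combined with the subtitle characters typed by the user to form one entry of the subtitle list data.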

ステップＳ１０６において、データ作成制御部１１は、字幕リストデータを参照して、先頭から順に字幕開始時間および字幕継続時間を認識する。次に、データ作成制御部１１は、コンテンツデータを参照して、ステップＳ１０６で認識した字幕開始時間から字幕継続時間までの映像および音声を表示出力部１３に再生させる（ステップＳ１０７）。 In step S106, the data creation control unit 11 refers to the caption list data and recognizes the caption start time and the caption duration time in order from the top. Next, the data creation control unit 11 refers to the content data and causes the display output unit 13 to reproduce video and audio from the caption start time to the caption duration time recognized in step S106 (step S107).

次に、データ作成制御部１１は、字幕開始時間から字幕継続時間までの映像内における色彩の平坦度を計算して、同じような色調である部分（以下、平坦部分という）を抽出する（ステップＳ１０８）。次に、データ作成制御部１１は、抽出した平坦部分に入る程度の大きさの四角形を認識する（ステップＳ１０９）。次に、データ作成制御部１１は、認識した四角形を点線の枠（以下、四角枠という）で表すように、表示出力部１３に字幕開始時点における映像と合成させて表示させる（ステップＳ１１０）。この際、データ作成制御部１１は、四角枠の四隅を黒丸で表示させる。図９Ａは、ステップＳ１１０で表示される画面の一例を示す図である。図９Ａに示すように、色調が平坦な平坦部分Ｆａに最大限納まるように、四角枠Ｓａが表示されている。なお、枠は、四角形以外であってもよい。 Next, the data creation control unit 11 calculates the flatness of the color in the video from the subtitle start time until the subtitle duration has elapsed, and extracts a portion having a similar color tone (hereinafter referred to as a flat portion) (step S108). Next, the data creation control unit 11 recognizes a quadrangle of a size that fits within the extracted flat portion (step S109). Next, the data creation control unit 11 causes the display output unit 13 to display the recognized quadrangle as a dotted frame (hereinafter referred to as a square frame) composited with the video at the subtitle start time (step S110). At this time, the data creation control unit 11 displays the four corners of the square frame as black dots. FIG. 9A is a diagram illustrating an example of the screen displayed in step S110. As shown in FIG. 9A, the square frame Sa is displayed so that it fits as large as possible within the flat portion Fa, where the color tone is flat. The frame may have a shape other than a quadrangle.
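
The description does not say how the quadrangle of step S109 is found. One possible approach, given here purely as an assumption, is to treat the flat portion as a boolean mask (True where the color tone is flat) and search for the largest axis-aligned rectangle inside it with the well-known histogram and stack method. The `largest_rectangle` function and the mask representation are hypothetical.

```python
def largest_rectangle(mask):
    """Find the largest axis-aligned rectangle of True cells.

    mask: list of rows of booleans. Returns (top, left, bottom, right), inclusive,
    or None when no cell is True.
    """
    if not mask or not mask[0]:
        return None
    width = len(mask[0])
    heights = [0] * width  # consecutive True cells ending at the current row, per column
    best_area, best = 0, None
    for row_index, row in enumerate(mask):
        for col in range(width):
            heights[col] = heights[col] + 1 if row[col] else 0
        stack = []  # column indices whose heights are strictly increasing
        for col in range(width + 1):
            current = heights[col] if col < width else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= current:
                height = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                area = height * (col - left)
                if area > best_area:
                    best_area = area
                    best = (row_index - height + 1, left, row_index, col - 1)
            stack.append(col)
    return best


T, F = True, False
flat_mask = [
    [F, T, T, T],
    [F, T, T, T],
    [T, T, F, F],
]
frame = largest_rectangle(flat_mask)
```

The four corners of the returned rectangle would correspond to the black dots of the square frame Sa, which the user can still adjust in step S111.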

次に、データ作成制御部１１は、ステップＳ１１０で表した四角枠を吹き出しを表示する範囲としてよいか否かをユーザに問う表示を表示出力部１３に表示させ、ユーザからの修正指示があれば、指示に応じた四角枠を吹き出しを表示する範囲とする（ステップＳ１１１）。このとき、データ作成制御部１１は、四角枠の四隅の座標をメモリ（図示せず）に一時的に格納する。また、ユーザによる修正は、入力部１２によって行われる。たとえば、ユーザは、マウスのポインタを四角枠の四辺または四隅に位置させ、ドラッグすることによって、四角枠の大きさおよび／または位置を修正する。このような手法は、画像用ソフトウエア等の分野で公知であるので、これ以上の詳しい説明を省略する。 Next, the data creation control unit 11 causes the display output unit 13 to display a display asking the user whether or not the square frame represented in step S110 may be the range for displaying the balloon, and if there is a correction instruction from the user The square frame corresponding to the instruction is set as a range for displaying the balloon (step S111). At this time, the data creation control unit 11 temporarily stores the coordinates of the four corners of the square frame in a memory (not shown). Further, correction by the user is performed by the input unit 12. For example, the user modifies the size and / or position of the square frame by positioning the mouse pointer on the four sides or four corners of the square frame and dragging the mouse pointer. Since such a method is well known in the field of image software and the like, further detailed description is omitted.

次に、データ作成制御部１１は、映像から人物の顔部分を認識する（ステップＳ１１２）。このときの認識手法としては、様々考えられるが、肌の色や、顔の形状等に基づいて、データ作成制御部１１は、人物の顔部分を認識することができる。このような手法は、画像認識の分野で公知であるので、これ以上の詳しい説明を省略する。 Next, the data creation control unit 11 recognizes a human face from the video (step S112). There are various recognition methods at this time, but the data creation control unit 11 can recognize the face portion of the person based on the skin color, the shape of the face, and the like. Since such a method is known in the field of image recognition, further detailed description is omitted.

次に、データ作成制御部１１は、認識した顔部分の面積を求め、当該面積が所定の閾値を超えるか否かを判断する（ステップＳ１１３）。超える場合、データ作成制御部１１は、口元部分を検出して、当該口元から四角枠の対角線交点（以下、四角枠中心という）に向けた基準線を表示出力部１３に表示させ、当該基準線上に吹き出しの開始点の候補を表示させ（ステップＳ１１４）、ステップＳ１１６の動作に進む。 Next, the data creation control unit 11 obtains the area of the recognized face portion and determines whether or not the area exceeds a predetermined threshold (step S113). If it does, the data creation control unit 11 detects the mouth, causes the display output unit 13 to display a reference line from the mouth toward the intersection of the diagonals of the square frame (hereinafter referred to as the center of the square frame), displays a candidate for the balloon start point on the reference line (step S114), and proceeds to the operation of step S116.

一方、超えない場合、データ作成制御部１１は、顔の中心部分を認識して、当該中心部分から四角枠中心に向けて基準線を表示出力部１３に表示させ、当該基準線上に吹き出し開始点の候補を表示させ（ステップＳ１１５）、ステップＳ１１６の動作に進む。図９Ｂは、ステップＳ１１５において、吹き出し開始点の候補が表示されたときの一例を示す図である。図９Ｂに示すように、吹き出し開始点Ｐａは、顔の中心から四角枠Ｓａの中心に向けてひかれた基準線Ｌａ上に表示されている。このように、データ作成制御部１１は、顔サイズの大きさに応じて吹き出し画像の開始点を決定する。 On the other hand, if the area does not exceed the threshold, the data creation control unit 11 recognizes the center of the face, causes the display output unit 13 to display a reference line from that center toward the center of the square frame, displays a candidate for the balloon start point on the reference line (step S115), and proceeds to the operation of step S116. FIG. 9B is a diagram illustrating an example of the display when a candidate for the balloon start point is displayed in step S115. As shown in FIG. 9B, the balloon start point Pa is displayed on a reference line La drawn from the center of the face toward the center of the square frame Sa. In this way, the data creation control unit 11 determines the start point of the balloon image according to the size of the face.
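
Steps S113 to S115 can be summarized in a short sketch. The function name, the bounding-box representation of the face, and in particular the `fraction` parameter are assumptions: the description only says that the candidate lies on the reference line, not where on the line it is placed.

```python
def balloon_start_candidate(face_box, mouth, frame_corners, area_threshold, fraction=0.5):
    """Pick a candidate balloon start point on the reference line.

    face_box: (left, top, right, bottom) of the recognized face (assumed representation).
    mouth: (x, y) of the detected mouth.
    frame_corners: the four corners of the square frame.
    fraction: position along the reference line, 0 = origin, 1 = frame center (assumption).
    """
    left, top, right, bottom = face_box
    face_area = (right - left) * (bottom - top)
    if face_area > area_threshold:
        origin = mouth                                      # large face: start from the mouth (S114)
    else:
        origin = ((left + right) / 2, (top + bottom) / 2)   # small face: start from its center (S115)
    # For a rectangle, the intersection of the diagonals is the mean of the four corners.
    center_x = sum(x for x, _ in frame_corners) / len(frame_corners)
    center_y = sum(y for _, y in frame_corners) / len(frame_corners)
    return (origin[0] + fraction * (center_x - origin[0]),
            origin[1] + fraction * (center_y - origin[1]))


corners = [(20, 0), (40, 0), (40, 20), (20, 20)]
from_mouth = balloon_start_candidate((0, 0, 10, 10), (5, 8), corners, area_threshold=50)
from_center = balloon_start_candidate((0, 0, 10, 10), (5, 8), corners, area_threshold=200)
```

Switching the origin by face area reflects the idea that the mouth is only worth pointing at when the face is large enough for it to be detected reliably.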

ステップＳ１１６において、データ作成制御部１１は、入力部１２を介したユーザからの指示に応じて、吹き出し開始点を修正し、最終的な吹き出し開始点の座標をメモリ（図示せず）に格納し、ステップＳ１１７の動作に進む。なお、ユーザによる修正がない場合、データ作成制御部１１は、吹き出し開始点の候補とした座標を格納する。 In step S116, the data creation control unit 11 corrects the balloon start point in accordance with an instruction from the user via the input unit 12, and stores the coordinates of the final balloon start point in a memory (not shown). Then, the operation proceeds to step S117. When there is no correction by the user, the data creation control unit 11 stores the coordinates that are candidates for the balloon start point.

ステップＳ１１７において、データ作成制御部１１は、標準の吹き出し形状であると予め設定されている吹き出し形状に関するデータを記憶部１５から読み出して、必要があれば、ステップＳ１１１で決定した四角枠の中に最大限納まるように、吹き出し形状のサイズを変更して、四角枠内にサイズ変更後の吹き出し画像を、表示出力部１３に表示させる。図９Ｃは、ステップＳ１１７で表示された吹き出し画像の一例を示す図である。図９Ｃに示すように、吹き出し画像Ｂａは、四角枠Ｓａ内に納まるように表示されている。 In step S117, the data creation control unit 11 reads from the storage unit 15 data related to a balloon shape that is preset as a standard balloon shape, and if necessary, puts it in the square frame determined in step S111. The size of the balloon shape is changed so as to be accommodated to the maximum, and the balloon image after the size change is displayed on the display output unit 13 within the rectangular frame. FIG. 9C is a diagram illustrating an example of a balloon image displayed in step S117. As shown in FIG. 9C, the balloon image Ba is displayed so as to be contained within the square frame Sa.

次に、データ作成制御部１１は、ユーザからの指示に応じて、吹き出し画像を修正する（ステップＳ１１８）。具体的には、吹き出しの形状や大きさ、向き等が修正される。このような修正は、吹き出しの形状を示すダイアログボックスからユーザが所望の形状を選択するようにして行われる。また、サイズ修正は、表示中の吹き出しをドラッグ等されることによって行われる。その他、修正の手法としては、様々考えられる。 Next, the data creation control unit 11 corrects the balloon image in accordance with an instruction from the user (step S118). Specifically, the shape, size, direction, etc. of the balloon are corrected. Such correction is performed in such a manner that the user selects a desired shape from a dialog box indicating the shape of the balloon. The size correction is performed by dragging a balloon that is being displayed. There are various other correction methods.

ユーザによる修正が終了、または、修正が無かったら、データ作成制御部１１は、最終的な吹き出し画像を決定する（ステップＳ１１９）。この際、データ作成制御部１１は、吹き出し画像の形状を示す名称をメモリ（図示せず）に一時格納する。また、吹き出し画像のサイズが変更になっている場合は、データ作成制御部１１は、サイズ変更があった吹き出しを囲む最小の大きさの四角枠の四隅の座標を、吹き出しを表示する範囲として、メモリに格納されている四隅の座標を変更する。 If the correction by the user is completed or there is no correction, the data creation control unit 11 determines a final balloon image (step S119). At this time, the data creation control unit 11 temporarily stores a name indicating the shape of the balloon image in a memory (not shown). If the size of the balloon image has been changed, the data creation control unit 11 uses the coordinates of the four corners of the minimum-sized square frame surrounding the balloon whose size has been changed as a range for displaying the balloon. Change the coordinates of the four corners stored in memory.

次に、データ作成制御部１１は、当該字幕開始時間における字幕文字を字幕リストデータから読み出し、決定した吹き出しの中に、字幕文字を挿入する（ステップＳ１２０）。この際、データ作成制御部１１は、字幕開始時間から字幕継続時間が経過するまでの間、字幕文字が先頭から一フレーム毎に表示するように、表示出力部１３に指示する。このとき、データ作成制御部１１は、字幕文字展開スピードを決定する。字幕文字展開スピードは、一フレーム中に何文字の字幕を新たに表示するかを段階的に示すことによって定義される。たとえば、ノーマルスピードであれば、一フレーム中に新たに六文字を表示するといったように定義される。データ作成制御部１１は、字幕文字展開スピードもメモリに一時格納しておく。図９Ｄは、字幕文字が挿入されたときの表示の一例を示す図である。図９Ｄに示すように、吹き出し画像Ｂａの中に、字幕文字Ｃａが表示されている。 Next, the data creation control unit 11 reads out the subtitle character at the subtitle start time from the subtitle list data, and inserts the subtitle character into the determined balloon (step S120). At this time, the data creation control unit 11 instructs the display output unit 13 to display the subtitle characters frame by frame from the beginning until the subtitle duration time elapses from the subtitle start time. At this time, the data creation control unit 11 determines the subtitle character expansion speed. The subtitle character development speed is defined by showing step by step how many subtitles are newly displayed in one frame. For example, it is defined that six characters are newly displayed in one frame at normal speed. The data creation control unit 11 also temporarily stores the subtitle character expansion speed in the memory. FIG. 9D is a diagram illustrating an example of a display when a caption character is inserted. As shown in FIG. 9D, the subtitle character Ca is displayed in the balloon image Ba.
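
The subtitle character expansion speed described above can be illustrated as follows. This is a sketch with a hypothetical `visible_text` function; it assumes that the first batch of characters appears in the very first frame of the subtitle interval, which the description does not state explicitly.

```python
def visible_text(text: str, frames_elapsed: int, chars_per_frame: int) -> str:
    """Return the part of the subtitle shown after the given number of elapsed frames.

    frames_elapsed counts from 0 at the subtitle start time (assumption).
    """
    shown = min(len(text), (frames_elapsed + 1) * chars_per_frame)
    return text[:shown]


caption = "I agree with your idea"
first = visible_text(caption, 0, 6)   # normal speed in the description: six new characters per frame
later = visible_text(caption, 3, 6)
```

With the normal speed of six characters per frame, this 22-character subtitle is fully displayed by the fourth frame and then stays unchanged until the subtitle duration elapses.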

次に、データ作成制御部１１は、ユーザからの指示に応じて、字幕文字を修正する（ステップＳ１２１）。ここでは、字幕文字属性として、文字の種類、文字の色、字幕の背景、字幕の透過率、字幕の枠の種類、字幕文字の強調等が修正できるとする。データ作成制御部１１は、これら字幕文字属性もメモリに一時格納しておく。なお、コンテンツ作成装置１は、映像コンテンツデータの再生中の音声の音量を判断する音量判断部を備えるとよい。このとき、コンテンツ作成装置１は、音量判断部が判断した音量に応じて、字幕文字の属性を変化させるとよい。たとえば、音量が大きい場合、コンテンツ作成装置１は、字幕文字を大きくしたり、変色させたりするとよい。 Next, the data creation control unit 11 corrects the caption character in accordance with an instruction from the user (step S121). Here, it is assumed that the character type, character color, subtitle background, subtitle transparency, subtitle frame type, subtitle character emphasis, and the like can be corrected as subtitle character attributes. The data creation control unit 11 temporarily stores these subtitle character attributes in the memory. Note that the content creation device 1 may include a volume determination unit that determines the volume of audio during reproduction of video content data. At this time, the content creation device 1 may change the attribute of the subtitle character according to the volume determined by the volume determination unit. For example, when the volume is high, the content creation device 1 may enlarge the caption character or change the color.

次に、データ作成制御部１１は、メモリに一時格納した情報を読み出して、字幕開始時間に対応するように、字幕継続時間、字幕文字展開スピード、字幕文字属性、吹き出し範囲（四角枠の四隅の座標）、吹き出し開始点の座標、吹き出し形状および字幕文字を記憶部１５に格納する（ステップＳ１２２）。 Next, the data creation control unit 11 reads the information temporarily stored in the memory, and subtitle duration, subtitle character expansion speed, subtitle character attribute, balloon range (at the four corners of the square frame) so as to correspond to the subtitle start time. Coordinates), the coordinates of the balloon start point, the balloon shape, and the subtitle characters are stored in the storage unit 15 (step S122).

次に、データ作成制御部１１は、コンテンツが全て終了したか否かを判断する（ステップＳ１２３）。終了していない場合、データ作成制御部１１は、ステップＳ１０６の動作に戻り、字幕開始時間毎の吹き出しデータの作成を継続する。一方、終了している場合、データ作成制御部１１は、今まで作成した字幕開始時間毎の吹き出しデータを一つに統合して、所望のコンテンツデータに対応する最終的な吹き出しデータを作成して、記憶部１５に格納し（ステップＳ１２４）、処理を終了する。 Next, the data creation control unit 11 determines whether or not all the contents have been completed (step S123). If not completed, the data creation control unit 11 returns to the operation of step S106 and continues creating the balloon data for each caption start time. On the other hand, if completed, the data creation control unit 11 integrates the balloon data for each subtitle start time created so far into one, and creates final balloon data corresponding to the desired content data. Then, the data is stored in the storage unit 15 (step S124), and the process is terminated.

FIG. 10 is a diagram showing an example of the balloon data finally created. FIG. 10 shows an example in which the balloon data is described in a format compliant with XML (eXtensible Markup Language), in order to maintain compatibility with the MPEG data format used for the content data and to facilitate standardization. As shown in FIG. 10, the balloon data defines, for each subtitle start time, the subtitle character expansion speed, subtitle duration, balloon range, balloon start point, balloon shape, and subtitle characters. In FIG. 10 the subtitle character attributes are unified for the data as a whole, but they may instead be defined for each subtitle start time.
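As a rough illustration of XML-formatted balloon data, the sketch below serializes balloon entries with Python's standard `xml.etree.ElementTree` module. The actual schema of FIG. 10 is not reproduced in this text, so every tag and attribute name used here (`balloonData`, `charAttributes`, `balloon`, `startTime`, and so on) is a hypothetical choice made for the example.

```python
import xml.etree.ElementTree as ET


def build_balloon_xml(entries, char_attributes):
    """Serialize balloon entries to XML; all tag names here are hypothetical."""
    root = ET.Element("balloonData")
    # Subtitle character attributes defined once for the whole data set.
    ET.SubElement(root, "charAttributes",
                  {k: str(v) for k, v in char_attributes.items()})
    for e in entries:
        # One <balloon> element per subtitle start time.
        balloon = ET.SubElement(root, "balloon", {"startTime": str(e["start_time"])})
        ET.SubElement(balloon, "expansionSpeed").text = str(e["expansion_speed"])
        ET.SubElement(balloon, "duration").text = str(e["duration"])
        ET.SubElement(balloon, "range").text = " ".join(
            f"{x},{y}" for x, y in e["region"])
        ET.SubElement(balloon, "startPoint").text = "{},{}".format(*e["start_point"])
        ET.SubElement(balloon, "shape").text = e["shape"]
        ET.SubElement(balloon, "text").text = e["text"]
    return ET.tostring(root, encoding="unicode")
```

Defining the attributes once at the root mirrors the "unified as a whole" variant; a per-start-time variant would move them into each `balloon` element.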

FIG. 11 is a flowchart showing the operation of the content transmission device 2. The operation of the content transmission device 2 is described below with reference to FIG. 11.

First, the multiplexing control unit 21 of the content transmission device 2 reads the desired content data stored in the storage unit 15 of the content creation device 1 in accordance with an instruction from the user via the operation unit 22 (step S201). Next, the multiplexing control unit 21 reads the balloon data corresponding to that content data from the storage unit 15 (step S202). Next, the multiplexing control unit 21 multiplexes the content data and the balloon data that it has read (step S203). Any multiplexing method may be used, such as embedding the balloon data in the header portion of the content data.
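The specification leaves the multiplexing method open. As one minimal sketch of the "embed in the header" idea, the Python code below prepends a length-prefixed balloon block to the content bytes. The `MAGIC` marker, the 4-byte big-endian length field, and the function names are all assumptions for illustration; this is not an MPEG-compliant container.

```python
import struct

MAGIC = b"BLN1"  # hypothetical marker identifying the balloon header


def multiplex(content: bytes, balloon_xml: str) -> bytes:
    """Prepend the balloon data to the content data as a simple header."""
    payload = balloon_xml.encode("utf-8")
    # Layout: marker, 4-byte big-endian payload length, payload, content.
    return MAGIC + struct.pack(">I", len(payload)) + payload + content


def demultiplex(data: bytes):
    """Split multiplexed data back into (content bytes, balloon XML string)."""
    if data[:4] != MAGIC:
        raise ValueError("balloon header not found")
    (length,) = struct.unpack(">I", data[4:8])
    balloon_xml = data[8:8 + length].decode("utf-8")
    return data[8 + length:], balloon_xml
```

A playback-side component could call `demultiplex` on the error-corrected data to recover both parts before generating the balloon and subtitle signals.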

Next, the error correction code adding unit 23 adds an error correction code to the multiplexed data (step S204). Next, the digital modulation unit 24 digitally modulates the multiplexed data to which the error correction code has been added (step S205). Next, the transmission unit 25 transmits the digitally modulated data to the broadcasting device 3 (step S206), and the process ends.

FIG. 12 is a flowchart showing the operation of the content playback device 4. FIGS. 13A to 13D are diagrams showing examples of the content of the video signal, balloon signal, and subtitle character signal created by the content playback device 4. The operation of the content playback device 4 is described below with reference to FIG. 12 and FIGS. 13A to 13D.

First, the content playback device 4 demodulates the signal received by the reception unit 46 in the demodulation unit 47, corrects errors in the error correction unit 48, and then inputs the result to the playback control unit 41 (step S301). Next, the playback control unit 41 reads the content data from the error-corrected multiplexed data and sends the video signal and audio signal needed to play back that content data to the synthesis transfer unit 43 frame by frame (step S302). The playback control unit 41 sends the video signal and audio signal to the synthesis transfer unit 43 one frame at a time, in parallel with the operations described below. FIG. 13A is a diagram showing an example of the content of the video signal. As shown in FIG. 13A, in step S302 only video and audio information is transferred; it contains no information about balloons.

Next, the playback control unit 41 reads the balloon data from the multiplexed data and recognizes the subtitle start time and the subtitle duration (step S303). Next, the playback control unit 41 determines, based on information from the time count unit 44, whether the subtitle start time has arrived (step S304). If the subtitle start time has not arrived, the playback control unit 41 proceeds to step S312.

If the subtitle start time has arrived, the playback control unit 41 recognizes the range on the screen into which the balloon is to be fitted, based on the balloon range stored in the balloon data (step S305). Next, based on the balloon shape stored in the balloon data, the playback control unit 41 reads information on the specified balloon shape from the balloon shape storage unit 45 and determines the size of the balloon image so that it fits within the range obtained in step S305 (step S306). Next, the playback control unit 41 creates a balloon signal that causes the balloon image to be displayed in the determined range and at the determined size, and sends it to the synthesis transfer unit 43 (step S307). Although the shape of the balloon does not change during the subtitle duration, the playback control unit 41 sends the balloon signal to the synthesis transfer unit 43 one frame at a time, in parallel with its other operations, to make synchronization with the video signal and the subtitle character signal easier. FIG. 13B is a diagram showing an example of the content of the balloon signal (the balloon image). As shown in FIG. 13B, the balloon signal provides information about the image of the balloon portion only.
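Steps S305 and S306 amount to fitting a stored balloon shape into the rectangle given by the balloon range. The Python sketch below shows one way this could be done; preserving the aspect ratio and centering the balloon are assumptions of this example, as are the function name `fit_balloon` and the `(left, top, right, bottom)` representation of the range.

```python
def fit_balloon(region, shape_size):
    """Scale a balloon shape to fit inside region, preserving its aspect ratio.

    region: (left, top, right, bottom) of the balloon range from the balloon data.
    shape_size: (width, height) of the stored balloon shape.
    Returns (x, y, width, height) of the scaled balloon, centered in the region.
    """
    left, top, right, bottom = region
    shape_w, shape_h = shape_size
    # The limiting dimension decides the scale so the balloon stays in range.
    scale = min((right - left) / shape_w, (bottom - top) / shape_h)
    width, height = shape_w * scale, shape_h * scale
    x = left + ((right - left) - width) / 2
    y = top + ((bottom - top) - height) / 2
    return (x, y, width, height)
```

For a 200 by 100 range and a square 50 by 50 shape, the height is the limiting dimension, so the balloon is scaled to 100 by 100 and centered horizontally.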

Next, the playback control unit 41 recognizes the number of frames that fall within the subtitle duration, based on the subtitle duration stored in the balloon data (step S308). Next, the playback control unit 41 divides the number of subtitle characters by the number of frames obtained in step S308 to obtain the number of subtitle characters to be displayed per frame, creates a subtitle character signal for displaying the subtitle characters for each frame (step S309), and sends it to the synthesis transfer unit 43 (step S310). FIG. 13C is a diagram showing an example of an image based on the subtitle character signal in the first frame. FIG. 13D is a diagram showing an example of an image based on the subtitle character signal in the second frame. As shown in FIGS. 13C and 13D, the subtitle character signal gradually adds the subtitle characters to be displayed over the subtitle duration.
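The per-frame character calculation of steps S308 and S309 can be sketched as follows. The frame rate is not stated in this passage, so `fps` is a hypothetical parameter, and rounding up with ceiling division (so that the final frame shows the whole subtitle) is a choice made for this example.

```python
def subtitle_text_per_frame(text, duration_s, fps=30):
    """Return the cumulative subtitle text to show on each frame of the duration."""
    # Step S308: number of frames within the subtitle duration.
    n_frames = max(1, round(duration_s * fps))
    frames = []
    for i in range(n_frames):
        # Step S309: spread the characters evenly over the frames; ceiling
        # division guarantees that the last frame shows the complete text.
        shown = (len(text) * (i + 1) + n_frames - 1) // n_frames
        frames.append(text[:shown])
    return frames
```

Each list element is the text visible in that frame, so consecutive elements grow in the way FIGS. 13C and 13D are described.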

Next, the playback control unit 41 determines whether all frames in the subtitle duration have finished (step S311). If they have not, the playback control unit 41 returns to step S308, creates the subtitle character signal needed for the next frame, and sends it to the synthesis transfer unit 43. If they have, the playback control unit 41 determines whether the content has ended (step S312). If the content has not ended, the playback control unit 41 returns to step S304 and continues transferring the balloon signal and subtitle character signal from the next subtitle start time. If the content has ended, the playback control unit 41 ends the process.

FIG. 14 is a diagram showing the operation of the synthesis transfer unit 43 of the content playback device 4. FIGS. 15A and 15B are diagrams showing display examples on the content display device 5. The operation of the synthesis transfer unit 43 is described below with reference to FIGS. 14, 15A, and 15B.

First, the synthesis transfer unit 43 receives the video signal for one frame sent from the playback control unit 41 (step S401). Next, the synthesis transfer unit 43 receives the balloon signal and subtitle character signal for one frame sent from the playback control unit 41 and synthesizes the video signal, the balloon signal, and the subtitle character signal (step S402). It then transfers the result to the content display device 5 together with the audio signal (step S403), returns to step S401, and proceeds to process the next frame.

On receiving the signal from the synthesis transfer unit 43, the content display device 5 displays part of the subtitle in the first frame, as shown in FIG. 15A, and in the second frame displays the subtitle of the first frame combined with the subtitle of the second frame, as shown in FIG. 15B.

As described above, in one embodiment of the present invention, subtitle characters can be inserted into a balloon in the video content and displayed, which makes the relationship between the speaker and the subtitle easy to understand. In addition, because the subtitle characters are displayed inside the balloon, the screen is easier to view.

With the content playback device and content display device of the above embodiment, even when the audio is muted, the viewer can tell at a glance who is speaking by looking at the balloon start point. The content playback device and content display device of this embodiment can therefore be used effectively to understand video content even in environments where sound cannot be output. This allows the user to enjoy video content without using a device such as headphones.

For example, if the content playback device and content display device of this embodiment are installed as a fixed information terminal in a quiet place where sound should not be output, such as a library, hospital, or public facility, users can enjoy video content without disturbing the people around them. In this case, the content playback device and content display device can easily be implemented on a personal computer or the like. Likewise, in environments where audio is hard to hear because of surrounding noise, installing the devices as an outdoor advertising device or public information service device lets viewers follow the video content by reading the balloon subtitles, without hearing the audio.

In the above embodiment the content playback device and the content display device are separate devices, but the two can also be integrated and miniaturized so as to be portable. With such a portable information terminal, video content can be enjoyed even in environments where etiquette requires consideration about sound (for example, on a train or bus, aboard a ship, in a library or hospital, or on an airplane). The present invention thus has practical value across a wide range of uses.

In the above embodiment the content playback device and the content display device were described as separate devices, but the functions of one may be included in the other. Similarly, for the content creation device and the content transmission device, the functions of one may be included in the other.

To make the invention highly useful across this wide range of uses, it is preferable for the content playback device (including one with a built-in content display device) to have the following additional functions.

For example, the content playback device may be able to switch balloon display on and off in accordance with a user instruction. Specifically, when the user instructs that balloons not be displayed, the playback control unit of the content playback device may instruct the synthesis transfer unit to synthesize only the video signal and the audio signal.

The content playback device may also switch balloon display on and off automatically. For example, the content playback device further includes a volume measurement unit that measures the surrounding volume. The content playback device compares the volume being output from the speaker with the surrounding volume measured by the volume measurement unit. If the comparison shows that the surrounding volume is at or above a predetermined threshold, the playback control unit of the content playback device stops audio output from the speaker and instructs the synthesis transfer unit to switch to synthesis with balloon subtitle display. As a result, balloon subtitles are displayed automatically when the surroundings become noisy, so video content can be enjoyed even in environments where sound does not carry well.

In addition, when the surrounding volume is below a predetermined threshold, the playback control unit of the content playback device may automatically enter manner mode (silent mode), stopping audio output from the speaker and instructing the synthesis transfer unit to switch to synthesis with balloon subtitle display. As a result, when a mobile terminal such as a mobile phone or PDA is being used in quiet surroundings, it enters manner mode automatically and the video content can still be enjoyed.

The content playback device may further include a movement speed measurement unit that measures how fast the device itself is moving, for example by using an acceleration sensor or by calculating the Doppler effect on received radio waves. When the movement speed measured by the movement speed measurement unit is faster than walking speed, the playback control unit of the content playback device may determine that the user is driving or riding in a vehicle, enter manner mode, and instruct the synthesis transfer unit to switch to balloon subtitle display.
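The three automatic switching conditions described above (noisy surroundings, quiet surroundings, and movement faster than walking) could be combined in a single decision function, as in the Python sketch below. The specification gives no numeric values, so the thresholds `noisy_db`, `quiet_db`, and `walking_mps`, the function name, and the returned dictionary layout are all assumptions; treating the moving case as muting the speaker follows the description of manner mode but is likewise an interpretation.

```python
def select_display_mode(ambient_db, speed_mps,
                        noisy_db=80.0, quiet_db=35.0, walking_mps=2.0):
    """Decide whether to mute the speaker and show balloon subtitles."""
    if ambient_db >= noisy_db:
        reason = "noisy"    # surroundings too loud for the audio to be heard
    elif ambient_db < quiet_db:
        reason = "quiet"    # quiet surroundings: manner mode
    elif speed_mps > walking_mps:
        reason = "moving"   # faster than walking: assumed to be in a vehicle
    else:
        # Ordinary conditions: keep audio output, no balloon subtitles.
        return {"speaker_on": True, "balloon_subtitles": False, "reason": None}
    return {"speaker_on": False, "balloon_subtitles": True, "reason": reason}
```

A playback control component could call this once per measurement interval and pass the result to the component that composites the signals.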

The content playback device may also be able to switch between conventional subtitle display and balloon subtitle display in accordance with an instruction from the user. Specifically, when the user requests conventional subtitle display, the content playback device refers only to the subtitle start time, subtitle duration, and subtitle character information in the balloon data, and generates a subtitle character signal that places the subtitle characters at the top, bottom, left, or right of the screen from the time specified by the subtitle start time until the subtitle duration has elapsed. The synthesis transfer unit then synthesizes the subtitle character signal and the video signal and causes the content display device to display the result. In this way, conventional subtitles can also be displayed.

When creating the subtitle list data, the content creation device may register in the subtitle list data information that emphasizes the characters according to the sound pressure level. Specifically, the content creation device includes a sound pressure detection device that detects sound pressure with a piezoelectric sensor or the like. If the average sound pressure during the subtitle duration is greater than a threshold, the device registers an attribute that enlarges the characters in the subtitle list data; if it is smaller, the device registers an attribute that reduces the characters.
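The sound-pressure rule can be written as a small function, sketched here in Python. The attribute values `"large"`, `"small"`, and `"normal"` and the function name are hypothetical, and returning `"normal"` when the average equals the threshold is an assumption, since the text does not cover that case.

```python
from statistics import mean


def size_attribute_for_sound_pressure(samples, threshold):
    """Choose a character-size attribute from sound pressure samples.

    samples: sound pressure readings taken during the subtitle duration.
    threshold: the comparison value; its unit must match the samples.
    """
    average = mean(samples)
    if average > threshold:
        return "large"    # louder than the threshold: enlarge the characters
    if average < threshold:
        return "small"    # quieter than the threshold: reduce the characters
    return "normal"       # exactly at the threshold: unspecified in the text
```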

If the subtitle characters do not fit in the balloon because the subtitle duration is short, the content creation device notifies the user by displaying a mark or similar indication on the display output unit showing that the subtitle characters do not fit. On receiving this notification, the user may change the size of the balloon, the content of the subtitle characters, and so on. Whether the subtitle fits in the balloon may be judged by the content creation device according to whether the number of subtitle characters per unit time (for example, one frame) during the subtitle duration is equal to or greater than a predetermined number. If it is, the content creation device may treat the subtitle as not fitting and notify the user to change the subtitle characters.
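The fit check described above reduces to a single comparison. A minimal Python sketch follows, taking one frame as the unit time; the function name and the parameter `max_chars_per_frame` (standing in for the "predetermined number") are assumptions.

```python
def subtitle_overflows(text, duration_frames, max_chars_per_frame):
    """Return True if the subtitle should be flagged as not fitting.

    The subtitle is flagged when the number of characters per frame is
    equal to or greater than the predetermined number.
    """
    if duration_frames <= 0:
        raise ValueError("duration_frames must be positive")
    return len(text) / duration_frames >= max_chars_per_frame
```

When the function returns `True`, an authoring tool would show the warning mark and let the user shorten the text or adjust the balloon.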

If the subtitle has many characters, the device may display as many subtitle characters as fit in the balloon and then place the characters that did not fit into the same balloon anew. Specifically, this is easily achieved by having the content playback device, when creating the subtitle character signal in step S309, create a subtitle character signal consisting of the subtitle characters that did not fit.
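Showing the remaining characters in the same balloon can be modeled as splitting the subtitle into balloon-sized pages. In the Python sketch below, `capacity` (the number of characters that fit in the balloon) and the function name are hypothetical; how the capacity is derived from the balloon size and character size is outside this example.

```python
def paginate_subtitle(text, capacity):
    """Split subtitle text into pages that each fit in the balloon."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Each page holds at most `capacity` characters; the last may be shorter.
    return [text[i:i + capacity] for i in range(0, len(text), capacity)]
```

The first page is displayed first, and each following page replaces it in the same balloon.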

Ideally the balloon shape data should be standardized. If the balloon shape data used by the content creation device differs from that used by the content playback device, the content playback device may use standard data as the balloon shape data. Which standard data to use may be decided in advance by a guideline.

In the above embodiment the content creation device creates the subtitle list data and the balloon data separately, but it may create the subtitle list data together with the balloon data. Specifically, when the content creation device recognizes an audio onset position, it may register the balloon shape and the subtitle characters at the same time.

In the above embodiment the subtitle list data is created immediately before the balloon data is created, but the subtitle list data may instead be created in advance, separately from the balloon data.

In the above embodiment the content creation device first selects the balloon shape automatically and the user then makes corrections if necessary. However, the content creation device may create all of the balloon data automatically without allowing user corrections, or all of the balloon data may be created manually.

In the above embodiment the content data and the balloon data are broadcast, but the system for providing the content is not limited to broadcasting.

FIG. 16 is a diagram showing the overall configuration of a system for providing content data and balloon data via the Internet. As shown in FIG. 16, the content transmission device 2a may transmit data in which the content data and balloon data are multiplexed to the content playback device 4a via the Internet 3a. In this case, the content creation device 1 and the content display device 5 have the same configuration as in the above embodiment. The content transmission device 2a may transmit the multiplexed data as packets over the Internet in accordance with TCP/IP. The content playback device 4a may receive the multiplexed data transmitted over the Internet packet by packet.

FIG. 17 is a diagram showing the overall configuration of a system for storing multiplexed data, in which content data and balloon data are multiplexed, on package media such as a DVD and distributing it. As shown in FIG. 17, the package media conversion device 2b stores the multiplexed data on a recording medium such as a DVD to produce package media. The package media reaches viewers through the distribution system 3b. The package media playback device 4b reads the multiplexed data stored on the package media and plays back the video content with balloon subtitles.

The creation device, transmission device, playback device, and providing system for video content with balloon subtitles according to the present invention, and the data structure and recording medium used in them, can provide video content in which the relationship between the speaker and the subtitle is easy to understand and the screen as a whole is easy to view, and are useful in fields such as content creation.

Block diagram showing the overall configuration of a broadcasting system for broadcasting video content with subtitles using balloons according to an embodiment of the present invention
Block diagram showing the functional configuration of the content creation device 1
Diagram showing an example of the data structure of subtitle list data
Diagram showing an example of the data structure of balloon data
Block diagram showing the functional configuration of the content transmission device 2
Block diagram showing the functional configuration of the content playback device 4
Block diagram showing the functional configuration of the content display device 5
Flowchart showing the operation of the content creation device 1
Diagram showing an example of display content on the content creation device 1
Diagram showing an example of display content on the content creation device 1
Diagram showing an example of display content on the content creation device 1
Diagram showing an example of display content on the content creation device 1
Diagram showing an example of the balloon data finally created
Flowchart showing the operation of the content transmission device 2
Flowchart showing the operation of the content playback device 4
Diagram showing an example of an image based on the video signal created by the content playback device 4
Diagram showing an example of the content of an image based on the balloon signal created by the content playback device 4
Diagram showing an example of an image based on the subtitle character signal created by the content playback device 4
Diagram showing an example of an image based on the subtitle character signal created by the content playback device 4
Diagram showing the operation of the synthesis transfer unit 43 of the content playback device 4
Diagram showing a display example on the content display device 5
Diagram showing a display example on the content display device 5
Diagram showing the overall configuration of a system for providing content data and balloon data via the Internet
Diagram showing the overall configuration of a system for storing multiplexed data, in which content data and balloon data are multiplexed, on package media such as a DVD and distributing it

Explanation of symbols

1 Content creation device
2, 2a Content transmission device
2b Package media conversion device
3 Broadcasting device
3a Internet
3b Distribution system
4, 4a Content playback device
4b Package media playback device
5 Content display device
11 Data creation control unit
12 Input unit
13 Display output unit
14, 44 Time count unit
15 Storage unit
21 Multiplexing control unit
22, 42 Operation unit
23 Error correction code adding unit
24 Digital modulation unit
25 Transmission unit
41 Playback control unit
43 Synthesis transfer unit
45 Balloon shape storage unit
46 Reception unit
47 Demodulation unit
48 Error correction unit
51 Display output device unit
52 Drive circuit unit

Claims

A content creation device for creating data necessary to provide video content with subtitles displayed in speech balloons, comprising:
balloon time extracting means for extracting, in video based on video content data serving as original data, a time at which a balloon is to be displayed;
balloon area determining means for determining, in the video at the time extracted by the balloon time extracting means, a balloon area suitable for displaying a balloon;
balloon image determining means for determining a balloon image to be combined with the balloon area determined by the balloon area determining means;
subtitle character determining means for determining subtitle characters to be combined with the balloon image determined by the balloon image determining means; and
balloon data creation means for creating balloon data by converting into data at least one of information about the time at which the balloon is to be displayed, information about the balloon area, information about the balloon image, and information about the subtitle characters,
wherein the balloon data created by the balloon data creation means is reproduced together with the video content data to provide video content with subtitles displayed in speech balloons.

The balloon region determining means detects a change in color tone in the image based on the video content data, extracts a portion having a flat color tone, and sets a frame included in the flat portion as a balloon region,
The content creation apparatus according to claim 1, wherein the balloon image determination unit sets an image having a size that displays the subtitle characters in the frame as a balloon image.

The content creation apparatus according to claim 2, wherein the balloon area determination unit determines the balloon area by changing the extracted frame based on an instruction from a user.

The content creation apparatus according to claim 2, wherein the balloon image determination unit changes the shape of the balloon image based on an instruction from a user.

The content creation device according to claim 2, wherein the subtitle character determination unit changes the subtitle character based on an instruction from a user.

The content creation device according to claim 5, wherein the subtitle character determining means determines whether the number of subtitle characters per unit time during the time in which the speech balloon is to be displayed is equal to or greater than a predetermined number and, if so, notifies the user to that effect.
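The characters-per-unit-time check in this claim can be sketched as follows. The function name, the use of seconds as the unit time, and the example threshold are assumptions made for illustration only.

```python
def exceeds_reading_rate(subtitle: str, duration_seconds: float,
                         max_chars_per_second: float) -> bool:
    """Return True when the subtitle rate is at or above the given limit.

    The "equal to or greater than" comparison follows the claim wording;
    the unit time (one second) is an assumption.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    chars_per_second = len(subtitle) / duration_seconds
    return chars_per_second >= max_chars_per_second


# Hypothetical usage: warn the editor when a balloon is too dense to read.
if exceeds_reading_rate("abcdefghij", duration_seconds=2.0, max_chars_per_second=5.0):
    print("Too many subtitle characters for this balloon's display time")
```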

The content creation device according to claim 2, wherein the subtitle character determination unit changes an attribute of the subtitle character based on an instruction from a user.

The content creation apparatus according to claim 1, further comprising a multiplexing unit that multiplexes the video content data and the balloon data created by the balloon data creation unit.

The content creation device according to claim 8, further comprising multiplexed data transmission means for transmitting data multiplexed by the multiplexing means via a network.

The content creation device according to claim 8, further comprising package media storage means for storing data multiplexed by the multiplexing means in a package medium.

The content creation device according to claim 1, further comprising volume determination means for determining the volume of the sound during reproduction of the video content data,
wherein the subtitle character determining means changes an attribute of the subtitle characters according to the volume determined by the volume determination means.

The content creation apparatus according to claim 1, further comprising face size extracting means for extracting the size of a person's face in the video based on the video content data,
wherein the balloon image determining means determines a start point of the balloon image according to the face size extracted by the face size extracting means.

The content creation apparatus according to claim 1, wherein the video content data is encoded by the MPEG (Moving Picture Experts Group) system, and
the balloon data is described in XML (Extensible Markup Language).
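As a rough illustration of balloon data described in XML, the sketch below serializes the four kinds of information named in claim 1 (time, area, image shape, subtitle characters). The element and attribute names (`balloon`, `time`, `region`, `shape`, `subtitle`) are hypothetical; the claims do not define a schema.

```python
import xml.etree.ElementTree as ET


def build_balloon_xml(start: str, duration: str, x: int, y: int,
                      width: int, height: int, shape: str, text: str) -> str:
    """Serialize one balloon entry to an XML string (assumed element names)."""
    balloon = ET.Element("balloon")
    # Time at which the balloon is displayed, and for how long.
    ET.SubElement(balloon, "time", start=start, duration=duration)
    # Area of the video frame that the balloon occupies.
    ET.SubElement(balloon, "region", x=str(x), y=str(y),
                  width=str(width), height=str(height))
    # Shape of the balloon image and the subtitle characters placed inside it.
    ET.SubElement(balloon, "shape").text = shape
    ET.SubElement(balloon, "subtitle").text = text
    return ET.tostring(balloon, encoding="unicode")


xml_text = build_balloon_xml("00:01:05", "00:00:03", 320, 40, 200, 80,
                             "rounded", "Hello")
print(xml_text)
```

A reproduction apparatus could parse such a document with `ET.fromstring` and read the same fields back.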

A content transmission device for transmitting data necessary for providing video content with subtitles displayed in speech balloons, comprising:
balloon data acquisition means for acquiring balloon data in which at least one of the following is converted into data: information about the time at which a speech balloon is to be displayed in the video based on the video content data serving as the original data, information about the area in which the speech balloon is displayed in the video, information about the shape of the speech balloon in the area, and information about the subtitle characters to be inserted into the speech balloon;
video content data acquisition means for acquiring the video content data;
multiplexing means for multiplexing the balloon data acquired by the balloon data acquisition means and the video content data acquired by the video content data acquisition means; and
transmission means for transmitting the data multiplexed by the multiplexing means.

The content transmission apparatus according to claim 14, wherein the transmission means transmits the multiplexed data to a broadcasting apparatus for wireless broadcasting.

The content transmission apparatus according to claim 14, wherein the transmission means transmits the multiplexed data via a network to a content reproduction apparatus for reproducing the video content data and the balloon data.

A content package media conversion apparatus for packaging data necessary for providing video content with subtitles displayed in speech balloons, comprising:
balloon data acquisition means for acquiring balloon data in which at least one of the following is converted into data: information about the time at which a speech balloon is to be displayed in the video based on the video content data serving as the original data, information about the area in which the speech balloon is displayed in the video, information about the shape of the speech balloon in the area, and information about the subtitle characters to be inserted into the speech balloon;
video content data acquisition means for acquiring the video content data;
multiplexing means for multiplexing the balloon data acquired by the balloon data acquisition means and the video content data acquired by the video content data acquisition means; and
storage means for storing the data multiplexed by the multiplexing means in a package medium.

A content playback device for playing back video content with subtitles displayed in speech balloons, comprising:
balloon data acquisition means for acquiring balloon data in which at least one of the following is converted into data: information about the time at which a speech balloon is to be displayed in the video based on the video content data serving as the original data, information about the area in which the speech balloon is displayed in the video, information about the shape of the speech balloon in the area, and information about the subtitle characters to be inserted into the speech balloon;
video content data acquisition means for acquiring the video content data;
balloon signal generating means for generating a signal related to the image of the balloon based on the balloon data;
subtitle character signal generating means for generating a signal related to the subtitle characters based on the balloon data;
video signal generating means for generating a video signal based on the video content data; and
synthesis transfer means for synthesizing the balloon signal generated by the balloon signal generating means, the subtitle character signal generated by the subtitle character signal generating means, and the video signal generated by the video signal generating means, and transferring the synthesized signal to a display device.

The content reproduction apparatus according to claim 18, further comprising synthesis presence / absence command means for commanding the synthesis transfer means whether to synthesize the balloon signal and the subtitle character signal with the video signal,
wherein the synthesis transfer means transfers the synthesized signal to the display device when it receives, from the synthesis presence / absence command means, a command to synthesize the balloon signal and the subtitle character signal with the video signal, and transfers only the video signal to the display device when it receives a command not to synthesize.

The content reproduction apparatus according to claim 19, further comprising:
volume measuring means for measuring the volume of surrounding sounds; and
volume threshold judgment means for judging whether the volume measured by the volume measuring means exceeds a threshold,
wherein the synthesis presence / absence command means commands the synthesis transfer means whether to perform synthesis in accordance with the judgment result of the volume threshold judgment means.

The content reproduction apparatus according to claim 20, wherein the synthesis presence / absence command means, when the volume threshold judgment means judges that the volume of surrounding sounds does not exceed the threshold, gives a command to synthesize to the synthesis transfer means and further causes an audio output apparatus, which outputs the sound, not to output audio.

21. The composition presence / absence command means, when the volume threshold value judgment means judges that the volume of surrounding sounds exceeds the threshold value, gives a command to synthesize to the composition transfer means. The content reproduction device described.
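The two volume-based behaviors recited above (synthesize and silence the audio when the surroundings are quiet; synthesize when the surroundings are noisy) can be sketched as two separate policies. The names `PlaybackCommand`, `command_when_quiet`, and `command_when_noisy` are hypothetical, and the fallback branches, as well as the audio state in the noisy policy, are assumptions because the claims do not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackCommand:
    synthesize: bool    # overlay balloon and subtitle signals on the video
    output_audio: bool  # let the audio output apparatus play sound


def command_when_quiet(ambient_volume: float, threshold: float) -> PlaybackCommand:
    """Quiet-surroundings policy: show balloons and silence the audio."""
    if ambient_volume <= threshold:  # volume "does not exceed" the threshold
        return PlaybackCommand(synthesize=True, output_audio=False)
    # Assumption: otherwise play the plain video with sound.
    return PlaybackCommand(synthesize=False, output_audio=True)


def command_when_noisy(ambient_volume: float, threshold: float) -> PlaybackCommand:
    """Noisy-surroundings policy: show balloons when speech is hard to hear."""
    if ambient_volume > threshold:  # volume "exceeds" the threshold
        # Assumption: audio keeps playing; the claim only requires synthesis.
        return PlaybackCommand(synthesize=True, output_audio=True)
    # Assumption: otherwise play the plain video with sound.
    return PlaybackCommand(synthesize=False, output_audio=True)


print(command_when_quiet(30.0, 40.0))
print(command_when_noisy(70.0, 40.0))
```

Keeping the two policies as separate functions mirrors the fact that they are recited in separate dependent claims rather than as one combined rule.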

The content reproduction apparatus according to claim 19, further comprising moving speed measuring means for measuring the moving speed of the apparatus itself,
wherein the synthesis presence / absence command means determines whether the moving speed measured by the moving speed measuring means exceeds a predetermined threshold and, if it does, gives a command to synthesize to the synthesis transfer means.

The synthesis presence / absence command means instructs the synthesis transfer means to synthesize the balloon signal and the subtitle character signal with the video signal in accordance with an instruction from a user. The content reproduction apparatus described in 1.

The content reproduction apparatus according to claim 18, wherein the subtitle character signal generating means generates, based on the balloon data and in accordance with an instruction from the user, a normal subtitle character signal for displaying the subtitle characters at the top, bottom, left, or right of the screen, and
when the subtitle character signal generating means generates the normal subtitle character signal, the synthesis transfer means synthesizes only the normal subtitle character signal and the video signal and transfers the synthesized signal to the display device.

The content reproduction apparatus according to claim 18, wherein the synthesis transfer unit synthesizes the speech balloon signal, the caption character signal, and the video signal for each frame.

The content reproduction apparatus according to claim 18, further comprising display means for displaying the synthesized video based on the synthesized signal transferred from the synthesized transfer means.

A computer-readable recording medium on which is recorded data having a structure for causing a computer device to display video content with subtitles in speech balloons, the data comprising:
a structure for storing information about a time at which a speech balloon is to be displayed in the video based on the video content data serving as the original data;
a structure for storing, in correspondence with the information about the time, information about a region in which the balloon is displayed in the video;
a structure for storing, in correspondence with the information about the time, information about the shape of the balloon in the region; and
a structure for storing, in correspondence with the information about the time, information about the subtitle characters to be inserted into the balloon.

The recording medium according to claim 28, wherein the structure for storing information about the time comprises:
a structure for storing information indicating the start time of a subtitle; and
a structure for storing information indicating the duration of the subtitle.

A data structure for causing a computer device to display video content with subtitles in speech balloons, comprising:
a structure for storing information about a time at which a speech balloon is to be displayed in the video based on the video content data serving as the original data;
a structure for storing, in correspondence with the information about the time, information about a region in which the balloon is displayed in the video;
a structure for storing, in correspondence with the information about the time, information about the shape of the balloon in the region; and
a structure for storing, in correspondence with the information about the time, information about the subtitle characters to be inserted into the balloon.
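One way to picture the claimed data structure is as a record that keys the region, shape, and subtitle characters to a display time. The sketch below is only illustrative: the class `BalloonEntry`, its field names, the use of seconds, and the helper `active_balloons` are assumptions, with the time stored as a start time plus a duration as in claim 29.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BalloonEntry:
    start_seconds: float               # start time of the subtitle
    duration_seconds: float            # how long the subtitle stays on screen
    region: Tuple[int, int, int, int]  # (x, y, width, height) in the video frame
    shape: str                         # balloon shape used in that region
    subtitle: str                      # characters inserted into the balloon


def active_balloons(entries: List[BalloonEntry],
                    playback_seconds: float) -> List[BalloonEntry]:
    """Return the entries whose display interval contains the playback time.

    The interval is treated as [start, start + duration), an assumed convention.
    """
    return [e for e in entries
            if e.start_seconds <= playback_seconds < e.start_seconds + e.duration_seconds]


entries = [
    BalloonEntry(1.0, 2.0, (320, 40, 200, 80), "rounded", "Hi"),
    BalloonEntry(5.0, 1.0, (100, 60, 180, 70), "spiky", "Hey!"),
]
print([e.subtitle for e in active_balloons(entries, 2.5)])
```

A reproduction apparatus could call such a lookup once per frame to decide which balloons and subtitle characters to synthesize with the video.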

A content providing system for providing video content with subtitles displayed in speech balloons, comprising:
a content creation device that creates balloon data in which at least one of the following is converted into data: information about the time at which a speech balloon is to be displayed in the video based on the video content data serving as the original data, information about the area in which the speech balloon is displayed in the video, information about the shape of the speech balloon in the area, and information about the subtitle characters to be inserted into the speech balloon;
content providing means for multiplexing the balloon data created by the content creation device with the video content data and providing the multiplexed data as video content; and
a content reproduction device that reproduces video content with subtitles in speech balloons based on the multiplexed data provided from the content providing means.

The content providing system according to claim 31, wherein the content providing means transmits the multiplexed data to the content reproduction device by wireless broadcasting.

The content providing system according to claim 31, wherein the content providing means transmits the multiplexed data to the content reproduction device by network distribution.

The content providing system according to claim 31, wherein the content providing means provides the multiplexed data to the content reproduction device via a package medium.