JP2005210196A

JP2005210196A - Information processing apparatus, and information processing method

Info

Publication number: JP2005210196A
Application number: JP2004011977A
Authority: JP
Inventors: Taro Takita; 太郎滝田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2004-01-20
Filing date: 2004-01-20
Publication date: 2005-08-04

Abstract

PROBLEM TO BE SOLVED: To attach additional information that matches the contents of video/audio data, with greater versatility than the prior art.
SOLUTION: A translation server receives video/audio data streamed from a base station apparatus 1 via the Internet and generates translation metadata as additional information for the received data. The translation metadata are obtained by voice recognition of the audio signal data in the received video/audio data, and therefore match its contents. The translation metadata are then attached to the received video/audio data, which is transmitted to a monitor apparatus as streaming data.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

本発明は、例えば通信網上に接続され、受信取得した映像／音声データについての情報処理を行うための情報処理装置、及び情報処理方法に関するものである。 The present invention relates to an information processing apparatus that is connected to, for example, a communication network and performs information processing on video/audio data it has received, and to a corresponding information processing method.

先に本出願人は、例えばＡＶ(Audio Video)機器として、基地局装置とモニタ装置とを無線通信により接続するように構成した情報送受信システムを提案している。 The present applicant has previously proposed an information transmission / reception system configured to connect a base station apparatus and a monitor apparatus by wireless communication, for example, as AV (Audio Video) equipment.

この情報送受信システムの基地局装置は、例えばテレビジョン放送のチューナを備えることで、テレビジョン放送信号を受信選局して映像／音声信号に復調することができるようになっている。また、例えば映像／音声信号入力端子などを備えることで外部映像／音声ソースの入力機能を有する。そして、例えばこのようにして得られた映像／音声情報を圧縮符号化した映像／音声データの形式に変換して無線により送信出力することが可能とされる。 The base station apparatus of this information transmission / reception system includes a television broadcast tuner, for example, so that it can receive and select a television broadcast signal and demodulate it into a video / audio signal. For example, an external video / audio source input function is provided by providing a video / audio signal input terminal. Then, for example, the video / audio information obtained in this way can be converted into a compressed / encoded video / audio data format and transmitted and output wirelessly.

モニタ装置は、例えば室内において持ち歩きが可能な程度の小型のサイズを有しているものとされる。そして、基地局装置から無線送信されてくる映像／音声データをデコードして、画像表示及び音声出力を行うようにされる。 The monitor device is assumed to have a small size that can be carried around indoors, for example. Then, the video / audio data wirelessly transmitted from the base station apparatus is decoded to perform image display and audio output.

このような情報送受信システムであれば、ユーザは、例えば室内においてモニタ装置を持ち歩き、任意の場所に設置することができる。つまり、通信可能な範囲内である限りは、自由な場所でテレビジョン放送や、基地局装置に入力される映像／音声情報などのコンテンツ情報を視聴することができる。
ここで、例えばテレビジョン放送の選局や映像／音声ソースの選択などをはじめとした、各種の基地局装置側をコントロールするための操作も、モニタ装置側で行えるようになっており、その操作情報が基地局装置に送信されるようになっている。基地局装置では、受信した操作情報に応じて、例えば選局チャンネルが切り換わるようにチューナに対する制御を実行したり、また、映像／音声ソースの入力切り換えを行うようにされる。 With such an information transmission / reception system, the user can carry the monitor device indoors, for example, and install it at an arbitrary location. That is, as long as it is within a communicable range, it is possible to view television broadcasts and content information such as video / audio information input to the base station apparatus at any place.
Operations for controlling the base station apparatus, such as selecting a television channel or a video/audio source, can also be performed on the monitor device, and the resulting operation information is transmitted to the base station apparatus. In accordance with the received operation information, the base station apparatus, for example, controls the tuner so that the selected channel is switched, or switches the video/audio source input.

このような情報送受信システムは、例えば映像／音声ソースを入力、取得する装置（基地局装置）と、この映像／音声を出力するモニタ装置とが別体化された構成のものとしてみることができる。そして、本来は、上記もしているように、映像／音声の伝送を無線で行うことで、例えば家屋内程度の比較的狭い通信可能範囲ではあるが、ユーザがコンテンツ情報を視聴、鑑賞する場所の自由度を与えるようにしているものである。 Such an information transmission / reception system can be viewed as a configuration in which the device that inputs and acquires a video/audio source (the base station device) is separate from the monitor device that outputs the video/audio. As described above, its original purpose is to transmit video/audio wirelessly so that, although the communicable range is relatively narrow (for example, within a house), the user has freedom in choosing where to view the content information.

そのうえで、このような情報送受信システムにおいて、基地局装置とモニタ装置との両者にインターネット、ＬＡＮ(Local Area Network)などのネットワーク接続機能を有させるとする。これにより、基地局装置とモニタ装置との間で、上記した映像／音声などのコンテンツ情報を、インターネット経由で送受信させることが可能となる。 In addition, in such an information transmission / reception system, it is assumed that both the base station apparatus and the monitor apparatus have a network connection function such as the Internet or a LAN (Local Area Network). As a result, the content information such as video / audio described above can be transmitted and received between the base station apparatus and the monitor apparatus via the Internet.

インターネット経由による通信の場合には、インターネットに接続可能な環境でありさえすれば、通信可能範囲の限定はなくなる。そこで、次のような情報送受信システムの使用を考えることができる。
つまり、先ず、例えば基地局装置を、ユーザがモニタ装置を利用している地域とは遠隔した地域に設置することとする。例としては、基地局装置を、モニタ装置を利用している地域を含むテレビジョン放送のサービスエリアとは全く異なるサービスエリア内に設置するようにされる。そのうえで、基地局装置とモニタ装置は、映像／音声データの送受信と共に、コマンドの送受信も可能なように構成することとする。
このようにすれば、ユーザは、インターネット経由で、基地局装置をコントロールして、その基地局装置で受信復調したテレビジョン放送の映像／音声データをモニタ装置に送信させ、モニタ装置により画像／音声を出力させることができる。つまり、例えば、ユーザが現にモニタ装置を使用している地域では本来受信することができないテレビジョン放送を、その地域に居ながらにして視聴することが可能になる。 In the case of communication via the Internet, as long as the environment is connectable to the Internet, the communication range is not limited. Therefore, use of the following information transmission / reception system can be considered.
That is, first, for example, the base station device is installed in a region remote from the region where the user uses the monitor device. As an example, the base station apparatus is installed in a service area that is completely different from the television broadcast service area including the area where the monitor device is used. In addition, the base station apparatus and the monitor apparatus are configured to be able to transmit / receive commands as well as transmit / receive video / audio data.
In this way, the user can control the base station device via the Internet, have it transmit the video/audio data of the television broadcast it has received and demodulated to the monitor device, and have the monitor device output the image and sound. In other words, the user can, for example, view a television broadcast that cannot normally be received in the area where the monitor device is being used, without leaving that area.

図８に、上記のようにして、基地局装置とモニタ装置との間でインターネットによる通信を行う場合の映像／音声データの流れを示す。
上記もしているように、インターネット経由での通信は、通信端末間の物理的距離の限定はなく、従って、インターネットに接続可能な環境でありさえすれば、これら通信端末がそれぞれ異なる国にあってもよいものである。そして、基地局装置とモニタ装置との間で映像／音声データを送受信する場合についても、インターネット経由による通信を利用することを前提とすれば、基地局装置とモニタ装置をそれぞれ異なる国に設置することも可能となる。そこで、図８では、基地局装置とモニタ装置とを、それぞれ異なる国に設置していることとする。 FIG. 8 shows a flow of video / audio data when communication is performed between the base station apparatus and the monitor apparatus via the Internet as described above.
As described above, communication via the Internet places no limit on the physical distance between communication terminals, so the terminals may be located in different countries as long as each has an environment that can connect to the Internet. Likewise, when video/audio data is transmitted and received between the base station device and the monitor device over the Internet, the two devices can be installed in different countries. FIG. 8 therefore assumes that the base station apparatus and the monitor apparatus are installed in different countries.

図８に示すようにして、基地局装置１は、Ａ国内においてインターネット１００と接続されている。また、モニタ装置２は、Ｂ国内においてインターネット１００と接続されている。
そして、基地局装置１からは、例えば前述したように、テレビジョン放送を受信して得た映像／音声データ、あるいは外部ＡＶ機器から入力された映像／音声データなどを、ストリーミングデータとしてインターネット１００を経由して、モニタ装置２を送信先として送信するようにされる。
このようにして送信された映像／音声データは、インターネット経由でモニタ装置２により受信される、そして、モニタ装置２では、この受信取得した映像／音声データについて、例えば再生時間軸が同期した状態で、映像データについては画像として表示出力させると共に、音声データについては、例えばスピーカなどから音声として出力させる。 As shown in FIG. 8, the base station apparatus 1 is connected to the Internet 100 in country A, and the monitor device 2 is connected to the Internet 100 in country B.
As described above, the base station apparatus 1 transmits, for example, video/audio data obtained by receiving a television broadcast, or video/audio data input from an external AV device, as streaming data via the Internet 100 with the monitor device 2 as the destination.
The video/audio data transmitted in this way is received by the monitor device 2 via the Internet. The monitor device 2 then outputs the received video/audio data with, for example, the playback time axes synchronized: the video data is displayed as an image, and the audio data is output as sound from, for example, a speaker.

ここで、実際に上記のようにして、基地局装置１とモニタ装置２とをそれぞれ、Ａ国とＢ国という異なる国で使用するとした場合において、例えば基地局装置１の仕様として、内蔵するテレビジョン放送チューナをＡ国の放送に対応したものとすれば、基地局装置１は、このＡ国のテレビジョン放送を受信して得た映像／音声データをモニタ装置２に対して送信できることになる。
また、Ａ国に設置される基地局装置１に対して接続されるＡＶ機器としても、通常はＡ国内で使用されているものであると考えられるから、例えばこのようなＡＶ機器側が対応する映像／音声ソフトとしても、そのＡ国で鑑賞されることを前提としたものが多く得られることとなる。そして、このような映像／音声ソフトをソースとする映像／音声データについても、基地局装置１側からモニタ装置２に送信することが可能とされる。
つまり、モニタ装置２を使用するユーザは、Ｂ国に居ながらにしてＡ国のテレビジョン放送やＡ国の映像／音声ソフトなどを視聴することが可能になる。 Here, when the base station device 1 and the monitor device 2 are actually used in different countries, country A and country B, as described above, and the television broadcast tuner built into the base station device 1 is specified to support the broadcasts of country A, the base station apparatus 1 can transmit the video/audio data obtained by receiving the television broadcasts of country A to the monitor apparatus 2.
Moreover, the AV equipment connected to the base station apparatus 1 installed in country A is normally equipment used in country A, so much of the video/audio software that such equipment can play is likewise intended for viewing in country A. Video/audio data sourced from such video/audio software can also be transmitted from the base station apparatus 1 to the monitor apparatus 2.
That is, the user who uses the monitor device 2 can view the television broadcast of country A, the video / audio software of country A, etc. while staying in country B.

特開２００１−１０１１９０号公報JP 2001-101190 A

ところで、上記したようなシステムを実際に使用するのにあたっては、例えばユーザが享受できる利便性、娯楽性などを考慮すると、上記したように単に映像／音声データを送受信するのみではなく、これに対してさらに何らかの付加価値が与えられることが好ましい。この点について、再度、図８のシステムを参照して、１つの具体例を挙げてみる。 When a system such as the one described above is actually used, considering, for example, the convenience and entertainment value offered to the user, it is preferable not merely to transmit and receive video/audio data as described above but also to give that data some added value. A specific example is given below, referring again to the system of FIG. 8.

ここで、上記図８において基地局装置１が使用されるＡ国と、モニタ装置２が使用されるＢ国とでは、それぞれの使用言語が言語Ａと言語Ｂで異なっているものとする。
この場合において、最も分かりやすい例として、基地局装置１側にて受信するテレビジョン放送は、Ａ国の国民が視聴することを前提としているので、その番組のほとんどは言語Ａを使用して制作されていることになる。つまり、言語的には、言語Ａを使用できる人が理解できるコンテンツとなっているものである。
図８に示す通信によっては、このようなコンテンツ内容の映像／音声データが、Ａ国内の基地局装置１から、Ｂ国内のモニタ装置２に対して送信されることになる。Ｂ国内においてモニタ装置２を使用しているユーザは、例えばＢ国の国民であるなど、言語Ｂを使用言語としていることが、一般には想定される。このために、例えばモニタ装置２を使用しているユーザが通常は言語Ｂを使用していて、言語Ｂは使用、理解できないというばあい、モニタ装置２にて視聴しているＡ国のコンテンツの内容も充分に理解できないという状況となる。
つまり、モニタ装置２を使用しているユーザとしては、確かに、他国のコンテンツを手軽に視聴できるという利益は享受できるものの、その使用言語が理解できない場合に、充分にそのコンテンツの内容を理解して楽しみながら鑑賞することができないという不都合が生じることになる。 Here, in FIG. 8, it is assumed that the language A and the language B are different in country A in which the base station device 1 is used and in country B in which the monitor device 2 is used.
In this case, as the most easy-to-understand example, since the television broadcast received at the base station apparatus 1 is assumed to be viewed by the people of country A, most of the programs are produced using language A. Will be. That is, in terms of language, the content can be understood by those who can use the language A.
With the communication shown in FIG. 8, video/audio data with such content is transmitted from the base station device 1 in country A to the monitor device 2 in country B. A user of the monitor device 2 in country B is generally assumed to use language B, for example because the user is a citizen of country B. Consequently, if the user of the monitor device 2 normally uses language B and cannot use or understand language A, the user cannot fully understand the country A content viewed on the monitor device 2.
In other words, although the user of the monitor device 2 does enjoy the benefit of easily viewing content from another country, a user who does not understand the language used cannot fully understand and enjoy that content.

このような不都合を解決するための１つの対策として、例えば、テレビジョン放送の運営側で、映像／音声信号に対して翻訳情報を付加して送出させるようにして、これを受信復調する側の装置構成として、上記翻訳情報を抽出、復調して、所定の出力態様により映像／音声情報と共に再生出力させることが考えられる。
しかしながら、この場合には、テレビジョン放送の送出側において、上記したような翻訳データを作成して映像／音声情報に付加して送出させるための設備、環境を追加しなければならないために、テレビジョン放送の運営側としては相当のコスト負担となってしまう。また、このような設備、環境を備えて、映像／音声情報に翻訳データを付加するサービスを実際に提供したとしても、この場合には、例えば映像／音声情報に対する翻訳データの付加作業は、既に映像／音声ソースとしてあるコンテンツに対する編集としてしか行うことができない。従って、現実問題として、このような作業を、全ての放送番組に対して行うことは不可能であり、例えば、人気のある一部の番組のみに限定するような、小規模な範囲でのサービスとならざるを得ない。特に、例えばスポーツ中継やニュースの番組などのような、いわゆる生番組といわれるものについては、上記した施設環境の場合には、リアルタイム的な処理は不可能であるために、翻訳データを付加するサービスからは除外されてしまう。
この点では、映像／音声ソフトをソースとした映像／音声データについても同様のことがいえる。例えば、ＤＶＤなどの映像／音声ソフトには、字幕などのデータも記録されており、本来の画像／音声と共に、この字幕も合成表示させることができたり、また、言語吹き替えの音声も用意されていたりする。しかしながら、この場合には、ソフトとしてのパッケージメディアに予め記録されている特定言語の字幕、吹き替えの音声しか出力させることができない。
つまり、例えばインターネットなどのネットワーク経由で映像／音声データを送受信するシステムにおいて、映像／音声データに対して例えば翻訳データなどの付加情報を与えることができるようにしてユーザの利益向上を図ろうとしても、現状においては、このようなことが可能な映像／音声ソースは限られてしまっている。 As one countermeasure for solving such inconvenience, for example, on the television broadcast management side, translation information is added to the video / audio signal and transmitted, and this is received and demodulated. As an apparatus configuration, it is conceivable that the translation information is extracted and demodulated and reproduced and output together with video / audio information in a predetermined output mode.
In this case, however, the transmission side of the television broadcast must add facilities and an environment for creating such translation data and attaching it to the video/audio information, which imposes a considerable cost on the broadcaster. Moreover, even if a service that attaches translation data to video/audio information were actually provided with such facilities, the translation data could be added only by editing content that already exists as a video/audio source. In practice, it is therefore impossible to do this work for every broadcast program, and the service would have to remain small in scope, for example limited to a few popular programs. In particular, so-called live programs, such as sports broadcasts and news programs, would be excluded from the service, because such facilities cannot process them in real time.
The same applies to video/audio data sourced from video/audio software. For example, video/audio software such as a DVD also records data such as subtitles, so the subtitles can be composited with the original image and sound, and dubbed audio in other languages may also be provided. In this case, however, only the subtitles and dubbed audio in the specific languages recorded in advance on the package medium can be output.
In other words, even if one tries to benefit users of a system that transmits and receives video/audio data via a network such as the Internet by attaching additional information, such as translation data, to that data, the video/audio sources for which this is currently possible are limited.

そこで本発明は上記した課題を考慮して情報処理装置として次のように構成することとした。
つまり、所定の通信網を経由したストリーミング送信を受信して取得されるもので、少なくとも映像データ及び／又は音声データを含む映像／音声データを入力して、この入力された映像／音声データから認識した所定のコンテンツ内容に基づいて、所定の情報内容を有する付加情報を生成する付加情報生成手段と、付加情報を入力された映像／音声データに付加して、所定の通信網を経由して特定の端末装置に対してストリーミング送信すべき送信用映像／音声データを生成する映像／音声データ生成手段とを備えて構成することとした。 In view of the above problems, the present invention is configured as an information processing apparatus as follows.
That is, the apparatus comprises: additional information generating means that receives video/audio data obtained by receiving a streaming transmission via a predetermined communication network and containing at least video data and/or audio data, and that generates additional information having predetermined information content based on predetermined content recognized from the input video/audio data; and video/audio data generating means that adds the additional information to the input video/audio data to generate transmission video/audio data to be streamed to a specific terminal device via a predetermined communication network.

また、所定の通信網を経由したストリーミング送信を受信して取得されるもので、少なくとも映像データ及び／又は音声データを含む映像／音声データを入力して、この入力された映像／音声データから認識した所定のコンテンツ内容に基づいて、所定の情報内容を有する付加情報を生成する付加情報生成処理と、付加情報を入力された映像／音声データに付加して、所定の通信網を経由して特定の端末装置に対してストリーミング送信すべき送信用映像／音声データを生成する映像／音声データ生成処理とを実行するようにして、情報処理方法を構成することとした。 Also, it is obtained by receiving streaming transmission via a predetermined communication network. At least video / audio data including video data and / or audio data is input and recognized from the input video / audio data. Additional information generation processing for generating additional information having the predetermined information content based on the predetermined content content, and adding the additional information to the input video / audio data and specifying it via a predetermined communication network The information processing method is configured to execute the video / audio data generation processing for generating video / audio data for transmission to be transmitted to the terminal device.

上記各構成によっては、所定の通信網経由でストリーミング送信されてきた映像／音声データを入力して、この入力された映像／音声データについての付加情報を生成するようにされる。この付加情報は、重力された映像／音声データから認識したコンテンツ内容に基づいて所定の情報内容を有するものとされる。従って、付加情報の内容は、ストリーミングデータとして入力された映像／音声データのコンテンツ内容に適応して決まるものとなる。
そして、上記のようにして生成した付加情報を、元の入力された映像／音声データに付加して送信用映像／音声データを生成し、ストリーミングデータとして送信するようにされる。ここで、入力された映像／音声データがストリーミングデータとして送信されてきたものであり、かつ、付加情報を付加した送信用映像／音声データもストリーミングデータとして送信するということは、送信用映像／音声データを送信出力するのにあたり、元の入力された映像／音声データが有していた時間的連続性は保たれているということになる。
つまり本発明としては、例えばストリーミングデータとして入力された映像／音声データについて、例えば一時保存などは行わない、リアルタイム的な処理によって、映像／音声データに対して、そのコンテンツ内容に応じた所定内容の付加情報を生成し、これを付加して送出しているということになる。 With each of the above configurations, video/audio data streamed via a predetermined communication network is input, and additional information about the input video/audio data is generated. This additional information has predetermined information content based on the content recognized from the input video/audio data. The additional information is therefore determined in accordance with the content of the video/audio data input as streaming data.
The additional information generated in this way is then added to the originally input video/audio data to generate transmission video/audio data, which is transmitted as streaming data. Because the input video/audio data arrived as streaming data and the transmission video/audio data with the additional information attached is also sent as streaming data, the temporal continuity of the originally input video/audio data is preserved when the transmission video/audio data is output.
In other words, the present invention processes video/audio data input as streaming data in real time, without, for example, temporarily storing it: it generates additional information with predetermined content corresponding to the content of the video/audio data, attaches it, and sends the result out.
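The receive, annotate, and retransmit flow described above can be sketched as a generator pipeline. This is only an illustrative sketch, not the implementation disclosed in the specification: the `Chunk` structure, the `annotate_stream` function, and the `fake_recognizer` placeholder are all hypothetical names introduced here, and a real system would substitute an actual recognition engine.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Chunk:
    """One unit of streamed video/audio data (hypothetical structure)."""
    timestamp_ms: int                 # position on the playback time axis
    video: bytes
    audio: bytes
    metadata: Optional[str] = None    # additional information, if attached


def annotate_stream(
    chunks: Iterable[Chunk],
    recognize: Callable[[bytes], str],
) -> Iterator[Chunk]:
    """Attach additional information to each chunk as it arrives.

    Chunks are yielded one at a time in arrival order, so the stream is
    never stored as a whole and its temporal order is preserved.
    """
    for chunk in chunks:
        info = recognize(chunk.audio)          # derive info from the content
        yield replace(chunk, metadata=info)    # original data + additional info


def fake_recognizer(audio: bytes) -> str:
    """Placeholder standing in for a real recognition engine (assumption)."""
    return f"[{len(audio)} bytes of speech]"


if __name__ == "__main__":
    incoming = [Chunk(0, b"v0", b"aaaa"), Chunk(40, b"v1", b"bb")]
    for out in annotate_stream(incoming, fake_recognizer):
        print(out.timestamp_ms, out.metadata)
```

Because the function is a generator, each output chunk is available as soon as its input chunk has been processed, which mirrors the requirement that the data be handled without being stored first.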

例えば、従来までの考え方として、映像／音声データに付加情報を付加しようとした場合には、予め、既にあるとされる映像／音声データとしてのソースに対して付加情報を付加することとしていた。このために、現実問題として、付加情報を付加して提供可能な映像／音声データのソースを多く作成することは困難であった。
これに対して本発明では、ストリーミングデータとして送信されてきた映像／音声データについて、リアルタイム性を持って、そのコンテンツ内容に適応した内容の付加情報を作成、付加して再送出するようにされている。従って、処理対象として入力された映像／音声データであれば、そのコンテンツ内容等にかかわらず、付加情報を付加して送出させることが可能となる。換言すれば、処理対象として入力されてくる映像／音声データについては、全て、適切な付加情報を付加して送出する可能になる。つまり、処理対象として入力されてくる限り、映像／音声データに付加情報を付加することの制限は無いものであり、これまでよりも高い汎用性が得られることになる。さらに、例えば放送を送出する側や、映像／音声ソースのパッケージメディア制作者などの、映像／音声データを提供する側にとっては、付加情報を作成して映像／音声データに付加するための設備環境を整える必要がないという利益も得られることになる。 For example, as a conventional way of thinking, when additional information is to be added to video / audio data, the additional information is previously added to a source as existing video / audio data. For this reason, as a real problem, it has been difficult to create many video / audio data sources that can be provided with additional information.
In contrast, the present invention creates additional information adapted to the content of video/audio data transmitted as streaming data, attaches it in real time, and retransmits the data. Any video/audio data input as a processing target can therefore be sent out with additional information attached, regardless of its content. In other words, as long as the data is input as a processing target, there is no restriction on attaching additional information to it, and greater versatility than before is obtained. A further benefit is that those who provide the video/audio data, such as broadcasters and producers of packaged video/audio media, do not need to set up facilities for creating additional information and attaching it to the video/audio data.

以下、本発明を実施するための最良の形態（以下、実施の形態という）についての説明を行っていくこととする。
ここで、本実施の形態としては、例えば家屋などの屋内で使用されることを前提とし、基地局装置とモニタ装置とによりインターネット経由で情報送受信を行うことのできる情報送受信システムと、これらインターネット上で、上記基地局装置からモニタ装置に対してコンテンツデータ（映像／音声データ）を送信するときに介在するようにされた翻訳サーバとから成るネットワークシステムを例に挙げることとする。 Hereinafter, the best mode for carrying out the present invention (hereinafter referred to as an embodiment) will be described.
The present embodiment takes as an example a network system made up of an information transmission / reception system, which is intended for indoor use in, for example, a house and in which a base station device and a monitor device can exchange information via the Internet, and a translation server that sits between them on the Internet when content data (video/audio data) is transmitted from the base station apparatus to the monitor apparatus.

先ず、図１により、実施の形態のネットワークシステムにおいて、上記のようにして基地局装置とモニタ装置の組から成る情報送受信システムの構成概念を説明しておく。
本実施の形態の情報送受信システムは、図１に示される基地局装置１とモニタ装置２から成るもので、例えば家庭の屋内で用いられる。この基地局装置１とモニタ装置２は、以降説明するように、相互に通信を行うことが可能とされている。 First, the configuration concept of an information transmission / reception system including a set of a base station apparatus and a monitor apparatus as described above in the network system of the embodiment will be described with reference to FIG.
The information transmission / reception system according to the present embodiment includes the base station apparatus 1 and the monitor apparatus 2 shown in FIG. 1 and is used, for example, indoors at home. The base station device 1 and the monitor device 2 can communicate with each other as will be described later.

基地局装置１は、例えば家庭の屋内にてしかるべき場所に対して固定的に設置される。そして、テレビジョン放送を受信選局して復調して映像／音声情報を得る機能を有している。
このため、基地局装置１は、テレビジョン放送受信機能に対応してテレビジョン放送受信用のアンテナANTを接続可能とされている。そして、アンテナANTより受信した放送信号について選局及び復調を行ってテレビジョン信号としての映像／音声情報を得る。そして、このテレビジョン信号については、所定方式により圧縮符号化された映像／音声データに変換する。 The base station apparatus 1 is fixedly installed at an appropriate place in a home, for example. It has a function of receiving and selecting a television broadcast and demodulating it to obtain video / audio information.
For this reason, the base station apparatus 1 can be connected to an antenna ANT for receiving a television broadcast corresponding to the television broadcast receiving function. Then, the broadcast signal received from the antenna ANT is selected and demodulated to obtain video / audio information as a television signal. The television signal is converted into video / audio data that has been compression-encoded by a predetermined method.

また、本実施の形態の基地局装置１は、例えばビデオ入力端子も備えており、例えば外部ＡＶ機器からこのビデオ入力端子に入力されたビデオ／オーディオ信号についても映像／音声情報として取得して、上記と同様に、圧縮符号化した映像／音声データに変換可能とされている。 In addition, the base station apparatus 1 of the present embodiment also includes, for example, a video input terminal. For example, a video / audio signal input to the video input terminal from an external AV device is acquired as video / audio information, Similarly to the above, it can be converted into compressed / encoded video / audio data.

さらに基地局装置１は、インターネット（ネットワーク）接続機能も有している。これにより、基地局装置１は、インターネット経由で、所要の端末にアクセスして通信を行うことが可能とされる。 Furthermore, the base station apparatus 1 also has an Internet (network) connection function. Thereby, the base station apparatus 1 can access and communicate with a required terminal via the Internet.

そして、基地局装置１からは、上記のようにして得た圧縮符号化された映像／音声データを、先ず、無線により電波として送信出力可能とされている。つまり、基地局装置１では、受信選局したテレビジョン放送の画像、ＡＶ機器から入力された映像／音声、及びインターネット画像を含むインターフェイス画像としての画像情報などの各種コンテンツ情報を、無線送信可能とされている。また、画像情報以外の各種データも無線により送信出力可能とされる。そして、このようにして基地局装置１から送信された情報は、次に説明するモニタ装置２側で受信できるようになっている。
また、基地局装置１とモニタ装置２は、この無線通信によりコマンドの送受信を行うことも可能とされている。これにより、モニタ装置２により基地局装置１をコントロールしたり、また逆に、基地局装置１からモニタ装置２をコントロールすることもできる。 The base station apparatus 1 can first transmit and output the compression-coded video / audio data obtained as described above as radio waves. That is, the base station apparatus 1 can wirelessly transmit various content information such as television broadcast images received and selected, video / audio input from AV equipment, and image information as interface images including Internet images. Has been. Various data other than image information can be transmitted and output wirelessly. The information transmitted from the base station apparatus 1 in this way can be received by the monitor apparatus 2 described below.
In addition, the base station device 1 and the monitor device 2 can transmit and receive commands through this wireless communication. Thereby, the base station apparatus 1 can be controlled by the monitor apparatus 2, and conversely, the monitor apparatus 2 can be controlled from the base station apparatus 1.

ここで、上記した無線通信は、例えば実際には、約３０ｍの範囲内での通信可能距離とする仕様となっている。つまり、この無線通信は、基地局装置１とモニタ装置２とが例えば同じ家屋内のような比較的至近の距離にある使用環境を前提とした場合に採用されるべきものとなる。そこで、以降においては、この無線通信について、例えばインターネットなどの通信と比較した場合には、通信距離が限定されたローカル的な通信であることを理由に、ローカル通信ともいうことにする。 Here, the above-described wireless communication is, for example, actually specified as a communicable distance within a range of about 30 m. That is, this wireless communication should be employed when the base station apparatus 1 and the monitor apparatus 2 are based on a usage environment in a relatively close distance such as the same house. Therefore, hereinafter, this wireless communication is also referred to as local communication because it is local communication with a limited communication distance when compared with communication such as the Internet.

モニタ装置２は、例えばユーザが屋内にて持ち歩き可能なように配慮されたサイズ形状を有しているものとされる。
このモニタ装置２では、上記のようにして基地局装置１から電波として無線送信された信号を受信して内部に入力することができる。例えば、入力した受信信号が圧縮符号化形式の映像／音声データによるコンテンツ情報であるときには、復号処理を施して映像／音声データを得る。 The monitor device 2 is assumed to have a size and shape so that the user can carry it indoors, for example.
The monitor device 2 can receive a signal wirelessly transmitted as a radio wave from the base station device 1 as described above and input it to the inside. For example, when the input received signal is content information based on video / audio data in a compression encoding format, video / audio data is obtained by performing a decoding process.

そして、モニタ装置２は、例えばＬＣＤ（Liquid Crystal Display）などの表示デバイスにより構成される表示部２７を備えており、上記のようにして得たコンテンツ情報における映像情報を表示部２７に画像として表示させる。つまり、モニタ装置２では、基地局装置１側で受信選局したテレビジョン放送の映像／音声情報や、ＡＶ機器から入力取得した映像／音声情報と、及びインターネット画像を含むインターフェイス画像を表示出力することができるようになっている。また、スピーカ２９を含む音声情報の出力機能も備えることで、例えば映像／音声情報の音声や、インターフェイス画像に対する操作などに対応した音声も出力可能とされている。 The monitor device 2 includes a display unit 27 configured by a display device such as an LCD (Liquid Crystal Display), for example. The video information in the content information obtained as described above is displayed on the display unit 27 as an image. Let That is, the monitor device 2 displays and outputs the television broadcast video / audio information received and selected on the base station device 1 side, the video / audio information input and acquired from the AV device, and the interface image including the Internet image. Be able to. Further, by providing an audio information output function including the speaker 29, for example, audio of video / audio information and audio corresponding to an operation on an interface image can be output.

また、表示部２７としての表示部位に対しては、タッチパネル３０ａが取り付けられており、このタッチパネル３０ａに対する操作を検出することによって操作情報を発生させるようにしている。
このタッチパネル３０ａに対する操作情報は、必要に応じて、無線によってアンテナ２６から基地局装置１に対して送信するようにされ、基地局装置１では、受信した情報に基づいて、所要の制御処理を実行するようにされる。このような送受信を含む動作が行われることによって、テレビジョン受像機のモニタとしての機能とインターネット機能との切り換え行ったり、また、テレビジョン放送のチャネル選択、外部映像／音声ソースの選択などを行ったりすることが可能とされる。 In addition, a touch panel 30a is attached to a display portion as the display unit 27, and operation information is generated by detecting an operation on the touch panel 30a.
The operation information for the touch panel 30a is wirelessly transmitted from the antenna 26 to the base station apparatus 1 as necessary, and the base station apparatus 1 executes a required control process based on the received information. To be done. By performing such operations including transmission and reception, the television receiver can be switched between the monitor function and the Internet function, the television broadcast channel can be selected, and the external video / audio source can be selected. It is possible to do.
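As a rough illustration of how operation information from the monitor device might drive control processing on the base station side, the sketch below dispatches on a small command dictionary. The command format, the field names, and the `BaseStation` class are assumptions made for this example; the specification does not define a message format.

```python
from dataclasses import dataclass


@dataclass
class BaseStation:
    """Minimal stand-in for the controllable state of the base station."""
    mode: str = "tv"         # "tv" (monitor function) or "internet"
    channel: int = 1         # currently selected broadcast channel
    source: str = "tuner"    # "tuner" or "external" (video input terminal)

    def handle(self, command: dict) -> None:
        """Apply one piece of operation information received from the monitor."""
        op = command.get("op")
        if op == "select_channel":
            self.channel = int(command["channel"])
        elif op == "select_source":
            self.source = command["source"]
        elif op == "switch_mode":
            self.mode = command["mode"]
        else:
            # Unknown operations are rejected rather than silently ignored.
            raise ValueError(f"unknown operation: {op!r}")


if __name__ == "__main__":
    station = BaseStation()
    station.handle({"op": "select_channel", "channel": 8})
    station.handle({"op": "select_source", "source": "external"})
    print(station)
```

Keeping the dispatch in one place means the same handler could serve commands arriving over either transport, since only the delivery path differs.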

このように、基地局装置１は、テレビジョン放送や外部ＡＶ機器などから映像／音声情報を取得し、さらにこれらの取得したコンテンツ情報を送信出力するというように、周囲から取得可能なコンテンツ情報をモニタ装置に伝送するためのインターフェイスとしての機能を有している。
また、モニタ装置２は、基地局装置１が取得した映像／音声情報を画像、音声によってユーザに提示し、また、当該システムに対するユーザの操作入力を受け付けるユーザインターフェイスとしての機能を有している。 In this way, the base station apparatus 1 acquires content information that can be acquired from the surroundings, such as acquiring video / audio information from a television broadcast or an external AV device, and further transmitting and transmitting the acquired content information. It has a function as an interface for transmission to the monitor device.
In addition, the monitor device 2 has a function as a user interface that presents the video / audio information acquired by the base station device 1 to the user by an image and sound, and receives a user operation input to the system.

そして、本実施の形態としては、モニタ装置２についてもインターネット接続機能を与えることとしている。つまり、情報送受信システムとして、基地局装置１及びモニタ装置２は、前述した電波による無線通信であるローカル通信と、このインターネット接続による通信との２つの通信網による情報通信を実行することが可能とされる。そして、本実施の形態としては、インターネット通信によっても、上記ローカル通信と同様にして、基地局装置１からモニタ装置２へのコンテンツ情報の送信と、基地局装置１とモニタ装置２との間でのコマンド送受信が可能なようにされている。 In this embodiment, the monitor device 2 is also given an Internet connection function. That is, as an information transmission / reception system, the base station device 1 and the monitor device 2 can communicate over two communication networks: local communication, which is the radio-wave wireless communication described above, and communication over the Internet connection. In the present embodiment, Internet communication supports, in the same way as local communication, both the transmission of content information from the base station apparatus 1 to the monitor apparatus 2 and the exchange of commands between the base station apparatus 1 and the monitor apparatus 2.

Because the information transmission/reception system of the present embodiment (base station apparatus 1 and monitor apparatus 2) has a function of connecting to the Internet (network) as described above, the Internet can also be used for purposes other than data communication between the base station apparatus 1 and the monitor apparatus 2.
For example, it is possible to access Web sites on the Internet to browse Web content, and to send and receive e-mail.
For this purpose, application software for browsing Web sites, and application software such as a so-called mailer having an e-mail transmission/reception function (here also referred to as network applications), may be installed in the information transmission/reception system of the present embodiment.
In the present embodiment, since both the base station apparatus 1 and the monitor apparatus 2 have the Internet connection function, network applications such as the Web site browsing function and the e-mail transmission/reception function described above can be installed and executed on either the base station apparatus 1 or the monitor apparatus 2.
When a network application is installed in the base station apparatus 1, the network application runs on the base station apparatus 1 side, which performs the access to Web sites and the transmission and reception of e-mail. The base station apparatus 1 then transmits a GUI image of the network application to the monitor apparatus 2. The monitor apparatus 2 is configured to display this GUI image and, for example, to accept operations on it. In response to a GUI operation performed on the monitor apparatus 2 side, command communication takes place between the monitor apparatus 2 and the base station apparatus 1, whereby the operation of the network application on the base station apparatus 1 side is controlled.
If, on the other hand, a network application is installed in the monitor apparatus 2, the network application runs only on the monitor apparatus 2 side. That is, the monitor apparatus 2 by itself can access Web sites, send and receive e-mail, and so on.

In addition to the information transmission/reception system made up of the base station apparatus 1 and the monitor apparatus 2 described above, a translation server is installed on the Internet, and the present embodiment thereby constructs, for example, the network system shown in FIG. 2.
In this case, both the base station apparatus 1 and the monitor apparatus 2 are connected to the Internet 100, so that they can communicate with each other via the Internet to send and receive video/audio data. As is also clear from the description so far, communication via the Internet places no limit on the physical distance between terminals, provided that each is in an environment where it can connect to the Internet. In the present embodiment, therefore, the information transmission/reception system is used with the base station apparatus 1 and the monitor apparatus 2 installed in regions remote from each other. Here, the base station apparatus 1 is used in country A and the monitor apparatus 2 in country B; that is, the two are used in different countries as mutually remote regions. It is further assumed that the two countries use different languages: language A is used in country A, and language B in country B.

With reference to FIG. 2, an example of use of the network system of the present embodiment and an outline of its operation will now be described. The order of the operation procedures is indicated by the natural numbers shown in the white circles in FIG. 2.
Suppose, for example, that a user of the monitor apparatus 2 wishes to view, on the monitor apparatus 2, video/audio information that can be acquired on the base station apparatus side, and performs a predetermined operation on the monitor apparatus 2 for this purpose. In response to this operation, the monitor apparatus 2 transmits, via the Internet and addressed to the base station apparatus 1, a command requesting transmission of video/audio data (a data transmission request command).

On receiving the data transmission request command via the Internet, the base station apparatus 1 responds, as shown by procedure 2, by transmitting the requested video/audio data via the Internet.
At this time, the base station apparatus 1 executes, for example, control processing for acquiring the video/audio data specified by the data transmission request command. For example, when the specified video/audio data is a television broadcast on a predetermined broadcast channel, the base station apparatus 1 controls the built-in television tuner so that the specified channel is selected, and sends the video/audio data of the television broadcast on the selected channel out via the Internet. In this case, the transmission data takes the form of stream data consisting of compression-encoded video/audio data; that is, it takes the form of so-called streaming transmission over a network.
In this case, however, the video/audio data is not returned directly to the monitor apparatus 2 via the Internet as a response. Instead, as shown in FIG. 2, the base station apparatus 1 transmits the video/audio data, which would ordinarily be sent to the monitor apparatus 2, to the translation server 3 on the Internet.

For this purpose, for example, the address of the translation server 3 on the Internet (for example, a URL (Uniform Resource Locator)) is registered in the base station apparatus 1 in advance. When the base station apparatus 1 transmits video/audio data to the translation server 3 in response to the data transmission request command, it executes the transmission processing with the registered address of the translation server 3 as the destination. In addition, the format used when transmitting video/audio data to the translation server 3 is defined so that information indicating that the original destination of the video/audio data is the monitor apparatus 2 that sent the data transmission request command in procedure 1 (destination information) is stored, for example, in a packet header and transmitted with the data.
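The destination information described above can be illustrated with a minimal sketch. The source says only that the information is stored "for example, in a packet header"; the header layout used here (a 2-byte length prefix followed by a JSON object), the field names `dest` and `seq`, and the function names are all assumptions made for illustration, not the format of the embodiment.

```python
import json
import struct


def wrap_packet(payload: bytes, destination: str, seq: int) -> bytes:
    """Prefix a chunk of stream data with a header naming the original destination.

    Hypothetical layout: 2-byte big-endian header length, JSON header, payload.
    """
    header = json.dumps({"dest": destination, "seq": seq}).encode("utf-8")
    return struct.pack("!H", len(header)) + header + payload


def unwrap_packet(packet: bytes):
    """Split a packet produced by wrap_packet back into (header dict, payload)."""
    (header_len,) = struct.unpack("!H", packet[:2])
    header = json.loads(packet[2:2 + header_len].decode("utf-8"))
    return header, packet[2 + header_len:]
```

In this sketch the base station side would call `wrap_packet` for each chunk sent to the translation server, and the server would call `unwrap_packet` to learn which monitor apparatus should finally receive the data.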

The video/audio data transmitted from the base station apparatus 1 is content assumed to be viewed in, for example, country A, and is therefore produced on the premise that its viewers understand language A. Accordingly, as far as the audio data is concerned, the speech heard when it is reproduced is mainly in language A.
The translation server 3 receives the video/audio data streamed to it in procedure 2 and, in response, creates translation metadata as procedure 3. That is, for the audio data within the received video/audio data, the translation server 3 performs speech recognition in language A on the speech that would be heard on reproduction and, based on the recognition result, generates data in which that speech is translated into language B. Because the translation data obtained in this way is treated as metadata with respect to the original video/audio data, it is referred to as translation metadata in this specification.
The translation metadata in the present embodiment takes one of the following two forms. One is audio data in which so-called dubbed audio is recorded (dubbed audio data), that is, audio in which the language A speech of the original audio data is replaced with speech translated into language B. The other is so-called subtitle data, in which the language A speech of the original audio data is rendered as characters or the like in language B and displayed superimposed on the image.
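The two forms of translation metadata can be modeled as a small tagged record. This is only a sketch: the class and field names, the millisecond time range, and the UTF-8 encoding of subtitle text are assumptions introduced here, since the specification does not define a concrete data layout.

```python
from dataclasses import dataclass
from enum import Enum


class MetadataKind(Enum):
    DUBBED_AUDIO = "dubbed_audio"  # language B speech replacing language A speech
    SUBTITLE = "subtitle"          # language B text superimposed on the image


@dataclass(frozen=True)
class TranslationMetadata:
    kind: MetadataKind
    start_ms: int   # start of the translated utterance on the playback time axis
    end_ms: int     # end of the translated utterance
    payload: bytes  # encoded audio for dubbing, UTF-8 text for subtitles


def make_subtitle(text: str, start_ms: int, end_ms: int) -> TranslationMetadata:
    """Build a subtitle record, rejecting an empty or reversed time range."""
    if end_ms <= start_ms:
        raise ValueError("end_ms must be greater than start_ms")
    return TranslationMetadata(MetadataKind.SUBTITLE, start_ms, end_ms,
                               text.encode("utf-8"))
```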

Then, as procedure 4, the translation server 3 adds the translation metadata generated as described above to the originally received video/audio data by multiplexing. The video/audio data with the translation metadata added is transmitted via the Internet by streaming, in the same way as when it was transmitted from the base station apparatus 1. The destination of this video/audio data with translation metadata is set to the monitor apparatus 2 that sent the data transmission request command in procedure 1. This destination can be set, for example, based on the destination information transmitted from the base station apparatus 1 together with the video/audio data, as described earlier.
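Procedures 3 and 4 can be sketched as a generator that relays each incoming packet and interleaves a metadata packet whenever speech is recognized. The packet types, the stream labels `"av"` and `"meta"`, and the `recognize` and `translate` callables are hypothetical placeholders; a real server would plug in actual speech recognition and machine translation engines, which the specification does not name.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class AvPacket(NamedTuple):
    dest: str      # original destination taken from the destination information
    pts_ms: int    # presentation time of this chunk
    video: bytes
    audio: bytes


class OutPacket(NamedTuple):
    dest: str
    pts_ms: int
    stream: str    # "av" for the original data, "meta" for translation metadata
    data: bytes


def relay_with_translation(
    packets: Iterable[AvPacket],
    recognize: Callable[[bytes], str],  # language A audio -> language A text
    translate: Callable[[str], str],    # language A text -> language B text
) -> Iterator[OutPacket]:
    """Forward each packet and multiplex translated text into the same stream."""
    for pkt in packets:
        # The original video/audio data is passed through unchanged.
        yield OutPacket(pkt.dest, pkt.pts_ms, "av", pkt.video + pkt.audio)
        text_a = recognize(pkt.audio)
        if text_a:  # no metadata packet when no speech was recognized
            text_b = translate(text_a)
            yield OutPacket(pkt.dest, pkt.pts_ms, "meta", text_b.encode("utf-8"))
```

Because the function is a generator, output packets become available as input packets arrive, which mirrors the requirement that the streaming form be preserved on the way out.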

As the response to the data transmission request command of procedure 1, the monitor apparatus 2 receives the video/audio data with translation metadata from the translation server 3. Then, as procedure 5, it outputs this video/audio data with translation metadata on the monitor.
Since the video/audio data with translation metadata arrives by streaming as described above, in procedure 5 the monitor apparatus 2 decodes the video/audio data as streaming data and reproduces it as images and sound whose playback time axes are synchronized.

When an instruction to reproduce the translation metadata as well is obtained, for example through a user operation, the translation metadata is separated and extracted from the video/audio data and decoded, and is reproduced together with the video/audio data. When the translation metadata is dubbed audio data, the dubbed audio data is reproduced in place of the original audio data in the video/audio data, with its playback time axis synchronized with the reproduced image of the video data. When the translation metadata is subtitle data, the subtitle data is reproduced so that the subtitles are superimposed on the reproduced image of the video data in a predetermined display mode.
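The playback choice described here reduces to a small decision function. The sketch below assumes the metadata kind is available as a string (`"dubbed_audio"` or `"subtitle"`) and that subtitle payloads are UTF-8 text; both are illustrative assumptions rather than details given in the specification.

```python
from typing import NamedTuple, Optional


class Playback(NamedTuple):
    audio: bytes             # audio track to send to the audio output
    subtitle: Optional[str]  # text to superimpose on the image, if any


def choose_playback(original_audio: bytes,
                    meta_kind: Optional[str],
                    meta_payload: bytes,
                    translation_on: bool) -> Playback:
    """Decide what to reproduce for one segment of the stream."""
    if not translation_on or meta_kind is None:
        # No instruction to reproduce translation metadata: play the original.
        return Playback(original_audio, None)
    if meta_kind == "dubbed_audio":
        # Dubbed audio replaces the original audio track.
        return Playback(meta_payload, None)
    if meta_kind == "subtitle":
        # Subtitles are shown over the image while the original audio plays.
        return Playback(original_audio, meta_payload.decode("utf-8"))
    raise ValueError(f"unknown metadata kind: {meta_kind!r}")
```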

In this way, in the present embodiment, when the base station apparatus 1 and the monitor apparatus 2 are used in countries A and B, which use different languages, the translation server 3 is interposed in the transmission and reception of video/audio data between the base station apparatus 1 and the monitor apparatus 2 via the Internet. As a result, even when the base station apparatus 1 transmits video/audio data whose content is in language A, the monitor apparatus 2 can receive and reproduce video/audio data to which information translated into language B has been added. That is, the user of the monitor apparatus 2 can view content in language A transmitted from the base station apparatus 1 in a form that can be understood in language B.

Conventionally, the party operating a television broadcast, or the party producing a video/audio source as package media such as a DVD, has created data corresponding to the translation metadata in advance at the editing stage and added it to the video/audio data. Taking television broadcasting as a readily understood example, it is in practice impossible to carry out the production and editing work needed to add such translation-metadata-like data to almost every program. In particular, when the content of the video/audio data is a live program that requires real-time delivery, such as news or live sports coverage, adding additional information such as translation data to the video/audio data is virtually impossible.

In the present embodiment, by contrast, the creation and multiplexing of additional information for video/audio data in language A is performed in the translation server 3 by real-time processing of the video/audio data received from the base station apparatus 1.
This means that any video/audio data with content in language A that is input to the translation server 3 always yields video/audio data to which translation metadata in language B has been added. That is, the user of the monitor apparatus 2 can always view the video/audio data transmitted from the base station apparatus 1 as content with translation information added.
The translation server 3 of the present embodiment receives streamed video/audio data and generates and adds translation metadata (additional information), and it keeps the streaming form of the received data when sending out the video/audio data with translation metadata. Therefore, provided that the processing of the translation server 3 is sufficiently fast, the real-time nature of the video/audio data transmitted along the route from the base station apparatus 1 through the translation server 3 to the monitor apparatus 2 can also be maintained. Taking television programs as an example, not only programs produced with editing work, such as dramas, but also live programs such as those mentioned above can thus be obtained as content with translation metadata, without any particular distinction. Considering that this was nearly impossible in the past, it can be said to be a significant advantage.
Furthermore, from the standpoint of producers who operate television broadcasts or produce video/audio sources as package media, there is no longer any need for each of them to set up facilities for creating and adding additional information such as translation metadata to video/audio data. In the present embodiment, the generation and addition of translation metadata can instead be said to be performed centrally at the translation server.

Configuration examples for realizing the operation of the system of the present embodiment described with reference to FIG. 2 will now be described. First, examples of the internal configurations of the base station apparatus 1 and the monitor apparatus 2 will be described with reference to FIG. 3.

The base station apparatus 1 shown in FIG. 3 includes a television tuner (TV tuner) 11 having reception, channel selection, and demodulation functions for a predetermined television broadcast. The TV tuner 11 receives the broadcast waves picked up by the antenna ANT and, under the control of the control unit 15, for example, performs channel selection and demodulation so as to obtain the video/audio signal of the specified channel. The video/audio signal obtained in this way is transferred to the video/audio encoder 12, for example via the internal data bus 19.

The video/audio input unit 13 is the part that receives video/audio signals output from external AV devices and captures them as video/audio data. In practice, for example, it has a predetermined number of sets of input terminals, conforming to a predetermined signal format, for receiving video/audio signals output from AV devices, and a selector for choosing, from the signals input at these terminals, the video/audio signal specified by the control unit 15, for example. It also includes an A/D converter or the like that converts the input signal into a digital signal when the input video/audio signal is analog. The video/audio data output from the video/audio input unit 13 is likewise transferred to the video/audio encoder 12 via the internal data bus 19 under the control of the control unit 15.

The video/audio encoder 12 receives the video/audio data transferred via the internal data bus 19 as described above and encodes it into compressed video/audio data, for example in accordance with a predetermined compression encoding scheme. The compression encoding scheme is not particularly limited, but at present a scheme in the MPEG (Moving Picture Experts Group) family, for example, could be adopted.
The compressed video/audio data output from the video/audio encoder 12 is transferred, under the control of the control unit 15, to the local communication unit 17 or the network interface 18 described next.

As described earlier with reference to FIG. 1, in the information transmission/reception system of the present embodiment, the base station apparatus 1 and the monitor apparatus 2 can communicate with the outside by two forms of communication: local communication (for example, short-range wireless communication) and network communication via the Internet. Communication between the base station apparatus 1 and the monitor apparatus 2 can likewise be performed by either local communication or network communication.

The local communication unit 17 executes communication processing in accordance with the predetermined communication protocol adopted for local communication in the present embodiment. Purely as an example, TCP/IP is adopted as the communication protocol for local communication in the present embodiment. That is, although the communication is local, its protocol is the same as that of the Internet. Since TCP/IP is already in wide use on the Internet, it has the advantage of being easy to apply.
For example, at the time of transmission, the local communication unit 17 performs processing such as packetization so that the data is in a form that can be sent and received in accordance with TCP/IP, applies, for example, a predetermined carrier modulation, and transmits the result as radio waves. At the time of reception, it first receives the information transmitted as radio waves and demodulates it into packets. It then executes processing such as depacketization in accordance with the TCP/IP protocol to obtain the actual data of commands, content information, and the like.
Although local communication is described here as radio communication, the radio communication scheme is not particularly limited. Provided that the base station apparatus 1 and the monitor apparatus 2 can communicate well at short range, wireless communication such as infrared communication may also be used. The system may also be configured to allow wired communication using a predetermined communication scheme.

Communication via the Internet (network) 100 is performed by the network interface 18. As is well known, the communication protocol of the Internet is also TCP/IP, and the network interface 18 accordingly executes processing so that communication via the Internet is performed in accordance with TCP/IP.
In this case too, at the time of transmission, processing such as packetization is performed so that the data is in a form that can be sent and received in accordance with TCP/IP, and the data is transmitted to the intended destination terminal on the Internet 100. At the time of reception, predetermined demodulation processing such as depacketization is executed on the information transmitted from the source over the Internet 100, and the actual data of commands and content information is obtained in the same manner as above.
In practice, the network interface 18 may be configured to support a LAN such as Ethernet (registered trademark). With this configuration, a so-called broadband line, typified by xDSL (Digital Subscriber Line), FTTH (Fiber To The Home), and CATV (Cable Television), can be used for the connection to the Internet. If the configuration includes a modem, connection to the Internet over a telephone line also becomes possible.

As described earlier, the compressed video/audio data encoded by the video/audio encoder 12 is to be transmitted to the monitor apparatus 2. When this compressed video/audio data is transmitted to the monitor apparatus 2 by local communication, the control unit 15 performs control so that the data is input to the local communication unit 17 via the internal data bus 19. The local communication unit 17 executes processing such as packetization and carrier modulation on the input compressed video/audio data as described above and sends it out as radio waves.
When the compressed video/audio data is transmitted to the monitor apparatus 2 by communication via the Internet, the control unit 15 performs control so that the data is input to the network interface 18 via the internal data bus 19. The network interface 18 likewise packetizes the input compressed video/audio data as described above and transmits it to the intended destination via the Internet 100. When the compressed video/audio data has temporal continuity, as ordinary video and audio signals do, it can also be sent as so-called streaming data, that is, data whose temporal continuity is preserved. For this purpose, for example, the control unit 15 controls the timing at which compressed video/audio data is transferred from the video/audio encoder 12 to the network interface 18 and the timing at which the network interface 18 sends out packets.
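The timing control mentioned above can be illustrated by a pacing loop that releases each packet no earlier than its presentation time. This is a sketch under assumptions: the `(pts_ms, data)` packet shape and the function name `paced_send` are hypothetical, and the clock and sleep functions are injected only so that the logic can be exercised without real delays.

```python
import time
from typing import Callable, Iterable, Tuple


def paced_send(packets: Iterable[Tuple[int, bytes]],
               send: Callable[[bytes], None],
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Send (pts_ms, data) packets so that temporal continuity is preserved.

    Each packet is held until `pts_ms` milliseconds have elapsed since the
    first call to `clock`, then handed to `send`.
    """
    start = clock()
    for pts_ms, data in packets:
        delay = start + pts_ms / 1000.0 - clock()
        if delay > 0:
            sleep(delay)  # wait until this packet is due
        send(data)        # late packets are sent immediately
```

A sender that falls behind simply emits packets back to back in this sketch; a production sender would more likely drop or re-time late data, which is outside what the text describes.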

As will be understood from the description so far, the control unit 15 executes the various control processes in the base station apparatus 1. The control unit 15 includes a microcomputer made up of, for example, a CPU (Central Processing Unit), ROM, and RAM. As is well known, the CPU executes processing in accordance with the programs and various setting information stored in the ROM. The RAM is the area into which programs to be executed by the CPU are loaded, and it is also used as a working area when the CPU executes processing.

It is also conceivable, for example, to provide the base station apparatus 1 with a hard disk drive (HDD) or the like so that content data such as video/audio data can be stored on the HDD. With such a configuration, the content data stored on the HDD can also be transmitted to the monitor apparatus 2 by local communication or via the Internet.

Next, the monitor device 2 shown in the same FIG. 3 also includes a local communication unit 24 and a network interface 25 as means for communicating with the outside.
The local communication unit 24 and the network interface 25 may have, for example, the same configurations as the local communication unit 17 and the network interface 18 of the base station apparatus 1 described above.

Here, for example, when compression-encoded video/audio data (compressed video/audio data) is transmitted as content information from the local communication unit 17 of the base station apparatus 1, the local communication unit 24 of the monitor device 2 extracts the transmitted compressed video/audio data through reception and demodulation processing.

Similarly, when compressed video/audio data is transmitted as content information via the Internet, the network interface 25 of the monitor device 2 receives it and performs demodulation processing, such as depacketization, to extract the compressed video/audio data.
The compressed video/audio data extracted by the local communication unit 24 or the network interface 25 as described above is transferred to the video/audio decoder 21 via the internal data bus 29 under the control of the control unit 22.

Also, when compressed video/audio data is transmitted from the translation server 3 via the Internet as described with reference to FIG. 2, translation metadata is multiplexed into that compressed video/audio data. When an instruction to reproduce and output this translation metadata is given, the network interface 25, under the control of the control unit 22, separates the packets that carry translation metadata from the streaming video/audio data arriving as packets and extracts the actual data portion of the translation metadata from those packets, thereby separating and extracting the translation metadata. The translation metadata extracted in this way is transferred to the metadata processing unit 32.
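The separation step described above can be sketched as follows. This is a minimal illustration only: the source does not define a packet format, so the `Packet` type, the `kind` tag, and the `demultiplex` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical packet tags; the source does not specify how packets are marked.
KIND_AV = "av"
KIND_METADATA = "translation_metadata"


@dataclass
class Packet:
    kind: str       # KIND_AV or KIND_METADATA (assumed tagging scheme)
    payload: bytes  # actual data portion carried by the packet


def demultiplex(packets: Iterable[Packet], want_metadata: bool) -> Tuple[bytes, bytes]:
    """Split a packet sequence into A/V bytes and translation-metadata bytes.

    Metadata payloads are collected only when a reproduction instruction has
    been given (want_metadata is True); otherwise they are skipped.
    """
    av_parts: List[bytes] = []
    meta_parts: List[bytes] = []
    for packet in packets:
        if packet.kind == KIND_METADATA:
            if want_metadata:
                meta_parts.append(packet.payload)
        else:
            av_parts.append(packet.payload)
    return b"".join(av_parts), b"".join(meta_parts)
```

In this sketch the first returned value would go to the video/audio decoder 21 and the second to the metadata processing unit 32.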

The video/audio decoder 21 receives the compressed video/audio data transferred as described above and executes decoding appropriate to the format of that data to obtain decompressed video signal data and audio signal data. Under the control of the control unit 22, the video signal data thus obtained is transferred to the display control unit 26 and the audio signal data is transferred to the audio processing unit 28.

When video signal data is transferred from the video/audio decoder 21, the display control unit 26 executes the signal processing needed to display that video signal data on the display unit 27, together with drive control of the display unit 27. When a GUI image, for example for touch panel operation, is to be displayed, the display control unit 26 generates image data for the GUI image in response to an instruction from the control unit 22 and executes control processing so that it is displayed as an image on the display unit 27. Here, the display control unit 26 also performs the image signal processing that takes the video signal data transferred from the video/audio decoder 21 as the main image and superimposes the GUI image at the required position on that main image.

The display unit 27 is composed of a display device such as an LCD (Liquid Crystal Display) and, as shown in FIG. 1, is provided so that its display screen is exposed over the entire surface of the monitor device. As the display control unit 26 operates as described above, the display screen of the display unit 27 shows, for example, the video of video/audio data acquired and transmitted by the base station apparatus 1, GUI images, and the like.

The audio signal processing unit 28 receives the audio signal data transferred from the video/audio decoder 21, executes the required digital audio signal processing, D/A conversion, amplification, and so on, and finally outputs sound from the speaker 29.

Here, the decoding in the video/audio decoder targets compressed video/audio data. This compressed video/audio data may be data in which video signal data and audio signal data are compression-encoded with their playback time axes synchronized, as with video and audio signal data in the MPEG format. In addition, the decoder can be configured to handle data in which only video signal data is compression-encoded or only audio signal data is compression-encoded. It can further handle decoding of image data in a predetermined still-image format.
As noted above, video/audio data that is compression-encoded so that the playback time axes of video and audio are synchronized is, in this embodiment, transmitted as streaming data and must be played back with temporal continuity. When the monitor device 2 receives such video/audio data, the video/audio decoder 21 executes decoding so that the video output and the audio output are reproduced in synchronization in playback time. In addition, so that the decoded video signal data and audio signal data are reproduced continuously without interruption, the control unit 22 controls the decoding timing in the video/audio decoder 21 and the timing at which the decoded data is transferred to the display control unit 26 or the audio processing unit 28.

The metadata processing unit 32 executes the data processing required for the input translation metadata to be properly reproduced and output.
When the translation metadata is dubbed audio data, the dubbed audio data, like the audio data of the original video/audio data, is in a form compression-encoded so that its playback time axis is synchronized with the video data. In this case, the metadata processing unit 32, for example under the control of the control unit 22, executes decoding in synchronization with the playback time axis of the video data being decoded by the video/audio decoder 21. Alternatively, the dubbed audio data may be transferred to the video/audio decoder 21, which then executes decoding with the playback time axis synchronized to the video data. The audio signal data obtained by decoding the dubbed audio data is then transferred to the audio processing unit 28 in place of the audio data of the original video/audio data. The audio processing unit 28 executes the required signal processing on this audio signal data and finally outputs it as sound from the speaker 29. As a result, the monitor device 2 outputs dubbed speech based on the dubbed audio data, synchronized in playback time with the image, while displaying the image of the original video/audio data on the display unit 27.

When the translation metadata is caption data, the metadata processing unit 32 performs decoding appropriate to the format of that caption data. This decoding yields the caption image data to be composited onto the image of the original video/audio data. The caption data is assumed to have a structure that includes time information indicating, for example, the display output timing (playback time) relative to the video data (main image) of the original video/audio data. So that the display output timing follows this time information, the metadata processing unit 32 transfers the decoded caption image data to the display processing unit 26 at a timing synchronized with the transfer of the main-image video data from the video/audio decoder 21 to the display processing unit 26. This transfer timing of the caption image data in the metadata processing unit 32 may be controlled by the control unit 22 on the basis of the time information.
The display processing unit 26 then executes image composition that superimposes the caption image data on the main-image video data input as described above, and displays an image on the display unit 27 using the video data obtained by that composition.
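The timing relationship between captions and the main image can be sketched as follows, assuming each caption carries a start and end time on the main video's playback time axis. The `CaptionCue` structure and both function names are illustrative assumptions, and the string-based `compose_frame` merely stands in for real image composition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaptionCue:
    start: float  # seconds on the main video's playback time axis (assumed unit)
    end: float    # cue is shown for start <= t < end
    text: str


def active_caption(cues: List[CaptionCue], playback_time: float) -> Optional[str]:
    """Return the caption to overlay at playback_time, or None if no cue is active."""
    for cue in cues:
        if cue.start <= playback_time < cue.end:
            return cue.text
    return None


def compose_frame(frame_label: str, caption: Optional[str]) -> str:
    """Stand-in for image composition: attach the caption to a frame label."""
    return frame_label if caption is None else f"{frame_label} [{caption}]"
```

A real implementation would superimpose pixel data rather than strings, but the lookup by playback time is the part that corresponds to the time information described above.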

The operation input unit 30 shown for the monitor device 2 in FIG. 3 collectively represents, as the user interface for operation input, the operating elements and operation input parts provided on the monitor device 2 and the devices that generate operation commands in response to operations on them and output those commands to the control unit 22. The operation input unit 30 therefore also includes the touch panel 30a provided on the display unit 27.
Although not specifically illustrated, a remote controller supplied separately from the monitor device 2, together with a configuration that receives command signals transmitted from the remote controller and generates the operation commands, is also included in the operation input unit 30.

In response to an operation command output from the operation input unit 30, the control unit 22 controls each unit so that the operation corresponding to that command is obtained. For example, as an operation completed within the monitor device 2, if an operation to adjust the brightness of the LCD display unit 27 is performed, the control unit 22 controls the display control unit 26 to adjust the light output of the LCD backlight.

The system is also configured so that various operations of the base station apparatus 1 can be controlled through operations on the operation input unit 30 of the monitor device 2. As an example, suppose that an operation to switch the broadcast channel being received and selected by the TV tuner 11 of the base station apparatus 1 is performed on the operation input unit 30. On receiving the corresponding operation command, the control unit 22 instructs the local communication unit 24 or the network interface 25 to transmit a channel switching request command. If communication with the base station apparatus 1 has been established by local communication, the control unit 22 instructs the local communication unit 24 to transmit the channel switching request command. If it has been established via the Internet, the control unit 22 instructs the network interface 25 instead.

Under this control by the control unit 22, the channel switching request command is transmitted from the local communication unit 24 or the network interface 25 to the base station apparatus 1. The base station apparatus 1 receives and demodulates the command with the local communication unit 17 or the network interface 18 and transfers it to the control unit 15. The control unit 15 controls the TV tuner 11 so that it switches to the channel specified by the command. As a result, the video/audio output of the monitor device 2 switches from the broadcast channel that had been output until then to the channel the user specified.
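The choice of transmission path for the channel switching request can be sketched as follows. The command layout, the `Link` enum, and the `BaseStation` class are hypothetical and serve only to illustrate routing by whichever link is established.

```python
from enum import Enum
from typing import Callable, Dict


class Link(Enum):
    LOCAL = "local"        # link through local communication unit 24
    INTERNET = "internet"  # link through network interface 25


def send_channel_switch(channel: int, established: Link,
                        senders: Dict[Link, Callable[[dict], None]]) -> dict:
    """Build a channel switching request and pass it to the sender for the established link."""
    command = {"type": "channel_switch_request", "channel": channel}
    senders[established](command)
    return command


class BaseStation:
    """Stand-in for control unit 15 driving TV tuner 11."""

    def __init__(self, channel: int) -> None:
        self.channel = channel

    def handle(self, command: dict) -> None:
        # Switch the tuner only for the command type this sketch defines.
        if command["type"] == "channel_switch_request":
            self.channel = command["channel"]
```

The monitor side never needs to know how the command is carried; it only selects the sender that matches the established link.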

The control unit 22 executes various control processes in the monitor device 2. The control unit 22 likewise includes a microcomputer made up of a CPU (Central Processing Unit), ROM, RAM, and the like.

The base station apparatus 1 and the monitor device 2 that make up the information transmission/reception system of this embodiment have a function for connecting to the Internet. This Internet connection function can therefore be used for purposes other than data communication between the base station apparatus 1 and the monitor device 2.
For example, it is possible to access Web sites on the Internet and browse Web content, and to send and receive e-mail. To do so, application software for browsing Web sites and application software such as a so-called mailer, which has e-mail transmission and reception functions (both referred to here as network applications), may be installed in the information transmission/reception system of this embodiment.
In this embodiment, because both the base station apparatus 1 and the monitor device 2 have the Internet connection function, network applications such as the Web browsing function and e-mail function described above can be installed and executed on either the base station apparatus 1 or the monitor device 2.
When a network application is installed on the base station apparatus 1, the network application runs on the base station apparatus 1, which accesses Web sites and sends and receives e-mail. The base station apparatus 1 transmits the GUI image of the network application to the monitor device 2. The monitor device 2 displays the GUI image and is configured so that, for example, operations on the GUI image are also possible. In response to a GUI operation performed on the monitor device 2, command communication takes place between the monitor device 2 and the base station apparatus 1, and the operation of the network application on the base station apparatus 1 is controlled accordingly.
When a network application is installed on the monitor device 2, the network application runs on the monitor device 2 alone. In other words, the monitor device 2 by itself can access Web sites, send and receive e-mail, and so on.

Next, the translation server 3 will be described.
FIG. 4 shows an example of the internal configuration of the translation server 3. The translation server 3 shown in this figure includes a reception processing unit 41, an audio data extraction unit 42, a metadata creation unit 43, a synthesis processing unit 44, and a transmission processing unit 45. In terms of hardware, each of these units is configured as an independent device unit, circuit module, or the like.
The reception processing unit 41 executes processing for receiving and acquiring, in accordance with the Internet communication protocol (TCP/IP: Transmission Control Protocol/Internet Protocol), the video/audio data (compressed video/audio data) streamed from the base station apparatus 1 via the Internet 100. That is, it depacketizes video/audio data arriving, for example, as a sequence of packets to obtain video/audio stream data. It then transfers this video/audio stream data to the audio data extraction unit 42.

The audio data extraction unit 42 extracts audio data from the transferred video/audio data (stream data). However, the audio data extraction unit 42 does not split the audio data off from the original video/audio data (stream data). That is, the extraction yields both the original video/audio data, unchanged, and the audio data contained in it.
The audio data extracted from the original video/audio data is transferred to the metadata creation unit 43. The original video/audio data itself is transferred to the synthesis processing unit 44 at a predetermined timing.

The metadata creation unit 43 is the part that creates the translation metadata, described earlier with reference to FIG. 2, using the transferred audio data. Assuming that the translation server 3 shown in this figure is the one in the network system of FIG. 2, audio data in language A is input to the metadata creation unit 43, which uses it to create translation metadata in language B. That is, it creates dubbed audio data in which the speech content in language A of the input audio data is replaced with language B. Alternatively, it creates caption data so that the translation into language B of the language A speech content in the input audio data is displayed as captions.
The translation metadata created in this way is transferred sequentially to the synthesis processing unit 44 at a predetermined timing.

The synthesis processing unit 44 receives the original video/audio data and the translation metadata corresponding to it. The synthesis processing unit 44 executes processing that combines the two by multiplexing the translation metadata into the structure of the stream data that constitutes the incoming original video/audio data. As explained with reference to FIG. 3, whether the translation metadata is dubbed audio data or caption data, its playback time (playback output timing) relative to the original video/audio data is specified. The position at which the translation metadata is combined with the original video/audio data is therefore controlled so that playback output in accordance with the specified playback time is possible.

The video/audio data into which the synthesis processing unit 44 has combined the translation metadata is transferred to the transmission processing unit 45. The transmission processing unit 45 streams the transferred video/audio data via the Internet 100 in accordance with a communication protocol suited to the Internet. As described with reference to FIG. 2, the destination of the video/audio data at this point is the monitor device 2 that transmitted the data transmission request command. The address of this monitor device 2 is the one that was sent, for example as the destination information mentioned earlier, together with the video/audio data transmitted to the translation server 3. The transmission processing unit 45 separates and extracts this destination information and sends the video/audio data to the address it specifies.
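The flow through units 42 to 45 can be sketched as follows. This is an illustration under simplifying assumptions: the stream is modelled as a list of timestamped units, `create_metadata` stands in for the metadata creation unit 43, and the destination is a plain string. None of these names or structures come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class StreamUnit:
    timestamp: float                  # playback time of this unit, in seconds (assumed)
    video: bytes
    audio: bytes
    metadata: Tuple[bytes, ...] = ()  # multiplexed translation metadata


def extract_audio(unit: StreamUnit) -> bytes:
    """Unit 42: return the audio while leaving the original unit intact."""
    return unit.audio


def multiplex(unit: StreamUnit, metadata: bytes) -> StreamUnit:
    """Unit 44: attach metadata to the unit it was derived from, keeping its timestamp."""
    return StreamUnit(unit.timestamp, unit.video, unit.audio, unit.metadata + (metadata,))


def run_translation_server(units: Iterable[StreamUnit], destination: str,
                           create_metadata: Callable[[bytes], bytes]
                           ) -> Tuple[str, List[StreamUnit]]:
    """Units 42 to 45: extract audio, create metadata, multiplex, address the result."""
    outgoing = [multiplex(u, create_metadata(extract_audio(u))) for u in units]
    return destination, outgoing
```

Because each piece of metadata is attached to the unit it was generated from, it inherits that unit's timestamp, which is one simple way to keep the playback timing described above.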

FIG. 5 shows a configuration example of the metadata creation unit 43 for the case where dubbed audio data is created as the translation metadata. In this case, as illustrated, it includes a speech recognition processing unit 51, a translation processing unit 52, a speech conversion processing unit 53, and a speech synthesis processing unit 54. Each of the units making up the metadata creation unit 43 is likewise an independent device or circuit module.
The audio data extracted by the audio data extraction unit 42 is first input to the speech recognition processing unit 51. It is also branched and input to the speech synthesis processing unit 54.
The speech recognition processing unit 51 performs, for example, acoustic analysis of the audio data and extracts the audio components judged to be human speech. Then, in accordance with a predetermined speech recognition algorithm, it performs speech recognition on those components, for example word by word in language A, and outputs the result as speech recognition data. As this description shows, the speech recognition data is obtained by recognizing speech uttered in language A as language A.

The language A speech recognition data output from the speech recognition processing unit 51 is transferred to the translation processing unit 52. The translation processing unit 52 has, for example, a dictionary in which sentences in language A and language B are associated with each other. It looks up the language B sentence associated with each language A sentence indicated by the speech recognition data and concatenates the language B sentences found in this way according to predetermined rules. The resulting language B sentence information is a translation into language B of the speech content of the language A audio data. This is treated as the language B translation data and transferred to the speech conversion processing unit 53.
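The dictionary lookup and concatenation described above can be sketched as follows. The dictionary entries, the space-joining rule, and the `<...>` marker for unknown phrases are all assumptions made for illustration. The source only says that matched sentences are joined "according to predetermined rules".

```python
from typing import Dict, List

# Hypothetical language A -> language B phrase dictionary (English -> French here).
PHRASE_DICTIONARY: Dict[str, str] = {
    "good morning": "bonjour",
    "thank you": "merci",
}


def translate(recognized: List[str], dictionary: Dict[str, str], joiner: str = " ") -> str:
    """Look up each recognized language A phrase and join the language B results.

    Phrases missing from the dictionary are kept in angle brackets so that the
    gap stays visible instead of being silently dropped (an assumed policy).
    """
    translated = [dictionary.get(phrase.lower(), f"<{phrase}>") for phrase in recognized]
    return joiner.join(translated)
```

For example, `translate(["Good morning", "Thank you"], PHRASE_DICTIONARY)` yields `"bonjour merci"`.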

The speech conversion processing unit 53 executes processing for converting the language B sentence information indicated by the input translation data into audio signal data. In this conversion, the voice can also be given a quality equivalent to that of the speech in the original audio data, based, for example, on the frequency characteristics of that speech. The resulting audio signal data (translated audio signal data) is transferred to the speech synthesis processing unit 54.

The speech synthesis processing unit 54 receives the translated audio signal data together with the original language A audio data from the audio data extraction unit 42. It first removes the speech component from the original audio data, for example by band filtering. It then mixes the translated audio signal data into the audio data from which the speech component has been removed. The mixing is performed so that the translated speech content of the translated audio signal data corresponds approximately in time to the content of the removed speech component.
Through this processing, dubbed audio data is obtained in which only the speech in the original audio data has been dubbed from language A into language B. This dubbed audio data is transferred as translation data to the synthesis processing unit 44 shown in FIG. 4.
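The time-aligned mixing step can be sketched as follows. The sketch assumes the speech component has already been removed from the original audio (the band filtering itself is not shown) and that audio is a plain list of float samples. The function name and the segment representation are hypothetical.

```python
from typing import List, Sequence, Tuple


def mix_dub(background: Sequence[float],
            segments: List[Tuple[int, Sequence[float]]]) -> List[float]:
    """Add each translated speech segment into the background at its start index.

    `background` is the original audio with speech already removed. Each segment
    is (start_sample_index, samples); the start index is chosen so the translated
    speech lines up in time with the speech it replaces. Samples that would run
    past the end of the background are discarded.
    """
    mixed = list(background)
    for start, samples in segments:
        for offset, sample in enumerate(samples):
            index = start + offset
            if 0 <= index < len(mixed):
                mixed[index] += sample
    return mixed
```

A practical mixer would also handle gain and clipping. Those details are left out to keep the alignment logic visible.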

FIG. 6 shows a configuration example of the metadata creation unit 43 for the case where caption data is created as the translation metadata. In this case, as illustrated, it includes the speech recognition processing unit 51, the translation processing unit 52, and a caption data conversion processing unit 55. Here too, each of these units is an independent device or circuit module.
In this case as well, the audio data extracted by the audio data extraction unit 42 is first input to the speech recognition processing unit 51. The speech recognition processing unit 51 and the translation processing unit 52, which receives the speech recognition data output from the speech recognition processing unit 51, may have the same configurations as in FIG. 5, so their description is omitted here.

The translation data in language B obtained by the translation processing unit 52 is transferred to the caption data conversion processing unit 55.
The caption data conversion processing unit 55 generates caption data based on the content of the input translation data. The caption data may be, for example, text information that specifies its reproduction output timing relative to the original audio data. Alternatively, it may be image data to be composited onto the main image, likewise specifying its reproduction output timing relative to the original audio data.
When the caption data is text data, processing is executed to convert the content of the input translation data into text data in language B. When the caption data is image data, text data is first generated from the content of the translation data, and character image data corresponding to that text data is obtained. This character image data is then used to generate image data in which the characters are arranged so that they can be read as captions.
The caption data generated in this way is transferred to the synthesis processing unit 44 as translation metadata.
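
As an illustration of caption data in the form of text information with reproduction output timing, the following minimal sketch converts translated segments into timed cues. The `CaptionCue` type, the `to_caption_cues` function, and the use of seconds as the timing unit are assumptions made for this example; the specification does not define a concrete caption format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaptionCue:
    start_sec: float  # when the caption appears, relative to the original audio
    end_sec: float    # when the caption disappears
    text: str         # translated text in language B


def to_caption_cues(segments: List[Tuple[float, float, str]]) -> List[CaptionCue]:
    """Convert (start, end, translated_text) segments into ordered caption cues."""
    cues = []
    for start, end, text in sorted(segments):
        text = text.strip()
        if text and end > start:  # skip empty text and zero-length segments
            cues.append(CaptionCue(start, end, text))
    return cues
```

For image-based captions, each cue's text would additionally be rendered into character image data before being passed on, which this sketch does not attempt.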

FIG. 7 shows another configuration example of the translation server 3. In this figure, parts identical to those in FIG. 6 are given the same reference numerals, and their description is omitted.
In this figure, three metadata creation units 43-1, 43-2, and 43-3 are provided as an example. In this configuration, audio data in language A is input to each of the metadata creation units 43-1, 43-2, and 43-3. The metadata creation unit 43-1 generates and outputs translation metadata translated from language A into language B, the metadata creation unit 43-2 generates and outputs translation metadata translated from language A into language C, and the metadata creation unit 43-3 generates and outputs translation metadata translated from language A into language D. In other words, the metadata creation units 43-1, 43-2, and 43-3 each generate and output translation metadata translated into a different language.

When such a configuration is adopted, the format of the data transmission request command transmitted from the monitor apparatus 2 in procedure 1 shown in FIG. 2 includes, for example, information designating the translation language (translation language designation information). The translation language may be made selectable by the user, for example through an operation on the monitor apparatus 2. When the base station apparatus 1 receives this command and transmits video/audio data to the translation server 3, it stores the translation language designation information in, for example, the header of the packets carrying the video/audio data.
On receiving the video/audio data, the translation server 3 identifies the designated translation language by referring to, for example, the translation language designation information extracted from the packets. It then transfers the audio data extracted by the audio data extraction unit 42 only to the metadata creation unit (one of 43-1, 43-2, and 43-3) that translates into the language so identified. For example, if the translation language designation information indicates language C, the audio data is transferred to the metadata creation unit 43-2, which creates translation metadata translated into language C.
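
The routing of audio data according to the translation language designation information can be sketched as a simple lookup. The header is modeled here as a dictionary with a hypothetical `translation_language` field, and the three functions are placeholder stand-ins for the metadata creation units 43-1, 43-2, and 43-3; the actual packet format and unit interfaces are not specified in the text.

```python
from typing import Callable, Dict


# Placeholder stand-ins for metadata creation units 43-1, 43-2, and 43-3.
def unit_43_1(audio: bytes) -> str:
    return "translation metadata in language B"


def unit_43_2(audio: bytes) -> str:
    return "translation metadata in language C"


def unit_43_3(audio: bytes) -> str:
    return "translation metadata in language D"


# Translation language -> unit that translates into that language.
UNITS: Dict[str, Callable[[bytes], str]] = {
    "B": unit_43_1,
    "C": unit_43_2,
    "D": unit_43_3,
}


def route_audio(header: Dict[str, str], audio: bytes) -> str:
    """Forward the audio only to the unit for the designated translation language."""
    language = header.get("translation_language")  # hypothetical header field
    unit = UNITS.get(language)
    if unit is None:
        raise ValueError(f"unsupported translation language: {language!r}")
    return unit(audio)
```

With this table, a header designating language C selects the stand-in for unit 43-2, matching the example given in the text.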

With this configuration, a user of the monitor apparatus 2 can view content whose video/audio data was originally transmitted from country A with translation metadata in languages other than that of country B as well, and can thus enjoy the content in a variety of languages. Viewing in this way is also useful for language learning, for example.

In the configuration shown in FIG. 7, the translation language may also be designated to the translation server 3 by having the monitor apparatus 2 and the translation server 3 communicate directly, so that the monitor apparatus 2 notifies the translation server 3 of the translation language and registers it there.
The number of metadata creation units is not particularly limited and should be changed as appropriate according to the translation languages actually supported. With the configuration shown in FIG. 7, the set of supported translation languages can therefore be expanded easily by adding a metadata creation unit configured individually for each additional translation language, so the configuration is also highly extensible in this respect.
Furthermore, when the translation server 3 includes a plurality of metadata creation units as shown in FIG. 7, the language of the input audio data need not be limited to a single language as it is in the example of FIG. 7.
For example, if the input audio data is assumed to be in either language A or language B and is to be translated into either language C or language D, the following four translation patterns are possible:
Language A → Language C
Language B → Language C
Language A → Language D
Language B → Language D
Four metadata creation units 43 corresponding to these patterns may then be provided.
In FIG. 7, a separate metadata creation unit 43 is provided for each translation language. In practice, however, a single device or circuit module serving as the metadata creation unit may switch between translation languages, so that it provides substantially the same multi-language capability as shown in FIG. 7. The same applies when the function of the metadata creation unit is implemented as a program executed by a computer apparatus.
Moreover, not only the metadata creation unit but all or part of the functions of the parts forming the translation server 3 illustrated in FIG. 4 or FIG. 7 may be implemented as processing realized by a computer apparatus executing a program.
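
The four translation patterns, and the alternative of a single unit that switches languages, can be sketched as follows. This is only an illustration under stated assumptions: `translation_patterns` and `SwitchableMetadataUnit` are hypothetical names, and the sketch models only the selection of a language pair, not the translation itself.

```python
from typing import List, Optional, Set, Tuple


def translation_patterns(sources: List[str],
                         targets: List[str]) -> List[Tuple[str, str]]:
    """List every (input language, translation language) pair, grouped by target."""
    return [(src, dst) for dst in targets for src in sources]


class SwitchableMetadataUnit:
    """Hypothetical single unit that switches language pairs instead of being duplicated."""

    def __init__(self, supported: List[Tuple[str, str]]) -> None:
        self.supported: Set[Tuple[str, str]] = set(supported)
        self.current: Optional[Tuple[str, str]] = None  # no pair selected yet

    def switch(self, src: str, dst: str) -> None:
        """Select the language pair used for subsequent translation."""
        if (src, dst) not in self.supported:
            raise ValueError(f"unsupported pattern: {src} -> {dst}")
        self.current = (src, dst)
```

Calling `translation_patterns(["A", "B"], ["C", "D"])` yields the four pairs in the order listed in the text. Either one unit per pair or one switchable unit covers the same set of patterns; the choice affects only how the server is partitioned.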

The translation server 3 as a whole has been described as comprising, for example, the reception processing unit 41, the audio data extraction unit 42, the metadata creation unit 43, the synthesis processing unit 44, and the transmission processing unit 45. When these parts are built as separate devices, some or all of them may be connected to one another via a network.
FIG. 5 and FIG. 6 show separately a configuration of the metadata creation unit 43 that generates dubbed audio data as translation metadata and one that generates caption data, but a single metadata creation unit 43 may be configured to generate either kind of translation metadata. In that case, in the same way as the language designation from the monitor apparatus 2 described with reference to FIG. 7, the monitor apparatus 2 is allowed to designate, for example in response to a user operation on it, whether the translation metadata is to be dubbed audio data or caption data. In accordance with this designation, the metadata creation unit 43 selects which of the two to generate as translation data. The user can then also choose the form of the translation metadata (how the translation information is output), which further improves entertainment value and convenience.

In the above embodiment, the additional information of the present invention is translation data such as dubbed audio data or caption data, but the additional information is not limited to this. It may be anything that is generated based on the content that is visually or audibly perceived when the input video/audio data is reproduced, and that gives the user some benefit when it is output on the side reproducing the video/audio data. For example, data for converting the images of the input video/audio data into 3D for output may be used as the additional information. With additional information of this kind, the base station apparatus and the monitor apparatus need not be installed in locations remote from each other.

In the above embodiment, a system consisting of a base station apparatus and a monitor apparatus is given as an example of a video/audio data transmission/reception system in a network system, but the invention is also applicable to transmission/reception systems of other forms. For example, personal computers or the like may be used in place of the base station apparatus and the monitor apparatus. In particular, the monitor apparatus may be replaced by a small terminal device with better portability, such as a PDA (Personal Digital Assistant).

FIG. 1 is a diagram showing the basic configuration of an information transmission/reception system in a network system as an embodiment of the present invention.
FIG. 2 is a diagram for explaining a usage example of the network system of the embodiment in terms of its operation procedure.
FIG. 3 is a block diagram showing an example of the internal configuration of the base station apparatus and the monitor apparatus that constitute the information transmission/reception system.
FIG. 4 is a block diagram showing a configuration example of the translation server.
FIG. 5 is a block diagram showing a configuration example of the metadata creation unit.
FIG. 6 is a block diagram showing another configuration example of the metadata creation unit.
FIG. 7 is a block diagram showing another configuration example of the translation server.
FIG. 8 is a diagram for explaining a usage example of a conventional information transmission/reception system.

Explanation of symbols

1 base station apparatus; 2 monitor apparatus; 3 translation server; 11 TV tuner; 12 video/audio encoder; 13 video/audio input unit; 15 control unit; 17, 24 local communication unit; 18, 25 network interface; 19, 31 internal data bus; 21 video/audio decoder; 22 control unit; 26 display control unit; 27 display unit; 28 audio processing unit; 29 speaker; 30 operation input unit; 30a touch panel; 32 metadata processing unit; 41 reception processing unit; 42 audio data extraction unit; 43 metadata creation unit; 44 synthesis processing unit; 45 transmission processing unit; 51 speech recognition processing unit; 52 translation processing unit; 53 speech processing unit; 54 speech synthesis processing unit; 55 caption data conversion processing unit; 100 Internet; ANT antenna

Claims

An information processing apparatus comprising:
additional information generating means for receiving, as input, video/audio data that is obtained by receiving a streaming transmission via a predetermined communication network and that consists of at least video data and/or audio data, and for generating additional information having predetermined information content based on content recognized from the input video/audio data; and
video/audio data generating means for adding the additional information to the input video/audio data to generate video/audio data for transmission that is to be streamed to a specific terminal device via a predetermined communication network.

The information processing apparatus according to claim 1, wherein the additional information generating means includes:
speech recognition means for recognizing, as the recognition of the content, utterance content in a predetermined first language from the audio data of the input video/audio data; and
translation information generating means for generating, as the additional information, translation information whose content is the recognized utterance content translated into a predetermined second language, based on the recognition result of the speech recognition means.

The information processing apparatus according to claim 2, wherein the translation information generating means generates, as the translation information, audio data in which the utterance content in the audio data of the input video/audio data is translated into the predetermined second language.

The information processing apparatus according to claim 2, wherein the translation information generating means generates, as the translation information, text information obtained by translating the utterance content in the audio data of the input video/audio data.

An information processing method comprising:
an additional information generation process of receiving, as input, video/audio data that is obtained by receiving a streaming transmission via a predetermined communication network and that consists of at least video data and/or audio data, and of generating additional information having predetermined information content based on content recognized from the input video/audio data; and
a video/audio data generation process of adding the additional information to the input video/audio data to generate video/audio data for transmission that is to be streamed to a specific terminal device via a predetermined communication network.