JP2009017438A - Information transmission apparatus

Info

Publication number: JP2009017438A
Application number: JP2007179406A
Authority: JP
Inventors: Hideji Kuramoto; Kazunari Nawa; Takeshi Kawaguchi
Original assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Current assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Priority date: 2007-07-09
Filing date: 2007-07-09
Publication date: 2009-01-22

Abstract

PROBLEM TO BE SOLVED: To provide an information transmission apparatus capable of giving a sense of presence.

SOLUTION: A projector 15 acquires a moving image and audio and displays the moving image on a screen 11. While the moving image is displayed, an object detection section 13 detects an object in the moving image, and an audio control section 14 controls a plurality of audio output sections 12 based on the detection result. The plurality of audio output sections 12 are disposed so as to correspond to different positions on the screen 11. The audio control section 14 performs control so that, for example, the audio of the object is output only from the audio output section 12 corresponding to the position of that object.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an information transmission apparatus that transmits moving images and audio.

In information transmission apparatuses such as televisions, computer terminals, and videophones, techniques for creating a sense of presence are desired.

Conventionally, to create a sense of presence, the screen on which moving images are displayed has been enlarged on the visual side, and, on the auditory side, loud sound has been output from a low-frequency speaker or multiple speakers have been used, as in a home theater system.

Conventional techniques for creating such a sense of presence are disclosed in, for example, Patent Document 1 and Patent Document 2.

However, these apparatuses have the problem that the sense of presence is diminished because the screen and the speakers are spatially separated.

As a conventional technique that addresses this problem, Patent Document 3 discloses an information transmission system in which sound is generated from substantially the same surface as the moving image display surface and sound localization is controlled for each portion (partial sound generating element) of the moving image display surface.

However, in the technique of Patent Document 3, an audio signal is assigned to each partial sound generating element in advance, so the audio signal must already be divided into separate signals. A high sense of presence therefore cannot be obtained when viewing a moving image that has a monaural audio signal, a 2ch stereo audio signal, or the like.

Patent Document 1: JP-A-6-237458
Patent Document 2: JP 2003-274314 A
Patent Document 3: JP 2001-78282 A

The present invention has been made in view of the above circumstances, and its object is to provide an information transmission apparatus that can give a sense of presence.

To achieve the above object, the present invention adopts the following configuration.

An information transmission apparatus according to the present invention comprises: a screen on which a moving image is displayed; a plurality of audio output units arranged to correspond to different positions on the screen; object detection means for detecting an object from the moving image; and control means for controlling each of the plurality of audio output units based on the detection result of the object detection means.

In the information transmission apparatus according to the present invention, a plurality of controllable audio output units are arranged to correspond to different positions on the screen, and each audio output unit is controlled based on the detection result of the object detection means. For example, when an object appears in part of the screen, controlling the audio output units so that sound is output from the one corresponding to the position of the object makes the sound reach the viewer from the direction of the object, giving the impression that the object itself is producing the sound. Furthermore, when the object moves, switching the audio output unit that outputs the sound in step with the movement lets the viewer keep that impression. In other words, the viewer gets a three-dimensional impression, as if what appears in the moving image were really there, and in turn a sense of presence, as if the viewer were inside the space shown on the screen.

The information transmission apparatus according to the present invention preferably further comprises audio extraction means for extracting, from the input audio, the sound of the object detected by the object detection means, and the control means preferably outputs the sound extracted by the extraction means from the audio output unit corresponding to the display position of the object. As a result, even if the input audio contains sounds other than the sound emitted by the object (in practice, the audio of moving images such as movies and television programs often does), the viewer gets the impression that only the sound emitted by the object comes from the direction of the object.

The information transmission apparatus according to the present invention preferably further comprises an input unit to which the moving image and audio are input, and a moving image display unit that displays the input moving image on the screen. This improves the usefulness of the information transmission apparatus.

The control means preferably determines the volume of the sound extracted by the extraction means based on the size of the object. In reality, the sound from a distant object and the sound from a nearby object usually differ in loudness (for example, the sound from a distant object is heard as quieter). In a moving image, moreover, the farther away an object is, the smaller it tends to appear. Determining the volume based on the size of the object therefore reproduces a situation closer to reality.

The information transmission apparatus according to the present invention preferably further comprises an independent audio output unit arranged at a position independent of the screen, and the control means preferably outputs at least part of the audio other than the sound extracted by the extraction means from the independent audio output unit. As a result, for example, the voice of a person who does not appear in the moving image, or BGM, is heard from outside the screen, which further enhances the three-dimensionality of the sound and the sense of presence.

The control means preferably outputs at least part of the audio other than the sound extracted by the extraction means from the audio output units corresponding to all positions on the screen. As a result, BGM, narration, and the like are heard from the entire screen. In addition, audio other than that of the object can be output without additional equipment.

The control means preferably outputs at least part of the audio other than the sound extracted by the extraction means from the audio output units corresponding to a partial region of the screen. This reduces the number of audio output units that output sound, so power consumption can be kept low.

In the information transmission apparatus according to the present invention, controlling the audio output units based on the result of the object detection means makes it possible to obtain a sense of presence for any moving image (including its audio).

Preferred embodiments of the present invention are described in detail below, by way of example, with reference to the drawings.

<Device configuration>
FIG. 1 is a block diagram showing the functional configuration of an information transmission apparatus according to an embodiment of the present invention. This information transmission apparatus transmits moving images and audio.

The information transmission apparatus includes a screen 11, a plurality of audio output units 12, an object detection unit 13, and an audio control unit 14 (see FIG. 1).

The screen 11 may use any technology that can display a moving image, such as a projection screen or the display surface of a display device. In this embodiment the screen 11 is a projection screen. A commercially available projector 15, shown in FIG. 1, acquires a moving image and audio and displays the moving image on the screen 11. The projector 15 may acquire the moving image and audio from a commercially available DVD (a movie, for example) by using a DVD player or the like, or may acquire a moving image (including audio) stored on a hard disk or the like.

The audio output unit 12 may use any technology that can output sound, such as a cone speaker or a flat panel speaker. In this embodiment the audio output units 12 are flat panel speakers. A flat panel speaker diffuses and attenuates sound waves less than a cone speaker does, so the user can recognize more accurately where the sound is coming from. In other words, using flat panel speakers as the audio output units 12 gives the user a higher sense of presence.

Furthermore, the plurality of audio output units 12 are arranged to correspond to different positions on the screen 11. In this embodiment they are arranged in an m × n matrix on the back of the screen 11, as shown in FIG. 2. The audio output units 12 need not be arranged as in FIG. 2. If the audio output units 12 are transparent, they may be placed on the front of the screen 11. They may be arranged radially rather than in a matrix. Any number of audio output units 12 may be used (for example, 100 or 2). The arrangement and number of the audio output units 12 may be chosen freely to suit the user's purpose.

The object detection unit 13 detects an object from the moving image (in this embodiment it acquires the moving image from the projector 15, as shown in FIG. 1). Any existing technique may be applied to the object detection processing. The object may be anything, such as something that emits sound or something that moves; for example, a person or a vehicle (train, automobile, airplane, etc.). Something that moves may be something whose display position itself changes, or something that moves relative to the movement of its surroundings (such as the background). The detection result of the object detection unit 13 may be any quantity that represents the object, such as its type, size, or position (in this embodiment, the size and position of the object, the position being the coordinates of its center, are output as the detection result).

The audio control unit 14 controls each audio output unit 12 based on the detection result of the object detection unit 13 (in this embodiment it acquires the audio from the projector 15, as shown in FIG. 1). An audio control unit 14 may be provided for each audio output unit 12, or a single audio control unit 14 may control all of the audio output units 12. Examples of how the audio output units 12 are controlled are described in detail later.

<Information transmission function>
The functions and processing flow of the information transmission apparatus are described with reference to the flowchart of FIG. 3.

When the information transmission function starts, the projector 15 acquires a moving image and its audio (step S101). In this embodiment, assume that a moving image showing only a person 41, as in FIG. 4, and its audio (containing only the voice of the person 41) are acquired.

Next, the moving image acquired in step S101 is displayed on the screen 11 (step S102).

The following steps are performed concurrently with step S102 (that is, while the moving image is being displayed on the screen 11).

The object detection unit 13 detects an object from the moving image acquired in step S101 (step S103). In this embodiment the object is a human face (that is, in the example of FIG. 4, the face of the person 41 is detected).

The audio control unit 14 then controls each audio output unit 12 based on the detection result of the object detection unit 13 (step S104). In the example of FIG. 4, the audio control unit 14 controls the audio output units 12 (including the audio output unit 42) so that sound is output only from the audio output unit 42 corresponding to the position of the face of the person 41. As a result, the sound is heard only from the direction of the person 41. The sound may be output from a single audio output unit 42 corresponding to the position of the face (the object), or from a plurality of audio output units 42 centered on that position (for example, the number of audio output units 42 that output the sound may be determined according to the size of the face).
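Step S104 can be illustrated with a minimal Python sketch. It assumes an m × n grid of equally sized speaker cells covering the screen, pixel coordinates with the origin at the top-left corner, and a square face region of side `size`; the function names and the rule "a cell is active when its center lies inside the face region" are illustrative assumptions, not details given in the specification.

```python
def speaker_cell(cx, cy, screen_w, screen_h, rows, cols):
    """Return the (row, col) of the grid cell that contains the point (cx, cy)."""
    col = min(max(int(cx / screen_w * cols), 0), cols - 1)
    row = min(max(int(cy / screen_h * rows), 0), rows - 1)
    return row, col


def active_cells(cx, cy, size, screen_w, screen_h, rows, cols):
    """Return the set of cells that should output the object's sound.

    The cell containing the object center is always active. Further cells
    are added when their own center lies within the square of side `size`
    around the object center (assumed rule for "several units centered on
    the position, chosen according to the face size").
    """
    cell_w, cell_h = screen_w / cols, screen_h / rows
    cells = {speaker_cell(cx, cy, screen_w, screen_h, rows, cols)}
    for row in range(rows):
        for col in range(cols):
            mid_x = (col + 0.5) * cell_w
            mid_y = (row + 0.5) * cell_h
            if abs(mid_x - cx) <= size / 2 and abs(mid_y - cy) <= size / 2:
                cells.add((row, col))
    return cells


# A small face activates one speaker; a large face activates several.
small = active_cells(1000, 200, 100, 1600, 900, rows=3, cols=4)
large = active_cells(1000, 200, 900, 1600, 900, rows=3, cols=4)
```

All cells outside the returned set would be muted, which corresponds to outputting the sound only from the direction of the detected face.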

Even when the object moves (for example, even in a moving image in which the person 41 moves from right to left on the screen), the audio output unit 42 corresponding to its position switches successively as the object moves, so the sound continues to be heard only from the direction of the person 41 (see FIG. 4).
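The successive switching can be sketched as a per-frame loop. The example below assumes a single row of `cols` speakers and a detector that reports the horizontal center of the object for each frame, or `None` when nothing is detected; muting every speaker in that case is an assumption of this sketch, since the specification does not say what happens when no object is found.

```python
def follow_object(centers_x, screen_w, cols):
    """Return, for each frame, a list of per-column gains.

    The column under the detected object gets gain 1.0 and all other
    columns get 0.0, so the active speaker follows the object.
    """
    frames = []
    for cx in centers_x:
        gains = [0.0] * cols
        if cx is not None:
            col = min(max(int(cx / screen_w * cols), 0), cols - 1)
            gains[col] = 1.0
        frames.append(gains)
    return frames


# A person walking from right to left across a 1200-pixel-wide screen.
gains_per_frame = follow_object([1100, 800, 500, 100, None], screen_w=1200, cols=4)
```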

As described above, in this embodiment a sense of presence is obtained by controlling the plurality of audio output units 12 according to the detection result of the object (in the example of FIG. 4, the viewer gets the sense that the person 41 is actually there).

<Modification 1>
Modification 1 describes the case where the audio contains sounds other than that of the object (for example, BGM). Descriptions of functions and processing that are the same as above are omitted.

<Device configuration>
FIG. 5 is a block diagram showing the functional configuration of the information transmission apparatus according to Modification 1. This information transmission apparatus further includes an audio extraction unit 51. Functions that are the same as in FIG. 1 are given the same reference numerals.

The audio extraction unit 51 extracts the sound of the object detected by the object detection unit 13. In Modification 1, the audio extraction unit 51 extracts and separates the sound of the object from the audio acquired from the projector 15 and outputs it to the audio control unit 14. Any existing technique may be applied to the audio extraction processing. One example is to analyze the frequency content of the audio and extract only the frequency band of the object's sound.
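The frequency-band example can be sketched as follows. This is a deliberately simple illustration that splits a short signal into an in-band part and a remainder with a naive discrete Fourier transform; the band limits, the function names, and the test signal are assumptions, and practical separation of a voice from mixed audio generally needs considerably more than a fixed band mask.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_by_band(samples, sample_rate, low_hz, high_hz):
    """Split samples into (in-band signal, remaining signal)."""
    n = len(samples)
    object_bins, other_bins = [], []
    for k, value in enumerate(dft(samples)):
        # Bins above n/2 are the negative-frequency mirror of the lower bins.
        freq = min(k, n - k) * sample_rate / n
        if low_hz <= freq <= high_hz:
            object_bins.append(value)
            other_bins.append(0j)
        else:
            object_bins.append(0j)
            other_bins.append(value)
    return inverse_dft(object_bins), inverse_dft(other_bins)


# Stand-ins for a "voice" (4 Hz tone) mixed with "music" (20 Hz tone).
rate = 64
voice = [math.sin(2 * math.pi * 4 * t / rate) for t in range(rate)]
music = [math.sin(2 * math.pi * 20 * t / rate) for t in range(rate)]
mixed = [v + m for v, m in zip(voice, music)]
extracted, remainder = split_by_band(mixed, rate, low_hz=2, high_hz=8)
```

Returning the remainder alongside the extracted part is a convenience of this sketch: the later modifications route "the audio other than that of the object" separately, and that audio is exactly this remainder.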

<Information transmission function>
The functions and processing flow of the information transmission apparatus of Modification 1 are described with reference to the flowchart of FIG. 6.

The processing in steps S201 to S203 is the same as that in steps S101 to S103 of FIG. 3, respectively, so its description is omitted. In Modification 1, however, the audio contains sounds other than that of the object.

After step S203, the audio extraction unit 51 extracts the sound of the object from the audio acquired in step S201 (step S204). In the example of FIG. 4, the voice of the person 41 is extracted.

The audio control unit 14 then controls each audio output unit 12 based on the detection result of the object detection unit 13 (step S205). In the example of FIG. 4, control is performed so that the sound extracted in step S204 is output only from the audio output unit 42 corresponding to the position of the face of the person 41. As a result, even if the audio contains sounds other than the voice of the person 41, that person's voice is heard only from the direction of the person 41.

As described above, in Modification 1 the information transmission apparatus further includes the audio extraction unit 51 and extracts the sound of the object from the acquired audio. As a result, even if the audio contains sounds other than that of the object, the sound of the object (the person's voice in the example above) is heard only from the direction of the object, and a sense of presence is obtained.

<Modification 2>
In Modification 2, the volume of the sound is determined according to the size of the object. In reality, the sound from a nearby object is louder than the sound from a distant one. In a moving image, moreover, the farther away an object is, the smaller it tends to appear. Determining the volume according to the size of the object therefore reproduces a situation closer to reality.

For example, in a moving image in which a person 71 approaches, as shown in FIG. 7, the person 71 gradually appears larger as the person comes closer. Accordingly, as the displayed size of the person 71 increases, the volume of the person's sound (the sound output from the audio output unit 72 corresponding to the position of the person 71) is gradually raised. This gives the viewer a three-dimensional impression, as if the person 71 were really approaching. The volume may be raised by increasing the output level itself or, where increasing the number of audio output units 72 that output the sound raises the volume, by doing that.
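The specification does not give a formula for the size-to-volume mapping, so the following Python sketch makes one up for illustration: the gain is proportional to the displayed height of the object relative to a reference height, clamped to a fixed range. The function name, the linear rule, and the clamp limits are all assumptions.

```python
def gain_from_size(object_height, reference_height, min_gain=0.1, max_gain=1.0):
    """Map the displayed object height to a playback gain.

    The gain grows linearly with the displayed height and is clamped to
    [min_gain, max_gain], so a larger (nearer-looking) object sounds louder.
    """
    if reference_height <= 0:
        raise ValueError("reference_height must be positive")
    gain = object_height / reference_height
    return max(min_gain, min(max_gain, gain))


# A person approaching the camera: displayed height grows frame by frame.
heights = [5, 50, 100, 200, 400]
gains = [gain_from_size(h, reference_height=200) for h in heights]
```

A linear rule is one plausible choice because, under a simple perspective model, displayed size is roughly inversely proportional to distance, as is sound pressure amplitude in a free field; other monotonic mappings would serve the same purpose.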

<Modification 3>
In Modification 3, as shown in FIG. 8, the information transmission apparatus of Modification 1 further includes an independent audio output unit 81.

Like the audio output units 12, the independent audio output unit 81 may be any device that can output sound. However, the independent audio output unit 81 is arranged at a position independent of the screen 11.

In Modification 3, as shown in FIG. 9, control is performed so that the audio other than that of the object in Modification 1 (the audio that remains after the audio extraction unit 51 has extracted the sound of the object) is output from the independent audio output unit 81 (in the example of FIG. 9, assume that a person 91 is detected as the object and that the voice of the person 91 is output from the audio output unit 12 corresponding to the position of the person 91). As a result, sounds that do not come from the object (person 91), such as BGM, are heard from outside the screen 11, which further enhances the three-dimensionality of the sound and the sense of presence.
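The routing of Modification 3 can be written down as a small table. In this sketch each output channel is assigned a pair of gains, one for the extracted object sound and one for the remaining audio; the table layout, the key `"independent"`, and the function name are hypothetical choices made for illustration.

```python
def route_modification3(object_cell, rows, cols):
    """Return {output: (object_gain, remainder_gain)} for every output channel.

    Screen speakers are keyed by (row, col). Only the speaker at
    `object_cell` plays the object sound, and only the independent
    speaker plays the remaining audio.
    """
    routing = {}
    for row in range(rows):
        for col in range(cols):
            object_gain = 1.0 if (row, col) == object_cell else 0.0
            routing[(row, col)] = (object_gain, 0.0)
    routing["independent"] = (0.0, 1.0)
    return routing


routing = route_modification3(object_cell=(1, 2), rows=2, cols=3)
```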

<Modification 4>
In Modification 4, as shown in FIG. 10, control is performed so that the audio other than that of the object is output from all of the audio output units 12 (in the example of FIG. 10, assume that a person 101 is detected as the object and that the voice of the person 101 is output from the audio output unit 12 corresponding to the position of the person 101). As a result, the combination of the sound extracted by the extraction means and the remaining audio (that is, the audio acquired by the projector 15) is output from the audio output unit 12 corresponding to the position of the object, and only the audio other than that of the object is output from the other audio output units 12.

In Modification 4, removing the sound of the object from positions other than that of the object means that the audio other than that of the object is heard from the entire screen 11, while the sound of the object is heard only from the direction of the object. This gives the viewer a sense of presence.
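For a single audio sample, the mix described for Modification 4 can be sketched as below. The function name and the use of one sample value per call are assumptions; a practical implementation would apply the same rule to whole buffers.

```python
def mix_modification4(object_sample, remainder_sample, object_cell, rows, cols):
    """Return a rows x cols grid holding the sample each screen speaker plays.

    Every speaker plays the remaining audio. The speaker at `object_cell`
    additionally plays the object sound, so it reproduces the original mix.
    """
    return [
        [
            remainder_sample + (object_sample if (row, col) == object_cell else 0.0)
            for col in range(cols)
        ]
        for row in range(rows)
    ]


grid = mix_modification4(0.5, 0.25, object_cell=(0, 1), rows=2, cols=2)
```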

<Modification 5>
In Modification 5, control is performed so that the audio other than that of the object is output from a partial region of the screen 11.

In the example of FIG. 11, the audio other than that of the object is output from the audio output units 12 that do not correspond to the position of the object (that is, the audio output units 12 corresponding to the area outside the region enclosed by the dotted line in FIG. 11). Here, assume that a person 111 is detected as the object and that the voice of the person 111 is output from the audio output unit 12 corresponding to the position of the person 111. Outputting this audio separately from the position of the object gives the viewer a sense of presence.

In the example of FIG. 12, the audio other than that of the object is output from the audio output unit 12 corresponding to the center position of the screen 11. Here, assume that a person 121 is detected as the object and that the voice of the person 121 is output from the audio output unit 12 corresponding to the position of the person 121. Because the audio other than that of the object is always output from the center position of the screen 11, the viewer can distinguish it from the sound of the object (even if the object sometimes appears at the center of the screen 11, the other audio is always output from the center, so the two can still be told apart).

In Modification 5, restricting the output of the audio other than that of the object to a partial region keeps power consumption low while still giving the viewer a sense of presence.
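The two region choices of Modification 5 (FIG. 11 and FIG. 12) can be sketched as a selection function. The mode names, the function name, and the rule of taking cell `(rows // 2, cols // 2)` as the center are assumptions; for a grid with an even number of rows or columns there is no single central cell, and this sketch simply picks one of the candidates.

```python
def remainder_cells(rows, cols, object_cells, mode):
    """Return the set of screen speakers that play the non-object audio.

    mode "outside_object": every cell not used for the object (FIG. 11 style).
    mode "center": only the cell at the center of the screen (FIG. 12 style).
    """
    if mode == "outside_object":
        every_cell = {(row, col) for row in range(rows) for col in range(cols)}
        return every_cell - set(object_cells)
    if mode == "center":
        return {(rows // 2, cols // 2)}
    raise ValueError(f"unknown mode: {mode}")


outside = remainder_cells(3, 5, {(0, 0), (0, 1)}, "outside_object")
center = remainder_cells(3, 5, {(0, 0), (0, 1)}, "center")
```

Speakers in neither the object set nor the returned set stay silent, which is where the power saving mentioned above would come from.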

Modifications 1 to 5 may be combined with one another. In that case, the audio other than that of the object may be divided into several parts. For example, part of it may be output from the audio output units 12 and the rest from the independent audio output unit 81 (for instance, the sound of rain from the audio output units 12 and BGM from the independent audio output unit 81). The technique is not limited to those described above, as long as audio control is performed using image analysis such as object detection processing.

The order of the processing steps shown in FIGS. 3 and 6 may be changed to the extent possible. For example, instead of performing object detection, audio control, and other processing after acquisition by the projector 15, the object detection unit 13 and the audio control unit 14 may acquire the moving image and the audio, respectively, perform the various kinds of processing, and then output the results to the projector 15 (the audio may be output directly to the audio output units 12 rather than to the projector 15). In this way, even if the processing load of object detection, audio control, and the like is heavy (that is, even if the processing takes a long time), the audio output and the moving image display can be made to coincide.

The functions of the projector 15 may be included among the components of the information transmission apparatus. FIG. 13 is a block diagram showing the functional configuration of such an information transmission apparatus, which further includes an input unit 131 and a moving image display unit 132.

The input unit 131 receives the moving image and audio, and the moving image display unit 132 displays the input moving image on the screen (that is, the input unit 131 and the moving image display unit 132 together play the role of the projector 15).

Although this embodiment does not describe the moving image in detail, the moving image need not be a movie stored on a DVD or the like (as exemplified in this embodiment). For example, it may be a television program received by an antenna or a game screen obtained from a game console or the like. Any moving image in which an object can be detected may be used.

This embodiment has described the case where the projector 15 displays a moving image on a projection screen (screen 11), that is, an example applied to a screening system such as a movie theater. However, the information transmission apparatus of the present invention may instead use the display surface of a television set or the like as the screen 11, with the television set acquiring the moving image from an antenna, a DVD player, or the like. In other words, the information transmission apparatus of the present invention is not limited to screening systems and can also be applied to television sets, computer terminals, videophones, and the like.

FIG. 1 is a block diagram showing the functional configuration of the information transmission apparatus.
FIG. 2 is a diagram showing an example of how the audio output units 12 are installed.
FIG. 3 is a flowchart showing the processing flow of the information transmission apparatus.
FIG. 4 is a diagram showing an example of a control method of the audio control unit 14.
FIG. 5 is a block diagram showing the functional configuration of the information transmission apparatus according to Modification 1.
FIG. 6 is a flowchart showing the processing flow of the information transmission apparatus according to Modification 1.
FIG. 7 is a diagram showing an example of a control method of the audio control unit 14.
FIG. 8 is a block diagram showing the functional configuration of an information transmission apparatus including an independent audio output unit 81.
FIG. 9 is a diagram showing an example of a control method of the audio control unit 14.
FIG. 10 is a diagram showing an example of a control method of the audio control unit 14.
FIG. 11 is a diagram showing an example of a control method of the audio control unit 14.
FIG. 12 is a diagram showing an example of a control method of the audio control unit 14.
FIG. 13 is a block diagram showing the functional configuration of an information transmission apparatus including the input unit 131 and the moving image display unit 132.

Explanation of symbols

11 Screen
12, 42, 72 Audio output unit
13 Object detection unit
14 Audio control unit
15 Projector
41, 71, 91, 101, 111, 121 Person
81 Independent audio output unit
131 Input unit
132 Moving image display unit

Claims

An information transmission apparatus comprising:
a screen on which a moving image is displayed;
a plurality of audio output units arranged so as to correspond to different positions on the screen;
object detection means for detecting an object from the moving image; and
control means for controlling each of the plurality of audio output units based on a detection result of the object detection means.

The information transmission apparatus according to claim 1, further comprising:
audio extraction means for extracting, from input audio, the audio of the object detected by the object detection means,
wherein the control means causes the audio extracted by the audio extraction means to be output from the audio output unit corresponding to the display position of the object.

The information transmission apparatus according to claim 1, further comprising:
an input unit to which the moving image and audio are input; and
a moving image display unit that displays the input moving image on the screen.