JP2015159461A

JP2015159461A - Communication device, communication system, image segmentation method, and program

Info

Publication number: JP2015159461A
Application number: JP2014033891A
Authority: JP
Inventors: Taichi Honjo (本荘太一); Taiga Murayama (村山大河)
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2014-02-25
Filing date: 2014-02-25
Publication date: 2015-09-03

Abstract

PROBLEM TO BE SOLVED: To provide a communication device, a communication system, an image segmentation method, and a program that make it easy to determine the image area to be segmented as the object displayed on the communication device of the other party. SOLUTION: A control unit 23 of each digital signage device 1 detects the direction of the voice acquired by a voice acquisition unit 31. Based on the detected voice direction, it determines the image area to be segmented from a captured image of the call participants facing an image forming unit, segments the determined image area from the captured image, and transmits it via a communication unit 26 to the digital signage device 1 of the other party.

Description

The present invention relates to a communication device, a communication system, an image clipping method, and a program.

Conventionally, there is known a device equipped with a video output device, comprising a video output device connected to a video supply device, a reflection member, and a screen. Output light that projects content from the video output device is reflected by the reflection member, and the reflected light is projected onto a screen cut to the outline of the content, so that the display leaves a stronger impression on viewers (see, for example, Patent Document 1).

Patent Document 1: JP 2011-150221 A

However, with the device described in Patent Document 1, an image of the person or other subject to be displayed (a still image, or a moving image shot with as little movement as possible) is captured for the content, and the screen serving as the display means is made to match that image. Such a device is therefore unsuited to serving as a video-call communication device that displays images of a speaker sent from the communication device on the other side. In particular, because the screen shows only one person, when several people take part in a call it becomes a problem which of them should be cut out as the display target.

An object of the present invention is to make it easy to determine the image region to be cut out as the display target on the partner communication device.

To solve the above problem, the invention according to claim 1 is
a communication device that transmits images and sound to, and receives them from, a communication device on the other side, comprising:
imaging means for capturing an image to be transmitted to the communication device on the other side;
sound acquisition means for acquiring sound;
detection means for detecting the direction of the sound acquired by the sound acquisition means; and
determination means for determining, based on the detected direction of the sound, an image region to be cut out from the captured image acquired by the imaging means.

According to the present invention, the image region to be cut out as the display target on the partner communication device can be determined easily.

FIG. 1 is a diagram showing an example of the overall configuration of the communication system in the present embodiment.
FIG. 2 is a block diagram showing the functional configuration of the digital signage device of FIG. 1.
FIG. 3 is a diagram showing the schematic configuration of the screen unit of FIG. 2.
FIG. 4 is a flowchart showing the call process executed by the control unit of FIG. 2.
FIG. 5(a) is a diagram showing the ranges, divided by direction, used when detecting sound, and FIG. 5(b) is a diagram showing the areas of the captured image that correspond to each of those ranges.
FIG. 6 is a diagram for explaining the peering motion.
FIG. 7 is a flowchart showing the movement detection process executed by the control unit of FIG. 2.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. The present invention is not limited to the illustrated examples.

[Configuration of Communication System 100]
FIG. 1 is a diagram showing the overall configuration of a communication system 100 according to an embodiment of the present invention. As shown in FIG. 1, the communication system 100 is configured so that a plurality of digital signage devices 1 can be connected via a communication network N. In the present embodiment, the communication network N is described as the Internet, but it is not limited to this and may be, for example, a telephone line.

[Configuration of Digital Signage Device 1]
FIG. 2 is a block diagram showing the main control configuration of the digital signage device 1 serving as the communication device in the present embodiment. The digital signage device 1 can make video calls with other digital signage devices 1 via the communication network N.

As shown in FIG. 2, the digital signage device 1 includes a projection unit 21 that emits image light of content, and a screen unit 22 that receives the image light emitted from the projection unit 21 on its back surface and projects it toward the front.

First, the projection unit 21 will be described.
The projection unit 21 includes a control unit 23, a projector 24, a storage unit 25, and a communication unit 26. The projector 24, the storage unit 25, and the communication unit 26 are connected to the control unit 23, as shown in FIG. 2.

The control unit 23 includes a CPU (Central Processing Unit) that executes the various programs stored in the storage unit 25 to perform predetermined calculations and control each unit, and a memory that serves as a work area during program execution (neither is shown). In cooperation with the programs stored in the program storage unit 251 of the storage unit 25, the control unit 23 functions as detection means, determination means, cutout means, transmission control means, and audio output control means. In cooperation with the imaging unit 30, it also functions as movement detection means.

The projector 24 is a projection device that converts image data output from the control unit 23 into image light and emits it toward the back surface of the screen unit 22. As the projector 24, for example, a DLP (Digital Light Processing) (registered trademark) projector can be used. Such a projector uses a DMD (digital micromirror device), a display element that forms an optical image from reflected light by individually switching the tilt angle of each of a plurality of micromirrors arranged in an array (1024 horizontal × 768 vertical pixels in the case of XGA) on and off at high speed.

The storage unit 25 consists of an HDD (Hard Disk Drive), a nonvolatile semiconductor memory, or the like. As shown in FIG. 2, the storage unit 25 includes a program storage unit 251, a telephone directory storage unit 252, and the like.

The program storage unit 251 stores the system program and various processing programs executed by the control unit 23, together with the data needed to execute them.
The telephone directory storage unit 252 stores pre-registered telephone numbers in association with names.

The communication unit 26 consists of a modem, a router, a network card, and the like, and communicates with external devices.

Next, the screen unit 22 will be described.
FIG. 3 is a front view showing the schematic configuration of the screen unit 22. As shown in FIG. 3, the screen unit 22 includes an image forming unit 27 and a pedestal 28 that supports the image forming unit 27.

The image forming unit 27 is a screen made by attaching a rear-projection film screen, on which a film-type Fresnel lens is laminated, to a single translucent plate 29, such as an acrylic plate, that is cut in the shape of a person and arranged substantially orthogonal to the direction in which the image light is emitted. The image forming unit 27 constitutes the display means.

An imaging unit 30 such as a camera is provided at the top of the image forming unit 27. The imaging unit 30 captures the space facing the image forming unit 27, generates a captured image, and outputs it to the control unit 23. Although not shown, the imaging unit 30 includes a camera having an optical system and an image sensor, and an imaging control unit that controls the camera. The image sensor is, for example, a CCD (Charge Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) sensor, and converts the optical image that has passed through the optical system into a two-dimensional image signal. The imaging unit 30 functions as the imaging means.

The pedestal 28 is provided with an audio acquisition unit 31, an operation unit 32, and an audio output unit 33.

The audio acquisition unit 31 includes a plurality of directional microphones; each microphone picks up sound, converts it into an electrical signal, and outputs the signal to the control unit 23.

The operation unit 32 includes a call button 32a, an answer button 32b, an end-call button 32c, an operation panel 32d, and the like. The call button 32a instructs placing a video call. The answer button 32b instructs answering a video call. The end-call button 32c instructs ending a video call. The operation panel 32d consists of a display such as an LCD (Liquid Crystal Display) and a touch panel in which transparent electrodes are arranged in a grid over the LCD surface. It displays the telephone directory and other data stored in the telephone directory storage unit 252, detects the position of the telephone number selected with a finger or the like, and outputs that information to the control unit 23.

The audio output unit 33 includes speakers 33a to 33e as audio output means and outputs sound in accordance with instructions from the control unit 23.

The imaging unit 30, the audio acquisition unit 31, the operation unit 32, and the audio output unit 33 are connected to the control unit 23, as shown in FIG. 2.

[Operation of Communication System 100]
Next, the video call operation of the digital signage devices 1 in the communication system 100 will be described.
On any digital signage device 1, when the telephone number of the video call partner is selected on the operation panel 32d and the call button 32a is pressed, the control unit 23 causes the communication unit 26 to send a connection request over the communication network N to the partner digital signage device 1.
When the communication unit 26 of the partner digital signage device 1 receives the connection request, its control unit 23 causes the audio output unit 33 to output a ring tone announcing the incoming call. When the answer button 32b is pressed, the control unit 23 causes the communication unit 26 to send a connection response over the communication network N to the caller. A session is thereby established between the two digital signage devices 1, the call line is connected, and the devices enter the video call state. In the video call state, the call process described later is executed, and the call participants on the calling and receiving sides each face the image forming unit 27 of their own digital signage device 1 and talk while watching the other side's call participant shown on the image forming unit 27.

When the end-call button 32c is pressed on either digital signage device 1 while the call line is connected, its control unit 23 causes the communication unit 26 to send a line disconnection request over the communication network N to the partner digital signage device 1. The session ends and the call line is disconnected when the line disconnection response reaches the partner digital signage device 1.
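The call setup and teardown described above can be pictured as a small state machine. The sketch below is a hypothetical model, not the device's actual firmware: the state names, event strings, and the `step` function are illustrative inventions, and it assumes that the side receiving a disconnection request replies and returns to idle. Events that do not apply to the current state are simply ignored.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    DIALING = auto()        # caller has sent a connection request
    RINGING = auto()        # callee is playing the ring tone
    IN_CALL = auto()        # session established, call process running
    DISCONNECTING = auto()  # line disconnection request has been sent


# Hypothetical event names: (current state, event) -> next state.
TRANSITIONS = {
    (CallState.IDLE, "call_button"): CallState.DIALING,
    (CallState.IDLE, "connect_request_received"): CallState.RINGING,
    (CallState.DIALING, "connect_response_received"): CallState.IN_CALL,
    (CallState.RINGING, "answer_button"): CallState.IN_CALL,
    (CallState.IN_CALL, "end_call_button"): CallState.DISCONNECTING,
    # Assumption: the receiving side answers the request and goes idle.
    (CallState.IN_CALL, "disconnect_request_received"): CallState.IDLE,
    (CallState.DISCONNECTING, "disconnect_response_received"): CallState.IDLE,
}


def step(state: CallState, event: str) -> CallState:
    """Return the next state; events not listed for `state` are ignored."""
    return TRANSITIONS.get((state, event), state)
```

For example, a caller walks through `call_button`, `connect_response_received`, `end_call_button`, and `disconnect_response_received` and ends up back in `CallState.IDLE`. A table of transitions keeps every legal move visible in one place.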

FIG. 4 is a flowchart of the call process executed by the calling-side and receiving-side digital signage devices 1 in the video call state. On each digital signage device 1, the call process is executed through cooperation between the control unit 23 and the program stored in the program storage unit 251 of the storage unit 25.

In the present application, "cut out" covers not only actually extracting a region but also making the region to be cut out identifiable by the partner digital signage device 1, for example by replacing everything outside that region with black, or by associating position information of that region with the captured image.
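Two of the forms of "cutting out" named above, extracting the region and keeping the whole frame while blacking out everything outside the region, can be illustrated as follows. This is a minimal sketch assuming a grayscale image stored as a list of pixel rows and a rectangular region; real person regions need not be rectangular, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixel rows; 0 = black (assumption)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Actually extract the region: return only the pixels inside `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def mask_outside(image: Image, box: Box) -> Image:
    """Keep the full frame but replace every pixel outside `box` with black."""
    left, top, right, bottom = box
    return [
        [px if (left <= x < right and top <= y < bottom) else 0
         for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

Both functions return new lists and leave the input image untouched. The third form, attaching position information, amounts to sending the unmodified frame together with `box`.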

First, the control unit 23 causes the imaging unit 30 to capture an image (step S1) and, based on the captured image, acquires the initial position of the call participant (step S2). Specifically, it recognizes face regions in the captured image. If exactly one face region is recognized, the center of that face region is stored in memory as the call participant's initial position. If several face regions are recognized, the center of the face region closest to the center of the captured image is stored in memory as the initial position. Face regions, and person regions containing a face region, can be recognized with known image processing techniques, so their description is omitted.
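The selection rule in step S2 (the single face, or the face nearest the image center) can be sketched as below. Face detection itself is outside the scope of the text, so the sketch simply assumes some detector has already produced face bounding boxes as `(x, y, width, height)` in pixels; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # face box (x, y, width, height) in pixels


def box_center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def initial_position(faces: List[Box], image_w: int,
                     image_h: int) -> Optional[Tuple[float, float]]:
    """Center of the face closest to the image center, or None if no face was found."""
    if not faces:
        return None
    cx, cy = image_w / 2.0, image_h / 2.0

    def squared_distance(box: Box) -> float:
        fx, fy = box_center(box)
        return (fx - cx) ** 2 + (fy - cy) ** 2

    # With a single face, min() returns that face, covering both cases in the text.
    return box_center(min(faces, key=squared_distance))
```

The same "closest to center" rule reappears in steps S601 and S14, so in practice it would likely be shared.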

Next, the control unit 23 determines whether the communication unit 26 has received a partner image and/or partner audio (step S3). Here, the partner image is an image captured by the partner digital signage device 1 (that is, an image of the call participants facing the image forming unit 27 of the partner device) from which the region of one call participant, chosen as the display target, has been cut out.

If it determines that neither a partner image nor partner audio has been received (step S3; NO), the control unit 23 proceeds to step S5. If it determines that a partner image and/or partner audio has been received (step S3; YES), the control unit 23 fits the received partner image to the shape of the image forming unit 27, causes the projector 24 to display (project) it on the image forming unit 27, causes the audio output unit 33 to output the received partner audio (step S4), and proceeds to step S5.

In the communication system 100, in step S8 described later, the control unit 23 of each digital signage device 1 divides a predetermined range around the audio acquisition unit 31 (the 180° range in front of it) into a plurality of ranges by direction (five in this example), as shown in FIG. 5(a), and detects the loudness level of the sound (hereinafter, the audio level) in each of the divided ranges. As described for step S9, when audio is transmitted to the partner digital signage device 1, information indicating the audio level of each range is transmitted with it. In other words, the partner digital signage device 1 sends, along with the audio, information indicating the audio level of each range in which sound was detected, that is, of each direction. Accordingly, when outputting audio in step S4, the control unit 23 biases the volume output from the speakers 33a to 33e according to the audio levels of these ranges. Specifically, the speakers 33a to 33e correspond respectively to the ranges (ranges 1 to 5) in which the partner digital signage device 1 detected sound, and the control unit 23 outputs the received audio at a higher volume from a speaker whose corresponding range has a higher audio level.
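The text only requires that a speaker whose range has a higher audio level plays the partner audio louder. The sketch below is one possible mapping, not the patent's: gains are linear in the received level, normalized so the loudest range gets gain 1.0, with an optional floor `min_gain` (an assumed parameter) so that no speaker goes completely silent. Function names are hypothetical.

```python
from typing import List, Sequence


def speaker_gains(levels: Sequence[float], min_gain: float = 0.0) -> List[float]:
    """Map per-range audio levels (ranges 1..N) to gains for speakers 1..N.

    The loudest range gets gain 1.0; the others scale linearly down to `min_gain`.
    """
    peak = max(levels) if levels else 0.0
    if peak <= 0.0:
        return [1.0] * len(levels)  # no directional information: play evenly
    return [min_gain + (1.0 - min_gain) * (lv / peak) for lv in levels]


def apply_gains(samples: Sequence[float], gains: Sequence[float]) -> List[List[float]]:
    """Produce one output channel per speaker (33a..33e) from the mono partner audio."""
    return [[g * s for s in samples] for g in gains]
```

For received levels `[0, 1, 2, 4, 0]` the gains are `[0.0, 0.25, 0.5, 1.0, 0.0]`, so the speaker for range 4 is loudest. A nonzero `min_gain` keeps some sound on every speaker while preserving the bias.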

In step S5, the control unit 23 causes the imaging unit 30 to capture an image (step S5) and executes the movement detection process (step S6).
As described above, when the partner side has several call participants and a person who is not in the center is speaking, step S4 outputs loud audio from the speaker corresponding to the range in which that speaker is located. Because the image forming unit 27 displays only one of the partner-side participants, the local call participant in front of the image forming unit 27 moves at least their face (head) and peers toward the direction of the louder audio, trying to see the person there.
In step S6, therefore, the movement detection process determines, from the image captured in step S5, whether the center of the face of the call participant facing the image forming unit 27 (if there are several, the participant whose face-region center is closest to the center of the captured image) has moved from the initial position acquired in step S2. If movement is detected, angle change information (described in detail later) indicating the amount and direction of movement from the initial position is transmitted to the partner digital signage device 1 as information indicating that a peering motion has occurred. As a result, when the partner digital signage device 1 cuts out the region to be displayed, it cuts out the call participant located in the direction of the sound (the range with the highest audio level), that is, the speaker.
In the present embodiment, as shown in FIG. 6, it is assumed that when a loud voice is heard from the right side of the image forming unit 27 (the left side as seen by person A, who is looking at the image forming unit 27), person A moves at least their face to the right to peer in that direction; however, the invention is not limited to this.

FIG. 7 is a flowchart of the movement detection process executed in step S6. The movement detection process is executed through cooperation between the control unit 23 and the program stored in the program storage unit 251.

First, the control unit 23 recognizes face regions in the image captured by the imaging unit 30 and detects the center position of the recognized face region (step S601). If several face regions are recognized, it detects the center position of the face region closest to the center of the captured image. In this way, the center position of the face region of the call participant whose initial position was acquired in step S2 is detected.

Next, based on the detected face center position and the initial position stored in memory, the control unit 23 determines whether movement of the call participant (the participant whose initial position was acquired) has been detected (step S602). Specifically, the control unit 23 detects that the call participant has moved when the detected face center position deviates from the initial position stored in memory.

If it determines that movement of the call participant has been detected (step S602; YES), the control unit 23 calculates the displacement (amount of movement) of the detected face center position relative to the initial position separately for its vertical and horizontal components. Each component is expressed as the angle θ formed by three points: the initial position, the center of the lens of the imaging unit 30, and the face center position detected in step S601. These angles are acquired as the angle change information (step S603). The vertical angle θ is positive for upward movement and negative for downward movement. The horizontal angle θ is positive for movement to the right as seen from the imaging unit 30 and negative for movement to the left. The angle θ in each direction can be obtained from the center of the captured image, the coordinates of each position, the size and angle of view of the captured image, and the like.
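One way to compute the per-component angle θ from pixel coordinates, image size, and angle of view is sketched below. It assumes an ideal pinhole camera (no lens distortion, square pixels, optical axis through the image center) and a non-mirrored image, so "right as seen from the imaging unit 30" is taken to be increasing image x. The `dead_zone_deg` threshold for ignoring small jitter is an added assumption; with its default of 0 any deviation counts as movement, as in step S602. Names are hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates; y grows downward


def angle_change(initial: Point, current: Point, image_w: int, image_h: int,
                 hfov_deg: float, dead_zone_deg: float = 0.0) -> Optional[Tuple[float, float]]:
    """Return (horizontal, vertical) angle change in degrees, or None if no movement.

    Horizontal is positive toward increasing image x; vertical is positive upward.
    """
    # Focal length in pixels, derived from the horizontal angle of view.
    f = (image_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    cx, cy = image_w / 2.0, image_h / 2.0

    def azimuth(x: float) -> float:    # angle of a pixel column from the lens axis
        return math.degrees(math.atan((x - cx) / f))

    def elevation(y: float) -> float:  # angle of a pixel row from the lens axis
        return math.degrees(math.atan((cy - y) / f))

    d_h = azimuth(current[0]) - azimuth(initial[0])
    d_v = elevation(current[1]) - elevation(initial[1])
    if abs(d_h) <= dead_zone_deg and abs(d_v) <= dead_zone_deg:
        return None  # treated as "no movement" (step S602; NO)
    return (d_h, d_v)
```

For a 640 × 480 image with a 90° horizontal angle of view, moving the face center from the image center to the right edge gives a horizontal change of about +45°. Taking the difference of the two ray angles per axis gives the angle subtended at the lens by the two positions in that axis.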

The control unit 23 then causes the communication unit 26 to transmit the angle change information to the partner digital signage device 1 (step S604), ends the movement detection process, and proceeds to step S7 in FIG. 4.

If, on the other hand, it determines that no movement of the call participant has been detected (step S602; NO), the control unit 23 causes the communication unit 26 to transmit information indicating that there was no movement to the partner digital signage device 1 (step S605), ends the movement detection process, and proceeds to step S7 in FIG. 4.

In step S7 of FIG. 4, the control unit 23 causes each microphone of the audio acquisition unit 31 to acquire an audio signal (step S7), detects the direction of the sound based on the acquired audio signals (step S8), and causes the communication unit 26 to transmit audio data based on the acquired audio signals, together with information indicating the audio level in each direction, to the partner digital signage device 1 (step S9).
In step S8, based on the audio signals acquired by the microphones of the audio acquisition unit 31, the control unit 23 detects the audio level of each range obtained by dividing the 180° range around (in front of) the audio acquisition unit 31 by direction into a plurality of ranges (five in this example), as shown in FIG. 5(a). It then takes the direction of the range with the highest audio level as the direction of the sound. The number of ranges used for detecting the sound direction is determined based on, for example, the number of directional microphones in the audio acquisition unit 31.
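Step S8 can be sketched as below under two assumptions not stated in the text: each range is covered by exactly one directional microphone, and the audio level is measured as the RMS of one frame of samples. The result is the 1-based number of the loudest range plus the per-range levels that step S9 sends along with the audio. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one microphone's audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_direction(frames: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (1-based number of the loudest range, per-range levels).

    frames[i] holds the samples of the directional microphone covering range i + 1.
    At least one microphone frame is required.
    """
    levels = [rms(frame) for frame in frames]
    loudest = max(range(len(levels)), key=lambda i: levels[i])
    return loudest + 1, levels
```

With five frames whose levels are `[0.0, 1.0, 3.0, 0.5, 0.0]`, the sound direction is range 3. The check in step S10 can then compare `max(levels)` with the predetermined value.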

Next, the control unit 23 determines whether sound at or above a predetermined value (a predetermined audio level) has been acquired (step S10). If so (step S10; YES), the control unit 23 divides the image captured in step S5 into areas corresponding to the ranges used for sound detection (step S11).

For example, when the ranges used for detecting the sound direction are ranges 1 to 5, as shown in FIG. 5(a), the captured image is divided into areas 1 to 5 corresponding to ranges 1 to 5 respectively (each area shows the corresponding range), as shown in FIG. 5(b). The values at the bottom of FIG. 5(b) give the extent of each of areas 1 to 5 as angles, with the center of the lens of the imaging unit 30 taken as 0°; they are only an example, and the division is not limited to them.
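Step S11 maps angular ranges onto pixel columns. The sketch below assumes the same ideal pinhole camera as before, areas numbered left to right in the image, and boundary angles that are purely hypothetical (the actual values in FIG. 5(b) are not reproduced in the text). Boundaries outside the camera's field of view are clamped to the image edges, which can leave an outer area empty when the 180° microphone range is wider than the camera's view.

```python
import math
from typing import List, Sequence, Tuple


def split_columns(image_w: int, hfov_deg: float,
                  boundaries_deg: Sequence[float]) -> List[Tuple[int, int]]:
    """Split the image width into column spans [start, end), one per angular range.

    `boundaries_deg` are ascending angles in (-90, 90), 0 = lens axis, negative = left,
    separating adjacent ranges; N boundaries give N + 1 areas numbered left to right.
    """
    f = (image_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    cx = image_w / 2.0
    cuts = [0]
    for angle in boundaries_deg:
        x = cx + f * math.tan(math.radians(angle))
        cuts.append(int(round(min(max(x, 0.0), float(image_w)))))  # clamp to image
    cuts.append(image_w)
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
```

For a 600-pixel-wide image, a 90° angle of view, and hypothetical boundaries at -30°, -10°, 10°, and 30°, the areas are `[(0, 127), (127, 247), (247, 353), (353, 473), (473, 600)]`. The tangent mapping makes areas of equal angular width appear wider near the image edges.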

Next, the control unit 23 performs face recognition on the area of the captured image that corresponds to the direction of the sound, that is, to the range with the highest audio level, and determines whether that area contains a face region (step S12). If it does (step S12; YES), the control unit 23 proceeds to step S14. If it does not (step S12; NO), the control unit 23 performs face recognition on the areas adjacent to it and determines whether an adjacent area contains a face region (step S13). If an adjacent area contains a face region (step S13; YES), the control unit 23 proceeds to step S14.

In step S14, the control unit 23 determines the cutout position candidate and stores it in memory (step S14). The candidate is the person area containing the face area recognized in the divided area corresponding to the sound direction. If that divided area contains no face area, the face area recognized in an adjacent area is used instead. When multiple face areas are recognized, one of them is chosen as the cutout position candidate, for example the person area containing the face area closest to the center of the captured image.
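The tie-break in step S14 picks the face nearest the image center, which can be sketched as below. The source does not say how a face area is grown into a "person area". The expansion factors in `person_box_from_face` (`widen`, `up`, `down`) are therefore a hypothetical heuristic, included only to make the sketch complete.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def pick_central_face(faces: Sequence[Box], image_width: int, image_height: int) -> Box:
    """Among several recognized faces, pick the one whose center is nearest the image center."""
    cx, cy = image_width / 2.0, image_height / 2.0

    def distance(box: Box) -> float:
        fx = (box[0] + box[2]) / 2.0
        fy = (box[1] + box[3]) / 2.0
        return math.hypot(fx - cx, fy - cy)

    return min(faces, key=distance)


def person_box_from_face(
    face: Box,
    image_width: int,
    image_height: int,
    widen: float = 1.5,
    up: float = 0.5,
    down: float = 4.0,
) -> Box:
    """Hypothetical heuristic: grow the face box into a head-and-torso box, clamped to the image."""
    left, top, right, bottom = face
    w, h = right - left, bottom - top
    return (
        max(0, int(left - widen * w)),
        max(0, int(top - up * h)),
        min(image_width, int(right + widen * w)),
        min(image_height, int(bottom + down * h)),
    )
```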

Next, the control unit 23 determines whether the remote call participant has performed a look-in motion (peering into the display from one side) (step S15). The basis is whether the communication unit 26 has received angle change information from the remote digital signage device 1. If angle change information has been received, the control unit 23 determines that the remote call participant has performed a look-in motion.

If the remote call participant has performed a look-in motion (step S15; YES), the control unit 23 uses the received angle change information to determine whether the look-in direction corresponds to the divided area containing the cutout position candidate (step S16).
For example, when the left-right angle change is positive, the look-in direction corresponds to the candidate's divided area if that area is area 4 or area 5 of the captured image. When the left-right angle change is negative, it corresponds if that area is area 1 or area 2.
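The direction check of step S16 can be sketched for the five-area example. The example in the text does not say how the central area 3 or a zero angle is handled. This sketch treats both as "no match", which is an assumption. The function name is hypothetical.

```python
def peek_points_at_area(horizontal_angle_deg: float, area: int) -> bool:
    """Step S16 for the five-area example (areas numbered 1-5).

    A positive left-right angle change matches areas 4 and 5, a negative one
    matches areas 1 and 2. The central area 3 and a zero angle are treated as
    "no match" here (an assumption; the example does not cover them).
    """
    if horizontal_angle_deg > 0:
        return area in (4, 5)
    if horizontal_angle_deg < 0:
        return area in (1, 2)
    return False
```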

If the look-in direction corresponds to the divided area containing the cutout position candidate (step S16; YES), the control unit 23 determines whether the look-in angle (that is, the amount of movement) has reached that divided area (step S17).
For example, take the left-right angle θ in the angle change information received from the remote digital signage device 1. If θ falls within, or goes beyond, the angular range of the divided area containing the cutout position candidate, the look-in angle is judged to have reached that area. In particular, when the candidate's divided area is the one corresponding to the sound direction, the look-in angle is judged to have reached it once the angle exceeds that area's minimum angle (its boundary closer to 0°).
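The reach test of step S17 reduces to comparing |θ| with the area's boundary nearest 0°. Below is a minimal sketch. It assumes that the direction has already been checked in step S16 and that the area does not straddle 0° (the central area is excluded by S16 in the five-area example). The sketch counts an angle exactly on the boundary as reached, and the example ranges in the tests are hypothetical.

```python
from typing import Tuple


def peek_reaches_area(theta_deg: float, area_range_deg: Tuple[float, float]) -> bool:
    """Step S17: has the look-in angle reached the candidate's area?

    area_range_deg is the (start, end) angle of a non-central area. The angle
    counts as reached once |theta| is at or beyond the area's edge nearest
    0 deg, which covers both "within the range" and "beyond the range".
    """
    start, end = area_range_deg
    nearest_edge = min(abs(start), abs(end))
    return abs(theta_deg) >= nearest_edge
```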

If the look-in angle has reached the divided area containing the cutout position candidate (step S17; YES), the control unit 23 determines the cutout position candidate as the cutout position (step S18) and proceeds to step S20.

Otherwise, the control unit 23 determines the area of the person closest to the center of the captured image as the cutout position (step S19) and proceeds to step S20. This person area is the one containing the face area whose center is closest to the image center. Step S19 is reached in any of the following cases:
- no sound at or above the predetermined level is found (step S10; NO);
- no face area exists in the adjacent divided areas (step S13; NO);
- no look-in motion by the remote call participant is detected (step S15; NO);
- the look-in direction does not correspond to the divided area containing the cutout position candidate (step S16; NO);
- the look-in angle has not reached that divided area (step S17; NO).
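The branching of steps S15 to S19 can be collected into a single decision function. This is a sketch for the five-area example only, and it rests on these assumptions:

- Every failed check falls back to `central_person`.
- `area_ranges_deg` holds hypothetical angular ranges per area.
- `None` stands for "no candidate" or "no angle change information received".

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def decide_cutout(
    candidate: Optional[Box],
    candidate_area: Optional[int],
    peek_angle_deg: Optional[float],
    area_ranges_deg: Dict[int, Tuple[float, float]],
    central_person: Box,
) -> Box:
    """Steps S15-S19 for the five-area example (areas 1-5).

    candidate / candidate_area: the S14 result (None if S10 or S13 failed).
    peek_angle_deg: left-right angle from the received angle change
        information, or None when none was received.
    area_ranges_deg: hypothetical angular range of each area.
    central_person: person region whose face is nearest the image center.
    """
    if candidate is None or candidate_area is None or peek_angle_deg is None:
        return central_person  # S10 / S13 / S15: NO -> S19
    same_side = (peek_angle_deg > 0 and candidate_area in (4, 5)) or (
        peek_angle_deg < 0 and candidate_area in (1, 2)
    )
    if not same_side:
        return central_person  # S16: NO -> S19
    start, end = area_ranges_deg[candidate_area]
    if abs(peek_angle_deg) < min(abs(start), abs(end)):
        return central_person  # S17: NO -> S19
    return candidate  # S18
```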

In step S20, the control unit 23 cuts out the image at the determined cutout position from the captured image and transmits it to the remote digital signage device 1 through the communication unit 26 (step S20). It then returns to step S3.
The control unit 23 repeats steps S3 to S20 until communication with the remote digital signage device 1 is disconnected.
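The cutting part of step S20 is a clamped rectangular slice of the frame. The sketch below stores the frame as a list of pixel rows so that it needs no third-party library. It uses exclusive right and bottom bounds, which is an assumption. Transmission and the S3 to S20 loop are outside the sketch.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the decided region out of a frame stored as rows of pixel values.

    The box is clamped to the frame so that a person area near the border
    cannot index outside the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [list(row[left:right]) for row in image[top:bottom]]
```

If frames are held as NumPy arrays instead, the same cut is the slice `frame[top:bottom, left:right]` after clamping.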

As described above, in the communication system 100, the control unit 23 of each digital signage device 1 detects the direction of the sound acquired by the sound acquisition unit 31. Based on the detected sound direction, it determines the image area to be cut out from the captured image of the call participants facing the image forming unit 27. It then cuts the determined image area out of the captured image and transmits it to the remote digital signage device 1 through the communication unit 26.

The image area to be cut out for display on the remote digital signage device 1 can therefore be determined easily from the sound direction.

Specifically, the control unit 23 performs face recognition processing on the area of the captured image corresponding to the detected sound direction. When a face area is recognized, it determines the image area of the person containing that face area as the area to be cut out from the captured image.
More specifically, the control unit 23 divides a predetermined range around the sound acquisition unit 31 into multiple ranges by direction and detects the sound level in each range. It also divides the captured image into multiple areas corresponding to those ranges. It then performs face recognition processing on the area corresponding to the range with the highest sound level. When a face area is recognized there, it determines the image area of the person containing that face area as the area to be cut out from the captured image.
The person area in the area corresponding to the sound direction, that is, the speaker's area, can therefore easily be chosen as the image area to be cut out for display on the remote digital signage device 1.

In addition, the control unit 23 may recognize no face area in the area of the captured image corresponding to the range with the highest sound level. In that case, it performs face recognition processing on the areas adjacent to that area. When a face area is recognized there, it determines the image area of the person containing that face area as the area to be cut out from the captured image. Thus, even if no person is detected in the sound direction, a person in an adjacent area can be chosen as the area to be cut out.

Each digital signage device 1 also has multiple speakers 33a to 33e, one for each of the ranges in which the remote digital signage device 1 detects sound. The control unit 23 causes each of the speakers 33a to 33e to output the sound received from the remote digital signage device 1 at a volume corresponding to the sound level of its range. The sound can therefore be reproduced with a spatial bias that matches the direction of the sound captured by the remote digital signage device 1.
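One way to turn the per-range levels sent with the remote audio into speaker volumes is sketched below. The source only says the volume is set "according to" the level of the corresponding range. The linear scaling relative to the loudest range and the equal-volume fallback when all levels are zero are both assumptions.

```python
from typing import List, Sequence


def speaker_gains(range_levels: Sequence[float], master_volume: float = 1.0) -> List[float]:
    """One output gain per speaker (e.g. 33a-33e), from the per-range levels
    that the remote device sends with its audio.

    Linear scaling relative to the loudest range is an assumption.
    """
    peak = max(range_levels, default=0.0)
    if peak <= 0:
        # No directional information: play every speaker at the same volume.
        return [master_volume for _ in range_levels]
    return [master_volume * level / peak for level in range_levels]
```

For example, `speaker_gains([1, 2, 4, 2, 1])` returns `[0.25, 0.5, 1.0, 0.5, 0.25]`, so the centre speaker plays loudest.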

Each digital signage device 1 also detects the direction and amount of movement of the call participant facing the image forming unit 27. It transmits information on the detected direction and amount of movement to the remote digital signage device 1. The remote digital signage device 1 can thus be notified that the call participant has performed a look-in motion.
Furthermore, each digital signage device 1 checks two conditions for the remote call participant. First, the participant's movement direction must correspond to the range with the highest level of the sound acquired by the device's own sound acquisition unit 31. Second, the amount of movement must exceed a threshold predetermined according to the sound direction. When both hold, the device determines the image area of the person in the captured-image area corresponding to that range as the area to be cut out. Thus, when the remote participant performs a look-in motion or the like to have the display switch to the speaker in the sound direction, the displayed subject can be switched according to the sound direction.
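On the sending side, the movement direction and amount could be expressed as a signed left-right angle change of the viewer's face between two frames. This sketch is hypothetical throughout:

- It assumes an ideal pinhole camera with a known horizontal field of view.
- It assumes that face-center x positions are already available from face detection.
- It treats positive values as movement toward the right edge of the local image.

The text here does not specify whether that sign must be mirrored to match the remote device's area numbering.

```python
import math


def horizontal_angle_deg(x_px: float, image_width: int, hfov_deg: float) -> float:
    """Angle of an image column from the lens axis, assuming an ideal pinhole camera."""
    half_w = image_width / 2.0
    focal_px = half_w / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((x_px - half_w) / focal_px))


def angle_change_deg(x_before: float, x_after: float, image_width: int, hfov_deg: float) -> float:
    """Signed left-right angle change of the viewer's face center between two frames.

    The sign gives the movement direction (positive = toward the right edge of
    the local image) and the magnitude gives the movement amount.
    """
    return horizontal_angle_deg(x_after, image_width, hfov_deg) - horizontal_angle_deg(
        x_before, image_width, hfov_deg
    )
```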

The content of the above embodiment is a preferred example of the digital signage device according to the present invention, and the invention is not limited to it.

For example, in the above embodiment, the person area in the divided area corresponding to the sound direction is determined as the cutout position under two conditions. There must be a look-in motion toward the detected sound direction, and the look-in angle must exceed a threshold corresponding to that direction (that is, the look-in angle must reach the divided area corresponding to the sound direction). However, the invention is not limited to this. For example, the person area in the divided area corresponding to the detected sound direction may simply be determined as the cutout position, regardless of whether a look-in motion occurs. Alternatively, that person area may be determined as the cutout position whenever the direction of the look-in motion corresponds to the sound direction. With these variants as well, when the remote participant performs a look-in motion or the like to have the display switch to the speaker in the sound direction, the displayed subject can be switched according to the sound direction.

In the above embodiment, when no face area is recognized in the divided area corresponding to the sound direction, a face area in an adjacent divided area is used as the cutout position candidate. Alternatively, face recognition may be performed on the other divided areas in descending order of sound level, and the recognized face area may be used as the cutout position candidate.
Also, in the above embodiment, when a face area is recognized in the divided area corresponding to the sound direction, the image area of the person containing that face area is cut out from the captured image. Alternatively, the area to be cut out may be shaped to match the screen unit 22 while still containing the face area.
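The descending-level variant replaces the adjacent-area fallback with a search over all areas from loudest to quietest. The sketch below stops at the first area where a face is recognized. As before, `detect_faces` is a hypothetical face detector for a 0-based area index.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def find_face_by_level_order(
    levels: Sequence[float],
    detect_faces: Callable[[int], List[Box]],
) -> Optional[Tuple[int, List[Box]]]:
    """Variant of steps S12-S13: search areas from loudest to quietest.

    detect_faces(i) is a hypothetical face detector for image area i (0-based).
    Areas with equal levels keep their left-to-right order because sorted()
    is stable, including with reverse=True.
    """
    order = sorted(range(len(levels)), key=lambda i: levels[i], reverse=True)
    for area in order:
        faces = detect_faces(area)
        if faces:
            return area, faces
    return None
```

Compared with the adjacent-area fallback, this variant may select a face far from the loudest range, but it never misses a face that exists in any area.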

In the above embodiment, the present invention was described as applied to a digital signage device that displays an image by projecting it from a projector onto a screen. However, the same effects can be obtained when it is applied to other display devices, such as liquid crystal displays and plasma displays, and the invention is not limited to this example.

The detailed configuration and operation of each device in the communication system can also be modified as appropriate without departing from the spirit of the invention.

Although several embodiments of the present invention have been described, the scope of the present invention is not limited to them and includes the inventions described in the claims and their equivalents.
The inventions described in the claims originally attached to the application for this patent are appended below. The claim numbers in the appendix are those of the claims originally attached to the application.
[Appendix]
<Claim 1>
A communication device that transmits and receives images and sound to and from a counterpart communication device, comprising:
photographing means for capturing an image to be transmitted to the counterpart communication device;
sound acquisition means for acquiring sound;
detection means for detecting the direction of the sound acquired by the sound acquisition means; and
determination means for determining, based on the detected sound direction, an image area to be cut out from the captured image acquired by the photographing means.
<Claim 2>
The communication device according to claim 1, further comprising:
clipping means for cutting out the determined image area from the captured image; and
transmission control means for transmitting the cut-out image area to the counterpart communication device.
<Claim 3>
The communication device according to claim 1 or 2, wherein the determination means performs face recognition processing on an area of the captured image corresponding to the detected sound direction and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.
<Claim 4>
The communication device according to claim 2, wherein
the detection means divides a predetermined range around the sound acquisition means into a plurality of ranges by direction and detects the sound level for each of the divided ranges, and
the determination means divides the captured image into a plurality of areas corresponding to the divided ranges, performs face recognition processing on the area of the captured image corresponding to the range with the highest sound level and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.
<Claim 5>
The communication device according to claim 4, wherein, when no face area is recognized in the area of the captured image corresponding to the range with the highest sound level, the determination means performs face recognition processing on an area adjacent to that area and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.
<Claim 6>
The communication device according to claim 4 or 5, wherein the transmission control means transmits information indicating the sound level for each of the plurality of ranges detected by the detection means to the counterpart communication device together with the acquired sound, the communication device further comprising:
a plurality of sound output means, one provided for each of the plurality of ranges in which the counterpart communication device detects sound; and
sound output control means for causing each of the plurality of sound output means to output the sound received from the counterpart communication device at a volume corresponding to the sound level of the range associated with that sound output means.
<Claim 7>
The communication device according to claim 6, further comprising:
display means for displaying an image received from the counterpart communication device; and
movement detection means for detecting the movement direction of a person facing the display means, wherein
the transmission control means transmits information on the movement direction detected by the movement detection means to the counterpart communication device, and
when the movement direction of the person facing the display means of the counterpart communication device, identified from the movement direction information received from the counterpart communication device, corresponds to the range with the highest sound level detected by the detection means, the determination means performs face recognition processing on the area of the captured image corresponding to that range and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.
<Claim 8>
The communication device according to claim 7, wherein
the movement detection means further detects the amount of movement of the person facing the display means,
the transmission control means transmits information on the movement direction and amount of movement detected by the movement detection means to the counterpart communication device, and
when the movement direction of the person facing the display means of the counterpart communication device, identified from the information received from the counterpart communication device, corresponds to the range with the highest sound level detected by the detection means and the amount of movement exceeds a threshold predetermined for that range, the determination means performs face recognition processing on the area of the captured image corresponding to that range and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.
<Claim 9>
A communication system in which a plurality of communication devices that transmit and receive images and sound can be connected via a call line, each of the communication devices comprising:
display means for displaying an image received from a counterpart communication device;
photographing means for capturing an image to be transmitted to the counterpart communication device;
sound acquisition means for acquiring sound;
detection means for detecting the direction of the sound acquired by the sound acquisition means;
determination means for determining, based on the detected sound direction, an image area to be cut out from the captured image acquired by the photographing means;
clipping means for cutting out the determined image area from the captured image; and
transmission control means for transmitting the cut-out image area to the counterpart communication device.
<Claim 10>
An image clipping method for a communication device that transmits and receives images and sound to and from a counterpart communication device, the method comprising:
capturing an image to be transmitted to the counterpart communication device;
acquiring sound;
detecting the direction of the acquired sound; and
determining, based on the detected sound direction, an image area to be cut out from the captured image acquired by the photographing means.
<Claim 11>
A program for causing a computer to function as the following means, the computer being used in a communication device that includes photographing means for capturing an image to be transmitted to a counterpart communication device and sound acquisition means for acquiring sound, and that transmits and receives images and sound to and from the counterpart communication device:
detection means for detecting the direction of the sound acquired by the sound acquisition means; and
determination means for determining, based on the detected sound direction, an image area to be cut out from the captured image acquired by the photographing means.

DESCRIPTION OF SYMBOLS
1 Digital signage device
21 Projection unit
22 Screen unit
23 Control unit
24 Projector
25 Storage unit
251 Program storage unit
252 Phone book storage unit
26 Communication unit
27 Image forming unit
28 Pedestal
29 Translucent plate
30 Imaging unit
31 Sound acquisition unit
32 Operation unit
33 Sound output unit

Claims

A communication device that transmits and receives images and sound to and from a counterpart communication device, comprising:
photographing means for capturing an image to be transmitted to the counterpart communication device;
sound acquisition means for acquiring sound;
detection means for detecting the direction of the sound acquired by the sound acquisition means; and
determination means for determining, based on the detected sound direction, an image area to be cut out from the captured image acquired by the photographing means.

The communication device according to claim 1, further comprising:
clipping means for cutting out the determined image area from the captured image; and
transmission control means for transmitting the cut-out image area to the counterpart communication device.

The communication device according to claim 1 or 2, wherein the determination means performs face recognition processing on an area of the captured image corresponding to the detected sound direction and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.

The communication device according to claim 2, wherein
the detection means divides a predetermined range around the sound acquisition means into a plurality of ranges by direction and detects the sound level for each of the divided ranges, and
the determination means divides the captured image into a plurality of areas corresponding to the divided ranges, performs face recognition processing on the area of the captured image corresponding to the range with the highest sound level and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.

The communication device according to claim 4, wherein, when no face area is recognized in the area of the captured image corresponding to the range with the highest sound level, the determination means performs face recognition processing on an area adjacent to that area and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.

The communication device according to claim 4 or 5, wherein the transmission control means transmits information indicating the sound level for each of the plurality of ranges detected by the detection means to the counterpart communication device together with the acquired sound, the communication device further comprising:
a plurality of sound output means, one provided for each of the plurality of ranges in which the counterpart communication device detects sound; and
sound output control means for causing each of the plurality of sound output means to output the sound received from the counterpart communication device at a volume corresponding to the sound level of the range associated with that sound output means.

The communication device according to claim 6, further comprising:
display means for displaying an image received from the counterpart communication device; and
movement detection means for detecting the movement direction of a person facing the display means, wherein
the transmission control means transmits information on the movement direction detected by the movement detection means to the counterpart communication device, and
when the movement direction of the person facing the display means of the counterpart communication device, identified from the movement direction information received from the counterpart communication device, corresponds to the range with the highest sound level detected by the detection means, the determination means performs face recognition processing on the area of the captured image corresponding to that range and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.

The communication device according to claim 7, wherein
the movement detection means further detects the amount of movement of the person facing the display means,
the transmission control means transmits information on the movement direction and amount of movement detected by the movement detection means to the counterpart communication device, and
when the movement direction of the person facing the display means of the counterpart communication device, identified from the information received from the counterpart communication device, corresponds to the range with the highest sound level detected by the detection means and the amount of movement exceeds a threshold predetermined for that range, the determination means performs face recognition processing on the area of the captured image corresponding to that range and, when a face area is recognized, determines the image area of a person including the recognized face area as the area to be cut out from the captured image.

A communication system in which a plurality of communication devices that transmit and receive images and sounds can be connected via a telephone line,
Each of the communication devices
Display means for displaying an image received from the communication device on the other side;
Photographing means for photographing an image to be transmitted to the communication device on the other side;
Sound acquisition means for acquiring sound;
Detecting means for detecting the direction of the sound acquired by the sound acquisition means;
Determining means for determining an image region to be cut out from the captured image acquired by the photographing means based on the detected direction of the sound;
Clipping means for cutting out the determined image region from the captured image;
Transmission control means for transmitting the clipped image area to the counterpart communication device;
A communication system comprising:
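The following sketch shows one possible way to realize the detecting, determining, and clipping means of each device. It rests on these assumptions:

* a row of microphones is arranged left to right;
* each microphone is assigned an equal-width vertical strip of the captured image;
* the microphone with the highest RMS level is taken as the sound direction.

The microphone layout and strip mapping are assumptions rather than details from the claims. Transmission to the counterpart device is omitted.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one microphone channel."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_channel(channels: Sequence[Sequence[float]]) -> int:
    """Index of the microphone with the highest level, used as the sound direction."""
    levels = [rms(ch) for ch in channels]
    return max(range(len(levels)), key=levels.__getitem__)


def clip_for_channel(image: List[List[int]], channel: int, n_channels: int) -> List[List[int]]:
    """Cut out the vertical strip of the image assigned to one microphone
    (assumes equal-width strips, left to right, in microphone order)."""
    width = len(image[0])
    left = channel * width // n_channels
    right = (channel + 1) * width // n_channels
    return [row[left:right] for row in image]


def process_frame(image: List[List[int]], channels: Sequence[Sequence[float]]) -> List[List[int]]:
    """Detect the sound direction, determine the region, and clip it."""
    ch = loudest_channel(channels)
    return clip_for_channel(image, ch, len(channels))
```

In practice the strip order may need mirroring if the camera image is flipped relative to the microphone row.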

An image clipping method in a communication device that transmits and receives images and sound to and from a communication device on the other side,
Capturing an image for transmission to the counterpart communication device;
Acquiring sound;
Detecting the direction of the acquired sound;
Determining an image region to be cut out from the captured image based on the detected direction of the sound;
An image clipping method including the above steps.
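The detected direction may be expressed as an azimuth angle rather than a microphone index. In that case, the determining step could map it to image columns with a pinhole-camera model: x = W/2 + f·tan(θ), where f = (W/2)/tan(HFOV/2). The sketch below assumes a known horizontal field of view and a fixed crop width. Both are hypothetical parameters that the claims do not give.

```python
import math
from typing import Tuple


def azimuth_to_column(azimuth_deg: float, image_width: int, hfov_deg: float) -> float:
    """Map a sound azimuth (0 = camera axis, positive = right) to an x pixel
    coordinate using a pinhole-camera model."""
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return image_width / 2 + focal_px * math.tan(math.radians(azimuth_deg))


def crop_window(azimuth_deg: float, image_width: int, hfov_deg: float,
                crop_width: int) -> Tuple[int, int]:
    """Return [left, right) columns of a crop centred on the sound direction,
    shifted (not shrunk) so that it stays inside the image."""
    if crop_width > image_width:
        raise ValueError("crop wider than image")
    center = azimuth_to_column(azimuth_deg, image_width, hfov_deg)
    left = int(round(center - crop_width / 2))
    left = max(0, min(left, image_width - crop_width))
    return left, left + crop_width
```

Shifting the window at the image edge keeps the output size constant. This is convenient when the clipped region is sent to the counterpart device frame after frame.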

A computer used in a communication device that includes an imaging unit for capturing an image to be transmitted to a counterpart communication device and a sound acquisition unit for acquiring sound, and that transmits and receives images and sound to and from the counterpart communication device,
Detecting means for detecting the direction of the sound acquired by the sound acquisition unit;
Determining means for determining an image region to be cut out from the captured image acquired by the imaging unit based on the detected direction of the sound;
A program for causing the computer to function as each of the above means.