JP6631166B2

JP6631166B2 - Imaging device, program, and imaging method

Info

Publication number: JP6631166B2
Application number: JP2015214722A
Authority: JP
Inventors: 清人五十嵐; 裕章内山; 耕司桑田; 仁人高橋; 智幸後藤; 和紀北澤; 宣正銀川; 未来袴谷
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-08-03
Filing date: 2015-10-30
Publication date: 2020-01-15
Anticipated expiration: 2035-10-30
Also published as: JP2017034645A

Description

The present invention relates to an imaging device, a program, and an imaging method.

There are systems that use a camera to track and capture a participant who speaks in a conference and distribute the conference video externally.

For example, a microphone array is mounted on a camera installed in a conference room, and the microphone array detects the direction of a participant who speaks. The camera is steered toward the detected direction, and the speaking participant is captured. When another participant speaks, the camera is turned toward that participant. The captured conference video (images) is distributed to viewers' terminals via a network.

For example, a technique has been disclosed that detects the direction of a speaker using a microphone array composed of a plurality of microphones and steers a camera toward the detected direction (see, for example, Patent Document 1).

However, the conventional technique has the problem that clear audio cannot be recorded during imaging.

For example, when the speaker is seated far from the camera, the voice may be too quiet to be recorded clearly. Conversely, when the speaker is seated too close to the camera, the voice may be too loud and the recorded audio may be distorted.

The disclosed technique therefore aims to record clearer audio.

In an embodiment, an imaging device for imaging a person is disclosed, the imaging device including: a detection unit that detects the direction of a person who has spoken and the volume of the speech; a control unit that steers the imaging direction toward the detected direction; a measurement unit that measures the on-image size of the person in the direction detected by the detection unit; and a notification unit that notifies the person when the volume is smaller than the minimum value of a first range relating to a predetermined range and the on-image size of the person is smaller than the minimum value of a second range.

Clearer audio can be recorded.

FIG. 1 is a diagram illustrating an example of the overall configuration of a video distribution system.
FIG. 2 is a diagram illustrating the hardware configuration of a distribution terminal.
FIG. 3 is a diagram illustrating an example of the functional configuration of the distribution terminal according to Embodiment 1.
FIG. 4 is a diagram illustrating the relationship between the position coordinates and the angle of a camera.
FIG. 5 is a diagram illustrating the relationship between the position of a participant and the message that is notified.
FIG. 6 is a diagram illustrating the control flow of Embodiment 1.
FIG. 7 is a diagram illustrating an example of the functional configuration of the distribution terminal according to Embodiment 2.
FIG. 8 is a diagram illustrating an example of the data structure of a user table.
FIG. 9 is a diagram illustrating the control flow of Embodiment 2.
FIG. 10 is a diagram illustrating the flow of authentication processing.
FIG. 11 is a diagram illustrating a first hardware configuration of the distribution terminal of Embodiment 3.
FIG. 12 is a diagram illustrating a second hardware configuration of the distribution terminal of Embodiment 3.
FIG. 13 is a diagram illustrating an example of the appearance of a stereo camera.
FIG. 14 is a diagram illustrating an example of the appearance of a distance sensor.
FIG. 15 is a diagram illustrating the control flow of Embodiment 3.

Embodiments of the present invention are described below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and redundant description is omitted.

(Embodiment 1)
FIG. 1 is a diagram illustrating an example of the overall configuration of a video distribution system 1. The video distribution system 1 includes a server 2, a distribution terminal 3, user terminals 4a to 4n, and a display device 6. The server 2 includes a communication unit 21. The distribution terminal 3 includes a communication unit 31, a processing unit 32, a data acquisition unit 33, a data output unit 34, and a storage unit 35. The server 2, the distribution terminal 3, and the user terminals 4a to 4n are connected via a communication network 5. The display device 6 is connected to the data output unit 34. The display device 6 may instead be provided in the distribution terminal 3.

The data acquisition unit 33 acquires, for example, image data and audio data of the conference room. The communication unit 31 transmits the acquired image data and audio data to the server 2 via the communication network 5. The server 2 distributes the image data and audio data to the user terminals 4a to 4n via the communication network 5.

FIG. 2 is a diagram illustrating the hardware configuration of the distribution terminal 3. As shown in FIG. 2, the distribution terminal 3 includes: a CPU (Central Processing Unit) 101 that controls the overall operation of the distribution terminal 3; a ROM (Read Only Memory) 102 that stores programs used to drive the CPU 101, such as an IPL (Initial Program Loader); a RAM (Random Access Memory) 103 used as a work area for the CPU 101; a flash memory 104 that stores various data such as a terminal program, image data, and audio data; an SSD (Solid State Drive) 105 that controls reading and writing of various data from and to the flash memory 104 under the control of the CPU 101; a media drive 107 that controls reading and writing (storage) of data from and to a recording medium 106 such as a flash memory; operation buttons 108 operated, for example, when selecting a destination for the distribution terminal 3; a power switch 109 for turning the power of the distribution terminal 3 ON and OFF; and a network I/F (Interface) 111 for transmitting data over the communication network 5.

The distribution terminal 3 also includes: a built-in camera 112 that images a subject under the control of the CPU 101 to obtain image data; an image sensor I/F 113 that controls driving of the camera 112; a built-in microphone 114 that inputs audio; a built-in speaker 115 that outputs audio; an audio input/output I/F 116 that processes input and output of audio signals between the microphone 114 and the speaker 115 under the control of the CPU 101; a display I/F 117 that transmits image data to an external display 120 under the control of the CPU 101; an external device connection I/F 118 for connecting various external devices; and a bus line 110, such as an address bus and a data bus, for electrically connecting the above components as shown in FIG. 2.

The display 120 is a display unit composed of a liquid crystal or organic EL panel that displays images of the subject, operation icons, and the like. The display 120 is connected to the display I/F 117 by a cable 120c. The cable 120c may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (registered trademark) (High-Definition Multimedia Interface) or DVI (Digital Visual Interface) signals.

The camera 112 includes a lens and a solid-state image sensor that converts light into electric charge to digitize an image (video) of the subject. A CMOS (Complementary Metal Oxide Semiconductor) image sensor, a CCD (Charge Coupled Device) image sensor, or the like is used as the solid-state image sensor.

External devices such as an external camera, an external microphone, and an external speaker can each be connected to the external device connection I/F 118 by a USB (Universal Serial Bus) cable or the like. When an external camera is connected, the external camera is driven in preference to the built-in camera 112 under the control of the CPU 101. Likewise, when an external microphone or an external speaker is connected, it is driven in preference to the built-in microphone 114 or the built-in speaker 115, respectively, under the control of the CPU 101.

The recording medium 106 is detachable from the distribution terminal 3. The non-volatile memory that reads and writes data under the control of the CPU 101 is not limited to the flash memory 104; an EEPROM (Electrically Erasable and Programmable ROM) or the like may be used instead.

Furthermore, the terminal program may be recorded as a file in an installable or executable format on a computer-readable recording medium, such as the recording medium 106, and distributed. The terminal program may also be stored in the ROM 102 instead of the flash memory 104.

FIG. 3 is a diagram illustrating an example of the functional configuration of the distribution terminal 3 according to Embodiment 1. The data acquisition unit 33 includes an audio acquisition unit 33a and an imaging unit 33b. The audio acquisition unit 33a acquires audio data of the conference room.

The imaging unit 33b acquires image data of the speaker and stores the acquired image data in the storage unit 35. The communication unit 31 transmits the image data and audio data of the conference room to the server 2.

The processing unit 32 includes a detection unit 32a, a control unit 32b, a measurement unit 32c, and a notification unit 32d. The detection unit 32a uses the audio acquisition unit 33a to detect the direction of a participant who has spoken in the conference room. Specifically, the detection unit 32a calculates, for example, the average level of the audio data acquired by the audio acquisition unit 33a and determines that a speaker is present when the average is equal to or greater than a predetermined value.
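
As a non-limiting illustration, the speaker-presence determination described above can be sketched as follows. The function names, the RMS-based level in dB relative to full scale, and the threshold of -40 dB are assumptions introduced for this sketch; the description states only that an average value of the audio is compared with a predetermined value.

```python
import math
from typing import Sequence


def mean_level_db(samples: Sequence[float], eps: float = 1e-12) -> float:
    """Return the RMS level of one audio frame in dB relative to full scale.

    `samples` are assumed to be floats in [-1.0, 1.0]. Using RMS as the
    "average value" is an illustrative choice, not taken from the description.
    """
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Clamp to eps so that a silent frame does not raise a math domain error.
    return 20.0 * math.log10(max(rms, eps))


def speaker_present(samples: Sequence[float], threshold_db: float = -40.0) -> bool:
    """Judge that a speaker is present when the level reaches the threshold."""
    return mean_level_db(samples) >= threshold_db
```

In such a sketch the threshold would be tuned to the room and the microphone; the description leaves the predetermined value open.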

Next, the detection unit 32a detects the direction of the speaker based on the audio data acquired from each microphone constituting the microphone array. For example, taking as 0 degrees the direction in which a reference point (coordinates (0, Y_0)) in the conference room is imaged from the origin at coordinates (0, 0), the detection unit 32a acquires the position coordinates (X_0, Y_0) of the imaging unit 33b and, based on the audio data, the imaging angle θ at which the participant is imaged.

Hereinafter, data on the imaging direction corresponding to the position of the speaker is referred to as direction data. The direction data is expressed, for example, as (X_0, Y_0, θ), using the position coordinates (X_0, Y_0) of the imaging unit 33b and the imaging angle θ at which the participant was imaged.

FIG. 4 is a diagram illustrating the relationship between the position coordinates and the angle of the imaging unit 33b. Direction data B (0, 0, 0) indicates the position coordinates (0, 0) of the imaging unit 33b and an imaging angle of 0 degrees when the reference point β (0, Y_0) is imaged from the origin (0, 0).

Direction data A (X_1, Y_1, θ_1) indicates the position coordinates (X_1, Y_1) of the imaging unit 33b and the imaging angle θ_1 of the imaging unit 33b corresponding to the direction in which the speaker was detected. The imaging angle θ_1 is the angle at which the speaker is imaged from the position coordinates (X_1, Y_1), taking as 0 degrees the imaging angle of the imaging unit 33b when the reference point β (0, Y_0) is imaged from the origin (0, 0). When the imaging unit 33b is fixed at the origin (0, 0), the direction data may be expressed by the imaging angle θ_1 alone.

The detection unit 32a may also detect the speaker using a plurality of imaging units 33b. In that case, the detection unit 32a may acquire direction data (n, X_0, Y_0, θ) that includes the camera identification number n, based on the identification number n of the imaging unit 33b that detected the speaker and the position and imaging-angle data (X_0, Y_0, θ) of that imaging unit 33b.
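
As a non-limiting illustration, the direction data described above can be held in a small record type. The class name `DirectionData` and its field names are hypothetical; only the tuple layouts (X, Y, θ) and (n, X, Y, θ) come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DirectionData:
    """Imaging direction for one detected speaker (hypothetical container)."""

    x: float                          # X coordinate of the imaging unit
    y: float                          # Y coordinate of the imaging unit
    theta_deg: float                  # imaging angle; 0 is the reference direction
    camera_id: Optional[int] = None   # identification number n, if several cameras

    def as_tuple(self) -> Tuple:
        """Return (X, Y, theta), or (n, X, Y, theta) when a camera id is set."""
        base = (self.x, self.y, self.theta_deg)
        return base if self.camera_id is None else (self.camera_id, *base)


# Direction data B of FIG. 4: the camera at the origin facing the reference point.
reference = DirectionData(0.0, 0.0, 0.0)
```

When the imaging unit is fixed at the origin, only `theta_deg` carries information, which matches the remark that the angle alone may represent the direction data.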

Returning to FIG. 3, the control unit 32b controls the imaging direction so that the imaging unit 33b faces the speaker. For example, based on the direction data, the control unit 32b rotates the imaging unit 33b so that the speaker is at the center of the image. The control unit 32b then acquires the image data of the speaker captured by the imaging unit 33b, extracts the image data of the face portion from it, and stores the extracted data in the storage unit 35.

The image data of the face portion is extracted based on, for example, the shape of the face and skin-colored regions. For example, the control unit 32b converts the RGB values of the pixels of the image data into the YCC color system and extracts skin-colored pixels whose Cr and Cb values fall within predetermined ranges. The control unit 32b then extracts the face image by identifying the region of the image data where skin-colored pixels are clustered.
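
As a non-limiting illustration, the skin-color extraction described above can be sketched as follows. The conversion uses the full-range BT.601 (JFIF) coefficients, and the ranges Cb 77 to 127 and Cr 133 to 173 are commonly cited skin-tone bounds; both are assumptions of this sketch, since the description says only that the Cr and Cb values fall within predetermined ranges.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def rgb_to_cbcr(r: float, g: float, b: float) -> Tuple[float, float]:
    """Convert 8-bit RGB to the Cb and Cr components (full-range BT.601)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r: int, g: int, b: int,
            cb_range: Tuple[float, float] = (77.0, 127.0),
            cr_range: Tuple[float, float] = (133.0, 173.0)) -> bool:
    """Return True when both chroma values lie inside the assumed skin ranges."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image: Sequence[Sequence[Pixel]]) -> List[List[bool]]:
    """Build a per-pixel boolean mask of skin-colored pixels."""
    return [[is_skin(*px) for px in row] for row in image]
```

Working in the chroma plane and ignoring luminance makes the test less sensitive to lighting level, which is the usual reason for choosing a YCC-type color system for this step.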

The measurement unit 32c measures the size of the participant's face portion included in the image data stored in the storage unit 35. The measurement unit 32c measures the face size (in inches) by identifying the left, right, top, and bottom edges of the region where skin-colored pixels are clustered.
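
As a non-limiting illustration, the edge-based size measurement described above can be sketched as a bounding box over a boolean skin mask. The function names are hypothetical, the sketch treats all skin pixels as a single region, and it returns the size in pixels; the conversion to inches is not specified in the description and is therefore omitted.

```python
from typing import List, Optional, Sequence, Tuple


def face_bounding_box(mask: Sequence[Sequence[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the skin pixels, or None if there are none."""
    rows: List[int] = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols: List[int] = [j for row in mask for j, is_skin in enumerate(row) if is_skin]
    return min(cols), min(rows), max(cols), max(rows)


def face_size_px(mask: Sequence[Sequence[bool]]) -> Tuple[int, int]:
    """Return (width, height) of the bounding box in pixels, or (0, 0) if empty."""
    box = face_bounding_box(mask)
    if box is None:
        return (0, 0)
    left, top, right, bottom = box
    return (right - left + 1, bottom - top + 1)
```

A practical implementation would likely isolate the largest connected skin region first, so that hands or background objects do not inflate the box.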

When a participant is seated near the imaging unit 33b, the face portion is relatively large; when a participant is seated far from the imaging unit 33b, the face portion is relatively small.

The notification unit 32d determines whether the participant's voice acquired by the audio acquisition unit 33a is louder than the maximum value of a predetermined range and whether it is quieter than the minimum value of that range. For this comparison, the notification unit 32d uses, for example, the average volume of the speaker's audio data. The notification unit 32d also determines whether the size of the participant's face portion measured by the measurement unit 32c is larger than the maximum value of a predetermined range and whether it is smaller than the minimum value of that range. The participant's voice is expressed in units such as decibels (dB), and the participant's face size is expressed in units such as inches (in).

When the participant's voice is louder than the maximum value of its predetermined range and the size of the participant's face portion is larger than the maximum value of its predetermined range, the notification unit 32d notifies the participant of a message prompting the participant to move away from the microphone. When the participant's voice is quieter than the minimum value of its predetermined range and the size of the participant's face portion is smaller than the minimum value of its predetermined range, the notification unit 32d notifies the participant of a message prompting the participant to move closer to the microphone.

In this way, when the volume of the speaker's audio data is greater than the maximum value or less than the minimum value of the predetermined range, the notification unit 32d further determines, based on the size of the speaker's face, whether to notify the speaker. For example, when the speaker is far from the microphone but speaks loudly enough that clear audio can be acquired, the notification unit 32d does not notify the speaker. This avoids sending a message to the speaker when the audio recording is not affected.

In this embodiment, the notification unit 32d does not notify a speaker when either the speaker's voice or the size of the face portion is within its predetermined range, but the embodiment is not limited to this. For example, the notification unit 32d may notify a speaker when either the speaker's voice or the size of the face portion is outside its predetermined range.
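
As a non-limiting illustration, the notification decision described above can be sketched as follows. The names `Range` and `decide_message`, the message strings, and the numeric ranges in the usage lines are hypothetical. For the variant that notifies when either quantity is out of range, the description does not say which message applies when the two quantities disagree; giving the "move closer" message priority is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    minimum: float
    maximum: float


MOVE_CLOSER = "Please move closer to the microphone."
MOVE_AWAY = "Please move away from the microphone."


def decide_message(volume_db: float, face_size_in: float,
                   volume_range: Range, face_range: Range,
                   require_both: bool = True) -> Optional[str]:
    """Return the message to show, or None when no notification is needed."""
    too_quiet = volume_db < volume_range.minimum
    too_loud = volume_db > volume_range.maximum
    too_small = face_size_in < face_range.minimum
    too_large = face_size_in > face_range.maximum

    if require_both:
        # Embodiment 1: notify only when volume and face size agree.
        if too_quiet and too_small:
            return MOVE_CLOSER
        if too_loud and too_large:
            return MOVE_AWAY
        return None

    # Variant: notify when either quantity is out of range (priority is assumed).
    if too_quiet or too_small:
        return MOVE_CLOSER
    if too_loud or too_large:
        return MOVE_AWAY
    return None


# Hypothetical ranges: 50 to 70 dB for volume, 3 to 6 inches for face size.
vol = Range(50.0, 70.0)
face = Range(3.0, 6.0)
```

With these ranges, a loud speaker who appears small in the image (for example 80 dB and 2 inches) yields no message in the default mode, which mirrors the case of a distant participant with a strong voice.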

The notification unit 32d notifies the participant of the message by, for example, displaying a caption on the display device 6 via the data output unit 34. The notification unit 32d may instead notify the participant by outputting audio data externally via the data output unit 34. The method of notifying the participant is not limited to these.

FIG. 5 is a diagram illustrating the relationship between the position of a participant and the message that is notified. In FIG. 5(a), the participant β is seated across the table from the data acquisition unit 33 (audio acquisition unit 33a), farther from the data acquisition unit 33 than in (b) and (c). In this case, the participant's voice tends to be quieter than the minimum value of its predetermined range and the participant's face portion tends to be smaller than the minimum value of its predetermined range, so the notification unit 32d displays the caption "Please move closer to the microphone." on the display device 6.

In FIG. 5(b), the participant β is seated near the data acquisition unit 33, closer to it than in (a) and (c). In this case, the participant's voice tends to be louder than the maximum value of its predetermined range and the participant's face portion tends to be larger than the maximum value of its predetermined range, so the notification unit 32d displays the caption "Please move away from the microphone." on the display device 6.

In FIG. 5(c), the participant β is seated closer to the data acquisition unit 33 than in (a) and farther from it than in (b). In this case, the participant's voice is within the predetermined range, so the notification unit 32d does not display a caption or the like on the display device 6.

The content of the message notified to the participant is not limited to the above.

FIG. 6 is a diagram illustrating the control flow of Embodiment 1. When distribution of the video conference starts (step S10), the detection unit 32a determines whether a speaker is present (step S11). For example, the detection unit 32a determines that a speaker is present when the average volume of the audio data acquired by the audio acquisition unit 33a is equal to or greater than a predetermined value. When no speaker is present (step S11: No), the detection unit 32a repeats the determination of step S11 after a predetermined time has elapsed. When a speaker is present (step S11: Yes), the detection unit 32a detects the direction of the speaker using the audio acquisition unit 33a (step S12). The control unit 32b then steers the imaging direction of the imaging unit 33b toward the speaker (step S13). Next, the measurement unit 32c measures the size of the speaker's face portion based on the image data of the speaker captured by the imaging unit 33b (step S14).

The notification unit 32d determines whether the volume of the speaker's audio data acquired from the audio acquisition unit 33a is less than the minimum value of the predetermined range (step S15). When the volume is less than the minimum value (step S15: Yes), the notification unit 32d determines whether the speaker's face size measured by the measurement unit 32c is smaller than the minimum value of its predetermined range (step S16). When the face size is equal to or greater than the minimum value (step S16: No), the detection unit 32a waits for a predetermined time measured by a timer (step S21) and then again determines whether a speaker is present (step S11). When the face size is smaller than the minimum value (step S16: Yes), the notification unit 32d notifies the speaker to move closer to the microphone (step S17). The message is notified by, for example, displaying a caption on the display device 6. The detection unit 32a then waits for the predetermined time (step S21) and again determines whether a speaker is present (step S11).

When the volume determination (step S15) finds that the volume of the speaker's audio data is equal to or greater than the minimum value of the predetermined range (step S15: No), the notification unit 32d proceeds to determine whether the volume is greater than the maximum value of the predetermined range (step S18). When the volume is equal to or less than the maximum value (step S18: No), the detection unit 32a waits for the predetermined time measured by the timer (step S21) and again determines whether a speaker is present (step S11). When the volume is greater than the maximum value (step S18: Yes), the notification unit 32d determines whether the speaker's face size is larger than the maximum value of its predetermined range (step S19). When the face size is equal to or less than the maximum value (step S19: No), the detection unit 32a waits for the predetermined time (step S21) and again determines whether a speaker is present (step S11). When the face size is larger than the maximum value (step S19: Yes), the notification unit 32d notifies the speaker to move away from the microphone (step S20). The detection unit 32a then waits for the predetermined time (step S21) and again determines whether a speaker is present (step S11).
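
As a non-limiting illustration, one pass through the control flow of FIG. 6 can be traced with the following sketch. The function `control_step` and its parameters are hypothetical; it performs no audio or camera work and only returns the sequence of step labels that the description associates with a given set of measurements.

```python
from typing import List


def control_step(speaker_present: bool, volume: float, face_size: float,
                 vol_min: float, vol_max: float,
                 face_min: float, face_max: float) -> List[str]:
    """Return the ordered step labels visited in one pass of the FIG. 6 flow."""
    steps = ["S11"]
    if not speaker_present:
        # S11: No. The caller retries S11 after a predetermined time.
        return steps

    # Direction detection, camera steering, and face-size measurement.
    steps += ["S12", "S13", "S14", "S15"]

    if volume < vol_min:                  # S15: Yes
        steps.append("S16")
        if face_size < face_min:          # S16: Yes
            steps.append("S17")           # notify: move closer
    else:                                 # S15: No
        steps.append("S18")
        if volume > vol_max:              # S18: Yes
            steps.append("S19")
            if face_size > face_max:      # S19: Yes
                steps.append("S20")       # notify: move away

    steps.append("S21")                   # wait on the timer, then back to S11
    return steps
```

For example, with hypothetical ranges of 50 to 70 for volume and 3 to 6 for face size, a quiet and distant speaker (volume 40, face size 2) visits S11 through S17 and then S21.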

(Embodiment 2)
FIG. 7 is a diagram illustrating an example of the functional configuration of a distribution terminal 7 according to Embodiment 2. The distribution terminal 7 includes a communication unit 31, a processing unit 32, a data acquisition unit 33, a data output unit 34, and a storage unit 35. The storage unit 35 includes a user table 35a.

The data acquisition unit 33 includes an audio acquisition unit 33a and an imaging unit 33b. The processing unit 32 includes a detection unit 32a, a control unit 32b, a measurement unit 32c, a notification unit 32d, and a storing unit 32e. The storing unit 32e is connected to the user table 35a. The notification unit 32d is connected to the data output unit 34.

FIG. 8 is a diagram illustrating an example of the data structure of the user table 35a. The user table 35a is a table for identifying participants who have already been notified, and associates a user ID, a face image, and a notification status with one another. The "user ID" is a number that uniquely identifies a user. The distribution terminal 7 identifies a participant by the "user ID". The "face image" is image data of the user's face portion, for example image data obtained by cropping the face portion from image data of the whole user. It is stored in an image format such as JPEG or GIF. The "notification status" indicates whether the participant (user) has been notified of a prompt to change position. "Yes" indicates that the participant has already received a notification such as "Please move closer to the microphone." or "Please move away from the microphone." "No" indicates that the participant has not yet been notified. For example, the user table 35a indicates that the face image of user ID "0101" is "image a" and that no notification has been made yet. It also indicates that the face image of user ID "0103" is "image c" and that a notification has already been made.
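
As a non-limiting illustration, the user table described above can be modeled as a mapping from user ID to a record. The names `UserRecord`, `already_notified`, and `mark_notified` are hypothetical, and the face image is represented here by a label string rather than actual image data. How the table is consulted during the control flow is not described in this passage, so the two helper functions are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UserRecord:
    user_id: str             # uniquely identifies the user, e.g. "0101"
    face_image: str          # label or path of the cropped face image (JPEG, GIF, ...)
    notified: bool = False   # True once a move-closer or move-away prompt was shown


# The two example rows mentioned for FIG. 8.
user_table: Dict[str, UserRecord] = {
    "0101": UserRecord("0101", "image a", notified=False),
    "0103": UserRecord("0103", "image c", notified=True),
}


def already_notified(table: Dict[str, UserRecord], user_id: str) -> bool:
    """Return True when the user exists and has already been notified."""
    record = table.get(user_id)
    return record is not None and record.notified


def mark_notified(table: Dict[str, UserRecord], user_id: str) -> None:
    """Record that a notification has been made for an existing user."""
    table[user_id].notified = True
```

Keying the mapping by user ID keeps the lookup direct, which suits a table whose purpose is to tell whether a given participant has already been prompted.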

Returning to FIG. 7, the detection unit 32a detects the direction of a participant who has spoken in the conference room, using the voice acquisition unit 33a. For example, taking as 0 degrees the direction in which a reference point in the conference room (coordinates (X₀, Y₀)) is photographed from the origin at coordinates (0, 0), the detection unit 32a acquires the position coordinates (X₁, Y₁) of the imaging unit 33b and the shooting angle θ at which the participant was photographed.

The control unit 32b controls the shooting direction of the imaging unit 33b so that it faces the speaker. For example, based on the direction data detected by the detection unit 32a, the control unit 32b rotates the imaging unit 33b so that the speaker is at the center of the captured image.

The measurement unit 32c measures the size of the participant's face included in the image data stored in the storage unit 35.

When the volume of the speaker's voice data is greater than the maximum value of the predetermined range and the speaker's face size is greater than the maximum value of the predetermined range, or when the volume of the speaker's voice data is smaller than the minimum value of the predetermined range and the speaker's face size is smaller than the minimum value of the predetermined range, the notification unit 32d stores the speaker's image data in the built-in memory. The speaker's image data may be an image of the whole speaker or an image in which the speaker's face has been cropped out. The notification unit 32d ends the process when the volume of the speaker's voice data is within the predetermined range, and also when the speaker's face size is within the predetermined range.

Subsequently, the notification unit 32d compares the speaker's image data with the user images stored in the user table 35a and identifies the speaker's user ID by face authentication. For example, the notification unit 32d performs face authentication using a face recognition algorithm such as eigenfaces. The storage unit 32e refers to the "presence or absence of notification" corresponding to the identified user ID in the user table 35a and determines whether the speaker has been notified in the past. If no notification has been made, the notification unit 32d then notifies the speaker to change position.

For example, when the volume of the speaker's voice data is greater than the maximum value of the predetermined range and the speaker's face size is greater than the maximum value of the predetermined range, the notification unit 32d notifies the speaker with a message prompting the speaker to move away from the microphone. Conversely, when the volume of the speaker's voice data is smaller than the minimum value of the predetermined range and the speaker's face size is smaller than the minimum value of the predetermined range, the notification unit 32d notifies the speaker with a message prompting the speaker to move closer to the microphone.
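The two conditions above can be sketched as a single decision function. This is a minimal illustration, not the implementation of the notification unit 32d: the function name, the tuple representation of each range, and the numeric values in the example are all assumptions.

```python
from typing import Optional, Tuple

Range = Tuple[float, float]  # (minimum, maximum) of a predetermined range

MOVE_AWAY = "Please move away from the microphone."
MOVE_CLOSER = "Please move closer to the microphone."


def choose_message(volume: float, face_size: float,
                   volume_range: Range, face_range: Range) -> Optional[str]:
    """Return the message to send, or None when no notification applies."""
    vol_min, vol_max = volume_range
    face_min, face_max = face_range
    # Loud voice and large face: the speaker is judged too close.
    if volume > vol_max and face_size > face_max:
        return MOVE_AWAY
    # Quiet voice and small face: the speaker is judged too far away.
    if volume < vol_min and face_size < face_min:
        return MOVE_CLOSER
    # Any other combination ends the process without a notification.
    return None


# Hypothetical ranges: volume 40 to 80, face size 100 to 250 (arbitrary units).
too_close = choose_message(90, 300, (40, 80), (100, 250))
too_far = choose_message(30, 60, (40, 80), (100, 250))
loud_but_far = choose_message(90, 60, (40, 80), (100, 250))
```

Requiring both measurements to agree means a naturally loud speaker sitting at a normal distance is not asked to move.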

When the notification unit 32d has notified the speaker with a message, the storage unit 32e stores "Yes", indicating that the notification has been made, in the "presence or absence of notification" corresponding to the user ID in the user table 35a.

FIG. 9 is a diagram illustrating the control flow of the second embodiment. When distribution of the video conference is started (step S30), the detection unit 32a determines whether there is a speaker (step S31). When there is no speaker (step S31: No), the detection unit 32a determines again, after a predetermined time has elapsed, whether there is a speaker (step S31). When there is a speaker (step S31: Yes), the detection unit 32a detects the direction of the speaker using the voice acquisition unit 33a (step S32). Next, the control unit 32b controls the shooting direction of the imaging unit 33b toward the speaker (step S33). The measurement unit 32c then measures the size of the speaker's face (step S34).

Subsequently, the notification unit 32d determines whether the volume of the speaker's voice data acquired from the voice acquisition unit 33a is smaller than the minimum value of the predetermined range (step S35). When the volume is smaller than the minimum value (step S35: Yes), the notification unit 32d determines whether the speaker's face size measured by the measurement unit is smaller than the minimum value of the predetermined range (step S36). When the face size is equal to or greater than the minimum value (step S36: No), the detection unit 32a waits for a predetermined time measured by a timer (step S41) and then determines again whether there is a speaker (step S31). When the face size is smaller than the minimum value, the notification unit 32d executes the authentication process (step S37), which is described with reference to FIG. 10. The detection unit 32a then waits for the predetermined time measured by the timer (step S41) and determines again whether there is a speaker (step S31).

In the determination of the volume of the speaker's voice data (step S35), when the volume is equal to or greater than the minimum value of the predetermined range (step S35: No), the notification unit 32d proceeds to determine whether the volume is equal to or less than the maximum value of the predetermined range (step S38). When the volume is equal to or less than the maximum value (step S38: No), the detection unit 32a waits for a predetermined time measured by a timer (step S41) and then determines again whether there is a speaker (step S31). When the volume is greater than the maximum value (step S38: Yes), the notification unit 32d determines whether the speaker's face size is greater than the maximum value of the predetermined range (step S39). When the face size is equal to or less than the maximum value (step S39: No), the detection unit 32a waits for the predetermined time measured by the timer (step S41) and then determines again whether there is a speaker (step S31). When the face size is greater than the maximum value (step S39: Yes), the notification unit 32d executes the authentication process (step S40). The detection unit 32a then waits for the predetermined time measured by the timer (step S41) and determines again whether there is a speaker (step S31).

FIG. 10 is a diagram illustrating the flow of the authentication process, which corresponds to steps S37 and S40 in FIG. 9. The notification unit 32d stores the speaker's image data in the built-in memory (step S50). Next, the notification unit 32d compares this image data with the face image data of users stored in the user table 35a and identifies the speaker's user ID by face authentication (step S51). The storage unit 32e then refers to the user table 35a and determines whether the speaker has previously been notified to change position (step S52). If the speaker has been notified in the past (step S52: No), the notification unit 32d ends the process. If the speaker has not yet been notified (step S52: Yes), the notification unit 32d notifies the speaker to change position (step S53).

For example, when the volume of the speaker's voice data is smaller than the minimum value of the predetermined range and the speaker's face size is smaller than the minimum value of the predetermined range (that is, when the process was entered from step S37), the notification unit 32d notifies the speaker with a message prompting the speaker to move closer to the microphone. Conversely, when the volume of the speaker's voice data is greater than the maximum value of the predetermined range and the speaker's face size is greater than the maximum value of the predetermined range (that is, when the process was entered from step S40), the notification unit 32d notifies the speaker with a message prompting the speaker to move away from the microphone.

Subsequently, the storage unit 32e stores "Yes" in the "presence or absence of notification" of the user table 35a (step S54).
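Steps S52 to S54 amount to a notify-at-most-once check per user. The sketch below assumes the user ID has already been identified by face authentication (steps S50 and S51 are not modeled); the function name `notify_once`, the plain dictionary of flags, and the `send` callback are hypothetical.

```python
from typing import Callable, Dict


def notify_once(user_id: str, message: str,
                notified_flags: Dict[str, bool],
                send: Callable[[str, str], None]) -> bool:
    """Notify a user only if no earlier notification is recorded.

    Returns True when a notification was sent, False when it was skipped.
    """
    # Step S52: has this speaker already been notified?
    if notified_flags.get(user_id, False):
        return False
    # Step S53: prompt the speaker to change position.
    send(user_id, message)
    # Step S54: record that the notification has been made.
    notified_flags[user_id] = True
    return True


# Example with the two users of FIG. 8; sent messages are collected in a list.
flags = {"0101": False, "0103": True}
sent = []


def record(uid: str, msg: str) -> None:
    sent.append((uid, msg))


first = notify_once("0101", "Please move closer to the microphone.", flags, record)
second = notify_once("0101", "Please move closer to the microphone.", flags, record)
already = notify_once("0103", "Please move away from the microphone.", flags, record)
```

In this example only the first call sends a message; the repeated call for "0101" and the call for the already-notified "0103" are both skipped.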

As described above, determining in advance whether the user has already been notified to change position avoids repeatedly notifying a user who is unable to move, for example because the conference room is full.

(Embodiment 3)
The distance to the speaker may be measured by a method other than estimating it from the face size as in the first or second embodiment. For example, the measurement unit 32c may measure the distance to the speaker using a distance sensor. Examples of the distance sensor include a stereo camera, an ultrasonic sensor, and an infrared sensor. The stereo camera may acquire image data of the speaker in parallel with measuring the distance to the speaker.

FIG. 11A is a diagram illustrating a first hardware configuration of the distribution terminal 8 according to the third embodiment. In the first hardware configuration, the distance to the speaker is measured using the stereo camera 50. The distribution terminal 8 differs from the distribution terminal 3 of the first embodiment in that the stereo camera 50 is connected to the image sensor I/F 113; the rest of the hardware configuration is the same as in the first embodiment. A configuration in which both the camera 112 for photographing and the stereo camera 50 for distance measurement are connected to the image sensor I/F 113 is also possible.

FIG. 11B is a diagram illustrating a second hardware configuration of the distribution terminal 8 according to the third embodiment. In the second hardware configuration, the distance to the speaker is measured using the infrared sensor 51. The distribution terminal 8 differs from the distribution terminal 3 of the first embodiment in that the infrared sensor 51 or the ultrasonic sensor 52 is connected via a sensor I/F 122 connected to the bus 110; the rest of the hardware configuration is the same as in the first embodiment.

The distribution terminal 8 has the same functional configuration as that shown in FIG. 3, so a description of each component is omitted.

FIG. 12 is a diagram illustrating an example of the appearance of the stereo camera 50. The stereo camera 50 is a device that measures the distance to the speaker using a plurality of cameras installed side by side. The shooting direction of each camera is controlled independently. In the example of FIG. 12, two cameras are installed close to each other, but the distance between the cameras may be increased. The example of FIG. 12 uses two cameras, but three or more cameras may be used.

A method of calculating the distance to the speaker using the stereo camera 50 will now be described. The measurement unit 32c calculates the distance D [m] to the speaker from the distance A [m] between the cameras, the focal length B [m] of the cameras, and the difference C [m] between the speaker's positions as imaged by the respective cameras, using the following equation.

(Equation 1)
D = A × B / C
A larger distance A [m] between the cameras gives higher distance measurement accuracy but a longer measurement time, because when the cameras are far apart it takes longer from the start of the search for the speaker until each camera captures the speaker. When the distance A [m] between the cameras is increased, the search time can be reduced by narrowing the target distance range measured by the stereo camera 50.

The target distance range measured by the stereo camera 50 is set, for example, according to the size of the room being photographed. Setting the target distance range in advance shortens the time needed to measure the distance to the speaker.

As for the difference C between the speaker's positions, if the speaker's position as imaged by one camera and the position as imaged by the other camera are offset horizontally by, for example, 5 cm, the difference C [m] is 0.05.
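The stereo distance relation D = A × B / C can be checked with a short calculation. In the sketch below the function name is hypothetical, and the values chosen for A and B are illustrative assumptions; only C = 0.05 comes from the example in the text.

```python
def stereo_distance(camera_spacing_m: float, focal_length_m: float,
                    position_diff_m: float) -> float:
    """Distance D [m] to the speaker from D = A * B / C.

    camera_spacing_m: distance A between the cameras
    focal_length_m:   focal length B of the cameras
    position_diff_m:  difference C between the imaged speaker positions
    """
    if position_diff_m <= 0:
        # C = 0 would mean no measurable offset, so D is undefined.
        raise ValueError("position difference C must be positive")
    return camera_spacing_m * focal_length_m / position_diff_m


# Assumed A = 0.5 m and B = 0.2 m, with C = 0.05 m as in the example above.
distance_m = stereo_distance(0.5, 0.2, 0.05)
```

With these assumed values the result is about 2 m. Because C is in the denominator, a smaller offset between the two images corresponds to a more distant speaker.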

In addition to being used by the measurement unit 32c, the stereo camera 50 may be used as the camera 112 of FIG. 2. For example, when used as the camera 112, the stereo camera 50 may generate image data by combining the images captured by its multiple cameras. Alternatively, the stereo camera 50 may transmit an image captured by one of its cameras to the server 2 as image data. The stereo camera 50 may also be used solely for measuring the distance to the speaker, with the distribution terminal 3 separately including a camera 112 for acquiring image data of the speaker.

FIG. 13 is a diagram illustrating an example of the appearance of the camera 112 having the infrared sensor 51. As shown in FIG. 13, for example, the infrared sensor 51 is arranged alongside the camera 112 so as to face the same direction as the shooting direction of the camera 112. The infrared sensor 51 starts measuring the distance to the speaker after the shooting direction of the camera 112 has been controlled toward the speaker detected by the detection unit 32a. When the ultrasonic sensor 52 is used as the distance sensor, it is likewise arranged alongside the camera 112 in the same way as the infrared sensor 51 in FIG. 13.

When the infrared sensor 51 is the distance sensor, the infrared sensor 51, for example, irradiates the speaker with infrared light and measures the distance to the speaker by the principle of triangulation, based on the position of the light receiving element that detected the reflected light.

Specifically, the infrared sensor 51 irradiates the speaker with infrared light and receives the light reflected from the speaker with a position detecting element, a PSD (Position Sensing Device). Because the position on the position detecting element at which the reflected light is detected changes according to the distance to the speaker, the measurement unit 32c can calculate the distance to the speaker by converting that position into a distance.

The element used in the infrared sensor 51 is not limited to a PSD; another type of element, such as an OES (Opto Elektronischer Schaltkreis), may be used.

When the ultrasonic sensor 52 is used, after the direction has been controlled toward the speaker detected by the detection unit 32a, the measurement unit 32c measures the distance to the speaker by transmitting ultrasonic waves to the detected speaker and measuring the reflected waves, or by irradiating the speaker with infrared light and measuring the reflected light.

For example, the measurement unit 32c measures the distance to the speaker by using the ultrasonic sensor to measure the time from transmitting an ultrasonic wave toward the speaker until the reflected wave is received. Letting t [s] be the time from transmitting the ultrasonic wave until the reflected wave is received and c [m/s] be the speed of sound, the measurement unit 32c calculates the distance L to the speaker using the following equation.

(Equation 2)
L = c × t / 2
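The time-of-flight relation L = c × t / 2 can be illustrated as follows. The function name is hypothetical, and the default speed of sound of 340 m/s is an assumed round value for air at room conditions, not a figure from the specification.

```python
def ultrasonic_distance(round_trip_s: float,
                        speed_of_sound_mps: float = 340.0) -> float:
    """Distance L [m] from L = c * t / 2.

    round_trip_s:       time t from transmission until the echo is received
    speed_of_sound_mps: speed of sound c (340 m/s is an assumed default)
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time t must not be negative")
    # The wave travels to the speaker and back, so halve the path length.
    return speed_of_sound_mps * round_trip_s / 2


# An echo received 10 ms after transmission.
distance_m = ultrasonic_distance(0.010)
```

For a 10 ms round trip and the assumed speed of sound, the speaker is about 1.7 m away. The division by 2 reflects that t covers the outbound and return paths.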
FIG. 14 is a diagram illustrating the control flow of the third embodiment. When distribution of the video conference is started (step S60), the detection unit 32a determines whether there is a speaker (step S61). When there is no speaker (step S61: No), the detection unit 32a determines again, after a predetermined time has elapsed, whether there is a speaker (step S61). When there is a speaker (step S61: Yes), the detection unit 32a detects the direction of the speaker using the voice acquisition unit 33a (step S62). Next, the control unit 32b controls the shooting direction of the imaging unit 33b toward the speaker (step S63). The measurement unit 32c then measures the distance to the speaker using the stereo camera 50, the infrared sensor 51, or the ultrasonic sensor 52 (step S64).

The notification unit 32d determines whether the volume of the speaker's voice data acquired from the voice acquisition unit 33a is smaller than the minimum value of the predetermined range (step S65). When the volume is smaller than the minimum value (step S65: Yes), the notification unit 32d determines whether the distance to the speaker measured by the measurement unit is greater than the maximum value of the predetermined range (step S66). When the distance is equal to or less than the maximum value (step S66: No), the detection unit 32a waits for a predetermined time measured by a timer (step S71) and then determines again whether there is a speaker (step S61). When the distance is greater than the maximum value (step S66: Yes), the notification unit 32d notifies the speaker to move closer to the microphone (step S67). The detection unit 32a then waits for the predetermined time measured by the timer (step S71) and determines again whether there is a speaker (step S61).

In the determination of whether the volume of the speaker's voice data is smaller than the minimum value of the predetermined range (step S65), when the volume is equal to or greater than the minimum value (step S65: No), the notification unit 32d proceeds to determine whether the volume is equal to or less than the maximum value of the predetermined range (step S68). When the volume is equal to or less than the maximum value (step S68: No), the detection unit 32a waits for a predetermined time measured by a timer (step S71) and then determines again whether there is a speaker (step S61). When the volume is greater than the maximum value (step S68: Yes), the notification unit 32d determines whether the distance to the speaker is smaller than the minimum value of the predetermined range (step S69). When the distance is equal to or greater than the minimum value (step S69: No), the detection unit 32a waits for the predetermined time measured by the timer (step S71) and then determines again whether there is a speaker (step S61).

When the distance to the speaker is smaller than the minimum value (step S69: Yes), the notification unit 32d notifies the speaker to move away from the microphone (step S70). The detection unit 32a then waits for the predetermined time measured by the timer (step S71) and determines again whether there is a speaker (step S61).
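The branching of steps S65 to S70 can be summarized in one function. This is a sketch under assumptions: the function name, the tuple form of the ranges, and the numeric values in the example are hypothetical, and the timer wait of step S71 is not modeled.

```python
from typing import Optional, Tuple

Range = Tuple[float, float]  # (minimum, maximum) of a predetermined range


def choose_message_by_distance(volume: float, distance_m: float,
                               volume_range: Range,
                               distance_range: Range) -> Optional[str]:
    """Return the notification for steps S65 to S70, or None if none applies."""
    vol_min, vol_max = volume_range
    dist_min, dist_max = distance_range
    if volume < vol_min:                      # step S65: Yes
        if distance_m > dist_max:             # step S66: Yes
            return "Please move closer to the microphone."   # step S67
        return None                           # step S66: No
    if volume > vol_max:                      # step S68: Yes
        if distance_m < dist_min:             # step S69: Yes
            return "Please move away from the microphone."   # step S70
        return None                           # step S69: No
    return None                               # volume within the range


# Hypothetical ranges: volume 40 to 80 (arbitrary units), distance 0.5 to 3.0 m.
quiet_and_far = choose_message_by_distance(30, 4.0, (40, 80), (0.5, 3.0))
loud_and_near = choose_message_by_distance(90, 0.3, (40, 80), (0.5, 3.0))
quiet_but_near = choose_message_by_distance(30, 1.0, (40, 80), (0.5, 3.0))
```

Compared with the face-size check of the second embodiment, the comparisons are inverted: a large distance corresponds to a small face, so a quiet voice is paired with a distance above the maximum.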

The authentication process described in the second embodiment may also be performed in the third embodiment. In that case, the authentication process shown in FIG. 10 is performed in steps S67 and S70.

In the first to third embodiments, the functions of the distribution terminal 3, the distribution terminal 7, or the distribution terminal 8 can be realized by causing a computer to execute a program stored in a medium.

The distribution terminal 3 has been described above in the first embodiment, the distribution terminal 7 in the second embodiment, and the distribution terminal 8 in the third embodiment. However, the present invention is not limited to these embodiments, and various modifications and improvements are possible within the scope of the present invention.

In the present embodiments, the distribution terminal 3 and the distribution terminal 7 are examples of a photographing device. The storage unit 35 is an example of a storage unit. The detection unit 32a is an example of a detection unit. The control unit 32b is an example of a control unit. The measurement unit 32c is an example of a measurement unit. The notification unit 32d is an example of a notification unit. The storage unit 32e is an example of a storing unit. The user ID is an example of identification information.

2 Server
3, 7, 8 Distribution terminal
4 User terminal
5 Communication network
6 Display device
31 Communication unit
32 Processing unit
32a Detection unit
32b Control unit
32c Measurement unit
32d Notification unit
33 Data acquisition unit
33a Voice acquisition unit
33b Imaging unit
34 Data output unit
35 Storage unit
100 Video distribution system

Patent Document 1: JP 2008-103824 A

Claims

A photographing device for photographing a person, comprising:
a detection unit that detects the direction of a person who has spoken and the volume of the speech;
a control unit that controls a shooting direction to the detected direction;
a measurement unit that measures the size, on an image, of the person in the direction detected by the detection unit; and
a notification unit that notifies the person when the volume is smaller than the minimum value of a first range relating to a predetermined range and the size of the person on the image is smaller than the minimum value of a second range.

The photographing device according to claim 1, wherein the notification unit notifies the person when the volume is greater than the maximum value of the predetermined range.

The photographing device according to claim 1, wherein
the measurement unit measures the distance to the person in the direction detected by the detection unit, and
the notification unit notifies the person when the volume is smaller than the minimum value of the first range and the distance to the detected person is greater than the maximum value of a third range.

The photographing device according to claim 2, wherein the notification unit notifies the person when the volume is greater than the maximum value of the first range and the size of the person on the image is greater than the maximum value of the second range.

The photographing device according to claim 2, wherein
the measurement unit measures the distance to the person in the direction detected by the detection unit, and
the notification unit notifies the person when the volume is greater than the maximum value of the first range and the distance to the detected person is smaller than the minimum value of a third range.

The photographing device according to any one of claims 1 to 5, further comprising a storing unit that stores, in a storage unit, identification information of the person corresponding to a face image extracted from a captured image, in association with the presence or absence of the notification, wherein
the notification unit makes the notification when the photographed person has not yet been notified.

A program to be executed by a photographing device for photographing a person, the program causing the photographing device to execute processing of:
detecting the direction of a person who has spoken and the volume of the speech;
controlling a shooting direction to the detected direction;
measuring the size, on an image, of the person in the detected direction; and
notifying the person when the volume is smaller than the minimum value of a first range relating to a predetermined range and the size of the person on the image is smaller than the minimum value of a second range.

A method performed by a photographing device for photographing a person, in which the photographing device executes processing of:
detecting the direction of a person who has spoken and the volume of the speech;
controlling a shooting direction to the detected direction;
measuring the size, on an image, of the person in the detected direction; and
notifying the person when the volume is smaller than the minimum value of a first range relating to a predetermined range and the size of the person on the image is smaller than the minimum value of a second range.