JP2009267621A

JP2009267621A - Communication apparatus

Info

Publication number: JP2009267621A
Application number: JP2008112788A
Authority: JP
Inventors: Yukio Tada; 幸生多田
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-04-23
Filing date: 2008-04-23
Publication date: 2009-11-12

Abstract

PROBLEM TO BE SOLVED: To provide a technique that makes it easy to recognize whether a listener understands the content of a conference or the like when a remote conference or the like is held by communication among a plurality of communication terminals.

SOLUTION: A terminal 10 transmits, to another terminal 10, sound data representing sound collected by a microphone 15 and video data representing video captured by a photographing part 19. The terminal 10 receives video data and sound data from the other terminal 10, emits the received sound data as sound from a speaker 17, and outputs the received video data to a display part 13 to display the video. Meanwhile, a control part 11 analyzes the video data output from the photographing part 19 and detects a face image, thereby detecting nodding motions of a participant. The control part 11 calculates the number of times and the frequency with which nodding motions are detected, and outputs data indicating the calculation result to the display part 13 or the like.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a communication device.

In recent years, remote conference systems, in which a conference is held using a plurality of communication terminals connected via a communication network, have become widespread. In such a system the speaker and the listeners do not meet face to face, so it is difficult for the speaker to sense the listeners' reactions, and the speaker may feel uneasy about whether his or her voice is reaching the other party. Patent Document 1 proposes a system in which, so that a teleconference can proceed smoothly, a participant inputs his or her emotions and actions with buttons and a robot at a remote location moves according to the input. In this system, when a participant presses a “nod” button, the robot nods at the remote location. Patent Document 2 proposes a system in which a display showing video is tilted by remote control so that the participant appears to be nodding. In this system, when a participant presses a “nod” button, the display at the remote location tilts. With the techniques described in Patent Documents 1 and 2, the speaker can recognize that a listener has nodded.

Patent Document 3 proposes an apparatus that records physical phenomena of conference participants and indexes them, thereby providing conference video that is useful to those who view it later. This apparatus recognizes a participant's “nodding” motion as a back channel and records it. Patent Document 4 proposes a system that controls the movement of a robot based on the user's language information. With the technique described in Patent Document 4, “nodding” is acquired by a user language information acquisition unit, and the robot can be made to perform a “fun” movement based on the result.
JP 2003-235019 A
JP 2005-033811 A
JP 2005-277445 A
JP 2007-030050 A

However, although the techniques described in Patent Documents 1 to 4 make it possible to recognize that a listener has nodded, it is difficult to recognize whether the listener understands the content of the talk.
The present invention has been made in view of the above background, and its object is to provide a technique that makes it easier to recognize whether a listener understands the content of a talk when a conference or the like is held by communication among a plurality of communication terminals.

In order to solve the above problem, the present invention provides a communication device comprising: data acquisition means for acquiring data including at least one of audio data representing a listener's voice collected by sound collecting means and video data representing video of the listener captured by photographing means; detection means for analyzing the data acquired by the data acquisition means, collating the analysis result with a predetermined collation pattern, and detecting, as a nodding motion of the listener, a result whose collation satisfies a predetermined condition; calculation means for calculating the frequency with which nodding motions are detected by the detection means; and output means for outputting frequency data representing the frequency calculated by the calculation means.

In a preferred aspect of the present invention, the device may further comprise correspondence storage means for storing a correspondence between the frequency and a degree of understanding, and understanding-degree specifying means for specifying, with reference to the correspondence storage means, the degree of understanding corresponding to the frequency calculated by the calculation means, and the output means may output understanding-degree data indicating the degree of understanding specified by the understanding-degree specifying means.

In a further preferred aspect of the present invention, the calculation means may calculate the frequency with which nodding motions are detected by the detection means per predetermined unit of time.

In a further preferred aspect of the present invention, the device may further comprise collation pattern storage means for storing the collation pattern for each piece of identification data identifying a region, identification data acquisition means for acquiring identification data identifying a region, and collation pattern reading means for reading, from the collation pattern storage means, the collation pattern corresponding to the identification data acquired by the identification data acquisition means, and the detection means may collate the analysis result with the collation pattern read by the collation pattern reading means and detect, as the nodding motion, a result whose collation satisfies a predetermined condition.

In a further preferred aspect of the present invention, the data acquisition means may receive the data from each of a plurality of terminals connected via a communication network, and the calculation means may calculate, for each terminal, the frequency with which nodding motions are detected by the detection means.

In a further preferred aspect of the present invention, the device may further comprise: statistical means for compiling, for each predetermined unit of time, statistics on the frequency with which nodding motions are detected by the detection means; reference timing calculation means for calculating, according to the statistical result obtained by the statistical means, the timing of a nodding motion serving as a reference, as a reference timing; terminal specifying means for specifying, from among the plurality of terminals, a terminal for which the difference between the timing at which a nodding motion is detected by the detection means and the reference timing calculated by the reference timing calculation means is equal to or greater than a predetermined threshold; and specific data output means for outputting specific data indicating the result of the specification by the terminal specifying means.

In a further preferred aspect of the present invention, the data acquisition means may acquire data including video data representing video of the listener, and the detection means may comprise: face image detection means for analyzing the video data acquired by the data acquisition means to detect a face image; motion detection means for detecting movement of the face by detecting the position and direction of the face image detected by the face image detection means; and nodding motion detection means for collating the face movement detected by the motion detection means with a predetermined collation pattern and detecting, as the nodding motion, a movement whose collation result satisfies a predetermined condition.

In another preferred aspect of the present invention, the data acquisition means may acquire data including audio data representing the listener's voice, and the detection means may collate the audio data acquired by the data acquisition means with a predetermined collation pattern and detect, as the nodding motion, audio whose collation result satisfies a predetermined condition.

According to the present invention, when a conference or the like is held by communication among a plurality of communication terminals, it becomes easier to recognize whether a listener understands the content of the talk.

<Configuration>
FIG. 1 is a block diagram showing the configuration of a remote conference system 1 according to an embodiment of the present invention. The remote conference system 1 is configured by connecting a plurality of terminals 10a, 10b, 10c, ... installed at various locations to a communication network 20 such as the Internet. In the following description, for convenience, the terminals 10a, 10b, 10c, ... are referred to as “terminal 10” when there is no need to distinguish them from one another. A remote conference is realized by the participants communicating with one another using the terminals 10.

FIG. 2 is a block diagram showing an example of the configuration of the terminal 10. In the figure, the control unit 11 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory), and controls each unit of the terminal 10 via a bus by reading and executing a computer program stored in the ROM or the storage unit 12. The storage unit 12 is storage means for storing the computer program executed by the control unit 11 and the data used during its execution, and is, for example, a hard disk device. The display unit 13 includes a liquid crystal panel and displays various images under the control of the control unit 11. The operation unit 14 outputs a signal corresponding to an operation by the user of the terminal 10. The microphone 15 collects sound and outputs an audio signal (analog signal) representing the collected sound. The audio processing unit 16 converts the audio signal (analog signal) output from the microphone 15 into digital data by A/D conversion. The audio processing unit 16 also converts supplied digital data into an analog signal by D/A conversion and supplies it to the speaker 17. The speaker 17 emits sound with an intensity corresponding to the analog signal output from the audio processing unit 16. The communication unit 18 is communication means for communicating with other terminals 10 via the communication network 20. The photographing unit 19 captures video and outputs video data representing the captured video.

In this embodiment, the case where the microphone 15 and the speaker 17 are included in the terminal 10 is described. However, the audio processing unit 16 may instead be provided with an input terminal and an output terminal, with an external microphone connected to the input terminal via an audio cable. Similarly, an external speaker may be connected to the output terminal via an audio cable. Also, in this embodiment, the audio signal input from the microphone 15 to the audio processing unit 16 and the audio signal output from the audio processing unit 16 to the speaker 17 are analog audio signals, but digital audio data may be input and output instead. In that case, the audio processing unit 16 does not need to perform A/D conversion or D/A conversion. The same applies to the display unit 13 and the photographing unit 19: an external output terminal and an external input terminal may be provided so that an external monitor and an external photographing device can be connected.

As illustrated, the storage unit 12 has a count table storage area 121. The count table storage area 121 stores, for each of the other terminals 10 connected to the own terminal 10, count data indicating the number of times or the frequency with which a motion or sound made when a conference participant nods (hereinafter referred to as a “nodding motion”) has been detected. FIG. 3 shows an example of the contents of the count table. As illustrated, the table associates the items “terminal ID” and “count data” with each other. The “terminal ID” item stores a terminal ID that identifies a terminal 10. The “count data” item stores count data indicating the number of times or the frequency with which nodding motions have been detected in the data transmitted from the terminal 10 indicated by the corresponding terminal ID.

<Operation>
Next, the operation of this embodiment will be described. The terminal 10 transmits, to the other terminals 10, data (hereinafter referred to as “conference data”) that includes audio data representing the sound collected by the microphone 15 and video data representing the video captured by the photographing unit 19. The terminal 10 also receives conference data transmitted from the other terminals 10, emits the audio data included in the received conference data as sound from the speaker 17, and outputs the video data included in the received conference data to the display unit 13 to display the video. A remote conference is thereby realized.

At this time, the control unit 11 of the terminal 10 analyzes at least one of the audio data representing the sound collected by the microphone 15 and the video data representing the video captured by the photographing unit 19, collates the analysis result with a predetermined collation pattern, and detects, as a nodding motion of the participant, a result whose collation satisfies a predetermined condition. In this operation example, the control unit 11 detects the participant's nodding motion by analyzing the video data output from the photographing unit 19 and performing face image detection. Specifically, the control unit 11 first analyzes the video data and detects a face image. Next, the control unit 11 detects the movement of the participant's face by detecting the position and direction of the detected face image. The control unit 11 then collates the detected pattern of face movement with the predetermined collation pattern and detects, as a nodding motion, a movement whose collation result satisfies the predetermined condition. Data representing the characteristics of face movement during a nodding motion may be used as the collation pattern. The collation pattern may be stored in advance in a predetermined storage area of the storage unit 12 of the terminal 10, or may be acquired from a predetermined server device or the like.
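As a rough illustration of the collation step described above, the following Python sketch slides a window over a series of head pitch angles and reports a nodding motion wherever the window is close enough to a stored pattern. The pitch series, the pattern values, the distance measure (mean absolute difference), and the threshold are all assumptions made for illustration; the embodiment does not specify how the face movement is represented or what the predetermined condition is.

```python
from typing import List, Sequence

# Hypothetical collation pattern: head pitch in degrees over five frames
# (head goes down, then returns). Values are illustrative only.
NOD_PATTERN: List[float] = [0.0, -10.0, -20.0, -10.0, 0.0]


def match_distance(window: Sequence[float], pattern: Sequence[float]) -> float:
    """Mean absolute difference between the pattern and the window,
    with the window expressed relative to its first sample."""
    base = window[0]
    return sum(abs((w - base) - p) for w, p in zip(window, pattern)) / len(pattern)


def detect_nods(
    pitch: Sequence[float],
    pattern: Sequence[float] = NOD_PATTERN,
    max_distance: float = 4.0,  # assumed "predetermined condition"
) -> List[int]:
    """Return the start indices of non-overlapping windows that match the pattern."""
    hits: List[int] = []
    n = len(pattern)
    i = 0
    while i + n <= len(pitch):
        if match_distance(pitch[i:i + n], pattern) <= max_distance:
            hits.append(i)
            i += n  # skip past the matched window so one nod is counted once
        else:
            i += 1
    return hits


if __name__ == "__main__":
    series = [0, 0, -9, -21, -11, 1, 0, 0, 0, -10, -19, -9, 0, 0]
    print(detect_nods(series))
```

Expressing each window relative to its first sample is a design choice that keeps the match independent of the participant's resting head angle.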

The control unit 11 counts the number of times a nodding motion is detected and stores count data indicating the count in the table held in the count table storage area 121. That is, each time a nodding motion is detected, the control unit 11 updates the value of the count data stored in the count table storage area 121.

In addition, at every predetermined unit of time, the control unit 11 transmits count data indicating its count to the other terminals 10 with which it is communicating, and receives the count data transmitted from the other terminals 10. The control unit 11 stores the count data received from the other terminals 10 in the table held in the count table storage area 121. That is, each time count data is received from another terminal 10, the control unit 11 updates the per-terminal count data stored in the count table storage area 121.
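The count table handling described above can be sketched as a small in-memory structure. This is only an illustration under assumptions: the class and method names are hypothetical, the table is a plain dictionary instead of the count table storage area 121, and the network exchange itself is left out.

```python
from collections import defaultdict
from typing import Dict


class CountTable:
    """In-memory stand-in for the count table (terminal ID -> count data)."""

    def __init__(self, own_terminal_id: str) -> None:
        self.own_terminal_id = own_terminal_id
        self._counts: Dict[str, int] = defaultdict(int)

    def record_local_nod(self) -> int:
        """Increment the own terminal's count each time a nod is detected."""
        self._counts[self.own_terminal_id] += 1
        return self._counts[self.own_terminal_id]

    def update_from_remote(self, terminal_id: str, count: int) -> None:
        """Overwrite the stored count with the value received from another terminal."""
        self._counts[terminal_id] = count

    def snapshot(self) -> Dict[str, int]:
        """Return a copy suitable for sending to peers or for display."""
        return dict(self._counts)


if __name__ == "__main__":
    table = CountTable("10a")
    table.record_local_nod()
    table.update_from_remote("10b", 3)
    print(table.snapshot())
```

Here a received count replaces the stored value, on the assumption that each terminal sends its running total; a design that sends increments would add to the stored value instead.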

While the remote conference is in progress, the control unit 11 also causes the display unit 13 to display the nodding motion counts. FIG. 4 shows an example of a screen displayed on the display unit 13. In the example shown in FIG. 4, videos A1, A2, A3, and A4 of the participants in the remote conference (that is, the users of the other terminals 10) are displayed, and images B1, B2, B3, and B4 representing the contents of the count data are displayed near the videos A1, A2, A3, and A4, respectively. At every predetermined unit of time, the control unit 11 outputs data corresponding to the count data to the display unit 13, and the display unit 13 updates its display according to the data supplied from the control unit 11. That is, while the remote conference is in progress, images showing the frequency and number of times each participant's nodding motions have been detected are displayed, and the display is updated in real time.

By checking the screen displayed on the display unit 13, a participant in the remote conference can see which participants are nodding and how often. In general, a nodding motion occurs when a person shows positive understanding of what he or she has perceived, so the number of nods can be considered roughly proportional to the participant's degree of understanding of the content. According to this embodiment, therefore, the degree of understanding of each participant can be estimated by referring to the screen displayed on the display unit 13.

In this way, the measured nodding motion counts are transmitted in real time to the terminals 10 of the other participants and displayed beside each participant's video. In each of the other terminals 10, the control unit 11 of that terminal displays an image corresponding to the received count data. A voice message may be output instead of displaying an image. The users of the other terminals 10 can thereby keep track of their own number of nods and the other participants' number of nods during the remote conference.

The control unit 11 also stores the participants' nodding motion counts along the time axis of the conference. FIG. 5 shows an example of the counting process performed by the control unit 11. In the figure, the horizontal axis represents time and the vertical axis represents the number of nodding motions. The control unit 11 calculates the number of detected nodding motions per predetermined unit of time and stores the result in a predetermined storage area of the storage unit 12. This calculation may be performed individually for each of the other terminals 10, or collectively for a plurality of terminals 10 as a whole. An image showing the counts calculated along the time axis, as in FIG. 5, may also be displayed on the display unit 13 of the own terminal 10 or of another terminal 10.
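Counting nods per unit of time, as described above, amounts to binning detection timestamps. The sketch below assumes that each detected nod is recorded as a non-negative number of seconds since the start of the conference; the function name and the 60-second interval in the usage example are illustrative choices, not values taken from the embodiment.

```python
from collections import Counter
from typing import List, Sequence


def nods_per_interval(timestamps_s: Sequence[float], interval_s: float) -> List[int]:
    """Return the number of nods in each consecutive interval, starting at time 0.

    Intervals with no nods are reported as 0 so that the result can be
    plotted directly against the time axis.
    """
    bins = Counter(int(t // interval_s) for t in timestamps_s)
    if not bins:
        return []
    return [bins.get(i, 0) for i in range(max(bins) + 1)]


if __name__ == "__main__":
    # Nods detected at these offsets (seconds) from the start of the conference.
    print(nods_per_interval([3, 10, 59, 61, 185], interval_s=60))
```

Running the same function once per terminal gives the per-terminal series; concatenating all terminals' timestamps before the call gives the combined series.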

When the remote conference is over, a participant uses the operation unit 14 to indicate that the conference has ended. The control unit 11 determines whether the conference has ended according to the signal output from the operation unit 14. On determining that the conference has ended, the control unit 11 totals the counted number of nodding motions (hereinafter referred to as the “number of nods”) and calculates which participant nodded the most, in which time period the number of nods was high, what the overall number of nods was, and whether the degree of understanding differs depending on the speaker (lecturer).

FIG. 6 shows an example of the statistics obtained when the number of detected nodding motions is calculated for each terminal 10. In the example shown in FIG. 6, the number of detected nodding motions is totaled for each user of a terminal 10. The control unit 11 causes the display unit 13 to display an image such as that illustrated in FIG. 6 in response to a user operation. By looking at the displayed screen, a conference participant can see, for example, which participant nodded the most. The statistics are not limited to the example shown in FIG. 6: the control unit 11 may compile the number of nods for each predetermined time period, and, in the case of a remote lecture or the like, may compile the number of nods for each lecturer (speaker) to determine which lecturer's lectures drew the most nods.
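The post-conference totals described above could be computed along the following lines. The sketch assumes that every detected nod has been logged as a record holding the terminal ID, the detection time, and the speaker who was talking at that moment; this record layout and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class NodEvent(NamedTuple):
    terminal_id: str  # terminal whose participant nodded
    time_s: float     # seconds since the start of the conference
    speaker: str      # who was speaking at that time (assumed to be logged)


def summarize(events: Iterable[NodEvent]) -> Dict[str, object]:
    """Total the nods overall, per terminal, and per speaker."""
    events = list(events)
    by_terminal = Counter(e.terminal_id for e in events)
    by_speaker = Counter(e.speaker for e in events)
    return {
        "total": len(events),
        "by_terminal": dict(by_terminal),
        "by_speaker": dict(by_speaker),
        "most_nods": by_terminal.most_common(1)[0][0] if by_terminal else None,
    }


if __name__ == "__main__":
    log = [
        NodEvent("10a", 12.0, "lecturer X"),
        NodEvent("10b", 13.5, "lecturer X"),
        NodEvent("10a", 95.0, "lecturer Y"),
    ]
    print(summarize(log))
```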

As described above, in this embodiment the terminal 10 processes the video of a participant's face in real time, detects the position and direction of the face and the movement of the face, and counts the number of times the participant nods. The count can be transmitted to the other participants in real time, and the stored counts can also be checked later. By measuring the participants' number of nods through video processing and feeding it back to the participants by screen display or other means, it is possible to gauge how well the conference is understood and how actively the participants are taking part. Participants can see the number of nods in real time.

In this embodiment, the number of nods can also be recorded and totaled later. By storing the number of nods and totaling it afterwards, it is possible to see differences in understanding between participants, differences in understanding between time periods (if there are few nods overall during the explanation of a particular part, it can be seen that that part was not understood), differences in understanding between speakers (lecturers), and so on.

<Modification>
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below. The following aspects may be combined as appropriate.
(1) In the above embodiment, the case where a remote conference is held using the communication terminal according to the present invention was described. However, the present invention is not limited to this and can also be applied, for example, when a lecture or talk is given via a communication network.

(2) In the above embodiment, the control unit 11 may calculate the degree of understanding from the number of nodding motions. In this case, the correspondence between nodding frequency and degree of understanding is stored in a correspondence storage area 122 of the storage unit 12 (illustrated by a chain line in FIG. 1). FIG. 7 shows an example of the correspondence stored in the correspondence storage area 122. In FIG. 7, the horizontal axis indicates the frequency with which nodding motions are detected, and the vertical axis indicates the degree of understanding. As described above, a nodding motion occurs when a person shows positive understanding of what he or she has perceived, so the number of nods can be considered roughly proportional to the participant's degree of understanding of the content. The control unit 11 refers to the correspondence stored in the correspondence storage area 122, specifies the degree of understanding corresponding to the calculated nodding frequency, and outputs data indicating the specified degree of understanding to the display unit 13 or the like. Specifically, for example, a message such as “degree of understanding xx%” may be displayed on the display unit 13, or a voice message indicating the degree of understanding may be emitted from the speaker 17. This makes it easier for conference participants to grasp each participant's degree of understanding.

The correspondence between the number of nods and the degree of understanding is not limited to that illustrated in FIG. 7 and may be, for example, a table showing the correspondence. The control unit 11 may also convert the number of nods into a degree of understanding according to a predetermined algorithm (for example, a degree of understanding of 100% if the participant nods five or more times within a certain period). In short, it suffices that the control unit 11 refers to a correspondence between the number of nods and the degree of understanding and specifies the degree of understanding corresponding to the calculated number.
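One way to realize such a correspondence is a small lookup table with linear interpolation between its entries, sketched below. The breakpoints are invented for illustration, apart from the upper end, which follows the example in the text of five or more nods in a period mapping to 100%. The function names and the message wording are likewise assumptions.

```python
from typing import List, Tuple

# Hypothetical correspondence: (nods per period, degree of understanding in %).
# Only the last entry (5 nods -> 100%) follows the example given in the text.
CORRESPONDENCE: List[Tuple[float, float]] = [
    (0.0, 0.0),
    (1.0, 30.0),
    (3.0, 70.0),
    (5.0, 100.0),
]


def degree_of_understanding(
    nods: float, table: List[Tuple[float, float]] = CORRESPONDENCE
) -> float:
    """Interpolate linearly between table entries and clamp at both ends."""
    if nods <= table[0][0]:
        return table[0][1]
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if nods <= x1:
            return y0 + (y1 - y0) * (nods - x0) / (x1 - x0)
    return table[-1][1]


def understanding_message(nods: float) -> str:
    """Build a display message of the kind mentioned in the text."""
    return f"degree of understanding {degree_of_understanding(nods):.0f}%"


if __name__ == "__main__":
    for n in (0, 2, 5, 8):
        print(n, understanding_message(n))
```

A step function or a fixed rule would fit the description equally well; interpolation is used here only because it gives a smooth curve of the kind a graph would show.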

(3) When a remote conference is held in a country where affirmative understanding is not expressed by nodding (for example, in India and Bulgaria, "yes" is expressed by tilting the head sideways), the control unit 11 may detect a facial movement other than the nodding motion of the above-described embodiment. In this case, region identification data identifying a region (a country or the like) and a collation pattern representing the features of the motion or voice that indicates affirmative understanding in that region are stored in advance, in association with each other, in a predetermined storage area of the storage unit 12 (hereinafter, the "collation pattern storage area"), and the user of the terminal 10 operates the operation unit 14 to input region identification data identifying the region to which the user belongs. In response to the signal output from the operation unit 14, the control unit 11 reads the collation pattern corresponding to the input region identification data from the collation pattern storage area and uses it to detect the nodding motion in the video data or audio data included in the conference data received from the other terminals 10. In this way, the motion by which affirmative understanding is detected can be switched according to the region of the conference participants.
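The region-dependent lookup described above might be sketched as below. The region codes, the pattern labels, and the idea that motions arrive already classified as labels are hypothetical simplifications; in the source, the collation pattern is matched against the analysis result of video or audio data.

```python
# Hypothetical "collation pattern storage area":
# region identification data -> label of the affirmative gesture.
COLLATION_PATTERNS = {
    "JP": "vertical_nod",
    "IN": "sideways_tilt",
    "BG": "sideways_tilt",
}
DEFAULT_PATTERN = "vertical_nod"  # assumed fallback for unknown regions


def read_pattern(region_id):
    """Read the collation pattern associated with the given region."""
    return COLLATION_PATTERNS.get(region_id, DEFAULT_PATTERN)


def count_affirmations(motions, region_id):
    """Count detected motions that match the region's affirmative pattern.

    `motions` is assumed to be a sequence of pre-classified motion labels.
    """
    pattern = read_pattern(region_id)
    return sum(1 for motion in motions if motion == pattern)


observed = ["vertical_nod", "sideways_tilt", "sideways_tilt"]
print(count_affirmations(observed, "IN"))
print(count_affirmations(observed, "JP"))
```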

(4) In the above-described embodiment, participants whose nodding is out of step with the others may be detected. In this case, for example, the control unit 11 compiles statistics on the detection timing of nodding motions for each terminal 10 and calculates a reference nodding timing (hereinafter, the "reference timing") from the statistical result. The control unit 11 then identifies any terminal 10 for which the difference between the timing at which a nodding motion is detected and the reference timing is equal to or greater than a predetermined threshold as the terminal of a participant whose timing is off, and may output data indicating the identification result to the display unit 13 or the like. In this way, each participant can see that his or her own nodding timing differs from that of the others, and can also see which participants are out of step.
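One possible reading of this timing check is sketched below. The source says only that the reference timing is derived from a "statistical result"; using the median of the per-terminal detection times is an assumption, as are the terminal names and the one-second threshold.

```python
from statistics import median


def off_timing_terminals(nod_times, threshold):
    """Return (reference timing, terminals whose nod timing is off).

    nod_times: dict mapping a terminal ID to the time (in seconds) at which
               that terminal's nod was detected for the same utterance.
    threshold: minimum absolute deviation that counts as "off".
    The median as the reference timing is an assumed choice.
    """
    reference = median(nod_times.values())
    off = sorted(
        terminal
        for terminal, t in nod_times.items()
        if abs(t - reference) >= threshold
    )
    return reference, off


times = {"A": 10.0, "B": 10.5, "C": 11.0, "D": 14.0}
reference, off = off_timing_terminals(times, threshold=1.0)
print(reference, off)
```

The median is less sensitive than the mean to the very outliers being searched for, which is why it is used in this sketch.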

(5) In the above-described embodiment, the frequency of nodding may differ from participant to participant. For example, some people tend to nod frequently, while others understand what is being said but tend not to nod very often. The control unit 11 may therefore weight the detection count or detection frequency of nodding motions for each participant. In this case, a participant ID identifying the participant (or a terminal ID identifying the terminal 10) and a weighting coefficient are stored in advance, in association with each other, in a predetermined area of the storage unit 12. The control unit 11 refers to the stored correspondence, multiplies the number of nods detected for each terminal 10 by the corresponding weighting coefficient, and may display the weighted result on the display unit 13 or the like. This makes it easier to grasp each participant's degree of understanding.
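The weighting step can be illustrated with a short sketch. The terminal IDs, coefficient values, and the default weight of 1.0 for terminals without a stored coefficient are assumptions made for the example.

```python
# Hypothetical stored correspondence: terminal ID -> weighting coefficient.
# A frequent nodder gets a coefficient below 1, an infrequent one above 1.
WEIGHTS = {"terminal-A": 0.5, "terminal-B": 2.0}


def weighted_counts(raw_counts, weights=WEIGHTS, default=1.0):
    """Multiply each terminal's nod count by its weighting coefficient.

    Terminals without a stored coefficient keep their raw count
    (assumed default weight of 1.0).
    """
    return {
        terminal: count * weights.get(terminal, default)
        for terminal, count in raw_counts.items()
    }


raw = {"terminal-A": 8, "terminal-B": 3, "terminal-C": 4}
print(weighted_counts(raw))
```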

(6) In the above-described embodiment, the control unit 11 of the terminal 10 detects a participant's nodding motion by image analysis of the video data received from another terminal 10. Detection is not limited to this mode, however: a nodding voice may instead be detected by analyzing the audio data received from another terminal 10. In this case, for example, a sound expressing a nod or a back-channel response (hereinafter, a "nodding voice"), or a collation pattern representing the features of a nodding voice, is stored in advance in the storage unit 12 of the terminal 10, and the control unit 11 collates the received audio data with the collation pattern stored in the storage unit 12 and detects a nodding voice according to the degree of match between the two. As another mode of detecting a nodding voice, the control unit 11 may, for example, perform speech analysis on the received audio data and detect a portion that was not recognized as a word as a nodding voice.

(7) In the above-described embodiment, as shown in FIG. 4, an image indicating the detection result of the nodding motion is displayed near each participant's image. The display mode is not limited to this; for example, the image indicating the detection result may be made translucent and superimposed on each participant's image. Also, in the above-described embodiment the terminal 10 notifies participants of the count result by displaying an image indicating the count result on the display unit 13, but the notification mode is not limited to this. For example, the result may be announced by outputting a voice message, or data indicating the count result may be sent by e-mail to an attendee's mail terminal. Information indicating the count result may also be output to and stored on a recording medium, in which case a participant can refer to the information by having a computer read it from the recording medium. The count result may also be printed on a predetermined sheet. In short, it suffices that information indicating the count result is output in a way that conveys the message or information to the participants by some means.

When the terminal 10 announces the count result by sound, a different sound may be used for each terminal 10. This makes it easier to grasp the degree of understanding for each terminal 10 (that is, for each participant). The terminal 10 may also vary the sound according to the count. In this case, for example, the control unit 11 may increase the sound pressure as the count increases.

(8) In the above-described embodiment, each of the plurality of terminals 10 calculates the number and frequency of nods of the user of that terminal 10. Instead, a server device connected to the plurality of terminals 10 via a communication network may calculate the number and frequency of nods of the user corresponding to each of the terminals 10. In this case, the control unit of the server device receives conference data from the plurality of terminals 10 connected via the communication network, analyzes the received conference data for each terminal 10, calculates the frequency of nodding-motion detection for each terminal 10, and transmits data indicating the calculation result to each of the terminals 10.
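The server-side aggregation described above might look like the following sketch. It assumes that nod detection has already been performed on each terminal's conference data and that the result reaches this step as (terminal ID, timestamp) events falling inside the window; the event format and the nods-per-minute unit are hypothetical.

```python
from collections import defaultdict


def nod_frequency_per_terminal(events, window_seconds):
    """Compute nods per minute for each terminal over one time window.

    events: iterable of (terminal_id, timestamp_seconds) nod detections,
            assumed to lie within the window already.
    window_seconds: length of the window in seconds.
    """
    counts = defaultdict(int)
    for terminal_id, _timestamp in events:
        counts[terminal_id] += 1
    minutes = window_seconds / 60.0
    return {terminal: n / minutes for terminal, n in counts.items()}


events = [("A", 1.0), ("A", 20.0), ("B", 30.0)]
print(nod_frequency_per_terminal(events, window_seconds=120))
```

The returned dictionary stands in for the "data indicating the calculation result" that the server would send back to each terminal.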

(9) The program executed by the control unit 11 of the terminal 10 in the above-described embodiment can be provided recorded on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (optical disc, etc.), a magneto-optical recording medium, or a semiconductor memory. It can also be downloaded to the terminal 10 via a network such as the Internet.

A block diagram showing an example of the configuration of the remote conference system.
A block diagram showing an example of the configuration of a terminal.
A diagram showing an example of the contents of the count table.
A diagram showing an example of a screen displayed on the display unit.
A diagram showing an example of the count processing performed by the control unit.
A diagram showing an example of statistical results when the number of nodding-motion detections is calculated for each terminal.
A diagram showing an example of the correspondence between nodding frequency and degree of understanding.

Explanation of symbols

1: remote conference system, 10: terminal, 11: control unit, 12: storage unit, 13: display unit, 14: operation unit, 15: microphone, 16: audio processing unit, 17: speaker, 18: communication unit, 19: photographing unit, 20: communication network, 121: count table storage area, 122: correspondence storage area.

Claims

A communication device comprising:
Data acquisition means for acquiring data including at least one of audio data representing the sound of a listener collected by sound collection means and video data representing video of the listener photographed by photographing means;
Detection means for analyzing the data acquired by the data acquisition means, comparing the analysis result with a predetermined collation pattern, and detecting a collation result satisfying a predetermined condition as a nodding motion of the listener;
Calculating means for calculating the frequency of detection of the nodding motion detected by the detection means; and
Output means for outputting frequency data representing the frequency calculated by the calculating means.

Correspondence storage means for storing the correspondence between the frequency and the degree of understanding;
An understanding level specifying means for specifying an understanding level corresponding to the frequency calculated by the calculating means with reference to the correspondence relationship storing means,
The communication device according to claim 1, wherein the output means outputs understanding level data indicating the understanding level specified by the understanding level specifying means.

The communication device according to claim 1 or 2, wherein the calculating means calculates, for each predetermined unit of time, the frequency of detection of the nodding motion detected by the detection means.

A collation pattern storage means for storing the collation pattern for each identification data for identifying a region;
Identification data acquisition means for acquiring identification data for identifying the area;
Collation pattern reading means for reading the collation pattern corresponding to the identification data acquired by the identification data acquisition means from the collation pattern storage means;
The communication device according to any one of claims 1 to 3, wherein the detection means collates the collation pattern read by the collation pattern reading means with the analysis result, and detects a collation result satisfying a predetermined condition as the nodding motion.

The data acquisition means receives the data from each of a plurality of terminals connected via a communication network,
The communication device according to any one of claims 1 to 4, wherein the calculating means calculates, for each terminal, the frequency of detection of the nodding motion detected by the detection means.

Statistical means for compiling statistics, for each predetermined unit of time, on the detection of the nodding motion detected by the detection means;
Reference timing calculating means for calculating, as a reference timing, a reference timing of the nodding motion according to the statistical result of the statistical means;
Terminal specifying means for identifying, among the plurality of terminals, a terminal for which the difference between the detection timing of the nodding motion detected by the detection means and the reference timing calculated by the reference timing calculating means is equal to or greater than a predetermined threshold;
The communication device according to claim 5, further comprising: identification data output means for outputting data indicating the result of the identification by the terminal specifying means.

The data acquisition means acquires data including video data representing the video of the listener;
The detection means includes
Face image detection means for analyzing the video data acquired by the data acquisition means and detecting a face image;
A motion detection means for detecting a face motion by detecting the position and direction of the face image detected by the face image detection means;
Nodding motion detection means for collating the face motion detected by the motion detection means with a predetermined collation pattern, and detecting a collation result satisfying a predetermined condition as the nodding motion.
The communication device according to any one of claims 1 to 6.

The data acquisition means acquires data including audio data representing the audio of the listener,
The communication device according to any one of claims 1 to 6, wherein the detection means collates the audio data acquired by the data acquisition means with a predetermined collation pattern, and detects a collation result satisfying a predetermined condition as the nodding motion.