JP2022165144A

JP2022165144A - Support device and program

Info

Publication number: JP2022165144A
Application number: JP2021070378A
Authority: JP
Inventors: 直藤原; Nao Fujiwara
Original assignee: Mitsubishi Electric Building Solutions Corp
Current assignee: Mitsubishi Electric Building Solutions Corp
Priority date: 2021-04-19
Filing date: 2021-04-19
Publication date: 2022-10-31

Abstract

To provide a support device capable of appropriately controlling a mute function of a microphone. SOLUTION: A voice output section 15 outputs voice from a speaker 6 on the basis of voice data received by a communication section 13 from an external device. A detection section 16 detects, on the basis of an image photographed by a camera 3, that a person in the image is speaking. Unless the detection section 16 detects an utterance, a voice silencing section 17 mutes the voice input to a microphone 4 so that it is not output from the external device. When the detection section 16 detects an utterance, the communication section 13 transmits the data of the voice input to the microphone 4 to the external device. SELECTED DRAWING: Figure 2

Description

The present disclosure relates to a device and a program for supporting meetings.

Patent Literature 1 describes a system for supporting meetings. In the system described in Patent Literature 1, a plurality of terminals are connected via a network. This system allows people to join a meeting from a plurality of remote sites.

Patent Literature 1: JP 2019-61594 A

A participant in such a meeting turns off the mute function of the microphone when it is time to speak. However, participants sometimes forget to turn off the mute function before speaking, and the operation itself is troublesome.

The present disclosure has been made to solve the problems described above. An object of the present disclosure is to provide a support device capable of appropriately controlling the mute function of a microphone. Another object of the present disclosure is to provide a program for appropriately controlling the mute function of a microphone.

The support device according to the present disclosure includes: communication means for receiving audio data from an external device; audio output means for outputting audio from a speaker based on the audio data received by the communication means from the external device; detection means for detecting, based on an image captured by a camera, that a person in the image is speaking; and muting means for muting, if the detection means has not detected an utterance, the sound input to a microphone so that it is not output from the external device. If the detection means has detected an utterance, the communication means transmits data of the sound input to the microphone to the external device.

The program according to the present disclosure causes a computer to execute: a first communication process of receiving audio data from an external device; an audio output process of outputting audio from a speaker based on the audio data received from the external device in the first communication process; a detection process of detecting, based on an image captured by a camera, that a person in the image is speaking; a muting process of muting, if no utterance has been detected in the detection process, the sound input to a microphone so that it is not output from the external device; and a second communication process of transmitting data of the sound input to the microphone to the external device if an utterance has been detected in the detection process.

According to the present disclosure, the mute function of a microphone can be appropriately controlled in a device for supporting meetings.

FIG. 1 is a diagram showing an example of a system using a support device according to Embodiment 1.
FIG. 2 is a diagram showing an example of the support device.
FIG. 3 is a flowchart showing an operation example of a control device.
FIG. 4 is a flowchart showing another operation example of the control device.
FIG. 5 is a diagram showing an example in which minutes created by an editing unit are displayed on a display.
FIG. 6 is a diagram showing an example of hardware resources of the control device.
FIG. 7 is a diagram showing another example of hardware resources of the control device.

A detailed description is given below with reference to the drawings. Duplicate descriptions are simplified or omitted as appropriate. In each figure, the same reference signs denote the same or corresponding parts.

Embodiment 1.
FIG. 1 is a diagram showing an example of a system using a support device 1 according to Embodiment 1. In the system shown in FIG. 1, a plurality of support devices 1 are connected via a network 2. The support devices 1 may all be located in the same building, or each of them may be located at a separate site.

Users of this system hold a so-called web conference using the support devices 1. The support device 1 is, for example, a personal computer. The support device 1 may also be a smartphone or a tablet terminal.

As an example, the network 2 is an IP network. An IP network is a communication network that uses IP (Internet Protocol) as its communication protocol. The network 2 may be a closed network or an open network.

FIG. 2 is a diagram showing an example of the support device 1. The support device 1 includes a camera 3, a microphone 4, a display 5, a speaker 6, and a control device 7. The control device 7 includes a storage unit 10, an image processing unit 11, an audio processing unit 12, a communication unit 13, an image output unit 14, an audio output unit 15, a detection unit 16, and a muting unit 17.

The camera 3 captures images. For example, when a user sits in front of the support device 1, the camera 3 captures an image of the user. Data of the images captured by the camera 3 is input to the control device 7. The image processing unit 11 processes the data of the images captured by the camera 3. Hereinafter, data representing an image is also simply referred to as image data.

Data of the sound input to the microphone 4 is input to the control device 7. The audio processing unit 12 processes the data of the sound input to the microphone 4. As an example, the audio processing unit 12 converts the analog data from the microphone 4 into digital data. Hereinafter, data representing sound is also simply referred to as audio data.

The communication unit 13 communicates with an external device. In the example shown in this embodiment, the external device is another support device 1 connected via the network 2. The communication unit 13 transmits the data of the images captured by the camera 3 and the data of the sound input to the microphone 4 to the external device. Specifically, the communication unit 13 transmits the image data from the image processing unit 11 and the audio data from the audio processing unit 12 to the external device.

Because the external device performs the same processing, the communication unit 13 receives image data and audio data from the external device. The image output unit 14 displays an image on the display 5 based on the image data received by the communication unit 13 from the external device. The audio output unit 15 outputs sound from the speaker 6 based on the audio data received by the communication unit 13 from the external device. Through these basic operations, a user can hold a web conference with people at remote locations using the support device 1.

FIG. 3 is a flowchart showing an operation example of the control device 7. The characteristic functions of the support device 1 are described in detail below, with reference also to FIG. 3.

When a meeting starts, data of the images captured by the camera 3 is input to the control device 7 (S101). Based on an image captured by the camera 3, the detection unit 16 detects that a person in the image is speaking. As an example, the detection unit 16 identifies a person's face from the image data processed by the image processing unit 11. The detection unit 16 then identifies the person's mouth within the identified face and calculates an index representing the movement of the identified mouth. When the calculated index exceeds a threshold, the detection unit 16 detects that the person in the image is speaking.
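
As a rough illustration of the index-and-threshold step described above, the following Python sketch assumes that face and mouth localization has already produced one normalized mouth-opening value per frame. The use of the standard deviation as the movement index, the function names, and the threshold value of 0.05 are assumptions made for this example; the disclosure does not specify how the index is computed.

```python
from statistics import pstdev
from typing import Sequence

# Hypothetical threshold; the disclosure does not give a value.
MOVEMENT_THRESHOLD = 0.05


def mouth_movement_index(openings: Sequence[float]) -> float:
    """Return a movement index for a window of per-frame mouth openings.

    Assumption: the index is the population standard deviation of the
    normalized lip gap, so a still mouth (open or closed) scores near zero.
    """
    if len(openings) < 2:
        return 0.0
    return pstdev(openings)


def is_speaking(openings: Sequence[float],
                threshold: float = MOVEMENT_THRESHOLD) -> bool:
    """Detect an utterance when the movement index exceeds the threshold."""
    return mouth_movement_index(openings) > threshold
```

Measuring variation rather than the raw opening is one possible design choice: a mouth that stays open without moving would then not be treated as speech.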

As another example, the detection unit 16 may further identify the person's line of sight from the identified face. The detection unit 16 may then detect that the person in the image is speaking based on both the calculated index and the line of sight.

The control device 7 determines whether or not the detection unit 16 has detected that the person in the image is speaking (S102). If the detection unit 16 has detected an utterance, the determination in S102 is Yes.

The muting unit 17 controls the mute function of the microphone 4. That is, the muting unit 17 automatically switches the mute function of the microphone 4 on (enabled) and off (disabled). If the determination in S102 is Yes, the muting unit 17 turns off the mute function of the microphone 4 (S103). Accordingly, if the determination in S102 is Yes, the communication unit 13 transmits the data of the sound input to the microphone 4 to the external device, and the other users participating in the meeting can hear the sound input to the microphone 4.

On the other hand, if the detection unit 16 has not detected an utterance, the determination in S102 is No. If the determination in S102 is No, the muting unit 17 turns on the mute function of the microphone 4 (S104). That is, the muting unit 17 mutes the sound input to the microphone 4 so that it is not output from the external device. Any muting method may be used. If the determination in S102 is No, no audio data is transmitted from the communication unit 13 to the external device.

In this way, the support device 1 detects, based on the data of an image captured by the camera 3, that a person in the image is speaking. If the detection unit 16 has detected an utterance, the mute function of the microphone 4 is turned off. If the detection unit 16 has not detected an utterance, the mute function of the microphone 4 is turned on. The mute function of the microphone 4 can therefore be appropriately controlled while a meeting is in progress.
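
The flow of S102 to S104 can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the class name `MuteController`, the `send` callback standing in for the communication unit 13, and the choice to start in the muted state are all assumptions.

```python
from typing import Callable, List


class MuteController:
    """Toggles the mute state from the detection result (S102 to S104)."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send   # hypothetical stand-in for the communication unit 13
        self.muted = True   # assumption: start muted until speech is detected

    def on_frame(self, speech_detected: bool, audio_chunk: bytes) -> None:
        if speech_detected:
            self.muted = False        # S103: mute function off
            self._send(audio_chunk)   # audio is transmitted to the external device
        else:
            self.muted = True         # S104: mute function on
            # Nothing is transmitted while muted.


# Usage: only the chunks captured while speech is detected are sent.
sent: List[bytes] = []
controller = MuteController(sent.append)
for detected, chunk in [(False, b"a"), (True, b"b"), (True, b"c"), (False, b"d")]:
    controller.on_frame(detected, chunk)
```

In this sketch muting is done by simply not transmitting the chunk, which is one of the possible muting methods; the text leaves the method open.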

FIG. 4 is a flowchart showing another operation example of the control device 7. In the example shown in FIG. 4, the control device 7 further includes a determination unit 18. In addition, a first detection criterion and a second detection criterion are set in advance. The first and second detection criteria are criteria used by the detection unit 16 to detect an utterance. The second detection criterion differs from the first detection criterion.

The process shown in S201 of FIG. 4 is the same as the process shown in S101 of FIG. 3. When a meeting starts, data of the images captured by the camera 3 is input to the control device 7.

Based on an image captured by the camera 3, the determination unit 18 determines whether or not the person in the image is wearing a mask (S202). As an example, the determination unit 18 identifies the person's eyes from the image data processed by the image processing unit 11. If the determination unit 18 can determine that a certain area below the identified eyes in the image captured by the camera 3 is covered with a cloth-like object, it determines that the person in the image is wearing a mask (Yes in S202). If the determination unit 18 can identify the person's mouth below the identified eyes in the image captured by the camera 3, it determines that the person in the image is not wearing a mask (No in S202). The determination method used by the determination unit 18 is not limited to this example.

If the determination in S202 is Yes, the detection unit 16 selects the first detection criterion as the criterion for detecting an utterance (S203). If the determination in S202 is No, the detection unit 16 selects the second detection criterion as the criterion for detecting an utterance (S204).

The process shown in S205 is the same as the process shown in S102 of FIG. 3. That is, the control device 7 determines whether or not the detection unit 16 has detected that the person in the image is speaking. In S205, the detection unit 16 detects an utterance based on the first detection criterion if the determination in S202 was Yes, and based on the second detection criterion if the determination in S202 was No. As an example, when the determination in S202 is Yes, the detection unit 16 may calculate the index representing movement from a wider range of data than when the determination is No.
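
The selection between the two criteria (S203, S204) and its use in S205 might look like the sketch below. The region names, the threshold values, and the idea of pairing the wider region with a lower threshold are illustrative assumptions; the text only states that the two criteria differ and that a wider range of data may be used when a mask is worn.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DetectionCriterion:
    region: str        # part of the face over which the movement index is computed
    threshold: float   # a movement index above this value counts as speech


# Hypothetical values: with a mask, a wider region and a lower threshold.
FIRST_CRITERION = DetectionCriterion(region="lower_face", threshold=0.02)
SECOND_CRITERION = DetectionCriterion(region="mouth", threshold=0.05)


def select_criterion(mask_worn: bool) -> DetectionCriterion:
    """S203 / S204: choose the criterion from the mask determination (S202)."""
    return FIRST_CRITERION if mask_worn else SECOND_CRITERION


def detect_speech(mask_worn: bool, indices: Mapping[str, float]) -> bool:
    """S205: apply the selected criterion to the per-region movement indices."""
    criterion = select_criterion(mask_worn)
    return indices[criterion.region] > criterion.threshold
```

For example, `detect_speech(True, {"lower_face": 0.03, "mouth": 0.0})` evaluates the lower-face index against the first criterion, whereas the same indices with `mask_worn=False` are evaluated against the mouth index only.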

The processes shown in S206 and S207 are the same as the processes shown in S103 and S104 of FIG. 3. That is, if the determination in S205 is Yes, the muting unit 17 turns off the mute function of the microphone 4 (S206). If the determination in S205 is No, the muting unit 17 turns on the mute function of the microphone 4 (S207).

With the example shown in FIG. 4, the mute function of the microphone 4 can be appropriately controlled even when the user is wearing a mask, for example during an outbreak of an infectious disease or during hay fever season.

As another example, the control device 7 may further include a saving unit 19. In the example shown in this embodiment, the detection unit 16 detects, based on an image captured by the camera 3, that a person in the image is speaking. Consequently, if the processing capability of the control device 7 is low, the mute function of the microphone 4 is turned off slightly later than the moment the utterance actually starts.

The saving unit 19 saves the data of the sound input to the microphone 4 in the storage unit 10. This audio data is saved even while the mute function of the microphone 4 is on. When the detection unit 16 detects an utterance, the communication unit 13 transmits to the external device the audio data that the saving unit 19 has saved from a certain time before the detection unit 16 detected the utterance. This time is set in advance and is, for example, 0.5 seconds. As a result, the audio data from the start of the utterance can be transmitted to the external device. The communication unit 13 may control the transmission of audio data so that, once a certain time has elapsed after the detection unit 16 detects the utterance, the data of the sound input to the microphone 4 can be transmitted to the external device as is.
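
One way to realize this look-back is a fixed-length buffer of recent audio chunks. The sketch below uses Python's `collections.deque` with `maxlen`; the class and function names and the 0.1-second chunk length are assumptions for illustration, while the 0.5-second look-back is the example value given in the text.

```python
from collections import deque
from typing import Deque, List


class PreRollBuffer:
    """Keeps the most recent audio chunks so the start of an utterance is not lost."""

    def __init__(self, pre_roll_s: float = 0.5, chunk_s: float = 0.1) -> None:
        # Number of chunks that cover the look-back time (0.5 s in the text).
        self._chunks: Deque[bytes] = deque(maxlen=max(1, round(pre_roll_s / chunk_s)))

    def push(self, chunk: bytes) -> None:
        """Save a chunk; the oldest chunk is dropped once the buffer is full."""
        self._chunks.append(chunk)

    def drain(self) -> List[bytes]:
        """Return the saved chunks in order and empty the buffer."""
        chunks = list(self._chunks)
        self._chunks.clear()
        return chunks


def chunks_to_send(buffer: PreRollBuffer, chunk: bytes, speech_detected: bool) -> List[bytes]:
    """Return what to transmit for this chunk: saved audio plus live audio, or nothing."""
    if speech_detected:
        return buffer.drain() + [chunk]
    buffer.push(chunk)   # saving continues while the mute function is on
    return []
```

With these defaults, the first chunk for which speech is detected is sent together with up to five preceding chunks, and later chunks are sent as they arrive because the buffer is then empty.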

As another example, the control device 7 may further include an editing unit 20. The editing unit 20 edits the audio data and image data saved in the storage unit 10 by the saving unit 19 during the meeting, and creates minutes. The minutes created by the editing unit 20 are output from the display 5 and the speaker 6 in response to user operations.

In this example, the saving unit 19 preferably saves the result detected by the detection unit 16 in the storage unit 10 in association with the data of the sound input to the microphone 4. The saving unit 19 may also save the result detected by the detection unit 16 in the storage unit 10 in association with the data of the images captured by the camera 3. The result detected by the detection unit 16 is, for example, either first information indicating that an utterance is present or second information indicating that no utterance is present. Furthermore, the communication unit 13 may transmit, to the external device, the audio data and image data with which the result detected by the detection unit 16 is associated.

The external device performs the same processing. The communication unit 13 therefore receives, from the external device, audio data and image data with which the first information or the second information is associated. The audio data and image data received by the communication unit 13 from the external device are saved in the storage unit 10 by the saving unit 19 with the first information or the second information still associated.

The editing unit 20 creates the minutes of a meeting, for example, when the meeting ends. At this time, the editing unit 20 may create the minutes using only the audio data and image data with which the first information is associated.

FIG. 5 is a diagram showing an example in which the minutes created by the editing unit 20 are displayed on the display 5. In the example shown in FIG. 5, an image display area 5a, a seek bar 5b, and a playlist 5c are displayed on the display 5.

Each item of content in the playlist 5c represents a portion of the audio data and image data saved in the storage unit 10 during the meeting with which the first information is associated. The items are arranged in chronological order in the playlist 5c. In the example shown in FIG. 5, each item is labeled with the name (A to F) of the person speaking and the duration of the utterance. When one of the items in the playlist 5c is selected, the audio of that item is output from the speaker 6 and the image of that item is displayed in the image display area 5a. The minutes created by the editing unit 20 are not limited to the example shown in FIG. 5. For example, the editing unit 20 may create minutes consisting of audio data only.
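
The filtering and ordering behind such a playlist can be sketched as follows. The `Segment` record, its field names, and the label format are hypothetical; the text only states that items carrying the first information are listed chronologically with the speaker's name and the duration of the utterance.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    speaker: str     # name shown in the playlist
    start_s: float   # start time within the meeting, in seconds
    end_s: float     # end time within the meeting, in seconds
    speech: bool     # True = first information, False = second information


def build_playlist(segments: Iterable[Segment]) -> List[str]:
    """Keep only segments tagged as speech and list them in chronological order."""
    spoken = sorted((s for s in segments if s.speech), key=lambda s: s.start_s)
    return [f"{s.speaker} ({s.end_s - s.start_s:.0f} s)" for s in spoken]


# Usage: the silent segment is dropped and the rest are ordered by start time.
playlist = build_playlist([
    Segment("B", 40.0, 55.0, True),
    Segment("A", 0.0, 30.0, True),
    Segment("A", 30.0, 40.0, False),
])
```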

FIG. 6 is a diagram showing an example of hardware resources of the control device 7. As hardware resources, the control device 7 includes a processing circuit 30 that includes a processor 31 and a memory 32. The processing circuit 30 may include a plurality of processors 31. The processing circuit 30 may include a plurality of memories 32.

In this embodiment, the units denoted by reference signs 10 to 20 represent functions of the control device 7. The functions of the units denoted by reference signs 11 to 20 can be implemented by software described as a program, by firmware, or by a combination of software and firmware. The program is stored in the memory 32. The control device 7 implements the functions of the units denoted by reference signs 11 to 20 by having the processor 31 (computer) execute the program stored in the memory 32. The function of the storage unit 10 is implemented by the memory 32. A semiconductor memory or the like can be used as the memory 32.

FIG. 7 is a diagram showing another example of hardware resources of the control device 7. In the example shown in FIG. 7, the control device 7 includes a processing circuit 30 that includes a processor 31, a memory 32, and dedicated hardware 33. FIG. 7 shows an example in which some of the functions of the control device 7 are implemented by the dedicated hardware 33. All of the functions of the control device 7 may be implemented by the dedicated hardware 33. The dedicated hardware 33 may be a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC, an FPGA, or a combination of these.

1 Support device
2 Network
3 Camera
4 Microphone
5 Display
6 Speaker
7 Control device
10 Storage unit
11 Image processing unit
12 Audio processing unit
13 Communication unit
14 Image output unit
15 Audio output unit
16 Detection unit
17 Muting unit
18 Determination unit
19 Saving unit
20 Editing unit
30 Processing circuit
31 Processor
32 Memory
33 Dedicated hardware

Claims

A support device comprising:
communication means for receiving audio data from an external device;
audio output means for outputting audio from a speaker based on the audio data received by the communication means from the external device;
detection means for detecting, based on an image captured by a camera, that a person in the image is speaking; and
mute means for muting, if the detection means has not detected an utterance, sound input to a microphone so that the sound is not output from the external device,
wherein the communication means transmits data of the sound input to the microphone to the external device if the detection means has detected an utterance.

The support device according to claim 1, further comprising determination means for determining, based on the image captured by the camera, whether or not the person in the image is wearing a mask,
wherein the detection means
detects the utterance based on a specific first detection criterion when the determination means determines that a mask is being worn, and
detects the utterance based on a specific second detection criterion different from the first detection criterion when the determination means determines that a mask is not being worn.

The support device according to claim 1 or claim 2, further comprising saving means for saving data of the sound input to the microphone,
wherein, when the detection means detects an utterance, the communication means transmits to the external device the audio data saved by the saving means from a predetermined time before the detection means detected the utterance.

The support device according to claim 1, further comprising:
saving means for saving the result detected by the detection means in association with data of the sound input to the microphone; and
editing means for editing the audio data saved by the saving means to create minutes,
wherein the communication means transmits, to the external device, the audio data with which the result detected by the detection means is associated.

A program for causing a computer to execute:
a first communication process of receiving audio data from an external device;
an audio output process of outputting audio from a speaker based on the audio data received from the external device in the first communication process;
a detection process of detecting, based on an image captured by a camera, that a person in the image is speaking;
a mute process of muting, if no utterance is detected in the detection process, sound input to a microphone so that the sound is not output from the external device; and
a second communication process of transmitting data of the sound input to the microphone to the external device if an utterance is detected in the detection process.

The program according to claim 5, further causing the computer to execute a determination process of determining, based on the image captured by the camera, whether or not the person in the image is wearing a mask,
wherein, in the detection process,
the utterance is detected based on a specific first detection criterion when it is determined in the determination process that a mask is being worn, and
the utterance is detected based on a specific second detection criterion different from the first detection criterion when it is determined in the determination process that a mask is not being worn.