JP7286303B2

JP7286303B2 - Conference support system and conference robot

Info

Publication number: JP7286303B2
Application number: JP2018221283A
Authority: JP
Inventors: 博之関川; 貴志新居; 春夫福山; 雄一郎吉川; オスカーパリンコ; 浩平小川; 浩石黒
Original assignee: Osaka University NUC; Itoki Corp
Current assignee: Osaka University NUC; Itoki Corp
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2023-06-05
Anticipated expiration: 2038-11-27
Also published as: JP2020088637A

Description

The present invention relates to a technique for supporting communication in conferences.

Patent Literature 1 discloses a guidance robot control system that determines whether to provide guidance based on the direction in which a user is looking and the direction in which the user is moving.

Patent Literature 1: JP 2017-159410 A

The technique disclosed in Patent Literature 1 decides whether to provide guidance by relying on behavior that a user exhibits when he or she wants guidance.

However, such behavior cannot be expected in a conference attended by multiple people. In a conference, the timing of speaking must be decided by judging whether it is appropriate to interrupt the communication in progress.

Accordingly, an object of the present invention is to make it possible to find a timing suitable for speaking in a conference.

To solve the above problem, a conference support system according to a first aspect is a conference support system used in a conference held by a plurality of participants, and includes: a participation state acquisition unit that acquires the conference participation states of the plurality of participants; and a control unit that obtains, based on the conference participation states of the plurality of participants acquired by the participation state acquisition unit, the ease of interrupting communication among the plurality of participants, and determines, based on the obtained ease of interruption, whether it is a speaking timing.

A second aspect is the conference support system according to the first aspect, wherein the control unit includes a scheduled-speech recording unit that records scheduled speech content and, when it is determined to be a speaking timing, utters the scheduled speech content recorded in the scheduled-speech recording unit through a speaker.

A third aspect is the conference support system according to the second aspect, wherein the control unit records, in the scheduled-speech recording unit, the scheduled speech content of at least one of the plurality of participants.

A fourth aspect is the conference support system according to the third aspect, wherein the plurality of participants include a plurality of first participants who hold the conference in a real space in a state where they can observe one another's conference participation states, and a second participant who participates in the conference via a communication line, and the control unit records the scheduled speech content of the second participant in the scheduled-speech recording unit.

A fifth aspect is the conference support system according to any one of the first to fourth aspects, wherein the participation state acquisition unit includes a plurality of individual participation state acquisition units that acquire the respective conference participation states of the plurality of participants, and the control unit obtains the ease of interruption of each of the plurality of participants based on the conference participation states acquired by the plurality of individual participation state acquisition units, and determines, based on the obtained ease of interruption, whether it is a speaking timing.
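The per-participant determination of the fifth aspect can be illustrated with a minimal Python sketch. The text does not state how the individual values are combined, so everything here is an assumption made for illustration: the ease of interruption is represented as a score between 0 and 1 (named `interruptibility` in the code), and the function `is_speaking_timing`, its mean-based rule, and the `threshold` and `floor` parameters are hypothetical.

```python
from statistics import mean


def is_speaking_timing(interruptibility, threshold=0.6, floor=0.3):
    """Decide whether now is a speaking timing.

    interruptibility: dict mapping a participant id to a score in [0, 1],
    where a higher score means that participant is easier to interrupt.
    The combination rule below is an assumption, not taken from the patent:
    the average must reach `threshold`, and no single participant may fall
    below `floor` (for example, someone in the middle of a sentence).
    """
    if not interruptibility:
        raise ValueError("at least one participant score is required")
    scores = list(interruptibility.values())
    return mean(scores) >= threshold and min(scores) >= floor


if __name__ == "__main__":
    # Hypothetical scores for three first participants.
    print(is_speaking_timing({"A1": 0.8, "A2": 0.7, "A3": 0.9}))
    print(is_speaking_timing({"A1": 1.0, "A2": 0.25}))
```

The floor is included only to show why per-participant values can give a finer decision than a single group-level value: one clearly engaged speaker can veto the timing even when the average looks favorable.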

A sixth aspect is the conference support system according to any one of the first to fifth aspects, further including: a speech command reception unit that acquires an intention-to-speak command from at least one of the plurality of participants; and a pseudo-conference participation device capable of performing conference participation actions in the real space where the conference is held, wherein, when it is determined not to be a speaking timing after the intention-to-speak command has been acquired, the control unit causes the pseudo-conference participation device to perform a first action that raises the ease of interruption.

A seventh aspect is the conference support system according to the sixth aspect, wherein, when it is again determined not to be a speaking timing after the first action has been performed, the control unit causes the pseudo-conference participation device to perform a second action that raises the ease of interruption.

An eighth aspect is the conference support system according to the seventh aspect, wherein the second action is an action that is more effective than the first action in raising the ease of interruption.
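The escalation described in the sixth to eighth aspects can be sketched as a small decision routine in Python. This is an illustration under stated assumptions rather than the claimed implementation: the names `Step`, `next_step`, and `replay` are hypothetical, and repeating the second action when the timing is still unsuitable is an assumption, since the aspects above do not say what follows the second action.

```python
from enum import Enum


class Step(Enum):
    SPEAK = "speak"
    FIRST_ACTION = "first_action"    # milder action that raises the ease of interruption
    SECOND_ACTION = "second_action"  # more effective action, used after the first


def next_step(is_timing, actions_done):
    """Choose what the pseudo-conference participation device does next.

    is_timing: result of the latest speaking-timing determination.
    actions_done: number of attention-raising actions already performed
    since the intention-to-speak command was received.
    """
    if is_timing:
        return Step.SPEAK
    if actions_done == 0:
        return Step.FIRST_ACTION
    # Assumption: keep using the second action if it has already been tried.
    return Step.SECOND_ACTION


def replay(timing_results):
    """Apply next_step to a sequence of determinations and list the steps taken."""
    steps = []
    actions_done = 0
    for is_timing in timing_results:
        step = next_step(is_timing, actions_done)
        steps.append(step)
        if step is Step.SPEAK:
            break
        actions_done += 1
    return steps


if __name__ == "__main__":
    # Two unsuitable determinations followed by a suitable one.
    print([s.value for s in replay([False, False, True])])
```

Keeping the choice in a pure function (`next_step`) separates the escalation policy from the hardware that carries out the actions, which makes the policy easy to test without a robot.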

To solve the above problem, a conference robot according to a ninth aspect is a conference robot that participates in a conference held by a plurality of participants, and includes: a participation state input unit to which the conference participation states of the plurality of participants are input; and a control unit that obtains, based on the conference participation states of the plurality of participants input from the participation state input unit, the ease of interrupting communication among the plurality of participants, determines based on the obtained ease of interruption whether it is a speaking timing, and executes speech processing through a speaker when it is determined to be a speaking timing.

A tenth aspect is the conference robot according to the ninth aspect, wherein the respective conference participation states of the plurality of participants are input to the participation state input unit, and the control unit obtains the ease of interruption of each of the plurality of participants based on their respective conference participation states, and determines, based on the obtained ease of interruption, whether it is a speaking timing.

An eleventh aspect is the conference robot according to the ninth or tenth aspect, further including a robot action unit that performs actions as a robot, wherein, when it is determined not to be a speaking timing after scheduled speech content to be uttered has arisen, the control unit causes the robot action unit to perform a first action that raises the ease of interruption.

A twelfth aspect is the conference robot according to the eleventh aspect, wherein, when it is again determined not to be a speaking timing after the first action has been performed, the control unit causes the robot action unit to perform a second action that raises the ease of interruption.

A thirteenth aspect is the conference robot according to the twelfth aspect, wherein the second action is an action that is more effective than the first action in raising the ease of interruption.

According to the first aspect, the ease of interrupting communication among the plurality of participants is obtained based on their conference participation states acquired by the participation state acquisition unit, and whether it is a speaking timing is determined based on the obtained ease of interruption. A timing suitable for speaking in the conference can therefore be found.

According to the second aspect, scheduled speech content that has been recorded in advance can be uttered at an appropriate timing.

According to the third aspect, a conference participant who records scheduled speech content in advance can speak at an appropriate timing.

According to the fourth aspect, the second participant, who participates in the conference via a communication line, has difficulty observing the conference participation states of the first participants. By recording the scheduled speech content of the second participant in the scheduled-speech recording unit in advance, the second participant can nevertheless speak at an appropriate timing.

According to the fifth aspect, whether it is a speaking timing can be determined more appropriately based on the conference participation state of each of the plurality of participants.

According to the sixth aspect, when it is determined not to be a speaking timing, the pseudo-conference participation device is caused to perform the first action that raises the ease of interruption, which makes it easier to speak in the conference.

According to the seventh aspect, when it is again determined not to be a speaking timing, the pseudo-conference participation device is caused to perform the second action that raises the ease of interruption, which makes it easier to speak in the conference.

According to the eighth aspect, when it is again determined not to be a speaking timing after the first action has been performed, the second action, which is more effective in raising the ease of interruption, is performed, which makes it easier to speak in the conference.

According to the ninth aspect, the ease of interrupting communication among the plurality of participants is obtained based on their conference participation states input from the participation state input unit, and whether it is a speaking timing is determined based on the obtained ease of interruption. The robot can therefore speak at a suitable timing in the conference.

According to the tenth aspect, whether it is a speaking timing can be determined more appropriately based on the conference participation state of each of the plurality of participants.

According to the eleventh aspect, when it is determined not to be a speaking timing, the robot action unit is caused to perform the first action that raises the ease of interruption, which makes it easier to speak in the conference.

According to the twelfth aspect, when it is again determined not to be a speaking timing after the first action has been performed, the robot action unit is caused to perform the second action that raises the ease of interruption, which makes it easier to speak in the conference.

According to the thirteenth aspect, when it is again determined not to be a speaking timing after the first action has been performed, the second action, which is more effective in raising the ease of interruption, is performed, which makes it easier to utter speech in the conference.

FIG. 1 is an explanatory diagram showing a conference to which the conference support system according to the first embodiment is applied.
FIG. 2 is an explanatory diagram showing an example of the operation of the proxy participation robot.
FIG. 3 is an explanatory diagram showing an example of imaging by a camera.
FIG. 4 is a block diagram showing the electrical configuration of the conference support system.
FIG. 5 is a diagram showing an example of a trained model.
FIG. 6 is a diagram showing an example of a model during training.
FIG. 7 is a flowchart showing an example of the operation of the control unit.
FIG. 8 is an explanatory diagram showing a conference to which the conference robot according to the second embodiment is applied.
FIG. 9 is a block diagram showing the electrical configuration of the conference robot of FIG. 8.

{First Embodiment}
A conference support system according to the first embodiment will be described below. FIG. 1 is an explanatory diagram showing a conference to which the conference support system 30 is applied.

The conference support system 30 is used for a conference held by a plurality of participants 10A and 10B. The participants 10A and 10B are the parties who take part in the conference, make remarks of their own, and listen to the remarks of others. The participants 10A and 10B may all be present at the same conference location or may be distributed across multiple locations.

FIG. 1 shows an example in which a plurality of (five) first participants 10A are present in a conference room 20 and a second participant 10B is present at a location remote from the conference room 20. That is, the plurality of first participants 10A hold the conference in a real space in a state where they can observe one another's conference participation states. A proxy participation robot 60 is present in the conference room 20 as a pseudo-conference participation device. The second participant 10B can participate in the conference of the first participants 10A in the conference room 20 through a communication line 70 and the proxy participation robot 60.

A conference table 22 is placed in the conference room 20, and a plurality of chairs 24 are arranged around it. The first participants 10A and the proxy participation robot 60 are each seated on one of the chairs 24.

The proxy participation robot 60 is a robot that can perform conference participation actions on behalf of the second participant 10B. For example, as shown in FIG. 2, the proxy participation robot 60 can speak in the conference room 20 through a built-in speaker or the like (in FIG. 2, it says "Excuse me."). Besides speaking, the proxy participation robot 60 can also perform other actions that are sometimes performed during a conference (FIG. 2 illustrates raising a hand). The second participant 10B can therefore participate smoothly in the conference through the proxy participation robot 60. A control unit 40, described later, is built into the proxy participation robot 60.

Returning to FIG. 1, a camera 58 and a microphone 59 are provided for each first participant 10A. The camera 58 and the microphone 59 are used to acquire the conference participation state of the first participant 10A as video data. For example, the microphone 59 is used to acquire the speech of the first participant 10A as audio data, and the camera 58 is used to acquire the movements of the first participant 10A. As shown in FIG. 3, the camera 58 can capture, for example, the movement of the line of sight of the first participant 10A (see arrow P1) and the tilt of the face of the first participant 10A (see arrow P2; nodding, head shaking, and the like). The camera 58 and the microphone 59 are connected to the control unit 40.

The second participant 10B is seated on a chair 24B placed in front of a desk 22B. A microphone 86, a speaker 87, and a display device 88 are provided in front of the second participant 10B. The voice of the second participant 10B is input to the microphone 86. The conference in the conference room 20 is shown on the display device 88, and the remarks made in the conference are reproduced through the speaker 87. The microphone 86, the speaker 87, and the display device 88 are connected to a terminal device 80.

The proxy participation robot 60 and the terminal device 80 are connected via the communication line 70 so that they can communicate with each other. The communication line 70 may be wired or wireless, and may be a public communication network or a dedicated line.

The proxy participation robot 60 includes the control unit 40 (described later), which obtains the ease of interrupting the communication (the conference) among the first participants 10A based on their conference participation states acquired by the cameras 58 and microphones 59 serving as the participation state acquisition unit, and determines, based on the obtained ease of interruption, whether it is a speaking timing. When it is determined to be a speaking timing, the second participant 10B can participate in the conference in the conference room 20 via the proxy participation robot 60.

FIG. 4 is a block diagram showing the electrical configuration of the conference support system 30.

The conference support system 30 includes the control unit 40, the proxy participation robot 60, and a plurality of sets of cameras 58 and microphones 59, all provided in the conference room 20, and the terminal device 80, the microphone 86, the speaker 87, and the display device 88, provided at a location remote from the conference room 20. The control unit 40 and the terminal device 80 are connected through the communication line 70 so that they can communicate with each other.

The plurality of sets of cameras 58 and microphones 59 are an example of the participation state acquisition unit that acquires the conference participation states of the plurality of participants (here, the first participants 10A). Here, one set of a camera 58 and a microphone 59 is provided for each first participant 10A. Each set is therefore an example of an individual participation state acquisition unit that acquires the conference participation state of one of the first participants 10A. The camera 58 images a part of the corresponding first participant 10A that includes the face, here the upper body, and can thereby capture the movement of the line of sight and the tilt of the face of the first participant 10A. The microphone 59 mainly converts the sound of the corresponding first participant 10A's speech into an electrical signal. The microphone 59 may also pick up the sound of surrounding participants, but such sound is quieter at the microphone 59 and has little effect.

It is not essential that a participation state acquisition unit be provided for each first participant. A single participation state acquisition unit may be provided for all of the participants. In that case, the camera is preferably installed so that it can image all of the participants, and the microphone is preferably installed so that it can convert the sound of the conversation among all of the participants into an electrical signal. Alternatively, the participants may be divided into a plurality of groups, with one participation state acquisition unit provided for each group. A participation state acquisition unit may also be provided for the second participant.

As described above, the proxy participation robot 60 is a robot that can perform conference participation actions on behalf of the second participant 10B. Here, the proxy participation robot 60 includes a robot action unit 67 that performs actions as a robot, and the robot action unit 67 includes a speaker 68 and an arm drive unit 69. Under the control of the control unit 40, the proxy participation robot 60 can produce speech through the speaker 68. The arm drive unit 69 includes a motor or the like that drives at least one arm up and down and, under the control of the control unit 40, can raise the arm of the proxy participation robot 60 (raise its hand) or lower it. The proxy participation robot 60 may be a humanoid robot or another type of robot such as an articulated robot. If the proxy participation robot 60 is a humanoid robot, the participation of the second participant 10B through the proxy participation robot 60 is less likely to feel unnatural to the first participants 10A, so the conference can be expected to proceed smoothly.

The proxy participation robot 60 is an example of a pseudo-conference participation device capable of performing conference participation actions in the real space where the conference is held. Conference participation actions are assumed to include an action indicating that there is scheduled speech content, an action of speaking, an action indicating approval or disapproval, and the like. These actions can be realized not only by speech through a speaker or by movements of a robot but also by display on a display device. Accordingly, in addition to a robot capable of some form of movement, the pseudo-conference participation device may be a speaker that produces speech, or a display device capable of showing whether there is scheduled speech content, the content of a remark, approval or disapproval, and the like.

The control unit 40 obtains the ease of interrupting communication among the first participants 10A based on their conference participation states acquired by the plurality of sets of cameras 58 and microphones 59, and determines, based on the obtained ease of interruption, whether it is a speaking timing. Here, the control unit 40 is configured as a computer in which a CPU (Central Processing Unit) 41 serving as at least one processor, a RAM (Random Access Memory) 42, a storage unit 43, an input/output unit 44, a communication circuit 45, a robot control input/output unit 46, and the like are interconnected via a bus line 48.

The storage unit 43 is a nonvolatile storage device such as a flash memory or a hard disk drive. The storage unit 43 stores a program 43a that obtains the ease of interrupting communication among the first participants 10A based on their conference participation states acquired by the plurality of sets of cameras 58 and microphones 59, and determines, based on the obtained ease of interruption, whether it is a speaking timing. Here, the program 43a includes a trained model 49, described later, and includes a processing program that uses the trained model 49 to obtain the ease of interrupting communication among the first participants 10A from their conference participation states acquired by the cameras 58 and microphones 59. The program 43a also includes a program for storing the speech of the second participant 10B in the storage unit 43 as audio data 43c and for driving and controlling the robot action unit 67 when it is determined to be a speaking timing. In other words, the control unit 40 includes the storage unit 43 as a scheduled-speech recording unit that records the scheduled speech content of the second participant 10B, who is at least one of the participants 10A and 10B. An example of the processing performed by the program 43a is described later.

The RAM 42 serves as a work area when the CPU 41 performs predetermined processing. The input/output unit 44 is connected to the plurality of sets of cameras 58 and microphones 59, and signals from the cameras 58 and microphones 59 are input to the control unit 40 via the input/output unit 44. The control unit 40 may be connected to the cameras 58 and microphones 59 by wired or wireless communication. The communication circuit 45 is connected to the terminal device 80 through the communication line 70 so that they can communicate with each other. The robot control input/output unit 46 is connected to the robot action unit 67 so that the control unit 40 can control the operation of the robot action unit 67.

In the control unit 40, the CPU 41 performs arithmetic processing in accordance with the procedure described in the program 43a, operations relating to the speech of the second participant 10B input through the terminal device 80, inputs from the plurality of sets of cameras 58 and microphones 59, and the like, and thereby carries out the function of controlling the robot action unit 67.

In this example, the control unit 40 is built into the proxy participation robot 60. The control unit 40 may instead be provided outside the proxy participation robot 60.

The program 43a is stored in the storage unit 43 in advance, but it may also be supplied to an existing computer in a form recorded on a recording medium such as a CD-ROM, a DVD-ROM, or an external flash memory, or by download from an external server via a network.

Some or all of the functions realized by the control unit 40 may be processed by a single processor or distributed as appropriate among a plurality of processors. Some or all of these functions may also be realized by a computer outside the conference room, such as the terminal device on the second participant 10B's side.

端末装置８０は、上記制御部４０と同様に、少なくとも１つのプロセッサとしてのＣＰＵ、ＲＡＭ、記憶部、入出力部、通信回路等を備えるコンピュータによって構成されている。端末装置８０は、第２参加者１０Ｂ側に設けられている。端末装置８０は、通信回線７０を通じて制御部４０に相互通信可能に接続されている。 Like the control unit 40, the terminal device 80 is configured by a computer including a CPU as at least one processor, a RAM, a storage unit, an input/output unit, a communication circuit, and the like. The terminal device 80 is provided on the second participant 10B side. The terminal device 80 is connected to the control unit 40 through the communication line 70 so that the two can communicate with each other.

マイク８６は、第２参加者１０Ｂの発言による音を電気信号として変換可能な位置に設けられている。スピーカ８７は、第２参加者１０Ｂに対して音を提供可能な位置に設けられ、表示装置８８は第２参加者１０Ｂに対して表示内容を提供可能な位置に設けられている。マイク８６、スピーカ８７及び表示装置８８は、端末装置８０に接続されている。 The microphone 86 is provided at a position where it can convert the sound of speech by the second participant 10B into an electrical signal. The speaker 87 is provided at a position where it can provide sound to the second participant 10B, and the display device 88 is provided at a position where it can present display content to the second participant 10B. The microphone 86, the speaker 87, and the display device 88 are connected to the terminal device 80.

そして、第２参加者１０Ｂによる発言がマイク８６を通じて取得され、この発言に基づく音声データが端末装置８０から通信回線７０を通じて制御部４０に送信される。また、例えば、上記複数組のカメラ５８及びマイク５９を通じて得られた各第１参加者１０Ａの発言及び動画データが、制御部４０から通信回線７０を介して端末装置８０に送信され、各発言がスピーカ８７を通じて再生され、また、各第１参加者１０Ａの動画データが表示装置８８に表示される。第２参加者１０Ｂが所定の発言を行いたい場合に、発言すると、その発言予定内容はマイク８６にて取得される。この発言予定内容は、そのまま会議室において再生されるのではなく、第２参加者１０Ｂによる発言の意思指令として取得され、後述する発言タイミングで、代理参加ロボット６０を通じて発言される。この点において、マイク８６は、第２参加者１０Ｂによる発言の意思指令を取得する発言指令受付部として機能する。発言指令受付部は、その他のスイッチ、例えば、キーボード等であってもよいし、発言専用のスイッチであってもよい。 A statement by the second participant 10B is acquired through the microphone 86, and voice data based on this statement is transmitted from the terminal device 80 to the control unit 40 through the communication line 70. In addition, for example, the statements and video data of each first participant 10A obtained through the plurality of sets of cameras 58 and microphones 59 are transmitted from the control unit 40 to the terminal device 80 via the communication line 70; each statement is reproduced through the speaker 87, and the video data of each first participant 10A is displayed on the display device 88. When the second participant 10B wishes to make a statement and speaks, the planned speech content is acquired by the microphone 86. This planned speech content is not reproduced in the conference room as it is; instead, it is acquired as a speech intention command from the second participant 10B and is spoken through the proxy participation robot 60 at the speech timing described later. In this respect, the microphone 86 functions as a speech command reception unit that acquires a speech intention command from the second participant 10B. The speech command reception unit may instead be another switch, for example a keyboard or the like, or a switch dedicated to speech.

図５は学習済モデル４９の一例を示す図である。学習済モデル４９は、機械学習により生成されたモデルであり、上記コンピュータの一種である制御部４０において用いられる。学習済モデル４９は、特徴量抽出部４９ａと、識別層４９ｂとを含む。 FIG. 5 is a diagram showing an example of the trained model 49. The trained model 49 is a model generated by machine learning and is used in the control unit 40, which is a kind of computer. The trained model 49 includes a feature amount extraction unit 49a and a discrimination layer 49b.

特徴量抽出部４９ａは、入力データである音声データ及び動画データから特徴量を抽出する。例えば、音声データは、上記マイク５９を通じて取得され、動画データは、上記カメラ５８を通じて取得される。音声データからは、例えば、特徴量として音量が抽出される。動画データからは、例えば、特徴量として、視線（目）の動き及び顔の傾きが抽出される。識別層４９ｂは、パーセプトロンを複数層組合わせた周知の多層ニューラルネットワークによって構成されている。 The feature amount extraction unit 49a extracts feature amounts from the audio data and video data that serve as input data. For example, the audio data is acquired through the microphone 59 and the video data is acquired through the camera 58. From the audio data, for example, the volume is extracted as a feature amount. From the video data, for example, the movement of the line of sight (eyes) and the tilt of the face are extracted as feature amounts. The discrimination layer 49b is configured as a well-known multilayer neural network in which multiple layers of perceptrons are combined.
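The feature extraction described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the use of root-mean-square amplitude as the "volume" feature, and the assumption that gaze movement and face tilt arrive as already-computed numbers from some upstream video analysis are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def rms_volume(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio window (0.0 for an empty window).

    Assumption: RMS stands in for the "volume" feature; the patent does not
    say how volume is measured.
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def extract_features(samples: Sequence[float],
                     gaze_shift_deg: float,
                     face_tilt_deg: float) -> List[float]:
    """Build one feature vector for one participant and one time window.

    gaze_shift_deg and face_tilt_deg are hypothetical inputs assumed to come
    from a separate video-analysis step that is not shown here.
    """
    return [rms_volume(samples), gaze_shift_deg, face_tilt_deg]
```

A vector built this way would be what a discrimination layer such as 49b consumes; the actual features and their scaling would depend on the training data.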

学習済モデル４９は、図６に示すように、模擬的又は実際におこなわれた会議において、いずれかの第１参加者１０Ａに応じた入力データ（所定時間における音声データ、動画データ）に対し、コミュニケーションを遮りやすいタイミングとしての適否かを区別した複数の学習用データに基づき学習された学習済モデルである。かかる学習用データは、例えば、本モデルの設計者が会議又は会議の入力データを実際に見聞き等してコミュニケーションを遮りやすいタイミングか否かを判別すること、又は、第１参加者１０Ａが実際に発言したか否かを判別することによって、生成可能である。 As shown in FIG. 6, the trained model 49 is a model trained on a plurality of pieces of learning data in which input data (audio data and video data over a predetermined time) corresponding to one of the first participants 10A in a simulated or actual conference is labeled as suitable or unsuitable as a timing at which communication is easy to interrupt. Such learning data can be generated, for example, by having the designer of the model actually watch and listen to the conference or its input data and judge whether each timing is one at which communication is easy to interrupt, or by determining whether a first participant 10A actually spoke.

そして、図５に示すように、制御部４０のＣＰＵ４１が、プログラム４３ａの指示に従って、いずれかの第１参加者１０Ａに応じた入力データ（音声データ、動画データ）に対し、識別層４９ｂにおける学習済の重み係数と応答関数等に基づく演算を行うことによって、識別層４９ｂの出力として、コミュニケーションの遮りやすさを示す結果を出力することができる。コミュニケーションを遮り易いときに出力１を出力し、コミュニケーションを遮り難いときに出力２を出力してもよい。あるいは、コミュニケーションを遮り易さを数値化した指標値（適合度）を出力１に出力し、コミュニケーションを遮り難さを数値化した指標値（適合度）を出力２に出力してもよい。ここでは、前者であるとして説明する。 Then, as shown in FIG. 5, the CPU 41 of the control unit 40, following the instructions of the program 43a, performs computation on the input data (audio data, video data) corresponding to one of the first participants 10A based on the learned weighting coefficients, response functions, and the like of the discrimination layer 49b, and can thereby output, as the output of the discrimination layer 49b, a result indicating the ease of interrupting communication. Output 1 may be produced when communication is easy to interrupt, and output 2 when communication is hard to interrupt. Alternatively, an index value (degree of fit) quantifying how easy communication is to interrupt may be output at output 1, and an index value (degree of fit) quantifying how hard communication is to interrupt may be output at output 2. The following description assumes the former.

コミュニケーションの遮りやすさを求めるにあたって機械学習を適用する際、その手法はニューラルネットワークに限られず、他の手法、例えば、サポートベクターマシン等を適用してもよい。コミュニケーションの遮りやすさを求めるにあたって機械学習を適用することは必須ではなく、例えば、マイク５９の音量、動画データに基づく目の動き、顔の動きを数値化し、数値化された各値に基づいて所定の条件式を事前に設定し、当該条件式に基づいてコミュニケーションの遮りやすさを求めてもよい。また、目の動き、顔の動き等から、参加者がスマートフォンを見たか否か、資料を見たか否か等を判別し、その判別結果に基づいて、コミュニケーションの遮りやすさを求めてもよい。例えば、スマートフォンを見た場合、資料を見た場合は、会議のコミュニケーションを遮りやすさは高いといえる。 When machine learning is applied to obtain the ease of interrupting communication, the technique is not limited to neural networks; other techniques such as support vector machines may be applied. Applying machine learning is not essential. For example, the volume at the microphone 59 and the eye and face movements derived from the video data may be quantified, a predetermined conditional expression based on the quantified values may be set in advance, and the ease of interrupting communication may be obtained from that conditional expression. It is also possible to determine from eye movements, face movements, and the like whether a participant has looked at a smartphone or at a document, and to obtain the ease of interrupting communication from the result. For example, when a participant has looked at a smartphone or a document, the conference communication can be said to be easy to interrupt.
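As a rough illustration of the non-machine-learning variant, the following Python sketch evaluates a preset conditional expression over quantified values. Everything concrete in it is an assumption: the `ParticipantState` fields, the normalized 0 to 1 scales, the limit values, and the direction of the rule (quiet and still means easy to interrupt). Only the idea that looking at a smartphone or a document indicates high ease of interruption comes from the text.

```python
from dataclasses import dataclass


@dataclass
class ParticipantState:
    """Quantified observations for one participant (hypothetical structure)."""
    volume: float                      # microphone level, assumed normalized to 0..1
    eye_motion: float                  # amount of eye movement, assumed 0..1
    face_motion: float                 # amount of face movement, assumed 0..1
    looking_at_phone: bool = False
    looking_at_document: bool = False


def easy_to_interrupt(state: ParticipantState,
                      volume_limit: float = 0.2,
                      motion_limit: float = 0.3) -> bool:
    """Preset conditional expression; the limits are illustrative assumptions."""
    # From the text: a participant who looked at a smartphone or a document
    # is treated as easy to interrupt.
    if state.looking_at_phone or state.looking_at_document:
        return True
    # Assumed rule: a quiet, still participant is easy to interrupt.
    return (state.volume < volume_limit
            and state.eye_motion < motion_limit
            and state.face_motion < motion_limit)
```

In practice the limits would need tuning against recorded meetings, and the rule itself is only one of many conditional expressions that would satisfy the description.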

制御部４０は、学習済モデル４９の出力に基づいて発言タイミングであるか否かを判定する。学習済モデル４９から選択的に出力１及び出力２が出力される場合には、当該出力に従えばよい。出力１又は出力２から多段階に数値化された指標値が出力される場合には、当該指標値を、事前に設定された閾値と比較して、発言タイミングであるか否かを判定してもよい。例えば、出力１からコミュニケーションの遮りやすさを数値化した指標値が出力される場合、当該指標値が閾値以上となる場合又は閾値を超える場合に、発言タイミングであると判定してもよい。 Based on the output of the trained model 49, the control unit 40 determines whether it is time to speak. When the trained model 49 selectively produces output 1 or output 2, the determination simply follows that output. When output 1 or output 2 produces an index value quantified in multiple levels, the index value may be compared with a preset threshold to determine whether it is time to speak. For example, when output 1 produces an index value quantifying the ease of interrupting communication, it may be determined that it is time to speak when the index value is equal to or greater than the threshold, or when it exceeds the threshold.
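The threshold comparison for a graded index value can be sketched as follows. The function name and the `inclusive` flag are hypothetical; the flag simply selects between the two variants the text allows (equal to or greater than the threshold, or strictly exceeding it).

```python
def is_speech_timing(index_value: float,
                     threshold: float,
                     inclusive: bool = True) -> bool:
    """Decide whether it is time to speak from an ease-of-interruption index.

    inclusive=True  -> speak when index_value >= threshold
    inclusive=False -> speak when index_value >  threshold
    """
    if inclusive:
        return index_value >= threshold
    return index_value > threshold
```

The two variants differ only when the index value lands exactly on the threshold, which matters mostly for coarsely quantized index values.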

図７は制御部４０の動作例を示すフローチャートである。 FIG. 7 is a flowchart showing an example of the operation of the control unit 40.

会議の開始により処理が開始される。ステップＳ１において発言意思の有無が判定される。ここで、第２参加者１０Ｂがマイク８６を通じて会議向けの発言を行うと、当該発言がマイク８６を通じて取得される。発言に対応する音声データは、端末装置８０、通信回線７０を通じて制御部４０に送られる。制御部４０は、端末装置８０側から第２参加者１０Ｂの音声データが送信された場合、当該音声データ（第２参加者１０Ｂの発言予定内容）を記憶部４３に音声データ４３ｃとして記憶する。また、かかる音声データの受信は、第２参加者１０Ｂの発言意思指令として入力される。このため、制御部４０は、第２参加者１０Ｂから音声データを受信した場合、発言意思ありと判定し、音声データの受信が無い場合、発言意思無しと判定する。発言意思の指令は、上記音声データに基づかず、第２参加者１０Ｂ側のスイッチ操作によって入力されてもよい。発言予定内容は、音声データによる他、キーボード入力等された文字データとして記憶されてもよい。ステップＳ１において発言意思無しと判定された場合、ステップＳ２において、所定の待機時間待機した後、スタートに戻ってステップＳ１以降の処理を繰返す。ステップＳ１において発言意思有りと判定されると、ステップＳ３に進む。 Processing starts when the conference starts. In step S1, it is determined whether there is an intention to speak. When the second participant 10B makes a statement for the conference, the statement is acquired through the microphone 86. Voice data corresponding to the statement is sent to the control unit 40 through the terminal device 80 and the communication line 70. When voice data of the second participant 10B is transmitted from the terminal device 80 side, the control unit 40 stores the voice data (the planned speech content of the second participant 10B) in the storage unit 43 as voice data 43c. The reception of such voice data is also treated as the input of a speech intention command from the second participant 10B. Accordingly, the control unit 40 determines that there is an intention to speak when it receives voice data from the second participant 10B, and determines that there is no intention to speak when no voice data is received. The speech intention command need not be based on the voice data and may instead be input by a switch operation on the second participant 10B side. The planned speech content may be stored as voice data or as character data entered from a keyboard or the like. If it is determined in step S1 that there is no intention to speak, the process waits for a predetermined waiting time in step S2 and then returns to the start to repeat the processing from step S1. If it is determined in step S1 that there is an intention to speak, the process proceeds to step S3.

本例では、記憶部４３が発言予定記録部として参加者１０のうちの少なくとも一人の発言予定内容を記録する例で説明する。しかしながら、発言予定記録部には、参加者の発言予定内容ではなく、コンピュータによる人工的なシステムによる発言予定内容が記録されていてもよい。例えば、コンピュータ等による人工的な会議支援システムが、会議を遮って参加者に伝えたい情報（発言予定内容）として、「あと５分で会議を終了してください」といった内容を有している場合、かかる発言予定内容を発言予定記録部に記録してもよい。 In this example, the storage unit 43 serves as a planned speech recording unit that records the planned speech content of at least one of the participants 10. However, the planned speech recording unit may instead record planned speech content produced by an artificial, computer-based system rather than by a participant. For example, when an artificial conference support system implemented by a computer or the like has information (planned speech content) such as "Please end the meeting in five minutes" that it wants to convey to the participants by interrupting the conference, that planned speech content may be recorded in the planned speech recording unit.

ステップＳ３では、制御部４０は、各組のカメラ５８及びマイク５９からの出力を学習済モデル４９に入力し遮りやすさを求める。ここでは、制御部４０は、複数の第１参加者１０Ａのそれぞれについて、遮りやすさを求める。 In step S3, the control unit 40 inputs the output from each set of camera 58 and microphone 59 to the trained model 49 and obtains the ease of interruption. Here, the control unit 40 obtains the ease of interruption for each of the plurality of first participants 10A.

次ステップＳ４では、ステップＳ３で求められた遮りやすさにもとづいて、発言タイミングであるか否かを判定する。上記のように、学習済モデル４９が、遮りやすさを示す２通の出力を出力する場合、当該出力に応じて発言タイミングであるか否かを判定すればよい。学習済モデル４９が遮りやすさを多段階で数値化した指標値を出力する場合、当該指標値と所定の閾値とに基づいて発言タイミングであるか否かを判定すればよい。この場合、第１参加者１０Ａ別に閾値を変えてもよい。また、会議の雰囲気に合わせて、閾値を変更してもよい。例えば、発言が少ないような場合等には、閾値を小さくし、第２参加者１０Ｂが容易に発言できるようにしてもよい。また、閾値は、会議の議長又は司会者がマニュアルで変更してもよいし、コンピュータが発言量等に鑑みて変更してもよい。また、ここでは、制御部４０は、それぞれの第１参加者１０Ａについて、求められた遮りやすさに基づいて発言タイミングであるか否かを判定する。そして、全ての第１参加者１０Ａについて、発言タイミングであると判定されると、会議における発言タイミングであると判定する。つまり、第１参加者１０Ａのうちの少なくとも一人に関して、発言に適さないタイミングであると判定されると、発言タイミングではないと判定する。 In the next step S4, it is determined whether it is time to speak based on the ease of interruption obtained in step S3. As described above, when the trained model 49 produces one of two outputs indicating the ease of interruption, the determination may simply follow that output. When the trained model 49 outputs an index value quantifying the ease of interruption in multiple levels, the determination may be made based on the index value and a predetermined threshold. In this case, the threshold may differ for each first participant 10A. The threshold may also be changed to suit the atmosphere of the conference. For example, when few statements are being made, the threshold may be lowered so that the second participant 10B can speak more easily. The threshold may be changed manually by the chair or moderator of the conference, or changed by a computer in view of the amount of speech and the like. Here, the control unit 40 determines, for each first participant 10A, whether it is time to speak based on the obtained ease of interruption. When it is determined to be time to speak for all of the first participants 10A, it is determined to be time to speak in the conference. In other words, when the timing is determined to be unsuitable for speaking with respect to at least one of the first participants 10A, it is determined not to be time to speak.
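The per-participant judgment and its combination into a conference-level decision can be sketched in Python as below. The function name, the dictionary-based interface, the default threshold of 0.5, and the choice to return False when no participant data is available are illustrative assumptions. The all-participants rule and the per-participant thresholds follow the description of step S4.

```python
from typing import Mapping


def conference_speech_timing(scores: Mapping[str, float],
                             thresholds: Mapping[str, float],
                             default_threshold: float = 0.5) -> bool:
    """Return True only if every first participant is judged interruptible.

    scores:     ease-of-interruption index per participant id
    thresholds: optional per-participant overrides of default_threshold
    """
    if not scores:
        # Assumption: with no observations, do not treat it as time to speak.
        return False
    return all(score >= thresholds.get(pid, default_threshold)
               for pid, score in scores.items())
```

Lowering `default_threshold`, for instance when few statements are being made, makes the decision more permissive, which matches the adjustment described for quiet meetings.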

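ここでは、第１参加者１０Ａ別に、遮りやすさ、及び、発言タイミングであるか否かを判定しているが、会議の場全体として会議状態を取得し、会議全体として遮りやすさ、発言タイミングであるか否かを判定してもよい。 Here, the ease of interruption and whether it is time to speak are determined for each first participant 10A; however, the conference state may instead be acquired for the conference as a whole, and the ease of interruption and whether it is time to speak may be determined for the conference as a whole.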

ステップＳ４において発言タイミングであると判定されると、ステップＳ５に進む。ステップＳ５では、制御部４０は、記憶部４３に記憶された発言予定内容（ここでは音声データ４３ｃ）を、スピーカ６８を通じて発言する。発言予定内容が文字データとして記憶されている場合には、合成音声等を、スピーカ６８を通じて発言してもよい。スピーカ６８は、代理参加ロボット６０に組込まれているため、周囲の第１参加者１０Ａからは、代理参加ロボット６０を通じて参加する第２参加者１０Ｂの発言として、違和感少なく受止められる。発言終了後、スタートに戻って、ステップＳ１以降の処理を繰返す。 If it is determined in step S4 that it is time to speak, the process proceeds to step S5. In step S5, the control unit 40 speaks the planned speech content stored in the storage unit 43 (here, the voice data 43c) through the speaker 68. When the planned speech content is stored as character data, synthesized speech or the like may be spoken through the speaker 68. Because the speaker 68 is built into the proxy participation robot 60, the surrounding first participants 10A perceive the speech, with little sense of incongruity, as a statement by the second participant 10B who is participating through the proxy participation robot 60. After the speech ends, the process returns to the start and repeats the processing from step S1.

ステップＳ４において発言タイミングでないと判定されると、ステップＳ６に進む。ステップＳ６では、制御部４０は、ロボット動作部６７に第１動作を実行させる。第１動作は、遮りやすさを上げる動作である。第１動作としては、発言があることを示す動き（例えば、挙手）、発言があることを示す短い発言（例えば、“Excuse me”、「すみません」、“I would like to say something”、「発言したいことがあります」等）等が想定される。擬似会議参加装置が人型ロボットである場合、無い場合において、同様の短い発言を、スピーカを通じて再生すること、表示装置に発言がある旨を表示すること（短い発言の表示、発言を示す発光表示等）、ビープ音等を発生させること、振動を生じさせること等が想定される。 If it is determined in step S4 that it is not time to speak, the process proceeds to step S6. In step S6, the control unit 40 causes the robot operation unit 67 to perform a first action. The first action is an action that raises the ease of interruption. Possible first actions include a movement indicating that there is something to say (for example, raising a hand) and a short utterance indicating that there is something to say (for example, "Excuse me" or "I would like to say something", or the Japanese equivalents 「すみません」 and 「発言したいことがあります」). Whether or not the pseudo-conference participation device is a humanoid robot, possible actions also include reproducing a similar short utterance through a speaker, indicating on a display device that there is something to say (displaying the short utterance, a light-emitting indication, and so on), generating a beep or similar sound, and generating vibration.

次ステップＳ７において、ステップＳ３と同様に、遮りやすさを求める。 In the next step S7, the ease of interruption is obtained in the same manner as in step S3.

次ステップＳ８において、ステップＳ４と同様に、ステップＳ７で求められた遮りやすさにもとづいて、発言タイミングであるか否かを判定する。発言タイミングであると判定されると、ステップＳ５に進み、発言を行う。発言タイミングでないと判定されると、ステップＳ９に進む。 In the next step S8, similarly to step S4, it is determined whether or not it is time to speak based on the likelihood of interruption obtained in step S7. If it is determined that it is time to speak, the process proceeds to step S5, and speech is made. If it is determined that it is not time to speak, the process proceeds to step S9.

ステップＳ９では、第１動作後、予め定められた所定時間経過したか否かが判定される。所定時間経過前と判定されると、ステップＳ１０に進み、所定時間待機し、ステップＳ７に戻り、ステップＳ７以降の処理を繰返す。所定時間経過後と判定されると、ステップＳ１１に進む。なお、第１動作後の経過時間が所定時間丁度である場合、ステップＳ１０、Ｓ１１のいずれに進んでもよい。 In step S9, it is determined whether a predetermined time has elapsed since the first action. If the predetermined time has not yet elapsed, the process proceeds to step S10, waits for a predetermined time, returns to step S7, and repeats the processing from step S7. If the predetermined time has elapsed, the process proceeds to step S11. If the time elapsed since the first action is exactly the predetermined time, the process may proceed to either step S10 or step S11.

ステップＳ１１では、制御部４０は、ロボット動作部６７に第２動作を実行させる。第２動作は、第１動作と同様に、遮りやすさを上げる動作である。第２動作は第１動作と同じであってもよいし、異なっていてもよい。第２動作は、第１動作よりも遮りやすさを上げるのに効果的な動作であってもよい。例えば、第１動作が発言では無い動作であり、第２動作が短い発話であることが想定される。より具体的には、第１動作が挙手、表示装置への表示、又は、発光表示装置の発光であり、第２動作が“Excuse me”、「すみません」、「あの～」等の発言である場合が想定される。また、第１動作が意味の無い発言であり、第２動作が自らの発言を求める要求発言である場合が想定される。例えば、第１動作が“Excuse me”、「すみません」、「あの～」等の発言であり、第２動作が“I would like to say something”、「発言したいことがあります」等の発言であることが想定される。また、第１動作が丁寧な発言であり、第２動作が丁寧では無い発言であることが想定される。また、第１動作が短い発言であり、第２動作が長い発言であることが想定される。さらに、第１動作の発言よりも、第２動作の発言の音量が大きいことも想定される。 In step S11, the control unit 40 causes the robot operation unit 67 to perform a second action. Like the first action, the second action is an action that raises the ease of interruption. The second action may be the same as or different from the first action. The second action may be more effective than the first action at raising the ease of interruption. For example, the first action may be a non-verbal action and the second action a short utterance. More specifically, the first action may be raising a hand, showing an indication on a display device, or lighting a light-emitting display device, and the second action may be an utterance such as "Excuse me", 「すみません」, or 「あの～」 ("um..."). Alternatively, the first action may be an utterance with no particular meaning and the second action an utterance requesting the floor. For example, the first action may be an utterance such as "Excuse me", 「すみません」, or 「あの～」, and the second action an utterance such as "I would like to say something" or 「発言したいことがあります」. The first action may also be a polite utterance and the second action a less polite one, or the first action a short utterance and the second action a long one. Furthermore, the utterance of the second action may be louder than that of the first action.

次ステップＳ１２では、ステップＳ３と同様に、遮りやすさを求める。 In the next step S12, the ease of interruption is obtained in the same manner as in step S3.

次ステップＳ１３において、ステップＳ４と同様に、ステップＳ１２で求められた遮りやすさにもとづいて、発言タイミングであるか否かを判定する。発言タイミングであると判定されると、ステップＳ５に進み、発言を行う。発言タイミングでないと判定されると、ステップＳ１４に進む。 In the next step S13, similarly to step S4, it is determined whether or not it is time to speak based on the likelihood of interruption obtained in step S12. If it is determined that it is time to speak, the process proceeds to step S5, and speech is made. If it is determined that it is not time to speak, the process proceeds to step S14.

ステップＳ１４では、第２動作後、予め定められた所定時間経過したか否かが判定される。所定時間経過前と判定されると、ステップＳ１５に進み、所定時間待機し、ステップＳ１２に戻り、ステップＳ１２以降の処理を繰返す。所定時間経過後と判定されると、ステップＳ１６に進む。なお、第２動作後の経過時間が所定時間丁度である場合、ステップＳ１５、Ｓ１６のいずれに進んでもよい。 In step S14, it is determined whether or not a predetermined time has passed after the second action. If it is determined that the predetermined time has not passed yet, the process proceeds to step S15, waits for a predetermined time, returns to step S12, and repeats the processes after step S12. If it is determined that the predetermined time has elapsed, the process proceeds to step S16. Note that if the elapsed time after the second action is exactly the predetermined time, the process may proceed to either step S15 or S16.

ステップＳ１６では、制御部４０は、ロボット動作部６７に第３動作を実行させる。第３動作は、第１動作と同様に、遮りやすさを上げる動作である。第３動作は第１動作、第２動作と同じであってもよいし、異なっていてもよい。第３動作は、第２動作よりも遮りやすさを上げるのに効果的な動作であってもよい。つまり、後に行われる動作であるほど、遮りやすさを上げるのに効果的な動作であってもよい。例えば、第１動作が発言では無い動作であり、第２動作が短い発話であり、第３動作が自らの発言を求める要求発言である場合が想定される。例えば、第１動作が挙手であり、第２動作が“Excuse me”、「すみません」、「あの～」等の発言であり、第３動作が“I would like to say something”、「発言したいことがあります」等の発言であることが想定される。その他、遮りやすさを上げるのに効果的な動作内容については、ステップＳ１１で説明したのと同様の考えを適用できる。 In step S16, the control unit 40 causes the robot operation unit 67 to perform a third action. Like the first action, the third action is an action that raises the ease of interruption. The third action may be the same as or different from the first and second actions. The third action may be more effective than the second action at raising the ease of interruption. In other words, the later an action is performed, the more effective it may be at raising the ease of interruption. For example, the first action may be a non-verbal action, the second action a short utterance, and the third action an utterance requesting the floor. For example, the first action may be raising a hand, the second action an utterance such as "Excuse me", 「すみません」, or 「あの～」 ("um..."), and the third action an utterance such as "I would like to say something" or 「発言したいことがあります」. Otherwise, the same considerations as described for step S11 apply to which actions are effective at raising the ease of interruption.

次ステップＳ１７では、ステップＳ３と同様に、遮りやすさを求める。 In the next step S17, the ease of interruption is obtained in the same manner as in step S3.

次ステップＳ１８において、ステップＳ４と同様に、ステップＳ１７で求められた遮りやすさにもとづいて、発言タイミングであるか否かを判定する。発言タイミングであると判定されると、ステップＳ５に進み、発言を行う。発言タイミングでないと判定されると、ステップＳ１９に進む。 In the next step S18, similarly to step S4, it is determined whether or not it is time to speak based on the likelihood of interruption obtained in step S17. If it is determined that it is time to speak, the process proceeds to step S5, and speech is made. If it is determined that it is not time to speak, the process proceeds to step S19.

ステップＳ１９では、第３動作後、予め定められた所定時間経過したか否かが判定される。所定時間経過前と判定されると、ステップＳ２０に進み、所定時間待機し、ステップＳ１７に戻り、ステップＳ１７以降の処理を繰返す。所定時間経過後と判定されると、ステップＳ５に進む。なお、第３動作後の経過時間が所定時間丁度である場合、ステップＳ５、Ｓ２０のいずれに進んでもよい。つまり、第３動作後、所定時間経過した場合、遮りやすさの程度、発言タイミングであるか否かに拘らず、強制的に発言を行う。 In step S19, it is determined whether a predetermined time has elapsed since the third action. If the predetermined time has not yet elapsed, the process proceeds to step S20, waits for a predetermined time, returns to step S17, and repeats the processing from step S17. If the predetermined time has elapsed, the process proceeds to step S5. If the time elapsed since the third action is exactly the predetermined time, the process may proceed to either step S5 or step S20. In other words, once the predetermined time has elapsed after the third action, the speech is made forcibly regardless of the ease of interruption and regardless of whether it is time to speak.

上記処理において、第３動作後、所定時間経過した場合には、発言を行わずに、スタートに戻ってもよい。また、第３動作を行うための処理、第２動作及び第３動作を行うための処理は省略されてもよい。 In the above processing, when the predetermined time has elapsed after the third action, the process may return to the start without making the speech. The processing for performing the third action, or the processing for performing both the second and third actions, may also be omitted.
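The escalation flow of steps S3 to S20 can be summarized in a compact Python sketch. This is a simplification under stated assumptions, not the patent's implementation: the function and parameter names are hypothetical, the "predetermined time" after each action is approximated by a fixed number of polls (`polls_per_action`) instead of a clock, and the robot actions, timing judgment, and speech output are injected as plain callables.

```python
from typing import Callable, Sequence


def attempt_speech(is_timing: Callable[[], bool],
                   actions: Sequence[Callable[[], None]],
                   speak: Callable[[], None],
                   polls_per_action: int = 3,
                   wait: Callable[[], None] = lambda: None,
                   force_speak: bool = True) -> bool:
    """Try to deliver the planned speech; return True if it was spoken."""
    # S3-S4: evaluate once before attracting any attention.
    if is_timing():
        speak()                      # S5
        return True
    # S6 / S11 / S16: escalate through the first, second, third actions.
    for action in actions:
        action()
        # S7-S10, S12-S15, S17-S20: re-evaluate until the allotted polls run out.
        for _ in range(polls_per_action):
            if is_timing():
                speak()              # S5
                return True
            wait()
    # After the last action times out: speak anyway, or give up and let the
    # caller return to the start (both variants appear in the text).
    if force_speak:
        speak()
        return True
    return False
```

Passing fewer callables in `actions` corresponds to omitting the third action, or the second and third actions, and `force_speak=False` corresponds to returning to the start without speaking.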

以上のように構成された会議支援システム３０によると、参加状態取得部であるカメラ５８及びマイク５９を通じて得られた複数の第１参加者１０Ａの参加状態に基づいて、複数の第１参加者１０Ａによるコミュニケーションの遮りやすさを求め、求められた遮りやすさに基づいて発言タイミングであるか否かを判定するため、会議において、発言するのに適したタイミングを見出せる。 According to the conference support system 30 configured as described above, the ease of interrupting communication among the plurality of first participants 10A is obtained based on their participation states acquired through the cameras 58 and microphones 59, which serve as the participation state acquisition unit, and whether it is time to speak is determined based on the obtained ease of interruption. A suitable timing for speaking in the conference can therefore be found.

また、第２参加者１０Ｂが予め発言予定内容を記憶部４３に記憶させておけば、制御部４０は、発言タイミングであると判定されたときに、その発言予定内容を、スピーカ６８を通じて発言する。このため、第２参加者１０Ｂは、予め発言予定内容を記録しておけば、適切なタイミングで発言することができる。 Further, if the second participant 10B stores the planned speech content in the storage unit 43 in advance, the control unit 40 speaks that content through the speaker 68 when it is determined to be time to speak. The second participant 10B can therefore speak at an appropriate timing simply by recording the planned speech content in advance.

また、特に、第２参加者１０Ｂは、通信回線７０を通じて会議に参加するため、複数の第１参加者１０Ａの会議参加状態を観察し難い。このような状況においても、第２参加者１０Ｂは、適切なタイミングで発言し易い。 In particular, since the second participant 10B participates in the conference through the communication line 70, it is difficult to observe the conference participation states of the first participants 10A. Even in such a situation, the second participant 10B is likely to speak at an appropriate timing.

また、制御部４０は、複数の第１参加者１０Ａのそれぞれに設けられたカメラ５８及びマイク５９からの出力に基づき、複数の第１参加者１０Ａのそれぞれの会議参加状態を把握し、各参加状態に基づいて遮りやすさを求め、発言タイミングであるか否かを判定する。このため、複数の第１参加者１０Ａのそれぞれの会議参加状態に基づいて、より適切に発言タイミングであるか否かを判定できる。 In addition, the control unit 40 grasps the conference participation state of each of the plurality of first participants 10A based on the outputs from the camera 58 and microphone 59 provided for each of them, obtains the ease of interruption based on each participation state, and determines whether it is time to speak. Whether it is time to speak can therefore be determined more appropriately, based on the conference participation state of each of the plurality of first participants 10A.

また、発言タイミングで無いと判定されたときに、擬似会議参加装置である代理参加ロボット６０に、遮りやすさを上げる第１動作を実行させる。このため、会議において、第２参加者１０Ｂが発言し易くなる。 Also, when it is determined that it is not time to speak, the proxy participation robot 60, which is a pseudo-conference participation device, is made to perform the first action, which raises the ease of interruption. This makes it easier for the second participant 10B to speak in the conference.

第１動作後、さらに、発言タイミングでは無いと判定されると、擬似会議参加装置である代理参加ロボット６０に、遮りやすさを上げる第２動作を実行させる。このため、会議において、第２参加者１０Ｂがより発言し易くなる。 If it is again determined after the first action that it is not time to speak, the proxy participation robot 60, which is a pseudo-conference participation device, is made to perform the second action, which raises the ease of interruption. This makes it even easier for the second participant 10B to speak in the conference.

また、第１動作実行後、さらに、発言タイミングで無いと判定されると、遮りやすさを上げるのにより効果的な第２動作を実行させるため、会議において、第２参加者１０Ｂがより発言し易くなる。 Further, after the execution of the first action, if it is determined that it is not time to speak, the second participant 10B is allowed to speak more in the conference in order to execute the second action more effective in increasing the likelihood of interruption. becomes easier.

第１実施形態を前提として各種変形例について説明する。 Various modifications will be described on the premise of the first embodiment.

第１実施形態において、制御部４０が第２参加者１０Ｂの発言予定内容を記憶部４３に記憶する代りに、又は、記憶する機能に加えて、発言タイミングであると判定されたときに、その旨を表示装置８８に表示するようにしてもよい。この場合、第２参加者１０Ｂは、当該表示を見て、マイク８６に発言を行うようにしてもよい。その発言は、端末装置８０、通信回線７０を介して、代理参加ロボット６０から発言されるようにするとよい。第２参加者１０Ｂは、発言タイミングであるとの表示を見て、発言することができるため、適切なタイミングで発言することができる。特に、第１参加者１０Ａは、現実空間において互いの会議参加状態を観察可能な状態で会議を行っているのに対し、第２参加者１０Ｂは、通信回線７０を通じて会議に参加しているため、第１参加者１０Ａの会議参加状態を観察し難い。この場合でも、第２参加者１０Ｂは、適切なタイミングで発言できるというメリットがある。 In the first embodiment, instead of, or in addition to, having the control unit 40 store the planned speech content of the second participant 10B in the storage unit 43, an indication that it is time to speak may be shown on the display device 88 when it is so determined. In this case, the second participant 10B may see the indication and speak into the microphone 86. That speech is preferably output from the proxy participation robot 60 via the terminal device 80 and the communication line 70. Because the second participant 10B can speak after seeing the indication that it is time to speak, the second participant 10B can speak at an appropriate timing. In particular, while the first participants 10A hold the conference in real space where they can observe each other's participation states, the second participant 10B participates through the communication line 70 and therefore has difficulty observing the participation states of the first participants 10A. Even so, the second participant 10B has the advantage of being able to speak at an appropriate timing.

上記例では、第２参加者１０Ｂが一人である例で説明したが、第２参加者１０Ｂは複数人存在していてもよい。 In the above example, an example in which there is one second participant 10B has been described, but there may be a plurality of second participants 10B.

上記例では、第１参加者１０Ａに参加状態取得部であるカメラ５８及びマイク５９を設けた例で説明したが、第２参加者１０Ｂに参加状態取得部が設けられ、第１参加者１０Ａ及び第２参加者１０Ｂの会議参加状態に基づいて、遮りやすさを求め、適切な発言タイミングであるか否かを判定してもよい。 In the above example, the cameras 58 and microphones 59 serving as the participation state acquisition unit are provided for the first participants 10A. However, a participation state acquisition unit may also be provided for the second participant 10B, and the ease of interruption may be obtained, and whether it is an appropriate timing to speak determined, based on the conference participation states of both the first participants 10A and the second participant 10B.

また、上記例では、第２参加者１０Ｂが通信回線７０を通じて会議に参加している例で説明したが、第２参加者１０Ｂは、現実空間で行われる会議場所において、第１参加者１０Ａと共に参加している者であってもよい。 Further, in the above example, the second participant 10B participates in the conference through the communication line 70, but the second participant 10B can participate in the conference in the real space together with the first participant 10A. It can be a participant.

また、上記例では、第２参加者１０Ｂが発言予定内容を入力すると、発言意思有りと判定する例で説明したが、発言意思は、他の参加者からの発言賛成数が所定数を超えた場合に、発言意思有りと判定するようにしてもよい。例えば、第２参加者１０Ｂが発言予定内容を文字情報として入力すると、当該発言予定内容が他の参加者の手元の端末装置に表示され、他の参加者が当該発言予定内容に賛同する場合には賛同する旨を入力する。そして、賛同数が所定数を超えると、発言意思有りと判定して、ステップＳ３以降の処理を実行するようにしてもよい。 Further, in the above example, when the second participant 10B enters the scheduled speech content, it is determined that there is an intention to speak. In this case, it may be determined that there is an intention to speak. For example, when the second participant 10B inputs the planned speech content as character information, the planned speech content is displayed on the terminal device at hand of the other participant, and when the other participant agrees with the planned speech content, enter that you agree. Then, when the number of approvals exceeds a predetermined number, it may be determined that there is an intention to speak, and the processing from step S3 onwards may be executed.
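The approval-based variant of the speech-intention judgment can be sketched as follows. The class and method names are hypothetical, and counting each approving participant only once is an added assumption. The rule that intention is recognized when the number of approvals exceeds a predetermined number follows the text.

```python
class ProposedRemark:
    """Planned speech content awaiting approvals from other participants."""

    def __init__(self, text: str, required_approvals: int) -> None:
        self.text = text
        self.required_approvals = required_approvals
        self._approvers = set()  # ids of participants who approved

    def approve(self, participant_id: str) -> None:
        # Assumption: repeated approvals by the same participant count once.
        self._approvers.add(participant_id)

    def has_speech_intent(self) -> bool:
        # Intention is recognized only when approvals exceed the set number.
        return len(self._approvers) > self.required_approvals
```

Once `has_speech_intent()` returns True, the flow would continue with the processing from step S3 onward.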

{Second embodiment}
A conference robot according to the second embodiment will be described. In the description of this embodiment, components similar to those described in the first embodiment are given the same reference numerals, and their description is omitted. FIG. 8 is an explanatory diagram showing a conference to which the conference robot 130 is applied, and FIG. 9 is a block diagram showing the electrical configuration of the conference robot.

This conference robot 130 differs from the conference support system 30 described above in that the conference is premised on there being no second participant 10B; instead, the conference robot 130 autonomously makes remarks for the conference.

That is, the conference robot 130 participates in the conference in place of the proxy participation robot 60 described above.

The conference robot 130 includes a control unit 140, a robot operation unit 167, and a plurality of sets of cameras 58 and microphones 59.

As described above, the plurality of sets of cameras 58 and microphones 59 are an example of a participation state acquisition unit that acquires the conference participation states of a plurality of participants (here, the first participants 10A).

Like the robot operation unit 67 described above, the robot operation unit 167 performs operations as a robot. Here, the robot operation unit 167 includes a speaker 168 and an arm drive unit 169. Under the control of the control unit 140, the robot operation unit 167 can emit sound through the speaker 168. The arm drive unit 169 includes a motor or the like that drives at least one arm up and down. Under the control of the control unit 140, the arm drive unit 169 can raise the arm of the conference robot 130 (that is, raise its hand) and lower it. The conference robot 130 may be a humanoid robot or another type of robot, such as an articulated robot.

The control unit 140 is connected to the plurality of sets of cameras 58 and microphones 59 and to the robot operation unit 167, and controls the operation of the robot operation unit 167 based on inputs from the plurality of sets of cameras 58 and microphones 59. In particular, the control unit 140 obtains the ease of interrupting communication among the plurality of first participants 10A based on the conference participation states of the plurality of first participants 10A acquired by the plurality of sets of cameras 58 and microphones 59, and determines whether or not it is a speech timing based on the obtained ease of interruption. When it is determined that it is a speech timing, the control unit 140 executes speech processing through the speaker 68.
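The determination made by the control unit 140 can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the observed features (`is_speaking`, `looking_at_robot`), the scoring weights, the use of a mean over participants, and the threshold of 0.5 are hypothetical and are not taken from this passage.

```python
from dataclasses import dataclass


@dataclass
class ParticipationState:
    """Hypothetical observation of one first participant from one camera/microphone set."""
    is_speaking: bool        # assumed to be derived from the microphone input
    looking_at_robot: bool   # assumed to be derived from the camera input


def ease_of_interruption(state: ParticipationState) -> float:
    """Illustrative score in [0, 1]; a higher value means easier to interrupt."""
    score = 0.5
    if state.is_speaking:
        score -= 0.5   # assumption: interrupting a speaker is hard
    if state.looking_at_robot:
        score += 0.5   # assumption: attention toward the robot makes it easier
    return max(0.0, min(1.0, score))


def is_speech_timing(states, threshold=0.5):
    """Assumed rule: speech timing when the mean ease of interruption reaches the threshold."""
    if not states:
        return False
    mean = sum(ease_of_interruption(s) for s in states) / len(states)
    return mean >= threshold
```

Other aggregation rules, such as requiring every participant's score to reach the threshold, fit the same structure; the choice of a mean here is only one possibility.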

The control unit 140 differs from the control unit 40 in that it executes a conversation program 140b. The conversation program 140b is a program stored in the storage unit 43, and the CPU 41 executes conversation processing based on the procedure described in the conversation program 140b, inputs from the plurality of microphones 59, and the like. As the conversation program 140b itself, a program that includes a well-known program for, for example, producing remarks related to what the plurality of first participants 10A have said, based on the content of their remarks, can be used. The content of a remark produced by the conversation program 140b may be created before it is determined that it is a speech timing and recorded in the speech schedule recording unit as in the first embodiment, or may be generated when it is determined that it is a speech timing.
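The two options for when the remark content is produced can be sketched as follows. The class `SpeechScheduler` and its methods are hypothetical names; the `generate` callable merely stands in for the conversation program 140b, whose actual interface is not described here.

```python
from typing import Callable, Optional


class SpeechScheduler:
    """Holds planned speech content until a speech timing is determined (illustrative)."""

    def __init__(self, generate: Callable[[], str]):
        self._generate = generate             # stand-in for the conversation program
        self._planned: Optional[str] = None   # stand-in for the speech schedule recording unit

    def prepare(self) -> None:
        """Option 1: create the content ahead of time and record it."""
        self._planned = self._generate()

    def on_speech_timing(self) -> str:
        """Return the recorded content if any; otherwise (option 2) generate it now."""
        content = self._planned if self._planned is not None else self._generate()
        self._planned = None  # the recorded content is used only once
        return content
```

Preparing in advance keeps the remark ready the moment a speech timing arises, while generating on demand lets the remark reflect the most recent discussion.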

With this conference robot 130, the conference robot 130, which autonomously makes remarks for the conference, can speak at a suitable timing. In addition, apart from the effects that arise from the second participant 10B making remarks, the same effects as in the first embodiment can be obtained. Furthermore, because the conference robot 130 makes remarks for the conference, the conference can be made livelier and thereby supported.

{Modification}
The configurations described in the above embodiments and modifications can be combined as appropriate as long as they do not contradict each other. For example, the first embodiment and the second embodiment may be combined so that, in addition to the plurality of first participants, both the second participant 10B and the conference robot 130 participate in the conference.

Although the present invention has been described in detail above, the above description is illustrative in all aspects, and the present invention is not limited thereto. It is understood that numerous modifications not illustrated can be envisaged without departing from the scope of the present invention.

10A first participant
10B second participant
30 conference support system
40, 140 control unit
43a program
43c voice data
45 communication circuit
46 robot control input/output unit
58 camera
59 microphone
60 proxy participation robot
67, 167 robot operation unit
68, 168 speaker
69, 169 arm drive unit
70 communication line
80 terminal device
86 microphone
130 conference robot
140b conversation program

Claims

A conference support system used in a conference held by a plurality of participants, comprising:
a participation state acquisition unit that acquires conference participation states of the plurality of participants; and
a control unit that obtains an ease of interrupting communication among the plurality of participants based on the conference participation states of the plurality of participants acquired by the participation state acquisition unit, and determines whether or not it is a speech timing based on the obtained ease of interruption,
wherein the control unit includes a speech schedule recording unit that records planned speech content and, when it is determined that it is a speech timing, causes the planned speech content recorded in the speech schedule recording unit to be spoken through a speaker.

The conference support system according to claim 1,
wherein the control unit records, in the speech schedule recording unit, planned speech content of at least one participant among the plurality of participants.

The conference support system according to claim 2,
wherein the plurality of participants includes a plurality of first participants who hold the conference in real space in a state in which they can observe one another's conference participation states, and a second participant who participates in the conference via a communication line, and
the control unit records planned speech content of the second participant in the speech schedule recording unit.

The conference support system according to any one of claims 1 to 3,
wherein the participation state acquisition unit includes a plurality of individual participation state acquisition units that acquire the conference participation state of each of the plurality of participants, and
the control unit obtains the ease of interruption for each of the plurality of participants based on the conference participation states of the plurality of participants acquired by the plurality of individual participation state acquisition units, and determines whether or not it is a speech timing based on the obtained ease of interruption.

A conference support system used in a conference held by a plurality of participants, comprising:
a participation state acquisition unit that acquires conference participation states of the plurality of participants;
a control unit that obtains an ease of interrupting communication among the plurality of participants based on the conference participation states of the plurality of participants acquired by the participation state acquisition unit, and determines whether or not it is a speech timing based on the obtained ease of interruption;
a speech intention command reception unit that acquires a speech intention command from at least one of the plurality of participants; and
a pseudo-conference participation device capable of participating in the conference in the real space where the conference is held,
wherein, when it is determined that it is not a speech timing after the speech intention command has been acquired, the control unit causes the pseudo-conference participation device to execute a first operation that increases the ease of interruption.

The conference support system according to claim 5,
wherein, when it is determined that it is not a speech timing after the first operation has been executed, the control unit causes the pseudo-conference participation device to execute a second operation that increases the ease of interruption.

The conference support system according to claim 6,
wherein the second operation is more effective than the first operation in increasing the ease of interruption.

A conference robot that participates in a conference with a plurality of participants, comprising:
a participation state input unit to which conference participation states of the plurality of participants are input;
a control unit that obtains an ease of interrupting communication among the plurality of participants based on the conference participation states of the plurality of participants input from the participation state input unit, determines whether or not it is a speech timing based on the obtained ease of interruption, and executes processing for speaking through a speaker when it is determined that it is a speech timing; and
a robot action unit that performs actions as a robot,
wherein, when it is determined that it is not a speech timing after planned speech content has arisen, the control unit causes the robot action unit to perform a first action that increases the ease of interruption.

The conference robot according to claim 8,
wherein the conference participation state of each of the plurality of participants is input to the participation state input unit, and
the control unit obtains the ease of interruption for each of the plurality of participants based on the conference participation state of each of the plurality of participants, and determines whether or not it is a speech timing based on the obtained ease of interruption.

The conference robot according to claim 8 or 9,
wherein, when it is determined that it is not a speech timing after the first action has been performed, the control unit causes the robot action unit to perform a second action that increases the ease of interruption.

The conference robot according to claim 10,
wherein the second action is more effective than the first action in increasing the ease of interruption.