JP7290366B2 - COMMUNICATION TERMINAL, REMOTE CONFERENCE METHOD AND PROGRAM

Info

Publication number: JP7290366B2
Application number: JP2022021740A
Authority: JP
Inventors: 麻由子寺田
Original assignee: NEC Platforms Ltd
Current assignee: NEC Platforms Ltd
Priority date: 2020-12-11
Filing date: 2022-02-16
Publication date: 2023-06-13
Anticipated expiration: 2040-12-11
Also published as: JP2022093326A

Description

The present invention relates to a communication terminal, a teleconference method, and a program.

In recent years, it has become possible to hold a conference among a plurality of communication terminals located at points remote from one another via a network. In a teleconference system that holds such a teleconference, the participants need not gather in a single conference room; each participant can join the teleconference from his or her own desk, from home, or from elsewhere.

In relation to such technology, Patent Literature 1 discloses a communication control device that provides a conference with a sense of presence while taking the load on the communication line into account. Patent Literature 2 discloses a conference system that is highly expressive, easy to speak in, and more interactive, in which making a new utterance during playback of an existing utterance does not impair that playback.

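Patent Literature 1: Japanese Unexamined Patent Application Publication No. 2010-239393 (JP 2010-239393 A)
Patent Literature 2: Japanese Unexamined Patent Application Publication No. 2001-230773 (JP 2001-230773 A)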

In a teleconference, communication delays and the like can make fully real-time communication difficult. In addition, because the faces of the other participants are not always visible, it can be hard to tell what the other participants are doing. In such cases, an utterance collision may occur, in which one participant begins an utterance while another participant is already making an utterance. When an utterance collision occurs, the participant who spoke later may hold back from speaking. The dissatisfaction of that participant (that is, the participant who caused the utterance collision) may then grow. An utterance collision can therefore hinder the smooth progress of the teleconference.

The present disclosure has been made to solve such problems, and an object thereof is to provide a teleconference system, a communication terminal, a teleconference method, and a program that enable a teleconference to proceed smoothly.

A teleconference system according to the present disclosure includes: utterance determination means for determining whether the voice of each of a plurality of participants in a teleconference represents an utterance or a backchannel response; voice output control means for performing control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants and, when another participant makes an utterance while a certain participant among the plurality of participants is making an utterance, performing control so as to suppress output of the other participant's utterance; counting means for counting, for each of the plurality of participants, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and number display control means for performing control so that a display relating to the number is presented on the communication terminal of each of the plurality of participants.

A communication terminal according to the present disclosure includes: utterance determination means for determining whether the voice of a user of the communication terminal represents an utterance or a backchannel response in a teleconference in which the user participates; voice output control means for performing control so that the voice of each of a plurality of participants in the teleconference is output at the communication terminal and the voice of the user is output at first communication terminals, which are the communication terminals of the respective participants, and, when the user makes an utterance while a certain participant among the plurality of participants is making an utterance, performing control so as to suppress output of the user's utterance at the first communication terminals; counting means for counting, for the user of the communication terminal, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and number display control means for performing control so that a display relating to the number is presented on the first communication terminals.

A teleconference method according to the present disclosure includes: determining whether the voice of each of a plurality of participants in a teleconference represents an utterance or a backchannel response; performing control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants and, when another participant makes an utterance while a certain participant among the plurality of participants is making an utterance, performing control so as to suppress output of the other participant's utterance; counting, for each of the participants, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and performing control so that a display relating to the number is presented on the communication terminal of each of the plurality of participants.

A program according to the present disclosure causes a computer to implement: a function of determining whether the voice of each of a plurality of participants in a teleconference represents an utterance or a backchannel response; a function of performing control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants and, when another participant makes an utterance while a certain participant among the plurality of participants is making an utterance, performing control so as to suppress output of the other participant's utterance; a function of counting, for each of the participants, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and a function of performing control so that a display relating to the number is presented on the communication terminal of each of the plurality of participants.

According to the present disclosure, it is possible to provide a teleconference system, a communication terminal, a teleconference method, and a program that enable a teleconference to proceed smoothly.

FIG. 1 is a diagram showing a teleconference system according to an embodiment of the present disclosure.
FIG. 2 is a flowchart showing a teleconference method executed by the teleconference system according to the embodiment of the present disclosure.
FIG. 3 is a diagram showing a teleconference system according to Embodiment 1.
FIG. 4 is a diagram showing the configuration of a communication terminal according to Embodiment 1.
FIG. 5 is a diagram showing the configuration of a teleconference device according to Embodiment 1.
FIG. 6 is a diagram exemplifying participant information according to Embodiment 1.
FIG. 7 is a flowchart showing a teleconference method executed by the teleconference system according to Embodiment 1.
FIG. 8 is a diagram showing a teleconference system according to Embodiment 2.
FIG. 9 is a diagram illustrating a state in which utterance state information is transmitted and received in the teleconference system according to Embodiment 2.
FIG. 10 is a block diagram showing the configuration of an utterance state detection unit according to Embodiment 2.
FIG. 11 is a diagram exemplifying conference information according to Embodiment 2.
FIG. 12 is a diagram showing the configuration of a conference control unit according to Embodiment 2.
FIG. 13 is a flowchart showing a teleconference method executed by the teleconference system according to Embodiment 2.
FIG. 14 is a diagram illustrating a conference image displayed on each communication terminal in a teleconference according to Embodiment 2.
FIG. 15 is a diagram illustrating a conference image displayed on each communication terminal in a teleconference according to Embodiment 2.

(Overview of Embodiments According to the Present Disclosure)
Prior to describing the embodiments of the present disclosure, an overview of the embodiments is given. FIG. 1 is a diagram showing a teleconference system 1 according to an embodiment of the present disclosure. The teleconference system 1 realizes a teleconference (web conference). A teleconference is held using the communication terminals of a plurality of participants. The teleconference system 1 can be implemented by, for example, a computer. The teleconference system 1 may be implemented in each communication terminal of the participants in the teleconference, or may be implemented by a server or the like that manages the teleconference. The teleconference system 1 may also be implemented by a plurality of devices, such as a server and communication terminals.

The teleconference system 1 has an utterance determination unit 2, a voice output control unit 4, a counting unit 6, and a number display control unit 8. The utterance determination unit 2 functions as utterance determination means. The voice output control unit 4 functions as voice output control means. The counting unit 6 functions as counting means. The number display control unit 8 functions as number display control means.

FIG. 2 is a flowchart showing a teleconference method executed by the teleconference system 1 according to the embodiment of the present disclosure. The utterance determination unit 2 determines whether the voice of each of the plurality of participants in the teleconference represents an utterance or a backchannel response (step S12). The determination method is described in the subsequent embodiments. Here, an "utterance" is a voice (vocalization) corresponding to words (language) with meaningful content. A "backchannel response" (aizuchi), on the other hand, is a voice (vocalization) corresponding to words that carry no meaning in themselves. In this specification, "utterance" and "backchannel response" are used as contrasting terms.

The voice output control unit 4 performs control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants. When another participant makes an utterance while a certain participant among the plurality of participants is making an utterance, the voice output control unit 4 performs control so as to suppress output of the other participant's utterance (step S14). In other words, when an utterance collision occurs, the voice output control unit 4 suppresses output of the other participant's utterance. Hereinafter, the later utterance (the utterance that caused the utterance collision) may be referred to as a "colliding utterance". A colliding utterance is therefore an utterance whose output is suppressed. Suppressing the output of a colliding utterance means, for example, that the colliding utterance is not output at the participants' communication terminals, but is not limited to this.

Note that, in the present embodiment, an "utterance collision" means that another participant makes an utterance while a certain participant is making an utterance; it does not mean that the utterances of a plurality of participants are output simultaneously at each communication terminal. Keep in mind that, in the present embodiment, output of the later of the participants' utterances can be suppressed. Consequently, the occurrence of an utterance collision can be recognized by the participant who made the colliding utterance but may go unrecognized by the other participants. That is, the participant who made the colliding utterance spoke while another participant's utterance was being output at his or her own communication terminal, and can therefore tell that an utterance collision occurred. Because output of the colliding utterance is suppressed at each communication terminal, however, participants other than the one who made it may not recognize that an utterance collision occurred.

The counting unit 6 counts, for each participant, the number of utterances whose output has been suppressed (colliding utterances; first utterances) (step S16). The number display control unit 8 performs control so that a display relating to the number is presented on the communication terminals of the plurality of participants (step S18). This allows each participant to see, for example, which participant has caused many utterance collisions.
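The flow of steps S12 to S18 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in this specification: the class name Conference, the function is_utterance, the word list BACKCHANNELS, and the text-based events are all hypothetical. In particular, the word list is merely a stand-in for the determination of step S12, whose actual method is left to the later embodiments, and passing backchannel responses (aizuchi, 相槌) through unsuppressed is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-in for step S12; the real determination method
# is described in the later embodiments, not here.
BACKCHANNELS = {"uh-huh", "mm-hm", "yeah", "i see"}


def is_utterance(text: str) -> bool:
    """Return True for a meaningful utterance, False for a backchannel response."""
    return text.strip().lower() not in BACKCHANNELS


@dataclass
class Conference:
    # Participant whose utterance is currently being output, if any.
    current_speaker: Optional[str] = None
    # Per-participant number of suppressed (colliding) utterances.
    collision_counts: Dict[str, int] = field(default_factory=dict)

    def on_voice(self, participant: str, text: str) -> bool:
        """Handle one voice event; return True if it should be output."""
        self.collision_counts.setdefault(participant, 0)
        if not is_utterance(text):              # S12: backchannel response
            return True                         # assumption: passed through
        if self.current_speaker in (None, participant):
            self.current_speaker = participant  # nobody else is speaking
            return True
        # S14: another participant is speaking, so suppress this utterance.
        self.collision_counts[participant] += 1  # S16: count per participant
        return False

    def on_utterance_end(self, participant: str) -> None:
        """Release the floor when the current speaker finishes."""
        if self.current_speaker == participant:
            self.current_speaker = None

    def display_counts(self) -> str:
        """S18: text that could be shown on every participant's terminal."""
        return ", ".join(f"{p}: {n}" for p, n in sorted(self.collision_counts.items()))


conf = Conference()
conf.on_voice("B", "Let me report the schedule")  # output
conf.on_voice("A", "I have a question")           # suppressed, counted
print(conf.display_counts())
```

In this sketch the suppressed utterance is simply dropped; the specification notes that suppression is not limited to non-output, so other policies could replace the early return.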

Here, a participant with many colliding utterances (that is, one who has caused many utterance collisions) can be regarded as a participant who wants to speak. By displaying on the participants' communication terminals that a participant's number of colliding utterances is large, the other participants can recognize that this participant wants to speak. They can then act accordingly, for example by prompting that participant to speak or by waiting until that participant has spoken. This reduces the participant's dissatisfaction at wanting to speak but being unable to. The teleconference system 1 according to the present embodiment therefore enables the teleconference to proceed smoothly.

(Embodiment 1)
Hereinafter, embodiments will be described with reference to the drawings. For clarity of explanation, the following description and drawings are omitted and simplified as appropriate. In the drawings, identical elements are denoted by identical reference signs, and redundant description is omitted as necessary.

FIG. 3 is a diagram showing a teleconference system 20 according to Embodiment 1. The teleconference system 20 has a plurality of communication terminals 30 and a teleconference device 100. A communication terminal 30 may be provided for each participant in the teleconference. The plurality of communication terminals 30 and the teleconference device 100 are communicably connected to one another via a network 22. The network 22 may be wired, wireless, or a combination of wired and wireless. The network 22 may be the Internet or a LAN (Local Area Network).

The communication terminal 30 is, for example, a computer owned by a participant. The communication terminal 30 is, for example, a personal computer (PC) or a mobile terminal such as a smartphone or tablet terminal. When a participant takes part in a teleconference, the communication terminal 30 transmits voice data representing the voice (utterance or backchannel response) produced by the participant to the teleconference device 100 via the network 22. The communication terminal 30 also receives voice data representing the voices (utterances or backchannel responses) of the other participants from the teleconference device 100 via the network 22. The communication terminal 30 then outputs the voice corresponding to that voice data so that the participant who is the user of the communication terminal 30 can hear it.

The teleconference device 100 is, for example, a computer such as a server. The teleconference device 100 manages teleconferences. The teleconference device 100 receives voice data from each participant's communication terminal 30 and transmits it to the plurality of communication terminals 30. In doing so, the teleconference device 100 need not transmit voice data back to the communication terminal 30 that sent it (the same applies to the other embodiments). In Embodiment 1, the term "voice" can also mean "voice data representing a voice" as an object of information processing.
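The relay behavior described above, in which voice data need not be returned to the terminal that sent it, can be sketched as follows. The function name relay_targets, the parameter echo_to_sender, and the terminal identifiers are hypothetical names chosen for this illustration only.

```python
from typing import List


def relay_targets(sender: str, terminals: List[str],
                  echo_to_sender: bool = False) -> List[str]:
    """Return the terminals that should receive voice data sent by `sender`.

    By default the sending terminal is skipped, reflecting the statement
    that the device need not send voice data back to its source.
    """
    return [t for t in terminals if echo_to_sender or t != sender]


targets = relay_targets("terminal-A", ["terminal-A", "terminal-B", "terminal-C"])
print(targets)
```

Keeping the echo behavior as a flag mirrors the wording "need not": returning the data to the sender is optional rather than forbidden.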

FIG. 4 is a diagram showing the configuration of the communication terminal 30 according to Embodiment 1. As its main hardware configuration, the communication terminal 30 has a control unit 32, a storage unit 34, a communication unit 36, and an interface unit 38 (IF: Interface). The control unit 32, the storage unit 34, the communication unit 36, and the interface unit 38 are interconnected via a data bus or the like.

The control unit 32 is, for example, a processor such as a CPU (Central Processing Unit). The control unit 32 functions as an arithmetic device that performs control processing, arithmetic processing, and the like. The storage unit 34 is, for example, a storage device such as a memory or a hard disk. The storage unit 34 is, for example, a ROM (Read Only Memory) or a RAM (Random Access Memory). The storage unit 34 has a function of storing control programs, arithmetic programs, and the like executed by the control unit 32. The storage unit 34 also has a function of temporarily storing processing data and the like. The storage unit 34 may include a database.

The communication unit 36 performs processing necessary for communicating with the devices that constitute the teleconference system 20, such as the teleconference device 100. The communication unit 36 may include a communication port, a router, a firewall, and the like. The interface unit 38 is, for example, a user interface (UI). The interface unit 38 has input devices such as a keyboard, a touch panel, or a mouse, and output devices such as a display or a speaker. The interface unit 38 accepts data input operations by the user (operator) and outputs information to the user. As input devices, the interface unit 38 may also have a sound collecting device such as a microphone and an imaging device such as a camera. At least part of the interface unit 38 need not be physically integrated with the communication terminal 30. At least part of the interface unit 38 may be connected to the communication terminal 30 by wire or wirelessly.

The communication terminal 30 also has, as components, a voice acquisition unit 42, a voice transmission unit 44, a voice reception unit 46, a voice output unit 48, a display information reception unit 52, and an image display unit 54. The voice acquisition unit 42, the voice transmission unit 44, the voice reception unit 46, the voice output unit 48, the display information reception unit 52, and the image display unit 54 may be implemented by the hardware configuration described above or by software.

The voice acquisition unit 42 acquires the voice produced by the user of the communication terminal 30, who is a participant in the teleconference. The voice acquisition unit 42 may acquire the voice through a sound collecting device that is part of the interface unit 38. The voice transmission unit 44 transmits the acquired voice (voice data) of the user to the teleconference device 100 via the network 22. The voice transmission unit 44 may transmit the voice (voice data) through the communication unit 36.

The voice reception unit 46 receives the voices (voice data) of the plurality of participants in the teleconference from the teleconference device 100 via the network 22. The voice reception unit 46 may receive the voices (voice data) through the communication unit 36. The voice output unit 48 outputs the voices of the plurality of participants so that the user of the communication terminal 30 can hear them. The voice output unit 48 may output the voices through a speaker that is part of the interface unit 38.

The display information reception unit 52 receives display information from the teleconference device 100 via the network 22. Here, display information is information indicating what is to be displayed by the interface unit 38 of the communication terminal 30. Display information is described later. The display information reception unit 52 may receive the display information through the communication unit 36. The image display unit 54 displays an image corresponding to the received display information. The image display unit 54 may display the image on a display that is part of the interface unit 38.

FIG. 5 is a diagram showing the configuration of the teleconference device 100 according to Embodiment 1. As its main hardware configuration, the teleconference device 100 has a control unit 102, a storage unit 104, a communication unit 106, and an interface unit 108. The control unit 102, the storage unit 104, the communication unit 106, and the interface unit 108 are interconnected via a data bus or the like.

The control unit 102 is, for example, a processor such as a CPU. The control unit 102 functions as an arithmetic device that performs analysis processing, control processing, arithmetic processing, and the like. The storage unit 104 is, for example, a storage device such as a memory or a hard disk. The storage unit 104 is, for example, a ROM or a RAM. The storage unit 104 has a function of storing control programs, arithmetic programs, and the like executed by the control unit 102. The storage unit 104 also has a function of temporarily storing processing data and the like. The storage unit 104 may include a database.

The communication unit 106 performs processing necessary for communicating with other devices, such as the communication terminals 30, via the network 22. The communication unit 106 may include a communication port, a router, a firewall, and the like. The interface unit 108 is, for example, a user interface (UI). The interface unit 108 has input devices such as a keyboard, a touch panel, or a mouse, and output devices such as a display or a speaker. The interface unit 108 accepts data input operations by an operator and outputs information to the operator.

The teleconference device 100 according to Embodiment 1 has, as components, a participant information storage unit 110, a voice reception unit 112, an utterance determination unit 120, a voice output control unit 130, a number counting unit 140, and a display control unit 150. The voice output control unit 130 has an utterance collision determination unit 132 and an utterance output suppression unit 134. The display control unit 150 has a number display control unit 152 and an icon display control unit 154. Note that the teleconference device 100 need not be physically a single device. In that case, the components described above may be implemented by a plurality of physically separate devices.

The participant information storage unit 110 functions as participant information storage means. The voice reception unit 112 functions as voice reception means. The utterance determination unit 120 corresponds to the utterance determination unit 2 shown in FIG. 1 and functions as utterance determination means. The voice output control unit 130 corresponds to the voice output control unit 4 shown in FIG. 1 and functions as voice output control means. The number counting unit 140 corresponds to the counting unit 6 shown in FIG. 1 and functions as number counting means. The display control unit 150 functions as display control means.

The utterance collision determination unit 132 functions as utterance collision determination means. The utterance output suppression unit 134 functions as utterance output suppression means. The number display control unit 152 corresponds to the number display control unit 8 shown in FIG. 1 and functions as number display control means. The icon display control unit 154 functions as icon display control means.

Each of the components described above can be implemented, for example, by executing a program under the control of the control unit 102. More specifically, each component can be implemented by the control unit 102 executing a program stored in the storage unit 104. Each component may also be implemented by recording the necessary programs on any non-volatile recording medium and installing them as needed. Furthermore, each component is not limited to implementation in software by a program, and may be implemented by any combination of hardware, firmware, and software. Each component may also be implemented using a user-programmable integrated circuit such as an FPGA (field-programmable gate array) or a microcomputer. In that case, the integrated circuit may be used to implement a program composed of the components described above. The same applies to the other embodiments described later.

The participant information storage unit 110 stores participant information, which is information about the participants in the teleconference.
FIG. 6 illustrates participant information according to the first embodiment, for a teleconference in which four participants A to D take part. The participant information includes, for each participant, identification information, the participation state, and the number of collisions.

Here, the "participation state" indicates how each participant is currently taking part in the teleconference. The participation state is determined by the utterance determination unit 120 and the speech collision determination unit 132, which are described later. In the example of FIG. 6, participant A has started speaking while participant B is speaking. In other words, participant A has caused a speech collision. Therefore, the participation state of participant A is "speech collision" and that of participant B is "speaking". In addition, participant C is giving a back-channel response (a brief acknowledgment such as "uh-huh"), and participant D is not producing any voice. Therefore, the participation state of participant C is "back-channel" and that of participant D is "no voice".

The "number of collisions" indicates how many times each participant has caused a speech collision, that is, the number of colliding utterances made by each participant. The number of collisions is counted by the number counting unit 140, which is described later. In the example of FIG. 6, the number of collisions of participant A is 1. As described above, participant A has just caused a speech collision, so the count has been updated from 0 to 1. The number of collisions is 2 for participant B, 1 for participant C, and 0 for participant D.
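As a minimal sketch, the participant information of FIG. 6 could be held in a structure like the following. The class names, field names, and state labels are hypothetical and chosen only for illustration; the specification does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from enum import Enum


class ParticipationState(Enum):
    """Participation states named in the description of FIG. 6."""
    SPEAKING = "speaking"
    SPEECH_COLLISION = "speech collision"
    BACK_CHANNEL = "back-channel"
    NO_VOICE = "no voice"


@dataclass
class ParticipantInfo:
    """One row of the participant information (hypothetical layout)."""
    participant_id: str
    state: ParticipationState = ParticipationState.NO_VOICE
    collision_count: int = 0


def fig6_example() -> dict:
    """Return the four rows described for FIG. 6, keyed by participant ID."""
    rows = [
        ParticipantInfo("A", ParticipationState.SPEECH_COLLISION, 1),
        ParticipantInfo("B", ParticipationState.SPEAKING, 2),
        ParticipantInfo("C", ParticipationState.BACK_CHANNEL, 1),
        ParticipantInfo("D", ParticipationState.NO_VOICE, 0),
    ]
    return {row.participant_id: row for row in rows}
```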

The voice receiving unit 112 receives, from each communication terminal 30 via the network 22, the voice (voice data) of the participant who is the user of that communication terminal 30. Using the communication unit 106, the voice receiving unit 112 receives the participant's voice (voice data) transmitted by the voice transmitting unit 44 of the communication terminal 30. The voices of participants A to D are thereby received.

For each of the plurality of participants, the utterance determination unit 120 analyzes the voice received by the voice receiving unit 112 and performs voice recognition processing. The utterance determination unit 120 then determines whether each participant's voice represents an utterance or a back-channel response. In other words, the utterance determination unit 120 determines whether each participant is making an utterance or merely giving a back-channel response.

Specifically, the utterance determination unit 120 performs processing such as acoustic analysis and natural language processing to analyze the words contained in the voice. The utterance determination unit 120 then determines whether the voice contains meaningful words (a subject, predicate, object, and so on). Put differently, it determines whether the voice contains any words other than words that carry no meaning (such as interjections). If the voice contains meaningful words, the utterance determination unit 120 determines that the voice is an "utterance". If the voice contains only words that carry no meaning (such as interjections), the utterance determination unit 120 determines that the voice is a "back-channel response". The utterance determination unit 120 may also determine whether the received voice contains a human voice. If it does not, the utterance determination unit 120 may treat the voice as background sound and skip the utterance or back-channel determination described above.
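The determination above can be sketched as follows, assuming that speech recognition has already produced a list of words and a flag indicating whether a human voice was detected. The interjection list, the function name, and the returned labels are hypothetical; the specification does not state how meaningful words are identified.

```python
# Hypothetical list of words treated as carrying no meaning (interjections).
INTERJECTIONS = {"uh-huh", "mm-hm", "yeah", "hmm", "oh", "ah"}


def classify_voice(words, contains_human_voice=True):
    """Classify a recognized voice segment.

    Returns "background" when no human voice was detected, "utterance"
    when at least one word outside INTERJECTIONS is present, and
    "back-channel" otherwise.
    """
    if not contains_human_voice:
        # Background sound: the utterance/back-channel decision is skipped.
        return "background"
    meaningful = [w for w in words if w.lower() not in INTERJECTIONS]
    return "utterance" if meaningful else "back-channel"
```

A real system would need a language-appropriate notion of "meaningful word"; a fixed word list is used here only to keep the sketch self-contained.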

The audio output control unit 130 performs control so that the voice of each participant is output at the communication terminal 30 of each participant. Specifically, the audio output control unit 130 uses the communication unit 106 to transmit the received voice (voice data) via the network 22 to the communication terminals 30 of the participants. The voice is then output by the voice output unit 48 of each communication terminal 30, so participants A to D can hear the voices of the other participants. The audio output control unit 130 may also perform mixing processing so that, when several participants produce voice at the same time, none of the voices is cut off. In the present embodiment, however, when a speech collision occurs, output of the voice that caused the collision is suppressed, as described later. In contrast, when a voice corresponds to a back-channel response, the audio output control unit 130 transmits that voice to the communication terminals 30 of the participants, and the voice output unit 48 of each communication terminal 30 outputs the back-channel response.

The speech collision determination unit 132 determines, for each of the plurality of participants, whether a speech collision has occurred. Specifically, when the utterance determination unit 120 determines that a certain participant is making an utterance, the speech collision determination unit 132 determines whether another participant has started an utterance during the period from the start to the end of that participant's utterance. If another participant starts an utterance while a certain participant is speaking, the speech collision determination unit 132 determines that the other participant (the one who spoke later) has caused a speech collision. The utterance of the participant who caused the speech collision is referred to as a colliding utterance. In the example of FIG. 6, participant A started speaking while participant B was speaking, so the speech collision determination unit 132 determines that participant A has caused a speech collision and that participant A's utterance is a colliding utterance.
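The collision rule above can be sketched as an interval check, assuming each utterance carries a start time and, once finished, an end time. The `Utterance` type and the function name are hypothetical. How simultaneous starts are handled is not stated in the specification; this sketch treats an equal start time as falling inside the other utterance's period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    participant: str
    start: float                 # seconds since the conference began
    end: Optional[float] = None  # None while the utterance is in progress


def causes_collision(new, ongoing):
    """Return True if `new` starts during another participant's utterance."""
    for other in ongoing:
        if other.participant == new.participant:
            continue  # a participant cannot collide with themselves
        started_before = other.start <= new.start
        not_yet_ended = other.end is None or new.start < other.end
        if started_before and not_yet_ended:
            return True
    return False
```

With participant B speaking from 0 s to 10 s, an utterance by participant A starting at 4 s is a colliding utterance, while one starting at 12 s is not.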

The speech output suppression unit 134 performs control to suppress the output of a colliding utterance. Specifically, the speech output suppression unit 134 performs control so that the colliding utterance (voice data) is not transmitted to the communication terminals 30 of the participants. Because the communication terminals 30 do not receive the colliding utterance, they do not output it. In the example of FIG. 6, participant A's utterance (the colliding utterance) therefore does not interfere with listening to participant B's utterance at any communication terminal 30. Alternatively, the speech output suppression unit 134 may perform control so that the colliding utterance is output at a low volume at each communication terminal 30. For example, the speech output suppression unit 134 may process the voice data of the colliding utterance so that its volume is reduced to a level that does not hinder listening to the earlier utterance that was collided with (participant B's utterance in the example of FIG. 6). The audio output control unit 130 may then transmit the processed voice data to each communication terminal 30. In the example of FIG. 6, each communication terminal 30 then outputs participant A's utterance at a very low volume that does not interfere with listening to participant B's utterance.
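The two suppression options above (not transmitting, or transmitting at reduced volume) can be sketched as follows, assuming the voice data is a list of floating-point samples. The function name, the `mode` parameter, and the default gain are hypothetical; the specification does not give a concrete attenuation level.

```python
def suppress_colliding_utterance(samples, mode="drop", gain=0.05):
    """Suppress a colliding utterance.

    mode="drop":      return None, meaning nothing is transmitted.
    mode="attenuate": return the samples scaled by `gain` (0 < gain < 1).
    """
    if mode == "drop":
        return None
    if mode == "attenuate":
        if not 0.0 < gain < 1.0:
            raise ValueError("gain must be between 0 and 1 (exclusive)")
        return [s * gain for s in samples]
    raise ValueError(f"unknown mode: {mode}")
```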

The number counting unit 140 counts, for each of the plurality of participants, the number of times a speech collision has occurred. In other words, the number counting unit 140 counts the number of colliding utterances for each participant (communication terminal 30). The number of collisions illustrated in FIG. 6 is counted in this way.

The display control unit 150 controls, for each of the plurality of participants, what image is displayed on each communication terminal 30. Specifically, the display control unit 150 generates display information indicating the image to be displayed on each communication terminal 30 and transmits the generated display information to each communication terminal 30. The display control unit 150 may generate the display information according to the participant information stored in the participant information storage unit 110. The display control unit 150 may also transmit, to the communication terminal 30 of a participant who has caused a speech collision, display information that causes a message to be displayed stating that another participant is speaking. Further, the display control unit 150 may generate display information that includes the participant information and an instruction to display according to that participant information. In this case, the communication terminal 30 generates the image to be displayed on the interface unit 28 of the communication terminal 30 according to the display information.

The number display control unit 152 performs control so that the number of colliding utterances of each participant is displayed on each communication terminal 30. Specifically, the number display control unit 152 generates display information indicating the number of collisions of each participant. The display control unit 150 then transmits that display information to the communication terminals 30, and the number of collisions of each participant is displayed on them. In the example of FIG. 6, the communication terminal 30 of each of participants A to D displays that the number of collisions is 1 for participant A, 2 for participant B, 1 for participant C, and 0 for participant D. Each participant can thus see the number of collisions of all participants and can therefore grasp which participant wants to speak.

The number display control unit 152 may cause a number of collisions greater than a predetermined threshold to be displayed in a form that stands out more than numbers of collisions equal to or less than the threshold. That is, when the number of collisions of a certain participant exceeds the predetermined threshold, the number display control unit 152 may display that number in a form more conspicuous than the numbers of the other participants. The number display control unit 152 generates display information including an instruction to display the number of collisions in that form. For example, it may display numbers of collisions at or below the threshold in black and numbers exceeding the threshold in red. Each participant can thereby grasp more reliably which participant wants to speak.

The number display control unit 152 may also cause each communication terminal 30 to display the largest number of collisions among the participants in a form that stands out more than the other numbers of collisions. The number display control unit 152 generates display information including an instruction to display the number of collisions in that form. For example, it may display the largest number of collisions in red and the other numbers in black. Each participant can thereby reliably see which participant has more collisions than the others and hence which participant, relatively speaking, more strongly wants to speak.

The number display control unit 152 may also cause each communication terminal 30 to display a number of collisions that is markedly larger than the others in a form that stands out more than the other numbers of collisions. For example, the number display control unit 152 subtracts the number of collisions of each of the other participants from the number of collisions of a first participant among the plurality of participants. If every value obtained by the subtraction is greater than a predetermined threshold, the number display control unit 152 may display the first participant's number of collisions in a form more conspicuous than those of the other participants. The number display control unit 152 generates display information including an instruction to display the number of collisions in that form. For example, it may display the first participant's number of collisions in red and those of the other participants in black. Each participant can thereby reliably see which participant's number of collisions is markedly larger than those of the other participants and hence, more reliably, which participant relatively more strongly wants to speak.
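The three highlighting rules described above (exceeding a threshold, having the largest count, and exceeding every other count by more than a threshold) can be sketched as follows. Each function returns the set of participants whose count would be shown conspicuously, for example in red. The function names are hypothetical, and the handling of ties and of an all-zero table is an assumption that the specification does not address.

```python
def over_threshold(counts, threshold):
    """Participants whose collision count exceeds a fixed threshold."""
    return {p for p, c in counts.items() if c > threshold}


def largest_count(counts):
    """Participants holding the largest collision count.

    Assumption: tied participants are all highlighted, and nobody is
    highlighted while every count is still zero.
    """
    if not counts:
        return set()
    top = max(counts.values())
    if top == 0:
        return set()
    return {p for p, c in counts.items() if c == top}


def markedly_larger(counts, margin):
    """Participants whose count exceeds every other count by more than `margin`."""
    return {
        p
        for p, c in counts.items()
        if all(c - other > margin for q, other in counts.items() if q != p)
    }
```

With the counts of FIG. 6 (A: 1, B: 2, C: 1, D: 0), a threshold of 1 highlights only participant B under the first rule, and participant B also holds the largest count.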

The icon display control unit 154 performs control so that a face icon corresponding to each participant is displayed on the communication terminal 30 of each participant. The icon display control unit 154 generates display information including an instruction to display the face icons. In the example of FIG. 6, four face icons corresponding to participants A to D are displayed on the communication terminal 30.

Here, the icon display control unit 154 may generate the display information so that each face icon moves according to the participation state of the corresponding participant. Specifically, the icon display control unit 154 may display the face icon of a participant who made a colliding utterance without moving it. In contrast, it may display the face icon of a participant who made an utterance other than a colliding utterance so that the icon moves. It may likewise display the face icon of a participant who gave a back-channel response so that the icon moves.

For example, the icon display control unit 154 may display the face icon of a participant who is not vocalizing, that is, neither making an utterance nor giving a back-channel response (participant D in the example of FIG. 6), with its mouth closed. The icon display control unit 154 may display the face icon of a participant who made an utterance other than a colliding utterance (participant B in the example of FIG. 6) with its mouth open, or with its mouth opening and closing. It may likewise display the face icon of a participant who gave a back-channel response (participant C in the example of FIG. 6) with its mouth open, or with its mouth opening and closing. In contrast, the icon display control unit 154 may display the face icon of a participant who made a colliding utterance (participant A in the example of FIG. 6) with its mouth kept closed.
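The mapping from participation state to face-icon behavior described above can be sketched as a simple lookup. The state labels, the returned strings, and the `animated` option are hypothetical names used only for illustration.

```python
def mouth_state(participation_state, animated=False):
    """Return how the face icon's mouth is drawn for a participation state.

    Speaking and back-channel icons move ("open", or "open-close" when
    `animated` is True); colliding and silent icons stay "closed".
    """
    moving_states = {"speaking", "back-channel"}
    still_states = {"speech collision", "no voice"}
    if participation_state in moving_states:
        return "open-close" if animated else "open"
    if participation_state in still_states:
        return "closed"
    raise ValueError(f"unknown participation state: {participation_state}")
```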

Each participant can thus look at the face icons displayed on the communication terminal 30 and grasp which participant is speaking. Each participant can also tell that a participant has given a back-channel response even when the communication terminal 30 of that participant is set to mute. Furthermore, because the face icon of a participant who caused a speech collision does not move, the other participants are less likely to be bothered by the speech collision.

FIG. 7 is a flowchart showing the teleconference method executed by the teleconference system 20 according to the first embodiment. The processing shown in FIG. 7 is executed mainly by the teleconference device 100. The teleconference device 100 starts a teleconference (step S102). At this point, the display information generated by the display control unit 150 indicates, for every participant, that the mouth of the face icon is closed (the face icon is not moving) and that the number of collisions is 0.

Next, the voice receiving unit 112 receives the voice of participant X (step S104). When participants A to D are taking part in the teleconference as in FIG. 6, participant X (and participant Y, described later) is one of participants A to D. The utterance determination unit 120 then determines, as described above, whether the voice of participant X represents an utterance or a back-channel response (step S106). If the voice of participant X does not represent an utterance, that is, it represents a back-channel response (NO in step S108), the audio output control unit 130 performs control so that the back-channel response of participant X is output at each communication terminal 30 (step S112). The display control unit 150 (icon display control unit 154) also performs control so that the face icon of participant X is displayed in motion on each communication terminal 30 (step S114).

If, on the other hand, the voice of participant X represents an utterance (YES in S108), the speech collision determination unit 132 determines whether a participant Y other than participant X is already speaking (step S120). If no participant Y is speaking (NO in S120), nobody else was speaking when participant X spoke, so no speech collision has occurred. The audio output control unit 130 therefore performs control so that the utterance of participant X is output at each communication terminal 30 (step S122). The display control unit 150 (icon display control unit 154) also performs control so that the face icon of participant X is displayed in motion on each communication terminal 30 (step S124). At this time, the display control unit 150 may perform control so that a message indicating that participant X is speaking is displayed on each communication terminal 30.

If, on the other hand, participant Y is speaking (YES in S120), the utterance of participant X has caused a speech collision. The audio output control unit 130 (speech output suppression unit 134) therefore performs control to suppress the output of participant X's utterance (step S132). The number counting unit 140 increments the number of collisions of participant X by one (step S134), which updates the number of collisions of participant X in the participant information stored in the participant information storage unit 110. The display control unit 150 (number display control unit 152) performs control so that the displayed number of collisions of participant X is updated (step S136). The display control unit 150 also performs control so that a notice stating that another participant is speaking is displayed on the communication terminal 30 of participant X (step S138).
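The branching of FIG. 7 (steps S104 to S138) can be sketched as a single handler. This is only an illustration under stated assumptions: the `Conference` class, the action strings, and the `end_utterance` method are hypothetical, the voice is assumed to be already classified as "utterance" or "back-channel", and the end of an utterance (which the flowchart description does not cover) is signaled explicitly by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Conference:
    collision_counts: dict = field(default_factory=dict)
    speaking: set = field(default_factory=set)  # participants currently speaking

    def handle_voice(self, x, kind):
        """Process a voice from participant x and return the resulting actions."""
        if kind != "utterance":                      # S108: NO (back-channel)
            return ["output_back_channel",           # S112
                    "animate_icon"]                  # S114
        if not (self.speaking - {x}):                # S120: NO (nobody else speaking)
            self.speaking.add(x)
            return ["output_utterance",              # S122
                    "animate_icon"]                  # S124
        # S120: YES, so x has caused a speech collision.
        self.collision_counts[x] = self.collision_counts.get(x, 0) + 1  # S134
        return ["suppress_output",                   # S132
                "update_count_display",              # S136
                "notify_other_participant_speaking"] # S138

    def end_utterance(self, x):
        """Hypothetical hook: mark that participant x has finished speaking."""
        self.speaking.discard(x)
```

For example, if participant B starts speaking and participant A then starts an utterance before B finishes, the handler returns the suppression actions for A and A's collision count becomes 1.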

(Embodiment 2)
Next, Embodiment 2 will be described with reference to the drawings. For clarity of explanation, the following description and drawings are omitted and simplified as appropriate. In the drawings, the same elements are denoted by the same reference numerals, and redundant description is omitted as necessary. Embodiment 2 differs from Embodiment 1 in that the functions of the teleconference device 100 according to Embodiment 1 can be realized in each communication terminal.

FIG. 8 is a diagram showing a teleconference system 200 according to the second embodiment. The teleconference system 200 has a plurality of communication terminals 201A to 201D and a conference server 220. The communication terminals 201A to 201D are connected to a network such as the Internet. The communication terminals 201A to 201D and the conference server 220 are communicably connected to one another via the network. Although FIG. 8 shows four communication terminals 201, the number of communication terminals 201 may be any number equal to or greater than two.

Each of the communication terminals 201A to 201D has a conference execution system 202, a camera 203, a microphone 204, a display 205, and a speaker 206. The conference execution system 202 functions to carry out the teleconference. The camera 203 can capture the appearance (such as the face) of the user of the communication terminal 201. The microphone 204 can pick up the voice of the user of the communication terminal 201. The display 205 can display images relating to the teleconference. The speaker 206 can output the voices of the teleconference participants (the users of the communication terminals 201A to 201D).

The conference execution system 202 has, as components, an utterance state detection unit 207, a conference information reception unit 208, a conference control unit 209, and a conference information transmission unit 210. Each communication terminal 201 may have the hardware configuration of the communication terminal 30 according to the first embodiment described above. The components of the communication terminal 201 are described later.

The communication terminal 201 transmits voice information representing the voice of its user to the conference server 220. The communication terminal 201 also detects the user's utterance state and transmits utterance state information indicating the detected utterance state to the conference server 220. Here, the "utterance state" indicates whether each participant is making an utterance or giving a back-channel response. The utterance state may also indicate that the participant is silent.

On receiving the voice information and the utterance state information from each communication terminal 201, the conference server 220 performs mixing processing on the voice information of the users (the teleconference participants). The conference server 220 then transmits the mixed voice information and the utterance state information to the communication terminals 201. Because mixed voice information is transmitted, voice can be output stably from the speaker 206 of each communication terminal 201.
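The specification does not describe how the mixing processing is performed. As one common approach, offered only as an assumption, the sample streams of the users can be summed sample by sample and clipped to the valid range. The function name and the use of floating-point samples in the range -1.0 to 1.0 are hypothetical.

```python
def mix(streams):
    """Mix several sample streams by summation with clipping to [-1.0, 1.0].

    Streams may differ in length; a stream that has ended contributes
    nothing to the remaining samples.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```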

FIG. 9 is a diagram illustrating how utterance state information is transmitted and received in the teleconference system 200 according to the second embodiment. The communication terminal 201A (communication terminal A) transmits the utterance state information of its user A to the conference server 220. The communication terminal 201B (communication terminal B) transmits the utterance state information of its user B to the conference server 220. The communication terminal 201C (communication terminal C) transmits the utterance state information of its user C to the conference server 220. The communication terminal 201D (communication terminal D) transmits the utterance state information of its user D to the conference server 220.

The communication terminal 201A receives the utterance state information of all users (users A to D) from the conference server 220. Similarly, the communication terminals 201B to 201D receive the utterance state information of all users (users A to D) from the conference server 220. Alternatively, each communication terminal 201 may receive from the conference server 220 the utterance state information of everyone except its own user. For example, the communication terminal 201A may receive the utterance state information of users B to D from the conference server 220.
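The two distribution options above (sending every user's utterance state, or every state except the recipient's own) can be sketched as follows. The function name, the `include_self` flag, and the state labels are hypothetical.

```python
def states_for_terminal(all_states, recipient, include_self=True):
    """Select the utterance state information sent to one terminal.

    all_states maps each user to an utterance state; `recipient` is the
    user of the receiving terminal.
    """
    if include_self:
        return dict(all_states)
    return {user: state for user, state in all_states.items() if user != recipient}
```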

FIG. 10 is a block diagram showing the configuration of the speech state detection unit 207 according to the second embodiment. The speech state detection unit 207 corresponds to the speech determination unit 2 shown in FIG. 1 and the speech determination unit 120 shown in FIG. 5. In other words, the speech state detection unit 207 functions as speech determination means. The speech state detection unit 207 has a voice input unit 222, a voice detection unit 223, a language recognition unit 224, and a speech presence/absence determination unit 225.

The voice input unit 222 receives the audio signal collected by the microphone 204 (the audio signal of the user of the communication terminal 201). The voice detection unit 223 detects voice information from the audio signal. The language recognition unit 224 performs speech recognition processing, acoustic analysis, natural language processing, and the like, and recognizes meaningful language (subject, predicate, object, etc.) in the voice information.

The speech presence/absence determination unit 225 determines whether the voice information corresponds to an utterance or to a backchannel response (a short acknowledgment such as "uh-huh"). When language (meaningful words) is recognized in the voice information, the speech presence/absence determination unit 225 determines that the voice information corresponds to an utterance. When no language is recognized in the voice information, the speech presence/absence determination unit 225 determines that the voice information corresponds to a backchannel response. When no human voice is recognized in the voice information, the speech presence/absence determination unit 225 may determine that the voice information corresponds to "silence" (a state in which the user is neither speaking nor giving a backchannel response). The speech state detection unit 207 generates speech state information according to the determination result of the speech presence/absence determination unit 225. The speech state information may instead be generated by the conference control unit 209.
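The three-way determination described above can be illustrated with a short sketch. It assumes that voice detection and language recognition have already run and report, respectively, a boolean and a list of recognized words; `SpeechState` and `classify` are hypothetical names, and the recognition itself is not modeled.

```python
from enum import Enum
from typing import List


class SpeechState(Enum):
    SILENT = "silent"            # no human voice detected
    BACKCHANNEL = "backchannel"  # voice detected, no meaningful language
    UTTERANCE = "utterance"      # voice detected, meaningful language recognized


def classify(voice_detected: bool, recognized_words: List[str]) -> SpeechState:
    """Map detection results to a speech state, mirroring unit 225's rule."""
    if not voice_detected:
        return SpeechState.SILENT
    if recognized_words:
        return SpeechState.UTTERANCE
    return SpeechState.BACKCHANNEL


if __name__ == "__main__":
    print(classify(True, []))
    print(classify(True, ["I", "agree"]))
```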

The conference information receiving unit 208 and the conference information transmitting unit 210 are connected to the conference server 220 via a network. The conference information receiving unit 208 receives the conference information of the users of the communication terminals 201A to 201D from the conference server 220. The conference information transmitting unit 210 transmits the conference information of the user of its own communication terminal 201 to the conference server 220. For example, the communication terminal 201A transmits user A's conference information to the conference server 220.

FIG. 11 is a diagram illustrating conference information according to the second embodiment. The conference information includes face icon display information, speech state information, voice information, and collision count information. The conference information may also include identification information of the corresponding user (communication terminal 201). The face icon display information indicates how the face icon of the corresponding user is to be displayed. The collision count information indicates the number of collisions of the corresponding user. The conference information transmitted by the conference information transmitting unit 210 does not necessarily include all of the information shown in FIG. 11. Likewise, the conference information received by the conference information receiving unit 208 does not necessarily include all of the information shown in FIG. 11.
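As a rough illustration of the conference information of FIG. 11, the record below carries the four kinds of information plus a user identifier, with every field other than the identifier optional, since a given message need not include all of them. The class name, field names, and value encodings are assumptions made for this sketch and do not come from the description.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ConferenceInfo:
    user_id: str                            # identification of the user/terminal
    face_icon: Optional[str] = None         # e.g. "mouth_open" or "mouth_closed"
    speech_state: Optional[str] = None      # e.g. "silent", "backchannel", "utterance"
    voice: Optional[bytes] = None           # encoded voice information, if included
    collision_count: Optional[int] = None   # number of speech collisions so far

    def to_message(self) -> dict:
        """Return only the fields that are actually present in this message."""
        return {key: value for key, value in asdict(self).items() if value is not None}


if __name__ == "__main__":
    info = ConferenceInfo("A", face_icon="mouth_closed", speech_state="silent")
    print(info.to_message())
```

Omitting absent fields mirrors the statement that transmitted or received conference information does not necessarily contain everything shown in FIG. 11.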

The conference control unit 209 generates the conference information to be transmitted by the conference information transmitting unit 210. In other words, the conference control unit 209 decides which of the items illustrated in FIG. 11 are transmitted as conference information. To do so, the conference control unit 209 uses the conference information received by the conference information receiving unit 208. The conference control unit 209 also uses the received conference information to display the conference image on the display 205 and to output audio from the speaker 206.

FIG. 12 is a diagram showing the configuration of the conference control unit 209 according to the second embodiment. The conference control unit 209 has an audio output control unit 211, a count unit 215, and a display control unit 216. The audio output control unit 211 has a speech collision determination unit 212 and a speech output suppression unit 214. The display control unit 216 has a count display control unit 217 and an icon display control unit 218. The conference control unit 209 may be configured to perform, only for the user of its own communication terminal 201, the processing that the teleconference device 100 according to the first embodiment performs for each participant.

The audio output control unit 211 corresponds to the audio output control unit 4 shown in FIG. 1 and the audio output control unit 130 shown in FIG. 5, and functions as audio output control means. The speech collision determination unit 212 corresponds to the speech collision determination unit 132 shown in FIG. 5, and functions as speech collision determination means. The speech output suppression unit 214 corresponds to the speech output suppression unit 134 shown in FIG. 5, and functions as speech output suppression means. The count unit 215 corresponds to the count unit 6 shown in FIG. 1 and the count unit 140 shown in FIG. 5, and functions as counting means. The display control unit 216 corresponds to the display control unit 150 shown in FIG. 5, and functions as display control means. The count display control unit 217 corresponds to the count display control unit 8 shown in FIG. 1 and the count display control unit 152 shown in FIG. 5, and functions as count display control means. The icon display control unit 218 corresponds to the icon display control unit 154 shown in FIG. 5, and functions as icon display control means.

The audio output control unit 211 performs control so that the voice of each of the participants in the teleconference is output at its own communication terminal 201. The audio output control unit 211 also performs control so that the voice of the user of its own communication terminal 201 is output at the communication terminals 201 (first communication terminals) of the respective participants. For example, in the communication terminal 201A, the audio output control unit 211 performs control so that user A's voice is output at the communication terminals 201 of the respective participants. The audio output control unit 211 may have substantially the same functions as the audio output control unit 130.

The speech collision determination unit 212 determines whether a speech collision has occurred for the user of its own communication terminal 201. For example, in the communication terminal 201A, the speech collision determination unit 212 determines whether user A's utterance has caused a speech collision. Using the conference information about the other users received by the conference information receiving unit 208, the speech collision determination unit 212 determines whether user A's utterance was made while another user was speaking. The speech collision determination unit 212 may have substantially the same functions as the speech collision determination unit 132.
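A minimal sketch of this determination follows, assuming the received conference information has been reduced to a mapping from user to speech state label. The function name and the labels are hypothetical. How a colliding utterance by another user should be treated is not stated in this passage, so the sketch counts only the plain "utterance" state.

```python
from typing import Dict


def causes_collision(own_user: str, received_states: Dict[str, str]) -> bool:
    """Return True if the own user's new utterance overlaps another user's utterance.

    Backchannel responses and silence from other users do not count.
    """
    return any(
        state == "utterance"
        for user, state in received_states.items()
        if user != own_user
    )


if __name__ == "__main__":
    print(causes_collision("A", {"A": "utterance", "B": "utterance", "C": "backchannel"}))
```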

When the user of its own communication terminal 201 produces a colliding utterance, the speech output suppression unit 214 performs control to suppress output of the colliding utterance at the communication terminals 201 (first communication terminals) of the respective participants. For example, in the communication terminal 201A, when user A produces a colliding utterance, the speech output suppression unit 214 performs control to suppress output of that colliding utterance at the communication terminals 201 (first communication terminals) of the respective participants. The speech output suppression unit 214 may have substantially the same functions as the speech output suppression unit 134.

The count unit 215 counts the number of speech collisions that have occurred for the user of its own communication terminal 201. For example, in the communication terminal 201A, the count unit 215 counts the number of speech collisions that have occurred for user A. The count unit 215 may have substantially the same functions as the count unit 140.

The display control unit 216 controls what image is displayed, for the user of its own communication terminal 201, at the communication terminals 201 (first communication terminals) of the respective participants. For example, in the communication terminal 201A, the display control unit 216 controls what image of user A is displayed at the communication terminals 201 (first communication terminals) of the respective participants. The display control unit 216 may have substantially the same functions as the display control unit 150.

The count display control unit 217 performs control so that the number of colliding utterances by the user of its own communication terminal 201 is displayed at the communication terminals 201 (first communication terminals) of the respective participants. For example, in the communication terminal 201A, the count display control unit 217 performs control so that the number of user A's colliding utterances is displayed at the communication terminals 201 (first communication terminals) of the respective participants. The count display control unit 217 may have substantially the same functions as the count display control unit 152.

The icon display control unit 218 performs control so that the face icon corresponding to the user of its own communication terminal 201 is displayed at the communication terminals 201 (first communication terminals) of the respective participants. For example, in the communication terminal 201A, the icon display control unit 218 performs control so that the face icon corresponding to user A is displayed at the communication terminals 201 (first communication terminals) of the respective participants. The icon display control unit 218 may have substantially the same functions as the icon display control unit 154.

FIG. 13 is a flowchart showing the teleconference method executed in the teleconference system 200 according to the second embodiment. The teleconference method of FIG. 13 is executed mainly by the conference execution system 202 of each communication terminal 201. The following description refers to the processing of the communication terminal 201A where appropriate, but the same applies to the other communication terminals 201.

First, the conference execution system 202 is started (step S201). At this point, the collision count of every participant in the teleconference is zero, and the face icon of every participant has its mouth closed. The speech state detection unit 207 (voice input unit 222) then receives an audio signal from the microphone 204 of the communication terminal 201A (step S202). The voice detection unit 223 determines whether user A's voice is present (step S203).

If it is determined that user A's voice is not present (NO in S203), the conference control unit 209 generates conference information for user A corresponding to this determination and transmits it to the conference server 220 (step S204). The processing flow then returns to S202. Specifically, the conference control unit 209 generates conference information that includes speech state information indicating silence and face icon display information indicating a face icon whose mouth is not open, and transmits it to the conference server 220. The conference server 220 transmits this conference information to the communication terminals 201A to 201D. As a result, user A's face icon with its mouth not open is displayed on the display 205 of each communication terminal 201. Because the conference information contains no voice information, user A's voice is not output from the speaker 206 of any communication terminal 201. Examples of face icons are described later.

In the processing of S204, the speech state detection unit 207 generates speech state information indicating silence. The icon display control unit 218 of the display control unit 216 generates face icon display information indicating a face icon whose mouth is not open. The audio output control unit 211 decides not to include voice information in the conference information. The conference information may include collision count information indicating that the collision count is zero. In this case, the count display control unit 217 may generate collision count information indicating that the collision count has not increased.

On the other hand, if it is determined that user A's voice is present (YES in S203), the language recognition unit 224 performs the language recognition described above (step S205). The speech presence/absence determination unit 225 then determines whether the voice information contains language (step S206), that is, whether language was recognized in the voice information. If there is no language (NO in S206), the speech presence/absence determination unit 225 determines that user A's voice information corresponds to a backchannel response.

The conference control unit 209 then generates conference information for user A corresponding to this determination and transmits it to the conference server 220 (step S207). The processing flow then returns to S202. Specifically, the conference control unit 209 generates conference information that includes speech state information indicating a backchannel response, face icon display information indicating a face icon with an open mouth, and voice information, and transmits it to the conference server 220. The conference server 220 transmits this conference information to the communication terminals 201A to 201D. As a result, user A's face icon with an open mouth is displayed on the display 205 of each communication terminal 201, and user A's voice (the backchannel response) is output from the speaker 206 of each communication terminal 201.

In the processing of S207, the speech state detection unit 207 generates speech state information indicating a backchannel response. The icon display control unit 218 of the display control unit 216 generates face icon display information indicating a face icon with an open mouth. The audio output control unit 211 decides to include voice information in the conference information. The conference information may include collision count information indicating that the collision count has not increased. In this case, the count display control unit 217 may generate collision count information indicating that the collision count has not increased.

On the other hand, if there is language (YES in S206), the speech presence/absence determination unit 225 determines that user A's voice information contains an utterance (step S208). The speech collision determination unit 212 of the conference control unit 209 then determines whether there is no utterance from any other user (step S209). In other words, using the received conference information of the other users (voice information and speech state information), the speech collision determination unit 212 determines whether another user was already speaking before user A's utterance. Put another way, the speech collision determination unit 212 determines whether user A's utterance has caused a speech collision.

If there is no utterance from another user (YES in S209), the conference control unit 209 determines that user A's utterance has not caused a speech collision. The conference control unit 209 then generates conference information for user A corresponding to this determination and transmits it to the conference server 220 (step S210). The processing flow then returns to S202. Specifically, the conference control unit 209 generates conference information that includes speech state information indicating an utterance, face icon display information indicating a face icon with an open mouth, and voice information, and transmits it to the conference server 220. The conference server 220 transmits this conference information to the communication terminals 201A to 201D. As a result, user A's face icon with an open mouth is displayed on the display 205 of each communication terminal 201, and user A's voice (the utterance) is output from the speaker 206 of each communication terminal 201. The conference information may also include display information indicating that user A is speaking. In that case, a message indicating that user A is speaking is displayed on the display 205 of each communication terminal 201. Because each user can then tell who is speaking, preparing the minutes becomes easier.

In the processing of S210, the speech state detection unit 207 generates speech state information indicating an utterance. The icon display control unit 218 of the display control unit 216 generates face icon display information indicating a face icon with an open mouth. The audio output control unit 211 decides to include voice information in the conference information. The conference information may include collision count information indicating that the collision count has not increased. In this case, the count display control unit 217 may generate collision count information indicating that the collision count has not increased.

On the other hand, if there is an utterance from another user (NO in S209), the conference control unit 209 determines that user A's utterance has caused a speech collision. The conference control unit 209 then causes the display 205 of the communication terminal 201A to display a message such as "Another user is speaking" (step S211). The conference control unit 209 generates conference information for user A corresponding to this determination and transmits it to the conference server 220 (step S212). The processing flow then returns to S202. Specifically, the conference control unit 209 generates conference information that includes speech state information indicating an utterance (a colliding utterance), face icon display information indicating a face icon whose mouth is not open, and collision count information in which the collision count has been incremented by one, and transmits it to the conference server 220. The conference server 220 transmits this conference information to the communication terminals 201A to 201D. As a result, user A's face icon with its mouth not open is displayed on the display 205 of each communication terminal 201, together with user A's collision count increased by one. Because the conference information contains no voice information, user A's voice is not output from the speaker 206 of any communication terminal 201.

In the processing of S212, the speech state detection unit 207 generates speech state information indicating an utterance (a colliding utterance). The icon display control unit 218 of the display control unit 216 generates face icon display information indicating a face icon whose mouth is not open. The speech output suppression unit 214 of the audio output control unit 211 decides not to include voice information in the conference information. The count display control unit 217 generates collision count information indicating that the collision count has increased by one.
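The branches of FIG. 13 from S203 to S212 can be summarized in a small decision function. This is a sketch under stated assumptions rather than the terminal's actual implementation: `StepResult`, `process_frame`, the state labels, and the icon labels are hypothetical, and the inputs are assumed to be the already computed results of voice detection, language recognition, and the received speech states of the other users.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StepResult:
    speech_state: str          # state reported in the conference information
    face_icon: str             # "mouth_open" or "mouth_closed"
    include_voice: bool        # whether voice information is sent
    collision_count: int       # count after this step
    local_message: Optional[str] = None  # shown only on the speaker's own display


def process_frame(
    voice_detected: bool,
    words: List[str],
    others_states: Dict[str, str],
    collision_count: int,
) -> StepResult:
    """Decide what one terminal sends for one pass through S203 to S212."""
    if not voice_detected:  # S203 NO -> S204: silence
        return StepResult("silent", "mouth_closed", False, collision_count)
    if not words:  # S206 NO -> S207: backchannel response, voice is sent
        return StepResult("backchannel", "mouth_open", True, collision_count)
    # S208: the voice information contains an utterance.
    other_is_speaking = any(state == "utterance" for state in others_states.values())
    if not other_is_speaking:  # S209 YES -> S210: no collision
        return StepResult("utterance", "mouth_open", True, collision_count)
    # S209 NO -> S211 and S212: colliding utterance, voice suppressed, count + 1
    return StepResult(
        "collision_utterance",
        "mouth_closed",
        False,
        collision_count + 1,
        "Another user is speaking",
    )


if __name__ == "__main__":
    print(process_frame(True, ["hello"], {"B": "utterance"}, 0))
```

Keeping the decision separate from audio capture and network transmission makes each branch of the flowchart easy to check in isolation.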

FIGS. 14 and 15 are diagrams illustrating the conference image 230 displayed at each communication terminal 201 in the teleconference according to the second embodiment. In the conference image 230, a face icon 231 and a collision count 232 corresponding to each user are displayed near that user's name. Thus, the face icon 231A and the collision count 232A are displayed near user A's name. Similarly, the face icon 231B and the collision count 232B are displayed near user B's name, the face icon 231C and the collision count 232C near user C's name, and the face icon 231D and the collision count 232D near user D's name. In the example of FIG. 14, the collision count 232A shows 0, the collision count 232B shows 2, the collision count 232C shows 1, and the collision count 232D shows 0. The conference image 230 may have display areas 230a to 230d, each displaying the face icon 231 and the collision count 232 of one of users A to D.

In the conference image 230 illustrated in FIG. 14, user B is speaking. Accordingly, a message 234 indicating that user B is speaking is displayed near user B's face icon 231B, and the mouth of the face icon 231B is open. User C is giving a backchannel response, so the mouth of user C's face icon 231C is also open. User A and user D are silent, so the mouths of user A's face icon 231A and user D's face icon 231D are closed. Because user B is speaking, each communication terminal 201 outputs user B's utterance. Because user C is giving a backchannel response, each communication terminal 201 outputs user C's backchannel response.

FIG. 15 illustrates a case in which, in the state of the conference image 230 illustrated in FIG. 14, user A's utterance causes a speech collision. If user A starts speaking after user B while user B is speaking, user A's utterance is determined to be a colliding utterance. At this time, user A's communication terminal 201A shows a message 236 indicating that another user (user B) is speaking. In addition, user A's collision count 232A is updated from 0 to 1. Because user A's utterance is a colliding utterance, the mouth of user A's face icon 231A remains closed. Although the message 236 is displayed only at user A's communication terminal 201A, the conference images 230 displayed at the users' communication terminals 201 may otherwise be identical.
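To make the display rules of FIGS. 14 and 15 concrete, the sketch below renders one text row per user from a speech state and a collision count. The text layout, the function name `render_row`, and the state labels are assumptions for illustration only; the actual conference image 230 is graphical.

```python
def render_row(name: str, speech_state: str, collisions: int) -> str:
    """Render one user's area of the conference image as a line of text.

    The mouth is open for an utterance or a backchannel response and closed
    for silence or a colliding utterance; only a normal utterance gets the
    "speaking" note (message 234 in FIG. 14).
    """
    mouth = "open" if speech_state in ("utterance", "backchannel") else "closed"
    note = " (speaking)" if speech_state == "utterance" else ""
    return f"{name}: mouth {mouth}, collisions {collisions}{note}"


if __name__ == "__main__":
    # State corresponding to FIG. 15: user A has just collided with user B.
    rows = [
        ("User A", "collision_utterance", 1),
        ("User B", "utterance", 2),
        ("User C", "backchannel", 1),
        ("User D", "silent", 0),
    ]
    for row in rows:
        print(render_row(*row))
```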

(Effects of this embodiment)
The effects of this embodiment are described below.
In recent years, participants have increasingly taken part in teleconferences while staying at home, using their home Internet connection. In this case, delays caused by the home Internet environment make it more likely that the utterances of multiple participants overlap (speech collision) or that participants hold back from speaking out of deference to one another, so the teleconference may not proceed smoothly. In addition, participants joining from home often take part by audio only, for reasons such as privacy or avoiding congestion on the Internet connection. In that case, the other party's facial expression cannot be read during the conversation. Furthermore, when participants mute themselves except while speaking in order to keep out ambient noise, backchannel responses are difficult to convey to the speaker. Finally, with techniques that display a "speaking" indication for anyone from whom voice information is present, a participant who only gives a backchannel response is also treated as speaking, so when there are many participants it is hard to tell who is actually speaking.

The teleconference system according to this embodiment is configured so that, when a participant starts speaking after another participant who is still speaking, output of the later participant's speech at each participant's communication terminal is suppressed. Participants are thus kept from hearing the collision utterance (the speech of the participant who spoke later) on their communication terminals, so the teleconference proceeds smoothly.
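As a non-limiting illustration, the suppression behavior described above might be sketched as follows in Python. The names `Kind`, `VoiceSegment`, and `select_output`, and the representation of voice as timed segments, are assumptions made for this sketch and do not appear in the embodiments; the sketch also assumes that back-channel responses are always output, as described for this embodiment.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SPEECH = "speech"
    BACKCHANNEL = "backchannel"


@dataclass(frozen=True)
class VoiceSegment:
    # Hypothetical record: who produced the voice, what kind it is,
    # and when it started and ended (seconds from conference start).
    user: str
    kind: Kind
    start: float
    end: float


def select_output(segments):
    """Return (user, is_output) pairs in order of start time.

    A speech segment that starts while another participant's speech is
    still in progress is treated as a collision utterance and suppressed.
    """
    results = []
    floor_user = None            # participant whose speech is being output
    floor_end = float("-inf")    # time at which that speech ends
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.kind is Kind.BACKCHANNEL:
            results.append((seg.user, True))    # always output
        elif seg.user != floor_user and seg.start < floor_end:
            results.append((seg.user, False))   # collision: suppressed
        else:
            floor_user = seg.user
            floor_end = max(floor_end, seg.end)
            results.append((seg.user, True))
    return results
```

With user B speaking from 0 s to 10 s, user C responding from 3 s to 4 s, and user A starting to speak at 5 s, this sketch outputs B and C and suppresses A, which corresponds to the situation of FIG. 15.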

Furthermore, the teleconference system according to this embodiment is configured to count the number of collisions for each participant who caused a speech collision and to present a display relating to that count on each communication terminal. Each participant can therefore see which participants have many speech collisions, which makes them aware of who wants to speak. Other participants can then prompt that participant to speak or wait until that participant has spoken. The teleconference system according to this embodiment thus allows the teleconference to proceed smoothly.
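The per-participant counting could be sketched, for example, as below. The class `CollisionCounter`, its method names, and the text format of the display rows are hypothetical and chosen only for illustration; the embodiments do not prescribe a data structure.

```python
class CollisionCounter:
    """Keeps one collision count per participant (illustrative only)."""

    def __init__(self, participants):
        # Every participant starts at zero so that all rows can be displayed.
        self._counts = {user: 0 for user in participants}

    def record_collision(self, user):
        self._counts[user] += 1

    def display_rows(self):
        # One row per participant, in the order the participants were given.
        return [f"{user}: {count}" for user, count in self._counts.items()]

    def most_frequent(self):
        # Participants with the highest non-zero count, i.e. those who
        # most likely want to speak.
        top = max(self._counts.values(), default=0)
        if top == 0:
            return []
        return [user for user, count in self._counts.items() if count == top]
```

A terminal could use `most_frequent()` to decide which count to show more conspicuously, in the spirit of Appendix 4.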

The teleconference system according to this embodiment is also configured so that the collision count of every participant is displayed on each participant's communication terminal. Each participant can thus see the collision counts of all participants.

The teleconference system according to this embodiment is also configured to display a message such as "Another user is speaking" on the communication terminal of the participant who spoke later. The participant who made the collision utterance is thus made aware that a speech collision has occurred.

The teleconference system according to this embodiment is also configured so that, even when another participant gives a back-channel response while a participant is speaking, the back-channel response is output at each participant's communication terminal. The speaking participant (speaker) can thus feel reassured that the other participants are listening.

The teleconference system according to this embodiment is also configured so that, when a participant gives a back-channel response, each participant's communication terminal displays that participant's face icon with its mouth open. Even if the communication terminal of the participant giving the back-channel response is muted, the speaker can see that someone is responding and can feel reassured of being listened to.
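The relationship between a participant's speech state and the mouth of the face icon, as described for FIG. 14 and FIG. 15, can be summarized in a small sketch. The enumeration `SpeechState` and the function `mouth_open` are hypothetical names introduced here; the four states are an assumption about how the speech state information might be encoded.

```python
from enum import Enum


class SpeechState(Enum):
    SILENT = "silent"
    SPEAKING = "speaking"        # utterance that is being output
    BACKCHANNEL = "backchannel"  # back-channel response
    COLLISION = "collision"      # suppressed (collision) utterance


def mouth_open(state):
    """Return True if the face icon should be drawn with an open mouth."""
    # The mouth opens for output speech and for back-channel responses;
    # it stays closed for silence and for collision utterances.
    return state in (SpeechState.SPEAKING, SpeechState.BACKCHANNEL)
```

Because a collision utterance maps to a closed mouth and a back-channel response to an open one, the two can be told apart visually even though both involve voice activity.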

The teleconference system according to the second embodiment is further configured so that, when a speech collision occurs, the voice information of the collision utterance is not transmitted from the communication terminal to the conference server. This reduces the load on the network.

(Modifications)
The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit. For example, the embodiments described above can be applied to one another: the functions of the teleconference device 100 according to the first embodiment may be implemented by the communication terminal 201 according to the second embodiment, and the functions of the communication terminal 201 according to the second embodiment may be implemented by the teleconference device 100 according to the first embodiment.

In the flowcharts described above, the order of the processes (steps) can be changed as appropriate, and one or more of the processes (steps) may be omitted. For example, in FIG. 7, the order of S112 and S114 may be reversed; likewise, in FIG. 13, the order of S211 and S212 may be reversed. In FIG. 7, the processes of S114, S124, and S138 may be omitted; likewise, the process of S211 may be omitted.

In the embodiments described above, the count display control unit performs control so that the collision count of each participant (user) is displayed on the communication terminals of the participants, but the configuration is not limited to this. The count display control unit need not display the collision count itself. For example, it may cause the communication terminals to display a level corresponding to the collision count: level C for a count of 2 or less, level B for a count of 3 to 4, and level A for a count of 5 or more. It may also cause each communication terminal to display a warning when a participant's collision count exceeds a threshold. As a further example, it may cause each communication terminal to render the face icon of a participant whose collision count has increased in a form that signals the wish to speak (for example, by turning the face icon red).
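The level display in this modification amounts to a simple mapping from count to level, which might look like the following sketch. The function names `collision_level` and `needs_warning` are hypothetical; the level boundaries are those given in the example above, and the threshold value is left to the caller because the description does not fix one.

```python
def collision_level(count):
    """Map a collision count to the level shown on each terminal."""
    if count < 0:
        raise ValueError("collision count cannot be negative")
    if count <= 2:
        return "C"   # 2 or fewer collisions
    if count <= 4:
        return "B"   # 3 to 4 collisions
    return "A"       # 5 or more collisions


def needs_warning(count, threshold):
    """Return True when the count exceeds the (caller-chosen) threshold."""
    return count > threshold
```

For instance, a participant with 3 collisions would be shown as level B, and with a threshold of 3 no warning would be displayed until the fourth collision.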

The collision count may keep increasing each time a speech collision occurs while the teleconference is in progress, or it may be reset partway through the teleconference. For example, it may be reset when the corresponding participant has made a predetermined number of utterances that are not collision utterances, or when the corresponding participant operates the communication terminal.
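One possible reading of these reset conditions is sketched below. The class `ResettableCount` and its method names are hypothetical. The sketch assumes that non-collision utterances are tallied since the last reset and need not be consecutive, which the description leaves open.

```python
class ResettableCount:
    """Collision count for one participant with two reset paths."""

    def __init__(self, reset_after):
        if reset_after < 1:
            raise ValueError("reset_after must be at least 1")
        self.reset_after = reset_after  # predetermined number of utterances
        self.collisions = 0
        self._accepted = 0              # non-collision utterances since reset

    def on_collision(self):
        self.collisions += 1

    def on_accepted_utterance(self):
        # An utterance that was output normally (not a collision utterance).
        self._accepted += 1
        if self._accepted >= self.reset_after:
            self.reset()

    def reset(self):
        # Also serves as the manual reset triggered by a terminal operation.
        self.collisions = 0
        self._accepted = 0
```

Resetting after the participant has actually been heard keeps the count meaningful as an indicator of who is still waiting to speak.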

In the second embodiment, each communication terminal 201 generates the face icon display information for its own user, but the configuration is not limited to this. For example, each communication terminal 201 may generate user A's face icon using the speech state information on user A transmitted from communication terminal 201A.

In the embodiments described above, the face icon of each user (participant) is displayed on each communication terminal during the teleconference, but the configuration is not limited to this. Video of each user's face captured by the camera 203 or the like may be displayed instead. With video, however, the mouths of both a user giving a back-channel response and a user making a collision utterance may be moving, so other users may be unable to tell the two apart visually. In this embodiment, by contrast, each communication terminal displays face icons, the mouth of the face icon of a user making a collision utterance is closed, and the mouth of the face icon of a user giving a back-channel response is open, so the two can be distinguished visually. Furthermore, because the communication terminals in the teleconference system according to this embodiment do not transmit video information, users' speech states can be grasped while the network load is reduced.

In the examples above, the program can be stored on various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples include magnetic recording media (e.g., flexible disks, magnetic tape, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)). The program may also be supplied to the computer by various types of transitory computer-readable media, examples of which include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to the computer via a wired communication path, such as an electric wire or optical fiber, or via a wireless communication path.

Some or all of the above-described embodiments can also be described as in the following appendices, but are not limited to them.
(Appendix 1)
A teleconference system comprising:
speech determination means for determining whether the voice of each of a plurality of participants in a teleconference represents an utterance or a back-channel response;
voice output control means for performing control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants and, when another participant speaks while one of the plurality of participants is speaking, for performing control so as to suppress output of the other participant's utterance;
counting means for counting, for each of the plurality of participants, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
count display control means for performing control so that a display relating to the count is presented on the communication terminal of each of the plurality of participants.
(Appendix 2)
The teleconference system according to Appendix 1, wherein the count display control means performs control so that the number of first utterances of each of the plurality of participants is displayed on the communication terminal of each of the plurality of participants.
(Appendix 3)
The teleconference system according to Appendix 2, wherein the count display control means causes the communication terminal to display a count greater than a predetermined threshold in a display form that is more conspicuous than the display of a count equal to or less than the threshold.
(Appendix 4)
The teleconference system according to Appendix 2, wherein the count display control means causes the communication terminal to display the largest of the counts of the plurality of participants in a display form that is more conspicuous than the display of the other counts.
(Appendix 5)
The teleconference system according to any one of Appendices 1 to 4, wherein, when a participant gives a back-channel response, the voice output control means performs control so that the back-channel response is output at the communication terminal of each of the plurality of participants.
(Appendix 6)
The teleconference system according to any one of Appendices 1 to 5, further comprising icon display control means for performing control so that a face icon corresponding to each of the plurality of participants is displayed on the communication terminal of each of the plurality of participants,
wherein the icon display control means displays the face icon corresponding to the other participant who made the first utterance so that the face icon does not move, and displays the face icon corresponding to a participant who made an utterance other than the first utterance so that the face icon moves.
(Appendix 7)
The teleconference system according to Appendix 6, wherein, when a participant gives a back-channel response, the icon display control means displays the face icon corresponding to that participant so that the face icon moves.
(Appendix 8)
A communication terminal comprising:
speech determination means for determining, in a teleconference in which a user of the communication terminal participates, whether the user's voice represents an utterance or a back-channel response;
voice output control means for performing control so that the voice of each of a plurality of participants in the teleconference is output at the communication terminal and the user's voice is output at a first communication terminal, which is the communication terminal of each of the plurality of participants, and, when the user speaks while one of the plurality of participants is speaking, for performing control so as to suppress output of the user's utterance at the first communication terminal;
counting means for counting, for the user of the communication terminal, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
count display control means for performing control so that a display relating to the count is presented on the first communication terminal.
(Appendix 9)
The communication terminal according to Appendix 8, wherein the count display control means performs control so that the number of first utterances of the user of the communication terminal is displayed on the first communication terminal.
(Appendix 10)
The communication terminal according to Appendix 8 or 9, wherein, when the user of the communication terminal gives a back-channel response, the voice output control means performs control so that the back-channel response is output at the first communication terminal.
(Appendix 11)
The communication terminal according to any one of Appendices 8 to 10, further comprising icon display control means for performing control so that a face icon corresponding to the user of the communication terminal is displayed on the first communication terminal,
wherein the icon display control means displays the face icon so that it does not move when the user of the communication terminal makes the first utterance, and displays the face icon so that it moves when the user of the communication terminal makes an utterance other than the first utterance.
(Appendix 12)
The communication terminal according to Appendix 11, wherein, when the user of the communication terminal gives a back-channel response, the icon display control means displays the face icon so that it moves.
(Appendix 13)
A teleconference method comprising:
determining whether the voice of each of a plurality of participants in a teleconference represents an utterance or a back-channel response;
performing control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants;
when another participant speaks while one of the plurality of participants is speaking, performing control so as to suppress output of the other participant's utterance;
counting, for each participant, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
performing control so that a display relating to the count is presented on the communication terminal of each of the plurality of participants.
(Appendix 14)
The teleconference method according to Appendix 13, wherein control is performed so that the number of first utterances of each of the plurality of participants is displayed on the communication terminal of each of the plurality of participants.
(Appendix 15)
The teleconference method according to Appendix 14, wherein the communication terminal is caused to display a count greater than a predetermined threshold in a display form that is more conspicuous than the display of a count equal to or less than the threshold.
(Appendix 16)
The teleconference method according to Appendix 14, wherein the communication terminal is caused to display the largest of the counts of the plurality of participants in a display form that is more conspicuous than the display of the other counts.
(Appendix 17)
The teleconference method according to any one of Appendices 13 to 16, wherein, when a participant gives a back-channel response, control is performed so that the back-channel response is output at the communication terminal of each of the plurality of participants.
(Appendix 18)
The teleconference method according to any one of Appendices 13 to 17, further comprising:
performing control so that a face icon corresponding to each of the plurality of participants is displayed on the communication terminal of each of the plurality of participants;
displaying the face icon corresponding to the other participant who made the first utterance so that the face icon does not move; and
displaying the face icon corresponding to a participant who made an utterance other than the first utterance so that the face icon moves.
(Appendix 19)
The teleconference method according to Appendix 18, wherein, when a participant gives a back-channel response, the face icon corresponding to that participant is displayed so that the face icon moves.
(Appendix 20)
A teleconference method executed by a communication terminal, the method comprising:
determining, in a teleconference in which a user of the communication terminal participates, whether the user's voice represents an utterance or a back-channel response;
performing control so that the voice of each of a plurality of participants in the teleconference is output at the communication terminal and the user's voice is output at a first communication terminal, which is the communication terminal of each of the plurality of participants;
when the user speaks while one of the plurality of participants is speaking, performing control so as to suppress output of the user's utterance at the first communication terminal;
counting, for the user of the communication terminal, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
performing control so that a display relating to the count is presented on the first communication terminal.
(Appendix 21)
The teleconference method according to Appendix 20, wherein control is performed so that the number of first utterances of the user of the communication terminal is displayed on the first communication terminal.
(Appendix 22)
The teleconference method according to Appendix 20 or 21, wherein, when the user of the communication terminal gives a back-channel response, control is performed so that the back-channel response is output at the first communication terminal.
(Appendix 23)
The teleconference method according to any one of Appendices 20 to 22, further comprising:
performing control so that a face icon corresponding to the user of the communication terminal is displayed on the first communication terminal;
displaying the face icon so that it does not move when the user of the communication terminal makes the first utterance; and
displaying the face icon so that it moves when the user of the communication terminal makes an utterance other than the first utterance.
(Appendix 24)
The teleconference method according to Appendix 23, wherein, when the user of the communication terminal gives a back-channel response, the face icon is displayed so that it moves.
(Appendix 25)
A program causing a computer to implement:
a function of determining whether the voice of each of a plurality of participants in a teleconference represents an utterance or a back-channel response;
a function of performing control so that the voice of each of the plurality of participants is output at the communication terminal of each of the plurality of participants and, when another participant speaks while one of the plurality of participants is speaking, performing control so as to suppress output of the other participant's utterance;
a function of counting, for each participant, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
a function of performing control so that a display relating to the count is presented on the communication terminal of each of the plurality of participants.
(Appendix 26)
A program for executing a teleconference method executed by a communication terminal, the program causing a computer to implement:
a function of determining, in a teleconference in which a user of the communication terminal participates, whether the user's voice represents an utterance or a back-channel response;
a function of performing control so that the voice of each of a plurality of participants in the teleconference is output at the communication terminal and the user's voice is output at a first communication terminal, which is the communication terminal of each of the plurality of participants, and, when the user speaks while one of the plurality of participants is speaking, performing control so as to suppress output of the user's utterance at the first communication terminal;
a function of counting, for the user of the communication terminal, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
a function of performing control so that a display relating to the count is presented on the first communication terminal.

1 Teleconference system
2 Speech determination unit
4 Voice output control unit
6 Counting unit
8 Count display control unit
20 Teleconference system
22 Network
30 Communication terminal
42 Voice acquisition unit
44 Voice transmission unit
46 Voice reception unit
48 Voice output unit
52 Display information reception unit
54 Image display unit
100 Teleconference device
110 Participant information storage unit
112 Voice reception unit
120 Speech determination unit
130 Voice output control unit
132 Speech collision determination unit
134 Speech output suppression unit
140 Counting unit
150 Display control unit
152 Count display control unit
154 Icon display control unit
200 Teleconference system
201 Communication terminal
202 Conference execution system
207 Speech state detection unit
208 Conference information reception unit
209 Conference control unit
210 Conference information transmission unit
211 Voice output control unit
212 Speech collision determination unit
214 Speech output suppression unit
215 Counting unit
216 Display control unit
217 Count display control unit
218 Icon display control unit
220 Conference server
222 Voice input unit
223 Voice detection unit
224 Language recognition unit
225 Speech presence/absence determination unit

Claims

A communication terminal comprising:
speech determination means for determining, in a teleconference in which a user of the communication terminal participates, whether the user's voice represents an utterance or a back-channel response;
voice output control means for performing control so that the voice of each of a plurality of participants in the teleconference is output at the communication terminal and the user's voice is output at a first communication terminal, which is the communication terminal of each of the plurality of participants, and, when the user speaks while one of the plurality of participants is speaking, for performing control so as to suppress output of the user's utterance at the first communication terminal;
counting means for counting, for the user of the communication terminal, the number of first utterances, a first utterance being an utterance whose output has been suppressed; and
count display control means for performing control so that a display relating to the count is presented on the first communication terminal,
wherein the count display control means performs control so that the number of first utterances of the user of the communication terminal is displayed on the first communication terminal, and
the voice output control means performs control so that, when the user of the communication terminal gives a back-channel response, the back-channel response is output at the first communication terminal.

The communication terminal according to claim 1, further comprising:
icon display control means for performing control so that a face icon corresponding to the user of the communication terminal is displayed on the first communication terminal,
wherein the icon display control means
displays the face icon so that the face icon does not move when the user of the communication terminal makes the first utterance, and
displays the face icon so that the face icon moves when the user of the communication terminal makes an utterance other than the first utterance.

The communication terminal according to claim 2,
wherein the icon display control means displays the face icon so that the face icon moves when the user of the communication terminal makes a back-channel response.

A teleconference method executed by a communication terminal, the method comprising:
determining whether a voice of a user of the communication terminal indicates an utterance or a back-channel response in a teleconference in which the user participates;
performing control so that the voice of each of a plurality of participants in the teleconference is output at the communication terminal, and so that the voice of the user is output at a first communication terminal, which is the communication terminal of each of the plurality of participants;
performing control so as to suppress output of the user's utterance at the first communication terminal when the user makes the utterance while any one of the plurality of participants is speaking;
counting, for the user of the communication terminal, the number of first utterances, a first utterance being an utterance whose output has been suppressed;
performing control so that a display regarding the count is made on the first communication terminal; and
performing control so that the number of times the user of the communication terminal has made the first utterance is displayed on the first communication terminal.

The teleconference method according to claim 4, further comprising:
performing control so that, when the user of the communication terminal makes a back-channel response, the back-channel response is output at the first communication terminal.
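
As a non-limiting illustration, the determination, suppression, and counting steps of the method above might be sketched as follows. This is a minimal sketch, not the claimed implementation: the names `VoiceKind`, `ConferenceState`, `handle_voice`, and `end_utterance` are hypothetical, and the sketch assumes that the classification of a voice as an utterance or a back-channel response has already been made elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class VoiceKind(Enum):
    """Result of the speech determination step (hypothetical labels)."""
    UTTERANCE = "utterance"
    BACK_CHANNEL = "back_channel"


@dataclass
class ConferenceState:
    # Participant whose utterance is currently being output, or None.
    current_speaker: Optional[str] = None
    # Per-participant count of suppressed ("first") utterances.
    suppressed_counts: Dict[str, int] = field(default_factory=dict)


def handle_voice(state: ConferenceState, speaker: str, kind: VoiceKind) -> bool:
    """Return True if the voice should be output at the other terminals."""
    if kind is VoiceKind.BACK_CHANNEL:
        # Back-channel responses are output even while someone else speaks.
        return True
    if state.current_speaker is not None and state.current_speaker != speaker:
        # Speech collision: suppress the later utterance and count it.
        state.suppressed_counts[speaker] = state.suppressed_counts.get(speaker, 0) + 1
        return False
    # No collision: this participant now holds the floor.
    state.current_speaker = speaker
    return True


def end_utterance(state: ConferenceState, speaker: str) -> None:
    """Release the floor when the current speaker stops speaking."""
    if state.current_speaker == speaker:
        state.current_speaker = None
```

In this sketch, the value stored in `suppressed_counts` for a participant corresponds to the number of first utterances that would be displayed on the other participants' terminals.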

The teleconference method according to claim 4 or 5, further comprising:
performing control so that a face icon corresponding to the user of the communication terminal is displayed on the first communication terminal;
displaying the face icon so that the face icon does not move when the user of the communication terminal makes the first utterance; and
displaying the face icon so that the face icon moves when the user of the communication terminal makes an utterance other than the first utterance.

The teleconference method according to claim 6, further comprising:
displaying the face icon so that the face icon moves when the user of the communication terminal makes a back-channel response.
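
As a non-limiting illustration, the face icon display rule described above might be expressed as a single decision function. This is a minimal sketch under stated assumptions: the function name `icon_should_move` and its two boolean parameters are hypothetical, and the sketch assumes the caller already knows whether the voice is a back-channel response and whether the utterance was suppressed.

```python
def icon_should_move(is_back_channel: bool, is_suppressed: bool) -> bool:
    """Decide whether the user's face icon is shown moving.

    is_back_channel: the voice was judged to be a back-channel response.
    is_suppressed:   the voice is a first utterance, that is, an utterance
                     whose output was suppressed because of a collision.
    """
    if is_back_channel:
        # A back-channel response moves the icon.
        return True
    # A suppressed (first) utterance leaves the icon still;
    # any other utterance moves it.
    return not is_suppressed
```

Keeping the icon still for a suppressed utterance means the other participants see no visual sign of the collision other than the displayed count.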

A program for causing a communication terminal to execute the teleconference method according to any one of claims 4 to 7.