JP2009027239A - Telecommunication conference apparatus

Info

Publication number: JP2009027239A
Application number: JP2007185544A
Authority: JP
Inventors: Yuji Maekawa; 雄二前川
Original assignee: Nakayo Telecommunications Inc
Current assignee: Nakayo Telecommunications Inc
Priority date: 2007-07-17
Filing date: 2007-07-17
Publication date: 2009-02-05

Abstract

PROBLEM TO BE SOLVED: To provide a telecommunication conference apparatus that makes the speaker's voice easier to hear by identifying the speaker in a telecommunication conference system more naturally.

SOLUTION: The telecommunication conference apparatus, in a telecommunication conference system to which a plurality of teleconferencing terminals are connected, provides a comfortable teleconferencing environment by monitoring the volume of the meeting participants to identify the speaker, and then increasing the volume of the speaker while suppressing unnecessary sound from non-speakers. A break in the speaker's utterance is judged by monitoring the duration of silence, which suppresses the unnatural volume changes that excessive control would cause.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a communication conference device used, for example, when three or more people hold a conference by voice call, such as a telephone conference or a remote video conference.

Patent Document 1 discloses a telephone conference system. It addresses the problem that talking with several people through the handset of a single telephone feels different from ordinary conversation and hinders smooth communication. It also addresses the poor usability that arises as the number of people talking at the same time grows and it becomes difficult to pick out the party at the center of the conversation. Specifically, it takes the following measures.

- A voice is judged to be the voice under discussion from the ratio of its voice overlap time T1 to its speaking time T2.
- A gain control amplifier is controlled with the weight W = T1 / T2.

Patent Document 1: JP 2000-49948 A

However, the telephone conference system described in Patent Document 1 has the following problems.
- By the time T1 and T2 have been measured and W has been calculated, the discussion has already moved on.
- In an orderly discussion, remarks do not overlap, so T1 is not necessarily a valid index for identifying the voice of interest.
In other words, a time lag can arise between the voice being emphasized and the actual speaker, so the conversation does not necessarily sound natural.

An object of the present invention is to provide a communication conference device that determines the speaker more naturally and thereby makes the speaker's voice easier to hear.

The communication conference apparatus of the present invention is used in a communication conference system to which a plurality of audio conference terminals are connected, and is characterized by having: a function of identifying, among the plurality of audio conference terminals, the terminal at which a conference participant is speaking; a function of varying the call volume for each audio conference terminal at which a participant is speaking; a function of determining that the participant has finished speaking; and a function of emphasizing the audio conference terminal at which the participant is speaking.

According to the present invention, the voice of the speaker who is speaking into the microphone of an audio conference terminal is emphasized, so that a more natural communication conference can be realized.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

The communication conference apparatus of the present invention determines the speaker as follows.
(1) When participant A speaks and a volume at or above the call determination volume is detected, the apparatus determines that audio conference terminal 1a is speaking and lowers the amplifier gain of every participant other than A.
(2) The apparatus monitors A's silent period. When a time equal to or longer than the silent monitoring time T0 has elapsed, it determines that the utterance has ended, restores the amplifier gains of the participants other than A, and waits for the next speaker. The call determination volume and the silent monitoring time T0 are described later.
(3) When B speaks next, B becomes the speaker and the amplifier gain of every participant other than B is lowered.

FIG. 1 is an overall view of a communication conference system using the communication conference apparatus 10 of the present invention. In FIG. 1, 1a, 1b, and 1c are audio conference (general-purpose) terminals, 2 is a network/telephone line, and 10 is the communication conference apparatus. Conference participants A, B, and C hold a communication conference over the network/telephone line 2 using the audio conference (general-purpose) terminals 1a, 1b, and 1c.

The communication conference apparatus 10 of the present invention is connected to the network/telephone line 2 and facilitates voice conferences held over that line.

FIG. 2 is a block diagram of the communication conference apparatus 10 according to the first embodiment of the present invention. In FIG. 2, 100 is a network interface input circuit and decomposition circuit, 101 is an input volume adjustment unit, 102 is a volume monitoring unit, 103 is a speech analysis unit, 104 is a gain adjustment unit, 105 is an adder circuit and network interface output circuit, 106 is a central control circuit, and 107 is a database.

Each functional block shown in this block diagram may be implemented by a digital signal processor or by hardware.

Each functional block is described below. The network interface input circuit and decomposition circuit 100 receives from the network data containing the audio data of the plurality of conference participants, separates it by audio conference terminal, and passes the audio data to the corresponding input volume adjustment unit 101.

Based on the volume data for each audio conference terminal supplied by the volume monitoring unit 102, the input volume adjustment unit 101 adjusts the volume of the input audio from each conference participant to a predetermined, constant audio level.

The volume monitoring unit 102 analyzes the audio data that has passed through the input volume adjustment unit 101 and been brought to a constant audio level, and monitors its volume.

The speech analysis unit 103 monitors the audio data that has passed through the input volume adjustment unit 101 and analyzes how each conference participant speaks. For example, it analyzes the number of utterances, the time from the moment the volume falls to or below the call determination volume until the next utterance reaches or exceeds it, and the frequency distribution of that time.

To distinguish the participant whose speech is emphasized from the participants whose speech is suppressed, the gain adjustment unit 104, under instruction from the central control circuit 106, sets the amplifier gain of the participant to be emphasized to “large” and the amplifier gain of every other participant to “small”. This improves the speaker's transmission gain relative to the others.

The adder circuit and network interface output circuit 105 mixes the participants' audio data that has passed through the input volume adjustment units 101 and the gain adjustment units 104, and sends the result to all conference participants by multicast or broadcast.

The central control circuit 106 uses the information collected from the volume monitoring unit 102 and the speech analysis unit 103 to determine whether each participant is speaking and, when someone is, decides which speech to emphasize and instructs the gain adjustment units 104 accordingly.

The database 107 stores, for each speaker, the maximum volume, the average volume, the input volume adjustment value, and analysis results such as the frequency distribution (for example, of the intervals between utterances).
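
As a non-limiting illustration, the signal path through the input volume adjustment units 101, the gain adjustment units 104, and the adder circuit 105 may be sketched in Python as follows. The function name `mix_conference`, the representation of audio as lists of floating-point samples in the range -1.0 to 1.0, and the final clipping step are assumptions made for this sketch and do not appear in the embodiment.

```python
def mix_conference(frames, input_adjust, gains):
    """Scale each terminal's samples and sum them into one mixed frame.

    frames:       {terminal: [samples]}  audio separated per terminal (unit 100)
    input_adjust: {terminal: factor}     input volume adjustment value (unit 101)
    gains:        {terminal: gain}       "large"/"small" amplifier gain (unit 104)
    """
    length = max(len(samples) for samples in frames.values())
    mixed = [0.0] * length
    for terminal, samples in frames.items():
        scale = input_adjust[terminal] * gains[terminal]
        for i, sample in enumerate(samples):
            mixed[i] += scale * sample  # summation performed by adder circuit 105
    # Clipping is an assumption of this sketch, not part of the source text.
    return [max(-1.0, min(1.0, value)) for value in mixed]


# Hypothetical data: A is the emphasized speaker, B is quiet at the source and suppressed.
frames = {"A": [0.5, 0.5], "B": [0.25, -0.25]}
mixed = mix_conference(frames, {"A": 1.0, "B": 2.0}, {"A": 1.0, "B": 0.25})
```

In this example B's samples are first doubled by the input adjustment and then reduced by the “small” gain, so B contributes only a quarter as much as A to the mixed output.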

The basic operation of input voice and amplifier gain control in the first embodiment of the present invention will be described with reference to FIG. 3.

In FIG. 3:
(1) No one is speaking. The amplifier gains of all participants A, B, and C are set to “large”. Because emphasis is then achieved only by switching the other participants' gains from “large” to “small”, the gain of the participant who is speaking does not change and the conversation sounds natural.
(2) Mr. A's voice exceeds the call determination volume Vsh, so the gains of Mr. B and Mr. C are set to “small” to emphasize Mr. A's speech.

(3) Mr. A's voice falls below the call determination volume Vsh, and the silent monitoring period begins.
(4) Mr. A's voice does not exceed the call determination volume Vsh again within the silent monitoring time T0, so the apparatus judges that no one is speaking and sets the gains of all participants A, B, and C to “large”. Here, the silent monitoring time T0 corresponds to the silence while a speaker takes a breath and to the pauses between one stretch of the speaker's talk and the next.
(5) Mr. B's voice exceeds the call determination volume Vsh, so the gains of Mr. A and Mr. C are set to “small” to emphasize Mr. B's speech.

(6) While Mr. B is speaking, Mr. C's voice exceeds the call determination volume Vsh, but because Mr. B's voice is still above Vsh, Mr. C's gain remains “small”.
(7) Mr. B's voice falls below the call determination volume Vsh, and the silent monitoring period begins.
(8) Mr. B's voice exceeds the call determination volume Vsh again within the silent monitoring time T0, so the previous state continues.

(9) Mr. B's voice falls below the call determination volume Vsh, and the silent monitoring period begins.
(10) Mr. B's voice does not exceed the call determination volume Vsh again within the silent monitoring time T0, so the apparatus returns to the no-speaker state and sets the gains of all participants A, B, and C to “large”.
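
As a non-limiting illustration, the transitions (1) to (10) described for FIG. 3 may be sketched as a small state machine. The class name `GainController`, the numeric gain values, the use of timestamps in seconds, and the tie-break that picks the loudest participant when several exceed Vsh at once are all assumptions of this sketch; the embodiment only specifies the “large”/“small” switching and the comparison against Vsh and T0.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LARGE, SMALL = 1.0, 0.25  # hypothetical values; the text only says "large" and "small"


@dataclass
class GainController:
    participants: List[str]
    vsh: float                      # call determination volume Vsh
    t0: float                       # silent monitoring time T0, in seconds
    speaker: Optional[str] = None
    silent_since: Optional[float] = None

    def gains(self) -> Dict[str, float]:
        if self.speaker is None:    # states (1), (4), (10): everyone "large"
            return {p: LARGE for p in self.participants}
        return {p: LARGE if p == self.speaker else SMALL for p in self.participants}

    def update(self, now: float, levels: Dict[str, float]) -> Dict[str, float]:
        if self.speaker is not None:
            if levels[self.speaker] > self.vsh:
                self.silent_since = None          # (6), (8): speaker keeps the floor
            elif self.silent_since is None:
                self.silent_since = now           # (3), (7), (9): start monitoring
            elif now - self.silent_since >= self.t0:
                self.speaker = None               # (4), (10): utterance has ended
                self.silent_since = None
        if self.speaker is None:
            loud = [p for p in self.participants if levels[p] > self.vsh]
            if loud:                              # (2), (5): new speaker detected
                # Picking the loudest on a tie is an assumption of this sketch.
                self.speaker = max(loud, key=lambda p: levels[p])
        return self.gains()
```

Calling `update` once per measurement interval with the current time and each participant's adjusted level returns the gain to apply to each terminal for that interval.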

FIG. 4 is a flowchart explaining the operation of generating the database used to adjust the volume for each speaker in the first embodiment of the present invention.

When the flowchart of FIG. 4 starts, the apparatus waits for the input voice level (in the frequency range of the human voice) to exceed the predetermined call determination volume Vsh (S100). When the input voice level exceeds Vsh (S100: Y), measurement of the input voice starts (S101) and continues until the input voice level falls below Vsh (S102).

When the input voice level falls below Vsh (S102: Y), measurement of the input voice level ends (S103). The measurement data is then analyzed to calculate the maximum and average input voice levels, and the calculated values are added to the database 107 (S104). An appropriate input volume adjustment value is calculated for each speaker from the past data registered in the database 107 and the newly added data, and the stored value is updated (S105).
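
As a non-limiting illustration, steps S100 to S105 may be sketched as follows. The embodiment does not state how the input volume adjustment value is derived from the stored data; the rule used here (a target level divided by the mean of the stored averages), the record layout, and the function names are assumptions made for this sketch.

```python
def measure_utterance(levels, vsh):
    """S100 to S103: collect levels from the first sample above Vsh until the level drops."""
    measured = []
    for level in levels:
        if level > vsh:
            measured.append(level)      # S101/S102: measurement in progress
        elif measured:
            break                       # S102: Y, then S103: measurement ends
    return measured


def update_record(record, measured, target_level):
    """S104 and S105: store max/average and recompute the adjustment value."""
    if not measured:
        return record                   # nothing exceeded Vsh, nothing to store
    record["max"].append(max(measured))
    record["avg"].append(sum(measured) / len(measured))
    overall_avg = sum(record["avg"]) / len(record["avg"])
    # Assumed rule: scale so that the long-term average reaches target_level.
    record["adjust"] = target_level / overall_avg
    return record


# Hypothetical level samples for one participant.
record = {"max": [], "avg": [], "adjust": 1.0}
measured = measure_utterance([0.125, 0.5, 1.0, 0.75, 0.125, 1.0], vsh=0.25)
update_record(record, measured, target_level=0.5)
```

Only the first utterance in the sample sequence is measured; the later sample above Vsh would be picked up by the next pass through the flowchart.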

The flow of FIG. 4 is provided because differences in each participant's environment and equipment, and in how loudly each person speaks, mean that the volume arriving at the conference apparatus can differ from participant to participant. If the speaker were determined from these levels as they are, a person with a loud voice or a person closer to the microphone would tend to win, and participants would not be given a fair opportunity to speak.

Therefore, an input volume adjustment value is calculated: the volume arriving at the conference apparatus is monitored for each participant as needed, and the input volume of the conference apparatus is adjusted so that all conference participants have an equivalent volume suitable for speaker determination. Because the volume is adjusted appropriately at the input stage, switching the amplifier gain to emphasize the speaker can be kept simple, with only the two levels “large” and “small”.

FIG. 5 shows the per-speaker volume adjustment information stored in the database 107. For each participant, the database 107 records the maximum and average input voice levels, the call determination volume, and the input volume adjustment value.

FIG. 6 is a flowchart explaining the operation for adapting the silent monitoring time T0.

In a communication conference, some participants speak in one continuous burst, while others speak carefully, weighing each word, so speaking style differs from person to person. In the latter case, the silence within an utterance often exceeds the fixed time, and the apparatus frequently misjudges that no one is speaking even though the participant is still in mid-utterance.

Optimizing the silent monitoring time T0 for each participant reduces the number of such misjudgments.

Each participant's speech pattern (such as the intervals between utterances) is monitored as needed, and the time used to judge a break in speech (the silent monitoring time T0) is adapted to it.

As shown in the flowchart of FIG. 6, when a participant joins a communication conference, the apparatus first searches the per-participant database for silent monitoring adaptation to see whether a past record of that participant exists (S200). Specifically, although not shown in FIG. 7, voiceprint authentication information is stored for each participant, and the participant is identified from the input voice.

If past data exists in the database (S200: Y), the silent monitoring time T0 is set to the value recorded in the database (S201).

If no past data exists in the database (S200: N), the silent monitoring time T0 is set to the system default value X (for example, 300 ms) (S202), and the participant is added to the database as a new participant (S203).
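
As a non-limiting illustration, steps S200 to S203 may be sketched as a dictionary lookup. The function name `initial_t0`, the record layout, and the use of a plain participant identifier in place of voiceprint identification are assumptions of this sketch; only the default of 300 ms comes from the example given above.

```python
DEFAULT_T0_MS = 300  # system default value X from the example in the text


def initial_t0(database, participant_id):
    """Return the silent monitoring time T0 to use when a participant joins."""
    entry = database.get(participant_id)
    if entry is not None:               # S200: Y
        return entry["t0_ms"]           # S201: reuse the stored value
    # S200: N, so apply the default (S202) and register the participant (S203).
    database[participant_id] = {
        "samples": 0,
        "hist": {},
        "mode_ms": None,
        "t0_ms": DEFAULT_T0_MS,
    }
    return DEFAULT_T0_MS


# Hypothetical database with one known participant.
database = {"A": {"samples": 12, "hist": {400: 7, 500: 5}, "mode_ms": 400, "t0_ms": 750}}
t0_known = initial_t0(database, "A")
t0_new = initial_t0(database, "B")
```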

Next, for each participant, the apparatus determines whether silence has been detected, that is, whether the level has fallen below the call determination volume Vsh (S204). If silence is not detected, this check is repeated until it is.

When silence is detected (S204: Y), the silent time measurement timer T1, which times out after a predetermined time (for example, 3 seconds), is started in order to judge that the utterance has ended or to measure the silent time, and measurement of the silent time begins (S205). If the communication conference has ended (S206: Y), the silent time measurement timer T1 is stopped (S207) and the process ends.

If the communication conference has not ended (S206: N), which is the usual case, the apparatus checks whether the silent time measurement timer T1 has timed out (S208).

If the timer has not timed out (S208: N), the apparatus waits for voice input (S209). When the participant's voice is detected (S209: Y), the silent time measurement timer T1 is stopped, its value t1 at the time of stopping is analyzed, and the silent monitoring time database 107 shown in FIG. 7 is updated. Specifically, the count of the frequency-distribution bin into which the timer value falls is incremented by one, and the number of samples is incremented by one. The contents of the database are then analyzed, the silent time mode value and the silent monitoring time T0 are updated to optimum values, the silent time measurement timer T1 is reset, and the process returns to step S204 to wait for silence to be detected.

If the silent time measurement timer T1 has run for the fixed time and timed out (S208: Y), the participant's amplifier gain is adjusted to “large” (S211), the silent time measurement timer T1 is reset, and the process moves to step S206.
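
As a non-limiting illustration, the database update performed when speech resumes before the timer expires (S209: Y) may be sketched as follows. The embodiment says only that the silent time mode value and T0 are updated to “optimum values”; the bin width, the safety factor, and the rule that sets T0 to the upper edge of the most frequent bin multiplied by that factor are assumptions made for this sketch.

```python
from collections import Counter

BIN_MS = 100   # hypothetical bin width of the frequency distribution
MARGIN = 1.5   # hypothetical safety factor applied above the most frequent pause


def new_entry(default_t0_ms=300):
    """Create an empty per-participant record (layout assumed for this sketch)."""
    return {"samples": 0, "hist": Counter(), "mode_ms": None, "t0_ms": default_t0_ms}


def record_pause(entry, t1_ms):
    """Add one measured pause t1 (in ms) and re-derive the mode and T0."""
    bin_start = int(t1_ms // BIN_MS) * BIN_MS
    entry["hist"][bin_start] += 1       # increment the matching bin by one
    entry["samples"] += 1               # number of samples + 1
    # Most frequent bin; on a tie the shorter pause wins (assumption).
    mode_bin = min(entry["hist"], key=lambda b: (-entry["hist"][b], b))
    entry["mode_ms"] = mode_bin
    # Assumed rule: T0 covers the whole modal bin plus a margin.
    entry["t0_ms"] = (mode_bin + BIN_MS) * MARGIN
    return entry


# Hypothetical pauses measured for one participant, in milliseconds.
entry = new_entry()
for pause_ms in [120, 180, 450, 130]:
    record_pause(entry, pause_ms)
```

With these sample pauses the 100 ms bin is the most frequent, so the sketch sets T0 to 300 ms; a participant with longer habitual pauses would end up with a longer T0.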

FIG. 7 shows the database 107 used to adapt the silent monitoring time T0 in the first embodiment of the present invention. For each participant, the database 107 records the number of samples, the silent time mode value, the frequency distribution, and the silent monitoring time T0.

FIG. 8 is a diagram explaining the basic principle of the method, used in the second embodiment of the present invention, for eliminating an unwanted gain change at the beginning of an utterance.
In the first embodiment of the present invention, the gains of all conference participants are set to “large” when no one is speaking, so that the speaker's own volume does not change when the gains are varied to emphasize the speaker, which avoids an unnatural impression.

However, if even one conference participant is joining from an environment with loud background noise, it is better, while no one is speaking, to hold the gain of all participants, or of the participants with loud background noise, at “small”. In that case the gain changes from “small” to “large” at the start of an utterance, which makes the participant's speech hard to listen to.

FIG. 9 is a block diagram of the communication conference apparatus according to the second embodiment of the present invention. FIG. 8 shows a configuration in which an audio data buffer 108 is added in front of each gain adjustment unit 104.

The audio data buffer 108 temporarily stores the beginning of an utterance in memory and plays it back with a slight delay, after the gain has been raised to “large”, which eliminates the volume change at the start of speech. The slight delay introduced here can be reduced as the utterance proceeds, for example by skipping playback of silent intervals, so that the reproduced audio catches up with the actual audio. This lessens the impact even in environments where delay is undesirable.

Referring to the changes over time of the “actual audio” and the “reproduced audio” shown in FIG. 9: (1) speech is detected in the “actual audio”; for the “reproduced audio”, (2) the gain of the gain adjustment circuit is switched from “small” to “large”, and then (3) output of the audio data starts. (4) Silent sections of the “reproduced audio” are thinned out little by little so that it approaches the “actual audio”. At (5), the start times of the “actual audio” and the “reproduced audio” nearly coincide.

If the audio were played back as-is, it would be output before the gain adjustment circuit had adjusted the gain, which would sound unnatural. FIG. 9 shows where on the block diagram the buffer (108) is added, in front of each gain adjustment unit 104. Whether the reproduced audio has caught up with the actual audio is judged from the amount of data in the buffer: the less data there is, the closer the timing of the reproduced audio is to that of the actual audio.
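
As a non-limiting illustration, the behavior of the buffer 108 may be sketched as a queue that holds back the first frames of an utterance and later drops buffered silent frames to catch up. The class name `OnsetBuffer`, the fixed number of held ticks, the rule of dropping at most one silent frame per tick, and the simplification that each frame is represented only by its level are assumptions of this sketch. The gain switch itself is outside the sketch and is assumed to complete during the hold period.

```python
from collections import deque


class OnsetBuffer:
    def __init__(self, hold_ticks, vsh):
        self.frames = deque()     # buffered frames waiting to be played
        self.hold = hold_ticks    # ticks to wait while the gain is raised
        self.vsh = vsh            # call determination volume Vsh
        self.started = False

    def tick(self, level):
        """Feed one input frame; return the frame to play, or None."""
        if not self.started:
            if level <= self.vsh:
                return None       # no speech yet; this sketch forwards nothing
            self.started = True   # (1) speech detected
        self.frames.append(level)
        if self.hold > 0:
            self.hold -= 1        # (2) gain is switched during the hold
            return None
        # (4) thin out silence: drop one buffered silent frame while lagging
        if len(self.frames) > 1 and self.frames[0] <= self.vsh:
            self.frames.popleft()
        return self.frames.popleft()  # (3) delayed output

    @property
    def lag(self):
        """Frames still buffered; 0 means playback has caught up, as in (5)."""
        return len(self.frames)


# Hypothetical frame levels: speech, a pause, more speech, another pause.
buf = OnsetBuffer(hold_ticks=2, vsh=0.5)
played = [buf.tick(x) for x in [0.9, 0.8, 0.1, 0.1, 0.7, 0.6, 0.1, 0.1]]
```

In this run the first two ticks produce no output, and each of the two pauses loses one silent frame, after which `buf.lag` is 0 and subsequent frames pass straight through.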

For the gain adjustment of the plurality of audio conference terminals, modes can be made selectable: a speech mode, in which the amplifier gain is slow to change; a discussion mode, in which the amplifier gain changes readily; or a mode in which the speech of two or more participants is emphasized. When one speaker keeps talking for a long time, setting the speech mode or resetting the gain balance overcomes the drawback that other participants' voices stay at a low level.

In addition, by resetting the adjusted gain balance of all conference participants at fixed intervals or at every silent period, the gain adjustment of the plurality of audio conference terminals can be set reliably.

Although this embodiment has been described using a single call determination volume Vsh, separate determination volumes may be used for judging the start and the end of a participant's utterance. Specifically, a call start determination volume Vsh1 and a call end determination volume Vsh2 are provided, with Vsh1 set low and Vsh2 set high, so that the silent monitoring time T0 can be measured with higher accuracy.
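
As a non-limiting illustration, the use of separate start and end determination volumes may be sketched as follows. The function name `speaking_flags` and the sample values are assumptions of this sketch; the thresholds follow the text, with Vsh1 lower than Vsh2.

```python
def speaking_flags(levels, vsh1, vsh2):
    """Return, for each sample, whether the participant is judged to be speaking.

    vsh1: call start determination volume (set low, per the text)
    vsh2: call end determination volume (set high, per the text)
    """
    speaking = False
    flags = []
    for level in levels:
        if not speaking and level > vsh1:
            speaking = True       # utterance start detected early
        elif speaking and level < vsh2:
            speaking = False      # utterance end detected early
        flags.append(speaking)
    return flags


# Hypothetical level samples for one participant.
flags = speaking_flags([0.1, 0.4, 0.6, 0.45, 0.2], vsh1=0.3, vsh2=0.5)
```

With these settings both the start and the end of the utterance are detected sooner than with a single threshold. Note that in this simple sketch a level that stays between Vsh1 and Vsh2 would alternate between the two states on successive samples, so a practical design would need some additional rule for that band.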

FIG. 1 is an overall view of a communication conference system using the communication conference apparatus of the present invention.
FIG. 2 is a block diagram of the communication conference apparatus according to the first embodiment of the present invention.
FIG. 3 is a diagram explaining the basic principle of the relationship between the input voice and the amplifier gain control in the first embodiment of the present invention.
FIG. 4 is a flowchart explaining the database creation operation of the first embodiment of the present invention.
FIG. 5 is a diagram showing the database used to monitor the volume of each speaker as needed and adjust the volume in the first embodiment of the present invention.
FIG. 6 is a flowchart explaining the operation for adapting the silent monitoring time T0 in the first embodiment of the present invention.
FIG. 7 is a diagram showing the database used to adapt the silent monitoring time T0 in the first embodiment of the present invention.
FIG. 8 is a block diagram of the communication conference apparatus according to the second embodiment of the present invention.
FIG. 9 is a diagram explaining the basic principle of the method for eliminating an unwanted gain change at the beginning of an utterance in the second embodiment of the present invention.

Explanation of symbols

1a Audio conference terminal
1b Audio conference terminal
1c Audio conference terminal
2 Network/telephone line
10 Communication conference apparatus
100 Network interface input circuit and decomposition circuit
101 Input volume adjustment unit
102 Volume monitoring unit
103 Speech analysis unit
104 Gain adjustment unit
105 Adder circuit and network interface output circuit
106 Central control circuit
107 Database
108 Buffer

Claims

A communication conference apparatus in a communication conference system to which a plurality of voice conference terminals are connected, the apparatus comprising: a function of identifying, among the plurality of voice conference terminals, a voice conference terminal that is inputting the speech voice of a nearby communication conference participant; a function of varying the transmission gain for each voice conference terminal that is inputting the speech voice of a nearby communication conference participant; a function of determining that the nearby communication conference participant has finished speaking; and a function of displaying the voice conference terminal.

The communication conference apparatus according to claim 1,
wherein the transmission gain of a voice conference terminal that is inputting the speech voice of a nearby communication conference participant is set larger than that of a voice conference terminal that is not inputting such speech voice, and the voice is transmitted with that gain.

The communication conference apparatus according to claim 1 or 2,
wherein a speech voice pattern is monitored for each voice conference terminal, and a break in or the end of an utterance is determined for each voice conference terminal.

The communication conference apparatus according to claim 1,
comprising, when the system is set to switch the transmission gain from small to large upon detection of speech voice: a function of temporarily storing the speech voice; a function of reproducing the temporarily stored speech voice after the transmission gain has been switched from small to large; and a function of reproducing the silent part every time a silent part is detected after the start of reproduction of the speech voice.