JP2023032434A

JP2023032434A - Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method

Info

Publication number: JP2023032434A
Application number: JP2021138564A
Authority: JP
Inventors: 高詩石黒; Takashi Ishiguro
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2023-03-09
Also published as: WO2023026600A1

Abstract

PROBLEM TO BE SOLVED: To provide a conference system capable of effectively suppressing the occurrence of howling in a web conference or the like. SOLUTION: A conference system comprises: first mixing means for outputting, to each terminal participating in a conference, a first synthesized sound signal obtained by synthesizing the microphone input signals of the terminals other than that terminal; sound detecting means for detecting that any of the first synthesized sound signals includes sound; attenuating means for attenuating a specific band of each first synthesized sound signal and outputting the attenuated signal as the output signal of each terminal; band limitation detecting means for detecting that the attenuated output signal of a terminal is included in a microphone input signal; echo detecting means for detecting that the microphone input signal includes an echo when the microphone input signal is detected to include the attenuated output signal of a terminal and any of the first synthesized sound signals is determined to include sound; and echo canceling means for canceling the echo detected by the echo detecting means from the microphone input signal. SELECTED DRAWING: Figure 1

Description

The present invention relates to a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method, and can be applied, for example, to a conference system for holding web conferences.

In recent years, partly due to the influence of the novel coronavirus, remote conferencing systems such as web conferences and video conferences have come to be used in an increasing range of situations.

Various techniques have been introduced to prevent echo and howling in remote conferencing systems such as the web conferences described above. For example, Patent Literature 1 discloses a technique that synthesizes the voice input signals from the microphones of the terminals in the same group and then removes echo using an adaptive filter.

Patent Literature 1: JP 2013-251630 A
Patent Literature 2: JP 2011-070084 A
Patent Literature 3: JP 2015-170867 A
Patent Literature 4: JP 2017-034355 A
Patent Literature 5: JP 2009-033344 A

However, with the conventional technology described above, when a web conference is held using information terminals such as notebook personal computers (notebook PCs) or tablet terminals, and any of the participants gathered in the same room (two or more participants) uses the speakerphone on an information terminal, the howling that arises from the audio passing between the adjacent information terminals could not be effectively suppressed.

For example, the technique described in Patent Literature 1 can be applied only to a configuration in which a group has a single speaker output and the microphone inputs of that group can be synthesized, so it cannot be applied to a general web conference. That is, in a web conference each terminal's speaker outputs a synthesis of the voice inputs of the terminals other than itself, so the output sound differs from terminal to terminal.

Therefore, a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method that can effectively suppress the occurrence of howling in web conferences and the like are desired.

A first aspect of the present invention is a conference system including a conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, the conference system comprising: (1) first mixer means that, for each conference terminal, synthesizes the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) voice activity detection means that detects that any of the first synthesized sound signals contains voice; (3) specific band attenuation means that attenuates a specific band of each first synthesized sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) band limit detection means that detects that the microphone input signal contains the output signal of a conference terminal after attenuation of the specific band; (5) echo detection means that detects that the microphone input signal contains an echo when the band limit detection means detects that such an attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) echo cancellation means that cancels the echo detected by the echo detection means from the microphone input signal.

A second aspect of the present invention is a conference server in a conference system including the conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, the conference server comprising: (1) first mixer means that, for each conference terminal, synthesizes the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) voice activity detection means that detects that any of the first synthesized sound signals contains voice; (3) specific band attenuation means that attenuates a specific band of each first synthesized sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) band limit detection means that detects that the microphone input signal contains the output signal of a conference terminal after attenuation of the specific band; (5) echo detection means that detects that the microphone input signal contains an echo when the band limit detection means detects that such an attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) echo cancellation means that cancels the echo detected by the echo detection means from the microphone input signal.

A third aspect of the present invention is a conference terminal in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, the conference terminal comprising: (1) second mixer means that synthesizes the terminal's own microphone input signal with a first synthesized sound signal, acquired from the conference server, in which the microphone input signals of the conference terminals other than the terminal itself are synthesized, and outputs the result as a second synthesized sound signal; (2) voice activity detection means that detects that the second synthesized sound signal contains voice; (3) specific band attenuation means that attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as the output signal of the conference terminal; (4) band limit detection means that detects that the microphone input signal contains the output signal of the conference terminal after attenuation of the specific band; (5) echo detection means that detects that the microphone input signal contains an echo when the band limit detection means detects that the attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) echo cancellation means that cancels the echo detected by the echo detection means from the microphone input signal.

An echo cancellation program according to a fourth aspect of the present invention causes a computer mounted on a conference server, in a conference system including the conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, to function as: (1) first mixer means that, for each conference terminal, synthesizes the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) voice activity detection means that detects that any of the first synthesized sound signals contains voice; (3) specific band attenuation means that attenuates a specific band of each first synthesized sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) band limit detection means that detects that the microphone input signal contains the output signal of a conference terminal after attenuation of the specific band; (5) echo detection means that detects that the microphone input signal contains an echo when the band limit detection means detects that such an attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) echo cancellation means that cancels the echo detected by the echo detection means from the microphone input signal.

An echo cancellation program according to a fifth aspect of the present invention causes a computer mounted on a conference terminal, in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, to function as: (1) second mixer means that synthesizes the terminal's own microphone input signal with a first synthesized sound signal, acquired from the conference server, in which the microphone input signals of the conference terminals other than the terminal itself are synthesized, and outputs the result as a second synthesized sound signal; (2) voice activity detection means that detects that the second synthesized sound signal contains voice; (3) specific band attenuation means that attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as the output signal of the conference terminal; (4) band limit detection means that detects that the microphone input signal contains the output signal of the conference terminal after attenuation of the specific band; (5) echo detection means that detects that the microphone input signal contains an echo when the band limit detection means detects that the attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) echo cancellation means that cancels the echo detected by the echo detection means from the microphone input signal.

A sixth aspect of the present invention is an echo cancellation method used in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, the method using first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the first mixer means, for each conference terminal, synthesizes the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) the voice activity detection means detects that any of the first synthesized sound signals contains voice; (3) the specific band attenuation means attenuates a specific band of each first synthesized sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) the band limit detection means detects that the microphone input signal contains the output signal of a conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that such an attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

A seventh aspect of the present invention is an echo cancellation method used in a conference server of a conference system including the conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, the method using first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the first mixer means, for each conference terminal, synthesizes the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) the voice activity detection means detects that any of the first synthesized sound signals contains voice; (3) the specific band attenuation means attenuates a specific band of each first synthesized sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) the band limit detection means detects that the microphone input signal contains the output signal of a conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that such an attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

An eighth aspect of the present invention is an echo cancellation method used in a conference terminal of a conference system including a conference server and a plurality of conference terminals each equipped with a microphone that receives a microphone input signal, the method using second mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the second mixer means synthesizes the terminal's own microphone input signal with a first synthesized sound signal, acquired from the conference server, in which the microphone input signals of the conference terminals other than the terminal itself are synthesized, and outputs the result as a second synthesized sound signal; (2) the voice activity detection means detects that the second synthesized sound signal contains voice; (3) the specific band attenuation means attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as the output signal of the conference terminal; (4) the band limit detection means detects that the microphone input signal contains the output signal of the conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the attenuated output signal is contained and the voice activity detection means determines that voice is present; and (6) the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

According to the present invention, the occurrence of howling in a web conference or the like can be effectively suppressed.

FIG. 1 is a block diagram showing a configuration example of the conference system according to the embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the band limiting unit according to the embodiment.
FIG. 3 is a block diagram showing the detailed configuration of the band limit detection unit according to the embodiment.
FIG. 4 is an explanatory diagram illustrating the frequency characteristics of the microphone input signal of a conference terminal according to the embodiment.
FIG. 5 is a block diagram showing a configuration example of a conference system according to a modification.

(A) Main Embodiment
An embodiment of a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method is described in detail below with reference to the drawings.

(A-1) Configuration of the Embodiment
(A-1-1) Overall Configuration
FIG. 1 is a block diagram showing a configuration example of the conference system according to the embodiment.

In FIG. 1, the conference system 1 has three conference terminals 2 (2-A to 2-C) and a conference server 3. FIG. 1 shows three conference terminals 2 (2-A to 2-C) to keep the explanation simple, but the number of conference terminals 2 is not particularly limited. That is, this embodiment illustrates a case in which the three conference terminals 2-A to 2-C hold one conference, but the number of conference terminals 2 holding one conference is not particularly limited.

FIG. 1 also omits the connection configuration between the conference terminals 2 and the conference server 3, but various connection configurations can be applied. In this embodiment, the conference terminals 2 and the conference server 3 are assumed to be capable of two-way communication via a communication line (for example, a wide area network such as the Internet, a telephone line, or a dedicated line).

The conference server 3 has a function of synthesizing the voices obtained from the conference terminals 2 at a plurality of sites and converting them into conference data. The conference server 3 outputs (transmits) to each conference terminal 2 the conference data (synthesized sound) intended for that terminal.

A conference terminal 2 is a terminal that participates in a conference, and may be any information processing terminal that has voice input/output functions (microphone and speaker) and a communication function. For example, a PC, a mobile terminal such as a smartphone, a tablet, or a wearable device can be used as the conference terminal 2.

Hereinafter, when referring to the conference terminals 2 in FIG. 1, the conference terminal 2-A may be simply called "terminal A", the conference terminal 2-B "terminal B", and the conference terminal 2-C "terminal C". The voices (voice data) input from the microphones of terminals A to C and transmitted to the conference server 3 may likewise be called "voice A", "voice B", and "voice C", respectively.

(A-1-2) Detailed Configuration of the Conference Server 3
In FIG. 1, the conference server 3 has a mixer unit 30 serving as the first mixer means, a band limiting unit 31, a band limit detection unit 32, a voice activity detection unit 33, an echo detection unit 35, and an echo cancellation unit 36.

The conference server 3 may be implemented by installing a program (the echo cancellation program according to the embodiment) on a computer having a processor, memory, and the like; even in this case, the conference server 3 can be represented functionally as in FIG. 1. Part or all of the conference server 3 may instead be implemented in hardware.

The mixer unit 30 generates voice data (synthesized sound signals for the conference) by synthesizing (mixing) the voice data (microphone input signals) supplied from the conference terminals 2, and supplies each synthesized signal to the corresponding conference terminal 2. For example, in FIG. 1, the mixer unit 30 outputs (1) to terminal A, the synthesized sound of the microphone input signals of terminals B and C (voice B+C); (2) to terminal B, the synthesized sound of the microphone input signals of terminals A and C (voice A+C); and (3) to terminal C, the synthesized sound of the microphone input signals of terminals A and B (voice A+B).
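
The mixing rule above can be sketched as follows. This is a minimal illustration, assuming each terminal's microphone input arrives as an equal-length list of samples; the function name `mix_minus` and the dictionary layout are hypothetical and do not come from the specification.

```python
def mix_minus(mic_frames):
    """For each terminal, sum the frames of every terminal except itself.

    mic_frames maps a terminal name to a list of samples; all lists are
    assumed to have the same length (an assumption of this sketch).
    """
    names = list(mic_frames)
    length = len(mic_frames[names[0]])
    # Sum all inputs once, then subtract each terminal's own contribution.
    total = [sum(mic_frames[n][i] for n in names) for i in range(length)]
    return {n: [total[i] - mic_frames[n][i] for i in range(length)]
            for n in names}


# Terminal A receives voice B+C, terminal B receives A+C, terminal C receives A+B.
frames = {"A": [1, 0, 0], "B": [0, 2, 0], "C": [0, 0, 4]}
mixed = mix_minus(frames)
```

Because every terminal receives a different mix, the speaker output differs from terminal to terminal, which is the situation the band-limit marking described later is meant to handle.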

The voice activity detection unit 33 (33-1 to 33-3) performs voice activity detection processing on the synthesized sound output from the mixer unit 30. A wide variety of voice activity detection processes can be applied in the voice activity detection unit 33; for example, the technique described in Patent Literature 2 can be applied. The voice activity detection unit 33 gives its determination result to the condition determination unit 34.

The condition determination unit 34 determines whether any of the determination results of the voice activity detection units 33 (33-1 to 33-3) for the synthesized sounds output from the mixer unit 30 indicates voice (an OR condition). The condition determination unit 34 gives the result of this OR condition determination to the echo detection unit 35. The voice activity detection means is realized by, for example, the voice activity detection unit 33 and the condition determination unit 34 described above.
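
The OR condition can be sketched as below. The per-frame detector here is a simple mean-square threshold used purely as a stand-in (an assumption of this sketch); it is not the method of Patent Literature 2, and the names `frame_is_voiced`, `any_voiced`, and the threshold value are hypothetical.

```python
def frame_is_voiced(frame, threshold=1e-4):
    # Stand-in detector (assumption): mean square of the frame above a threshold.
    return sum(x * x for x in frame) / len(frame) > threshold


def any_voiced(mixed_outputs, threshold=1e-4):
    """OR condition: True if at least one synthesized sound is voiced."""
    return any(frame_is_voiced(f, threshold) for f in mixed_outputs.values())


one_active = any_voiced({"A": [0.0] * 4, "B": [0.1] * 4})
all_silent = any_voiced({"A": [0.0] * 4, "B": [0.0] * 4})
```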

The band limiting unit 31 (31-1 to 31-3) limits (eliminates) a predetermined band of the synthesized sound output from the mixer unit 30 and outputs the result as the signal to be output from the speaker of each conference terminal 2. For example, the technique described in Patent Literature 3 is applied to limit the 2.5 to 3.0 kHz band, which has little effect on perceived sound quality.

The band limit detection unit 32 (32-1 to 32-3) detects, in the microphone input (microphone input signal) of each conference terminal 2, the presence of a signal that was band-limited by the band limiting unit 31 described above (that is, a signal output from the speaker of a conference terminal 2). The band limit detection unit 32 gives the band limit detection result (whether or not the microphone input signal of the conference terminal 2 contains a signal band-limited by the band limiting unit 31 for any conference terminal 2) to the echo detection unit 35. The details of the band limit detection unit 32 are described in the section on operation.

The echo detection unit 35 (35-1 to 35-3) detects whether or not the microphone input (microphone input signal) of each conference terminal 2 contains an echo component. Specifically, the echo detection unit 35 performs echo detection based on the band limit detection result of the band limit detection unit 32 and the OR condition determination result of the condition determination unit 34. For example, the echo detection unit 35 determines that an echo is detected for a conference terminal 2 when both "a band-limited signal is present in the microphone input" and "at least one speaker output is voiced" hold. The echo detection unit 35 gives the echo detection result to the echo cancellation unit 36.
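
The decision rule described above reduces to a logical AND per terminal. A minimal sketch follows, assuming the band limit detection results are available as one boolean per terminal; the function name `detect_echo` and the dictionary layout are hypothetical.

```python
def detect_echo(band_limited_by_terminal, any_output_voiced):
    """Echo is flagged for a terminal only when its microphone input looks
    band-limited AND at least one speaker output is voiced."""
    return {terminal: (flag and any_output_voiced)
            for terminal, flag in band_limited_by_terminal.items()}


with_voice = detect_echo({"A": True, "B": False}, True)
without_voice = detect_echo({"A": True, "B": False}, False)
```

Requiring the voiced condition avoids flagging an echo when no terminal is actually playing sound.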

The echo cancellation unit 36 (36-1 to 36-3) cancels the echo in the microphone input signal when the detection result from the corresponding echo detection unit 35 indicates that an echo is detected.

For example, an echo suppressor can be used as the echo cancellation unit 36. When the echo cancellation unit 36 uses an echo suppressor, the simplest method is to attenuate the microphone input signal as it is when an echo is detected. The echo cancellation unit 36 may also divide the signal into bands and attenuate each band as needed (an FFT (Fast Fourier Transform) or the like may be used for this). When the echo cancellation unit 36 uses an echo suppressor, the technique (method) described in Patent Literature 4, for example, can be applied.
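
The simplest suppressor mentioned above, attenuating the whole microphone signal while an echo is detected, might look like the following sketch. The gain value (0.1, roughly -20 dB) and the function name `suppress_echo` are assumptions for illustration; this is not the method of Patent Literature 4.

```python
def suppress_echo(mic_frame, echo_detected, gain=0.1):
    """Attenuate the whole frame when an echo is detected; otherwise pass it through.

    gain is a hypothetical attenuation factor (0.1 is about -20 dB).
    """
    if not echo_detected:
        return list(mic_frame)
    return [gain * x for x in mic_frame]


passed = suppress_echo([1.0, -2.0], echo_detected=False)
attenuated = suppress_echo([1.0, -2.0], echo_detected=True, gain=0.5)
```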

The echo cancellation unit 36 may instead use an adaptive echo canceller. For example, the technique (method) described in Patent Literature 3 or 5 can be applied to implement an adaptive echo canceller.

(A-2) Operation of the Embodiment
Next, the operation of the conference system 1 according to the embodiment having the above configuration is described. Since the conference system 1 is characterized by its echo detection processing (mainly the band limit detection processing), the following description focuses on this point and refers to the drawings.

(A-2-1) Processing of the Band Limiting Unit 31 and the Band Limit Detection Unit 32
FIG. 2 is a block diagram showing the detailed configuration of the band limiting unit according to the embodiment.

As shown in FIG. 2, the band limiting unit 31 has a BEF (band elimination filter) 310. The BEF 310 removes, for example, the 2.5 to 3.0 kHz band, which has little effect on perceived sound quality, from the output signal of the mixer unit 30 (the synthesized sound to be output from the speaker of each conference terminal 2). The signal whose components in the specific 2.5 to 3.0 kHz band have been attenuated is then transmitted to each conference terminal 2 and output from its speaker.
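
The effect of the BEF 310 can be illustrated with a block-wise sketch that zeroes the DFT bins between 2.5 and 3.0 kHz. This is only one possible realization chosen for clarity: the specification does not state the filter design, and a practical system would more likely use a time-domain FIR or IIR band elimination filter. The 16 kHz sampling rate, the 320-sample frame, and all function names are assumptions.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def band_stop(frame, fs, lo=2500.0, hi=3000.0):
    """Zero every DFT bin whose frequency lies in [lo, hi] Hz."""
    n = len(frame)
    spec = dft(frame)
    for k in range(n):
        # Fold the bin index so the mirrored negative-frequency bins are covered.
        f = min(k, n - k) * fs / n
        if lo <= f <= hi:
            spec[k] = 0j
    return idft(spec)


fs = 16000  # assumed sampling rate
n = 320     # assumed frame length (50 Hz bin spacing)
in_band_tone = [math.sin(2 * math.pi * 2750 * t / fs) for t in range(n)]
out_of_band_tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
removed = band_stop(in_band_tone, fs)    # 2.75 kHz tone falls inside the stop band
kept = band_stop(out_of_band_tone, fs)   # 1 kHz tone lies outside the stop band
```

In this sketch the 2.75 kHz tone is eliminated while the 1 kHz tone passes unchanged, which is the spectral marking that the band limit detection unit later looks for in the microphone input.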

Meanwhile, the microphone input signal from each conference terminal 2 is transmitted to the conference server 3. The conference server 3 gives the received microphone input signal of each conference terminal 2 to the corresponding band limit detection unit 32 (32-1 to 32-3).

FIG. 3 is a block diagram showing the detailed configuration of the band limit detection unit according to the embodiment.

The band-pass filter (BPF) 321 passes the part of the microphone input signal that lies in a specific band (the 2.5 to 3.0 kHz band). The output signal of the BPF 321 is given to the power calculation unit 323.

The band-pass filter (BPF) 322 passes the part of the microphone input signal that lies in a specific band (the 2.0 to 2.5 kHz band). The output signal of the BPF 322 is given to the power calculation unit 324.

The power calculation unit 323 receives the output signal of the BPF 321, calculates a power value P_BPF1 by averaging over the sample values of that signal, and gives it to the band limit determination unit 325.

As the method by which the power calculation unit 323 calculates the average power of the input signal, for example, the squares of the sample values of the input signal can be averaged with an FIR filter to obtain the power value P_BPF1. The method is not limited to squaring the sample values; the absolute values of the sample values may be used instead. The power calculation unit 323 may also use an IIR low-pass filter (LPF) with a suitable time constant in place of the FIR filter.

The power calculator 324 receives the output signal of the BPF 322, calculates a power value P_BPF2 by averaging over the samples of that signal, and supplies it to the band limit determination unit 325. The power calculator 324 can calculate the average power of its input signal in the same way as the power calculator 323.

Based on the power values P_BPF1 and P_BPF2 from the power calculators 323 and 324, the band limit determination unit 325 determines whether the signal is band-limited (that is, whether it is an echo) and outputs the determination result to the echo detector 35.

FIG. 4 is an explanatory diagram of the frequency characteristics of the microphone input signal of a conference terminal according to the embodiment. FIG. 4(A) shows the frequency characteristics of the microphone input signal when an echo enters the microphone of the conference terminal 2, and FIG. 4(B) shows the frequency characteristics when the voice of a person speaking at that terminal enters its microphone.

The band limiter 31 (31-1 to 31-3) attenuates the specific band of 2.5 to 3.0 kHz in the speaker output signal of each conference terminal 2.
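As a rough illustration of attenuating the specific band, the sketch below uses a single second-order notch centred at 2.75 kHz. It is a simplified stand-in, not the band limiter of the embodiment: the 16 kHz sampling rate, the Q value, and the function names are all assumptions, and a practical band-elimination filter covering the whole 2.5 to 3.0 kHz band would probably need a higher-order design.

```python
import math


def notch_coefficients(fs, f0, q):
    """Normalized coefficients (b, a) of a second-order notch at f0 Hz."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Direct-form I second-order filter, zero initial state."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def tone(freq, fs, n):
    """Unit-amplitude sine wave of `n` samples."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


def mean_power(samples):
    return sum(v * v for v in samples) / len(samples)


FS = 16000                                   # assumed sampling rate (Hz)
B, A = notch_coefficients(FS, 2750.0, 5.5)   # assumed centre frequency and Q
```

With these assumed values, a 2.75 kHz tone is almost completely removed once the filter has settled, while a 1 kHz tone passes nearly unchanged.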

Therefore, when an echo enters the microphone (for example, when terminal A and terminal B are in the same room and the output of the speaker of terminal A and/or terminal B enters the microphone of terminal A and/or terminal B), the 2.5 to 3.0 kHz band of the microphone input signal is attenuated, as shown in FIG. 4(A).

As a result, the average power P_BPF1 of the signal passing through the BPF 321, which passes the specific band of 2.5 to 3.0 kHz, tends to be much lower than the average power P_BPF2 of the signal passing through the BPF 322, which passes the 2.0 to 2.5 kHz band outside the specific band.

In contrast, when a user of the conference terminal 2 speaks, the user's voice enters the microphone of that terminal directly. The microphone input signal therefore has substantial signal power in the specific band of 2.5 to 3.0 kHz.

Consequently, the power P_BPF1 in the 2.5 to 3.0 kHz band (inside the stopband of the band-elimination filter, BEF) is not much lower than the power P_BPF2 in the 2.0 to 2.5 kHz band (outside the BEF stopband).

Based on the frequency characteristics of the microphone input signal shown in FIGS. 4(A) and 4(B), the band limit determination unit 325 determines that band limiting is detected (echo detected) when the condition of expression (1) below holds, and that band limiting is not detected when the condition does not hold.

P_BPF1 / P_BPF2 < TH … (1)

Expression (1) requires that the ratio of the average power P_BPF1 of the signal that has passed through the BPF 321 to the average power P_BPF2 of the signal that has passed through the BPF 322 be less than the threshold TH. This tests whether, as shown in FIG. 4(A) for echo input, the power of the signal in the specific band of 2.5 to 3.0 kHz has been reduced by the attenuation.

The band limit determination unit 325 supplies the echo detector 35 with the result of determining whether the condition of expression (1) is satisfied.
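The determination by expression (1) can be sketched as follows. This is a minimal sketch under stated assumptions: the band powers are measured here with a direct DFT over one frame as a stand-in for the BPF 321/322 and the power calculators 323/324, and the sampling rate, frame length, threshold value `TH`, and the `floor` guard against division by a near-zero P_BPF2 are all hypothetical, since the source gives no values.

```python
import cmath
import math

FS = 16000    # assumed sampling rate in Hz
FRAME = 320   # assumed frame length (20 ms at 16 kHz)
TH = 0.1      # hypothetical threshold; the source does not state TH


def band_power(frame, fs, f_lo, f_hi):
    """Power of `frame` in [f_lo, f_hi) Hz, summed over DFT bins."""
    n = len(frame)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.ceil(f_hi * n / fs)
    total = 0.0
    for k in range(k_lo, k_hi):
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                 for i, x in enumerate(frame))
        total += abs(xk) ** 2
    return total / (n * n)


def band_limit_detected(p_bpf1, p_bpf2, th=TH, floor=1e-12):
    """Expression (1): True when P_BPF1 / P_BPF2 < TH."""
    if p_bpf2 < floor:
        return False   # assumed guard: too little energy to judge
    return p_bpf1 / p_bpf2 < th


def make_frame(amp_outside, amp_specific):
    """Synthetic frame: a 2250 Hz tone plus a 2750 Hz tone."""
    return [amp_outside * math.sin(2.0 * math.pi * 2250 * i / FS)
            + amp_specific * math.sin(2.0 * math.pi * 2750 * i / FS)
            for i in range(FRAME)]


def classify(frame):
    p_bpf1 = band_power(frame, FS, 2500, 3000)   # specific band
    p_bpf2 = band_power(frame, FS, 2000, 2500)   # band outside it
    return band_limit_detected(p_bpf1, p_bpf2)
```

Here `classify(make_frame(1.0, 0.05))` models an echo whose specific band has been attenuated, and `classify(make_frame(1.0, 1.0))` models direct speech with comparable power in both bands.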

(A-2-2) Processing of the echo detector 35 and related units
The output signals of the mixer unit 30 are checked for voice activity by the voice activity detectors 33 (33-1 to 33-3). The condition determination unit 34 supplies the echo detector 35 with the OR of their results, that is, whether any of the voice activity detectors 33 (33-1 to 33-3) has detected voice.

The echo detector 35 determines that an echo is detected only when the band limit detector 32 (band limit determination unit 325) has detected band limiting in the microphone input signal (echo detection) and the condition determination unit 34 reports that at least one of the output signals of the mixer unit 30 contains voice. In all other cases it determines that no echo is detected.
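The combination of the two conditions can be sketched as a small piece of boolean logic. The function name and the list-based interface are assumptions made for illustration; only the AND of the band limit result with the OR of the voice activity results comes from the description.

```python
def detect_echo(band_limited, voice_flags):
    """Per-terminal echo decision.

    band_limited: one boolean per terminal, the result of the band limit
                  detector for that terminal's microphone input signal.
    voice_flags:  one boolean per voice activity detector on the mixer
                  outputs; the condition determination unit ORs them.
    """
    voice_present = any(voice_flags)   # OR condition
    return [bl and voice_present for bl in band_limited]
```

For example, `detect_echo([True, False, True], [False, True, False])` flags the first and third terminals, while the same band limit results with no voice on any mixer output flag none.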

When the echo detector 35 detects an echo, the echo canceller 36 applies echo cancellation (suppression) to the microphone input signal using an echo suppressor or the like (for example, the methods described in Patent Documents 3, 4, and 5 cited above).

(A-3) Effects of the embodiment
According to this embodiment, the conference system 1 outputs, from the speaker of each conference terminal 2, conference audio from which a band that does not affect the perceived sound has been removed, and detects attenuation of that band in the microphone input signal of each conference terminal 2. This makes it possible to detect echo input immediately.

When an echo is detected, cancelling the echo in the microphone input signal of each conference terminal 2 prevents howling in the web conference. In particular, it prevents the howling that arises when two or more terminals (terminals A, B, ...) participating in the web conference are in the same room and one of them is in speaker mode.

(B) Other embodiments
Various modifications have already been mentioned in the above embodiment; the present invention can also be applied to the following modifications.

(B-1) In the above embodiment, the conference server 3 plays the main role in preventing howling. However, as shown in FIG. 5, a configuration in which each conference terminal 2 plays the main role may be adopted instead. In FIG. 5, each conference terminal 2 has the components of the conference server 3 shown in FIG. 1 (the band limiter 31, the band limit detector 32, the voice activity detector 33, the echo detector 35, and the echo canceller 36) and, in addition, a mixer unit 21 serving as the second mixer means.

The mixer unit 21 adds the audio of its own terminal to the audio synthesized by the mixer unit 30 of the conference server 3 (the synthesized audio to be output from the speaker of its own terminal).

The voice activity detector 33 in FIG. 5 performs voice activity detection on the audio synthesized by the mixer unit 21 (the combined audio of all conference terminals 2 participating in the conference). The remaining processing is the same as (or similar to) that of the corresponding components shown in FIG. 1.
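The terminal-side mixing and the subsequent voice activity check can be sketched as follows. This is only a sketch under assumptions: the source does not say how voice activity is detected, so the frame-energy threshold used here is a hypothetical stand-in, and the clipping to the range -1.0 to 1.0 and all names are likewise illustrative.

```python
def mix_with_own_voice(server_mix, own_mic):
    """Second mixer: add the own terminal's microphone signal to the mix
    received from the server (which excludes the own terminal).

    Samples are floats in [-1.0, 1.0]; the sum is clipped to that range
    (an assumption, not stated in the source).
    """
    return [max(-1.0, min(1.0, s + m)) for s, m in zip(server_mix, own_mic)]


def voice_present(frame, threshold=1e-4):
    """Hypothetical energy-based check: mean squared value above threshold."""
    if not frame:
        return False
    return sum(v * v for v in frame) / len(frame) > threshold
```

Because the own voice is included in the mixed signal, this check reports voice whenever any participant, including the local user, is speaking.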

(B-2) As a modification, the configuration of the conference system 1 is not limited to those shown in FIG. 1 or FIG. 5; the components (functions) may be distributed among the conference terminals 2 and the conference server 3 as appropriate.

(B-3) In the above embodiment, on the assumption that each conference terminal 2 uses speaker output, the band limiter 31 applies band limiting to the output signals of the mixer unit 30 of the conference server 3. If it is possible to detect whether each conference terminal 2 is using speaker output or headphone output, band limiting may be applied only to speaker output, because an echo is unlikely to be picked up with headphone output.

(B-4) As a modification, if the participants (conference terminals 2) located at the same site (in the same room) can be identified, the echo detection and cancellation processing described above (howling prevention processing) may be applied only to the conference terminals 2 located at the same site. Whether conference terminals 2 are at the same site may be determined by, for example, GPS (Global Positioning System).

(B-5) In the above embodiment, voice activity detection is performed on the audio output from the speaker of the conference terminal 2, but it may instead be performed on the audio input through the microphone of the conference terminal 2 (the microphone input signal).

(B-6) In the above embodiment, voice activity detection is performed before band limiting, but it may be performed after band limiting.

1: conference system; 2 (2-A to 2-C): conference terminal; 3: conference server; 21, 30: mixer unit; 31 (31-1 to 31-3): band limiter; 32 (32-1 to 32-3): band limit detector; 33 (33-1 to 33-3): voice activity detector; 34: condition determination unit; 35 (35-1 to 35-3): echo detector; 36 (36-1 to 36-3): echo canceller; 321, 322: BPF; 323, 324: power calculator; 325: band limit determination unit.

Claims

1. A conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server, the conference system comprising:
first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result as a first synthesized sound signal;
voice activity detection means for detecting that any one of the first synthesized sound signals contains voice;
specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals;
band limit detection means for detecting that the microphone input signal includes the output signal of a conference terminal after specific band attenuation;
echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
echo cancellation means for cancelling the echo detected by the echo detection means from the microphone input signal.

2. The conference system according to claim 1, wherein the band limit detection means comprises:
a specific band extraction unit that extracts the specific band of the microphone input signal;
an out-of-specific-band extraction unit that extracts a band other than the specific band of the microphone input signal; and
a band limit determination unit that determines that the output signal of a conference terminal after specific band attenuation is included when the ratio of the power value of the output signal of the specific band extraction unit to the power value of the output signal of the out-of-specific-band extraction unit is less than a threshold.

3. The conference system according to claim 1, wherein the specific band is a band that has little effect on hearing.

4. The conference system according to claim 3, wherein the specific band is a band of 2.5-3.0 kHz.

5. A conference server in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and the conference server, the conference server comprising:
first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result as a first synthesized sound signal;
voice activity detection means for detecting that any one of the first synthesized sound signals contains voice;
specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals;
band limit detection means for detecting that the microphone input signal includes the output signal of a conference terminal after specific band attenuation;
echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
echo cancellation means for cancelling the echo detected by the echo detection means from the microphone input signal.

6. A conference terminal in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server, the conference terminal comprising:
second mixer means for synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, obtained by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputting the result as a second synthesized sound signal;
voice activity detection means for detecting that the second synthesized sound signal contains voice;
specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as an output signal of the conference terminal;
band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after specific band attenuation;
echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
echo cancellation means for cancelling the echo detected by the echo detection means from the microphone input signal.

7. An echo cancellation program causing a computer mounted on a conference server, in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and the conference server, to function as:
first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result as a first synthesized sound signal;
voice activity detection means for detecting that any one of the first synthesized sound signals contains voice;
specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals;
band limit detection means for detecting that the microphone input signal includes the output signal of a conference terminal after specific band attenuation;
echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
echo cancellation means for cancelling the echo detected by the echo detection means from the microphone input signal.

8. An echo cancellation program causing a computer mounted on a conference terminal, in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server, to function as:
second mixer means for synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, obtained by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputting the result as a second synthesized sound signal;
voice activity detection means for detecting that the second synthesized sound signal contains voice;
specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as an output signal of the conference terminal;
band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after specific band attenuation;
echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
echo cancellation means for cancelling the echo detected by the echo detection means from the microphone input signal.

9. An echo cancellation method used in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server,
the conference system having first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein:
the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal;
the voice activity detection means detects that any one of the first synthesized sound signals contains voice;
the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as an output signal of each of the conference terminals;
the band limit detection means detects that the microphone input signal includes the output signal of a conference terminal after specific band attenuation;
the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

10. An echo cancellation method used in a conference server in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and the conference server,
the conference server having first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein:
the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal;
the voice activity detection means detects that any one of the first synthesized sound signals contains voice;
the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as an output signal of each of the conference terminals;
the band limit detection means detects that the microphone input signal includes the output signal of a conference terminal after specific band attenuation;
the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

11. An echo cancellation method used in a conference terminal in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server,
the conference terminal having second mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein:
the second mixer means synthesizes the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, obtained by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputs the result as a second synthesized sound signal;
the voice activity detection means detects that the second synthesized sound signal contains voice;
the specific band attenuation means attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as an output signal of the conference terminal;
the band limit detection means detects that the microphone input signal includes the output signal of the conference terminal after specific band attenuation;
the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after specific band attenuation is included and the voice activity detection means determines that voice is present; and
the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.