WO2023026600A1 - Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method - Google Patents

Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method

Info

Publication number
WO2023026600A1
WO2023026600A1 PCT/JP2022/019837 JP2022019837W WO2023026600A1 WO 2023026600 A1 WO2023026600 A1 WO 2023026600A1 JP 2022019837 W JP2022019837 W JP 2022019837W WO 2023026600 A1 WO2023026600 A1 WO 2023026600A1
Authority
WO
WIPO (PCT)
Prior art keywords
conference
echo
detection means
signal
microphone input
Prior art date
Application number
PCT/JP2022/019837
Other languages
French (fr)
Japanese (ja)
Inventor
高詩 石黒
Original Assignee
沖電気工業株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 沖電気工業株式会社
Publication of WO2023026600A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/20 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H04B3/23 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other using a replica of transmitted signal in the time domain, e.g. echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • The present invention relates to a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method, and can be applied, for example, to a conference system that conducts web conferences.
  • For example, Patent Document 1 discloses a technique of synthesizing the voice input signals from the terminal microphones in the same group and then removing echo using an adaptive filter.
  • Patent Document 1: JP 2013-251630 A; Patent Document 2: JP 2011-070084 A; Patent Document 3: JP 2015-170867 A; Patent Document 4: JP 2017-034355 A; Patent Document 5: JP 2009-033344 A
  • The technique of Patent Document 1 can be applied only to a configuration in which each group has a single speaker output and the microphone inputs of that group can be synthesized, so it cannot be applied to a general web conference. That is, in a web conference, the voice inputs from all terminals other than the own terminal are synthesized and output from the speaker, so the output sound differs from terminal to terminal.
  • A first aspect of the present invention is a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server, the conference system comprising: (1) first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first synthesized sound signal; (2) voice detection means for detecting that any of the first synthesized sound signals contains voice; (3) specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals; (4) band limit detection means for detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of each conference terminal after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) echo canceling means for canceling the echo detected by the echo detection means from the microphone input signal.
  • A second aspect of the present invention is a conference server in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and the conference server, the conference server comprising: (1) first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first synthesized sound signal; (2) voice detection means for detecting that any of the first synthesized sound signals contains voice; (3) specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals; (4) band limit detection means for detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of each conference terminal after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) echo canceling means for canceling the echo detected by the echo detection means from the microphone input signal.
  • A third aspect of the present invention is a conference terminal in a conference system including a plurality of conference terminals, each having a microphone to which a microphone input signal is input, and a conference server, the conference terminal comprising: (1) second mixer means for synthesizing the microphone input signal of its own terminal with a first synthesized sound signal, obtained from the conference server by synthesizing the microphone input signals of the other conference terminals, and outputting the result as a second synthesized sound signal; (2) voice detection means for detecting that the second synthesized sound signal contains voice; (3) specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as an output signal of the conference terminal; (4) band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) echo canceling means for canceling the echo detected by the echo detection means from the microphone input signal.
  • An echo cancellation program causes a computer mounted on a conference server, in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input and the conference server, to function as means including: (4) band limit detection means for detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of a specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) echo canceling means for canceling the echo detected by the echo detection means from the microphone input signal.
  • An echo cancellation program causes a computer installed in a conference terminal, in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input and a conference server, to function as: (1) second mixer means for synthesizing the microphone input signal of its own terminal with a first synthesized sound signal, obtained from the conference server by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputting the result as a second synthesized sound signal; (2) voice detection means for detecting that the second synthesized sound signal contains voice; (3) specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as an output signal of the conference terminal; (4) band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) echo canceling means for canceling the echo detected by the echo detection means from the microphone input signal.
  • A sixth aspect of the present invention is an echo canceling method used in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and a conference server, the conference system comprising first mixer means, voice detection means, specific band attenuation means, band limit detection means, echo detection means, and echo canceling means, wherein: (1) the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) the voice detection means detects that any of the first synthesized sound signals contains voice; (3) the specific band attenuation means attenuates a specific band of each first synthesized sound signal and outputs the attenuated signal as an output signal of each conference terminal; (4) the band limit detection means detects that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal includes an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) the echo canceling means cancels the echo detected by the echo detection means from the microphone input signal.
  • A seventh aspect of the present invention is an echo canceling method used in a conference server in a conference system including a plurality of conference terminals having microphones to which microphone input signals are input, and the conference server, the conference server comprising first mixer means, voice detection means, specific band attenuation means, band limit detection means, echo detection means, and echo canceling means, wherein: (1) the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) the voice detection means detects that any of the first synthesized sound signals contains voice; (3) the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as an output signal of each of the conference terminals; (4) the band limit detection means detects that the microphone input signal includes the output signal of each of the conference terminals after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal includes an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice detection means determines that voice is present; and (6) the echo canceling means cancels the echo detected by the echo detection means from the microphone input signal.
  • An eighth aspect of the present invention is an echo canceling method used in a conference terminal in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and a conference server, the conference terminal comprising second mixer means, voice detection means, specific band attenuation means, band limit detection means, echo detection means, and echo canceling means, wherein: (1) the second mixer means synthesizes the microphone input signal of its own terminal with a first synthesized sound signal, obtained by synthesizing the microphone input signals of the other conference terminals, and outputs the result as a second synthesized sound signal; (3) the specific band attenuation means attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as an output signal of the conference terminal; (4) the band limit detection means detects that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band; (5) the echo detection means, when the band limit detection means detects that the output signal of the conference terminal after attenuation of a specific
  • FIG. 1 is a block diagram showing a configuration example of a conference system according to an embodiment.
  • FIG. 2 is a block diagram showing the detailed configuration of a band limiter according to the embodiment.
  • FIG. 3 is a block diagram showing the detailed configuration of a band limit detector according to the embodiment.
  • FIGS. 4A and 4B are explanatory diagrams illustrating frequency characteristics of a microphone input signal of the conference terminal according to the embodiment.
  • FIG. 5 is a block diagram showing a configuration example of a conference system according to a modification.
  • FIG. 1 is a block diagram showing a configuration example of a conference system according to an embodiment.
  • The conference system 1 has three conference terminals 2 (2-A to 2-C) and a conference server 3. Although three conference terminals 2 (2-A to 2-C) are shown in FIG. 1 to simplify the explanation, the number of conference terminals 2 is not particularly limited. That is, in this embodiment three conference terminals 2-A to 2-C hold one conference, but the number of conference terminals 2 participating in one conference is not particularly limited.
  • The connection configuration between the conference terminal 2 and the conference server 3 is omitted from the figure, but various connection configurations can be applied.
  • The conference terminal 2 and the conference server 3 are capable of two-way communication via a communication line (for example, a wide area network such as the Internet, a telephone line, or a dedicated line).
  • The conference server 3 has a function of synthesizing the voices obtained from the conference terminals 2 at multiple sites and converting them into conference data.
  • The conference server 3 outputs (transmits) to each conference terminal 2 the conference data (synthesized sound) generated for that terminal.
  • The conference terminal 2 is a terminal that participates in the conference, and may be any information processing terminal that has an audio input/output function (microphone, speaker) and a communication function.
  • The conference terminal 2 can be a PC, a mobile terminal such as a smartphone, a tablet, a wearable device, or the like.
  • The conference terminal 2-A is simply referred to as "terminal A", the conference terminal 2-B as "terminal B", and the conference terminal 2-C as "terminal C".
  • The voices (voice data) input from the microphones of the terminals A to C and transmitted to the conference server 3 may be called "voice A", "voice B", and "voice C", respectively.
  • (A-1-2) Detailed Configuration of Conference Server 3. In FIG. 1, the conference server 3 has a mixer unit 30, band limiters 31 (31-1 to 31-3), band limit detection units 32 (32-1 to 32-3), voice activity detection units 33 (33-1 to 33-3), a condition determination unit 34, echo detection units 35 (35-1 to 35-3), and echo cancellation units 36 (36-1 to 36-3).
  • The conference server 3 may be implemented by installing a program (the echo cancellation program according to the embodiment) on a computer having a processor, memory, and the like; even in that case, the conference server 3 can be represented functionally as shown in FIG. 1. Part or all of the conference server 3 may be realized by hardware.
  • The mixer unit 30 generates audio data (a synthesized sound signal for the conference) by synthesizing (mixing) the audio data (microphone input signals) supplied from the conference terminals 2, and supplies it to each corresponding conference terminal 2.
  • For example, the mixer unit 30 outputs (1) to terminal A, the synthesized sound of the microphone input signals of terminals B and C (voice B+C); (2) to terminal B, the synthesized sound of the microphone input signals of terminals A and C (voice A+C); and (3) to terminal C, the synthesized sound of the microphone input signals of terminals A and B (voice A+B).
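  • The per-terminal mixing described above is often called a "mix-minus". The following is a minimal sketch of that idea, not the patent's implementation; the function name `mix_minus` and the dictionary-of-lists representation of the signals are assumptions made for illustration.

```python
from typing import Dict, List


def mix_minus(mic_inputs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """For each terminal, sum the microphone signals of all *other* terminals.

    Hypothetical helper: `mic_inputs` maps a terminal id to one frame of samples.
    """
    n = min(len(sig) for sig in mic_inputs.values())
    # Sum of all inputs per sample, computed once.
    total = [sum(sig[i] for sig in mic_inputs.values()) for i in range(n)]
    # Subtracting a terminal's own signal leaves the mix of the others.
    return {
        tid: [total[i] - sig[i] for i in range(n)]
        for tid, sig in mic_inputs.items()
    }


mics = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [3.0, 3.0]}
mixes = mix_minus(mics)  # mixes["A"] is voice B+C, and so on
```

  • Computing the total once and subtracting each terminal's own signal keeps the cost linear in the number of terminals; summing the other terminals directly would give the same result.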
  • The voice activity detection unit 33 (33-1 to 33-3) performs voice activity detection on the synthesized sound output from the mixer unit 30.
  • Various voice activity detection processes can be applied in the voice activity detection unit 33; for example, the technology described in Patent Document 2 can be applied.
  • The voice activity detection unit 33 gives its determination result to the condition determination unit 34.
  • The condition determination unit 34 determines whether at least one of the determination results of the voice activity detection units 33 (33-1 to 33-3) for the synthesized sounds output from the mixer unit 30 indicates voice (OR condition). The condition determination unit 34 gives the OR condition determination result to the echo detection unit 35.
  • The voice detection means is implemented by, for example, the voice activity detection units 33 and the condition determination unit 34 described above.
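  • The OR condition can be sketched as follows. The patent leaves the detection method open (it points to Patent Document 2), so the simple energy threshold used here, the names `is_voiced` and `any_voiced`, and the threshold value are all assumptions for illustration only.

```python
from typing import Iterable, List


def is_voiced(frame: List[float], threshold: float = 1e-3) -> bool:
    """Hypothetical energy-based detector: mean square above a threshold."""
    if not frame:
        return False
    return sum(x * x for x in frame) / len(frame) > threshold


def any_voiced(frames: Iterable[List[float]], threshold: float = 1e-3) -> bool:
    """OR condition: True if at least one synthesized sound contains voice."""
    return any(is_voiced(frame, threshold) for frame in frames)


silent = [0.0] * 160
speech = [0.1, -0.1] * 80  # stand-in for a voiced frame
result = any_voiced([silent, speech, silent])
```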
  • The band limiting unit 31 limits (removes) a predetermined band of the synthesized sound output from the mixer unit 30; the band-limited signal is what is output from the speaker of each conference terminal 2.
  • For example, the technique described in Patent Document 3 can be applied to limit the 2.5-3.0 kHz band, which has little effect on hearing.
  • The band limit detection unit 32 (32-1 to 32-3) detects whether the signal band-limited by the band limiting unit 31 (the signal output from the speaker) is present in the microphone input. The band limit detection unit 32 gives the band limit detection result (whether or not the microphone input signal of the conference terminal 2 contains a signal band-limited by the band limiting unit 31 for each conference terminal 2) to the echo detection unit 35. The details of the band limit detection unit 32 are described in the section on operation.
  • The echo detection unit 35 (35-1 to 35-3) detects whether or not the microphone input (microphone input signal) of each conference terminal 2 contains an echo component. Specifically, the echo detection unit 35 performs echo detection based on the band limit detection result of the band limit detection unit 32 and the OR condition determination result of the condition determination unit 34. For example, the echo detection unit 35 detects an echo for a conference terminal 2 when both "a band-limited signal is present in the microphone input" and "sound is being output from some speaker" hold. The echo detection unit 35 gives the echo detection result to the echo cancellation unit 36.
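  • The decision made by the echo detection unit 35 is a logical AND of the two inputs, evaluated per terminal. A minimal sketch, in which the function name `detect_echoes` and the dictionary layout are assumptions:

```python
from typing import Dict


def detect_echoes(band_limit_flags: Dict[str, bool],
                  any_output_voiced: bool) -> Dict[str, bool]:
    """Per terminal: echo only if the band-limited signal was found in the
    microphone input AND some speaker output currently contains voice."""
    return {
        tid: flag and any_output_voiced
        for tid, flag in band_limit_flags.items()
    }


flags = {"A": True, "B": False, "C": True}
echoes = detect_echoes(flags, any_output_voiced=True)
```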
  • The echo cancellation unit 36 (36-1 to 36-3) cancels the echo in the microphone input signal when the corresponding echo detection unit 35 reports that an echo was detected.
  • An echo suppressor can be applied as the echo cancellation unit 36.
  • When the echo cancellation unit 36 uses an echo suppressor, the simplest method is to attenuate the whole microphone input signal while an echo is detected.
  • Alternatively, the echo cancellation unit 36 may divide the signal into bands and attenuate each band as necessary (in this case, an FFT (Fast Fourier Transform) or the like may be used). If the echo cancellation unit 36 uses an echo suppressor, for example, the technology (method) described in Patent Document 4 can be applied.
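  • The simplest suppressor mentioned above (attenuate the whole microphone frame while an echo is detected) can be sketched as follows. The function name `suppress_echo` and the 20 dB default are illustrative assumptions; the patent does not specify an attenuation amount.

```python
from typing import List


def suppress_echo(mic_frame: List[float], echo_detected: bool,
                  attenuation_db: float = 20.0) -> List[float]:
    """Attenuate the full-band microphone frame only while echo is detected."""
    if not echo_detected:
        return list(mic_frame)
    gain = 10.0 ** (-attenuation_db / 20.0)  # dB to linear amplitude gain
    return [x * gain for x in mic_frame]


frame = [1.0, -0.5]
passed = suppress_echo(frame, echo_detected=False)
suppressed = suppress_echo(frame, echo_detected=True)
```

  • A full-band gain also attenuates any near-end speech present at the same time, which is why the text mentions per-band attenuation and adaptive cancellers as alternatives.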
  • The echo cancellation unit 36 may instead use an adaptive echo canceller.
  • The technology (method) described in Patent Document 3 or 5 can be applied to implement an adaptive echo canceller.
  • FIG. 2 is a block diagram showing the detailed configuration of the band limiter according to the embodiment.
  • The band limiting section 31 has a BEF (Band Elimination Filter) 310.
  • The BEF 310 eliminates, for example, the 2.5 to 3.0 kHz band, which has little effect on hearing, from the output signal of the mixer section 30 (the synthesized sound to be output from the speaker of each conference terminal 2). The signal in which the component in the specific band of 2.5 to 3.0 kHz has been attenuated is then transmitted to each conference terminal 2 and output from its speaker.
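  • As a rough illustration of a band elimination filter, the sketch below uses a single second-order notch (the widely used "Audio EQ Cookbook" biquad form) centred on the 2.5 to 3.0 kHz band. This is an assumption for illustration: the patent does not state the filter design, a single notch only approximates a flat stopband, and the 16 kHz sampling rate and all function names are hypothetical.

```python
import math
from typing import List


def band_elimination(signal: List[float], fs: float = 16000.0,
                     f_low: float = 2500.0, f_high: float = 3000.0) -> List[float]:
    """Second-order notch centred between f_low and f_high (approximate BEF)."""
    f0 = 0.5 * (f_low + f_high)
    q = f0 / (f_high - f_low)          # narrower band -> higher Q
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        # Direct form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def tone(freq: float, fs: float = 16000.0, n: int = 1600) -> List[float]:
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


def mean_power(sig: List[float]) -> float:
    return sum(v * v for v in sig) / len(sig)


in_band = band_elimination(tone(2750.0))   # inside the eliminated band
out_band = band_elimination(tone(1000.0))  # well outside it
```

  • Comparing `mean_power` over the second half of each output (after the filter transient has decayed) shows the 2.75 kHz tone strongly attenuated and the 1 kHz tone almost unchanged.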
  • The microphone input signal from each conference terminal 2 is transmitted to the conference server 3.
  • The conference server 3 gives the received microphone input signal of each conference terminal 2 to the corresponding band limit detector 32 (32-1 to 32-3).
  • FIG. 3 is a block diagram showing the detailed configuration of the band limit detection unit according to the embodiment.
  • A band-pass filter (BPF) 321 passes the component of the microphone input signal in the specific band (2.5 to 3.0 kHz). The output signal from the BPF 321 is given to the power calculator 323.
  • A band-pass filter (BPF) 322 passes the component of the microphone input signal in a reference band (2.0 to 2.5 kHz). The output signal from the BPF 322 is given to the power calculator 324.
  • The power calculation unit 323 receives the output signal from the BPF 321, calculates an average power value P_BPF1 from the sample values of the input signal, and provides it to the band limit determination unit 325.
  • As the method by which the power calculation unit 323 calculates the average power value of the input signal, for example, a method of calculating P_BPF1 by averaging the squared sample values of the input signal with an FIR filter can be applied.
  • The method by which the power calculator 323 calculates the average power value is not limited to squaring the sample values; the absolute values of the samples may be used instead. The power calculator 323 may also use an IIR-type LPF having an appropriate time constant instead of the FIR filter.
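  • The two averaging options mentioned above can be sketched as follows. The window length, the time constant, and the function names are illustrative assumptions; the patent does not give values.

```python
from typing import List


def average_power_fir(samples: List[float], window: int = 160) -> float:
    """Mean of the squared samples over the most recent `window` samples
    (a rectangular FIR average of the squared signal)."""
    recent = samples[-window:]
    if not recent:
        return 0.0
    return sum(x * x for x in recent) / len(recent)


def average_power_iir(samples: List[float], time_constant: float = 160.0) -> float:
    """One-pole low-pass filter (leaky integrator) applied to the squared samples."""
    alpha = 1.0 / time_constant
    power = 0.0
    for x in samples:
        power += alpha * (x * x - power)
    return power


constant = [0.5] * 2000
p_fir = average_power_fir(constant)
p_iir = average_power_iir(constant)
```

  • For a steady input both estimates converge to the same value; the IIR form needs only one state variable, while the FIR form forgets old samples completely after `window` samples.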
  • The power calculation unit 324 receives the output signal from the BPF 322, calculates an average power value P_BPF2 from the sample values of the input signal, and provides it to the band limit determination unit 325.
  • As the method by which the power calculator 324 calculates the average power value of the input signal, the same method as that of the power calculator 323 can be applied.
  • The band limit determination unit 325 determines the band limit state (that is, the echo state) from the power values P_BPF1 and P_BPF2, and outputs the determination result to the echo detector 35.
  • FIGS. 4A and 4B are explanatory diagrams explaining the frequency characteristics of the microphone input signal of the conference terminal according to the embodiment.
  • FIG. 4A shows the frequency characteristics of the microphone input signal when an echo is input from the microphone of the conference terminal 2.
  • FIG. 4B shows the frequency characteristics of the microphone input signal when no echo is input.
  • The component of the speaker output signal of each conference terminal 2 in the specific band of 2.5 to 3.0 kHz is attenuated by the band limiter 31 (31-1 to 31-3).
  • Therefore, when this speaker output enters the microphone as an echo, the microphone input signal is attenuated in the 2.5 to 3.0 kHz band, as shown in FIG. 4A.
  • That is, the average power value P_BPF1 of the signal passing through the BPF 321, which passes the specific band of 2.5 to 3.0 kHz, tends to be lower than the average power value P_BPF2 of the signal passing through the BPF 322, which passes the 2.0 to 2.5 kHz band outside the specific band.
  • In contrast, when no echo is input, the microphone input signal has a large signal power in the specific band of 2.5 to 3.0 kHz.
  • In that case, the power P_BPF1 in the 2.5-3.0 kHz band (within the BEF stopband) does not tend to be significantly lower than the power P_BPF2 in the 2.0-2.5 kHz band (outside the BEF stopband).
  • The band limit determination unit 325 determines that band limitation is detected (echo detection) when the condition of formula (1) is satisfied. When the condition of formula (1) is not satisfied, it determines that band limitation is not detected.
  • Formula (1) is the condition that the ratio of the average power value P_BPF1 of the signal that has passed through the BPF 321 to the average power value P_BPF2 of the signal that has passed through the BPF 322 is less than the threshold TH. This is because, as shown in FIG. 4A, the power of the signal in the specific band of 2.5 to 3.0 kHz is strongly attenuated during echo input.
  • The band limit determination unit 325 provides the echo detection unit 35 with the determination result of whether or not the condition of formula (1) is satisfied.
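  • From the description, formula (1) amounts to P_BPF1 / P_BPF2 < TH. The sketch below illustrates that test under several assumptions: band powers are measured with a direct DFT instead of the BPF and power calculator pair, the sampling rate is 16 kHz, TH is 0.1, and a small floor on P_BPF2 avoids a detection on silence. None of these values or names come from the patent.

```python
import cmath
import math
from typing import List


def band_power(signal: List[float], fs: float, f_low: float, f_high: float) -> float:
    """Power in DFT bins with f_low <= f < f_high (stand-in for BPF + power calc)."""
    n = len(signal)
    k_low = math.ceil(f_low * n / fs)
    k_high = math.ceil(f_high * n / fs)
    power = 0.0
    for k in range(k_low, k_high):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(signal))
        power += abs(x_k) ** 2
    return power / (n * n)


def band_limit_detected(mic: List[float], fs: float = 16000.0,
                        th: float = 0.1, floor: float = 1e-12) -> bool:
    p_bpf1 = band_power(mic, fs, 2500.0, 3000.0)  # specific (eliminated) band
    p_bpf2 = band_power(mic, fs, 2000.0, 2500.0)  # reference band
    # Hypothetical guard: require some energy in the reference band.
    return p_bpf2 > floor and p_bpf1 < th * p_bpf2


def tones(freqs: List[float], fs: float = 16000.0, n: int = 512) -> List[float]:
    return [sum(math.sin(2.0 * math.pi * f * i / fs) for f in freqs)
            for i in range(n)]


echo_like = tones([2250.0])            # energy only outside the eliminated band
direct_like = tones([2250.0, 2750.0])  # energy in both bands
```

  • With these test signals, `band_limit_detected(echo_like)` holds because the 2.5 to 3.0 kHz band is empty, while `band_limit_detected(direct_like)` does not because both bands carry comparable power.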
  • Each output signal of the mixer unit 30 is subjected to voice activity detection by the corresponding voice activity detector 33 (33-1 to 33-3).
  • The condition determination unit 34 provides the echo detection unit 35 with information as to whether voice is detected by any of the voice activity detectors 33 (33-1 to 33-3) (the OR condition determination result).
  • The echo detection unit 35 detects an echo only when the band limit detection unit 32 detects band limitation in the microphone input signal and the condition determination unit 34 determines that at least one of the output signals of the mixer unit 30 contains voice (otherwise, no echo is detected).
  • When an echo is detected, the echo canceller 36 performs echo cancellation (suppression) processing on the microphone input signal by applying an echo suppressor or the like (for example, the methods described in Patent Documents 3, 4, and 5 above).
  • In the above embodiment, the conference server 3 plays the main role in preventing howling. However, as shown in FIG. 5, howling may instead be prevented mainly on the conference terminal 2 side.
  • In FIG. 5, in addition to the configuration corresponding to that of the conference server 3 shown in FIG. 1, a mixer section 21 is provided as a second mixer means.
  • The mixer unit 21 adds the sound of the own terminal to the sound synthesized by the mixer unit 30 of the conference server 3 (the synthesized sound to be output by the speaker of the own terminal).
  • The voice activity detection unit 33 in FIG. 5 performs voice activity detection on the synthesized sound produced by the mixer unit 21 (the synthesized voice of all the conference terminals 2 participating in the conference). The other processing is the same as (or similar to) the processing of each configuration shown in FIG. 1.
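  • In the modification, the terminal adds its own microphone signal to the mix received from the server, so that voice activity detection sees the voices of all participants. A minimal sketch; the function name `second_mix` and the sample values are hypothetical.

```python
from typing import List


def second_mix(own_mic: List[float], server_mix: List[float]) -> List[float]:
    """Add the terminal's own microphone signal to the server's mix of the others."""
    n = min(len(own_mic), len(server_mix))
    return [own_mic[i] + server_mix[i] for i in range(n)]


own = [0.5, 0.25, 0.125]
from_server = [1.0, 1.0]  # e.g. voice B+C as received by terminal A
full_mix = second_mix(own, from_server)
```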
  • In the above embodiment, the band limitation unit 31 applies band limitation to the output signal of the mixer unit 30 of the conference server 3.
  • However, band limitation may be applied only when the terminal uses speaker output (with headphone output, the likelihood of echo being mixed in is low).
  • the presence of sound is detected in the voice output from the speaker of the conference terminal 2. Detection may be performed.
  • In the above embodiment, voice activity is detected before band limitation, but voice activity detection may instead be performed after band limitation.
  • 1... conference system, 2 (2-A to 2-C)... conference terminal, 3... conference server, 21, 30... mixer unit, 31 (31-1 to 31-3)... band limiter, 32 (32-1 to 32-3)... band limit detection section, 33 (33-1 to 33-3)... voice detection section, 34... condition determination section, 35 (35-1 to 35-3)... echo detection section, 36 (36-1 to 36-3)... echo canceller, 321, 322... BPF, 323, 324... power calculator, 325... band limit determiner.

Landscapes

  • Engineering & Computer Science
  • Signal Processing
  • Multimedia
  • Computer Networks & Wireless Communication
  • Telephonic Communication Services
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo
  • Telephone Function
  • Circuit For Audible Band Transducer

Abstract

[Problem] To provide a conference system capable of effectively suppressing the occurrence of howling in a web conference or the like. [Solution] A conference system according to the present invention comprises: a first mixing means for outputting, to each terminal participating in a conference, a first combined sound signal obtained by combining the respective microphone input signals of the terminals other than that terminal; a voice detecting means for detecting whether any of the first combined sound signals includes a sound; an attenuating means for attenuating a specific band of each first combined sound signal and outputting the attenuated signal as an output signal for each terminal; a band limit detecting means for detecting that the attenuated output signal for each terminal is included in the microphone input signal; an echo detecting means for detecting that the microphone input signal includes an echo if it is detected that the microphone input signal includes the attenuated output signal of each terminal and it is determined that the output signals include a sound; and an echo canceling means for canceling the echo detected by the echo detecting means from the microphone input signal.

Description

Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method
 The present invention relates to a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method, and can be applied, for example, to a conference system that conducts web conferences.
 In recent years, partly due to the impact of the novel coronavirus, remote conference systems such as web conferences and video conferences are increasingly used in a variety of situations.
 In remote conference systems such as the web conferences mentioned above, various techniques have been introduced to prevent echo and howling. For example, Patent Document 1 discloses a technique of synthesizing the voice input signals from the terminal microphones in the same group and then removing echo using an adaptive filter.
JP 2013-251630 A JP 2011-070084 A JP 2015-170867 A JP 2017-034355 A JP 2009-033344 A
 However, with the conventional techniques described above, when a web conference is held using information terminals such as notebook personal computers (notebook PCs) or tablet terminals, and any of the participants gathered in the same room (two or more participants) uses the speakerphone function of an information terminal, the howling that arises from audio passing between the nearby information terminals could not be effectively suppressed.
 For example, the technique described in Patent Document 1 can be applied only to a configuration in which each group has a single speaker output and the microphone inputs of that group can be synthesized, so it cannot be applied to a general web conference. That is, in a web conference, the voice inputs from all terminals other than the own terminal are synthesized and output from the speaker, so the output sound differs from terminal to terminal.
 Therefore, there is a demand for a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method that can effectively suppress the occurrence of howling in web conferences and the like.
A first aspect of the present invention is a conference system including a conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, the conference system comprising: (1) first mixer means for mixing, for each conference terminal, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first mixed sound signal; (2) voice activity detection means for detecting that any of the first mixed sound signals is voiced; (3) specific band attenuation means for attenuating a specific band of each first mixed sound signal and outputting the attenuated signal as the output signal of the corresponding conference terminal; (4) band limit detection means for detecting that the microphone input signal contains the output signal of a conference terminal after specific band attenuation; (5) echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is contained and the voice activity detection means determines that a signal is voiced; and (6) echo cancellation means for removing the echo detected by the echo detection means from the microphone input signal.
A second aspect of the present invention is a conference server in a conference system including the conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, the conference server comprising: (1) first mixer means for mixing, for each conference terminal, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first mixed sound signal; (2) voice activity detection means for detecting that any of the first mixed sound signals is voiced; (3) specific band attenuation means for attenuating a specific band of each first mixed sound signal and outputting the attenuated signal as the output signal of the corresponding conference terminal; (4) band limit detection means for detecting that the microphone input signal contains the output signal of a conference terminal after specific band attenuation; (5) echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is contained and the voice activity detection means determines that a signal is voiced; and (6) echo cancellation means for removing the echo detected by the echo detection means from the microphone input signal.
A third aspect of the present invention is a conference terminal in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, the conference terminal comprising: (1) second mixer means for mixing the terminal's own microphone input signal with a first mixed sound signal, obtained from the conference server, in which the microphone input signals of the conference terminals other than that terminal are mixed, and outputting the result as a second mixed sound signal; (2) voice activity detection means for detecting that the second mixed sound signal is voiced; (3) specific band attenuation means for attenuating a specific band of the first mixed sound signal and outputting the attenuated signal as the output signal of the conference terminal; (4) band limit detection means for detecting that the microphone input signal contains the output signal of the conference terminal after specific band attenuation; (5) echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after specific band attenuation is contained and the voice activity detection means determines that the signal is voiced; and (6) echo cancellation means for removing the echo detected by the echo detection means from the microphone input signal.
An echo cancellation program according to a fourth aspect of the present invention causes a computer mounted on a conference server, in a conference system including the conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, to function as: (1) first mixer means for mixing, for each conference terminal, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first mixed sound signal; (2) voice activity detection means for detecting that any of the first mixed sound signals is voiced; (3) specific band attenuation means for attenuating a specific band of each first mixed sound signal and outputting the attenuated signal as the output signal of the corresponding conference terminal; (4) band limit detection means for detecting that the microphone input signal contains the output signal of a conference terminal after specific band attenuation; (5) echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is contained and the voice activity detection means determines that a signal is voiced; and (6) echo cancellation means for removing the echo detected by the echo detection means from the microphone input signal.
An echo cancellation program according to a fifth aspect of the present invention causes a computer mounted on a conference terminal, in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, to function as: (1) second mixer means for mixing the terminal's own microphone input signal with a first mixed sound signal, obtained from the conference server, in which the microphone input signals of the conference terminals other than that terminal are mixed, and outputting the result as a second mixed sound signal; (2) voice activity detection means for detecting that the second mixed sound signal is voiced; (3) specific band attenuation means for attenuating a specific band of the first mixed sound signal and outputting the attenuated signal as the output signal of the conference terminal; (4) band limit detection means for detecting that the microphone input signal contains the output signal of the conference terminal after specific band attenuation; (5) echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after specific band attenuation is contained and the voice activity detection means determines that the signal is voiced; and (6) echo cancellation means for removing the echo detected by the echo detection means from the microphone input signal.
A sixth aspect of the present invention is an echo cancellation method for use in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, the method using first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the first mixer means mixes, for each conference terminal, the microphone input signals of the conference terminals other than that terminal, and outputs the result as a first mixed sound signal; (2) the voice activity detection means detects that any of the first mixed sound signals is voiced; (3) the specific band attenuation means attenuates a specific band of each first mixed sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) the band limit detection means detects that the microphone input signal contains the output signal of a conference terminal after specific band attenuation; (5) the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is contained and the voice activity detection means determines that a signal is voiced; and (6) the echo cancellation means removes the echo detected by the echo detection means from the microphone input signal.
A seventh aspect of the present invention is an echo cancellation method for use in a conference server in a conference system including the conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, the method using first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the first mixer means mixes, for each conference terminal, the microphone input signals of the conference terminals other than that terminal, and outputs the result as a first mixed sound signal; (2) the voice activity detection means detects that any of the first mixed sound signals is voiced; (3) the specific band attenuation means attenuates a specific band of each first mixed sound signal and outputs the attenuated signal as the output signal of the corresponding conference terminal; (4) the band limit detection means detects that the microphone input signal contains the output signal of a conference terminal after specific band attenuation; (5) the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of a conference terminal after specific band attenuation is contained and the voice activity detection means determines that a signal is voiced; and (6) the echo cancellation means removes the echo detected by the echo detection means from the microphone input signal.
An eighth aspect of the present invention is an echo cancellation method for use in a conference terminal in a conference system including a conference server and a plurality of conference terminals each equipped with a microphone to which a microphone input signal is input, the method using second mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the second mixer means mixes the terminal's own microphone input signal with a first mixed sound signal, obtained from the conference server, in which the microphone input signals of the conference terminals other than that terminal are mixed, and outputs the result as a second mixed sound signal; (2) the voice activity detection means detects that the second mixed sound signal is voiced; (3) the specific band attenuation means attenuates a specific band of the first mixed sound signal and outputs the attenuated signal as the output signal of the conference terminal; (4) the band limit detection means detects that the microphone input signal contains the output signal of the conference terminal after specific band attenuation; (5) the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after specific band attenuation is contained and the voice activity detection means determines that the signal is voiced; and (6) the echo cancellation means removes the echo detected by the echo detection means from the microphone input signal.
According to the present invention, howling in web conferences and the like can be suppressed effectively.
FIG. 1 is a block diagram showing a configuration example of the conference system according to the embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the band limiting unit according to the embodiment.
FIG. 3 is a block diagram showing the detailed configuration of the band limit detection unit according to the embodiment.
FIG. 4A is an explanatory diagram of the frequency characteristics of the microphone input signal of a conference terminal according to the embodiment.
FIG. 4B is an explanatory diagram of the frequency characteristics of the microphone input signal of a conference terminal according to the embodiment.
A further figure is a block diagram showing a configuration example of a conference system according to a modification.
(A) Main Embodiment
An embodiment of a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method is described in detail below with reference to the drawings.
(A-1) Configuration of the Embodiment
(A-1-1) Overall Configuration
FIG. 1 is a block diagram showing a configuration example of the conference system according to the embodiment.
In FIG. 1, the conference system 1 has three conference terminals 2 (2-A to 2-C) and a conference server 3. FIG. 1 shows three conference terminals 2 to keep the explanation simple, but the number of conference terminals 2 is not particularly limited. That is, this embodiment illustrates the case where three conference terminals 2-A to 2-C hold one conference, but the number of conference terminals 2 taking part in one conference is not particularly limited.
FIG. 1 omits the connection configuration between the conference terminals 2 and the conference server 3, and various connection configurations can be applied. In this embodiment, the conference terminals 2 and the conference server 3 are assumed to be able to communicate bidirectionally via a communication line (for example, a wide area network such as the Internet, a telephone line, or a dedicated line).
The conference server 3 has a function of mixing the audio obtained from the conference terminals 2 at a plurality of sites and converting it into conference data. The conference server 3 outputs (transmits) to each conference terminal 2 the conference data (mixed sound) intended for that terminal.
A conference terminal 2 is a terminal that participates in a conference and may be any information processing terminal that has audio input/output functions (a microphone and a speaker) and a communication function. For example, a PC, a mobile terminal such as a smartphone, a tablet, or a wearable device can be used as a conference terminal 2.
In the following, the conference terminals 2 in FIG. 1 may be referred to simply as "terminal A" (conference terminal 2-A), "terminal B" (conference terminal 2-B), and "terminal C" (conference terminal 2-C). The audio (audio data) input from the microphones of terminals A to C and transmitted to the conference server 3 may likewise be referred to as "audio A", "audio B", and "audio C", respectively.
(A-1-2) Detailed Configuration of the Conference Server 3
In FIG. 1, the conference server 3 has a mixer unit 30 serving as the first mixer means, a band limiting unit 31, a band limit detection unit 32, a voice activity detection unit 33, an echo detection unit 35, and an echo cancellation unit 36.
The conference server 3 may be realized by installing a program (the echo cancellation program according to the embodiment) on a computer having a processor, memory, and the like; even in that case, the conference server 3 can be represented functionally as in FIG. 1. Part or all of the conference server 3 may instead be realized in hardware.
The mixer unit 30 generates audio data (mixed sound signals for the conference) by mixing the audio data (microphone input signals) supplied from the conference terminals 2, and supplies each mixed signal to the corresponding conference terminal 2. In FIG. 1, for example, the mixer unit 30 outputs (1) to terminal A the mix of the microphone input signals of terminals B and C (audio B+C), (2) to terminal B the mix of the microphone input signals of terminals A and C (audio A+C), and (3) to terminal C the mix of the microphone input signals of terminals A and B (audio A+B).
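The per-terminal mixing performed by the mixer unit 30 can be illustrated with a minimal sketch. The function name `mix_minus`, the dictionary layout, and the sample values are assumptions made for illustration only; the specification does not prescribe a data format or an implementation.

```python
def mix_minus(mic_frames):
    """For each terminal, sum the frames of all *other* terminals.

    mic_frames maps a terminal id (e.g. "A") to a list of samples of
    equal length. Names and layout are illustrative assumptions.
    """
    mixed = {}
    for own_id in mic_frames:
        others = [frame for tid, frame in mic_frames.items() if tid != own_id]
        # Sample-wise sum over the other terminals' frames.
        mixed[own_id] = [sum(samples) for samples in zip(*others)]
    return mixed


frames = {"A": [1.0, 2.0], "B": [10.0, 20.0], "C": [100.0, 200.0]}
mixed = mix_minus(frames)
# mixed["A"] holds audio B+C, mixed["B"] holds audio A+C, mixed["C"] holds audio A+B.
```

Because each terminal receives a different mix, the band limiting and detection stages that follow are applied per terminal.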
The voice activity detection units 33 (33-1 to 33-3) perform voice activity detection on the mixed sounds output from the mixer unit 30. Various voice activity detection methods can be applied; for example, the technique described in Patent Literature 2 can be used. Each voice activity detection unit 33 provides its determination result to the condition determination unit 34.
The condition determination unit 34 determines whether any of the determination results of the voice activity detection units 33 (33-1 to 33-3) for the mixed sounds output from the mixer unit 30 indicates voice (an OR condition). The condition determination unit 34 provides the result of the OR condition determination to the echo detection units 35. The voice activity detection means is realized by, for example, the voice activity detection units 33 and the condition determination unit 34.
The band limiting units 31 (31-1 to 31-3) limit (remove) a predetermined band of the mixed sounds output from the mixer unit 30 and output the results as the signals to be played by the speakers of the conference terminals 2. For example, the technique described in Patent Literature 3 is applied to limit the 2.5 to 3.0 kHz band, which has little effect on perceived sound quality.
The band limit detection units 32 (32-1 to 32-3) detect, in the microphone input (microphone input signal) of each conference terminal 2, the presence of a signal that was band-limited by a band limiting unit 31 (that is, a signal output from the speaker of a conference terminal 2). Each band limit detection unit 32 provides the band limit detection result (whether the microphone input signal of the conference terminal 2 contains a signal band-limited by a band limiting unit 31) to the echo detection unit 35. Details of the band limit detection unit 32 are given in the section on operation.
The echo detection units 35 (35-1 to 35-3) detect whether the microphone input (microphone input signal) of each conference terminal 2 contains an echo component. Specifically, each echo detection unit 35 performs echo detection based on the band limit detection result from the band limit detection unit 32 and the OR condition determination result from the condition determination unit 34. For example, an echo detection unit 35 determines that an echo is detected for a conference terminal 2 when both "a band-limited signal is present in the microphone input" and "at least one speaker output is voiced" hold. Each echo detection unit 35 provides its echo detection result to the echo cancellation unit 36.
Each echo cancellation unit 36 (36-1 to 36-3) removes the echo from the microphone input signal when the corresponding echo detection unit 35 reports that an echo is detected.
For example, an echo suppressor can be used as the echo cancellation unit 36. In that case, the simplest method is to attenuate the entire microphone input signal while an echo is detected. Alternatively, the echo cancellation unit 36 may split the signal into bands and attenuate each band as needed (an FFT (Fast Fourier Transform) or the like may be used for this). When an echo suppressor is used, the technique (method) described in Patent Literature 4, for example, can be applied.
The echo cancellation unit 36 may also use an adaptive echo canceller. An adaptive echo canceller can be realized, for example, by applying the technique (method) described in Patent Literature 3 or 5.
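The simplest echo suppressor variant mentioned above, attenuating the whole microphone signal while an echo is flagged, might look like the following sketch. The function name `suppress_echo` and the default gain of 0.1 are illustrative assumptions; the specification gives no attenuation amount, and this is not the method of any cited Patent Literature.

```python
def suppress_echo(mic_frame, echo_detected, gain=0.1):
    """Attenuate the whole frame when an echo is flagged; otherwise pass it through.

    gain is a hypothetical attenuation factor chosen for illustration.
    """
    if not echo_detected:
        return list(mic_frame)
    return [gain * sample for sample in mic_frame]


passed = suppress_echo([1.0, -2.0], echo_detected=False)
reduced = suppress_echo([1.0, -2.0], echo_detected=True, gain=0.5)
```

A band-split suppressor or an adaptive canceller would replace the body of this function while keeping the same gating on the echo detection result.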
(A-2) Operation of the Embodiment
Next, the operation of the conference system 1 according to the embodiment configured as described above is described. The conference system 1 is characterized by its echo detection processing (mainly the band limit detection processing), so the description below focuses on this point and refers to the drawings.
(A-2-1) Processing of the Band Limiting Unit 31 and the Band Limit Detection Unit 32
FIG. 2 is a block diagram showing the detailed configuration of the band limiting unit according to the embodiment.
As shown in FIG. 2, the band limiting unit 31 has a BEF (band elimination filter) 310. The BEF 310 removes, for example, the 2.5 to 3.0 kHz band, which has little effect on perceived sound quality, from the output signal of the mixer unit 30 (the mixed sound to be output from the speaker of each conference terminal 2). The signal whose components in the specific band of 2.5 to 3.0 kHz have been attenuated is then transmitted to each conference terminal 2 and output from its speaker.
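As a rough illustration of a band elimination filter for the 2.5 to 3.0 kHz band, the sketch below uses a single second-order notch (the widely used "Audio EQ Cookbook" biquad form) in plain Python. The 16 kHz sampling rate, the choice of a single biquad, and all function names are assumptions; the specification does not state the filter design, and a practical BEF would likely be of higher order to remove the whole band more evenly.

```python
import math


def notch_coefficients(fs, f_low=2500.0, f_high=3000.0):
    """Second-order notch centred on the band; -3 dB points near the band edges."""
    f0 = 0.5 * (f_low + f_high)
    q = f0 / (f_high - f_low)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(x, b, a):
    """Direct-form I filtering of the sample list x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def tone(freq, fs, n):
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


FS = 16000  # assumed sampling rate in Hz
b, a = notch_coefficients(FS)
# Skip the first 800 samples so the filter transient has died out.
in_band = biquad(tone(2750.0, FS, 3200), b, a)[800:]
out_band = biquad(tone(1000.0, FS, 3200), b, a)[800:]
```

In this sketch a 2.75 kHz tone is almost entirely removed, while a 1 kHz tone passes nearly unchanged, which is the property the band limit detection later relies on.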
Meanwhile, the microphone input signal from each conference terminal 2 is transmitted to the conference server 3. The conference server 3 supplies the received microphone input signal of each conference terminal 2 to the corresponding band limit detection unit 32 (32-1 to 32-3).
FIG. 3 is a block diagram showing the detailed configuration of the band limit detection unit according to the embodiment.
The band-pass filter (BPF) 321 passes the components of the microphone input signal in the specific band (2.5 to 3.0 kHz). The output signal of the BPF 321 is supplied to the power calculation unit 323.
The band-pass filter (BPF) 322 passes the components of the microphone input signal in a band outside the specific band (2.0 to 2.5 kHz). The output signal of the BPF 322 is supplied to the power calculation unit 324.
The power calculation unit 323 receives the output signal of the BPF 321, calculates a power value P_BPF1 by averaging over the samples of that signal, and provides it to the band limit determination unit 325.
The average power can be calculated, for example, by squaring each sample of the input signal and averaging the squared values with an FIR filter to obtain P_BPF1. The calculation is not limited to squaring the samples; the absolute values of the samples may be used instead. The power calculation unit 323 may also use an IIR low-pass filter (LPF) with a suitable time constant in place of the FIR filter.
The power calculation unit 324 receives the output signal of the BPF 322, calculates a power value P_BPF2 by averaging over the samples of that signal, and provides it to the band limit determination unit 325. The power calculation unit 324 can calculate the average power in the same way as the power calculation unit 323.
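The two averaging options described for the power calculation units can be sketched as follows. The function names and the smoothing constant `alpha` are assumptions for illustration; the specification only says that squared (or absolute) sample values are averaged by an FIR filter or by an IIR low-pass filter with a suitable time constant.

```python
def mean_square_power(samples):
    """Block average of squared samples (a uniform-tap FIR average over the block)."""
    return sum(s * s for s in samples) / len(samples)


def smoothed_power(samples, alpha=0.01):
    """One-pole IIR low-pass applied to squared samples.

    alpha is a hypothetical smoothing constant; a smaller alpha means a
    longer time constant.
    """
    power = 0.0
    track = []
    for s in samples:
        power += alpha * (s * s - power)
        track.append(power)
    return track


p_block = mean_square_power([3.0, 4.0])   # (9 + 16) / 2
p_track = smoothed_power([1.0] * 2000)    # converges towards 1.0
```

The IIR form needs no sample buffer, which may matter when one detector runs per terminal; the FIR form has a finite, well-defined averaging window.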
Based on the power values P_BPF1 and P_BPF2 from the power calculation units 323 and 324, the band limit determination unit 325 determines the band limit state (that is, the echo state) and outputs the determination result to the echo detection unit 35.
FIGS. 4A and 4B are explanatory diagrams of the frequency characteristics of the microphone input signal of a conference terminal according to the embodiment. FIG. 4A shows the frequency characteristics of the microphone input signal when an echo enters the microphone of a conference terminal 2, and FIG. 4B shows them when speech uttered at that terminal enters its microphone.
The band limiting units 31 (31-1 to 31-3) attenuate the specific band of 2.5 to 3.0 kHz in the speaker output signal of each conference terminal 2.
Therefore, when an echo enters a microphone (for example, when terminals A and B are in the same room and the output from the speaker of terminal A and/or terminal B enters the microphone of terminal A and/or terminal B), the 2.5 to 3.0 kHz band of the microphone input signal is attenuated, as shown in FIG. 4A.
As a result, the average power P_BPF1 of the signal passing through the BPF 321, which passes the specific band of 2.5 to 3.0 kHz, tends to be much lower than the average power P_BPF2 of the signal passing through the BPF 322, which passes the 2.0 to 2.5 kHz band outside the specific band.
In contrast, when speech uttered at the terminal enters its microphone, the user of the conference terminal 2 is speaking, so the user's voice enters the microphone directly. The microphone input signal therefore has substantial signal power in the specific band of 2.5 to 3.0 kHz.
Consequently, the power P_BPF1 in the 2.5 to 3.0 kHz band (inside the BEF stopband) does not tend to be much lower than the power P_BPF2 in the 2.0 to 2.5 kHz band (outside the BEF stopband).
Based on the frequency characteristics of the microphone input signal shown in FIGS. 4A and 4B, the band limit determination unit 325 determines that a band limit is detected (echo detected) when the condition of expression (1) below holds, and that no band limit is detected when it does not.
 P_BPF1/P_BPF2 < TH …(1)
 Formula (1) requires that the ratio of the average power value P_BPF1 of the signal that has passed through the BPF 321 to the average power value P_BPF2 of the signal that has passed through the BPF 322 be less than the threshold TH. This tests whether, as shown in FIG. 4A, the power of the signal in the specific band of 2.5 to 3.0 kHz has been attenuated to a small value, as happens at the time of echo input.
 The band limit determination unit 325 supplies the echo detection unit 35 with the result of determining whether the condition of formula (1) is satisfied.
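 The decision in formula (1) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the band limit detection unit 32: it estimates the two band powers from the DFT bins of a single frame instead of the time-domain BPFs 321 and 322 and the power calculation units 323 and 324, and the function names, the frame length, and the threshold value are hypothetical.

```python
import math


def band_power(samples, sample_rate, f_lo, f_hi):
    """Average power of the DFT bins whose frequency lies in [f_lo, f_hi)."""
    n = len(samples)
    total = 0.0
    count = 0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n)
                     for i, x in enumerate(samples))
            im = sum(x * math.sin(2 * math.pi * k * i / n)
                     for i, x in enumerate(samples))
            total += (re * re + im * im) / (n * n)
            count += 1
    return total / count if count else 0.0


def band_limit_detected(samples, sample_rate, threshold):
    """Formula (1): True when P_BPF1 / P_BPF2 < TH."""
    # Stand-in for BPF 321 (specific band, inside the BEF stopband).
    p_bpf1 = band_power(samples, sample_rate, 2500.0, 3000.0)
    # Stand-in for BPF 322 (reference band, outside the BEF stopband).
    p_bpf2 = band_power(samples, sample_rate, 2000.0, 2500.0)
    if p_bpf2 == 0.0:
        # No energy in the reference band: the ratio is undefined, so this
        # sketch reports "not detected" (an assumption, not from the source).
        return False
    return p_bpf1 / p_bpf2 < threshold
```

 With a 320-sample frame at 16 kHz, a frame whose 2.5 to 3.0 kHz content is strongly attenuated relative to its 2.0 to 2.5 kHz content yields a ratio below the threshold, while a frame with comparable power in both bands does not.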
 (A-2-2) Processing of the echo detection unit 35 and related units
 The output signals of the mixer unit 30 are each subjected to voice activity detection by the corresponding voice activity detection unit 33 (33-1 to 33-3). The condition determination unit 34 supplies the echo detection unit 35 with whether voice activity was detected by any of the voice activity detection units 33 (33-1 to 33-3), that is, the result of an OR condition.
 The echo detection unit 35 determines that an echo is detected only when the band limit detection unit 32 (band limit determination unit 325) described above detects band limitation in the microphone input signal (echo detection) and the condition determination unit 34 reports that any of the output signals of the mixer unit 30 contains voice activity. In all other cases, it determines that no echo is detected.
 When the echo detection unit 35 detects an echo, the echo cancellation unit 36 performs echo cancellation (suppression) processing on the microphone input signal using an echo suppressor or the like (for example, the methods described in Patent Document 3, Patent Document 4, and Patent Document 5 cited above).
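 The combination of conditions used by the echo detection unit 35, followed by suppression, might look like the sketch below. It assumes one frame at a time and uses a plain fixed-gain attenuation as a stand-in for the echo suppressor; it is not the method of Patent Documents 3 to 5, and the function names and the gain value are hypothetical.

```python
def echo_detected(band_limit_flag, voice_flags):
    """Echo is reported only when band limitation is detected AND
    at least one mixer output contains voice activity (OR condition)."""
    return bool(band_limit_flag) and any(voice_flags)


def process_frame(mic_frame, band_limit_flag, voice_flags, suppression_gain=0.1):
    """Attenuate the microphone frame when an echo is detected.

    suppression_gain is a hypothetical fixed gain; a real echo suppressor
    would typically adapt its gain to the signal.
    """
    if echo_detected(band_limit_flag, voice_flags):
        return [x * suppression_gain for x in mic_frame]
    return list(mic_frame)
```

 Gating on voice activity keeps the detector from firing on frames where nothing is being played from the speakers, even if the microphone signal happens to be weak in the specific band.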
 (A-3) Effects of the embodiment
 According to the present embodiment, the conference system 1 outputs, from the speaker of each conference terminal 2, audio (conference audio) from which a band that does not affect perceived sound quality has been removed, and detects attenuation of that band in the microphone input signal of each conference terminal 2. This makes it possible to detect echo input immediately.
 When an echo is detected, canceling the echo in the microphone input signal of each conference terminal 2 prevents howling in a Web conference. Specifically, it prevents the howling that occurs when two or more terminals (terminals A, B, ...) participating in the Web conference are in the same room and one of them is put into speaker mode.
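 To illustrate the speaker-side half of this scheme, the sketch below removes the 2.5 to 3.0 kHz band from one frame of conference audio. It is an illustrative assumption rather than the band limiting unit 31 itself: it zeroes DFT bins of a single frame, whereas a practical BEF would normally be a time-domain filter running on a continuous stream, and the function name and frame handling are hypothetical.

```python
import cmath
import math


def band_stop(samples, sample_rate, f_lo=2500.0, f_hi=3000.0):
    """Zero the DFT bins in [f_lo, f_hi) and return the filtered frame."""
    n = len(samples)
    # Forward DFT (direct O(n^2) form, adequate for short frames).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same frequency for a real signal.
        freq = min(k, n - k) * sample_rate / n
        if f_lo <= freq < f_hi:
            spectrum[k] = 0.0
    # Inverse DFT; the imaginary part is rounding noise for real input.
    return [
        sum(c * cmath.exp(2j * math.pi * k * i / n)
            for k, c in enumerate(spectrum)).real / n
        for i in range(n)
    ]
```

 A frame that mixes a 2.2 kHz tone with a 2.7 kHz tone comes out with only the 2.2 kHz tone, which is the spectral signature the microphone-side detector looks for.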
 (B) Other embodiments
 Various modifications have been mentioned in the embodiment above; the present invention is also applicable to the following modified embodiments.
 (B-1) The embodiment above shows a configuration in which the conference server 3 takes the lead in preventing howling. However, as shown in FIG. 5, a configuration in which each conference terminal 2 takes the lead may be adopted instead. In FIG. 5, each conference terminal 2 has, in addition to the components of the conference server 3 shown in FIG. 1 (the band limiting unit 31, the band limit detection unit 32, the voice activity detection unit 33, the echo detection unit 35, and the echo cancellation unit 36), a mixer unit 21 serving as the second mixer means.
 The mixer unit 21 adds the audio of its own terminal to the audio synthesized by the mixer unit 30 of the conference server 3 (the synthesized audio to be output from the speaker of the own terminal).
 The voice activity detection unit 33 in FIG. 5 performs voice activity detection on the synthesized audio produced by the mixer unit 21 described above (the synthesized audio of all conference terminals 2 participating in the conference). The other processing is the same as (or similar to) that of the corresponding components shown in FIG. 1.
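 The relationship between the two mixers can be sketched as follows. This is a minimal sketch assuming equal-length frames of floating-point samples keyed by a terminal ID; the function names and the data layout are hypothetical and are not taken from the source.

```python
def mix_minus(mic_frames):
    """Server-side mixer (mixer unit 30): for each terminal, sum the
    microphone frames of all OTHER terminals."""
    if not mic_frames:
        return {}
    ids = list(mic_frames)
    n = len(mic_frames[ids[0]])
    # Sum over every terminal once, then subtract each terminal's own frame.
    total = [sum(mic_frames[t][i] for t in ids) for i in range(n)]
    return {t: [total[i] - mic_frames[t][i] for i in range(n)] for t in ids}


def add_own(server_mix, own_frame):
    """Terminal-side mixer (mixer unit 21): add the terminal's own frame
    to the mix received from the server, giving the mix of all terminals."""
    return [a + b for a, b in zip(server_mix, own_frame)]
```

 For three terminals A, B, and C, `mix_minus` gives A the sum of B and C, and `add_own` then restores the full three-way mix on which the terminal-side voice activity detection operates.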
 (B-2) As a modification, the configuration of the conference system 1 is not limited to those shown in FIG. 1 or FIG. 5; each component (function) may be distributed as appropriate among the conference terminals 2 and the conference server 3.
 (B-3) In the embodiment above, on the premise that each conference terminal 2 uses speaker output, the band limiting unit 31 applies band limitation to the output signals of the mixer unit 30 of the conference server 3. If it is possible to detect whether each conference terminal 2 uses speaker output or headphone output, band limitation may be applied only to speaker output, because echo is unlikely to be mixed in with headphone output.
 (B-4) As a modification, when the participants (conference terminals 2) located at the same site (in the same room) can be identified, the scope of the echo detection and cancellation processing described above (howling prevention processing) may be narrowed to only the conference terminals 2 located at the same site. Whether conference terminals 2 are located at the same site may be determined by, for example, GPS (Global Positioning System).
 (B-5) In the embodiment above, voice activity detection is performed on the audio output from the speaker of the conference terminal 2, but it may instead be performed on the audio input through the microphone of the conference terminal 2 (the microphone input signal).
 (B-6) In the embodiment above, voice activity detection is performed before band limitation, but it may be performed after band limitation.
 1... conference system, 2 (2-A to 2-C)... conference terminal, 3... conference server, 21, 30... mixer unit, 31 (31-1 to 31-3)... band limiting unit, 32 (32-1 to 32-3)... band limit detection unit, 33 (33-1 to 33-3)... voice activity detection unit, 34... condition determination unit, 35 (35-1 to 35-3)... echo detection unit, 36 (36-1 to 36-3)... echo cancellation unit, 321, 322... BPF, 323, 324... power calculation unit, 325... band limit determination unit.

Claims (11)

  1.  A conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and a conference server, the conference system comprising:
     first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first synthesized sound signal;
     voice activity detection means for detecting that any of the first synthesized sound signals contains voice activity;
     specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals;
     band limit detection means for detecting that the microphone input signal contains the output signal of each of the conference terminals after the specific band attenuation;
     echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
  2.  The conference system according to claim 1, wherein the band limit detection means comprises:
     a specific band extraction unit that extracts the specific band of the microphone input signal;
     an out-of-specific-band extraction unit that extracts a band other than the specific band of the microphone input signal; and
     a band limit determination unit that determines that the output signal of each of the conference terminals after the specific band attenuation is contained when the ratio of the power value of the output signal from the specific band extraction unit to the power value of the output signal from the out-of-specific-band extraction unit is less than a threshold.
  3.  The conference system according to claim 1, wherein the specific band is a band that has little effect on perceived sound quality.
  4.  The conference system according to claim 3, wherein the specific band is the 2.5 to 3.0 kHz band.
  5.  A conference server in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and the conference server, the conference server comprising:
     first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first synthesized sound signal;
     voice activity detection means for detecting that any of the first synthesized sound signals contains voice activity;
     specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals;
     band limit detection means for detecting that the microphone input signal contains the output signal of each of the conference terminals after the specific band attenuation;
     echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
  6.  A conference terminal in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and a conference server, the conference terminal comprising:
     second mixer means for synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, in which the microphone input signals of the conference terminals other than the own terminal are synthesized, and outputting the result as a second synthesized sound signal;
     voice activity detection means for detecting that the second synthesized sound signal contains voice activity;
     specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as an output signal of the conference terminal;
     band limit detection means for detecting that the microphone input signal contains the output signal of the conference terminal after the specific band attenuation;
     echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
  7.  An echo cancellation program causing a computer mounted on a conference server, in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and the conference server, to function as:
     first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputting the result as a first synthesized sound signal;
     voice activity detection means for detecting that any of the first synthesized sound signals contains voice activity;
     specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as an output signal of each of the conference terminals;
     band limit detection means for detecting that the microphone input signal contains the output signal of each of the conference terminals after the specific band attenuation;
     echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
  8.  An echo cancellation program causing a computer mounted on a conference terminal, in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and a conference server, to function as:
     second mixer means for synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, in which the microphone input signals of the conference terminals other than the own terminal are synthesized, and outputting the result as a second synthesized sound signal;
     voice activity detection means for detecting that the second synthesized sound signal contains voice activity;
     specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as an output signal of the conference terminal;
     band limit detection means for detecting that the microphone input signal contains the output signal of the conference terminal after the specific band attenuation;
     echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
  9.  An echo cancellation method used in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and a conference server, the conference system having first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein:
     the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputs the result as a first synthesized sound signal;
     the voice activity detection means detects that any of the first synthesized sound signals contains voice activity;
     the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as an output signal of each of the conference terminals;
     the band limit detection means detects that the microphone input signal contains the output signal of each of the conference terminals after the specific band attenuation;
     the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.
  10.  An echo cancellation method used in a conference server in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and the conference server, the conference server having first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein:
     the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal, and outputs the result as a first synthesized sound signal;
     the voice activity detection means detects that any of the first synthesized sound signals contains voice activity;
     the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as an output signal of each of the conference terminals;
     the band limit detection means detects that the microphone input signal contains the output signal of each of the conference terminals after the specific band attenuation;
     the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.
  11.  An echo cancellation method used in a conference terminal in a conference system including a plurality of conference terminals, each having a microphone through which a microphone input signal is input, and a conference server, the conference terminal having second mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein:
     the second mixer means synthesizes the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, in which the microphone input signals of the conference terminals other than the own terminal are synthesized, and outputs the result as a second synthesized sound signal;
     the voice activity detection means detects that the second synthesized sound signal contains voice activity;
     the specific band attenuation means attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as an output signal of the conference terminal;
     the band limit detection means detects that the microphone input signal contains the output signal of the conference terminal after the specific band attenuation;
     the echo detection means detects that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after the specific band attenuation is contained and the voice activity detection means determines that voice activity is present; and
     the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.
PCT/JP2022/019837 2021-08-27 2022-05-10 Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method WO2023026600A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021138564A JP2023032434A (en) 2021-08-27 2021-08-27 Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method
JP2021-138564 2021-08-27

Publications (1)

Publication Number Publication Date
WO2023026600A1 true WO2023026600A1 (en) 2023-03-02

Family

ID=85321733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/019837 WO2023026600A1 (en) 2021-08-27 2022-05-10 Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method

Country Status (2)

Country Link
JP (1) JP2023032434A (en)
WO (1) WO2023026600A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015170867A (en) * 2014-03-04 2015-09-28 沖電気工業株式会社 Talk state detector, talk state detection method, echo canceller, and echo suppressor
JP2015220482A (en) * 2014-05-14 2015-12-07 日本電信電話株式会社 Handset terminal, echo cancellation system, echo cancellation method, program
JP2016184110A (en) * 2015-03-26 2016-10-20 沖電気工業株式会社 Multipoint conference device, multipoint conference control program, and multipoint conference control method

Also Published As

Publication number Publication date
JP2023032434A (en) 2023-03-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22860897

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22860897

Country of ref document: EP

Kind code of ref document: A1