WO2023026600A1

WO2023026600A1 - Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method

Info

Publication number: WO2023026600A1
Application number: PCT/JP2022/019837
Authority: WO
Inventors: 高詩石黒
Original assignee: 沖電気工業株式会社 (Oki Electric Industry Co., Ltd.)
Priority date: 2021-08-27
Filing date: 2022-05-10
Publication date: 2023-03-02
Also published as: JP2023032434A

Abstract

[Problem] To provide a conference system capable of effectively suppressing howling in web conferences and the like. [Solution] A conference system according to the present invention comprises: first mixing means for outputting, to each terminal participating in a conference, a first combined sound signal obtained by combining the microphone input signals of the terminals other than that terminal; voice activity detection means for detecting whether any of the first combined sound signals contains voice; specific band attenuation means for attenuating a specific band of each first combined sound signal and outputting the attenuated signal as the output signal for each terminal; band limit detection means for detecting that a microphone input signal includes the attenuated output signal for a terminal; echo detection means for determining that the microphone input signal includes an echo when it is detected that the microphone input signal includes the attenuated output signal and voice activity is detected in the output signals; and echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.

Description

Conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method

The present invention relates to a conference system, a conference server, a conference terminal, an echo cancellation program, and an echo cancellation method, and can be applied, for example, to a conference system that conducts web conferences.

In recent years, due to the impact of the novel coronavirus (COVID-19), the use of remote conference systems such as web conferences and video conferences has increased in a wide range of situations.

Various techniques have been introduced to prevent echo and howling in remote conferencing systems such as the web conferences mentioned above. For example, Patent Document 1 discloses a technique of synthesizing the voice input signals from the microphones of the terminals in the same group and then removing echo using an adaptive filter.

Patent Document 1: JP 2013-251630 A
Patent Document 2: JP 2011-070084 A
Patent Document 3: JP 2015-170867 A
Patent Document 4: JP 2017-034355 A
Patent Document 5: JP 2009-033344 A

However, with the conventional technology described above, when two or more participants hold a web conference from the same place using information terminals such as notebook personal computers (notebook PCs) or tablet terminals, each running in speakerphone mode, the howling caused by sound passing between adjacent information terminals cannot be effectively suppressed.

For example, the technique described in Patent Document 1 can only be applied to a configuration in which a single speaker output and the microphone inputs within the same group can be synthesized, so it cannot be applied to a general web conference. In a web conference, the voices input at the terminals other than the own terminal are synthesized and output from the own terminal's speaker, so the output sound differs from terminal to terminal.

Therefore, there is a demand for a conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method that can effectively suppress the occurrence of howling in web conferences and the like.

A first aspect of the present invention is a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and a conference server, the conference system comprising: (1) first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result to that terminal as a first synthesized sound signal; (2) voice activity detection means for detecting that any one of the first synthesized sound signals contains voice activity; (3) specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal for each of the conference terminals; (4) band limit detection means for detecting that the microphone input signal includes the output signal for each conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal for each conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.

A second aspect of the present invention is a conference server in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and the conference server, the conference server comprising: (1) first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result to that terminal as a first synthesized sound signal; (2) voice activity detection means for detecting that any one of the first synthesized sound signals contains voice activity; (3) specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal for each of the conference terminals; (4) band limit detection means for detecting that the microphone input signal includes the output signal for each conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal for each conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.

A third aspect of the present invention is a conference terminal in a conference system including a plurality of conference terminals having microphones to which microphone input signals are input, and a conference server, the conference terminal comprising: (1) second mixer means for synthesizing the microphone input signal of its own terminal with a first synthesized sound signal, obtained from the conference server by synthesizing the microphone input signals of the conference terminals other than its own terminal, and outputting the result as a second synthesized sound signal; (2) voice activity detection means for detecting that the second synthesized sound signal contains voice activity; (3) specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as the output signal of the conference terminal; (4) band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.

An echo cancellation program according to a fourth aspect of the present invention causes a computer mounted on the conference server in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and the conference server, to function as: (1) first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result as a first synthesized sound signal; (2) voice activity detection means for detecting that any one of the first synthesized sound signals contains voice activity; (3) specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal for each of the conference terminals; (4) band limit detection means for detecting that the microphone input signal includes the output signal for each conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal for each conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.

An echo cancellation program according to a fifth aspect of the present invention causes a computer installed in a conference terminal in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and a conference server, to function as: (1) second mixer means for synthesizing the microphone input signal of its own terminal with a first synthesized sound signal, obtained from the conference server by synthesizing the microphone input signals of the conference terminals other than its own terminal, and outputting the result as a second synthesized sound signal; (2) voice activity detection means for detecting that the second synthesized sound signal contains voice activity; (3) specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as the output signal of the conference terminal; (4) band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band; (5) echo detection means for detecting that the microphone input signal includes an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.

A sixth aspect of the present invention is an echo cancellation method used in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and a conference server, the conference system having first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) the voice activity detection means detects that any one of the first synthesized sound signals contains voice activity; (3) the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as the output signal for each of the conference terminals; (4) the band limit detection means detects that the microphone input signal includes the output signal for each conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal includes an echo when the band limit detection means detects that the output signal for each conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

A seventh aspect of the present invention is an echo cancellation method used in a conference server in a conference system including a plurality of conference terminals having microphones to which microphone input signals are input, and the conference server, the conference server having first mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the first mixer means synthesizes, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputs the result as a first synthesized sound signal; (2) the voice activity detection means detects that any one of the first synthesized sound signals contains voice activity; (3) the specific band attenuation means attenuates a specific band of each of the first synthesized sound signals and outputs the attenuated signal as the output signal for each of the conference terminals; (4) the band limit detection means detects that the microphone input signal includes the output signal for each conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal includes an echo when the band limit detection means detects that the output signal for each conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

An eighth aspect of the present invention is an echo cancellation method used in a conference terminal in a conference system including a plurality of conference terminals equipped with microphones to which microphone input signals are input, and a conference server, the conference terminal having second mixer means, voice activity detection means, specific band attenuation means, band limit detection means, echo detection means, and echo cancellation means, wherein: (1) the second mixer means synthesizes the microphone input signal of its own terminal with a first synthesized sound signal, obtained from the conference server by synthesizing the microphone input signals of the conference terminals other than its own terminal, and outputs the result as a second synthesized sound signal; (2) the voice activity detection means detects that the second synthesized sound signal contains voice activity; (3) the specific band attenuation means attenuates a specific band of the first synthesized sound signal and outputs the attenuated signal as the output signal of the conference terminal; (4) the band limit detection means detects that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band; (5) the echo detection means detects that the microphone input signal includes an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity; and (6) the echo cancellation means cancels the echo detected by the echo detection means from the microphone input signal.

According to the present invention, it is possible to effectively suppress the occurrence of howling in web conferences and the like.

FIG. 1 is a block diagram showing a configuration example of a conference system according to an embodiment.
FIG. 2 is a block diagram showing the detailed configuration of a band limiting unit according to the embodiment.
FIG. 3 is a block diagram showing the detailed configuration of a band limit detection unit according to the embodiment.
FIG. 4A is an explanatory diagram illustrating the frequency characteristics of a microphone input signal of the conference terminal according to the embodiment (echo input).
FIG. 4B is an explanatory diagram illustrating the frequency characteristics of a microphone input signal of the conference terminal according to the embodiment (the user's own voice input).
FIG. 5 is a block diagram showing a configuration example of a conference system according to a modification.

(A) Main Embodiment

An embodiment of the conference system, conference server, conference terminal, echo cancellation program, and echo cancellation method according to the present invention will be described in detail below with reference to the drawings.

(A-1) Configuration of Embodiment

(A-1-1) Overall Configuration

FIG. 1 is a block diagram showing a configuration example of a conference system according to an embodiment.

In FIG. 1, the conference system 1 has three conference terminals 2 (2-A to 2-C) and a conference server 3. Three conference terminals are shown only to simplify the explanation: in this embodiment, the three conference terminals 2-A to 2-C hold one conference, but the number of conference terminals 2 participating in one conference is not particularly limited.

Although the connection between the conference terminals 2 and the conference server 3 is not illustrated in FIG. 1, various connection configurations can be applied. In this embodiment, the conference terminals 2 and the conference server 3 communicate bidirectionally via a communication line (for example, a wide area network such as the Internet, a telephone line, or a dedicated line).

The conference server 3 synthesizes the voices obtained from the conference terminals 2 at multiple sites into conference data, and outputs (transmits) the conference data (synthesized sound) intended for each conference terminal 2 to that terminal.

The conference terminal 2 is a terminal that participates in the conference, and may be an information processing terminal that has an audio input/output function (microphone, speaker) and a communication function. For example, the conference terminal 2 can be a PC, a mobile terminal such as a smartphone, a tablet, a wearable device, or the like.

Hereinafter, the conference terminals 2-A, 2-B, and 2-C in FIG. 1 may be referred to simply as "terminal A", "terminal B", and "terminal C", respectively. Likewise, the voices (voice data) input from the microphones of terminals A to C and transmitted to the conference server 3 may be referred to as "voice A", "voice B", and "voice C", respectively.

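(A-1-2) Detailed Configuration of Conference Server 3

In FIG. 1, the conference server 3 has a mixer unit 30, band limiting units 31 (31-1 to 31-3), band limit detection units 32 (32-1 to 32-3), voice activity detection units 33 (33-1 to 33-3), a condition determination unit 34, echo detection units 35 (35-1 to 35-3), and echo cancellation units 36 (36-1 to 36-3).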

The conference server 3 may be implemented by installing a program (the echo cancellation program according to the embodiment) on a computer having a processor, memory, and so on; even in that case, the conference server 3 can be represented functionally as shown in FIG. 1. Part or all of the conference server 3 may also be implemented in hardware.

The mixer unit 30 synthesizes (mixes) the audio data (microphone input signals) supplied from the conference terminals 2 to generate audio data for the conference (synthesized sound signals), and supplies each one to the corresponding conference terminal 2. For example, in FIG. 1, the mixer unit 30 outputs (1) the synthesized sound of the microphone input signals of terminals B and C (voice B+C) to terminal A, (2) the synthesized sound of the microphone input signals of terminals A and C (voice A+C) to terminal B, and (3) the synthesized sound of the microphone input signals of terminals A and B (voice A+B) to terminal C.
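The per-terminal mixing rule of the mixer unit 30 (each terminal receives the sum of every other terminal's microphone signal) can be sketched as follows. This is a minimal illustration, assuming each terminal's microphone signal arrives as an equal-length, time-aligned list of samples; the function name `mix_minus` is hypothetical, and a practical mixer would also need gain normalization, clipping protection, and jitter handling.

```python
from typing import Dict, List


def mix_minus(mic_inputs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """For each terminal, return the sum of the microphone signals of all OTHER terminals."""
    n = len(next(iter(mic_inputs.values())))
    # Sample-by-sample sum of every terminal's microphone signal.
    total = [sum(sig[i] for sig in mic_inputs.values()) for i in range(n)]
    # Subtracting a terminal's own signal leaves the mix of the others
    # (e.g. A receives B+C, B receives A+C, C receives A+B).
    return {
        name: [t - s for t, s in zip(total, sig)]
        for name, sig in mic_inputs.items()
    }
```

Computing the full sum once and subtracting each terminal's own signal keeps the cost linear in the number of terminals, rather than re-summing N-1 signals for every terminal.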

The voice activity detection units 33 (33-1 to 33-3) perform voice activity detection on the synthesized sounds output from the mixer unit 30. Various voice activity detection methods can be used; for example, the technique described in Patent Document 2 can be applied. Each voice activity detection unit 33 gives its determination result to the condition determination unit 34.

The condition determination unit 34 determines whether at least one of the voice activity detection units 33 (33-1 to 33-3), which examine the respective synthesized sounds output from the mixer unit 30, has determined that voice is present (OR condition), and gives the result of this OR determination to the echo detection unit 35. The voice activity detection means is implemented by, for example, the voice activity detection units 33 and the condition determination unit 34 described above.
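The OR condition evaluated by the condition determination unit 34 can be illustrated with a simple stand-in detector. The text suggests the VAD method of Patent Document 2, which is not reproduced here; the frame-energy test, the threshold of 1e-3, and the function names below are assumptions chosen only for illustration.

```python
from typing import Iterable, List


def frame_energy_vad(frame: List[float], threshold: float = 1e-3) -> bool:
    """Stand-in VAD (assumption): voice is 'present' when mean frame power exceeds a threshold."""
    if not frame:
        return False
    power = sum(x * x for x in frame) / len(frame)
    return power > threshold


def any_voice_active(frames: Iterable[List[float]], threshold: float = 1e-3) -> bool:
    """OR condition over the VAD results for every synthesized sound from the mixer."""
    return any(frame_energy_vad(f, threshold) for f in frames)
```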

The band limiting units 31 (31-1 to 31-3) limit (eliminate) a predetermined band of the synthesized sounds output from the mixer unit 30, and the results are the signals output from the speaker of each conference terminal 2. For example, the technique described in Patent Document 3 is applied to limit the 2.5 to 3.0 kHz band, which has little effect on perceived sound.

The band limit detection units 32 (32-1 to 32-3) detect whether the microphone input signal of each conference terminal 2 contains a signal band-limited by the band limiting units 31 (that is, a signal output from a speaker). The band limit detection unit 32 gives this band limit detection result (whether or not the microphone input signal of the conference terminal 2 contains a signal band-limited by the band limiting unit 31 for any conference terminal 2) to the echo detection unit 35. Details of the band limit detection unit 32 are described in the section on operation.

The echo detection units 35 (35-1 to 35-3) detect whether the microphone input signal of each conference terminal 2 contains an echo component, based on the band limit detection result from the band limit detection unit 32 and the OR determination result from the condition determination unit 34. For example, the echo detection unit 35 detects an echo for a conference terminal 2 when both of the following hold: "the microphone input contains a band-limited signal" and "sound is being output from at least one speaker". The echo detection unit 35 gives the echo detection result to the echo cancellation unit 36.

The echo cancellation units 36 (36-1 to 36-3) cancel the echo in the microphone input signal when the corresponding echo detection unit 35 reports that an echo has been detected.

For example, the echo cancellation unit 36 can use an echo suppressor. In that case, the simplest method is to attenuate the whole microphone input signal while an echo is detected. Alternatively, the echo cancellation unit 36 may divide the signal into frequency bands and attenuate each band as necessary (for example, using an FFT (Fast Fourier Transform)). When an echo suppressor is used, the technique described in Patent Document 4, for example, can be applied.
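A minimal sketch of the simplest echo suppressor described above, which attenuates the whole microphone frame while an echo is flagged. The 30 dB attenuation and the function name are illustrative assumptions; a practical suppressor would typically smooth the gain over time to avoid audible clicks, or apply per-band gains after an FFT as the text mentions.

```python
from typing import List


def suppress_echo(frame: List[float], echo_detected: bool, atten_db: float = 30.0) -> List[float]:
    """Full-band echo suppressor: scale the mic frame down by atten_db while echo is detected."""
    if not echo_detected:
        return list(frame)  # pass the microphone signal through unchanged
    gain = 10.0 ** (-atten_db / 20.0)  # convert dB attenuation to a linear amplitude gain
    return [gain * v for v in frame]
```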

Note that the echo cancellation unit 36 may use an adaptive echo canceller. For example, the technology (method) described in Patent Document 3 or 5 can be applied to implement an adaptive echo canceller.

(A-2) Operation of Embodiment

Next, the operation of the conference system 1 according to the embodiment configured as described above will be described. Since the conference system 1 is characterized by its echo detection processing (mainly the band limit detection processing), this point is described in detail below with reference to the drawings.

(A-2-1) Processing of Band Limiting Unit 31 and Band Limit Detection Unit 32

FIG. 2 is a block diagram showing the detailed configuration of the band limiting unit according to the embodiment.

As shown in FIG. 2, the band limiting unit 31 has a BEF (Band Elimination Filter) 310. The BEF 310 eliminates a band with little effect on perceived sound, for example 2.5 to 3.0 kHz, from the output signal of the mixer unit 30 (the synthesized sound to be output from the speaker of each conference terminal 2). The signal whose components in the specific 2.5 to 3.0 kHz band have been attenuated is then transmitted to each conference terminal 2 and output from its speaker.
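The BEF 310 can be approximated by a single second-order band-stop (notch) section. The sketch below assumes a 16 kHz sampling rate and uses the notch formulas from Robert Bristow-Johnson's "Audio EQ Cookbook", centering the notch at the geometric mean of 2.5 and 3.0 kHz; the source does not specify the filter design, so the design choice and all names are assumptions. A single biquad attenuates deeply only near its center frequency (roughly 3 dB at the band edges), so a cascade or higher-order design would be needed to suppress the whole 2.5 to 3.0 kHz band strongly.

```python
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float]]


def notch_coeffs(f_lo: float, f_hi: float, fs: float) -> Coeffs:
    """Second-order band-stop (notch) coefficients in Audio EQ Cookbook form."""
    f0 = math.sqrt(f_lo * f_hi)          # geometric center of the stopband
    q = f0 / (f_hi - f_lo)               # Q from the desired stopband width
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)   # zeros exactly at f0
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)   # a1, a2 normalized by a0
    return b, a


def biquad(x: Sequence[float], b, a) -> List[float]:
    """Direct-form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def band_eliminate(x: Sequence[float], fs: float = 16000.0,
                   f_lo: float = 2500.0, f_hi: float = 3000.0) -> List[float]:
    """Hypothetical stand-in for BEF 310: attenuate the 2.5-3.0 kHz band of a mixer output."""
    b, a = notch_coeffs(f_lo, f_hi, fs)
    return biquad(x, b, a)
```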

On the other hand, the microphone input signal from each conference terminal 2 is transmitted to the conference server 3. The conference server 3 gives the received microphone input signal of each conference terminal 2 to each band limit detector 32 (32-1 to 32-3).

FIG. 3 is a block diagram showing the detailed configuration of the band limit detection unit according to the embodiment.

The band-pass filter (BPF) 321 passes the specific band (2.5 to 3.0 kHz) of the microphone input signal and gives its output to the power calculation unit 323.

The band-pass filter (BPF) 322 passes a reference band outside the specific band (2.0 to 2.5 kHz) of the microphone input signal and gives its output to the power calculation unit 324.
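BPFs 321 and 322 can be sketched in the same way with second-order band-pass sections (the constant 0 dB peak gain form from the same cookbook). The 16 kHz sampling rate, the second-order design, and the function names are assumptions; the source specifies only the two pass bands.

```python
import math
from typing import List, Sequence, Tuple


def bandpass_coeffs(f_lo: float, f_hi: float, fs: float):
    """Second-order band-pass coefficients (0 dB peak gain), Audio EQ Cookbook form."""
    f0 = math.sqrt(f_lo * f_hi)          # geometric center of the pass band
    q = f0 / (f_hi - f_lo)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x: Sequence[float], b, a) -> List[float]:
    """Direct-form I biquad filter."""
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def split_bands(mic: Sequence[float], fs: float = 16000.0) -> Tuple[List[float], List[float]]:
    """Return (BPF 321 output, BPF 322 output) for one microphone input signal."""
    in_band = biquad(mic, *bandpass_coeffs(2500.0, 3000.0, fs))   # specific band (BEF stopband)
    ref_band = biquad(mic, *bandpass_coeffs(2000.0, 2500.0, fs))  # reference band outside it
    return in_band, ref_band
```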

The power calculation unit 323 receives the output signal of the BPF 321, calculates an average power value P_BPF1 of that signal, and gives it to the band limit determination unit 325.

As the method by which the power calculation unit 323 calculates the average power of its input signal, for example, the squared values of the samples of the input signal can be averaged with an FIR filter to obtain P_BPF1. The method is not limited to squaring the sample values; the absolute values of the samples may be used instead. The power calculation unit 323 may also use an IIR low-pass filter (LPF) with an appropriate time constant instead of the FIR filter.

The power calculation unit 324 receives the output signal of the BPF 322, calculates an average power value P_BPF2 of that signal, and gives it to the band limit determination unit 325. The power calculation unit 324 can use the same averaging method as the power calculation unit 323.
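The two averaging options described above (an FIR moving average of squared or absolute sample values, or a first-order IIR low-pass with a chosen time constant) can be sketched as follows. The window length of 160 samples (10 ms at an assumed 16 kHz) and the IIR coefficient 0.01 (a time constant of about 100 samples) are illustrative values, not values from the source.

```python
from collections import deque
from typing import Iterable, List


def moving_average_power(x: Iterable[float], window: int = 160, use_abs: bool = False) -> List[float]:
    """FIR (rectangular moving-average) estimate of average power, one value per sample.

    use_abs=True averages |x| instead of x**2, the cheaper variant mentioned in the text.
    """
    buf: deque = deque(maxlen=window)
    acc = 0.0
    out: List[float] = []
    for v in x:
        e = abs(v) if use_abs else v * v
        if len(buf) == window:
            acc -= buf[0]  # this oldest value is evicted by the append below
        buf.append(e)
        acc += e
        out.append(acc / len(buf))  # average over the samples seen so far during warm-up
    return out


def iir_power(x: Iterable[float], alpha: float = 0.01, use_abs: bool = False) -> List[float]:
    """First-order IIR low-pass (exponential smoothing) of x**2 or |x|."""
    p = 0.0
    out: List[float] = []
    for v in x:
        e = abs(v) if use_abs else v * v
        p += alpha * (e - p)
        out.append(p)
    return out
```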

Based on the power values P_BPF1 and P_BPF2 from the power calculation units 323 and 324, the band limit determination unit 325 determines the band limitation state (that is, the echo state) and outputs the determination result to the echo detection unit 35.

FIGS. 4A and 4B are explanatory diagrams illustrating the frequency characteristics of the microphone input signal of the conference terminal according to the embodiment. FIG. 4A shows the frequency characteristics of the microphone input signal when an echo is input to the microphone of the conference terminal 2, and FIG. 4B shows them when the user's own voice is input to the microphone.

A signal in a specific band of 2.5 to 3.0 kHz of the speaker output signal of each conference terminal 2 is attenuated by the band limiter 31 (31-1 to 31-3).

Therefore, when an echo is input to the microphone (for example, when terminals A and B are in the same room and the speaker output of terminal A and/or terminal B enters the microphone of terminal A and/or terminal B), the microphone input signal is attenuated in the 2.5 to 3.0 kHz band, as shown in FIG. 4A.

Consequently, the average power P_BPF1 of the signal passing through the BPF 321 (the specific 2.5 to 3.0 kHz band) tends to be attenuated more than the average power P_BPF2 of the signal passing through the BPF 322 (the 2.0 to 2.5 kHz band outside the specific band).

On the other hand, when the user of the conference terminal 2 speaks, the user's voice enters the microphone directly without passing through the band limiting unit 31. The microphone input signal therefore retains substantial power in the specific 2.5 to 3.0 kHz band, as shown in FIG. 4B.

Therefore, the power P_BPF1 in the 2.5 to 3.0 kHz band (inside the BEF stopband) is not significantly attenuated relative to the power P_BPF2 in the 2.0 to 2.5 kHz band (outside the BEF stopband).

Based on the frequency characteristics of the microphone input signal shown in FIGS. 4A and 4B, the band limit determination unit 325 determines that band limitation is detected (echo detection) when the condition of formula (1) below is satisfied, and that band limitation is not detected otherwise.

P_BPF1/P_BPF2<TH (1)
Formula (1) requires the ratio of the average power P_BPF1 of the signal passing through the BPF 321 to the average power P_BPF2 of the signal passing through the BPF 322 to be less than the threshold TH. This is because, as shown in FIG. 4A, the power in the specific 2.5 to 3.0 kHz band is strongly attenuated relative to the adjacent band while an echo is being input.
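Formula (1) reduces to a single comparison. The sketch below rewrites P_BPF1 / P_BPF2 < TH as P_BPF1 < TH × P_BPF2 to avoid dividing by zero, and returns False when the reference band carries essentially no power (no evidence either way). TH = 0.1 (-10 dB) and the function name are illustrative assumptions; the source does not give a threshold value.

```python
def band_limit_detected(p_bpf1: float, p_bpf2: float, th: float = 0.1, eps: float = 1e-12) -> bool:
    """Formula (1): True when P_BPF1 / P_BPF2 < TH, i.e. the mic input looks band-limited (echo)."""
    # Guard: with a silent reference band the ratio is meaningless, so report no detection.
    return p_bpf2 > eps and p_bpf1 < th * p_bpf2
```

For example, with P_BPF1 = 0.001 and P_BPF2 = 0.1 the ratio is 0.01, below TH, so band limitation is reported; with P_BPF1 = 0.08 it is not.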

The band limit determination unit 325 provides the echo detection unit 35 with the determination result of whether or not the condition of formula (1) is satisfied.

(A-2-2) Processing of Echo Detection Unit 35 and Other Units

The output signals of the mixer unit 30 are subjected to voice activity detection by the voice activity detection units 33 (33-1 to 33-3). The condition determination unit 34 informs the echo detection unit 35 whether voice activity has been detected by any of the voice activity detection units 33 (33-1 to 33-3) (the result of the OR determination).

The echo detection unit 35 detects an echo only when the band limit detection unit 32 (band limit determination unit 325) detects band limitation (echo) in the microphone input signal and the condition determination unit 34 determines that at least one of the output signals of the mixer unit 30 contains voice activity; otherwise, no echo is detected.
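The decision rule of the echo detection unit 35 (band limitation detected in a terminal's microphone input AND voice activity in at least one mixer output) can be sketched per terminal as follows. The dictionary-based interface and the function name are hypothetical.

```python
from typing import Dict, Iterable


def detect_echoes(band_limited: Dict[str, bool], vad_flags: Iterable[bool]) -> Dict[str, bool]:
    """Per-terminal echo decision: band limitation in that terminal's mic AND any voice output."""
    any_voice = any(vad_flags)  # OR result from the condition determination unit 34
    return {term: flag and any_voice for term, flag in band_limited.items()}
```

Requiring voice activity in the mixer outputs prevents a quiet or naturally dull microphone signal from being mistaken for echo while no speaker is actually playing anything.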

When an echo is detected by the echo detection unit 35, the echo cancellation unit 36 performs echo cancellation (suppression) on the microphone input signal using an echo suppressor or a similar method (for example, the methods described in Patent Documents 3, 4, and 5 cited above).

(A-3) Effects of the Embodiment

According to this embodiment, in the conference system 1, conference audio from which a band with little effect on perceived sound has been eliminated is output from the speaker of each conference terminal 2, and the corresponding band attenuation is detected in the microphone input signal. This makes it possible to detect echo input immediately.

When an echo is detected, canceling the echo in the microphone input signal of the conference terminal 2 prevents howling in the web conference. In particular, when two or more terminals (terminals A, B, ...) participate in the web conference from the same room, howling caused by one of them operating in speaker mode can be prevented.

(B) Other Embodiments

In addition to the various modifications mentioned in the above embodiment, the present invention can also be applied to the following modified embodiments.

(B-1) In the above embodiment, the conference server 3 plays the major role in preventing howling. However, as shown in FIG. 5, each conference terminal 2 may instead play the major role in preventing howling. In FIG. 5, in addition to a configuration corresponding to that of the conference server 3 shown in FIG. 1, the conference terminal 2 has a mixer unit 21 as a second mixer means.

The mixer unit 21 adds the sound of the own terminal to the sound synthesized by the mixer unit 30 of the conference server 3 (synthesized sound output by the speaker for the own terminal).

In addition, the voice activity detection unit 33 in FIG. 5 performs voice activity detection on the synthesized sound produced by the mixer unit 21 (the synthesized sound of all the conference terminals 2 participating in the conference). Other processing is the same as (or similar to) that of the corresponding components shown in FIG. 1.
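For the terminal-side modification (B-1), the second mixer and the voice activity check on its output might look like the sketch below. The function names and the energy threshold are assumptions; the point illustrated is that voice activity detection runs on the second synthesized signal, which includes the terminal's own microphone, while band limitation is still applied to the first synthesized signal sent to the speaker.

```python
from typing import List


def second_mix(first_mix: List[float], own_mic: List[float]) -> List[float]:
    """Hypothetical mixer unit 21: add this terminal's own mic signal to the server's mix of the others."""
    return [a + b for a, b in zip(first_mix, own_mic)]


def frame_has_voice(frame: List[float], threshold: float = 1e-3) -> bool:
    """Stand-in energy VAD (assumption) applied to the second synthesized signal."""
    return bool(frame) and sum(v * v for v in frame) / len(frame) > threshold
```

Because the second mix covers every participant, a single VAD result per terminal plays the role of the OR over all mixer outputs in the server-side configuration.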

(B-2) As a modification, the configuration of the conference system 1 is not limited to those shown in FIG. 1 and FIG. 5; it is also possible to place the …

(B-3) In the above embodiment, each conference terminal 2 is assumed to use speaker output, and the band limiting unit 31 applies band limitation to the output signal of the mixer unit 30 of the conference server 3. However, if each conference terminal 2 can detect whether it is using speaker output or headphone output, band limitation may be applied only to speaker output (because with headphone output, echo is unlikely to enter the microphone).

(B-4) As a modification, when participants (conference terminals 2) located at the same site (the same room) can be identified, the echo detection and cancellation processing described above (howling prevention processing) may be applied only to the conference terminals 2 at that site. Whether conference terminals 2 are at the same site may be determined by, for example, GPS (Global Positioning System).

(B-5) In the above embodiment, the presence of sound is detected in the voice output from the speaker of the conference terminal 2. Detection may be performed.

(B-6) In the above embodiment, voice activity is detected before band limitation, but voice activity detection may be performed after band limitation.

1... conference system, 2 (2-A to 2-C)... conference terminal, 3... conference server, 21, 30... mixer unit, 31 (31-1 to 31-3)... band limiting unit, 32 (32-1 to 32-3)... band limit detection unit, 33 (33-1 to 33-3)... voice activity detection unit, 34... condition determination unit, 35 (35-1 to 35-3)... echo detection unit, 36 (36-1 to 36-3)... echo cancellation unit, 310... BEF, 321, 322... BPF, 323, 324... power calculation unit, 325... band limit determination unit.

Claims

A conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server, the conference system comprising:
a first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result to that terminal as a first synthesized sound signal;
a voice activity detection means for detecting that any one of the first synthesized sound signals contains voice activity;
a specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal of each of the conference terminals;
a band limit detection means for detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band;
an echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and an echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
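Two parts of claim 1 reduce to simple operations: the first mixer means is a mix-minus (each terminal receives the sum of every other terminal's microphone signal), and the echo detection means flags an echo only when both conditions hold. A minimal sketch, assuming equal-length float frames keyed by hypothetical terminal IDs (the function names are illustrative, not from the source):

```python
from typing import Dict, List


def mix_minus(mic_frames: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """First mixer sketch: each terminal gets the sample-wise sum of all OTHER terminals' frames.

    Assumes every frame has the same length.
    """
    ids = list(mic_frames)
    length = len(mic_frames[ids[0]]) if ids else 0
    return {
        t: [sum(mic_frames[o][n] for o in ids if o != t) for n in range(length)]
        for t in ids
    }


def echo_detected(band_limit_detected: bool, voice_active: bool) -> bool:
    """Echo flag as in claim 1: the attenuated-band signature is found in the mic input
    AND voice activity was detected in the first synthesized sound signals."""
    return band_limit_detected and voice_active


frames = {"2-A": [0.25, 0.0], "2-B": [0.0, 0.5], "2-C": [0.25, 0.25]}
print(mix_minus(frames))
# {'2-A': [0.25, 0.75], '2-B': [0.5, 0.25], '2-C': [0.25, 0.5]}
print(echo_detected(True, True), echo_detected(True, False))  # True False
```

Requiring both conditions is what keeps the canceller from acting when the microphone merely picks up near-end speech while the far end is silent.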
The conference system according to claim 1, wherein the band limit detection means comprises:
a specific band extraction unit that extracts the specific band of the microphone input signal;
an out-of-specific-band extraction unit that extracts a band other than the specific band of the microphone input signal;
and a band limit determination unit that determines that the output signal of each conference terminal after attenuation of the specific band is included when the ratio of the power value of the output signal from the specific band extraction unit to the power value of the output signal from the out-of-specific-band extraction unit is less than a threshold.
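The band limit determination of claim 2 compares the power in the specific band with the power in another band of the microphone input. A minimal sketch, assuming a single frame analyzed with a naive DFT in place of the BPFs 321 and 322, a hypothetical 1.5 to 2.5 kHz reference band for the out-of-specific-band extraction, a 16 kHz sampling rate, and an illustrative threshold of 0.01; none of these values come from the source:

```python
import cmath
import math


def power_spectrum(frame):
    """|X[k]|^2 for DFT bins k = 0 .. N/2 (naive O(N^2) DFT, fine for a short frame)."""
    n = len(frame)
    return [
        abs(sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_sum(spec, fs, n, band):
    """Total power of the bins whose centre frequency lies in [band[0], band[1])."""
    lo, hi = band
    return sum(p for k, p in enumerate(spec) if lo <= k * fs / n < hi)


def band_limit_detected(mic_frame, fs=16000, spec_band=(2500.0, 3000.0),
                        ref_band=(1500.0, 2500.0), threshold=0.01):
    """True when in-band / reference-band power is below threshold (hypothetical values)."""
    spec = power_spectrum(mic_frame)
    n = len(mic_frame)
    p_in = band_sum(spec, fs, n, spec_band)
    p_ref = band_sum(spec, fs, n, ref_band)
    if p_ref == 0.0:
        return False  # nothing to compare against, e.g. a silent frame
    return p_in / p_ref < threshold


fs, n = 16000, 256  # bin spacing fs / n = 62.5 Hz
full = [math.sin(2 * math.pi * 2000 * m / fs) + math.sin(2 * math.pi * 2750 * m / fs)
        for m in range(n)]
limited = [math.sin(2 * math.pi * 2000 * m / fs) for m in range(n)]
print(band_limit_detected(full))     # False: the 2.5-3.0 kHz band still carries energy
print(band_limit_detected(limited))  # True: the 2.5-3.0 kHz band is (almost) empty
```

The ratio is low only when the microphone picks up mainly the band-attenuated speaker output; near-end speech refills the specific band and raises the ratio, which is why the threshold choice matters.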
The conference system according to claim 1, characterized in that the specific band is a band that has little effect on hearing.
The conference system according to claim 3, wherein the specific band is a band of 2.5 to 3.0 kHz.
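Claim 4 places the attenuated band at 2.5 to 3.0 kHz. One way to attenuate such a band, sketched here under assumptions, is to scale the corresponding DFT bins of a frame and transform back. The 16 kHz sampling rate, the 256-sample frame, and the 0.1 (about -20 dB) gain are hypothetical, and a real implementation would likely use an FFT with overlap-add or a band-stop filter to avoid frame-boundary artifacts:

```python
import cmath
import math


def attenuate_band(frame, fs=16000, band=(2500.0, 3000.0), gain=0.1):
    """Scale the DFT bins of `frame` that fall in `band` by `gain` and transform back.

    Negative-frequency bins are folded onto the positive axis so the output stays real.
    Naive O(N^2) DFT, for illustration only.
    """
    n = len(frame)
    spectrum = [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
                for k in range(n)]
    lo, hi = band
    for k in range(n):
        freq = min(k, n - k) * fs / n  # |frequency| of bin k
        if lo <= freq < hi:
            spectrum[k] *= gain
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n).real
            for m in range(n)]


fs, n = 16000, 256
in_band = [math.sin(2 * math.pi * 2750 * m / fs) for m in range(n)]   # inside 2.5-3.0 kHz
out_band = [math.sin(2 * math.pi * 1000 * m / fs) for m in range(n)]  # outside the band
y_in = attenuate_band(in_band)
y_out = attenuate_band(out_band)
print(max(abs(v) for v in y_in))                                # about 0.1 (-20 dB)
print(max(abs(a - b) for a, b in zip(y_out, out_band)) < 1e-9)  # True
```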
A conference server in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and the conference server, the conference server comprising:
a first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result to that terminal as a first synthesized sound signal;
a voice activity detection means for detecting that any one of the first synthesized sound signals contains voice activity;
a specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal of each of the conference terminals;
a band limit detection means for detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band;
an echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and an echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
A conference terminal in a conference system including a plurality of conference terminals, each having a microphone to which a microphone input signal is input, and a conference server, the conference terminal comprising:
a second mixer means for synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, that is obtained by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputting the result as a second synthesized sound signal;
a voice activity detection means for detecting that the second synthesized sound signal contains voice activity;
a specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as the output signal of the conference terminal;
a band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band;
an echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and an echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
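The terminal-side claim differs from the server-side one in what each means sees: voice activity is judged on the second synthesized sound signal (all participants, including the own terminal), while band attenuation is applied only to the first synthesized sound signal that goes to the speaker. A minimal sketch of the second mixer, assuming float samples in [-1.0, 1.0] and hard clipping (an illustrative choice; the function name is hypothetical):

```python
def second_mix(own_mic, first_mix):
    """Second mixer sketch: own mic frame + first synthesized signal, hard-clipped to [-1, 1]."""
    if len(own_mic) != len(first_mix):
        raise ValueError("frames must have the same length")
    return [max(-1.0, min(1.0, a + b)) for a, b in zip(own_mic, first_mix)]


own = [0.25, 0.75, -0.5]
first = [0.5, 0.5, -0.75]
print(second_mix(own, first))  # [0.75, 1.0, -1.0]
```

In this sketch the returned list would feed the voice activity detection means, while `first` alone would pass through the specific band attenuation before playback.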
An echo cancellation program that causes a computer mounted on a conference server in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and the conference server to function as:
a first mixer means for synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result to that terminal as a first synthesized sound signal;
a voice activity detection means for detecting that any one of the first synthesized sound signals contains voice activity;
a specific band attenuation means for attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal of each of the conference terminals;
a band limit detection means for detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band;
an echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and an echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
An echo cancellation program that causes a computer mounted on a conference terminal in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server to function as:
a second mixer means for synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, that is obtained by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputting the result as a second synthesized sound signal;
a voice activity detection means for detecting that the second synthesized sound signal contains voice activity;
a specific band attenuation means for attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as the output signal of the conference terminal;
a band limit detection means for detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band;
an echo detection means for detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and an echo cancellation means for canceling the echo detected by the echo detection means from the microphone input signal.
An echo cancellation method used in a conference system including a plurality of conference terminals, each having a microphone to which a microphone input signal is input, and a conference server,
the conference system having a first mixer means, a voice activity detection means, a specific band attenuation means, a band limit detection means, an echo detection means, and an echo cancellation means, the method comprising:
the first mixer means synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result as a first synthesized sound signal;
the voice activity detection means detecting that any one of the first synthesized sound signals contains voice activity;
the specific band attenuation means attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal of each of the conference terminals;
the band limit detection means detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band;
the echo detection means detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and the echo cancellation means canceling the echo detected by the echo detection means from the microphone input signal.
An echo cancellation method used in a conference server in a conference system including a plurality of conference terminals, each having a microphone to which a microphone input signal is input, and the conference server,
the conference server having a first mixer means, a voice activity detection means, a specific band attenuation means, a band limit detection means, an echo detection means, and an echo cancellation means, the method comprising:
the first mixer means synthesizing, for each of the conference terminals, the microphone input signals of the conference terminals other than that terminal and outputting the result as a first synthesized sound signal;
the voice activity detection means detecting that any one of the first synthesized sound signals contains voice activity;
the specific band attenuation means attenuating a specific band of each of the first synthesized sound signals and outputting the attenuated signal as the output signal of each of the conference terminals;
the band limit detection means detecting that the microphone input signal includes the output signal of each conference terminal after attenuation of the specific band;
the echo detection means detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of each of the conference terminals after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and the echo cancellation means canceling the echo detected by the echo detection means from the microphone input signal.
An echo cancellation method used in a conference terminal in a conference system including a plurality of conference terminals, each equipped with a microphone to which a microphone input signal is input, and a conference server,
the conference terminal having a second mixer means, a voice activity detection means, a specific band attenuation means, a band limit detection means, an echo detection means, and an echo cancellation means, the method comprising:
the second mixer means synthesizing the microphone input signal of the own terminal with a first synthesized sound signal, acquired from the conference server, that is obtained by synthesizing the microphone input signals of the conference terminals other than the own terminal, and outputting the result as a second synthesized sound signal;
the voice activity detection means detecting that the second synthesized sound signal contains voice activity;
the specific band attenuation means attenuating a specific band of the first synthesized sound signal and outputting the attenuated signal as the output signal of the conference terminal;
the band limit detection means detecting that the microphone input signal includes the output signal of the conference terminal after attenuation of the specific band;
the echo detection means detecting that the microphone input signal contains an echo when the band limit detection means detects that the output signal of the conference terminal after attenuation of the specific band is included and the voice activity detection means determines that there is voice activity;
and the echo cancellation means canceling the echo detected by the echo detection means from the microphone input signal.