WO2018230062A1

WO2018230062A1 - Voice signal processing device, voice signal processing method and voice signal processing program

Info

Publication number: WO2018230062A1
Application number: PCT/JP2018/010330
Authority: WO
Inventors: 鬼塚一浩; 相川徹; 菊原靖仁; 実方友里
Original assignee: 株式会社オーディオテクニカ
Priority date: 2017-06-12
Filing date: 2018-03-15
Publication date: 2018-12-20
Also published as: CN110741563B; JP7122756B2; US20200105289A1; EP3641141A1; JP7323959B2; JP2022125069A; JPWO2018230062A1; US11227618B2; EP3641141A4; CN110741563A

Abstract

Provided are a voice signal processing device, a voice signal processing method and a voice signal processing program which can remove, with a simple circuit configuration, an echo signal included in input signals from a plurality of microphones. This device 1 includes: an output unit 20 which outputs a reception signal s2; an input unit 30 which collects an echo component and the voice of a speaker from the reception signal, and synthesizes signals respectively input from a plurality of microphones 3 that generate echo signals es according to the echo component and voice signals s1 according to the voice of the speaker so as to generate a transmission signal s4; a removal signal generation unit 70 which generates, on the basis of filter coefficients F, a removal signal s5 for removing the echo signals included in the transmission signal; a control unit 50 which calculates the filter coefficients; and a removal unit 80 which generates, on the basis of the transmission signal and the removal signal, an echo removal signal s6, wherein the control unit calculates individual filter coefficients k respectively corresponding to the plurality of microphones and synthesizes the individual filter coefficients to calculate the filter coefficient.

Description

Audio signal processing apparatus, audio signal processing method, and audio signal processing program

The present invention relates to an audio signal processing device, an audio signal processing method, and an audio signal processing program.

In recent years, communication conference systems such as teleconference systems and video conference systems, which use communication lines such as the Internet, have been used for conferences between physically separate locations. In such a communication conference system, when sound based on an audio signal received from one site (hereinafter referred to as “received signal”) is output from a speaker at the other site and that output sound is picked up by a microphone at the same site, an acoustic echo is generated.

Acoustic echo is normally suppressed and removed by an echo canceller provided in the communication conference system. A general echo canceller includes an adaptive filter that, based on the received signal and an echo signal corresponding to the acoustic echo, generates a removal signal for removing the echo signal. The echo canceller removes the echo signal by adding or subtracting the removal signal and the echo signal.

As such an echo canceller, a multi-channel echo canceller that suppresses and removes echo signals from a plurality of microphones has been proposed (for example, see Patent Document 1).

Patent Document 1: JP 2002-252577 A

The echo canceller disclosed in Patent Document 1 includes a plurality of echo cancellation units corresponding to each of a plurality of microphones, and each echo cancellation unit removes an echo signal included in an input signal from the corresponding microphone, thereby supporting multiple channels. In other words, the echo canceller disclosed in Patent Document 1 requires as many echo cancellation units as there are microphones, which complicates the circuit configuration and signal processing.

The present invention has been made to solve the above-described problem of the prior art, and an object of the present invention is to provide an audio signal processing device, an audio signal processing method, and an audio signal processing program capable of removing echo signals included in input signals from a plurality of microphones with a simple circuit configuration.

An audio signal processing device according to the present invention includes: an output unit that outputs a received signal; an input unit that generates a transmission signal by synthesizing signals input from each of a plurality of microphones, each of which picks up an echo component of the received signal and a speaker's voice and generates an echo signal corresponding to the echo component and an audio signal corresponding to the voice; a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal included in the transmission signal; a control unit that calculates the filter coefficient; and a removal unit that generates an echo removal signal based on the transmission signal and the removal signal. The control unit calculates individual filter coefficients corresponding to each of the plurality of microphones and calculates the filter coefficient by combining the individual filter coefficients.

According to the present invention, it is possible to remove an echo signal included in an input signal from each of a plurality of microphones with a simple circuit configuration.

FIG. 1 is a functional block diagram showing an embodiment of the audio signal processing device according to the present invention.
FIG. 2 is a flowchart showing an embodiment of the audio signal processing method according to the present invention.
FIG. 3 is a flowchart of the initialization process included in the audio signal processing method of FIG. 2.
FIG. 4 is a flowchart of the initial learning process included in the audio signal processing method of FIG. 2.
FIG. 5 is a functional block diagram showing a signal flow in the initial learning process of FIG. 4.
FIG. 6 is a flowchart of the echo signal removal process included in the audio signal processing method of FIG. 2.
FIG. 7 is a functional block diagram showing a signal flow in the echo signal removal process of FIG. 6.
FIG. 8 is a flowchart of the specific process included in the audio signal processing method.
FIG. 9 is a functional block diagram showing a signal flow in the specific process of FIG. 8.
FIG. 10 is a flowchart of the update process included in the audio signal processing method.
FIG. 11 is a functional block diagram showing a signal flow in the update process of FIG. 10.
FIG. 12 is a functional block diagram showing another embodiment of the present invention.

Hereinafter, embodiments of an audio signal processing device, an audio signal processing method, and an audio signal processing program according to the present invention will be described with reference to the drawings.

● Audio signal processing device ●
First, an embodiment of an audio signal processing apparatus (hereinafter referred to as “this apparatus”) according to the present invention will be described.

Configuration of Audio Signal Processing Device
FIG. 1 is a functional block diagram showing an embodiment of this device.
The apparatus 1 performs processing such as mixing, distribution, and balance adjustment of a signal (input signal) from a device such as a microphone 3 that converts voice or musical sound into an electrical signal. The device 1 is, for example, a mixer.

Hereinafter, an example is described in which the device 1 is used in a video conference between a speaker at a first site where the device 1 is installed and a speaker at a second site physically separated from the first site, and in which one speaker 2 and six microphones 3a, 3b, 3c, 3d, 3e, and 3f (so-called six channels) disposed at the first site are connected to the device 1. The first site and the second site are, for example, rooms such as conference rooms.

Part of the sound from the second site that is output from the speaker 2 into the room is picked up by the microphone 3 via the indoor space. At this time, the microphone 3 generates and outputs a signal es (hereinafter referred to as “echo signal”) corresponding to that part of the sound output from the speaker 2 (hereinafter referred to as “echo component”). When the speaker at the first site speaks, the microphone 3 generates and outputs a signal s1 (hereinafter referred to as “voice signal”) corresponding to the voice of the speaker. That is, when the speakers at both the first site and the second site are speaking, the signal output from the microphone 3 includes the voice signal s1 and the echo signal es. On the other hand, when only the speaker at the second site is speaking, the signal output from the microphone 3 includes the echo signal es but not the voice signal s1.

The apparatus 1 includes a first input unit 10, a first output unit 20, a second input unit 30, a switching unit 40, a control unit 50, a storage unit 60, a removal signal generation unit 70, a removal unit 80, and a second output unit 90.

The device 1 is realized by, for example, a personal computer. In the device 1, an information processing program according to the present invention (hereinafter referred to as “this program”) operates, and this program cooperates with the hardware resources of the device 1 to realize an audio signal processing method according to the present invention (hereinafter referred to as “the present method”), which is described later.

In addition, by causing a computer (not shown) to execute this program, the computer can be made to function in the same manner as the device 1 and to execute the present method.

The first input unit 10 is connected to the communication device 4 at the second site via the communication line 5 such as a communication cable, and receives a voice signal (hereinafter referred to as “received signal”) s2 from the second site. The first input unit 10 includes, for example, a communication interface (I/F) such as a connector or a terminal, an amplifier, and the like. The received signal s2 from the first input unit 10 is input to the first output unit 20, the control unit 50, and the removal signal generation unit 70.

The first output unit 20 outputs the received signal s2 from the first input unit 10 and the reference signal s3 from the control unit 50 to the speaker 2. The first output unit 20 includes, for example, an I/F, an amplifier, and the like. The first output unit 20 is an output unit in the present invention. The “reference signal s3” is a signal corresponding to a reference sound (for example, white noise) emitted through the speaker 2 when the device 1 executes the present method described later. The reference signal s3 is generated by the control unit 50.

The second input unit 30 is connected to each of the microphones 3a-3f and receives signals from the respective microphones 3a-3f. The second input unit 30 includes, for example, an I/F, an amplifier, an AD converter, a variable resistor, and the like. The second input unit 30 is an input unit in the present invention. The second input unit 30 generates signals s41, s42, s43, s44, s45, and s46 (hereinafter referred to as “individual transmission signals”) by adjusting the gains of the signals received from the respective microphones, and generates a signal s4 (hereinafter referred to as “transmission signal”) by combining the individual transmission signals s41-s46. That is, the second input unit 30 generates the transmission signal s4 by combining the individual transmission signals s41-s46, in other words, the signals from the respective microphones 3a-3f. The second input unit 30 includes seven transmission paths (not shown) corresponding to the transmission signal s4 and the individual transmission signals s41-s46. The generated transmission signal s4 and the individual transmission signals s41-s46 are input to the switching unit 40. Hereinafter, when the individual transmission signals s41-s46 are referred to collectively without distinction, they are referred to as individual transmission signals s40.

The gain of each signal is adjusted using a known gain sharing algorithm. “Gain sharing” is an algorithm that compares the input from each of the microphones 3a-3f with the sum of the inputs (for example, between the case where a signal is input only from the microphone 3a and the case where signals are input from the microphones 3a-3f) and adjusts the gain values g1, g2, g3, g4, g5, and g6 set in the transmission paths (amplifiers) of the signals from the microphones 3a-3f so that the total gain value G of the transmission paths becomes a constant value. The gain values g1-g6 set for the transmission paths are stored in the storage unit 60. Hereinafter, when the gain values g1-g6 are referred to collectively without distinction, they are referred to as gain values g.
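The following is a minimal sketch of the gain sharing idea, not the algorithm actually used by the device 1. It assumes that each gain value is made proportional to the short-term level of its microphone and that the gain values always sum to a constant total G; the function name `gain_sharing`, the level inputs, and the equal-split fallback are illustrative assumptions.

```python
def gain_sharing(levels, total_gain=1.0, floor=1e-12):
    """Split a constant total gain across channels in proportion to input level.

    levels: non-negative short-term levels (for example RMS), one per microphone.
    Returns a list of gain values whose sum equals total_gain.
    """
    total_level = sum(levels)
    if total_level < floor:
        # No channel is active: share the gain equally (an arbitrary choice).
        return [total_gain / len(levels)] * len(levels)
    return [total_gain * level / total_level for level in levels]


# Only microphone 3a is active: it receives the whole gain budget.
only_3a = gain_sharing([0.5, 0.0, 0.0, 0.0, 0.0, 0.0])

# All six microphones are equally active: each receives one sixth of G.
all_six = gain_sharing([0.2] * 6)

print(only_3a)
print(all_six)
```

In both cases the total gain stays at G, which matches the constraint described above that the sum of g1-g6 is constant.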

As described above, the transmission signal s4 and the individual transmission signals s41-s46 are generated based on the signals from the respective microphones 3a-3f. That is, the transmission signal s4 and the individual transmission signals s41-s46 include the voice signal s1 and the echo signal es when the speaker at the first site is speaking, and include the echo signal es when the speaker at the first site is not speaking.

The switching unit 40 switches signals input from the second input unit 30 to the control unit 50 and the removal unit 80 by switching the transmission path of the second input unit 30 based on the switching signal from the control unit 50. That is, the switching unit 40 switches a signal input to the control unit 50 or the removal unit 80 among the individual transmission signal s40 and the transmission signal s4 corresponding to each of the six microphones 3a to 3f. The switching unit 40 is composed of, for example, a rotary switch or a slide switch. The operation of the switching unit 40 will be described later.

The control unit 50 performs, for example, calculation of the coefficients necessary for the device 1 to execute the present method described later, detection of the voice signal s1 and the received signal s2, and measurement of echo return loss. The control unit 50 is composed of, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor), or a circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The operation of the control unit 50 and the echo return loss will be described later.

The storage unit 60 is a means for storing information necessary for the apparatus 1 to execute the method described later. The storage unit 60 includes, for example, a recording device such as an HDD (Hard Disk Drive) and an SSD (Solid State Drive), a semiconductor memory element such as a RAM (Random Access Memory), a flash memory, and the like. Information stored in the storage unit 60 will be described later.

The removal signal generation unit 70 generates a removal signal s5 based on the received signal s2 and the filter coefficient F. The removal signal generation unit 70 is, for example, a known FIR (Finite Impulse Response) filter. The “removal signal s5” is a signal for removing (suppressing) the echo signal es included in the transmission signal s4. That is, for example, the removal signal s5 is a signal whose phase is the same as (or as close as possible to) that of the echo signal es included in the transmission signal s4. The generation of the removal signal s5 by the removal signal generation unit 70 will be described later.

“Filter coefficient F” is a coefficient used by the removal signal generation unit 70 to perform FIR processing on the received signal s2 and generate the removal signal s5. That is, the removal signal generation unit 70 performs FIR processing on the received signal s2 based on the filter coefficient F to generate a removal signal s5. The filter coefficient F is calculated by the control unit 50 as described above. The calculation of the filter coefficient F by the control unit 50 will be described later.
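As an illustration of the FIR processing described above, the following sketch applies a direct-form FIR filter to a short received signal. The sample values and the three-tap coefficient list are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def fir_filter(signal, coefficients):
    """Direct-form FIR: y[n] = sum over j of coefficients[j] * signal[n - j]."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for j, c in enumerate(coefficients):
            if n - j >= 0:          # samples before the start are treated as zero
                acc += c * signal[n - j]
        output.append(acc)
    return output


received_s2 = [1.0, 0.0, 0.0, 0.5, 0.0]   # hypothetical received signal samples
filter_F = [0.0, 0.6, 0.3]                # hypothetical filter coefficient F
removal_s5 = fir_filter(received_s2, filter_F)
print(removal_s5)
```

Each output sample is a weighted sum of the current and previous received samples, so the filter coefficient F acts as a model of how the received signal s2 reaches the transmission signal s4 as an echo.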

The removal unit 80 removes the echo signal es included in the transmission signal s4 based on the transmission signal s4 and the removal signal s5, and generates an echo removal signal s6. The removal unit 80 is an arithmetic circuit such as a subtraction circuit or an addition circuit, for example. The generation of the echo removal signal s6 by the removal unit 80 will be described later.

The “echo removal signal s6” is the signal obtained by removing (suppressing) the echo signal es from the transmission signal s4 as described above. The echo removal signal s6 includes the voice signal s1 and a residual echo signal res when the speaker at the first site is speaking, and includes the residual echo signal res when the speaker at the first site is not speaking. The “residual echo signal res” is the difference signal between the echo signal es and the removal signal s5. That is, for example, when the removal signal s5 completely cancels the echo signal es (when the echo signal es and the removal signal s5 are in phase), the signal level of the residual echo signal res generated by subtracting one signal from the other is “0”. The echo removal signal s6 is input to the control unit 50 and the second output unit 90.
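The relationship between the transmission signal s4, the removal signal s5, the echo removal signal s6, and the residual echo signal res can be sketched as follows, assuming that the removal unit 80 is a subtraction circuit. All sample values are hypothetical.

```python
def remove_echo(transmission_s4, removal_s5):
    """Subtract the removal signal from the transmission signal, sample by sample."""
    return [t - r for t, r in zip(transmission_s4, removal_s5)]


voice_s1 = [0.0, 0.2, -0.1, 0.0]       # hypothetical voice signal
echo_es = [0.0, 0.6, 0.3, 0.0]         # hypothetical echo signal
transmission_s4 = [v + e for v, e in zip(voice_s1, echo_es)]

# Imperfect estimate of the echo: a residual echo remains in s6.
removal_s5 = [0.0, 0.5, 0.3, 0.0]
echo_removed_s6 = remove_echo(transmission_s4, removal_s5)
residual_res = [e - r for e, r in zip(echo_es, removal_s5)]

# Perfect estimate of the echo: s6 equals the voice signal and res is zero.
perfect_s6 = remove_echo(transmission_s4, echo_es)

print(echo_removed_s6)
print(residual_res)
print(perfect_s6)
```

In this sketch the echo removal signal s6 is always the sum of the voice signal s1 and the residual echo signal res, which is the decomposition described in the text.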

The second output unit 90 is connected to the communication line 5 and outputs the echo removal signal s6 to the communication line 5. The second output unit 90 includes, for example, an I/F, an amplifier, and the like. The echo removal signal s6 from the second output unit 90 is input to the communication device 4 at the second site via the communication line 5.

● Audio signal processing method ●
Next, this method will be described.

FIG. 2 is a flowchart showing an embodiment of the present method.
In the present method, the device 1 executes an initialization process (ST1), an initial learning process (ST2), an echo signal removal process (ST3), a specific process (ST4) described later (see FIG. 8), and an update process (ST5) described later (see FIG. 10). By executing these processes (ST1-ST5), the device 1 handles the six microphones 3 (six channels) with one common FIR filter (the removal signal generation unit 70) and, as described later, realizes echo cancellation that automatically adapts to environmental changes.

The apparatus 1 executes an initialization process (ST1) after the apparatus 1 is turned on.

Initialization Process
FIG. 3 is a flowchart of the initialization process (ST1).
The “initialization process (ST1)” is a process for executing parameter initialization, environment measurement, and the like.

First, the apparatus 1 initializes parameters (ST101). The “parameter” is a value set in an algorithm (adaptive algorithm) used for calculating the individual filter coefficient k described later.

Next, the device 1 uses the control unit 50 to perform environmental measurement of the first site where the device 1, the speaker 2, and the microphone 3 are installed (ST102). “Environmental measurement” is the measurement of items related to the transmission path (environment) of echo components from the speaker 2 to the microphone 3 at the first site (for example, reverberation time, delay time, maximum echo amount, and background noise). The device 1 outputs a reference sound to the first site via the speaker 2 and collects an echo component of the reference sound via the microphone 3. The control unit 50 measures the reverberation time, delay time, maximum echo amount, and background noise. The environmental measurement is executed for each of the microphones 3a-3f.

“Reverberation time” is the time required for the energy density of the reverberant sound of the reference sound to decay by 60 dB after the reference sound is output (radiated) in the first site and the output of the reference sound is stopped. “Delay time” is the time required for the microphone 3 to collect the reference sound output from the speaker 2. The “maximum echo amount” is the maximum amount of echo components collected by the microphone 3 in the first site. “Background noise” is the sound pressure level of noise (such as air-conditioning sound or the sound of cars outside) in the first site.
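The text does not state how the control unit 50 measures these items. As one possible illustration for the delay time only, the sketch below estimates the delay as the lag that maximises the cross-correlation between the reference signal and the signal picked up by a microphone. The function name, the sample values, and the 48 kHz sampling rate are assumptions made for this example.

```python
def estimate_delay_samples(reference, captured, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[n] * captured[n + lag]
            for n in range(len(reference))
            if n + lag < len(captured)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical reference burst and an attenuated copy delayed by three samples.
reference = [1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
captured = [0.0, 0.0, 0.0] + [0.4 * x for x in reference[:5]]

sample_rate_hz = 48000                      # assumed sampling rate
lag = estimate_delay_samples(reference, captured, max_lag=6)
delay_seconds = lag / sample_rate_hz
print(lag, delay_seconds)
```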

Next, the present apparatus 1 stores the measurement results of the environmental measurement corresponding to each of the microphones 3a-3f in the storage unit 60 (ST103).

Next, the device 1 specifies the parameters based on the measurement results of the environmental measurements corresponding to the microphones 3a-3f (ST104). The parameters are specified either by newly calculating them based on the measurement results of the environmental measurement, or by selecting, based on those measurement results, one parameter group from a plurality of parameter groups stored in advance in the storage unit 60.

Initial Learning Process
FIG. 4 is a flowchart of the initial learning process (ST2).
FIG. 5 is a functional block diagram showing a signal flow in the initial learning process (ST2).
In FIG. 5, the main signal flow in the initial learning process (ST2) is indicated by arrows.

“Initial learning process (ST2)” is a process in which the apparatus 1 first calculates (learns) the filter coefficient F after the apparatus 1 is powered on.

First, the device 1 switches the transmission path of the second input unit 30 to the transmission path from the microphone 3a using the switching unit 40 and the control unit 50 (ST201). The switching of the transmission path by the switching unit 40 is performed based on a switching signal from the control unit 50.

Next, the present apparatus 1 uses the second input unit 30 to generate an individual transmission signal s41 corresponding to the microphone 3a (ST202). The control unit 50 generates the reference signal s3 and inputs the reference signal s3 to the first output unit 20. The apparatus 1 outputs a reference sound from the speaker 2 and picks up the echo component of the reference sound by the microphone 3 (microphone 3a) corresponding to the transmission path switched in the above-described processing (ST201). The second input unit 30 generates an individual transmission signal s41 corresponding to the microphone 3a based on a signal input from the microphone 3a. The individual transmission signal s41 includes an echo signal es corresponding to the echo component of the reference sound. The individual transmission signal s41 is input from the second input unit 30 to the removal unit 80 via the switching unit 40.

Next, the apparatus 1 generates the individual removal signal s51 using the control unit 50 and the removal signal generation unit 70 (ST203). The “individual removal signal s51” is a signal for removing an echo signal (hereinafter referred to as “individual echo signal”) es1 included in the individual transmission signal s41. Hereinafter, when the individual removal signals s51-s56 are collectively referred to without distinction, the individual removal signals s51-s56 are referred to as individual removal signals s50.

The control unit 50 reads the initial value of the individual filter coefficient k1 corresponding to the microphone 3a from the storage unit 60, and inputs (sets) it to the removal signal generation unit 70. The removal signal generation unit 70 calculates the individual removal signal s51 based on the reference signal s3 and the individual filter coefficient k1. The individual removal signal s51 is input to the removal unit 80.

“Individual filter coefficient k1” corresponds to the transfer function of the acoustic transfer path from the speaker 2 to the microphone 3a. That is, the individual filter coefficient k1 is a coefficient used by the removal signal generation unit 70 to perform the FIR process on the reference signal and generate the individual removal signal s51. The “reference signal” here is the signal from which the removal signal generation unit 70 generates the individual removal signal s51 based on the individual filter coefficient k1: the reference signal s3 in the initial learning process (ST2), and the received signal s2 in the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5).

Next, the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es1 included in the individual transmission signal s41 and generate the individual echo removal signal s61 (ST204). The removal unit 80 generates the individual echo removal signal s61 based on the individual transmission signal s41 and the individual removal signal s51. The individual echo removal signal s61 is input from the removal unit 80 to the control unit 50 and the second output unit 90. At this time, the second output unit 90 mutes the individual echo removal signal s61. As a result, the individual echo removal signal s61 is not transmitted to the second site.

Note that the second output unit 90 may attenuate the individual echo removal signal s61 instead of muting it, or may mute the individual echo removal signal s61 and transmit dummy noise (pink noise) to the second site.

The “individual echo removal signal s61” is a signal obtained by removing (suppressing) the individual echo signal es1 from the individual transmission signal s41. The removal unit 80 generates the individual echo removal signal s61 by subtracting the individual removal signal s51 from the individual transmission signal s41. The individual echo removal signal s61 includes an individual residual echo signal res11. The “individual residual echo signal res11” is the difference signal between the individual echo signal es1 and the individual removal signal s51. Hereinafter, when the individual echo removal signals s61-s66 are referred to collectively without distinction, they are referred to as individual echo removal signals s60, and when the individual residual echo signals res11-res16 are referred to collectively without distinction, they are referred to as individual residual echo signals res10.

Next, the device 1 uses the control unit 50 to calculate the individual filter coefficient k1 corresponding to the microphone 3a (ST205). The control unit 50 reads the gain value g1 set in the transmission path of the signal from the microphone 3a (that is, the gain value corresponding to the microphone 3a) from the storage unit 60. Next, the control unit 50 calculates the individual filter coefficient k1 corresponding to the microphone 3a using a known adaptive algorithm, based on the read gain value g1, the reference signal s3, and the individual echo removal signal s61 (that is, the individual residual echo signal res11 included in the individual echo removal signal s61).
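The text refers only to a known adaptive algorithm and does not name it. The sketch below uses the normalized least mean squares (NLMS) algorithm as one common choice, which is an assumption and not necessarily the algorithm used by the control unit 50. It also assumes that the gain value g1 is used to normalise the individual transmission signal s41 so that k1 models the acoustic path alone. The function names, step size, and signal values are hypothetical.

```python
import random


def fir_filter(signal, coefficients):
    """Direct-form FIR: y[n] = sum over j of coefficients[j] * signal[n - j]."""
    return [
        sum(c * signal[n - j] for j, c in enumerate(coefficients) if n - j >= 0)
        for n in range(len(signal))
    ]


def nlms_identify(reference, captured, num_taps, gain=1.0, step=0.5, eps=1e-8):
    """Estimate FIR coefficients k so that FIR(reference, k) tracks captured / gain."""
    k = [0.0] * num_taps
    history = [0.0] * num_taps               # recent reference samples, newest first
    for x, d in zip(reference, captured):
        history = [x] + history[:-1]
        estimate = sum(c * h for c, h in zip(k, history))
        error = d / gain - estimate           # residual echo for this sample
        power = sum(h * h for h in history) + eps
        k = [c + step * error * h / power for c, h in zip(k, history)]
    return k


rng = random.Random(0)
reference_s3 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # noise-like reference
true_path = [0.0, 0.6, 0.3]        # hypothetical path from speaker 2 to microphone 3a
g1 = 0.5                           # hypothetical gain value of the transmission path
s41 = [g1 * e for e in fir_filter(reference_s3, true_path)]

k1 = nlms_identify(reference_s3, s41, num_taps=3, gain=g1)
print([round(c, 6) for c in k1])
```

With a noise-free signal and a noise-like reference, the estimated coefficients converge towards the hypothetical path, so the residual echo shrinks as learning proceeds.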

The calculated individual filter coefficient k1 is stored in the storage unit 60 (ST206). As a result, the individual filter coefficient k1 stored in the storage unit 60 is updated from the initial value to the calculated value.

The apparatus 1 repeats the processing (ST201-ST206) until the individual filter coefficients k1-k6 corresponding to all the microphones 3a-3f are calculated (“No” in ST207). Here, as described above, the parameter of the adaptive algorithm is specified based on the measurement result of the environmental measurement corresponding to each microphone 3. In other words, the control unit 50 calculates the individual filter coefficient k corresponding to each microphone 3 based on the measurement result of the environmental measurement corresponding to each microphone 3.

When the individual filter coefficients k1-k6 corresponding to all the microphones 3a-3f have been calculated (“Yes” in ST207), the device 1 calculates the filter coefficient F using the control unit 50 (ST208). The control unit 50 reads the gain values g1-g6 of the signal transmission paths from the microphones 3a-3f and the individual filter coefficients k1-k6 from the storage unit 60, and calculates the filter coefficient F based on the gain values g1-g6 and the individual filter coefficients k1-k6. The filter coefficient F is calculated by combining the individual filter coefficients k1-k6. Hereinafter, when the individual filter coefficients k1-k6 are referred to collectively without distinction, they are referred to as individual filter coefficients k.

The individual filter coefficients k1-k6 are combined by multiplying each individual filter coefficient k by the gain value g of the corresponding microphone 3a-3f and adding the products. That is, the filter coefficient F is calculated as the sum of the product of the individual filter coefficient k1 and the gain value g1 corresponding to the microphone 3a, the product of the individual filter coefficient k2 and the gain value g2 corresponding to the microphone 3b, the product of the individual filter coefficient k3 and the gain value g3 corresponding to the microphone 3c, the product of the individual filter coefficient k4 and the gain value g4 corresponding to the microphone 3d, the product of the individual filter coefficient k5 and the gain value g5 corresponding to the microphone 3e, and the product of the individual filter coefficient k6 and the gain value g6 corresponding to the microphone 3f: F = g1·k1 + g2·k2 + g3·k3 + g4·k4 + g5·k5 + g6·k6.
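The gain-weighted sum described above can be sketched as follows. The two-tap individual filter coefficients and the gain values are hypothetical. The second half of the example checks why a single FIR filter is sufficient: because FIR filtering is linear, filtering once with F gives the same result as filtering with each k separately and then adding the gain-weighted outputs.

```python
def fir_filter(signal, coefficients):
    """Direct-form FIR: y[n] = sum over j of coefficients[j] * signal[n - j]."""
    return [
        sum(c * signal[n - j] for j, c in enumerate(coefficients) if n - j >= 0)
        for n in range(len(signal))
    ]


def synthesize_filter(individual_filters, gains):
    """Return F with F[j] = sum over microphones i of gains[i] * individual_filters[i][j]."""
    num_taps = len(individual_filters[0])
    return [
        sum(g * k[j] for k, g in zip(individual_filters, gains))
        for j in range(num_taps)
    ]


# Hypothetical individual filter coefficients k1-k6 and gain values g1-g6.
k = [[0.5, 0.1], [0.4, 0.2], [0.3, 0.3], [0.2, 0.4], [0.1, 0.5], [0.6, 0.0]]
g = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
F = synthesize_filter(k, g)

# One common filter with F versus six per-microphone filters weighted by g.
x = [1.0, -0.5, 0.25, 0.0]
combined = fir_filter(x, F)
per_mic = [
    sum(gi * y for gi, y in zip(g, ys))
    for ys in zip(*(fir_filter(x, ki) for ki in k))
]
print(F)
print(combined)
print(per_mic)
```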

The calculated filter coefficient F is stored in the storage unit 60 and input (set) to the removal signal generation unit 70 (ST209). As a result, the removal signal generation unit 70 can generate the removal signal s5 based on the filter coefficient F.

Thus, in this method, the filter coefficient F is calculated by calculating and combining the individual filter coefficients k1-k6 corresponding to the microphones 3a-3f. Therefore, unlike the conventional apparatus that includes an echo cancellation unit for each microphone, the device 1 can remove the echo signal es included in the signals from the microphones 3a-3f with a single echo cancellation unit (corresponding to the control unit 50, the removal signal generation unit 70, and the removal unit 80). That is, the device 1 executes echo cancellation by using one common FIR filter (the removal signal generation unit 70) for the inputs from the six microphones 3a-3f. As a result, the device 1 can remove the echo signal es included in the signal from each of the microphones 3a-3f with a simpler circuit configuration than the conventional apparatus.

Echo Signal Removal Process
FIG. 6 is a flowchart of the echo signal removal process (ST3).
FIG. 7 is a functional block diagram showing a signal flow in the echo signal removal process (ST3).
In the figure, the main flow of the signal flow in the echo signal removal process (ST3) is indicated by an arrow.

The “echo signal removal process (ST3)” is a process of removing the echo signal es corresponding to the received signal s2 when the received signal s2 is included in the signal received by the first input unit 10, for example, during a meeting between the first site and the second site. As described above, the signal (received signal s2) from the first input unit 10 is input to the first output unit 20, the control unit 50, and the removal signal generation unit 70.

First, the device 1 uses the control unit 50 to detect whether or not the received signal s2 is included in the signal from the first input unit 10, that is, the presence or absence of the received signal s2 (ST301). For example, the control unit 50 detects the presence or absence of the received signal s2 by comparing the signal (signal level) from the first input unit 10 with a predetermined threshold value V1. When there is a received signal s2, the transmission signal s4 includes an echo signal es corresponding to the received signal s2.

“Threshold V1” is a threshold for the control unit 50 to detect whether or not the received signal s2 is included in the signal from the first input unit 10. The threshold value V1 is stored in the storage unit 60.

When the signal (signal level) from the first input unit 10 is smaller than the threshold value V1 (there is no received signal s2) (“No” in ST301), the present apparatus 1 repeats the detection of the presence or absence of the received signal s2.

On the other hand, when the signal from the first input unit 10 is equal to or higher than the threshold value V1 (the received signal s2 is present) (“Yes” in ST301), the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the transmission signal s4 (ST302).
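The threshold comparison in ST301 may be sketched as follows. The RMS level in dB, the frame length, and the function names are assumptions chosen for illustration; the text only states that a signal level is compared with the threshold value V1.

```python
import numpy as np

def level_db(frame, eps=1e-12):
    """RMS level of a frame in dB relative to a full scale of 1.0 (assumed measure)."""
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))
    return 20.0 * np.log10(max(rms, eps))

def has_received_signal(frame, v1_db):
    """ST301: True when the level is equal to or higher than the threshold V1."""
    return level_db(frame) >= v1_db

v1_db = -40.0                                                # hypothetical threshold V1
present = has_received_signal(np.full(160, 0.5), v1_db)      # loud frame
absent = has_received_signal(np.zeros(160), v1_db)           # silent frame
```

The same comparison, with the threshold value V3 and the transmission signal s4, would serve for the voice detection in ST309.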

Next, the present apparatus 1 generates a transmission signal s4 using the second input unit 30 (ST303). The transmission signal s4 is input from the second input unit 30 to the control unit 50 and the removal unit 80 via the switching unit 40.

Next, the present apparatus 1 generates a removal signal s5 using the control unit 50 and the removal signal generation unit 70 (ST304). The control unit 50 reads the filter coefficient F from the storage unit 60 and inputs (sets) the filter coefficient F to the removal signal generation unit 70. The removal signal generation unit 70 generates a removal signal s5 from the received signal s2 based on the filter coefficient F input from the control unit 50. The filter coefficient F is the filter coefficient F calculated in the initial learning process (ST2) or the filter coefficient F calculated and updated in the update process (ST5) described later.

Next, the present apparatus 1 uses the removal unit 80 to remove the echo signal es included in the transmission signal s4 and generate an echo removal signal s6 (ST305). The removal unit 80 generates an echo removal signal s6 based on the transmission signal s4 and the removal signal s5. The echo removal signal s6 is input to the control unit 50 and the second output unit 90.
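Steps ST304 and ST305 may be sketched as follows, assuming that the removal unit 80 subtracts the removal signal s5 from the transmission signal s4. The subtraction, the function names, and the three-tap echo path are assumptions for illustration; the text states only that s6 is generated based on s4 and s5.

```python
import numpy as np

def generate_removal_signal(s2, F):
    """ST304: filter the received signal s2 with the FIR coefficients F."""
    return np.convolve(s2, F)[: len(s2)]

def remove_echo(s4, s5):
    """ST305: subtract the removal signal from the transmission signal (assumed)."""
    return np.asarray(s4, dtype=float) - np.asarray(s5, dtype=float)

s2 = np.array([1.0, 0.0, 0.0, 0.0])            # received signal (an impulse)
h = np.array([0.0, 0.5, 0.25])                 # hypothetical echo path
s1 = np.array([0.1, 0.1, 0.1, 0.1])            # voice of the speaker
s4 = s1 + np.convolve(s2, h)[: len(s2)]        # transmission signal = voice + echo

s5 = generate_removal_signal(s2, F=h)          # F matches the echo path exactly
s6 = remove_echo(s4, s5)                       # echo removal signal
```

When F matches the echo path, s6 contains only the voice signal s1; any mismatch appears in s6 as the residual echo signal res.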

Next, the present apparatus 1 measures an echo return loss (ERL) using the control unit 50 (ST306).

“ERL” is a level difference between the transmission signal s4 and the echo removal signal s6, that is, the magnitude (signal level) of the residual echo signal res included in the echo removal signal s6. The ERL is influenced by, for example, a change in the installation location of the microphone 3 or a change in the output level of the speaker 2. That is, for example, the ERL deteriorates when a talker moves the microphone 3 and the transmission path of the echo component changes (environmental change). The control unit 50 measures the ERL based on the signal level of the transmission signal s4 and the signal level of the echo removal signal s6. That is, the control unit 50 measures the ERL by subtracting the signal level of the echo removal signal s6 from the signal level of the transmission signal s4.
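The measurement in ST306 may be sketched as a difference of two levels. Expressing the levels as RMS values in dB is an assumption for illustration, as are the function names; the sign convention applied when the result is compared with the threshold value V2 is left to the caller.

```python
import numpy as np

def level_db(x, eps=1e-12):
    """RMS level in dB (assumed level measure)."""
    x = np.asarray(x, dtype=float)
    return 20.0 * np.log10(max(np.sqrt(np.mean(x ** 2)), eps))

def measure_erl(s4, s6):
    """ST306: level of the transmission signal s4 minus level of the echo removal signal s6."""
    return level_db(s4) - level_db(s6)

s4 = np.ones(100)          # transmission signal containing only echo
s6 = 0.1 * np.ones(100)    # echo removal signal with a residual at one tenth the amplitude
erl = measure_erl(s4, s6)  # about 20 dB under this convention
```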

Next, the present apparatus 1 uses the control unit 50 to compare the measured ERL with a predetermined threshold value V2 (ST307). The “threshold value V2” is a threshold value indicating whether or not the echo signal es is sufficiently removed by the apparatus 1 (whether the signal level of the residual echo signal res is high). That is, when the removal of the echo signal es by the present apparatus 1 is insufficient, the ERL becomes equal to or more than the threshold value V2 (deteriorates). On the other hand, when the removal of the echo signal es by the apparatus 1 is sufficient, the ERL is smaller than the threshold value V2. The threshold value V2 is a reference value in the present invention. The threshold value V2 is stored in the storage unit 60.

When the ERL is smaller than the threshold value V2 (“No” in ST307), the apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication apparatus 4 at the second site (ST308), and the process returns to ST301.

On the other hand, when the ERL is equal to or higher than the threshold value V2 (“Yes” in ST307), the apparatus 1 uses the control unit 50 to detect whether the transmission signal s4 includes the audio signal s1 (the presence or absence of the audio signal s1) (ST309). For example, the control unit 50 detects the presence or absence of the audio signal s1 by comparing the signal level of the transmission signal s4 from the second input unit 30 with a predetermined threshold value V3.

“Threshold V3” is a threshold for the control unit 50 to detect whether or not the audio signal s1 is included in the transmission signal s4 from the second input unit 30. The threshold value V3 is stored in the storage unit 60.

When the signal level of the transmission signal s4 is equal to or higher than the threshold value V3 (the audio signal s1 is present) (“Yes” in ST309), the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication apparatus 4 at the second site (ST308), and the process returns to ST301.

On the other hand, when the signal level of the transmission signal s4 is lower than the threshold value V3 (the audio signal s1 is absent) (“No” in ST309), the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication apparatus 4 at the second site (ST310), and executes the specific process (ST4).

Thus, when the ERL is equal to or greater than the threshold value V2, the present apparatus 1 executes the specific process (ST4) at the timing when the received signal s2 is present and the audio signal s1 is absent. That is, based on the comparison result between the ERL and the threshold value V2, the present apparatus 1 executes the specific process (ST4) when the echo signal es is included in the transmission signal s4 and the audio signal s1 is not included in the transmission signal s4. In other words, when the apparatus 1 detects an environmental change during the execution of the echo signal removal process (ST3), the apparatus 1 executes the specific process (ST4).
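The branching of ST301, ST307, and ST309 described above may be summarized in the following sketch. The enumeration, the function name, and the numeric values are assumptions for illustration; the comparisons follow the directions stated in the text for the case in which the ERL and the threshold value V2 are not negative values.

```python
from enum import Enum

class Action(Enum):
    WAIT = "repeat ST301"
    OUTPUT = "output s6 (ST308) and return to ST301"
    OUTPUT_AND_IDENTIFY = "output s6 (ST310) and execute ST4"

def next_action(rx_level, tx_level, erl, v1, v2, v3):
    """Decide the next step of the echo signal removal process (ST3)."""
    if rx_level < v1:      # ST301 "No": no received signal s2
        return Action.WAIT
    if erl < v2:           # ST307 "No": echo is sufficiently removed
        return Action.OUTPUT
    if tx_level >= v3:     # ST309 "Yes": audio signal s1 is present
        return Action.OUTPUT
    return Action.OUTPUT_AND_IDENTIFY  # ST309 "No": go on to the specific process

# Hypothetical levels: received signal present, ERL degraded, no talker.
action = next_action(rx_level=-20, tx_level=-50, erl=15, v1=-40, v2=10, v3=-35)
```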

When the ERL is measured as a negative value, the threshold value V2 is also a negative value, and the present apparatus 1 may reverse the magnitude comparison between the ERL and the threshold value V2 in the above-described processing (ST307). That is, for example, when the negative ERL is equal to or less than the threshold value V2, the apparatus 1 may use the control unit 50 to detect whether or not the audio signal s1 is included in the transmission signal s4 (the presence or absence of the audio signal s1).

Specific Process

FIG. 8 is a flowchart of the specific process (ST4).
FIG. 9 is a functional block diagram showing a signal flow in the specific process (ST4).
In the figure, of the signal flow in the specific processing (ST4), the main flow is indicated by arrows. For convenience of explanation, FIG. 4 shows only signals corresponding to signals from the microphone 3a among the microphones 3a to 3f.

The “specific process (ST4)” is a process of identifying each microphone 3 as a specific microphone or a non-specific microphone. A “specific microphone” is a microphone 3 whose corresponding individual filter coefficient k is not appropriate (has shifted), that is, a microphone 3 whose individual filter coefficient k is to be updated. The deterioration of the ERL is caused by a shift of the filter coefficient F with respect to the echo signal es, that is, a shift of the individual filter coefficients k1-k6 with respect to the individual echo signals es1-es6. Therefore, the individual filter coefficient k corresponding to a specific microphone needs to be updated to an appropriate value. A “non-specific microphone” is a microphone 3 whose corresponding individual filter coefficient k is appropriate (has not shifted), that is, a microphone 3 whose individual filter coefficient k is not to be updated.

First, the present apparatus 1 uses the control unit 50 to detect whether or not the voice signal s1 is included in the transmission signal s4 (presence / absence of the voice signal s1) (ST401). The detection of the presence / absence of the audio signal s1 (ST401) is the same process as the detection of the presence / absence of the audio signal s1 (ST309) in the echo signal removal process (ST3).

When the transmission signal s4 does not include the audio signal s1 (the audio signal s1 is absent) (“No” in ST401), the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the signal from the microphone 3a (ST402).

Next, the present apparatus 1 uses the second input unit 30 to generate an individual transmission signal s41 based on the signal from the microphone 3a (ST403). The individual transmission signal s41 is input to the removal unit 80 via the switching unit 40.

Next, the present apparatus 1 generates an individual removal signal (specific removal signal) s51 using the control unit 50 and the removal signal generation unit 70 (ST404). The control unit 50 reads the individual filter coefficient k1 corresponding to the microphone 3a from the storage unit 60 and inputs it to the removal signal generation unit 70. The removal signal generation unit 70 generates an individual removal signal s51 based on the received signal s2 and the individual filter coefficient k1. The individual removal signal s51 is input to the removal unit 80.

Next, the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es1 included in the individual transmission signal s41 and generate the individual echo removal signal s61 (ST405). The removal unit 80 generates the individual echo removal signal s61 based on the individual transmission signal s41 and the individual removal signal s51. The individual echo removal signal s61 is input from the removal unit 80 to the control unit 50 and the second output unit 90.

Next, the present apparatus 1 measures the individual ERL using the control unit 50 (ST406).

“Individual ERL” is a level difference between the individual transmission signal s41 and the individual echo removal signal s61, that is, the magnitude (signal level) of the individual residual echo signal res11 included in the individual echo removal signal s61. The control unit 50 measures the individual ERL based on the signal level of the individual transmission signal s41 and the signal level of the individual echo removal signal s61. That is, the control unit 50 measures the individual ERL by subtracting the signal level of the individual echo removal signal s61 from the signal level of the individual transmission signal s41.

Next, the present apparatus 1 uses the control unit 50 to compare the measured individual ERL with a predetermined threshold value V4 (ST407).

The “threshold value V4” is a threshold value indicating whether or not the removal of the individual echo signal es1 by the apparatus 1 is sufficient (whether the signal level of the individual residual echo signal res11 is large). That is, when the removal of the individual echo signal es1 by the present apparatus 1 is insufficient, the individual ERL becomes equal to or higher than the threshold value V4 (deteriorates). On the other hand, when the individual echo signal es1 is sufficiently removed by the apparatus 1, the individual ERL is smaller than the threshold value V4. The threshold value V4 is an individual reference value in the present invention. The threshold value V4 is stored in the storage unit 60.

When the individual ERL is smaller than the threshold value V4 (“No” in ST407), the present apparatus 1 identifies the microphone 3a as a non-specific microphone (ST408). On the other hand, when the individual ERL is greater than or equal to the threshold value V4 (“Yes” in ST407), the apparatus 1 identifies the microphone 3a as a specific microphone (ST409). The identification result is stored in the storage unit 60 (ST410). At this time, the individual echo removal signal s61 is output from the second output unit 90.

The present apparatus 1 repeats the processing (ST401-ST410) on the signals from the remaining microphones 3b-3f until all the microphones 3a-3f have been identified as specific microphones or non-specific microphones (“No” in ST411). That is, the apparatus 1 uses the switching unit 40 to input the individual transmission signals s42-s46 corresponding to the remaining microphones 3b-3f to the removal unit 80 while switching among them, and identifies each of the microphones 3a-3f as either a specific microphone or a non-specific microphone.

The apparatus 1 executes the update process (ST5) when each microphone 3a-3f is specified as a specific microphone or a non-specific microphone (“Yes” in ST411). At this time, the microphone 3 includes a specific microphone and a non-specific microphone.

When the transmission signal s4 includes the voice signal s1 (the voice signal s1 is present) (“Yes” in ST401), the apparatus 1 ends (interrupts) the specific process (ST4) and executes the echo signal removal process (ST3). That is, when the control unit 50 detects the voice signal s1 before the specific process (ST4) is completed, the apparatus 1 interrupts the specific process (ST4) and executes the echo signal removal process (ST3). When the specific process (ST4) has been interrupted and the apparatus 1 subsequently determines in the echo signal removal process (ST3) that the voice signal s1 is not included in the transmission signal s4, the apparatus 1 executes (resumes) the specific process (ST4) from the interrupted point, that is, from the processing of the signal from a microphone 3 that has not yet been identified as a specific microphone or a non-specific microphone. For example, if the specific process (ST4) was interrupted after the microphones up to the microphone 3d among the microphones 3a-3f had been processed, the specific process (ST4) is resumed from the microphone 3e.

Note that, when the specific process is interrupted, the present apparatus may instead execute the specific process again from the beginning, that is, for all the microphones.
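The identification loop (ST401-ST411), including interruption by a voice signal and later resumption, may be sketched as follows. The function and parameter names, the callbacks standing in for the measurement and detection steps, and the numeric values are assumptions for illustration only.

```python
def run_identification(mics, measure_individual_erl, voice_present, v4, results=None):
    """Identify each microphone as specific (True) or non-specific (False).

    Returns (results, completed). Passing a previous `results` dict resumes
    from the first microphone that has not been identified yet; passing None
    restarts from the beginning.
    """
    results = {} if results is None else results
    for mic in mics:
        if mic in results:                 # identified before an interruption
            continue
        if voice_present():                # ST401 "Yes": interrupt the process
            return results, False
        erl = measure_individual_erl(mic)  # ST402-ST406
        results[mic] = erl >= v4           # ST407-ST410
    return results, True                   # ST411 "Yes"

mics = ["3a", "3b", "3c", "3d", "3e", "3f"]
erls = {"3a": 5, "3b": 30, "3c": 5, "3d": 5, "3e": 40, "3f": 5}  # hypothetical values

# First run: a talker starts speaking after four microphones have been processed.
flags = iter([False, False, False, False, True])
partial, done_first = run_identification(mics, erls.get, lambda: next(flags), v4=20)

# Later run: no voice signal, so the process resumes from microphone 3e.
final, done_second = run_identification(mics, erls.get, lambda: False, v4=20, results=partial)
specific = [m for m, is_specific in final.items() if is_specific]
```

Keeping the results in a dictionary keyed by microphone is one simple way to support both behaviors described above: resuming from the interrupted point or starting over for all microphones.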

Further, when the individual ERL is measured as a negative value, the threshold value V4 is also a negative value, and the present apparatus 1 may reverse the magnitude comparison between the individual ERL and the threshold value V4 in the above-described processing (ST407). That is, for example, when the negative individual ERL is equal to or less than the threshold value V4, the apparatus 1 may identify the microphone 3 corresponding to that individual ERL as a specific microphone.

As described above, based on the comparison result between the individual ERL and the individual reference value (threshold value V4), the present apparatus 1 determines, from among the plurality of microphones 3a-3f, the specific microphones whose individual filter coefficients k are to be updated and the non-specific microphones whose individual filter coefficients k are not to be updated. That is, when the ERL deteriorates, the apparatus 1 determines the specific microphones at a timing at which the echo signal es is included in the transmission signal s4 and the audio signal s1 is not included in the transmission signal s4. Therefore, the present apparatus 1 limits the microphones 3 whose individual filter coefficients k need to be updated, and reduces the time and processing load required for updating the individual filter coefficients k and the filter coefficient F.

Update Process

FIG. 10 is a flowchart of the update process (ST5).
FIG. 11 is a functional block diagram showing a signal flow in the update process (ST5).
In the figure, of the signal flow in the update process (ST5), the main flow is indicated by arrows. The figure shows only each signal corresponding to the signal from the microphone 3c.

The “update process (ST5)” is a process of updating the filter coefficient F by updating the individual filter coefficient k corresponding to the microphone 3 identified as the specific microphone. That is, for example, when the microphone 3a is identified as a specific microphone, the apparatus 1 updates the filter coefficient F by updating the individual filter coefficient k1 corresponding to the microphone 3a. When the microphones 3e and 3f are identified as specific microphones, the apparatus 1 updates the filter coefficient F by updating the individual filter coefficients k5 and k6 corresponding to the microphones 3e and 3f. Hereinafter, a case where the microphone 3c is identified as a specific microphone will be described as an example.

First, the present apparatus 1 uses the control unit 50 to detect whether or not the audio signal s1 is included in the transmission signal s4 (or the individual transmission signal s43) (the presence or absence of the audio signal s1) (ST501). The detection of the presence or absence of the audio signal s1 (ST501) is the same processing as the detection of the presence or absence of the audio signal s1 (ST309) in the echo signal removal process (ST3).

Next, the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the signal from the specific microphone (microphone 3c) (ST502).

Next, this apparatus 1 generates an individual transmission signal s43 based on a signal from a specific microphone (microphone 3c) (ST503).

Next, the present apparatus 1 generates an individual removal signal s53 using the control unit 50 and the removal signal generation unit 70 (ST504). The control unit 50 reads out the individual filter coefficient k3 corresponding to the specific microphone from the storage unit 60 and inputs it to the removal signal generation unit 70. The removal signal generation unit 70 generates an individual removal signal s53 based on the received signal s2 and the individual filter coefficient k3. The individual removal signal s53 is a specific removal signal in the present invention. The individual removal signal s53 is input to the removal unit 80.

Next, the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es3 included in the individual transmission signal s43 and generate the individual echo removal signal s63 (ST505). The individual echo removal signal s63 is a specific echo removal signal in the present invention. The individual echo removal signal s63 is input to the control unit 50 and the second output unit 90.

Next, the present apparatus 1 measures an individual echo return loss (individual ERL) using the control unit 50 (ST506).

Next, the present apparatus 1 uses the control unit 50 to compare the measured individual ERL with a predetermined threshold value V4 (ST507).

When the individual ERL is greater than or equal to the threshold value V4 (“Yes” in ST507), the present apparatus 1 calculates the individual filter coefficient k3 using the control unit 50 (ST508). The control unit 50 reads the gain value g3 set in the transmission path of the signal from the specific microphone from the storage unit 60. The control unit 50 then calculates the individual filter coefficient k3 based on the read gain value g3, the individual echo removal signal s63 (that is, the individual residual echo signal res13 included in the individual (specific) echo removal signal s63), the received signal s2, and the environment measurement result.

Next, the present apparatus 1 stores the calculated individual filter coefficient k3 in the storage unit 60, that is, updates the individual filter coefficient k3 stored in the storage unit 60 (ST509), and the process returns to ST504.

On the other hand, when the individual ERL is smaller than the threshold value V4 (“No” in ST507), the present apparatus 1 updates the filter coefficient F stored in the storage unit 60 using the control unit 50 (ST510). The control unit 50 reads from the storage unit 60 the updated individual filter coefficient k3 corresponding to the specific microphone, the individual filter coefficients k1, k2, and k4-k6 corresponding to the non-specific microphones, and the gain values g1-g6 set for the respective transmission paths, and calculates the filter coefficient F. The filter coefficient F is calculated in the same manner as in step ST208 of the initial learning process (ST2).

Next, the apparatus 1 stores the calculated filter coefficient F in the storage unit 60, that is, updates the filter coefficient F stored in the storage unit 60 (ST511), and the process returns to the echo signal removal process (ST3).
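The loop of ST504-ST509 for the specific microphone 3c may be sketched as follows. The adaptation algorithm is not named in this section, so the normalized LMS (NLMS) update used here is a substituted, commonly used technique and not necessarily the method of the present apparatus. The gain value g3 and the environment measurement result are omitted, the residual level stands in for the individual ERL, and all names and numeric values are assumptions for illustration.

```python
import numpy as np

def level_db(x, eps=1e-12):
    """RMS level in dB (assumed level measure)."""
    return 20.0 * np.log10(max(np.sqrt(np.mean(np.square(x))), eps))

def nlms_pass(s2, s4i, k, mu=0.5, eps=1e-8):
    """One NLMS pass over a frame; returns updated coefficients and the residual."""
    k = k.copy()
    taps = len(k)
    padded = np.concatenate([np.zeros(taps - 1), s2])
    residual = np.empty(len(s2))
    for n in range(len(s2)):
        x = padded[n:n + taps][::-1]        # newest received sample first
        residual[n] = s4i[n] - k @ x        # ST504-ST505: individual echo removal
        k += mu * residual[n] * x / (x @ x + eps)   # ST508: coefficient update
    return k, residual

def update_individual(s2, s4i, k, target_db, max_passes=50):
    """Repeat ST504-ST509 until the residual level falls below the target."""
    for _ in range(max_passes):
        k, residual = nlms_pass(s2, s4i, k)
        if level_db(residual) < target_db:  # ST507 "No": removal is sufficient
            break
    return k

rng = np.random.default_rng(1)
s2 = rng.standard_normal(2000)              # received signal s2
h3 = rng.standard_normal(8) * 0.3           # hypothetical echo path at microphone 3c
s43 = np.convolve(s2, h3)[: len(s2)]        # individual transmission signal (echo only)
k3 = update_individual(s2, s43, np.zeros(8), target_db=-60.0)
```

In this noise-free example the coefficients converge to the hypothetical echo path; the updated k3 would then be combined with the stored coefficients of the non-specific microphones to recalculate F in ST510.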

Thus, the present apparatus 1 specifies the microphone 3 whose individual ERL has deteriorated in the specific process (ST4) as the specific microphone, and executes the update process (ST5) only for the specific microphone. As a result, the processing load for updating the filter coefficient F is reduced, and the processing time is shortened.

The apparatus 1 always compares the ERL and the threshold value V2 (that is, monitors the ERL) in the echo signal removal process (ST3). When the ERL is greater than or equal to the threshold value V2, the present apparatus 1 performs a specific process (ST4) and an update process (ST5) at a timing when the echo signal es is included in the transmission signal s4 and the audio signal s1 is not included. Execute. In the specific process (ST4), the apparatus 1 compares the individual ERL and the threshold value V4 for each microphone 3. When the individual ERL is equal to or greater than the threshold value V4, the apparatus 1 determines a specific microphone that is an object of updating the individual filter coefficient k. In the update process (ST5), the present apparatus 1 determines the individual filter coefficient corresponding to the specific microphone based on the received signal s2 and the individual residual echo signal res10 included in the individual echo removal signal (specific echo removal signal) s60. k is calculated. The device 1 calculates and updates the filter coefficient F based on the individual filter coefficient k corresponding to the specific microphone and the individual filter coefficient k corresponding to the non-specific microphone.

Summary

According to the embodiment described above, the control unit 50 calculates the individual filter coefficients k1-k6 corresponding to each of the plurality of microphones 3a-3f, and synthesizes the individual filter coefficients k1-k6 to calculate the filter coefficient F. The removal signal generation unit 70 generates the removal signal s5 based on the calculated filter coefficient F. The removal unit 80 removes the echo signal es included in the transmission signal s4 based on the transmission signal s4 and the removal signal s5 (generates the echo removal signal s6). Therefore, unlike a conventional apparatus having an echo cancellation unit for each of a plurality of microphones, the present apparatus 1 can remove the echo signal es from the signals from the plurality of microphones 3 (multiple channels) with one common FIR filter (removal signal generation unit 70). That is, the present apparatus 1 removes the echo signals es included in the signals from the plurality of microphones 3 with a circuit configuration that is simpler than that of the conventional apparatus.

Further, according to the embodiment described above, the control unit 50 calculates (updates) the filter coefficient F when the audio signal s1 is not included in the transmission signal s4 and the echo signal es is included in the transmission signal s4 (the received signal s2 is present). Therefore, the present apparatus 1 reduces the processing load for calculating (updating) the filter coefficient F as compared with the conventional apparatus that always calculates (updates) the filter coefficient.

Furthermore, according to the embodiment described above, when the voice signal s1 is not included in the transmission signal s4 and the echo signal es is included in the transmission signal s4 (the received signal s2 is present), the switching unit 40 inputs the individual transmission signals s41-s46 to the control unit 50 while switching among them. The control unit 50 calculates the individual filter coefficients k1-k6 corresponding to the microphones 3a-3f based on the signals from the plurality of microphones 3a-3f. That is, the apparatus 1 calculates the individual filter coefficients k1-k6 while switching among the individual transmission signals s41-s46 with the switching unit 40. Therefore, the present apparatus 1 can calculate the individual filter coefficients k1-k6 corresponding to the six microphones 3a-3f with one common FIR filter (removal signal generation unit 70). That is, the apparatus 1 calculates the individual filter coefficients k corresponding to the plurality of microphones 3 with a simple circuit configuration, and calculates the filter coefficient F based on the individual filter coefficients k. As a result, the present apparatus 1 removes the echo signal es included in the signals from the plurality of microphones 3 with a simple circuit configuration.

Furthermore, according to the embodiment described above, the control unit 50 calculates the individual filter coefficient k based on the received signal s2 and the individual residual echo signal res10 included in the individual echo removal signal s60. That is, the present apparatus 1 improves the accuracy of the filter coefficient F by repeatedly calculating the individual filter coefficient k so that the individual residual echo signal res10 approaches “0” as closely as possible, and reliably removes (suppresses) the echo signal es from the transmission signal s4.

Furthermore, according to the embodiment described above, the control unit 50 updates the individual filter coefficients k1-k6 based on the gain values g1-g6 corresponding to each of the plurality of microphones 3a-3f. Therefore, the present apparatus 1 can calculate the individual filter coefficients k1-k6 taking into account the gain values g1-g6 applied when the microphones 3a-3f pick up the echo components. As a result, the present apparatus 1 can improve the accuracy of the filter coefficient F and reliably remove (suppress) the echo signal es from the transmission signal s4.

Furthermore, according to the embodiment described above, the control unit 50 always measures ERL in the echo signal removal process (ST3). Next, the control unit 50 updates the filter coefficient F stored in the storage unit 60 when the ERL is equal to or greater than the reference value (threshold value V2) and the speech signal s1 is not included in the transmission signal s4. That is, the apparatus 1 detects an environmental change at the timing when the ERL deteriorates, and updates the filter coefficient F. That is, this apparatus 1 reduces the processing load of calculation (update) of the filter coefficient F compared with the conventional apparatus which always calculates (updates) the filter coefficient F.

Furthermore, according to the embodiment described above, the control unit 50 measures the individual ERL based on the comparison result between the ERL and the reference value (threshold value V2). As a result, when the ERL deteriorates, the present apparatus 1 detects the deviation of the filter coefficient F (deterioration of the suppression effect on the echo signal es) from the measurement result of the individual ERL corresponding to each microphone 3a-3f.

Furthermore, according to the embodiment described above, the control unit 50 determines, from among the plurality of microphones 3a-3f, the specific microphone whose individual filter coefficient k is to be updated, based on the comparison result between the individual ERL and the individual reference value (threshold value V4). That is, when the ERL deteriorates, the present apparatus 1 determines the specific microphone, thereby reducing the time and processing load required for updating the individual filter coefficient k and the filter coefficient F.

Furthermore, according to the embodiment described above, the control unit 50 calculates the individual filter coefficient k of the specific microphone. Next, the control unit 50 updates the filter coefficient F stored in the storage unit 60 based on the calculated individual filter coefficient k of the specific microphone and the individual filter coefficient k of the non-specific microphone. Therefore, the present apparatus 1 updates the filter coefficient F by calculating (updating) only the individual filter coefficient k of the specific microphone. That is, the present apparatus 1 reduces the time and processing load required for updating the individual filter coefficient k and the filter coefficient F.

Furthermore, according to the embodiment described above, the control unit 50 performs the environmental measurement for each microphone 3 and calculates the individual filter coefficient k based on the measurement result of the environmental measurement. Therefore, this apparatus 1 can calculate the filter coefficient F according to the environment of the room (space) where this apparatus 1 is installed.

As described above, according to the embodiment described above, the apparatus 1 calculates the filter coefficient F through the initialization process (ST1) and the initial learning process (ST2), and executes echo cancellation based on the filter coefficient F (executes the echo signal removal process (ST3)). When the apparatus 1 detects an environmental change during the execution of the echo signal removal process (ST3), the apparatus 1 performs the specific process (ST4) and the update process (ST5), thereby realizing automatic adjustment of the filter coefficient F. As a result, the present apparatus 1 executes multi-channel echo cancellation using a common filter, and also performs echo cancellation that automatically follows environmental changes.

It should be noted that the number of microphones connected to the second input unit is not limited to “6” as long as it is plural.

In the embodiment described above, the present apparatus 1 is configured to include a pair of removal signal generation unit 70 and removal unit 80. Therefore, the removal signal generation unit 70 is dedicated to the generation of the individual removal signal s50 in the specific process (ST4) and the update process (ST5). As a result, the present apparatus 1 does not execute the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5) at the same time.

Alternatively, the present apparatus may include two sets of a removal signal generation unit and a removal unit: one set used for the echo signal removal process, and another set used for the specific process and the update process.

FIG. 12 is a functional block diagram showing another embodiment of the present apparatus.
The figure shows an audio signal processing apparatus 1A that includes a first removal signal generation unit 70A, a second removal signal generation unit 70B, a first removal unit 80A, and a second removal unit 80B. The first removal signal generation unit 70A and the first removal unit 80A execute the specific process (ST4) and the update process (ST5). The second removal signal generation unit 70B and the second removal unit 80B execute the echo signal removal process (ST3).

According to this configuration, the apparatus 1A can simultaneously execute the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5). Therefore, this apparatus 1A can remove (suppress) the echo signal es included in the signals from two or more microphones 3 with a simple circuit configuration including two echo canceller units.

DESCRIPTION OF SYMBOLS
1 Audio signal processing apparatus
1A Audio signal processing apparatus
20 First output unit (output unit)
30 Second input unit (input unit)
40 Switching unit
50 Control unit
60 Storage unit
70 Removal signal generation unit
70A First removal signal generation unit
70B Second removal signal generation unit
80 Removal unit
80A First removal unit
80B Second removal unit
s1 Voice signal
s2 Reception signal
s3 Reference signal
s4 Transmission signal
s40 Individual transmission signal
s5 Removal signal
s50 Individual removal signal
s6 Echo removal signal
s60 Individual echo removal signal
es Echo signal
res Residual echo signal
res10 Individual residual echo signal
F Filter coefficient
k Individual filter coefficient

Claims

An audio signal processing device comprising:
an output unit that outputs a reception signal;
an input unit that synthesizes signals input from each of a plurality of microphones, which pick up an echo component of the reception signal and the voice of a speaker and generate an echo signal according to the echo component and a voice signal according to the voice of the speaker, to generate a transmission signal;
a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal included in the transmission signal;
a control unit that calculates the filter coefficient; and
a removal unit that generates an echo removal signal based on the transmission signal and the removal signal,
wherein the control unit calculates individual filter coefficients corresponding to each of the plurality of microphones, and combines the individual filter coefficients to calculate the filter coefficient.
The audio signal processing device according to claim 1,
wherein the control unit calculates the filter coefficient when the voice signal is not included in the transmission signal.
The audio signal processing device according to claim 2,
wherein the control unit calculates the filter coefficient when the echo signal is included in the transmission signal.
The audio signal processing device according to claim 1,
wherein the removal signal generation unit generates the removal signal based on the reception signal and the filter coefficient.
The audio signal processing device according to claim 1,
wherein the input unit generates an individual transmission signal corresponding to each of the plurality of microphones based on the signal input from each of the plurality of microphones, and generates the transmission signal by combining the individual transmission signals,
the device further comprising a switching unit that switches which of the individual transmission signals corresponding to each of the plurality of microphones is input to the removal unit,
wherein, when the voice signal is not included in the transmission signal, the switching unit inputs the individual transmission signals corresponding to each of the plurality of microphones to the removal unit while switching among them.
The audio signal processing device according to claim 5,
wherein, when the echo signal is included in the transmission signal, the switching unit inputs the individual transmission signals corresponding to each of the plurality of microphones to the removal unit while switching among them.
The audio signal processing device according to claim 5,
wherein the removal signal generation unit generates an individual removal signal for removing the echo signal included in the individual transmission signal,
the removal unit generates an individual echo removal signal based on the individual transmission signal and the individual removal signal, and
the control unit calculates the individual filter coefficient based on the reception signal and an individual residual echo signal included in the individual echo removal signal.
The audio signal processing device according to claim 7,
wherein the control unit calculates the individual filter coefficient based on a gain value corresponding to each of the plurality of microphones.
The audio signal processing device according to claim 1,
further comprising a storage unit that stores the filter coefficient,
wherein the control unit updates the filter coefficient stored in the storage unit when the voice signal is not included in the transmission signal.
The audio signal processing device according to claim 9,
wherein the storage unit stores a reference value, and
the control unit
measures an echo return loss based on the signal level of the transmission signal and the signal level of the echo removal signal, and
updates the filter coefficient based on a result of comparing the echo return loss with the reference value.
The audio signal processing device according to claim 10,
wherein the control unit measures an individual echo return loss corresponding to each of the plurality of microphones based on the comparison result.
The audio signal processing device according to claim 11,
wherein the storage unit stores an individual reference value, and
the control unit
compares, for each of the plurality of microphones, the individual echo return loss with the individual reference value, and
determines, from among the plurality of microphones, a specific microphone whose individual filter coefficient is to be updated, based on a result of comparing the individual echo return loss with the individual reference value.
The audio signal processing device according to claim 12,
wherein the plurality of microphones consist of
the specific microphone and
a non-specific microphone different from the specific microphone,
the removal signal generation unit generates, based on the individual filter coefficient corresponding to the specific microphone, a specific removal signal for removing the echo signal included in the signal from the specific microphone,
the removal unit generates a specific echo removal signal based on the signal from the specific microphone and the specific removal signal, and
the control unit
calculates the individual filter coefficient corresponding to the specific microphone based on the reception signal and the individual residual echo signal included in the specific echo removal signal, and
updates the filter coefficient stored in the storage unit based on the individual filter coefficient corresponding to the non-specific microphone and the individual filter coefficient corresponding to the specific microphone.
The audio signal processing device according to claim 1,
wherein the control unit
performs an environmental measurement for each of the plurality of microphones, and
calculates the individual filter coefficients based on the results of the environmental measurement corresponding to each of the plurality of microphones.
An audio signal processing program that causes a computer to function as the audio signal processing device according to any one of claims 1 to 14.
An audio signal processing method executed by an audio signal processing device comprising:
an output unit that outputs a reception signal;
an input unit that synthesizes signals input from each of a plurality of microphones, which pick up an echo component of the reception signal and the voice of a speaker and generate an echo signal according to the echo component and a voice signal according to the voice of the speaker, to generate a transmission signal;
a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal included in the transmission signal;
a control unit that calculates the filter coefficient; and
a removal unit that generates an echo removal signal based on the transmission signal and the removal signal,
the method comprising:
a step in which the control unit calculates individual filter coefficients corresponding to each of the plurality of microphones; and
a step in which the control unit calculates the filter coefficient by combining the individual filter coefficients.
The audio signal processing method according to claim 16,
wherein the control unit executes the calculation of the individual filter coefficients when the transmission signal includes the echo signal and does not include the voice signal.
The audio signal processing method according to claim 16,
wherein the audio signal processing device includes a storage unit that stores the filter coefficient, and
the control unit updates the filter coefficient stored in the storage unit when the voice signal is not included in the transmission signal.
The audio signal processing method according to claim 18,
wherein the storage unit stores a reference value and an individual reference value, and
the control unit, for each of the plurality of microphones,
measures an echo return loss based on the signal level of the transmission signal and the signal level of the echo removal signal,
measures an individual echo return loss corresponding to each of the plurality of microphones based on a result of comparing the echo return loss with the reference value,
determines, from among the plurality of microphones, a specific microphone whose individual filter coefficient is to be updated, based on a result of comparing the individual echo return loss with the individual reference value, and
updates the filter coefficient.