WO2018230062A1 - Voice signal processing device, voice signal processing method and voice signal processing program - Google Patents

Info

Publication number
WO2018230062A1
WO2018230062A1 PCT/JP2018/010330 JP2018010330W WO2018230062A1 WO 2018230062 A1 WO2018230062 A1 WO 2018230062A1 JP 2018010330 W JP2018010330 W JP 2018010330W WO 2018230062 A1 WO2018230062 A1 WO 2018230062A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
individual
echo
removal
filter coefficient
Prior art date
Application number
PCT/JP2018/010330
Other languages
French (fr)
Japanese (ja)
Inventor
鬼塚一浩
相川徹
菊原靖仁
実方友里
Original Assignee
株式会社オーディオテクニカ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社オーディオテクニカ
Priority to CN201880038882.7A (CN110741563B)
Priority to US16/621,861 (US11227618B2)
Priority to EP18816708.4A (EP3641141A4)
Priority to JP2019525088A (JP7122756B2)
Publication of WO2018230062A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B3/00Line transmission systems
    • H04B3/02Details
    • H04B3/20Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H04B3/23Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other using a replica of transmitted signal in the time domain, e.g. echo cancellers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M9/00Arrangements for interconnection not involving centralised switching
    • H04M9/08Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/20Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/02Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/305Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M9/00Arrangements for interconnection not involving centralised switching
    • H04M9/08Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • the present invention relates to an audio signal processing device, an audio signal processing method, and an audio signal processing program.
  • remote conferencing systems, such as teleconference systems and video conference systems using communication lines such as the Internet, have been used for conferences between physically separate locations.
  • in such a system, sound based on an audio signal received from one site (hereinafter referred to as “received signal”) is output from a speaker at the other site.
  • when the microphone at that site picks up the output sound, the sound is transmitted back to the originating site and an acoustic echo is generated.
  • a general echo canceller includes an adaptive filter that generates a removal signal for removing an echo signal based on a received signal and an echo signal corresponding to an acoustic echo, and adds or subtracts the removal signal and the echo signal. Thus, the echo signal is removed.
  • the echo canceller disclosed in Patent Document 1 includes a plurality of echo cancellation units, one for each of a plurality of microphones. Each echo cancellation unit removes the echo signal included in the input signal from its corresponding microphone, thereby supporting multiple channels.
  • the echo canceller disclosed in Patent Document 1 requires the same number of echo canceling units as the microphones, which complicates the circuit configuration and signal processing.
  • the present invention has been made to solve the above-described problems of the prior art. An object of the present invention is to provide an audio signal processing apparatus, an audio signal processing method, and an audio signal processing program capable of removing echo signals contained in input signals from a plurality of microphones with a simple circuit configuration.
  • An audio signal processing apparatus according to the present invention includes: an output unit that outputs a received signal; an input unit that generates a transmission signal by synthesizing signals input from each of a plurality of microphones, each of which picks up an echo component of the received signal and a speaker's voice and generates an echo signal corresponding to the echo component and an audio signal corresponding to the voice; a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal included in the transmission signal; a control unit that calculates the filter coefficient; and a removal unit that generates an echo cancellation signal based on the transmission signal and the removal signal.
  • the filter coefficient is calculated by calculating individual filter coefficients corresponding to each of the plurality of microphones and combining the individual filter coefficients.
  • FIG. 5 is a functional block diagram showing a signal flow in the initial learning process of FIG. 4. It is a flowchart of the echo signal removal process included in the audio
  • Audio signal processing device: First, an embodiment of an audio signal processing apparatus (hereinafter referred to as “this apparatus”) according to the present invention will be described.
  • FIG. 1 is a functional block diagram showing an embodiment of this device.
  • the apparatus 1 performs processing such as mixing, distribution, and balance adjustment of a signal (input signal) from a device such as a microphone 3 that converts voice or musical sound into an electrical signal.
  • the device 1 is, for example, a mixer.
  • in the following description, the present apparatus 1 is used with one speaker 2 disposed at the first base and six microphones 3a, 3b, 3c, 3d, 3e, 3f (so-called six channels).
  • the first base and the second base are, for example, rooms such as a conference room.
  • when the speaker at the first site speaks, the microphone 3 generates a signal (hereinafter referred to as “voice signal”) s1 corresponding to the voice of the speaker and outputs the voice signal s1. That is, when the speaker at the first site and the speaker at the second site are both speaking, the signal output from the microphone 3 includes the voice signal s1 and the echo signal es. On the other hand, when only the speaker at the second site is speaking, the signal output from the microphone 3 includes only the echo signal es.
  • the apparatus 1 includes a first input unit 10, a first output unit 20, a second input unit 30, a switching unit 40, a control unit 50, a storage unit 60, a removal signal generation unit 70, a removal unit 80, and a second output unit 90.
  • the device 1 is realized by a personal computer or the like.
  • in the apparatus 1, an information processing program according to the present invention (hereinafter referred to as “this program”) operates. This program cooperates with the hardware resources of the apparatus 1 to realize an audio signal processing method according to the present invention (hereinafter referred to as “the present method”), which will be described later.
  • by executing this program on a computer, the computer can be caused to function in the same manner as the apparatus 1 and can execute the present method.
  • the first input unit 10 is connected to the communication device 4 at the second site via the communication line 5 such as a communication cable, and receives a voice signal (hereinafter referred to as “received signal”) s2 from the second site.
  • the first input unit 10 includes, for example, a communication interface (I / F) such as a connector or a terminal, an amplifier, and the like.
  • the received signal s2 from the first input unit 10 is input to the first output unit 20, the control unit 50, and the removal signal generation unit 70.
  • the first output unit 20 outputs the reception signal s2 from the first input unit 10 and the reference signal s3 from the control unit 50 to the speaker 2.
  • the first output unit 20 includes, for example, an I / F, an amplifier, and the like.
  • the first output unit 20 is an output unit in the present invention.
  • the “reference signal s3” is a signal corresponding to a reference sound (for example, white noise) emitted through the speaker 2 when the apparatus 1 executes the method described later.
  • the reference signal s3 is generated by the control unit 50.
  • the second input unit 30 is connected to each microphone 3a-3f and receives signals from the respective microphones 3a-3f.
  • the second input unit 30 includes, for example, an I / F, an amplifier, an AD converter, a variable resistor, and the like.
  • the second input unit 30 is an input unit in the present invention.
  • the second input unit 30 generates signals (hereinafter referred to as “individual transmission signals”) s41, s42, s43, s44, s45, and s46 in which the gains of the signals received from the microphones 3a-3f are adjusted, and generates a signal (hereinafter referred to as “transmission signal”) s4 by combining the individual transmission signals s41-s46.
  • the second input unit 30 generates the transmission signal s4 by combining the individual transmission signals s41 to s46, in other words, the signals from the respective microphones 3a to 3f.
  • the second input unit 30 includes seven transmission paths (not shown) corresponding to the transmission signal s4 and the individual transmission signals s41 to s46.
  • the generated transmission signal s4 and the individual transmission signals s41 to s46 are input to the switching unit 40.
  • when the individual transmission signals s41 to s46 are referred to collectively without distinction, they are called individual transmission signals s40.
  • Gain sharing compares the input from each microphone 3a-3f with the sum of the inputs (for example, the case where a signal is input only from the microphone 3a versus the case where signals are input from all of the microphones 3a-3f), and adjusts the gain values g1, g2, g3, g4, g5, and g6 set in the transmission paths (amplifiers) of the signals from the microphones 3a to 3f so that the total gain value G becomes a constant value.
  • in other words, gain sharing is an algorithm that adjusts the gain values g1-g6 corresponding to the microphones 3a-3f so that the total gain value G of the transmission paths becomes a constant value.
  • the gain values g1-g6 set for each transmission path are stored in the storage unit 60.
  • when the gain values g1-g6 are referred to collectively without distinction, they are called gain values g.
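As a rough illustration of the gain-sharing idea described above, the following Python sketch gives each microphone a gain proportional to its share of the summed input level, so that the gains always add up to a constant total G. The function name `gain_sharing`, the use of mean power as the level measure, and the default G = 1.0 are assumptions made for illustration; the text only requires that the total gain stay constant.

```python
def gain_sharing(frames, total_gain=1.0, eps=1e-12):
    """Return one gain per microphone so that the gains sum to total_gain.

    frames: list of non-empty sample lists, one block per microphone.
    Assumption: each gain is proportional to that microphone's share of
    the summed mean power of all inputs.
    """
    powers = [sum(x * x for x in frame) / len(frame) for frame in frames]
    total = sum(powers)
    if total < eps:
        # All inputs are silent: share the gain budget equally.
        return [total_gain / len(frames)] * len(frames)
    return [total_gain * p / total for p in powers]


# Only microphone 3a carries a signal, so it receives the whole budget.
frames = [[0.0] * 8 for _ in range(6)]
frames[0] = [0.5, -0.4, 0.3, -0.2, 0.1, 0.0, -0.1, 0.2]
gains = gain_sharing(frames)

# All six microphones carry the same level, so each receives G / 6.
equal_gains = gain_sharing([[0.1, -0.1, 0.1, -0.1]] * 6)
```

In both cases the gains sum to G, which is the property the combination of the individual filter coefficients relies on later.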
  • the transmission signal s4 and the individual transmission signals s41 to s46 are generated based on the signals from the respective microphones 3a to 3f. That is, the transmission signal s4 and the individual transmission signals s41 to s46 include the voice signal s1 and the echo signal es when the speaker at the first site is speaking, and include only the echo signal es when the speaker at the first site is not speaking.
  • the switching unit 40 switches signals input from the second input unit 30 to the control unit 50 and the removal unit 80 by switching the transmission path of the second input unit 30 based on the switching signal from the control unit 50. That is, the switching unit 40 switches a signal input to the control unit 50 or the removal unit 80 among the individual transmission signal s40 and the transmission signal s4 corresponding to each of the six microphones 3a to 3f.
  • the switching unit 40 is composed of, for example, a rotary switch or a slide switch. The operation of the switching unit 40 will be described later.
  • the control unit 50 performs calculation of coefficients necessary for the apparatus 1 to execute the method described later, detection of the audio signal s1 and the reception signal s2, measurement of echo return loss, and the like.
  • the control unit 50 is composed of, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The operation of the control unit 50 and the echo return loss will be described later.
  • the storage unit 60 is a means for storing information necessary for the apparatus 1 to execute the method described later.
  • the storage unit 60 includes, for example, a recording device such as an HDD (Hard Disk Drive) and an SSD (Solid State Drive), a semiconductor memory element such as a RAM (Random Access Memory), a flash memory, and the like. Information stored in the storage unit 60 will be described later.
  • the removal signal generation unit 70 generates a removal signal s5 based on the received signal s2 and the filter coefficient F.
  • the removal signal generation unit 70 is, for example, a known FIR (Finite Impulse Response) filter.
  • the “removal signal s5” is a signal for removing (suppressing) the echo signal es included in the transmission signal s4. That is, for example, the removal signal s5 is a signal having the same phase as (or a phase as close as possible to) the echo signal es included in the transmission signal s4.
  • the generation of the removal signal s5 by the removal signal generation unit 70 will be described later.
  • “Filter coefficient F” is a coefficient used by the removal signal generation unit 70 to perform FIR processing on the received signal s2 and generate the removal signal s5. That is, the removal signal generation unit 70 performs FIR processing on the received signal s2 based on the filter coefficient F to generate a removal signal s5.
  • the filter coefficient F is calculated by the control unit 50 as described above. The calculation of the filter coefficient F by the control unit 50 will be described later.
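The FIR processing described above can be sketched as a direct-form convolution of the received signal s2 with the filter coefficient F. This is a minimal illustration only; the function name `fir_filter` and the two-tap coefficient are hypothetical, and a real device would use far more taps.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: out[n] = sum over m of coeffs[m] * signal[n - m].

    Samples before the start of the signal are treated as zero, and the
    output has the same length as the input.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for m, c in enumerate(coeffs):
            if n - m >= 0:
                acc += c * signal[n - m]
        out.append(acc)
    return out


s2 = [1.0, 0.0, 0.0, 2.0]      # received signal (hypothetical samples)
F = [0.5, 0.25]                # hypothetical 2-tap filter coefficient
s5 = fir_filter(s2, F)         # removal signal: [0.5, 0.25, 0.0, 1.0]
```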
  • the removal unit 80 removes the echo signal es included in the transmission signal s4 based on the transmission signal s4 and the removal signal s5, and generates an echo removal signal s6.
  • the removal unit 80 is an arithmetic circuit such as a subtraction circuit or an addition circuit, for example. The generation of the echo removal signal s6 by the removal unit 80 will be described later.
  • the “echo removal signal s6” is a signal obtained by removing (suppressing) the echo signal es from the transmission signal s4 as described above.
  • the echo cancellation signal s6 includes a voice signal s1 and a residual echo signal res when the speaker at the first site is speaking, and includes a residual echo signal res when the speaker at the first site is not speaking.
  • the “residual echo signal res” is a difference signal between the echo signal es and the removal signal s5.
  • when the removal signal s5 completely removes the echo signal es (that is, when the echo signal es and the removal signal s5 are in phase), the signal level of the residual echo signal res generated by subtracting one signal from the other is “0”.
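To make the relationship between s4, s5, s6, and res concrete, the sketch below subtracts a removal signal from a transmission signal sample by sample. The sample values and the function name `remove_echo` are hypothetical; the sketch assumes the removal unit is a subtraction circuit, which is one of the two options the text mentions.

```python
def remove_echo(transmission, removal):
    """Subtract the removal signal s5 from the transmission signal s4."""
    return [t - r for t, r in zip(transmission, removal)]


voice = [0.2, -0.1, 0.3]                 # voice signal s1
echo = [0.5, 0.4, -0.2]                  # echo signal es
s4 = [v + e for v, e in zip(voice, echo)]

# Ideal case: s5 equals es, so s6 is the voice signal and res is zero.
s6_ideal = remove_echo(s4, echo)

# Imperfect case: s5 is 90 % of es, so a residual echo of 10 % remains.
s5 = [0.9 * e for e in echo]
s6 = remove_echo(s4, s5)
res = [e - r for e, r in zip(echo, s5)]  # res = es - s5
```

In the imperfect case s6 equals s1 plus res, which matches the description of the echo removal signal while the speaker at the first site is speaking.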
  • the echo removal signal s6 is input to the control unit 50 and the second output unit 90.
  • the second output unit 90 is connected to the communication line 5 and outputs an echo cancellation signal s6 to the communication line 5.
  • the second output unit 90 includes, for example, an I / F, an amplifier, and the like.
  • the echo cancellation signal s6 from the second output unit 90 is input to the communication device 4 at the second site via the communication line 5.
  • Audio signal processing method: Next, this method will be described.
  • FIG. 2 is a flowchart showing an embodiment of the present method.
  • the present apparatus 1 executes an initialization process (ST1), an initial learning process (ST2), an echo signal removal process (ST3), a specific process (ST4) described later (see FIG. 8), and an update process (ST5) described later (see FIG. 10).
  • by executing each process (ST1-ST5) in the present method, the apparatus 1 supports six microphones 3 (6 channels) with one common FIR filter (removal signal generation unit 70) and, as will be described later, realizes echo cancellation that automatically responds to environmental changes.
  • the apparatus 1 executes an initialization process (ST1) after the apparatus 1 is turned on.
  • FIG. 3 is a flowchart of the initialization process (ST1).
  • the “initialization process (ST1)” is a process for executing parameter initialization, environment measurement, and the like.
  • the apparatus 1 initializes parameters (ST101).
  • the “parameter” is a value set in an algorithm (adaptive algorithm) used for calculating the individual filter coefficient k described later.
  • the present apparatus 1 uses the control unit 50 to perform environmental measurement of the first site where the present apparatus 1, the speaker 2, and the microphone 3 are installed (ST102).
  • “Environmental measurement” is the measurement of items related to the transmission path (environment) of echo components from the speaker 2 to the microphone 3 at the first site where the apparatus 1, the speaker 2, and the microphone 3 are installed (for example, reverberation time, delay time, maximum echo amount, and background noise).
  • the apparatus 1 outputs a reference sound to the first site via the speaker 2 and collects an echo component of the reference sound via the microphone 3.
  • the control unit 50 measures reverberation time, delay time, maximum echo amount, and background noise.
  • the environmental measurement is executed for each microphone 3a-3f.
  • “Reverberation time” is the time required for the energy density of the reverberant sound of the same sound to decay by 60 dB after the reference sound is output (radiated) in the first base and the output of the reference sound is stopped.
  • “Delay time” is the time required for the microphone 3 to collect the reference sound output from the speaker 2.
  • the “maximum echo amount” is the maximum amount of echo components collected by the microphone 3 in the first site.
  • Background noise is the sound pressure level of noise (such as air-conditioning sound or outdoor car sound) in the first site.
  • the present apparatus 1 stores the measurement results of the environmental measurement corresponding to each of the microphones 3a-3f in the storage unit 60 (ST103).
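The text does not say how each item is measured. As one possible approach to the delay time, the sketch below searches for the lag at which the signal picked up by a microphone correlates best with the reference signal. Cross-correlation is an assumed technique here, not one named in the source, and `estimate_delay_samples`, `max_delay`, and the synthetic signals are hypothetical.

```python
import random


def estimate_delay_samples(reference, captured, max_delay):
    """Return the lag in samples (0..max_delay) that maximizes the
    cross-correlation between the reference and the captured signal."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        n = min(len(reference), len(captured) - lag)
        score = sum(reference[i] * captured[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: a noise-like reference sound arrives 37 samples late
# and attenuated by half.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]
captured = [0.0] * 37 + [0.5 * x for x in reference] + [0.0] * 63
delay = estimate_delay_samples(reference, captured, max_delay=100)
```

Dividing the result by the sampling rate gives the delay in seconds. The measurement would be repeated for each of the microphones 3a-3f, as the text requires.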
  • the present apparatus 1 specifies parameters based on the measurement results of the environmental measurements corresponding to the microphones 3a to 3f (ST104).
  • the parameter is specified either by newly calculating it based on the measurement result of the environmental measurement, or by selecting one parameter group from a plurality of parameter groups stored in advance in the storage unit 60 based on the measurement result of the environmental measurement.
  • FIG. 4 is a flowchart of the initial learning process (ST2).
  • FIG. 5 is a functional block diagram showing a signal flow in the initial learning process (ST2). In the figure, the main flow is indicated by arrows in the signal flow in the initial learning process (ST2).
  • “Initial learning process (ST2)” is a process in which the apparatus 1 first calculates (learns) the filter coefficient F after the apparatus 1 is powered on.
  • the device 1 switches the transmission path of the second input unit 30 to the transmission path from the microphone 3a using the switching unit 40 and the control unit 50 (ST201).
  • the switching of the transmission path by the switching unit 40 is performed based on a switching signal from the control unit 50.
  • the present apparatus 1 uses the second input unit 30 to generate an individual transmission signal s41 corresponding to the microphone 3a (ST202).
  • the control unit 50 generates the reference signal s3 and inputs the reference signal s3 to the first output unit 20.
  • the apparatus 1 outputs a reference sound from the speaker 2 and picks up the echo component of the reference sound by the microphone 3 (microphone 3a) corresponding to the transmission path switched in the above-described processing (ST201).
  • the second input unit 30 generates an individual transmission signal s41 corresponding to the microphone 3a based on a signal input from the microphone 3a.
  • the individual transmission signal s41 includes an echo signal es corresponding to the echo component of the reference sound.
  • the individual transmission signal s41 is input from the second input unit 30 to the removal unit 80 via the switching unit 40.
  • the apparatus 1 generates the individual removal signal s51 using the control unit 50 and the removal signal generation unit 70 (ST203).
  • the “individual removal signal s51” is a signal for removing an echo signal (hereinafter referred to as “individual echo signal”) es1 included in the individual transmission signal s41.
  • when the individual removal signals s51-s56 are referred to collectively without distinction, they are called individual removal signals s50.
  • the control unit 50 reads the initial value of the individual filter coefficient k1 corresponding to the microphone 3a from the storage unit 60, and inputs (sets) it to the removal signal generation unit 70.
  • the removal signal generation unit 70 calculates the individual removal signal s51 based on the reference signal s3 and the individual filter coefficient k1.
  • the individual removal signal s51 is input to the removal unit 80.
  • “Individual filter coefficient k1” is a transfer function of an acoustic transfer path from the speaker 2 to the microphone 3a. That is, the individual filter coefficient k1 is a coefficient used by the removal signal generation unit 70 to perform the FIR process on the reference signal and generate the individual removal signal s51.
  • the “reference signal” is the signal from which the removal signal generation unit 70 generates the individual removal signal s51 based on the individual filter coefficient k1. It is the reference signal s3 in the initial learning process (ST2), and the received signal s2 in the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5).
  • the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es1 included in the individual transmission signal s41 and generate the individual echo removal signal s61 (ST204).
  • the removal unit 80 generates the individual echo removal signal s61 based on the individual transmission signal s41 and the individual removal signal s51.
  • the individual echo removal signal s61 is input from the removal unit 80 to the control unit 50 and the second output unit 90. At this time, the second output unit 90 mutes the individual echo removal signal s61. As a result, the individual echo removal signal s61 is not transmitted to the second site.
  • the second output unit may attenuate the individual echo cancellation signal, or may mute the individual echo cancellation signal and transmit dummy noise (pink noise) to the second site.
  • the “individual echo cancellation signal s61” is a signal obtained by removing (suppressing) the individual echo signal es1 from the individual transmission signal s41.
  • the individual echo removal signal s61 includes an individual residual echo signal res11.
  • when the individual echo cancellation signals s61-s66 are referred to collectively without distinction, they are called individual echo cancellation signals s60.
  • the “individual residual echo signal res11” is a difference signal between the individual echo signal es1 and the individual removal signal s51.
  • the removing unit 80 subtracts the individual removal signal s51 from the individual transmission signal s41 to generate the individual echo removal signal s61.
  • when the individual residual echo signals res11 to res16 are referred to collectively without distinction, they are called individual residual echo signals res10.
  • the present apparatus 1 uses the control unit 50 to calculate an individual filter coefficient k1 corresponding to the microphone 3a (ST205).
  • the control unit 50 reads the gain value g1 (corresponding to the microphone 3a) set in the transmission path of the signal from the microphone 3a from the storage unit 60.
  • the control unit 50 calculates the individual filter coefficient k1 corresponding to the microphone 3a using a known adaptive algorithm, based on the read gain value g1, the reference signal s3, and the individual echo removal signal s61 (that is, the individual residual echo signal res11 included in the individual echo removal signal s61).
  • the calculated individual filter coefficient k1 is stored in the storage unit 60 (ST206). As a result, the individual filter coefficient k1 stored in the storage unit 60 is updated from the initial value to the calculated value.
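The text refers only to a known adaptive algorithm without naming it. As one common example, the sketch below uses normalized LMS (NLMS) to estimate the FIR coefficients of the path from the speaker to one microphone from the reference signal and the residual. NLMS is an assumption made here, as are the names `nlms_identify`, `step`, and `true_path`. How the gain value g1 enters the calculation is not detailed in this section, so the sketch leaves it out.

```python
import random


def nlms_identify(reference, mic_signal, num_taps, step=0.5, eps=1e-8):
    """Estimate FIR coefficients k so that filtering `reference` with k
    tracks `mic_signal` (normalized LMS, a swapped-in example algorithm)."""
    k = [0.0] * num_taps
    buf = [0.0] * num_taps                    # recent reference samples, newest first
    for x, d in zip(reference, mic_signal):
        buf = [x] + buf[:-1]
        estimate = sum(ki * bi for ki, bi in zip(k, buf))   # removal-signal sample
        error = d - estimate                                # residual-echo sample
        norm = sum(b * b for b in buf) + eps
        k = [ki + step * error * bi / norm for ki, bi in zip(k, buf)]
    return k


# Synthetic check: the reference signal s3 is noise-like, and the echo path
# to microphone 3a is a hypothetical 4-tap response.
rng = random.Random(0)
true_path = [0.0, 0.6, 0.3, -0.1]
s3 = [rng.gauss(0.0, 1.0) for _ in range(3000)]
s41 = [
    sum(true_path[m] * s3[n - m] for m in range(len(true_path)) if n - m >= 0)
    for n in range(len(s3))
]
k1 = nlms_identify(s3, s41, num_taps=4)
```

In this noise-free setting the estimate converges to the hypothetical path. In a real room the residual also contains background noise, which is one reason the adaptive-algorithm parameters are chosen from the environmental measurement.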
  • the apparatus 1 repeats the processing (ST201-ST206) until the individual filter coefficients k1-k6 corresponding to all the microphones 3a-3f are calculated (“No” in ST207).
  • the parameter of the adaptive algorithm is specified based on the measurement result of the environmental measurement corresponding to each microphone 3.
  • the control unit 50 calculates the individual filter coefficient k corresponding to each microphone 3 based on the measurement result of the environmental measurement corresponding to each microphone 3.
  • the present apparatus 1 calculates the filter coefficient F using the control unit 50 (ST208).
  • the control unit 50 reads the gain values g1-g6 of the signal transmission paths from the microphones 3a-3f and the individual filter coefficients k1-k6 from the storage unit 60, and calculates the filter coefficient F based on the gain values g1-g6 and the individual filter coefficients k1-k6.
  • the filter coefficient F is calculated by combining the individual filter coefficients k1-k6.
  • when the individual filter coefficients k1-k6 are referred to collectively without distinction, they are called individual filter coefficients k.
  • the individual filter coefficients k1-k6 are combined by multiplying each individual filter coefficient k by the corresponding gain value g and adding the results over the microphones 3a-3f. That is, the filter coefficient F is the sum of the value obtained by multiplying the individual filter coefficient k1 corresponding to the microphone 3a by the gain value g1, the value obtained by multiplying the individual filter coefficient k2 corresponding to the microphone 3b by the gain value g2, and the corresponding values for the microphones 3c-3f.
  • the calculated filter coefficient F is stored in the storage unit 60 and input (set) to the removal signal generation unit 70 (ST209).
  • the removal signal generation unit 70 can generate the removal signal s5 based on the filter coefficient F.
  • the filter coefficient F is calculated by calculating and combining the individual filter coefficients k1-k6 corresponding to the microphones 3a-3f. Therefore, unlike the conventional apparatus that includes an echo canceling unit for each microphone, the present apparatus 1 can remove the echo signal es included in the signals from the microphones 3a-3f using a single echo canceling unit (corresponding to the control unit 50, the removal signal generation unit 70, and the removal unit 80). That is, the present apparatus 1 executes echo cancellation by using one common FIR filter (removal signal generation unit 70) for the inputs from the six microphones 3a-3f, and can therefore remove the echo signal es with a simpler circuit configuration than the conventional apparatus.
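The reason one common FIR filter can stand in for per-microphone filters is linearity: filtering with the gain-weighted sum of the coefficients gives the same output as filtering with each coefficient separately and mixing the results with the same gains. The sketch below checks this on small hypothetical values, with three microphones and two taps for brevity; the function names are illustrative.

```python
def combine_filter_coefficients(individual_coeffs, gains):
    """F[t] = sum over microphones i of g_i * k_i[t]."""
    num_taps = len(individual_coeffs[0])
    return [
        sum(g * k[t] for g, k in zip(gains, individual_coeffs))
        for t in range(num_taps)
    ]


def fir_filter(signal, coeffs):
    """Direct-form FIR filter with zero initial state."""
    return [
        sum(c * signal[n - m] for m, c in enumerate(coeffs) if n - m >= 0)
        for n in range(len(signal))
    ]


k = [[1.0, 0.5], [0.0, 0.25], [0.5, 0.0]]   # hypothetical individual coefficients
g = [0.5, 0.25, 0.25]                       # hypothetical gain values
F = combine_filter_coefficients(k, g)       # [0.625, 0.3125]

s2 = [1.0, -2.0, 0.5, 0.0]
shared = fir_filter(s2, F)                  # one common filter
per_mic = [
    sum(gi * y for gi, y in zip(g, ys))
    for ys in zip(*(fir_filter(s2, ki) for ki in k))
]                                           # separate filters, then mixed
```

`shared` and `per_mic` agree up to rounding, so a single filter set to F produces the same removal signal as one filter per microphone.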
  • FIG. 6 is a flowchart of the echo signal removal process (ST3).
  • FIG. 7 is a functional block diagram showing a signal flow in the echo signal removal process (ST3). In the figure, the main flow of the signal flow in the echo signal removal process (ST3) is indicated by an arrow.
  • the “echo signal removal process (ST3)” is a process of removing the echo signal es corresponding to the received signal s2 when the received signal s2 is included in the signal received by the first input unit 10, for example, during a meeting between the first site and the second site.
  • the signal (received signal s2) from the first input unit 10 is input to the first output unit 20, the control unit 50, and the removal signal generation unit 70.
  • the apparatus 1 uses the control unit 50 to detect whether or not the received signal s2 is included in the signal from the first input unit 10, that is, the presence or absence of the received signal s2 (ST301). For example, the control unit 50 detects the presence or absence of the received signal s2 by comparing a signal (signal level) from the first input unit 10 with a predetermined threshold value V1. When there is a reception signal s2, the transmission signal s4 includes an echo signal es corresponding to the reception signal s2.
  • The threshold value V1 is the threshold used by the control unit 50 to detect whether or not the received signal s2 is included in the signal from the first input unit 10.
  • the threshold value V1 is stored in the storage unit 60.
  • When the received signal s2 is absent, the present apparatus 1 repeats the detection of the presence or absence of the received signal s2.
  • When the received signal s2 is present, the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the transmission signal s4 (ST302).
  • the present apparatus 1 generates a transmission signal s4 using the second input unit 30 (ST303).
  • the transmission signal s4 is input from the second input unit 30 to the control unit 50 and the removal unit 80 via the switching unit 40.
  • the present apparatus 1 generates a removal signal s5 using the control unit 50 and the removal signal generation unit 70 (ST304).
  • the control unit 50 reads the filter coefficient F from the storage unit 60 and inputs (sets) the filter coefficient F to the removal signal generation unit 70.
  • the removal signal generation unit 70 generates a removal signal s5 from the received signal s2 based on the filter coefficient F input from the control unit 50.
  • the filter coefficient F is the filter coefficient F calculated in the initial learning process (ST2) or the filter coefficient F calculated and updated in the update process (ST5) described later.
  • the present apparatus 1 uses the removal unit 80 to remove the echo signal es included in the transmission signal s4 and generate an echo removal signal s6 (ST305).
  • the removal unit 80 generates an echo removal signal s6 based on the transmission signal s4 and the removal signal s5.
  • the echo removal signal s6 is input to the control unit 50 and the second output unit 90.
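  • Steps ST304 and ST305 can be illustrated with a direct-form FIR filter followed by a sample-wise subtraction. This is a sketch under stated assumptions: the removal unit is assumed to subtract the removal signal s5 from the transmission signal s4 (the source only says that s6 is generated based on s4 and s5), samples before the start of the signal are treated as zero, and the function names and test signals are hypothetical.

```python
def fir_filter(x, coeffs):
    """Return y[n] = sum_j coeffs[j] * x[n - j], treating x[n < 0] as 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j, c in enumerate(coeffs):
            if n - j >= 0:
                acc += c * x[n - j]
        y.append(acc)
    return y


def remove_echo(s4, s2, F):
    """Generate the removal signal s5 from s2 and F, then subtract it from s4."""
    s5 = fir_filter(s2, F)
    return [a - b for a, b in zip(s4, s5)]


# Hypothetical received signal and echo path; no talker is active,
# so the transmission signal consists of the echo signal only.
s2 = [1.0, 0.0, 0.0, 2.0]
echo_path = [0.5, 0.25]
es = fir_filter(s2, echo_path)
s4 = es
s6 = remove_echo(s4, s2, echo_path)
```

  • When the filter coefficient F matches the echo path exactly, as in this toy case, the echo removal signal s6 is zero; in practice a residual echo signal remains.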
  • the present apparatus 1 measures an echo return loss (ERL) using the control unit 50 (ST306).
  • The echo return loss (ERL) is the level difference between the transmission signal s4 and the echo removal signal s6, that is, the magnitude (signal level) of the residual echo signal res included in the echo removal signal s6.
  • The ERL is influenced by, for example, a change in the installation location of the microphone 3 or a change in the output level of the speaker 2. For example, the ERL deteriorates when a talker moves the microphone 3 and the transmission path of the echo component changes (environmental change).
  • The control unit 50 measures the ERL based on the signal level of the transmission signal s4 and the signal level of the echo removal signal s6. That is, the control unit 50 measures the ERL by subtracting the signal level of the echo removal signal s6 from the signal level of the transmission signal s4.
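  • The subtraction described above can be sketched as follows. The source only refers to a “signal level”; expressing the level as mean power in decibels over a block of samples is an assumption made for this illustration, as are the function names and sample values. The sketch reproduces only the subtraction; how the result is compared with the threshold value V2 is a separate step.

```python
import math


def level_db(signal, floor=1e-12):
    """Mean power of a block of samples in dB (assumed level measure)."""
    power = sum(x * x for x in signal) / len(signal)
    # The floor avoids log10(0) for an all-zero block.
    return 10.0 * math.log10(max(power, floor))


def measure_erl(s4, s6):
    """Level of the transmission signal s4 minus level of the echo removal signal s6."""
    return level_db(s4) - level_db(s6)


# Hypothetical blocks: s6 has one tenth of the amplitude of s4.
s4 = [1.0, -1.0, 1.0, -1.0]
s6 = [0.1, -0.1, 0.1, -0.1]
erl = measure_erl(s4, s6)
```

  • For these values the level difference is about 20 dB, since a tenfold amplitude ratio corresponds to 20 dB.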
  • the present apparatus 1 uses the control unit 50 to compare the measured ERL with a predetermined threshold value V2 (ST307).
  • the “threshold value V2” is a threshold value indicating whether or not the echo signal es is sufficiently removed by the apparatus 1 (whether the signal level of the residual echo signal res is high). That is, when the removal of the echo signal es by the present apparatus 1 is insufficient, the ERL becomes equal to or more than the threshold value V2 (deteriorates). On the other hand, when the removal of the echo signal es by the apparatus 1 is sufficient, the ERL is smaller than the threshold value V2.
  • the threshold value V2 is a reference value in the present invention.
  • the threshold value V2 is stored in the storage unit 60.
  • When the ERL is smaller than the threshold value V2, the apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication device 4 at the second site (ST308), and the processing returns to ST301.
  • When the ERL is equal to or greater than the threshold value V2, the apparatus 1 uses the control unit 50 to detect whether or not the transmission signal s4 includes the audio signal s1 (the presence or absence of the audio signal s1) (ST309).
  • the control unit 50 detects the presence or absence of the audio signal s1 by comparing the transmission signal s4 (signal level) from the second input unit 30 with a predetermined threshold value V3.
  • The threshold value V3 is the threshold used by the control unit 50 to detect whether or not the audio signal s1 is included in the transmission signal s4 from the second input unit 30.
  • the threshold value V3 is stored in the storage unit 60.
  • When the audio signal s1 is present, the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication device 4 at the second site (ST308), and the processing returns to ST301.
  • When the audio signal s1 is absent, the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication device 4 at the second site (ST310), and executes the specific process (ST4).
  • In this way, the present apparatus 1 executes the specific process (ST4) at the timing when the received signal s2 is present and the audio signal s1 is absent. That is, based on the comparison result between the ERL and the threshold value V2, the present apparatus 1 executes the specific process (ST4) when the echo signal es is included in the transmission signal s4 and the audio signal s1 is not included in the transmission signal s4. In other words, when the apparatus 1 detects an environmental change during the execution of the echo signal removal process (ST3), the apparatus 1 executes the specific process (ST4).
  • the present apparatus 1 can reverse the magnitude comparison between the ERL and the threshold value V2 in the above-described processing (ST307).
  • The apparatus may also use the control unit to detect whether or not a voice signal is included in the transmission signal (the presence or absence of a voice signal).
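  • The branching of the echo signal removal process (ST301, ST307, ST309) can be summarized as a small decision function. This is a sketch, not the claimed implementation: it assumes that a level at or above the threshold values V1 and V3 means “signal present” (the source only says the levels are compared with the thresholds), it follows the text in treating an ERL at or above V2 as insufficient removal, and the function name and numeric values are hypothetical.

```python
def echo_removal_step(received_level, transmit_level, erl, v1, v2, v3):
    """Return a label for the branch that the ST3 flow takes."""
    if received_level < v1:
        return "ST301"        # no received signal s2: keep detecting
    if erl < v2:
        return "ST308"        # removal judged sufficient: output s6, return to ST301
    if transmit_level >= v3:
        return "ST308"        # audio signal s1 present: output s6, return to ST301
    return "ST310+ST4"        # echo only and ERL degraded: output s6, run ST4


# Hypothetical levels and thresholds, chosen only to exercise each branch.
idle = echo_removal_step(0.0, 0.0, 0.0, v1=1.0, v2=10.0, v3=5.0)
good = echo_removal_step(2.0, 3.0, 4.0, v1=1.0, v2=10.0, v3=5.0)
talking = echo_removal_step(2.0, 6.0, 12.0, v1=1.0, v2=10.0, v3=5.0)
degraded = echo_removal_step(2.0, 3.0, 12.0, v1=1.0, v2=10.0, v3=5.0)
```

  • Only the last case, in which the received signal is present, the ERL has degraded, and no talker is active, leads to the specific process (ST4).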
  • FIG. 8 is a flowchart of the specific processing (ST4).
  • FIG. 9 is a functional block diagram showing a signal flow in the specific process (ST4).
  • In the figure, the main flow is indicated by arrows.
  • The figure shows only the signals corresponding to the signal from the microphone 3a among the microphones 3a to 3f.
  • “Specific processing (ST4)” is processing for specifying the microphone 3 as a specific microphone or a non-specific microphone.
  • The “specific microphone” is a microphone 3 whose corresponding individual filter coefficient k is not appropriate (has shifted), that is, a microphone 3 for which the individual filter coefficient k is to be updated.
  • the deterioration of ERL is caused by the shift of the filter coefficient F with respect to the echo signal es, that is, the shift of the individual filter coefficients k1-k6 with respect to the individual echo signals es1-es6. Therefore, the individual filter coefficient k corresponding to the specific microphone needs to be updated to an appropriate value.
  • the “non-specific microphone” is a microphone 3 in which the corresponding individual filter coefficient k is appropriate (not shifted), that is, a microphone 3 that is not subject to update of the individual filter coefficient k.
  • the present apparatus 1 uses the control unit 50 to detect whether or not the voice signal s1 is included in the transmission signal s4 (presence / absence of the voice signal s1) (ST401).
  • the detection of the presence / absence of the audio signal s1 (ST401) is the same process as the detection of the presence / absence of the audio signal s1 (ST309) in the echo signal removal process (ST3).
  • When the audio signal s1 is absent, the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the signal from the microphone 3a (ST402).
  • the present apparatus 1 uses the second input unit 30 to generate an individual transmission signal s41 based on the signal from the microphone 3a (ST403).
  • the individual transmission signal s41 is input to the removal unit 80 via the switching unit 40.
  • the present apparatus 1 generates an individual removal signal (specific removal signal) s51 using the control unit 50 and the removal signal generation unit 70 (ST404).
  • the control unit 50 reads the individual filter coefficient k1 corresponding to the microphone 3a from the storage unit 60 and inputs it to the removal signal generation unit 70.
  • the removal signal generation unit 70 generates an individual removal signal s51 based on the received signal s2 and the individual filter coefficient k1.
  • the individual removal signal s51 is input to the removal unit 80.
  • the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es1 included in the individual transmission signal s41 and generate the individual echo removal signal s61 (ST405).
  • the removal unit 80 generates the individual echo removal signal s61 based on the individual transmission signal s41 and the individual removal signal s51.
  • the individual echo removal signal s61 is input from the removal unit 80 to the control unit 50 and the second output unit 90.
  • the present apparatus 1 measures the individual ERL using the control unit 50 (ST406).
  • “Individual ERL” is a level difference between the individual transmission signal s41 and the individual echo removal signal s61, that is, the magnitude (signal level) of the individual residual echo signal res11 included in the individual echo removal signal s61.
  • the control unit 50 measures the individual ERL based on the signal level of the individual transmission signal s41 and the signal level of the individual echo removal signal s61. That is, the control unit 50 measures the individual ERL by subtracting the signal level of the individual echo removal signal s61 from the signal level of the individual transmission signal s41.
  • the present apparatus 1 uses the control unit 50 to compare the measured individual ERL with a predetermined threshold value V4 (ST407).
  • the “threshold value V4” is a threshold value indicating whether or not the removal of the individual echo signal es1 by the apparatus 1 is sufficient (whether the signal level of the individual residual echo signal res11 is large). That is, when the removal of the individual echo signal es1 by the present apparatus 1 is insufficient, the individual ERL becomes equal to or higher than the threshold value V4 (deteriorates). On the other hand, when the individual echo signal es1 is sufficiently removed by the apparatus 1, the individual ERL is smaller than the threshold value V4.
  • the threshold value V4 is an individual reference value in the present invention. The threshold value V4 is stored in the storage unit 60.
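  • When the individual ERL is smaller than the threshold value V4, the present apparatus 1 specifies the microphone 3a as a non-specific microphone (ST408).
  • When the individual ERL is equal to or greater than the threshold value V4, the present apparatus 1 specifies the microphone 3a as a specific microphone (ST409).
  • The result of the specification is stored in the storage unit 60 (ST410).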
  • the individual echo removal signal s61 is output from the second output unit 90.
  • The present apparatus 1 repeats the processing (ST401-ST410) on the signals from the remaining microphones 3b-3f until all microphones 3a-3f are specified as specific microphones or non-specific microphones (“No” in ST411). That is, the apparatus 1 uses the switching unit 40 to input the individual transmission signals s42 to s46 corresponding to the remaining microphones 3b to 3f to the removal unit 80 while switching among them, and determines each microphone 3a to 3f to be either a specific microphone or a non-specific microphone.
  • The apparatus 1 executes the update process (ST5) when every microphone 3a-3f has been specified as a specific microphone or a non-specific microphone (“Yes” in ST411). At this time, the microphones 3 include a specific microphone and a non-specific microphone.
  • When the audio signal s1 is present, the apparatus 1 ends (interrupts) the specific process (ST4) and executes the echo signal removal process (ST3). That is, when the control unit 50 detects the audio signal s1 before the specific process (ST4) is completed, the apparatus 1 interrupts the specific process (ST4) and executes the echo signal removal process (ST3).
  • When the apparatus 1 subsequently determines, in the echo signal removal process (ST3), that the audio signal s1 is not included in the transmission signal s4, it executes (resumes) the specific process (ST4) from the interrupted point, that is, from the process for the signal from a microphone 3 that has not yet been specified as a specific microphone or a non-specific microphone. For example, if the specific process (ST4) was interrupted after the microphones 3a to 3d had been specified, the specific process (ST4) is resumed from the microphone 3e.
  • Alternatively, this apparatus may execute the specific process again from the beginning, that is, for all microphones.
  • When the threshold value V4 is a negative value, this apparatus 1 may reverse the comparison between the individual ERL and the threshold value V4 in the above-described processing (ST407). That is, for example, when the individual ERL, which is a negative value, is equal to or less than the threshold value V4, the apparatus may specify the microphone 3 corresponding to that individual ERL as a specific microphone.
  • In this way, based on the comparison result between the individual ERL and the individual reference value (threshold value V4), the present apparatus 1 determines, from among the plurality of microphones 3a-3f, the specific microphones that are targets for updating the individual filter coefficient k and the non-specific microphones that are not. That is, when the ERL deteriorates, the apparatus 1 determines the specific microphones at a timing at which the echo signal es is included in the transmission signal s4 and the audio signal s1 is not. The present apparatus 1 thereby limits the microphones 3 whose individual filter coefficient k needs updating, and reduces the time and processing load required for updating the individual filter coefficient k and the filter coefficient F.
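  • The classification loop of the specific process, including interruption by a talker and later resumption, can be sketched as below. For brevity the individual ERL of each microphone is passed in as a precomputed value, whereas the apparatus measures it inside the loop (ST402-ST406). The function name, the dictionary representation, the voice_present callback, and all numeric values are hypothetical.

```python
def specify_microphones(individual_erls, v4, voice_present, results=None):
    """Classify each microphone as "specific" or "non-specific".

    individual_erls: dict mapping microphone name -> individual ERL
    voice_present: callable returning True when the audio signal s1 is detected
    results: partial results from an interrupted run, if any
    Returns (results, completed).
    """
    results = dict(results or {})
    for mic, erl in individual_erls.items():
        if mic in results:
            continue                      # already specified before an interruption
        if voice_present():               # ST401: a talker is active
            return results, False         # interrupt and go back to ST3
        # ST407-ST409: the comparison direction follows the text above.
        results[mic] = "specific" if erl >= v4 else "non-specific"
    return results, True                  # "Yes" in ST411: proceed to ST5


erls = {"3a": 3.0, "3b": 12.0, "3c": 15.0}
flags = iter([False, True])               # a talker starts during the second check
partial, done = specify_microphones(erls, 10.0, lambda: next(flags))
final, done_after_resume = specify_microphones(erls, 10.0, lambda: False, partial)
```

  • Passing the partial results back in resumes from the first microphone that has not yet been specified; omitting them restarts the process for all microphones.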
  • FIG. 10 is a flowchart of the update process (ST5).
  • FIG. 11 is a functional block diagram showing a signal flow in the update process (ST5).
  • the main flow is indicated by arrows.
  • the figure shows only each signal corresponding to the signal from the microphone 3c.
  • Update process (ST5) is a process of updating the filter coefficient F by updating the individual filter coefficient k corresponding to the microphone 3 specified as the specific microphone. That is, for example, when the microphone 3a is specified as a specific microphone, the device 1 updates the filter coefficient F by updating the individual filter coefficient k1 corresponding to the microphone 3a. When the microphones 3e and 3f are specified as specific microphones, the apparatus 1 updates the filter coefficients F by updating the individual filter coefficients k5 and k6 corresponding to the microphones 3e and 3f.
  • Hereinafter, the case where the microphone 3c is specified as a specific microphone will be described as an example.
  • The present apparatus 1 uses the control unit 50 to detect whether or not the audio signal s1 is included in the transmission signal s4 (or the individual transmission signal s43) (the presence or absence of the audio signal s1) (ST501).
  • the detection of the presence / absence of the audio signal s1 (ST501) is the same processing as the detection of the presence / absence of the audio signal s1 (ST309) in the echo signal removal processing (ST3).
  • the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the signal from the specific microphone (microphone 3c) (ST502).
  • this apparatus 1 generates an individual transmission signal s43 based on a signal from a specific microphone (microphone 3c) (ST503).
  • the present apparatus 1 generates an individual removal signal s53 using the control unit 50 and the removal signal generation unit 70 (ST504).
  • the control unit 50 reads out the individual filter coefficient k3 corresponding to the specific microphone from the storage unit 60 and inputs it to the removal signal generation unit 70.
  • the removal signal generation unit 70 generates an individual removal signal s53 based on the received signal s2 and the individual filter coefficient k3.
  • the individual removal signal s53 is a specific removal signal in the present invention.
  • the individual removal signal s53 is input to the removal unit 80.
  • the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es3 included in the individual transmission signal s43 and generate the individual echo removal signal s63 (ST505).
  • the individual echo removal signal s63 is a specific echo removal signal in the present invention.
  • the individual echo removal signal s63 is input to the control unit 50 and the second output unit 90.
  • the present apparatus 1 measures an individual echo return loss (individual ERL) using the control unit 50 (ST506).
  • the present apparatus 1 uses the control unit 50 to compare the measured individual ERL with a predetermined threshold value V4 (ST507).
  • When the individual ERL is equal to or greater than the threshold value V4, the present apparatus 1 calculates the individual filter coefficient k3 using the control unit 50 (ST508).
  • the control unit 50 reads the gain value g3 set in the transmission path of the signal from the specific microphone from the storage unit 60.
  • The control unit 50 calculates the individual filter coefficient k3 based on the read gain value g3, the individual echo removal signal s63 (that is, the individual residual echo signal res13 included in the individual (specific) echo removal signal s63), the received signal s2, and the environment measurement result.
  • The present apparatus 1 stores the calculated individual filter coefficient k3 in the storage unit 60, that is, updates the individual filter coefficient k3 stored in the storage unit 60 (ST509), and returns to the processing (ST504).
  • When the individual ERL is smaller than the threshold value V4, the present apparatus 1 updates the filter coefficient F stored in the storage unit 60 using the control unit 50 (ST510).
  • The control unit 50 calculates the filter coefficient F based on the updated individual filter coefficient k3 corresponding to the specific microphone, the individual filter coefficients k1, k2, and k4-k6 corresponding to the non-specific microphones, and the gain values g1-g6 set for each transmission path, in the same manner as in the initial learning process (ST2) (ST208).
  • The apparatus 1 stores the calculated filter coefficient F in the storage unit 60, that is, updates the filter coefficient F stored in the storage unit 60 (ST511), and returns to the echo signal removal process (ST3).
  • the present apparatus 1 specifies the microphone 3 whose individual ERL has deteriorated in the specific process (ST4) as the specific microphone, and executes the update process (ST5) only for the specific microphone.
  • the processing load for updating the filter coefficient F is reduced, and the processing time is shortened.
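  • The repeated calculation of the individual filter coefficient (ST504-ST509) can be illustrated with a normalized least mean squares (NLMS) update. NLMS is a common adaptive-filter algorithm used here as a stand-in; the source does not name the algorithm, and the gain value and the environment measurement result are not modelled. The stopping criterion is a hypothetical substitute for the individual ERL check in ST507, and the echo path, step size, and signal lengths are invented for the example.

```python
import random


def nlms_pass(k, s2, s4_mic, mu=0.5, eps=1e-8):
    """Run one NLMS pass over the data; return (updated taps, residual signal)."""
    k = list(k)
    taps = len(k)
    residual = []
    for n in range(len(s2)):
        # Most recent received samples, newest first (zeros before the start).
        x = [s2[n - j] if n - j >= 0 else 0.0 for j in range(taps)]
        estimate = sum(kj * xj for kj, xj in zip(k, x))   # removal signal sample
        e = s4_mic[n] - estimate                          # residual echo sample
        norm = sum(xj * xj for xj in x) + eps
        k = [kj + mu * e * xj / norm for kj, xj in zip(k, x)]
        residual.append(e)
    return k, residual


def energy(signal):
    return sum(v * v for v in signal)


random.seed(0)
true_path = [0.6, -0.3, 0.1]                              # hypothetical echo path
s2 = [random.uniform(-1.0, 1.0) for _ in range(2000)]
s4_mic = [sum(true_path[j] * s2[n - j] for j in range(3) if n - j >= 0)
          for n in range(2000)]

k3 = [0.0, 0.0, 0.0]                                      # stale coefficient
_, before = nlms_pass(k3, s2, s4_mic, mu=0.0)             # mu=0: measure only
for _ in range(10):                                       # ST504-ST509 repeated
    k3, after = nlms_pass(k3, s2, s4_mic)
    if energy(after) <= 1e-6 * energy(before):            # stand-in for ST507
        break
```

  • Once the loop ends, the updated k3 would be combined with the unchanged coefficients of the non-specific microphones to recompute the filter coefficient F.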
  • the apparatus 1 always compares the ERL and the threshold value V2 (that is, monitors the ERL) in the echo signal removal process (ST3). When the ERL is greater than or equal to the threshold value V2, the present apparatus 1 performs a specific process (ST4) and an update process (ST5) at a timing when the echo signal es is included in the transmission signal s4 and the audio signal s1 is not included. Execute. In the specific process (ST4), the apparatus 1 compares the individual ERL and the threshold value V4 for each microphone 3. When the individual ERL is equal to or greater than the threshold value V4, the apparatus 1 determines a specific microphone that is an object of updating the individual filter coefficient k.
  • The present apparatus 1 calculates the individual filter coefficient k corresponding to the specific microphone based on the received signal s2 and the individual residual echo signal res10 included in the individual echo removal signal (specific echo removal signal) s60. The apparatus 1 then calculates and updates the filter coefficient F based on the individual filter coefficient k corresponding to the specific microphone and the individual filter coefficient k corresponding to the non-specific microphone.
  • The control unit 50 calculates the individual filter coefficients k1-k6 corresponding to each of the plurality of microphones 3a-3f, and calculates the filter coefficient F by synthesizing the individual filter coefficients k1-k6.
  • the removal signal generation unit 70 generates a removal signal s5 based on the calculated filter coefficient F.
  • the removal unit 80 removes the echo signal es included in the transmission signal s4 based on the transmission signal s4 and the removal signal s5 (generates an echo removal signal s6).
  • Unlike a conventional apparatus having an echo cancellation unit for each of a plurality of microphones, this apparatus 1 can remove the echo signals es included in the signals from a plurality of microphones 3 (multiple channels) with a common FIR filter (removal signal generation unit 70). That is, the present apparatus 1 removes the echo signals es included in the signals from the plurality of microphones 3 with a circuit configuration that is simpler than that of the conventional apparatus and uses one common FIR filter.
  • The control unit 50 calculates (updates) the filter coefficient F when the audio signal s1 is not included in the transmission signal s4 and the echo signal es is included in the transmission signal s4 (when the received signal s2 is present). Therefore, the present apparatus 1 reduces the processing load for calculating (updating) the filter coefficient F as compared with a conventional apparatus that always calculates (updates) the filter coefficient.
  • When the audio signal s1 is not included in the transmission signal s4 and the echo signal es is included in the transmission signal s4 (when the received signal s2 is present), the switching unit 40 inputs the individual transmission signals s41 to s46 to the control unit 50 while switching among them.
  • the control unit 50 calculates individual filter coefficients k1-k6 corresponding to the microphones 3a-3f based on signals from the plurality of microphones 3a-3f. That is, the apparatus 1 calculates the individual filter coefficients k1-k6 while switching the individual transmission signals s41-s46 by the switching unit 40.
  • the present apparatus 1 can calculate the individual filter coefficients k1-k6 corresponding to the six microphones 3a-3f by one common FIR filter (removal signal generation unit 70). That is, the apparatus 1 calculates the individual filter coefficient k corresponding to the plurality of microphones 3 with a simple circuit configuration, and calculates the filter coefficient F based on the individual filter coefficient k. As a result, the present apparatus 1 removes the echo signal es included in the signals from the plurality of microphones 3 with a simple circuit configuration.
  • The control unit 50 calculates the individual filter coefficient k based on the received signal s2 and the individual residual echo signal res10 included in the individual echo removal signal s60. That is, the present apparatus 1 improves the accuracy of the filter coefficient F by repeatedly calculating the individual filter coefficient k so that the individual residual echo signal res10 approaches “0” as closely as possible, and reliably removes (suppresses) the echo signal es from the transmission signal s4.
  • The control unit 50 updates the individual filter coefficients k1-k6 based on the gain values g1-g6 corresponding to each of the plurality of microphones 3a-3f. Therefore, the present apparatus 1 can calculate the individual filter coefficients k1-k6 taking into account the gain values g1-g6 applied when the microphones 3a-3f pick up the echo components. As a result, the present apparatus 1 can improve the accuracy of the filter coefficient F and reliably remove (suppress) the echo signal es from the transmission signal s4.
  • the control unit 50 always measures ERL in the echo signal removal process (ST3).
  • the control unit 50 updates the filter coefficient F stored in the storage unit 60 when the ERL is equal to or greater than the reference value (threshold value V2) and the speech signal s1 is not included in the transmission signal s4. That is, the apparatus 1 detects an environmental change at the timing when the ERL deteriorates, and updates the filter coefficient F. That is, this apparatus 1 reduces the processing load of calculation (update) of the filter coefficient F compared with the conventional apparatus which always calculates (updates) the filter coefficient F.
  • The control unit 50 measures the individual ERL based on the comparison result between the ERL and the reference value (threshold value V2). As a result, when the ERL deteriorates, the present apparatus 1 detects the deviation of the filter coefficient F (deterioration of the suppression effect on the echo signal es) from the measurement result of the individual ERL corresponding to each microphone 3a-3f.
  • The control unit 50 determines, from among the plurality of microphones 3a to 3f, the specific microphone whose individual filter coefficient k is to be updated, based on the comparison result between the individual ERL and the individual reference value (threshold value V4). That is, when the ERL deteriorates, the present apparatus 1 determines a specific microphone, thereby reducing the processing load and time required for updating the individual filter coefficient k and updating the filter coefficient F.
  • the control unit 50 calculates the individual filter coefficient k of the specific microphone.
  • The control unit 50 updates the filter coefficient F stored in the storage unit 60 based on the calculated individual filter coefficient k of the specific microphone and the individual filter coefficient k of the non-specific microphone. Therefore, the present apparatus 1 updates the filter coefficient F by calculating (updating) only the individual filter coefficient k of the specific microphone. That is, the present apparatus 1 reduces the time and processing load required for updating the individual filter coefficient k and updating the filter coefficient F.
  • The control unit 50 performs the environmental measurement for each microphone 3 and calculates the individual filter coefficient k based on the measurement result of the environmental measurement. Therefore, this apparatus 1 can calculate the filter coefficient F according to the environment of the room (space) where this apparatus 1 is installed.
  • The apparatus 1 calculates the filter coefficient F through the initialization process (ST1) and the initial learning process (ST2), and executes echo cancellation based on the filter coefficient F (that is, executes the echo signal removal process (ST3)).
  • When the apparatus 1 detects an environmental change during the execution of the echo signal removal process (ST3), the apparatus 1 performs the specific process (ST4) and the update process (ST5), thereby realizing automatic adjustment of the filter coefficient F.
  • the present apparatus 1 executes multi-channel echo cancellation using a common filter, and also performs echo cancellation by automatically following environmental changes.
  • the number of microphones connected to the second input unit is not limited to “6” as long as it is plural.
  • The present apparatus 1 includes one set of the removal signal generation unit 70 and the removal unit 80. Therefore, during the specific process (ST4) and the update process (ST5), the removal signal generation unit 70 is dedicated to generating the individual removal signal s50. As a result, the present apparatus 1 does not execute the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5) at the same time.
  • Alternatively, this apparatus may include two sets of a removal signal generation unit and a removal unit: one set used for the echo signal removal process, and one set used for the specific process and the update process.
  • FIG. 12 is a functional block diagram showing another embodiment of the present apparatus.
  • This figure shows the present apparatus 1A, an audio signal processing apparatus that includes a first removal signal generation unit 70A, a second removal signal generation unit 70B, a first removal unit 80A, and a second removal unit 80B.
  • the first removal signal generation unit 70A and the first removal unit 80A perform a specific process (ST4) and an update process (ST5).
  • the second removal signal generation unit 70B and the second removal unit 80B execute an echo signal removal process (ST3).
  • the apparatus 1A can simultaneously execute the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5). Therefore, this apparatus 1A can remove (suppress) the echo signal es included in the signals from two or more microphones 3 with a simple circuit configuration including two echo canceller units.

Abstract

Provided are a voice signal processing device, a voice signal processing method and a voice signal processing program which can remove, with a simple circuit configuration, an echo signal included in input signals from a plurality of microphones. This device 1 includes: an output unit 20 which outputs a reception signal s2; an input unit 30 which generates a transmission signal s4 by synthesizing signals respectively input from a plurality of microphones 3, each of which picks up an echo component derived from the reception signal and the voice of a speaker and generates an echo signal es according to the echo component and a voice signal s1 according to the voice of the speaker; a removal signal generation unit 70 which generates, on the basis of a filter coefficient F, a removal signal s5 for removing the echo signals included in the transmission signal; a control unit 50 which calculates the filter coefficient; and a removal unit 80 which generates, on the basis of the transmission signal and the removal signal, an echo removal signal s6, wherein the control unit calculates individual filter coefficients k respectively corresponding to the plurality of microphones and synthesizes the individual filter coefficients to calculate the filter coefficient.

Description

音声信号処理装置と音声信号処理方法と音声信号処理プログラムAudio signal processing apparatus, audio signal processing method, and audio signal processing program
 本発明は、音声信号処理装置と、音声信号処理方法と、音声信号処理プログラムと、に関する。 The present invention relates to an audio signal processing device, an audio signal processing method, and an audio signal processing program.
In recent years, communication conference systems such as teleconference systems and video conference systems that use communication lines such as the Internet have been used for conferences between physically separate sites. In such a communication conference system, when sound based on an audio signal received from one site (hereinafter referred to as the "received signal") is output from a speaker at the other site, the microphone at the other site picks up the output sound, and an acoustic echo is generated.
Acoustic echo is normally suppressed or removed by an echo canceller provided in the communication conference system. A typical echo canceller includes an adaptive filter that generates, based on the received signal and an echo signal corresponding to the acoustic echo, a removal signal for removing the echo signal, and removes the echo signal by adding or subtracting the removal signal and the echo signal.
As such an echo canceller, a multi-channel echo canceller that suppresses or removes the echo signals from a plurality of microphones has been proposed (see, for example, Patent Document 1).
Patent Document 1: JP 2002-252577 A
The echo canceller disclosed in Patent Document 1 includes a plurality of echo cancellation units, one for each of a plurality of microphones, and supports multiple channels by having each echo cancellation unit remove the echo signal contained in the input signal from its corresponding microphone. In other words, the echo canceller disclosed in Patent Document 1 requires as many echo cancellation units as there are microphones, which complicates the circuit configuration and the signal processing.
The present invention has been made to solve the above problems of the prior art, and an object thereof is to provide an audio signal processing device, an audio signal processing method, and an audio signal processing program capable of removing, with a simple circuit configuration, the echo signals contained in the input signals from each of a plurality of microphones.
An audio signal processing device according to the present invention includes: an output unit that outputs a received signal; an input unit that generates a transmission signal by combining the signals input from each of a plurality of microphones, each of which picks up an echo component of the received signal and a speaker's voice and generates an echo signal corresponding to the echo component and a voice signal corresponding to the speaker's voice; a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal contained in the transmission signal; a control unit that calculates the filter coefficient; and a removal unit that generates an echo removal signal based on the transmission signal and the removal signal. The control unit calculates individual filter coefficients corresponding to each of the plurality of microphones and combines the individual filter coefficients to calculate the filter coefficient.
According to the present invention, the echo signals contained in the input signals from each of a plurality of microphones can be removed with a simple circuit configuration.
FIG. 1 is a functional block diagram showing an embodiment of the audio signal processing device according to the present invention.
FIG. 2 is a flowchart showing an embodiment of the audio signal processing method according to the present invention.
FIG. 3 is a flowchart of the initialization process included in the audio signal processing method of FIG. 2.
FIG. 4 is a flowchart of the initial learning process included in the audio signal processing method of FIG. 2.
FIG. 5 is a functional block diagram showing the signal flow in the initial learning process of FIG. 4.
FIG. 6 is a flowchart of the echo signal removal process included in the audio signal processing method of FIG. 2.
FIG. 7 is a functional block diagram showing the signal flow in the echo signal removal process of FIG. 6.
FIG. 8 is a flowchart of the specific process included in the signal processing of FIG. 6.
FIG. 9 is a functional block diagram showing the signal flow in the specific process of FIG. 8.
FIG. 10 is a flowchart of the update process included in the signal processing of FIG. 8.
FIG. 11 is a functional block diagram showing the signal flow in the update process of FIG. 10.
FIG. 12 is a functional block diagram showing another embodiment of the present invention.
Hereinafter, embodiments of the audio signal processing device, the audio signal processing method, and the audio signal processing program according to the present invention will be described with reference to the drawings.
● Audio signal processing device ●
First, an embodiment of the audio signal processing device according to the present invention (hereinafter referred to as "the device") will be described.
● Configuration of the audio signal processing device
FIG. 1 is a functional block diagram showing an embodiment of the device.
The device 1 performs processing such as mixing, distribution, and balance adjustment of signals (input signals) from equipment such as a microphone 3 that converts voice or musical sound into an electrical signal. The device 1 is, for example, a mixer.
The following description takes as an example a video conference held between a speaker at a first site, where the device 1 is installed, and a speaker at a second site physically separated from the first site, in which the device 1 is connected to one speaker 2 and six microphones 3a, 3b, 3c, 3d, 3e, and 3f (so-called six channels) arranged at the first site. The first site and the second site are, for example, rooms such as conference rooms.
Part of the sound from the second site that is output from the speaker 2 into the room is picked up by the microphone 3 via the space in the room. At this time, the microphone 3 generates a signal (hereinafter referred to as the "echo signal") es corresponding to that part of the sound output from the speaker 2 (hereinafter referred to as the "echo component") and outputs the echo signal es. When the speaker at the first site speaks, the microphone 3 generates a signal (hereinafter referred to as the "voice signal") s1 corresponding to that speaker's voice and outputs the voice signal s1. That is, when the speaker at the first site and the speaker at the second site are both speaking, the signal output from the microphone 3 contains the voice signal s1 and the echo signal es. On the other hand, when only the speaker at the second site is speaking, the signal output from the microphone 3 contains the echo signal es.
The device 1 includes a first input unit 10, a first output unit 20, a second input unit 30, a switching unit 40, a control unit 50, a storage unit 60, a removal signal generation unit 70, a removal unit 80, and a second output unit 90.
The device 1 is implemented by a personal computer or the like. In the device 1, an information processing program according to the present invention (hereinafter referred to as "the present program") runs and cooperates with the hardware resources of the device 1 to implement the audio signal processing method according to the present invention described later (hereinafter referred to as "the present method").
Note that by causing a computer (not shown) to execute the present program, the computer can be made to function in the same manner as the device and to execute the present method.
The first input unit 10 is connected to a communication device 4 at the second site via a communication line 5 such as a communication cable, and receives an audio signal from the second site (hereinafter referred to as the "received signal") s2. The first input unit 10 includes, for example, a communication interface (I/F) such as a connector or terminal, an amplifier, and the like. The received signal s2 from the first input unit 10 is input to the first output unit 20, the control unit 50, and the removal signal generation unit 70.
The first output unit 20 outputs the received signal s2 from the first input unit 10 and a reference signal s3 from the control unit 50 to the speaker 2. The first output unit 20 includes, for example, an I/F, an amplifier, and the like. The first output unit 20 is the output unit in the present invention. The "reference signal s3" is a signal corresponding to a reference sound (for example, white noise) emitted through the speaker 2 when the device 1 executes the present method described later. The reference signal s3 is generated by the control unit 50.
The second input unit 30 is connected to the microphones 3a-3f and receives the signal from each of them. The second input unit 30 includes, for example, an I/F, an amplifier, an AD converter, a variable resistor, and the like. The second input unit 30 is the input unit in the present invention. The second input unit 30 generates signals s41, s42, s43, s44, s45, and s46 by adjusting the gain of each received signal (hereinafter referred to as the "individual transmission signals"), and also generates a signal s4 by combining the individual transmission signals s41-s46 (hereinafter referred to as the "transmission signal"). That is, the second input unit 30 generates the transmission signal s4 by combining the individual transmission signals s41-s46, in other words, the signals from the microphones 3a-3f. The second input unit 30 has seven transmission paths (not shown), one for the transmission signal s4 and one for each of the individual transmission signals s41-s46. The generated transmission signal s4 and individual transmission signals s41-s46 are input to the switching unit 40. Hereinafter, when the individual transmission signals s41-s46 are referred to collectively without distinction, they are written as the individual transmission signal s40.
The gain of each signal is adjusted using a known gain-sharing algorithm. "Gain sharing" is an algorithm that compares the input from each of the microphones 3a-3f with the sum of those inputs (for example, comparing the case where a signal is input only from the microphone 3a with the case where signals are input from the microphones 3a-3f) and adjusts the gain values g1, g2, g3, g4, g5, and g6 set in the transmission paths (amplifiers) of the signals from the microphones 3a-3f so that the total gain value G is constant. In other words, gain sharing is an algorithm that adjusts the gain values g1-g6 corresponding to the microphones 3a-3f so that the total gain value G of the transmission paths is constant. The gain values g1-g6 set for the transmission paths are stored in the storage unit 60. Hereinafter, when the gain values g1-g6 are referred to collectively without distinction, they are written as the gain value g.
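The gain-sharing idea described above can be sketched as follows. This is a minimal illustration, not the algorithm used by the device: the proportional rule (each gain equals the channel's share of the summed input level), the function names `gain_share` and `mix`, and the example levels are all assumptions. The only property taken from the description is that the gains add up to a constant total G.

```python
def gain_share(levels, total_gain=1.0):
    """Split a constant total gain across channels in proportion to input level.

    `levels` holds one non-negative short-term level per microphone
    (hypothetical values; how the level is measured is not specified here).
    """
    total = sum(levels)
    n = len(levels)
    if total <= 0.0:
        # No input on any channel: fall back to an even split (assumption).
        return [total_gain / n] * n
    return [total_gain * level / total for level in levels]


def mix(samples, gains):
    """Combine one gain-adjusted sample per channel into one transmission sample."""
    return sum(g * x for g, x in zip(gains, samples))


# Only microphone 3a is active: it receives the whole gain budget.
gains_single = gain_share([0.8, 0.0, 0.0, 0.0, 0.0, 0.0])

# All six microphones are active: the same budget is shared among them.
gains_all = gain_share([0.8, 0.4, 0.4, 0.2, 0.1, 0.1])
```

In both cases the gains sum to the same total, which is the constant-G property the paragraph above describes.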
As described above, the transmission signal s4 and the individual transmission signals s41-s46 are generated based on the signals from the microphones 3a-3f. That is, the transmission signal s4 and the individual transmission signals s41-s46 contain the voice signal s1 and the echo signal es when the speaker at the first site is speaking, and contain the echo signal es when the speaker at the first site is not speaking.
Based on a switching signal from the control unit 50, the switching unit 40 switches the transmission path of the second input unit 30, thereby switching the signal that is input from the second input unit 30 to the control unit 50 and the removal unit 80. That is, the switching unit 40 selects which of the individual transmission signals s40 corresponding to the six microphones 3a-3f and the transmission signal s4 is input to the control unit 50 and the removal unit 80. The switching unit 40 is, for example, a rotary switch or a slide switch. The operation of the switching unit 40 will be described later.
The control unit 50 calculates the coefficients needed for the device 1 to execute the present method described later, detects the voice signal s1 and the received signal s2, measures the echo return loss, and so on. The control unit 50 is composed of, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The operation of the control unit 50 and the echo return loss will be described later.
The storage unit 60 is a means for storing the information needed for the device 1 to execute the present method described later. The storage unit 60 is composed of, for example, a recording device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), or a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory. The information stored in the storage unit 60 will be described later.
The removal signal generation unit 70 generates a removal signal s5 based on the received signal s2 and a filter coefficient F. The removal signal generation unit 70 is, for example, a known FIR (Finite Impulse Response) filter. The "removal signal s5" is a signal for removing (suppressing) the echo signal es contained in the transmission signal s4. That is, for example, the removal signal s5 is a signal having the same phase as (or a phase as close as possible to) that of the echo signal es contained in the transmission signal s4. The generation of the removal signal s5 by the removal signal generation unit 70 will be described later.
The "filter coefficient F" is the coefficient that the removal signal generation unit 70 uses to perform FIR processing on the received signal s2 and generate the removal signal s5. That is, the removal signal generation unit 70 performs FIR processing on the received signal s2 based on the filter coefficient F to generate the removal signal s5. As described above, the filter coefficient F is calculated by the control unit 50. The calculation of the filter coefficient F by the control unit 50 will be described later.
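As a rough illustration of the FIR processing described above, the sketch below computes each removal signal sample as a weighted sum of the current and past received signal samples. The function name `fir_filter`, the direct-form loop, and the two-tap coefficient values are assumptions chosen for readability; the description only states that a known FIR filter is used.

```python
def fir_filter(reference, coeffs):
    """Direct-form FIR: out[n] = sum over k of coeffs[k] * reference[n - k]."""
    out = []
    for n in range(len(reference)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += c * reference[n - k]
        out.append(acc)
    return out


# A unit impulse as the received signal s2 and a hypothetical filter coefficient F.
s2 = [1.0, 0.0, 0.0, 0.0]
F = [0.5, 0.25]
s5 = fir_filter(s2, F)  # removal signal: the impulse response defined by F
```

Feeding an impulse through the filter returns the coefficients themselves, which makes it easy to see that F plays the role of an estimated impulse response of the echo path.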
The removal unit 80 removes the echo signal es contained in the transmission signal s4 based on the transmission signal s4 and the removal signal s5, and generates an echo removal signal s6. The removal unit 80 is, for example, an arithmetic circuit such as a subtraction circuit or an addition circuit. The generation of the echo removal signal s6 by the removal unit 80 will be described later.
As described above, the "echo removal signal s6" is the signal obtained by removing (suppressing) the echo signal es from the transmission signal s4. The echo removal signal s6 contains the voice signal s1 and a residual echo signal res when the speaker at the first site is speaking, and contains the residual echo signal res when the speaker at the first site is not speaking. The "residual echo signal res" is the difference signal between the echo signal es and the removal signal s5. That is, for example, when the echo signal es has been completely removed from the echo removal signal s6 (when the echo signal es and the removal signal s5 are in phase), the signal level of the residual echo signal res generated by subtracting one signal from the other is "0". The echo removal signal s6 is input to the control unit 50 and the second output unit 90.
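The relationship between the transmission signal, the removal signal, and the residual echo can be illustrated with a small numerical sketch. The subtraction form `s6 = s4 - s5` follows the description of the removal unit as a subtraction circuit; the function name `remove_echo` and all sample values are made up for illustration.

```python
def remove_echo(s4, s5):
    """Subtract the removal signal from the transmission signal, sample by sample."""
    return [a - b for a, b in zip(s4, s5)]


# Hypothetical samples: near-end voice s1 plus echo es form the transmission signal s4.
s1 = [0.2, -0.1, 0.3]
es = [0.05, 0.02, -0.01]
s4 = [v + e for v, e in zip(s1, es)]

# If the removal signal matched the echo exactly, only the voice would remain.
s6_ideal = remove_echo(s4, es)

# With no near-end speech and an imperfect removal signal,
# the output is just the residual echo res = es - s5.
s5_imperfect = [0.04, 0.02, 0.0]
res = remove_echo(es, s5_imperfect)
```

The second case is the quantity an adaptive algorithm would try to drive toward zero when only the far-end speaker is talking.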
The second output unit 90 is connected to the communication line 5 and outputs the echo removal signal s6 to the communication line 5. The second output unit 90 includes, for example, an I/F, an amplifier, and the like. The echo removal signal s6 from the second output unit 90 is input to the communication device 4 at the second site via the communication line 5.
● Audio signal processing method ●
Next, the present method will be described.
FIG. 2 is a flowchart showing an embodiment of the present method.
In the present method, the device 1 executes an initialization process (ST1), an initial learning process (ST2), an echo signal removal process (ST3), a specific process (ST4) described later (see FIG. 8), and an update process (ST5) described later (see FIG. 10). By executing these processes (ST1-ST5), the device 1 handles the six microphones 3 (six channels) with a single shared FIR filter (the removal signal generation unit 70) and, as described later, achieves echo cancellation that automatically adapts to environmental changes.
The device 1 executes the initialization process (ST1) after it is powered on.
● Initialization process
FIG. 3 is a flowchart of the initialization process (ST1).
The "initialization process (ST1)" is a process that initializes parameters, performs environmental measurement, and so on.
First, the device 1 initializes the parameters (ST101). The "parameters" are values set in the algorithm (adaptive algorithm) used to calculate the individual filter coefficient k described later.
Next, the device 1 uses the control unit 50 to perform environmental measurement of the first site, where the device 1, the speaker 2, and the microphone 3 are installed (ST102). "Environmental measurement" is the measurement of items (for example, reverberation time, delay time, maximum echo amount, and background noise) relating to the transmission path (environment) of the echo component from the speaker 2 to the microphone 3 at the first site. The device 1 outputs the reference sound into the first site via the speaker 2 and picks up the echo component of the reference sound via the microphone 3. The control unit 50 measures the reverberation time, the delay time, the maximum echo amount, and the background noise. The environmental measurement is performed for each of the microphones 3a-3f.
The "reverberation time" is the time required, after the reference sound is output (radiated) into the first site and its output is then stopped, for the energy density of the reverberant sound to decay by 60 dB. The "delay time" is the time required for the microphone 3 to pick up the reference sound output from the speaker 2. The "maximum echo amount" is the maximum amount of the echo component picked up by the microphone 3 in the first site. The "background noise" is the sound pressure level of the noise in the first site (such as air-conditioning noise or the sound of vehicles outside the room).
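One of the measured items, the delay time, can be estimated in a simple way by finding the lag at which the captured signal best matches the emitted reference sound. The sketch below is only an illustration under stated assumptions: the cross-correlation search, the function name `estimate_delay`, the synthetic echo path (a pure 7-sample delay with attenuation), and the 48 kHz sampling rate are not taken from the description, which does not say how the delay time is measured.

```python
import random


def estimate_delay(reference, captured, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[n] * captured[n + lag]
            for n in range(len(reference))
            if n + lag < len(captured)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# White-noise-like reference sound (seeded for repeatability).
rng = random.Random(0)
reference_sound = [rng.uniform(-1.0, 1.0) for _ in range(256)]

# Synthetic microphone capture: the reference delayed by 7 samples and attenuated.
captured_sound = [0.0] * 7 + [0.5 * x for x in reference_sound]

delay_samples = estimate_delay(reference_sound, captured_sound, max_lag=32)
delay_seconds = delay_samples / 48000  # assumed sampling rate of 48 kHz
```

A noise-like reference works well here because its autocorrelation has a single sharp peak, so the matching lag stands out clearly from the others.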
Next, the device 1 stores the environmental measurement results for each of the microphones 3a-3f in the storage unit 60 (ST103).
Next, the device 1 specifies the parameters based on the environmental measurement results for each of the microphones 3a-3f (ST104). The parameters are specified either by newly calculating them based on the environmental measurement results, or by selecting, based on the environmental measurement results, one parameter group from a plurality of parameter groups stored in advance in the storage unit 60.
● Initial learning process
FIG. 4 is a flowchart of the initial learning process (ST2).
FIG. 5 is a functional block diagram showing the signal flow in the initial learning process (ST2).
In FIG. 5, the main signal flows in the initial learning process (ST2) are indicated by arrows.
The "initial learning process (ST2)" is the process in which the device 1 first calculates (learns) the filter coefficient F, for example after the device 1 is powered on.
First, the device 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path from the microphone 3a (ST201). The switching unit 40 switches the transmission path based on the switching signal from the control unit 50.
Next, the device 1 uses the second input unit 30 to generate the individual transmission signal s41 corresponding to the microphone 3a (ST202). The control unit 50 generates the reference signal s3 and inputs it to the first output unit 20. The device 1 outputs the reference sound from the speaker 2 and picks up the echo component of the reference sound with the microphone 3 (microphone 3a) corresponding to the transmission path selected in the preceding step (ST201). The second input unit 30 generates the individual transmission signal s41 corresponding to the microphone 3a based on the signal input from the microphone 3a. The individual transmission signal s41 contains an echo signal es corresponding to the echo component of the reference sound. The individual transmission signal s41 is input from the second input unit 30 to the removal unit 80 via the switching unit 40.
Next, the device 1 uses the control unit 50 and the removal signal generation unit 70 to generate an individual removal signal s51 (ST203). The "individual removal signal s51" is a signal for removing the echo signal es1 contained in the individual transmission signal s41 (hereinafter referred to as the "individual echo signal"). Hereinafter, when the individual removal signals s51-s56 are referred to collectively without distinction, they are written as the individual removal signal s50.
The control unit 50 reads the initial value of the individual filter coefficient k1 corresponding to the microphone 3a from the storage unit 60 and inputs (sets) it to the removal signal generation unit 70. The removal signal generation unit 70 calculates the individual removal signal s51 based on the reference signal s3 and the individual filter coefficient k1. The individual removal signal s51 is input to the removal unit 80.
The "individual filter coefficient k1" is the transfer function of the acoustic transfer path from the speaker 2 to the microphone 3a. That is, the individual filter coefficient k1 is the coefficient that the removal signal generation unit 70 uses to perform FIR processing on a reference-input signal and generate the individual removal signal s51. The "reference-input signal" is the signal from which the removal signal generation unit 70 generates the individual removal signal s51 based on the individual filter coefficient k1 (the reference signal s3 in the initial learning process (ST2), and the received signal s2 in the echo signal removal process (ST3), the specific process (ST4), and the update process (ST5)).
 次いで、本装置1は、除去部80を用いて、個別送話信号s41に含まれる個別エコー信号es1を除去して、個別エコー除去信号s61を生成する(ST204)。除去部80は、個別送話信号s41と個別除去信号s51とに基づいて、個別エコー除去信号s61を生成する。個別エコー除去信号s61は、除去部80から制御部50と第2出力部90とに入力される。このとき、第2出力部90は、個別エコー除去信号s61をミュートする。その結果、個別エコー除去信号s61は、第2拠点に送信されない。 Next, the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es1 included in the individual transmission signal s41 and generate the individual echo removal signal s61 (ST204). The removal unit 80 generates the individual echo removal signal s61 based on the individual transmission signal s41 and the individual removal signal s51. The individual echo removal signal s61 is input from the removal unit 80 to the control unit 50 and the second output unit 90. At this time, the second output unit 90 mutes the individual echo removal signal s61. As a result, the individual echo removal signal s61 is not transmitted to the second site.
 なお、第2出力部は、個別エコー除去信号をアッテネートしてもよく、あるいは、個別エコー除去信号をミュートした上でダミーノイズ(ピンクノイズ)を第2拠点に送信してもよい。 Note that the second output unit may attenuate the individual echo cancellation signal, or may mute the individual echo cancellation signal and transmit dummy noise (pink noise) to the second site.
 「個別エコー除去信号s61」は、個別送話信号s41から個別エコー信号es1を除去(抑制)した信号である。個別エコー除去信号s61は、個別残留エコー信号res11を含む。以下、個別エコー除去信号s61-s66を区別することなく総称する場合、個別エコー除去信号s61-s66を個別エコー除去信号s60と記載する。「個別残留エコー信号res11」は、個別エコー信号es1と個別除去信号s51との差分信号である。除去部80は、個別送話信号s41から個別除去信号s51を減算して、個別エコー除去信号s61を生成する。以下、個別残留エコー信号res11-res16を区別することなく総称する場合、個別残留エコー信号res11-res16を個別残留エコー信号res10と記載する。 The “individual echo cancellation signal s61” is a signal obtained by removing (suppressing) the individual echo signal es1 from the individual transmission signal s41. The individual echo removal signal s61 includes an individual residual echo signal res11. Hereinafter, when the individual echo cancellation signals s61-s66 are collectively referred to without distinction, the individual echo cancellation signals s61-s66 are referred to as individual echo cancellation signals s60. The “individual residual echo signal res11” is a difference signal between the individual echo signal es1 and the individual removal signal s51. The removing unit 80 subtracts the individual removal signal s51 from the individual transmission signal s41 to generate the individual echo removal signal s61. Hereinafter, when the individual residual echo signals res11 to res16 are collectively referred to without distinction, the individual residual echo signals res11 to res16 are referred to as individual residual echo signals res10.
 Next, the present apparatus 1 uses the control unit 50 to calculate the individual filter coefficient k1 corresponding to the microphone 3a (ST205). The control unit 50 reads, from the storage unit 60, the gain value g1 set for the transmission path of the signal from the microphone 3a (that is, the gain value corresponding to the microphone 3a). The control unit 50 then calculates the individual filter coefficient k1 corresponding to the microphone 3a using a known adaptive algorithm, based on the read gain value g1, the reference signal s3, and the individual echo removal signal s61 (that is, the individual residual echo signal res11 contained in the individual echo removal signal s61).
 The calculated individual filter coefficient k1 is stored in the storage unit 60 (ST206). As a result, the individual filter coefficient k1 stored in the storage unit 60 is updated from its initial value to the calculated value.
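The text states only that a known adaptive algorithm is used. As one possible illustration, the sketch below assumes the normalized least mean squares (NLMS) algorithm, which is a common choice for echo cancellation but is not named in this description. The function names, the step size mu, the regularization constant eps, and the toy echo path are all hypothetical. How the gain value g1 and the environmental measurement results enter the calculation is not specified in this passage, so the sketch omits them.

```python
import random


def nlms_step(coeffs, ref_window, error, mu=0.5, eps=1e-8):
    """One NLMS update: k <- k + mu * e * x / (x.x + eps)."""
    power = sum(x * x for x in ref_window) + eps
    step = mu * error / power
    return [k + step * x for k, x in zip(coeffs, ref_window)]


def learn_individual_filter(reference, transmission, taps, mu=0.5):
    """Adapt an FIR estimate of the echo path from reference and microphone samples."""
    coeffs = [0.0] * taps    # initial value of the individual filter coefficient
    history = [0.0] * taps   # most recent reference samples, newest first
    for x, d in zip(reference, transmission):
        history = [x] + history[:-1]
        estimate = sum(k * h for k, h in zip(coeffs, history))  # removal signal sample
        error = d - estimate                                    # echo removal signal sample
        coeffs = nlms_step(coeffs, history, error, mu)
    return coeffs


# Toy setup (assumption): a 2-tap echo path and a pseudo-random reference signal.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]
echo_path = [0.5, -0.25]
microphone = [
    sum(h * reference[n - i] for i, h in enumerate(echo_path) if n - i >= 0)
    for n in range(len(reference))
]
k1 = learn_individual_filter(reference, microphone, taps=2)
```

In this noise-free toy case the learned coefficients approach the assumed echo path; a real room response would need many more taps.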
 The present apparatus 1 repeats the processing (ST201-ST206) until the individual filter coefficients k1-k6 corresponding to all the microphones 3a-3f have been calculated (“No” in ST207). As described above, the parameters of the adaptive algorithm are determined based on the results of the environmental measurement corresponding to each microphone 3. In other words, the control unit 50 calculates the individual filter coefficient k corresponding to each microphone 3 based on the result of the environmental measurement corresponding to that microphone 3.
 When the individual filter coefficients k1-k6 corresponding to all the microphones 3a-3f have been calculated (“Yes” in ST207), the present apparatus 1 uses the control unit 50 to calculate the filter coefficient F (ST208). The control unit 50 reads, from the storage unit 60, the gain values g1-g6 of the transmission paths of the signals from the microphones 3a-3f and the individual filter coefficients k1-k6, and calculates the filter coefficient F based on the gain values g1-g6 and the individual filter coefficients k1-k6. The filter coefficient F is calculated by combining the individual filter coefficients k1-k6. Hereinafter, when the individual filter coefficients k1-k6 are referred to collectively without distinction, they are referred to as the individual filter coefficient k.
 The individual filter coefficients k1-k6 are combined by multiplying, for each of the microphones 3a-3f, the corresponding individual filter coefficient k by the corresponding gain value g, and adding the results. That is, the filter coefficient F is calculated by adding together the product of the individual filter coefficient k1 and the gain value g1 (microphone 3a), the product of the individual filter coefficient k2 and the gain value g2 (microphone 3b), the product of the individual filter coefficient k3 and the gain value g3 (microphone 3c), the product of the individual filter coefficient k4 and the gain value g4 (microphone 3d), the product of the individual filter coefficient k5 and the gain value g5 (microphone 3e), and the product of the individual filter coefficient k6 and the gain value g6 (microphone 3f).
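The combination described above is a gain-weighted sum of the individual filter coefficients, taken tap by tap. The following minimal sketch illustrates it. The function name is hypothetical, and the example uses three microphones with two taps each instead of six microphones, purely to keep the numbers short.

```python
def synthesize_filter(individual_coeffs, gains):
    """Return F, where F[t] = sum over microphones of gain * k[t]."""
    if len(individual_coeffs) != len(gains):
        raise ValueError("one gain value is required per microphone")
    taps = len(individual_coeffs[0])
    if any(len(k) != taps for k in individual_coeffs):
        raise ValueError("all individual filters must have the same length")
    return [
        sum(g * k[t] for k, g in zip(individual_coeffs, gains))
        for t in range(taps)
    ]


# Hypothetical values, exact in binary floating point.
k = [[1.0, 0.5], [0.5, 0.25], [0.0, 1.0]]   # individual filter coefficients
g = [0.5, 1.0, 2.0]                         # gain values
F = synthesize_filter(k, g)
print(F)  # [1.0, 2.5]
```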
 The calculated filter coefficient F is stored in the storage unit 60 and is also input to (set in) the removal signal generation unit 70 (ST209). As a result, the removal signal generation unit 70 becomes able to generate the removal signal s5 based on the filter coefficient F.
 In this way, the present method calculates the filter coefficient F by calculating and combining the individual filter coefficients k1-k6 corresponding to the microphones 3a-3f. Therefore, unlike a conventional apparatus that includes an echo cancellation unit for each microphone, the present apparatus 1 can remove the echo signal es contained in the signals from the microphones 3a-3f with a single echo cancellation unit (corresponding to the control unit 50, the removal signal generation unit 70, and the removal unit 80). That is, the present apparatus 1 performs echo cancellation on the inputs from the six microphones 3a-3f with one common FIR filter (the removal signal generation unit 70). Consequently, the present apparatus 1 can remove the echo signal es contained in the signals from the microphones 3a-3f with a simpler circuit configuration than the conventional apparatus.
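One way to see why a single common FIR filter can stand in for per-microphone filters is the linearity of FIR filtering: filtering the received signal with the combined coefficient F gives the same output as filtering it with each individual coefficient and adding the gain-weighted results. The sketch below checks this numerically. It assumes that the gains act as scalar weights, and all names and values are hypothetical; integers are used so that the comparison is exact.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: y[n] = sum_i coeffs[i] * signal[n - i]."""
    out = []
    for n in range(len(signal)):
        acc = 0
        for i, c in enumerate(coeffs):
            if n - i >= 0:
                acc += c * signal[n - i]
        out.append(acc)
    return out


received = [1, 2, 0, -1]   # received signal samples
k = [[2, 1], [0, 3]]       # individual filter coefficients for two microphones
g = [1, 2]                 # gain values

# Combined coefficient F = sum of gain * individual coefficient, tap by tap.
F = [sum(gi * ki[t] for ki, gi in zip(k, g)) for t in range(2)]

combined = fir_filter(received, F)
per_mic = [fir_filter(received, ki) for ki in k]
summed = [
    sum(gi * y[n] for y, gi in zip(per_mic, g))
    for n in range(len(received))
]
print(combined == summed)  # True
```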
●Echo signal removal process
 FIG. 6 is a flowchart of the echo signal removal process (ST3).
 FIG. 7 is a functional block diagram showing the signal flow in the echo signal removal process (ST3).
 In FIG. 7, arrows indicate the main signal flows in the echo signal removal process (ST3).
 The “echo signal removal process (ST3)” is a process of removing, from the transmission signal s4, the echo signal es corresponding to the received signal s2 when the signal received by the first input unit 10 contains the received signal s2, for example during a conference between the first site and the second site. As described above, the signal from the first input unit 10 (the received signal s2) is input to the first output unit 20, the control unit 50, and the removal signal generation unit 70.
 First, the present apparatus 1 uses the control unit 50 to detect whether the signal from the first input unit 10 contains the received signal s2, that is, whether the received signal s2 is present (ST301). The control unit 50 detects the presence or absence of the received signal s2 by, for example, comparing the signal (signal level) from the first input unit 10 with a predetermined threshold value V1. When the received signal s2 is present, the transmission signal s4 contains the echo signal es corresponding to the received signal s2.
 The “threshold value V1” is the threshold value that the control unit 50 uses to detect whether the signal from the first input unit 10 contains the received signal s2. The threshold value V1 is stored in the storage unit 60.
 When the signal (signal level) from the first input unit 10 is smaller than the threshold value V1 (the received signal s2 is absent) (“No” in ST301), the present apparatus 1 repeats the detection of the presence or absence of the received signal s2.
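The threshold comparison in ST301 can be sketched as follows. The text does not say how the signal level is measured, so the sketch assumes the RMS value of a block of samples. The function names and the numeric values are hypothetical.

```python
import math


def signal_level(samples):
    """RMS level of a block of samples (assumed level measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def has_received_signal(samples, threshold_v1):
    """ST301: the received signal is treated as present when level >= V1."""
    return signal_level(samples) >= threshold_v1


print(has_received_signal([0.5, -0.5, 0.5, -0.5], 0.1))  # True
print(has_received_signal([0.0, 0.0, 0.0, 0.0], 0.1))    # False
```

The same comparison pattern applies to the voice signal detection against the threshold value V3, with the transmission signal s4 as the input.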
 On the other hand, when the signal from the first input unit 10 is equal to or greater than the threshold value V1 (the received signal s2 is present) (“Yes” in ST301), the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the transmission signal s4 (ST302).
 Next, the present apparatus 1 uses the second input unit 30 to generate the transmission signal s4 (ST303). The transmission signal s4 is input from the second input unit 30 to the control unit 50 and the removal unit 80 via the switching unit 40.
 Next, the present apparatus 1 uses the control unit 50 and the removal signal generation unit 70 to generate the removal signal s5 (ST304). The control unit 50 reads the filter coefficient F from the storage unit 60 and inputs it to (sets it in) the removal signal generation unit 70. The removal signal generation unit 70 generates the removal signal s5 from the received signal s2 based on the filter coefficient F input from the control unit 50. The filter coefficient F is either the filter coefficient F calculated in the initial learning process (ST2) or the filter coefficient F calculated and updated in the update process (ST5) described later.
 Next, the present apparatus 1 uses the removal unit 80 to remove the echo signal es contained in the transmission signal s4 and generate the echo removal signal s6 (ST305). The removal unit 80 generates the echo removal signal s6 based on the transmission signal s4 and the removal signal s5. The echo removal signal s6 is input to the control unit 50 and the second output unit 90.
 Next, the present apparatus 1 uses the control unit 50 to measure the echo return loss (ERL) (ST306).
 The “ERL” is the level difference between the transmission signal s4 and the echo removal signal s6, that is, the magnitude (signal level) of the residual echo signal res contained in the echo removal signal s6. The ERL is affected by, for example, a change in the installation location of the microphone 3 or a fluctuation in the output level of the speaker 2. For example, the ERL deteriorates when a talker moves the microphone 3 and the transfer path of the echo component changes (an environmental change). The control unit 50 measures the ERL based on the signal level of the transmission signal s4 and the signal level of the echo removal signal s6. That is, the control unit 50 measures the ERL by subtracting the signal level of the echo removal signal s6 from the signal level of the transmission signal s4.
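The level subtraction described above can be sketched as follows. The text does not state the unit or the level measure, so the sketch assumes mean power expressed in decibels. The function names, the power floor, and the sample values are hypothetical. The sketch only performs the subtraction; how the result is compared with the threshold value V2, including the sign convention, follows the description in the text and is not modeled here.

```python
import math


def level_db(samples, floor=1e-12):
    """Mean power of a block of samples in dB (assumed level measure)."""
    power = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(power, floor))  # floor avoids log10(0)


def measure_erl(transmission, echo_removed):
    """Level of the transmission signal minus level of the echo removal signal."""
    return level_db(transmission) - level_db(echo_removed)


s4 = [1.0, -1.0, 1.0, -1.0]   # transmission signal samples
s6 = [0.1, -0.1, 0.1, -0.1]   # echo removal signal samples
erl = measure_erl(s4, s6)     # about 20 dB for these values
```

The same calculation applies to the individual ERL, with the individual transmission signal and the individual echo removal signal as inputs.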
 Next, the present apparatus 1 uses the control unit 50 to compare the measured ERL with a predetermined threshold value V2 (ST307). The “threshold value V2” is the threshold value for determining whether the removal of the echo signal es by the present apparatus 1 is sufficient (whether the signal level of the residual echo signal res is large). That is, when the removal of the echo signal es by the present apparatus 1 is insufficient, the ERL is equal to or greater than the threshold value V2 (the ERL has deteriorated). On the other hand, when the removal of the echo signal es by the present apparatus 1 is sufficient, the ERL is smaller than the threshold value V2. The threshold value V2 is the reference value in the present invention. The threshold value V2 is stored in the storage unit 60.
 When the ERL is smaller than the threshold value V2 (“No” in ST307), the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication device 4 at the second site (ST308), and returns to the processing (ST301).
 On the other hand, when the ERL is equal to or greater than the threshold value V2 (“Yes” in ST307), the present apparatus 1 uses the control unit 50 to detect whether the transmission signal s4 contains the voice signal s1 (the presence or absence of the voice signal s1) (ST309). The control unit 50 detects the presence or absence of the voice signal s1 by, for example, comparing the transmission signal s4 (signal level) from the second input unit 30 with a predetermined threshold value V3.
 The “threshold value V3” is the threshold value that the control unit 50 uses to detect whether the transmission signal s4 from the second input unit 30 contains the voice signal s1. The threshold value V3 is stored in the storage unit 60.
 When the signal level of the transmission signal s4 is equal to or greater than the threshold value V3 (the voice signal s1 is present) (“Yes” in ST309), the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication device 4 at the second site (ST308), and returns to the processing (ST301).
 On the other hand, when the signal level of the transmission signal s4 is smaller than the threshold value V3 (the voice signal s1 is absent) (“No” in ST309), the present apparatus 1 uses the second output unit 90 to output the echo removal signal s6 to the communication device 4 at the second site (ST310), and executes the specific process (ST4).
 In this way, when the ERL is equal to or greater than the threshold value V2, the present apparatus 1 executes the specific process (ST4) at a timing when the received signal s2 is present and the voice signal s1 is absent. That is, based on the result of comparing the ERL with the threshold value V2, the present apparatus 1 executes the specific process (ST4) when the transmission signal s4 contains the echo signal es and does not contain the voice signal s1. In other words, when the present apparatus 1 detects an environmental change during execution of the echo signal removal process (ST3), it executes the specific process (ST4).
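The branching in ST307 and ST309 can be summarized in a small decision function. This is a sketch only: the function name, the returned labels, and the numeric values are hypothetical, and the comparisons follow the convention stated in the text, in which an ERL at or above V2 indicates insufficient echo removal.

```python
def next_step(erl, v2, transmission_level, v3):
    """Decide what follows the ERL measurement (ST306)."""
    if erl < v2:
        # ST307 "No": removal is sufficient, output s6 (ST308), back to ST301.
        return "output_and_continue"
    if transmission_level >= v3:
        # ST309 "Yes": a voice signal is present, output s6 (ST308), back to ST301.
        return "output_and_continue"
    # ST309 "No": no voice signal, output s6 (ST310), then run the specific process (ST4).
    return "output_and_run_specific_process"


print(next_step(erl=1.0, v2=3.0, transmission_level=0.0, v3=0.5))
print(next_step(erl=5.0, v2=3.0, transmission_level=0.8, v3=0.5))
print(next_step(erl=5.0, v2=3.0, transmission_level=0.1, v3=0.5))
```

In every branch the echo removal signal s6 is still output to the second site; only the step that follows differs.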
 Note that when the ERL is measured as a negative value, the threshold value V2 is also a negative value, and the present apparatus 1 may reverse the magnitude comparison between the ERL and the threshold value V2 in the processing (ST307) described above. That is, for example, when the negative ERL is equal to or less than the threshold value V2, the present apparatus may use the control unit to detect whether the transmission signal contains a voice signal (the presence or absence of a voice signal).
●Specific process
 FIG. 8 is a flowchart of the specific process (ST4).
 FIG. 9 is a functional block diagram showing the signal flow in the specific process (ST4).
 In FIG. 9, arrows indicate the main signal flows in the specific process (ST4). For convenience of explanation, FIG. 9 shows only the signals corresponding to the signal from the microphone 3a among the microphones 3a-3f.
 The “specific process (ST4)” is a process of identifying each microphone 3 as either a specific microphone or a non-specific microphone. A “specific microphone” is a microphone 3 whose corresponding individual filter coefficient k is not appropriate (has deviated), that is, a microphone 3 whose individual filter coefficient k is to be updated. Deterioration of the ERL is caused by a deviation of the filter coefficient F with respect to the echo signal es, that is, by deviations of the individual filter coefficients k1-k6 with respect to the individual echo signals es1-es6. The individual filter coefficient k corresponding to a specific microphone therefore needs to be updated to an appropriate value. A “non-specific microphone” is a microphone 3 whose corresponding individual filter coefficient k is appropriate (has not deviated), that is, a microphone 3 whose individual filter coefficient k is not to be updated.
 First, the present apparatus 1 uses the control unit 50 to detect whether the transmission signal s4 contains the voice signal s1 (the presence or absence of the voice signal s1) (ST401). The detection of the presence or absence of the voice signal s1 (ST401) is the same processing as the detection of the presence or absence of the voice signal s1 (ST309) in the echo signal removal process (ST3).
 When the transmission signal s4 does not contain the voice signal s1 (the voice signal s1 is absent) (“No” in ST401), the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the signal from the microphone 3a (ST402).
 Next, the present apparatus 1 uses the second input unit 30 to generate the individual transmission signal s41 based on the signal from the microphone 3a (ST403). The individual transmission signal s41 is input to the removal unit 80 via the switching unit 40.
 Next, the present apparatus 1 uses the control unit 50 and the removal signal generation unit 70 to generate the individual removal signal (specific removal signal) s51 (ST404). The control unit 50 reads the individual filter coefficient k1 corresponding to the microphone 3a from the storage unit 60 and inputs it to the removal signal generation unit 70. The removal signal generation unit 70 generates the individual removal signal s51 based on the received signal s2 and the individual filter coefficient k1. The individual removal signal s51 is input to the removal unit 80.
 Next, the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es1 contained in the individual transmission signal s41 and generate the individual echo removal signal s61 (ST405). The removal unit 80 generates the individual echo removal signal s61 based on the individual transmission signal s41 and the individual removal signal s51. The individual echo removal signal s61 is input from the removal unit 80 to the control unit 50 and the second output unit 90.
 Next, the present apparatus 1 uses the control unit 50 to measure the individual ERL (ST406).
 The “individual ERL” is the level difference between the individual transmission signal s41 and the individual echo removal signal s61, that is, the magnitude (signal level) of the individual residual echo signal res11 contained in the individual echo removal signal s61. The control unit 50 measures the individual ERL based on the signal level of the individual transmission signal s41 and the signal level of the individual echo removal signal s61. That is, the control unit 50 measures the individual ERL by subtracting the signal level of the individual echo removal signal s61 from the signal level of the individual transmission signal s41.
 Next, the present apparatus 1 uses the control unit 50 to compare the measured individual ERL with a predetermined threshold value V4 (ST407).
 The “threshold value V4” is the threshold value for determining whether the removal of the individual echo signal es1 by the present apparatus 1 is sufficient (whether the signal level of the individual residual echo signal res11 is large). That is, when the removal of the individual echo signal es1 by the present apparatus 1 is insufficient, the individual ERL is equal to or greater than the threshold value V4 (the individual ERL has deteriorated). On the other hand, when the removal of the individual echo signal es1 by the present apparatus 1 is sufficient, the individual ERL is smaller than the threshold value V4. The threshold value V4 is the individual reference value in the present invention. The threshold value V4 is stored in the storage unit 60.
 When the individual ERL is smaller than the threshold value V4 (“No” in ST407), the present apparatus 1 identifies the microphone 3a as a non-specific microphone (ST408). On the other hand, when the individual ERL is equal to or greater than the threshold value V4 (“Yes” in ST407), the present apparatus 1 identifies the microphone 3a as a specific microphone (ST409). The identification result is stored in the storage unit 60 (ST410). At this time, the individual echo removal signal s61 is output from the second output unit 90.
 The present apparatus 1 repeats the processing (ST401-ST410) on the signals from the remaining microphones 3b-3f until all the microphones 3a-3f have been identified as either a specific microphone or a non-specific microphone (“No” in ST411). That is, the present apparatus 1 uses the switching unit 40 to input the individual transmission signals s42-s46 corresponding to the remaining microphones 3b-3f to the removal unit 80 one at a time, and determines each of the microphones 3a-3f to be either a specific microphone or a non-specific microphone.
 When every one of the microphones 3a-3f has been identified as either a specific microphone or a non-specific microphone (“Yes” in ST411), the present apparatus 1 executes the update process (ST5). At this point, the microphones 3 consist of specific microphones and non-specific microphones.
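The per-microphone classification in ST407 to ST411 can be sketched as a loop over measured individual ERL values. The function name, the dictionary layout, and the numeric values are hypothetical. The comparison follows the text, in which an individual ERL at or above V4 marks a specific microphone.

```python
def classify_microphones(individual_erls, v4):
    """Split microphones into specific (ERL >= V4) and non-specific (ERL < V4)."""
    specific, non_specific = [], []
    for mic, erl in individual_erls.items():
        if erl >= v4:
            specific.append(mic)       # ST409: coefficient needs updating
        else:
            non_specific.append(mic)   # ST408: coefficient is left as it is
    return specific, non_specific


# Hypothetical individual ERL values for the six microphones.
measured = {"3a": 1.0, "3b": 2.0, "3c": 6.5, "3d": 0.5, "3e": 4.0, "3f": 1.5}
specific, non_specific = classify_microphones(measured, v4=4.0)
print(specific)      # ['3c', '3e']
print(non_specific)  # ['3a', '3b', '3d', '3f']
```

Only the microphones in the first list would then be passed to the update process (ST5).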
 When the transmission signal s4 contains the voice signal s1 (the voice signal s1 is present) (“Yes” in ST401), the present apparatus 1 ends (interrupts) the specific process (ST4) and executes the echo signal removal process (ST3). That is, when the control unit 50 detects the voice signal s1 before the specific process (ST4) is complete, the present apparatus 1 interrupts the specific process (ST4) and executes the echo signal removal process (ST3). If the specific process (ST4) has been interrupted, then when the present apparatus 1 determines in the echo signal removal process (ST3) that the transmission signal s4 does not contain the voice signal s1, it resumes the specific process (ST4) from the interrupted point, that is, from the processing of the signals from the microphones 3 that have not yet been identified as either a specific microphone or a non-specific microphone. For example, if the specific process (ST4) was interrupted after it had been executed for the microphones up to the microphone 3d among the microphones 3a-3f, the specific process (ST4) resumes from the microphone 3e.
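The interrupt-and-resume behavior can be sketched by keeping a list of microphones that are still pending. The class name, the method names, and the callback signatures are hypothetical, and the ERL values are made up for illustration.

```python
class SpecificProcess:
    """Resumable classification of microphones (sketch of ST401 to ST411)."""

    def __init__(self, microphones, v4):
        self.pending = list(microphones)  # microphones not yet identified
        self.v4 = v4
        self.results = {}

    def run(self, voice_present, measure_individual_erl):
        """Process pending microphones; return True when all are identified."""
        while self.pending:
            if voice_present():           # ST401 "Yes": interrupt, back to ST3
                return False
            mic = self.pending[0]
            erl = measure_individual_erl(mic)
            self.results[mic] = "specific" if erl >= self.v4 else "non-specific"
            self.pending.pop(0)           # identified, so not revisited on resume
        return True


erls = {"3a": 1.0, "3b": 5.0, "3c": 2.0, "3d": 6.0, "3e": 0.5, "3f": 1.0}
process = SpecificProcess(erls.keys(), v4=4.0)

flags = iter([False, False, True])        # voice appears before the third microphone
first = process.run(lambda: next(flags), erls.get)
resume_from = process.pending[0]          # microphone where the process resumes
second = process.run(lambda: False, erls.get)
```

Restarting from the beginning, which the text also permits, would correspond to creating a new instance instead of calling run again.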
 Note that when the specific process has been interrupted, the present apparatus may instead execute the specific process again from the beginning, that is, for all the microphones.
 Also, when the individual ERL is measured as a negative value, the threshold value V4 is also a negative value, and the present apparatus 1 may reverse the magnitude comparison between the individual ERL and the threshold value V4 in the processing (ST407) described above. That is, for example, when the negative individual ERL is equal to or less than the threshold value V4, the present apparatus may identify the microphone 3 corresponding to that individual ERL as a specific microphone.
 In this way, based on the result of comparing the individual ERL with the individual reference value (threshold value V4), the present apparatus 1 determines, from among the plurality of microphones 3a-3f, the specific microphones whose individual filter coefficients k are to be updated and the non-specific microphones whose individual filter coefficients k are not to be updated. That is, when the ERL has deteriorated, the present apparatus 1 determines the specific microphones at a timing when the transmission signal s4 contains the echo signal es and does not contain the voice signal s1. The present apparatus 1 thereby limits the microphones 3 whose individual filter coefficients k need to be updated, and reduces the time and processing load required to update the individual filter coefficients k and the filter coefficient F.
●Update process
 FIG. 10 is a flowchart of the update process (ST5).
 FIG. 11 is a functional block diagram showing the signal flow in the update process (ST5).
 In FIG. 11, arrows indicate the main signal flows in the update process (ST5). FIG. 11 shows only the signals corresponding to the signal from the microphone 3c.
 The “update process (ST5)” is a process of updating the filter coefficient F by updating the individual filter coefficient k corresponding to each microphone 3 identified as a specific microphone. For example, when the microphone 3a has been identified as a specific microphone, the present apparatus 1 updates the individual filter coefficient k1 corresponding to the microphone 3a and thereby updates the filter coefficient F. When the microphones 3e and 3f have been identified as specific microphones, the present apparatus 1 updates the individual filter coefficients k5 and k6 corresponding to the microphones 3e and 3f and thereby updates the filter coefficient F. The following description takes as an example the case where the microphone 3c has been identified as a specific microphone.
 First, the present apparatus 1 uses the control unit 50 to detect whether the transmission signal s4 (or the individual transmission signal s43) contains the voice signal s1 (the presence or absence of the voice signal s1) (ST501). The detection of the presence or absence of the voice signal s1 (ST501) is the same processing as the detection of the presence or absence of the voice signal s1 (ST309) in the echo signal removal process (ST3).
 Next, the present apparatus 1 uses the switching unit 40 and the control unit 50 to switch the transmission path of the second input unit 30 to the transmission path of the signal from the specific microphone (the microphone 3c) (ST502).
 次いで、本装置1は、特定マイクロホン(マイクロホン3c)からの信号に基づいて、個別送話信号s43を生成する(ST503)。 Next, this apparatus 1 generates an individual transmission signal s43 based on a signal from a specific microphone (microphone 3c) (ST503).
 次いで、本装置1は、制御部50と除去信号生成部70とを用いて、個別除去信号s53を生成する(ST504)。制御部50は、記憶部60から特定マイクロホンに対応する個別フィルタ係数k3を読み出して、除去信号生成部70に入力する。除去信号生成部70は、受話信号s2と個別フィルタ係数k3とに基づいて、個別除去信号s53を生成する。個別除去信号s53は、本発明における特定除去信号である。個別除去信号s53は、除去部80に入力される。 Next, the present apparatus 1 generates an individual removal signal s53 using the control unit 50 and the removal signal generation unit 70 (ST504). The control unit 50 reads out the individual filter coefficient k3 corresponding to the specific microphone from the storage unit 60 and inputs it to the removal signal generation unit 70. The removal signal generation unit 70 generates an individual removal signal s53 based on the received signal s2 and the individual filter coefficient k3. The individual removal signal s53 is a specific removal signal in the present invention. The individual removal signal s53 is input to the removal unit 80.
 Next, the present apparatus 1 uses the removal unit 80 to remove the individual echo signal es3 included in the individual transmission signal s43 and generate the individual echo removal signal s63 (ST505). The individual echo removal signal s63 is the specific echo removal signal in the present invention. The individual echo removal signal s63 is input to the control unit 50 and the second output unit 90.
 Next, the present apparatus 1 uses the control unit 50 to measure the individual echo return loss (individual ERL) (ST506).
 Next, the present apparatus 1 uses the control unit 50 to compare the measured individual ERL with a predetermined threshold V4 (ST507).
 When the individual ERL is equal to or greater than the threshold V4 ("Yes" in ST507), the present apparatus 1 uses the control unit 50 to calculate the individual filter coefficient k3 (ST508). The control unit 50 reads from the storage unit 60 the gain value g3 set for the transmission path of the signal from the specific microphone. The control unit 50 then calculates the individual filter coefficient k3 based on the read gain value g3, the individual echo removal signal s63 (that is, the individual residual echo signal res13 included in the individual (specific) echo removal signal s63), the received signal s2, and the environment measurement result.
 Next, the present apparatus 1 stores the calculated individual filter coefficient k3 in the storage unit 60, that is, updates the individual filter coefficient k3 stored in the storage unit 60 (ST509), and returns to the process (ST504).
 On the other hand, when the individual ERL is smaller than the threshold V4 ("No" in ST507), the present apparatus 1 uses the control unit 50 to update the filter coefficient F stored in the storage unit 60 (ST510). The control unit 50 reads from the storage unit 60 the updated individual filter coefficient k3 corresponding to the specific microphone, the individual filter coefficients k1, k2, and k4-k6 corresponding to the non-specific microphones, and the gain values g1-g6 set for the respective transmission paths, and calculates the filter coefficient F. The filter coefficient F is calculated in the same manner as in the process (ST208) of the initial learning process (ST2).
 Next, the present apparatus 1 stores the calculated filter coefficient F in the storage unit 60, that is, updates the filter coefficient F stored in the storage unit 60 (ST511), and returns to the echo signal removal process (ST3).
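 The loop from ST504 to ST509 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method of the embodiment: this passage does not give the formula used in ST508, so a normalized LMS (NLMS) sweep stands in for it; the individual ERL is assumed to be the level of the individual echo removal signal relative to the individual transmission signal in dB (so a larger value means worse cancellation, matching the comparison against V4); the gain value g3 and the environment measurement result are omitted. The function names, the step size `mu`, and the test data are hypothetical.

```python
import math
import random

def fir(coeffs, ref, n):
    """FIR output at sample n: sum of coeffs[j] * ref[n - j] over valid taps."""
    return sum(c * ref[n - j] for j, c in enumerate(coeffs) if n - j >= 0)

def level_db(x):
    """Mean power of x in dB; the small floor avoids log10(0)."""
    return 10.0 * math.log10(sum(v * v for v in x) / len(x) + 1e-12)

def nlms_pass(k, s2, s43, mu=0.5):
    """Assumed stand-in for ST508: one NLMS sweep over the block."""
    k = list(k)
    for n in range(len(s2)):
        window = [s2[n - j] if n - j >= 0 else 0.0 for j in range(len(k))]
        e = s43[n] - sum(kj * xj for kj, xj in zip(k, window))  # residual echo
        norm = sum(xj * xj for xj in window) + 1e-9
        k = [kj + mu * e * xj / norm for kj, xj in zip(k, window)]
    return k

def update_individual_coefficient(k, s2, s43, v4_db, max_passes=100):
    """Repeat ST504-ST509 until the individual ERL drops below V4."""
    for _ in range(max_passes):
        s63 = [s43[n] - fir(k, s2, n) for n in range(len(s2))]  # ST504, ST505
        erl = level_db(s63) - level_db(s43)                      # ST506 (assumed definition)
        if erl < v4_db:                                          # ST507 "No"
            break
        k = nlms_pass(k, s2, s43)                                # ST508, ST509
    return k

# Hypothetical data: a 3-tap echo path seen by microphone 3c, no near-end voice.
rng = random.Random(0)
s2 = [rng.uniform(-1.0, 1.0) for _ in range(400)]
h3 = [0.5, -0.3, 0.1]
s43 = [fir(h3, s2, n) for n in range(len(s2))]
k3 = update_individual_coefficient([0.0, 0.0, 0.0], s2, s43, v4_db=-40.0)
```

 In this sketch the returned `k3` would then be stored and used, together with the unchanged coefficients of the non-specific microphones, to recompute F (ST510, ST511).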
 In this way, in the specific process (ST4), the present apparatus 1 identifies the microphone 3 whose individual ERL has deteriorated as the specific microphone, and executes the update process (ST5) only for the specific microphone. As a result, the processing load for updating the filter coefficient F is reduced and the processing time is shortened.
 In addition, in the echo signal removal process (ST3), the present apparatus 1 constantly compares the ERL with the threshold V2 (that is, it monitors the ERL). When the ERL is equal to or greater than the threshold V2, the present apparatus 1 executes the specific process (ST4) and the update process (ST5) at a timing when the transmission signal s4 includes the echo signal es and does not include the audio signal s1. In the specific process (ST4), the present apparatus 1 compares the individual ERL with the threshold V4 for each microphone 3. When an individual ERL is equal to or greater than the threshold V4, the present apparatus 1 determines the specific microphone whose individual filter coefficient k is to be updated. In the update process (ST5), the present apparatus 1 calculates the individual filter coefficient k corresponding to the specific microphone based on the received signal s2 and the individual residual echo signal res10 included in the individual echo removal signal (specific echo removal signal) s60. The present apparatus 1 then calculates and updates the filter coefficient F based on the individual filter coefficient k corresponding to the specific microphone and the individual filter coefficients k corresponding to the non-specific microphones.
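 The trigger condition described above can be written as a single predicate. The sketch below is illustrative only; the function name, the argument names, and the use of dB values are assumptions, and the ERL is taken to grow as cancellation gets worse, as the comparison "equal to or greater than V2" implies.

```python
def should_run_update(erl_db, v2_db, echo_present, voice_present):
    """True when ST4 and ST5 should run: the ERL has reached V2 or more,
    the transmission signal s4 contains echo, and it contains no near-end voice."""
    return erl_db >= v2_db and echo_present and not voice_present

# Hypothetical readings: threshold V2 = -20 dB.
run_now = should_run_update(-12.0, -20.0, echo_present=True, voice_present=False)
blocked_by_voice = should_run_update(-12.0, -20.0, echo_present=True, voice_present=True)
erl_still_good = should_run_update(-35.0, -20.0, echo_present=True, voice_present=False)
```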
●Summary
 According to the embodiment described above, the control unit 50 calculates the individual filter coefficients k1-k6 corresponding to each of the plurality of microphones 3a-3f, and combines the individual filter coefficients k1-k6 to calculate the filter coefficient F. The removal signal generation unit 70 generates the removal signal s5 based on the calculated filter coefficient F. The removal unit 80 removes the echo signal es included in the transmission signal s4 based on the transmission signal s4 and the removal signal s5 (that is, it generates the echo removal signal s6). Therefore, unlike a conventional apparatus that has an echo cancellation unit for each of a plurality of microphones, the present apparatus 1 can remove the echo signal es included in the signals from the plurality of microphones 3 (multiple channels) with one common FIR filter (the removal signal generation unit 70). That is, the present apparatus 1 realizes a simpler circuit configuration than the conventional apparatus, because it uses a single common FIR filter to remove the echo signal es included in the signals from the plurality of microphones 3.
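 The reason one common FIR filter can serve several microphones follows from linearity: filtering the received signal with the sum of several coefficient sets gives the same output as filtering with each set and summing the results. The sketch below checks this numerically. It assumes that the combination of individual filter coefficients is a tap-wise sum and that each gain value is already reflected in its individual coefficient; the actual rule of ST208 lies outside this passage, and all names and values here are hypothetical.

```python
def fir_filter(coeffs, ref):
    """Full-length FIR output: y[n] = sum_j coeffs[j] * ref[n - j]."""
    return [sum(c * ref[n - j] for j, c in enumerate(coeffs) if n - j >= 0)
            for n in range(len(ref))]

def combine_coefficients(individual):
    """Tap-wise sum of equal-length per-microphone coefficient lists
    (assumed synthesis rule, not confirmed by the text)."""
    return [sum(taps) for taps in zip(*individual)]

s2 = [1.0, 0.5, -0.25, 0.0, 0.75]                # received signal (made-up samples)
k = [[0.5, 0.25], [0.125, -0.5], [0.0, 0.25]]    # individual coefficients for 3 microphones
F = combine_coefficients(k)

s5 = fir_filter(F, s2)                           # one common filter
per_mic = [fir_filter(ki, s2) for ki in k]       # one filter per microphone
summed = [sum(v) for v in zip(*per_mic)]
```

 Here `s5` and `summed` agree sample by sample, which is why a removal signal for the mixed transmission signal can be produced by a single filter.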
 Further, according to the embodiment described above, the control unit 50 calculates (updates) the filter coefficient F when the transmission signal s4 does not include the audio signal s1 and includes the echo signal es (that is, when the received signal s2 is present). Therefore, the present apparatus 1 reduces the processing load of calculating (updating) the filter coefficient F compared with a conventional apparatus that constantly calculates (updates) the filter coefficient.
 Further, according to the embodiment described above, when the transmission signal s4 does not include the audio signal s1 and includes the echo signal es (that is, when the received signal s2 is present), the switching unit 40 inputs the individual transmission signals s41-s46 to the control unit 50 while switching among them. The control unit 50 calculates the individual filter coefficients k1-k6 corresponding to the microphones 3a-3f based on the signals from each of the plurality of microphones 3a-3f. That is, the present apparatus 1 calculates the individual filter coefficients k1-k6 while the switching unit 40 switches among the individual transmission signals s41-s46. Therefore, the present apparatus 1 can calculate the individual filter coefficients k1-k6 corresponding to the six microphones 3a-3f with one common FIR filter (the removal signal generation unit 70). In other words, the present apparatus 1 calculates the individual filter coefficients k corresponding to the plurality of microphones 3 with a simple circuit configuration, and calculates the filter coefficient F based on those individual filter coefficients k. As a result, the present apparatus 1 removes the echo signal es included in the signals from the plurality of microphones 3 with a simple circuit configuration.
 Furthermore, according to the embodiment described above, the control unit 50 calculates the individual filter coefficient k based on the received signal s2 and the individual residual echo signal res10 included in the individual echo removal signal s60. That is, by repeatedly calculating the individual filter coefficient k so that the individual residual echo signal res10 approaches "0" as closely as possible, the present apparatus 1 improves the accuracy of the filter coefficient F and reliably removes (suppresses) the echo signal es from the transmission signal s4.
 Furthermore, according to the embodiment described above, the control unit 50 updates the individual filter coefficients k1-k6 based on the gain values g1-g6 corresponding to each of the plurality of microphones 3a-3f. Therefore, the present apparatus 1 can calculate the individual filter coefficients k1-k6 using the gain values g1-g6 that were in effect when the microphones 3a-3f picked up the echo component. As a result, the present apparatus 1 can improve the accuracy of the filter coefficient F and reliably remove (suppress) the echo signal es from the transmission signal s4.
 Furthermore, according to the embodiment described above, the control unit 50 constantly measures the ERL in the echo signal removal process (ST3). The control unit 50 then updates the filter coefficient F stored in the storage unit 60 when the ERL is equal to or greater than the reference value (threshold V2) and the transmission signal s4 does not include the audio signal s1. That is, the present apparatus 1 detects an environmental change at the timing when the ERL deteriorates, and updates the filter coefficient F. In other words, the present apparatus 1 reduces the processing load of calculating (updating) the filter coefficient F compared with a conventional apparatus that constantly calculates (updates) the filter coefficient F.
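 A possible form of the ERL measurement is sketched below. Claim 10 states only that the echo return loss is measured from the signal level of the transmission signal and the signal level of the echo removal signal; the exact formula is not given in this passage. The sketch therefore assumes an RMS level ratio in dB, oriented so that a larger value means less echo was removed, which is consistent with treating "ERL equal to or greater than V2" as deterioration. The function names and sample values are hypothetical.

```python
import math

def rms(x):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def erl_db(s4, s6):
    """Level of the echo removal signal s6 relative to the transmission
    signal s4, in dB (assumed definition; larger means worse cancellation)."""
    return 20.0 * math.log10((rms(s6) + 1e-12) / (rms(s4) + 1e-12))

# Hypothetical block: the canceller leaves one tenth of the echo amplitude.
s4 = [1.0, -1.0, 1.0, -1.0]
s6 = [0.1, -0.1, 0.1, -0.1]
measured = erl_db(s4, s6)        # close to -20 dB
needs_update = measured >= -30.0 # comparison against a made-up threshold V2
```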
 Furthermore, according to the embodiment described above, the control unit 50 measures the individual ERLs based on the result of comparing the ERL with the reference value (threshold V2). As a result, when the ERL deteriorates, the present apparatus 1 detects a deviation of the filter coefficient F (a deterioration of the effect of removing or suppressing the echo signal es) from the ERL measurement result for each of the microphones 3a-3f.
 Furthermore, according to the embodiment described above, the control unit 50 determines, from among the plurality of microphones 3a-3f, the specific microphone whose individual filter coefficient k is to be updated, based on the result of comparing the individual ERL with the individual reference value (threshold V4). That is, by determining the specific microphone when the ERL deteriorates, the present apparatus 1 reduces the time and processing load required to update the individual filter coefficient k and the filter coefficient F.
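 The selection of specific microphones can be sketched as a simple filter over the per-microphone measurements. The dictionary layout, the function name, and the numeric values are assumptions for illustration; only the comparison "individual ERL equal to or greater than V4" comes from the description.

```python
def select_specific_microphones(individual_erl_db, v4_db):
    """Return the labels of microphones whose individual ERL is V4 or more,
    i.e. those whose individual filter coefficient k should be updated."""
    return [mic for mic, erl in individual_erl_db.items() if erl >= v4_db]

# Hypothetical individual ERL readings in dB for microphones 3a-3f.
readings = {"3a": -35.0, "3b": -33.0, "3c": -12.0,
            "3d": -31.0, "3e": -36.0, "3f": -34.0}
specific = select_specific_microphones(readings, v4_db=-20.0)
non_specific = [mic for mic in readings if mic not in specific]
```

 With these made-up readings only microphone 3c is selected, so only k3 would be recalculated while k1, k2, and k4-k6 are reused as stored.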
 Furthermore, according to the embodiment described above, the control unit 50 calculates the individual filter coefficient k of the specific microphone. The control unit 50 then updates the filter coefficient F stored in the storage unit 60 based on the calculated individual filter coefficient k of the specific microphone and the individual filter coefficients k of the non-specific microphones. Therefore, the present apparatus 1 updates the filter coefficient F by calculating (updating) only the individual filter coefficient k of the specific microphone. That is, the present apparatus 1 reduces the time and processing load required to update the individual filter coefficient k and the filter coefficient F.
 Furthermore, according to the embodiment described above, the control unit 50 performs an environment measurement for each microphone 3 and calculates the individual filter coefficient k based on the result of that environment measurement. Therefore, the present apparatus 1 can calculate the filter coefficient F according to the environment of the room (space) in which the present apparatus 1 is installed.
 As described above, according to the embodiment, the present apparatus 1 calculates the filter coefficient F through the initialization process (ST1) and the initial learning process (ST2), and executes echo cancellation based on that filter coefficient F (that is, it executes the echo signal removal process (ST3)). When the present apparatus 1 detects an environmental change during the echo signal removal process (ST3), it executes the specific process (ST4) and the update process (ST5), thereby adjusting the filter coefficient F automatically. As a result, the present apparatus 1 performs multi-channel echo cancellation with one common filter and automatically follows environmental changes while doing so.
 The number of microphones connected to the second input unit need only be two or more, and is not limited to six.
 In the embodiment described above, the present apparatus 1 includes one set of the removal signal generation unit 70 and the removal unit 80. Therefore, during the specific process (ST4) and the update process (ST5), the removal signal generation unit 70 is occupied with generating the individual removal signal s50. As a result, the present apparatus 1 does not execute the echo signal removal process (ST3) at the same time as the specific process (ST4) or the update process (ST5).
 Instead, the present apparatus may include two sets of a removal signal generation unit and a removal unit: one set used for the echo signal removal process, and one set used for the specific process and the update process.
 FIG. 12 is a functional block diagram showing another embodiment of the present apparatus.
 The figure shows that the present apparatus 1A is an audio signal processing apparatus including a first removal signal generation unit 70A, a second removal signal generation unit 70B, a first removal unit 80A, and a second removal unit 80B. The first removal signal generation unit 70A and the first removal unit 80A execute the specific process (ST4) and the update process (ST5). The second removal signal generation unit 70B and the second removal unit 80B execute the echo signal removal process (ST3).
 With this configuration, the present apparatus 1A can execute the echo signal removal process (ST3) concurrently with the specific process (ST4) and the update process (ST5). Therefore, the present apparatus 1A can remove (suppress) the echo signal es included in the signals from two or more microphones 3 with a simple circuit configuration that includes two echo canceller units.
DESCRIPTION OF SYMBOLS
1     Audio signal processing apparatus
1A    Audio signal processing apparatus
20    First output unit (output unit)
30    Second input unit (input unit)
40    Switching unit
50    Control unit
60    Storage unit
70    Removal signal generation unit
70A   First removal signal generation unit
70B   Second removal signal generation unit
80    Removal unit
80A   First removal unit
80B   Second removal unit
s1    Audio signal
s2    Received signal
s3    Reference signal
s4    Transmission signal
s40   Individual transmission signal
s5    Removal signal
s50   Individual removal signal
s6    Echo removal signal
s60   Individual echo removal signal
es    Echo signal
res   Residual echo signal
res10 Individual residual echo signal
F     Filter coefficient
k     Individual filter coefficient

Claims (19)

  1.  An audio signal processing apparatus comprising:
      an output unit that outputs a received signal;
      an input unit that generates a transmission signal by combining signals input from each of a plurality of microphones, each of which picks up an echo component of the received signal and a speaker's voice and generates an echo signal corresponding to the echo component and an audio signal corresponding to the speaker's voice;
      a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal included in the transmission signal;
      a control unit that calculates the filter coefficient; and
      a removal unit that generates an echo removal signal based on the transmission signal and the removal signal,
      wherein the control unit calculates individual filter coefficients corresponding to each of the plurality of microphones, and combines the individual filter coefficients to calculate the filter coefficient.
  2.  The audio signal processing apparatus according to claim 1,
      wherein the control unit calculates the filter coefficient when the audio signal is not included in the transmission signal.
  3.  The audio signal processing apparatus according to claim 2,
      wherein the control unit calculates the filter coefficient when the echo signal is included in the transmission signal.
  4.  The audio signal processing apparatus according to claim 1,
      wherein the removal signal generation unit generates the removal signal based on the received signal and the filter coefficient.
  5.  The audio signal processing apparatus according to claim 1,
      wherein the input unit generates individual transmission signals corresponding to each of the plurality of microphones based on the signals input from each of the plurality of microphones, and combines the individual transmission signals to generate the transmission signal,
      the apparatus further comprising a switching unit that switches which of the individual transmission signals corresponding to each of the plurality of microphones is input to the removal unit,
      wherein, when the audio signal is not included in the transmission signal, the switching unit inputs the individual transmission signals corresponding to each of the plurality of microphones to the removal unit while switching among them.
  6.  The audio signal processing apparatus according to claim 5,
      wherein, when the echo signal is included in the transmission signal, the switching unit inputs the individual transmission signals corresponding to each of the plurality of microphones to the removal unit while switching among them.
  7.  The audio signal processing apparatus according to claim 5,
      wherein the removal signal generation unit generates an individual removal signal for removing the echo signal included in the individual transmission signal,
      the removal unit generates an individual echo removal signal based on the individual transmission signal and the individual removal signal, and
      the control unit calculates the individual filter coefficient based on the received signal and an individual residual echo signal included in the individual echo removal signal.
  8.  The audio signal processing apparatus according to claim 7,
      wherein the control unit calculates the individual filter coefficient based on a gain value corresponding to each of the plurality of microphones.
  9.  The audio signal processing apparatus according to claim 1, further comprising
      a storage unit that stores the filter coefficient,
      wherein the control unit updates the filter coefficient stored in the storage unit when the audio signal is not included in the transmission signal.
  10.  The audio signal processing apparatus according to claim 9,
      wherein the storage unit stores a reference value, and
      the control unit
      measures an echo return loss based on a signal level of the transmission signal and a signal level of the echo removal signal, and
      updates the filter coefficient based on a result of comparing the echo return loss with the reference value.
  11.  The audio signal processing apparatus according to claim 10,
      wherein the control unit measures an individual echo return loss corresponding to each of the plurality of microphones based on the comparison result.
  12.  The audio signal processing apparatus according to claim 11,
      wherein the storage unit stores an individual reference value, and
      the control unit
      compares, for each of the plurality of microphones, the individual echo return loss with the individual reference value, and
      determines, from among the plurality of microphones, a specific microphone whose individual filter coefficient is to be updated, based on a result of comparing the individual echo return loss with the individual reference value.
  13.  The audio signal processing apparatus according to claim 12,
      wherein the plurality of microphones consist of
      the specific microphone and
      a non-specific microphone different from the specific microphone,
      the removal signal generation unit generates, based on the individual filter coefficient corresponding to the specific microphone, a specific removal signal for removing the echo signal included in the signal from the specific microphone,
      the removal unit generates a specific echo removal signal based on the signal from the specific microphone and the specific removal signal, and
      the control unit
      calculates the individual filter coefficient corresponding to the specific microphone based on the received signal and an individual residual echo signal included in the specific echo removal signal, and
      updates the filter coefficient stored in the storage unit based on the individual filter coefficient corresponding to the non-specific microphone and the individual filter coefficient corresponding to the specific microphone.
  14.  The audio signal processing apparatus according to claim 1,
      wherein the control unit
      performs an environment measurement for each of the plurality of microphones, and
      calculates the individual filter coefficients based on the results of the environment measurement corresponding to each of the plurality of microphones.
  15.  An audio signal processing program that causes a computer to function as the audio signal processing apparatus according to any one of claims 1 to 14.
  16.  An audio signal processing method executed by an audio signal processing apparatus comprising:
      an output unit that outputs a received signal;
      an input unit that generates a transmission signal by combining signals input from each of a plurality of microphones, each of which picks up an echo component of the received signal and a speaker's voice and generates an echo signal corresponding to the echo component and an audio signal corresponding to the speaker's voice;
      a removal signal generation unit that generates, based on a filter coefficient, a removal signal for removing the echo signal included in the transmission signal;
      a control unit that calculates the filter coefficient; and
      a removal unit that generates an echo removal signal based on the transmission signal and the removal signal,
      the method comprising:
      calculating, by the control unit, individual filter coefficients corresponding to each of the plurality of microphones; and
      calculating, by the control unit, the filter coefficient by combining the individual filter coefficients.
  17.  The audio signal processing method according to claim 16,
      wherein the control unit calculates the individual filter coefficients when the transmission signal includes the echo signal and does not include the audio signal.
  18.  The audio signal processing method according to claim 16,
      wherein the audio signal processing apparatus further comprises a storage unit that stores the filter coefficient, and
      the control unit updates the filter coefficient stored in the storage unit when the audio signal is not included in the transmission signal.
  19.  The audio signal processing method according to claim 18,
      wherein the storage unit stores a reference value and an individual reference value, and
      the control unit, for each of the plurality of microphones,
      measures an echo return loss based on a signal level of the transmission signal and a signal level of the echo removal signal,
      measures an individual echo return loss corresponding to each of the plurality of microphones based on a result of comparing the echo return loss with the reference value,
      determines, from among the plurality of microphones, a specific microphone whose individual filter coefficient is to be updated, based on a result of comparing the individual echo return loss with the individual reference value, and
      updates the filter coefficient.
PCT/JP2018/010330 2017-06-12 2018-03-15 Voice signal processing device, voice signal processing method and voice signal processing program WO2018230062A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201880038882.7A CN110741563B (en) 2017-06-12 2018-03-15 Speech signal processing apparatus, method and storage medium thereof
US16/621,861 US11227618B2 (en) 2017-06-12 2018-03-15 Sound signal processing device, sound signal processing method and sound signal processing program
EP18816708.4A EP3641141A4 (en) 2017-06-12 2018-03-15 Voice signal processing device, voice signal processing method and voice signal processing program
JP2019525088A JP7122756B2 (en) 2017-06-12 2018-03-15 Audio signal processing device, audio signal processing method, and audio signal processing program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017-114908 2017-06-12
JP2017114908 2017-06-12

Publications (1)

Publication Number Publication Date
WO2018230062A1 (en) 2018-12-20

Family

ID=64660460

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/010330 WO2018230062A1 (en) 2017-06-12 2018-03-15 Voice signal processing device, voice signal processing method and voice signal processing program

Country Status (5)

Country Link
US (1) US11227618B2 (en)
EP (1) EP3641141A4 (en)
JP (2) JP7122756B2 (en)
CN (1) CN110741563B (en)
WO (1) WO2018230062A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11502723B2 (en) * 2018-02-28 2022-11-15 Maxlinear, Inc. Full-duplex cable modem calibration
CN112885365B (en) * 2021-01-08 2024-04-30 上海锐承通讯技术有限公司 Echo cancellation device and vehicle-mounted intelligent terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6412648A (en) * 1987-07-06 1989-01-17 Nippon Telegraph & Telephone Echo canceller
JP2001251224A (en) * 2000-03-06 2001-09-14 Nippon Telegr & Teleph Corp <Ntt> Echo canceling method and echo canceller
JP2002252577A (en) 2001-02-26 2002-09-06 Nippon Telegr & Teleph Corp <Ntt> Method and system for canceling multichannel acoustic echo, its program and its recording medium
WO2005076663A1 (en) * 2004-01-07 2005-08-18 Koninklijke Philips Electronics N.V. Audio system having reverberation reducing filter
JP2005323308A (en) * 2004-05-11 2005-11-17 Sony Corp Voice collecting device and echo cancellation processing method
JP2005347957A (en) * 2004-06-01 2005-12-15 Nippon Telegr & Teleph Corp <Ntt> Method, apparatus and program for suppressing multi-channel acoustic echo, and recording medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5410595A (en) * 1992-11-12 1995-04-25 Motorola, Inc. Apparatus and method for noise reduction for a full-duplex speakerphone or the like
US5323458A (en) * 1993-10-25 1994-06-21 Motorola Inc. Echo cancellation in a full-duplex speakerphone
JPH07334175A (en) * 1994-06-07 1995-12-22 Matsushita Electric Ind Co Ltd On-vehicle sound field correcting device
JP3863538B2 (en) * 2004-06-01 2006-12-27 Necエレクトロニクス株式会社 Switch circuit for converter for satellite broadcasting
DE602004017603D1 (en) * 2004-09-03 2008-12-18 Harman Becker Automotive Sys Speech signal processing for the joint adaptive reduction of noise and acoustic echoes
CN101471694B (en) * 2007-12-24 2014-06-11 瑞昱半导体股份有限公司 Device and method for eliminating interference
CN101192411B (en) 2007-12-27 2010-06-02 北京中星微电子有限公司 Large distance microphone array noise cancellation method and noise cancellation system
DE102008039330A1 (en) * 2008-01-31 2009-08-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for calculating filter coefficients for echo cancellation
JP5510288B2 (en) 2010-11-24 2014-06-04 沖電気工業株式会社 Adaptive filter order control device and program, and echo canceller
JP5501527B2 (en) * 2011-05-10 2014-05-21 三菱電機株式会社 Echo canceller and echo detector
US8526599B2 (en) * 2011-09-22 2013-09-03 Panasonic Corporation Input/output apparatus and communication terminal
CN107172538B (en) * 2012-11-12 2020-09-04 雅马哈株式会社 Signal processing system and signal processing method
JP6349899B2 (en) * 2014-04-14 2018-07-04 ヤマハ株式会社 Sound emission and collection device

Also Published As

Publication number Publication date
CN110741563B (en) 2021-11-23
JP7122756B2 (en) 2022-08-22
US20200105289A1 (en) 2020-04-02
EP3641141A1 (en) 2020-04-22
JP7323959B2 (en) 2023-08-09
JP2022125069A (en) 2022-08-26
JPWO2018230062A1 (en) 2020-05-21
US11227618B2 (en) 2022-01-18
EP3641141A4 (en) 2021-02-17
CN110741563A (en) 2020-01-31

Similar Documents

Publication Publication Date Title
JP7323959B2 (en) Audio signal processing device and audio signal processing method
US8644517B2 (en) System and method for automatic disabling and enabling of an acoustic beamformer
US20100150360A1 (en) Audio source localization system and method
US8233352B2 (en) Audio source localization system and method
EP2868117B1 (en) Systems and methods for surround sound echo reduction
JP6163468B2 (en) Sound quality evaluation apparatus, sound quality evaluation method, and program
CN106663447B (en) Audio system with noise interference suppression
KR20150084814A (en) Echo cancellation using ultrasound
JP7070562B2 (en) Audio output control device, audio output control method, and program
JP2003523674A (en) Signal comparison method, transducer control device, and transducer control system
JP6800809B2 (en) Audio processor, audio processing method and program
CN112055122A (en) Conference component equipment, conference equipment and data processing method
JP6363429B2 (en) Data structure, data generation apparatus, data generation method, and program
WO2023081535A1 (en) Automated audio tuning and compensation procedure
TWI790718B (en) Conference terminal and echo cancellation method for conference
US8804946B2 (en) Stochastic vector based network echo cancellation
WO2023130206A1 (en) Multi-channel speaker system and method thereof
JP6994221B2 (en) Extraction generation sound correction device, extraction generation sound correction method, program
US11924368B2 (en) Data correction apparatus, data correction method, and program
US20170041707A1 (en) Retaining binaural cues when mixing microphone signals
JP6779489B2 (en) Extraction generated sound correction device, extraction generation sound correction method, program
JP2016046694A (en) Acoustic quality evaluation device, acoustic quality evaluation method, and program
CN116486823A (en) Sound watermark processing method and sound watermark generating device
JPS62120734A (en) Echo erasing equipment
WO2023081534A1 (en) Automated audio tuning launch procedure and report

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18816708; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2019525088; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2018816708; Country of ref document: EP; Effective date: 20200113