JP2008147914A

JP2008147914A - Echo canceler and echo canceling method

Info

Publication number: JP2008147914A
Application number: JP2006331686A
Authority: JP
Inventors: Tetsunori Kobayashi; 哲則小林; Kenzo Akagiri; 健三赤桐; Shinya Fujie; 真也藤江; Tetsuji Ogawa; 哲司小川
Original assignee: Waseda University
Current assignee: Waseda University
Priority date: 2006-12-08
Filing date: 2006-12-08
Publication date: 2008-06-26

Abstract

PROBLEM TO BE SOLVED: To provide an echo canceller and an echo cancellation method capable of removing echo while reducing cost. SOLUTION: A talker voice spectrum F1(ω), generated on the basis of the talker's voice, is removed from a received sound spectrum F2(ω) on the basis of the characteristic spectrum waveform inherent to voice that the talker voice spectrum F1(ω) has. The talker voice spectrum F1(ω) corresponding to the echo can thus be removed from the received sound spectrum F2(ω) without providing complicated and expensive equipment such as an adaptive filter, so the echo is removed while the cost of the entire device is reduced by the amount of the omitted adaptive filter. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an echo canceller and an echo cancellation method, and is suitable for application to, for example, hands-free call systems such as video conference systems and hands-free phones, and to entertainment robots and the like that are equipped with a voice recognition function and a dialog control function and can hold simple everyday conversations with a user.

In a hands-free call system such as a video conference system or a hands-free phone, a microphone and a speaker are placed in the same room to allow bidirectional audio transmission. A participant's voice therefore travels from the microphone through the transmission path, wraps around from the speaker to the microphone on the other party's side, and returns through the transmission path. Because of this acoustic coupling, the returning voice is delayed and is perceived as an echo.

In particular, in a hands-free call system that converts the voice signal into IP (Internet Protocol) packets and performs voice communication over a network such as the Internet, a delay arises when the IP packets pass through routers on the network, and echo due to acoustic coupling may occur on the basis of this delay.

As an echo canceller for removing echo caused by such acoustic coupling, one has been proposed that is provided with an adaptive filter that generates a pseudo echo signal closely resembling the echo from the audio signal supplied to the speaker, and that removes the echo adaptively by subtracting this pseudo echo signal from the received sound signal picked up by the microphone (see, for example, Patent Document 1). Use of such an echo canceller is being considered not only in hands-free call systems but also in various other fields, such as voice-interactive robot apparatuses.
Patent Document 1: Japanese Patent No. 2861888

However, an echo canceller of this configuration requires complicated control of each circuit unit, which increases the scale of the apparatus and of the software, and it must also discriminate the audio signal using a complicated signal detector, so there has been a problem of high manufacturing cost.

The present invention has been made in consideration of the above points, and its object is to provide an echo canceller and an echo cancellation method capable of removing echo while reducing cost.

The echo canceller according to claim 1 of the present invention is an echo canceller for removing echo that arises when sound output through a speaker wraps around to a microphone and is picked up, the echo canceller comprising: echo component acquisition means for acquiring an echo component spectrum obtained by frequency analysis of the sound that is output from the speaker and becomes the echo component; and echo removal means for acquiring a received sound spectrum obtained by frequency analysis of the received sound signal picked up by the microphone, and for removing the echo component spectrum from the received sound spectrum on the basis of the characteristic spectrum waveform peculiar to voice that the echo component spectrum has.

The echo canceller according to claim 2 of the present invention is an echo canceller for removing echo that arises when a voice signal transmitted through a network to a communication terminal on the listener side is output as sound from the speaker of that communication terminal, wraps around to the microphone, and returns through the network, the echo canceller comprising: echo component acquisition means for acquiring an echo component spectrum obtained by frequency analysis of the voice signal transmitted to the communication terminal; and echo removal means for acquiring a received sound spectrum obtained by frequency analysis of the received sound signal received from the communication terminal through the network, and for removing the echo component spectrum from the received sound spectrum on the basis of the characteristic spectrum waveform peculiar to voice that the echo component spectrum has.

The echo canceller according to claim 3 of the present invention comprises delayed echo component generation means for generating a plurality of delayed echo component spectra whose timings are shifted by delaying the echo component spectrum on the time axis, wherein the echo removal means determines whether the received sound spectrum contains the respective characteristic spectrum waveform at each of the timings of the echo component spectrum and the plurality of delayed echo component spectra.

The echo canceller according to claim 4 of the present invention comprises frame-delayed echo component generation means for generating a plurality of frame-delayed echo component spectra by shifting and delaying the echo component spectrum on the time axis in units of frames of a predetermined length, wherein the echo removal means determines whether the received sound spectrum contains the respective characteristic spectrum waveform at each of the timings of the echo component spectrum and the plurality of frame-delayed echo component spectra.

The echo cancellation method according to claim 5 of the present invention is an echo cancellation method for removing echo that arises when sound output through a speaker wraps around to a microphone and is picked up, the method comprising: an echo component acquisition step of acquiring an echo component spectrum obtained by frequency analysis of the sound that is output from the speaker and becomes the echo component; a received sound spectrum acquisition step of acquiring a received sound spectrum obtained by frequency analysis of the received sound signal picked up by the microphone; and an echo removal step of removing, from the received sound spectrum, the characteristic spectrum waveform peculiar to voice that the echo component spectrum has.

The echo cancellation method according to claim 6 of the present invention is an echo cancellation method for removing echo that arises when a voice signal transmitted through a network to a communication terminal on the listener side is output as sound from the speaker of that communication terminal, wraps around to the microphone, and returns through the network, the method comprising: an echo component acquisition step of acquiring an echo component spectrum obtained by frequency analysis of the voice signal transmitted to the communication terminal; a received sound spectrum acquisition step of acquiring a received sound spectrum obtained by frequency analysis of the received sound signal received from the communication terminal through the network; and an echo removal step of removing, from the received sound spectrum, the characteristic spectrum waveform peculiar to voice that the echo component spectrum has.

The echo cancellation method according to claim 7 of the present invention comprises a delayed echo component spectrum generation step of generating a plurality of delayed echo component spectra whose timings are shifted by delaying the echo component spectrum on the time axis, wherein the echo removal step determines whether the received sound spectrum contains the respective characteristic spectrum waveform at each of the timings of the echo component spectrum and the plurality of delayed echo component spectra.

The echo cancellation method according to claim 8 of the present invention comprises a frame-delayed echo component generation step of generating a plurality of frame-delayed echo component spectra by shifting and delaying the echo component spectrum on the time axis in units of frames of a predetermined length, wherein the echo removal step determines whether the received sound spectrum contains the respective characteristic spectrum waveform at each of the timings of the echo component spectrum and the plurality of frame-delayed echo component spectra.

According to the echo canceller of claim 1 and the echo cancellation method of claim 5 of the present invention, the echo component spectrum is removed from the received sound spectrum on the basis of the characteristic spectrum waveform peculiar to voice in the echo component spectrum obtained by frequency analysis of the sound. The echo component spectrum corresponding to the echo can therefore be removed from the received sound spectrum without providing complicated and expensive equipment such as an adaptive filter, and the echo can be removed while the cost of the entire apparatus is reduced by the amount of the omitted adaptive filter.

According to the echo canceller of claim 2 and the echo cancellation method of claim 6 of the present invention, even if echo arises because a voice signal transmitted through the network to the communication terminal on the listener side is output as sound from the speaker of that communication terminal, wraps around to the microphone, and returns through the network, the echo can be removed without providing complicated and expensive equipment such as an adaptive filter. The cost of the entire apparatus can therefore be reduced by the amount of the omitted adaptive filter.

According to the echo canceller of claim 3 and the echo cancellation method of claim 7 of the present invention, even if the echo component spectrum appears in the received sound spectrum with a delay, whether the characteristic spectrum waveform is present is determined at the timings of the plurality of delayed echo component spectra, so the echo component spectrum that constitutes the echo component can be reliably removed.

According to the echo canceller of claim 4 and the echo cancellation method of claim 8 of the present invention, even if a delay arises on the network and the echo component spectrum appears in the received sound spectrum at a timing delayed in units of frames, whether the characteristic spectrum waveform is present is determined at the timings of the plurality of delayed echo component spectra, so the echo component spectrum that constitutes the echo component can be reliably removed.

Embodiments of the present invention are described in detail below with reference to the drawings.

(1) First Embodiment
In FIG. 1, reference numeral 1 denotes a hands-free call system as a whole. The hands-free call system 1 is configured by interconnecting a plurality of call terminals 2A and 2B, installed at different locations, via a network 3 such as an Internet line.

The call terminals 2A and 2B have the same configuration, but for convenience of explanation one call terminal 2A is treated as the listener side and the other call terminal 2B as the talker side, and the first embodiment is described below with a focus on the listener-side call terminal 2A. In this embodiment, the speaker 4 and the microphone 5 of each of the communication terminals 2A and 2B are placed close to each other, so that the communication terminals 2A and 2B as a whole are made compact.

In this case, the call terminal 2A receives a received signal S1 from the network 3 via an antenna (not shown). The signal processing unit 7 applies frequency analysis processing and demodulation processing to the received signal S1 to obtain received data in units of frames of a predetermined length, applies predetermined data processing (for example, detecting and masking code errors that occurred during transmission) and decoding to the received data, and performs digital/analog conversion on the resulting voice signal. The talker voice spectrum F1(ω) obtained in this way is output through the speaker 4 as the talker's voice (talker voice).

In addition to this configuration, the call terminal 2A has an echo canceller 12, consisting of a voice spectrum acquisition unit 10 and an echo removal unit 11, built into its main body 13. When the talker voice spectrum F1(ω) is sent from the signal processing unit 7 to the speaker 4, it is also sent sequentially to the voice spectrum acquisition unit 10, which serves as the echo component acquisition means, so that the voice spectrum acquisition unit 10 progressively acquires the talker voice spectrum F1(ω). On acquiring the talker voice spectrum F1(ω) as the echo component spectrum, the voice spectrum acquisition unit 10 sends it to the echo removal unit 11.

Meanwhile, the microphone 5 picks up the listener's voice (listener voice) and, when the talker voice wraps around from the speaker 4 to the microphone 5, that talker voice as well. On picking up the listener voice and the talker voice, the microphone 5 sends them to the spectrum processing unit 15 as a received sound signal S2. The spectrum processing unit 15 applies frequency analysis processing to the received sound signal S2 and sends the resulting received sound spectrum F2(ω) to the echo removal unit 11.
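As a rough illustration of the frequency analysis attributed to the spectrum processing unit 15, the following Python sketch splits a signal into frames and computes a per-band power spectrum for one frame. The frame length, hop size, Hann window, and the function names `split_frames` and `power_spectrum` are assumptions made for illustration; the description does not specify how the frequency analysis is carried out.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Cut a sample sequence into frames of frame_len samples, hop samples apart."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Return the power in DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    # Hann window (an assumption) to limit leakage between frequency bands.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum
```

Applying `power_spectrum` to each frame of the microphone signal would yield a sequence playing the role of F2(ω); the same routine applied to the signal fed to the speaker would play the role of F1(ω).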

On receiving the received sound spectrum F2(ω) from the spectrum processing unit 15, the echo removal unit 11 applies the echo removal processing described below to the received sound spectrum F2(ω), using the talker voice spectrum F1(ω) received in advance from the voice spectrum acquisition unit 10. This removes the spectrum component corresponding to the talker voice spectrum F1(ω) from the received sound spectrum F2(ω), and the resulting listener voice spectrum F3(ω) is sent to the signal processing unit 7.

As the echo removal processing, the echo removal unit 11 performs spectrum separation by band selection (that is, the binary mask method): it compares the received sound spectrum F2(ω) with the talker voice spectrum F1(ω) and compares the magnitude of their power in each frequency band.

Because the talker voice spectrum F1(ω) is generated from the talker voice, its power distribution on the time axis has the property that peaks and valleys are sparsely formed (sparseness), giving a characteristic spectrum waveform peculiar to voice. On the basis of the distribution of the power of the talker voice spectrum F1(ω) on the time axis, the echo removal unit 11 determines whether the received sound spectrum F2(ω) contains a frequency band portion whose characteristic spectrum waveform is substantially the same as that of the talker voice spectrum F1(ω).

In this embodiment, the speaker 4 and the microphone 5 are placed close to each other, so the talker voice output from the speaker 4 wraps around to the microphone 5 and is picked up almost at the same time as it is output. The echo removal unit 11 therefore receives the received sound spectrum F2(ω) from the spectrum processing unit 15 at almost the same timing as it receives the talker voice spectrum F1(ω) from the voice spectrum acquisition unit 10.

That is, when the talker voice output from the speaker 4 wraps around to the microphone 5 and is picked up, the received sound spectrum F2(ω) received from the spectrum processing unit 15 contains a characteristic spectrum waveform substantially identical to that of the talker voice spectrum F1(ω).

In this case, the echo removal unit 11 uses the binary mask method to remove the talker voice spectrum F1(ω), which corresponds to the echo component, from the received sound spectrum F2(ω), extracts only the remaining listener voice spectrum F3(ω) from the received sound spectrum F2(ω), and can send it to the signal processing unit 7.
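The band-by-band power comparison of the binary mask method can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the decision rule (keep a band only when the received power exceeds the talker power by a factor `margin`), the `margin` parameter, and the function name `binary_mask` are all hypothetical, since the description states only that power is compared in each frequency band.

```python
def binary_mask(received_power, talker_power, margin=1.0):
    """Zero every band of F2 in which the talker (echo) power dominates.

    received_power: per-band power of the received sound spectrum F2
    talker_power:   per-band power of the talker voice spectrum F1
    margin:         hypothetical tuning factor for the comparison
    Returns (mask, cleaned), where cleaned approximates F3.
    """
    mask = [1 if r > margin * t else 0
            for r, t in zip(received_power, talker_power)]
    cleaned = [r * m for r, m in zip(received_power, mask)]
    return mask, cleaned
```

For example, `binary_mask([5, 8, 0.5, 1], [0, 9, 0, 4])` keeps the first and third bands and zeroes the second and fourth, where the talker power is at least as large as the received power.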

The signal processing unit 7 receives the listener voice spectrum F3(ω) from the echo removal unit 11, performs analog/digital conversion on it, and applies predetermined data processing such as encryption to the resulting listener voice signal to obtain transmission data in units of frames of a predetermined length. It then applies modulation processing and frequency analysis processing to the transmission data to generate a transmission signal S3, and transmits it to the talker-side call terminal 2B via the antenna and the network 3.

In the above configuration, the echo canceller 12 acquires in advance the talker voice spectrum F1(ω) output from the speaker 4, compares its power with that of the received sound spectrum F2(ω) obtained by frequency analysis of the received sound signal output from the microphone 5, and removes from the received sound spectrum F2(ω), as the echo component, the characteristic spectrum waveform that is substantially identical to the power distribution waveform of the talker voice spectrum F1(ω) on the time axis.

In this way, by using the talker voice spectrum F1(ω), which is generated from the talker voice and has a characteristic spectrum waveform peculiar to voice, the echo canceller 12 can easily identify, on the basis of this characteristic spectrum waveform, the talker voice spectrum F1(ω) that constitutes the echo component within the received sound spectrum F2(ω). The echo canceller 12 can thus reliably remove only the talker voice spectrum F1(ω) constituting the echo component from the received sound spectrum F2(ω) and extract the listener voice spectrum F3(ω).

The echo canceller 12 can therefore transmit only clear, echo-free listener voice to the talker-side call terminal 2B without providing an expensive adaptive filter that generates a pseudo echo signal closely resembling the echo, as was done conventionally.

Furthermore, when echo is removed with a conventional adaptive filter, the filter needs a certain time to converge, so echo may occur during the transient state. The echo canceller 12 of the present invention, in contrast, can prevent echo from the moment the microphone 5 starts picking up sound.

According to the above configuration, the talker voice spectrum F1(ω) is removed from the received sound spectrum F2(ω) on the basis of the characteristic spectrum waveform peculiar to voice of the talker voice spectrum F1(ω) generated from the talker voice. The talker voice spectrum F1(ω) corresponding to the echo can therefore be removed from the received sound spectrum F2(ω) without providing complicated and expensive equipment such as an adaptive filter, so the echo can be removed while the cost of the entire apparatus is reduced by the amount of the omitted adaptive filter.

The present invention is not limited to the above embodiment, and various modifications are possible. For example, the first embodiment described above uses the binary mask method as the echo removal processing that removes the characteristic spectrum waveform corresponding to the talker voice spectrum F1(ω) from the received sound spectrum F2(ω), but the present invention is not limited to this, and the spectral subtraction (SS) method may be used instead.

When the spectral subtraction (SS) method is used in place of the binary mask method, the echo canceller 12 subtracts, in each frequency band, the value (K×δ) obtained by multiplying the power δ of the talker voice spectrum F1(ω) by a coefficient K from the power γ of the received sound spectrum F2(ω), thereby generating the listener voice spectrum F3(ω) from which the talker voice spectrum F1(ω) has been removed.

The coefficient K is, for example, a coefficient that depends on the magnitude of the difference between the power γ of the received sound spectrum F2(ω) and the power δ of the talker voice spectrum F1(ω). In a frequency band where the power γ of the received sound spectrum F2(ω) is smaller than the value (K×δ) obtained by multiplying the power δ of the talker voice spectrum F1(ω) by the coefficient K, the calculated value may be set, for example, to a minimum value determined by a fixed rule (which may be a constant value for every frequency band, or a value proportional to the power of the talker voice spectrum F1(ω) in each frequency band), or to zero. This provides the same effects as the embodiment described above.
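The spectral subtraction variant described above, γ − K×δ per band with a substitute value wherever γ falls below K×δ, can be sketched as follows. The function name, the use of a single constant `k` for all bands, and the constant `floor` are simplifying assumptions for illustration; the description also allows K to depend on the difference between γ and δ, and allows the minimum value to be proportional to the talker power in each band.

```python
def spectral_subtraction(received_power, talker_power, k=1.0, floor=0.0):
    """Per-band spectral subtraction: F3 = gamma - K * delta.

    received_power: gamma, per-band power of the received sound spectrum F2
    talker_power:   delta, per-band power of the talker voice spectrum F1
    k:              coefficient K (assumed constant across bands here)
    floor:          value used where gamma < K * delta (zero or a minimum)
    """
    result = []
    for gamma, delta in zip(received_power, talker_power):
        diff = gamma - k * delta
        # Where the subtraction would go negative, substitute the floor.
        result.append(diff if diff >= 0 else floor)
    return result
```

Unlike the binary mask, which keeps or discards a band outright, this variant retains the portion of each band's power that exceeds the scaled talker power.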

In the embodiment described above, whether the received sound spectrum F2(ω) contains a characteristic spectrum waveform substantially identical to that of the talker voice spectrum F1(ω) is determined almost at the same time as the talker voice spectrum F1(ω) is received from the voice spectrum acquisition unit 10. The present invention is not limited to this: the voice spectrum acquisition unit 10, acting as the delayed echo component generation means, may generate a plurality of delayed voice spectra by delaying the talker voice spectrum F1(ω) in units of a predetermined time, and whether the received sound spectrum F2(ω) contains the characteristic spectrum waveform may be determined at the timings of these delayed voice spectra.

In this case, even if the speaker 4 and the microphone 5 are placed apart, so that the sound output from the speaker 4 takes a short time to reach the microphone 5 and the characteristic spectrum waveform appears in the received sound spectrum F2(ω) later than the timing at which the talker voice spectrum F1(ω) was sent to the echo removal unit 11, whether the characteristic spectrum waveform is present can be determined at the timings of the plurality of delayed voice spectra. Only the talker voice spectrum F1(ω) constituting the echo component can thus be reliably removed.
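One way to picture the check across several delay timings is to score the current received spectrum against each delayed copy of the talker spectrum and accept the best match if it is similar enough. The sketch below is an assumption-laden illustration: the description does not say how "substantially the same" is measured, so the cosine similarity, the `threshold` value, and the function names are hypothetical.

```python
import math


def similarity(a, b):
    """Cosine similarity between two per-band power spectra (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_echo_delay(received, delayed_talker, threshold=0.8):
    """Return the delay index whose talker spectrum best matches the received one.

    delayed_talker[d] is the talker power spectrum delayed by d time units
    (d = 0 is the undelayed spectrum). Returns None when no candidate
    reaches the (hypothetical) similarity threshold.
    """
    if not delayed_talker:
        return None
    scores = [similarity(received, spec) for spec in delayed_talker]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

The spectrum at the returned delay could then be passed to the chosen removal step (binary mask or spectral subtraction) in place of the undelayed talker spectrum.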

(2) Second Embodiment
In FIG. 2, in which parts corresponding to those in FIG. 1 are given the same reference numerals, 31 denotes the hands-free call system of the second embodiment, which differs from the first embodiment in the configuration of the echo canceller 32. The call terminals 2A and 2B have the same configuration, but for convenience of explanation one call terminal 2A is treated as the listener side and the other call terminal 2B as the talker side, and the second embodiment is described below with a focus on the talker-side call terminal 2B.

The call terminal 2B collects the talker's voice with the microphone 5 and sends the resulting voice signal S2a to the spectrum processing unit 15. The spectrum processing unit 15 applies frequency analysis processing to the voice signal S2a and sends the resulting talker voice spectrum F4(ω) to the signal processing unit 7 and the echo canceller 32. The echo canceller 32 has a voice spectrum acquisition unit 33 and an echo removal unit 34, and the voice spectrum acquisition unit 33 successively acquires the talker voice spectrum F4(ω), sent from the spectrum processing unit 15, as the echo component spectrum.
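The frequency analysis performed by the spectrum processing unit 15 can be pictured as a frame-by-frame transform of the voice signal. The following is a minimal Python sketch, not the implementation described in this document: it assumes non-overlapping frames and a plain discrete Fourier transform, and the names `power_spectrum` and `frame_spectra`, the frame length, and the test tone are all hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Return the power |X[k]|^2 of one frame for bins 0..N/2 using a plain DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def frame_spectra(signal, frame_len):
    """Split a signal into non-overlapping frames and analyse each frame."""
    return [power_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, frame_len)]


# Hypothetical input: a tone that completes 4 cycles per 16-sample frame.
tone = [math.sin(2 * math.pi * 4 * i / 16) for i in range(64)]
spectra = frame_spectra(tone, 16)

# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectra]
```

Each entry of `spectra` plays the role of one frame of a spectrum such as F4(ω). A practical system would normally use a windowed FFT, which this sketch leaves out for brevity.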

In addition, the voice spectrum acquisition unit 33 can generate a plurality of delayed voice spectra F5a(ω), F5b(ω), ... by successively delaying the talker voice spectrum F4(ω) in time, one frame at a time, while leaving its voice-specific characteristic spectral waveform unchanged.

The voice spectrum acquisition unit 33, serving as delayed echo component generation means and frame delay echo component generation means, sends the talker voice spectrum F4(ω) and the delayed voice spectra F5a(ω), F5b(ω), ..., which serve as frame delay echo component spectra, to the echo removal unit 34. The echo removal unit 34 temporarily stores the talker voice spectrum F4(ω) and the plurality of delayed voice spectra F5a(ω), F5b(ω), ....
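One way to picture the generation and temporary storage of the frame-delayed spectra is a fixed-length buffer of the most recent frames. The sketch below is only an illustration under assumptions: the class name `DelayedSpectrumBuffer`, the parameter `max_delay_frames`, and the toy two-band spectra are hypothetical, and the document does not say how many delayed spectra are kept.

```python
from collections import deque


class DelayedSpectrumBuffer:
    """Keep the newest spectrum plus copies delayed by 1..max_delay_frames frames."""

    def __init__(self, max_delay_frames):
        # One slot for the undelayed spectrum and one per frame of delay.
        self._frames = deque(maxlen=max_delay_frames + 1)

    def push(self, spectrum):
        """Store the spectrum of the newest frame; the oldest one drops out."""
        self._frames.appendleft(list(spectrum))

    def candidates(self):
        """Return [undelayed, delayed by 1 frame, delayed by 2 frames, ...]."""
        return list(self._frames)


buffer = DelayedSpectrumBuffer(max_delay_frames=2)
for frame_spectrum in ([1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]):
    buffer.push(frame_spectrum)

# In the document's notation: F4 (no delay), F5a (one frame), F5b (two frames).
f4, f5a, f5b = buffer.candidates()
```

The spectral shape of each frame is stored untouched and only its position in time changes, which matches the statement that the characteristic spectral waveform is left as it is.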

On receiving the talker voice spectrum F4(ω) from the spectrum processing unit 15, the signal processing unit 7 performs analog/digital conversion on it and then predetermined data processing such as encryption, thereby obtaining transmission data in units of frames of a predetermined length. It applies modulation processing and frequency analysis processing to this data and transmits the resulting transmission signal S3 to the receiver-side call terminal 2A via an antenna (not shown) and the network 3.

When the talker-side communication terminal 2B receives, via the antenna, the received sound signal S4 transmitted from the receiver's communication terminal 2A, it passes the signal to the signal processing unit 7.

At the receiver-side communication terminal 2A, the receiver's voice is collected by the microphone 5. If the talker's voice output from the speaker 4 wraps around to the microphone 5 at this time, it is picked up by the microphone 5 at almost the same time as it is output from the speaker 4. As a result, the talker's voice that wrapped around to the microphone 5 at the receiver-side communication terminal 2A is superimposed on the receiver's voice and returns to the talker-side communication terminal 2B through the network 3.

In the signal processing unit 7, the talker-side communication terminal 2B applies frequency analysis processing and demodulation processing to the received sound signal S4 received from the network 3 via the antenna (not shown), thereby obtaining reception data in units of frames of a predetermined length. It applies predetermined data processing (for example, detecting and masking code errors that occurred during transmission) and decoding to this data, performs digital/analog conversion on the resulting voice signal, and sends the resulting received sound spectrum F6(ω) to the echo removal unit 34.

On receiving the received sound spectrum F6(ω) from the signal processing unit 7, the echo removal unit 34 applies echo removal processing to it using either the talker voice spectrum F4(ω) or one of the delayed voice spectra F5a(ω), F5b(ω), ... received from the voice spectrum acquisition unit 33. This removes from the received sound spectrum F6(ω) the characteristic spectral waveform corresponding to the talker voice spectrum F4(ω) or to one of the delayed voice spectra F5a(ω), F5b(ω), ..., and the resulting receiver voice spectrum F7(ω) is output to the speaker 4.

In practice, the echo removal unit 34 performs spectrum separation by band selection as the echo removal processing. It first compares the power of the received sound spectrum F6(ω) and the talker voice spectrum F4(ω) in each frequency band, and then compares the power of the received sound spectrum F6(ω) and each of the delayed voice spectra F5a(ω), F5b(ω), ... in each frequency band.
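The band-by-band power comparison can be sketched as a binary mask over frequency bands. This passage states only that the power is compared in each band, so the decision rule below is an assumption made for illustration: a band is kept when the received power exceeds the echo reference power by a hypothetical factor `margin`, and is zeroed otherwise. The function name `band_select` and the sample values are likewise hypothetical.

```python
def band_select(received_power, echo_power, margin=2.0):
    """Zero every band in which the echo reference dominates (assumed rule).

    received_power: per-band power of the received sound spectrum.
    echo_power:     per-band power of the echo reference spectrum.
    margin:         hypothetical factor by which the received power must
                    exceed the reference power for the band to be kept.
    """
    return [r if r > margin * e else 0.0
            for r, e in zip(received_power, echo_power)]


# Hypothetical four-band example: the reference is strong in band 1.
echo_reference = [0.0, 9.0, 0.0, 1.0]
received = [4.0, 9.5, 0.2, 6.0]

separated = band_select(received, echo_reference)
```

With these numbers, band 1 is attributed to the echo and removed, while the other bands pass through unchanged. A real system would have to choose the threshold with the attenuation of the echo path in mind.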

If no delay arises while the talker's voice travels from the microphone 5 of the talker-side communication terminal 2B through the network 3, wraps around from the speaker 4 to the microphone 5 of the receiver-side communication terminal 2A, and returns through the network 3, then the voice-specific characteristic spectral waveform corresponding to the talker voice spectrum F4(ω) appears in the received sound spectrum F6(ω) that the echo removal unit 34 receives from the signal processing unit 7 at almost the same time as it receives the talker voice spectrum F4(ω) from the voice spectrum acquisition unit 33.

In this case, the echo removal unit 34 removes the talker voice spectrum F4(ω), which corresponds to the echo component, from the received sound spectrum F6(ω), extracts only the receiver voice spectrum F7(ω), and outputs it to the speaker 4.

By contrast, when the signal passes through many relay points (routers) in the network 3, the transit time grows with the number of routers. A delay is therefore likely to occur, for example, when the received sound spectrum F6(ω) travels from the receiver-side communication terminal 2A through the network 3 to the talker-side communication terminal 2B.

Consequently, if a delay of, say, a certain number of frames occurs on the network 3 of the hands-free call system 31, the timing at which the presence of the characteristic spectral waveform in the received sound spectrum F6(ω) should be checked is shifted, and the echo removal unit 34 cannot identify the talker voice spectrum F4(ω) in the received sound spectrum F6(ω) at the moment it receives that spectrum.

In this case, the echo removal unit 34 compares the stored delayed voice spectra F5a(ω), F5b(ω), ... with the received sound spectrum F6(ω) one after another and determines whether the received sound spectrum F6(ω) contains any of the delayed voice spectra F5a(ω), F5b(ω), ... at the corresponding timing.

Specifically, from among the delayed voice spectra F5a(ω), F5b(ω), ... serving as delayed echo component spectra, the echo removal unit 34 first selects the delayed voice spectrum F5a(ω), which is shifted by one frame, and determines whether the received sound spectrum F6(ω) contains the characteristic spectral waveform at the timing of that delayed voice spectrum F5a(ω).

If the echo removal unit 34 determines that the received sound spectrum F6(ω) does not contain the characteristic spectral waveform at the timing of the delayed voice spectrum F5a(ω) shifted by one frame, it next selects the delayed voice spectrum F5b(ω), which is shifted by two frames, and determines whether the received sound spectrum F6(ω) contains the characteristic spectral waveform at the timing of that delayed voice spectrum F5b(ω).

In this way, the echo removal unit 34 compares the delayed voice spectra F5a(ω), F5b(ω), ... in turn with the received sound spectrum F6(ω) received from the signal processing unit 7 and determines whether the received sound spectrum F6(ω) contains the characteristic spectral waveform of any of the delayed voice spectra at its respective timing.

When the echo removal unit 34 determines that the received sound spectrum F6(ω) contains one of the delayed voice spectra F5a(ω), F5b(ω), ... at the corresponding timing, it removes that delayed voice spectrum from the received sound spectrum F6(ω) and outputs the resulting receiver voice spectrum F7(ω) from the speaker 4.
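The sequential check over the undelayed spectrum and then the one-frame, two-frame, ... delayed spectra can be sketched as a simple search loop. How the presence of the characteristic spectral waveform is judged is not specified in this passage, so the sketch substitutes an assumed criterion: cosine similarity between the received spectrum and each candidate, compared against a hypothetical `threshold`. The function names and the sample spectra are also hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_echo_delay(received, candidates, threshold=0.6):
    """Return the delay (in frames) of the first matching candidate, or None.

    candidates[0] is the undelayed reference, candidates[1] the reference
    delayed by one frame, and so on. The threshold is an assumed value.
    """
    for delay, candidate in enumerate(candidates):
        if cosine_similarity(received, candidate) >= threshold:
            return delay
    return None


# Hypothetical four-band spectra: undelayed, one-frame and two-frame delayed.
candidates = [[9.0, 0.0, 0.0, 1.0],
              [0.0, 8.0, 1.0, 0.0],
              [1.0, 0.0, 9.0, 0.0]]

# Received spectrum: the two-frame-delayed echo plus other voice energy in band 1.
received = [1.0, 3.0, 9.0, 0.0]
matched_delay = find_echo_delay(received, candidates)

# A spectrum unrelated to any candidate yields no match.
no_match = find_echo_delay([0.0, 0.0, 0.0, 5.0], candidates)
```

Once a delay has been found, the matching candidate is the spectrum to remove from the received sound spectrum. If no candidate matches, the received spectrum is treated as free of echo.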

In the above configuration, the echo canceller 32 acquires the talker voice spectrum F4(ω) in units of frames of a predetermined length, generates a plurality of delayed voice spectra F5a(ω), F5b(ω), ... by successively delaying it in time one frame at a time while leaving its voice-specific characteristic spectral waveform unchanged, and stores the talker voice spectrum F4(ω) and the delayed voice spectra F5a(ω), F5b(ω), ... in the echo removal unit 34.

At the timing when the echo removal unit 34 receives the received sound spectrum F6(ω) from the signal processing unit 7, the echo canceller 32 determines whether that spectrum contains the characteristic spectral waveform corresponding to the talker voice spectrum F4(ω).

If the echo canceller 32 determines that the received sound spectrum F6(ω), received from the signal processing unit 7 at almost the same time as the echo removal unit 34 receives the talker voice spectrum F4(ω) from the voice spectrum acquisition unit 33, contains the voice-specific characteristic spectral waveform corresponding to the talker voice spectrum F4(ω), it removes the talker voice spectrum F4(ω), which corresponds to the echo component, from the received sound spectrum F6(ω). Only the receiver voice spectrum F7(ω) can thereby be extracted from the received sound spectrum F6(ω).

In addition, the echo canceller 32 determines, at the timing of each of the delayed voice spectra F5a(ω), F5b(ω), ... shifted successively in time one frame at a time, whether the received sound spectrum F6(ω) contains the characteristic spectral waveform corresponding to any of them.

Delays of various kinds may occur on the network 3, such as the delay caused by passing through many routers, when the transmission signal S3 travels from the talker-side communication terminal 2B through the network 3 to the receiver-side communication terminal 2A, or when the received sound signal S4 travels from the receiver-side communication terminal 2A through the network 3 to the talker-side communication terminal 2B. Even so, by using the delayed voice spectra F5a(ω), F5b(ω), ..., whose timing is shifted in units of frames, the echo canceller 32 can determine whether the received sound spectrum F6(ω) contains a time-shifted characteristic spectral waveform and can reliably remove the delayed voice spectrum F5a(ω), F5b(ω), ... that forms the echo component from the received sound spectrum F6(ω).

Therefore, even when a delay occurs on the network 3, the echo canceller 32 can remove the delayed voice spectra F5a(ω), F5b(ω), ... corresponding to the echo from the received sound spectrum F6(ω) without a complicated and expensive device such as an adaptive filter. The cost of the entire apparatus is reduced by the absence of an expensive adaptive filter for echo removal, while the echo is still removed.

(3) Third Embodiment
In FIG. 3, 41 denotes, as a whole, a voice processing device according to the third embodiment that is built into a robot apparatus. An overall view of the robot apparatus is omitted here, and only the voice processing device 41 is described.

The voice processing device 41 generates a synthesized voice spectrum F8(ω) by applying predetermined data processing to the synthesized voice generated by the voice processing unit 42, and sends it to the speaker 4 and the voice spectrum acquisition unit 10. The voice processing device 41 thus sends the synthesized voice spectrum F8(ω) to the echo removal unit 11 via the voice spectrum acquisition unit 10 and also outputs it from the speaker 4 as synthesized voice.

In this embodiment, the speaker 4 and the microphone 5 are placed close to each other. When synthesized voice is output from the speaker 4 while the user's voice is being collected by the microphone 5, the synthesized voice wraps around to the microphone 5 and is picked up at almost the same time as it is output from the speaker 4. The echo removal unit 11 therefore receives the received sound spectrum F2(ω) from the spectrum processing unit 15 at almost the same time as it receives the synthesized voice spectrum F8(ω) from the voice spectrum acquisition unit 10.

Because the synthesized voice output from the speaker 4 wraps around to the microphone 5 and is collected, the echo removal unit 11 obtains the determination result that the received sound spectrum F2(ω) contains a characteristic spectral waveform substantially identical to that of the synthesized voice spectrum F8(ω).

In this case, the echo removal unit 11 removes the synthesized voice spectrum F8(ω), which corresponds to the echo component, from the received sound spectrum F2(ω), extracts only the remaining user voice spectrum F9(ω), and sends it to the voice processing unit 42.
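The flow of the third embodiment can be illustrated end to end with toy signals. The sketch below rests on several assumptions that are not in the document: pure tones stand in for the synthesized voice and the user's voice, the acoustic path from speaker to microphone is modelled as a fixed gain of 0.9 with no delay, and removal is done with an assumed binary band mask using a hypothetical `margin`. The variable names `f8`, `f2`, and `f9` merely mirror the document's symbols.

```python
import cmath
import math

N = 32  # hypothetical frame length in samples


def power_spectrum(frame):
    """Power |X[k]|^2 for bins 0..N/2 of one frame, via a plain DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def tone(bin_index, amplitude):
    """One frame of a sine wave centred exactly on the given DFT bin."""
    return [amplitude * math.sin(2 * math.pi * bin_index * i / N)
            for i in range(N)]


synth = tone(3, 1.0)   # stands in for the synthesized voice sent to the speaker
user = tone(7, 0.8)    # stands in for the user's voice
# Microphone signal: synthesized voice leaking in (assumed gain 0.9) plus the user.
mic = [0.9 * s + u for s, u in zip(synth, user)]

f8 = power_spectrum(synth)  # echo reference spectrum
f2 = power_spectrum(mic)    # received sound spectrum

margin = 2.0  # assumed: keep a band only if it clearly exceeds the reference
f9 = [r if r > margin * e else 0.0 for r, e in zip(f2, f8)]

strongest_bin = max(range(len(f9)), key=f9.__getitem__)
```

After masking, the band occupied by the synthesized tone is zeroed and the band carrying the user's tone is left as the strongest component, which is the behaviour described for the user voice spectrum F9(ω). Real speech overlaps in frequency far more than two tones do, so this only shows the principle.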

The voice processing device 41 thus sends the user voice spectrum F9(ω), consisting of the clear user voice from which the echo component has been removed by the echo canceller 12, to a voice recognition processing unit (not shown) or the like, so that the robot apparatus can reliably perform various operations based on the user's voice.

In the above configuration, the echo canceller 12 stores in advance the synthesized voice spectrum F8(ω), which is output from the speaker 4 and has a voice-specific characteristic spectral waveform. Because the synthesized voice spectrum F8(ω) can easily be identified within the received sound spectrum F2(ω) on the basis of that characteristic spectral waveform, only the synthesized voice spectrum F8(ω), which is the echo component, can be reliably removed from the received sound spectrum F2(ω).

As a result, the echo canceller 12 can obtain only the clear, echo-free voice of the user without a separate adaptive filter of the conventional kind that generates a pseudo echo signal closely resembling the echo, and the user's voice can be used for voice recognition and the like.

The first to third embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications are possible.

FIG. 1 is a schematic diagram showing the overall configuration of the hands-free call system according to the first embodiment.
FIG. 2 is a schematic diagram showing the overall configuration of the hands-free call system according to the second embodiment.
FIG. 3 is a schematic diagram showing the overall configuration of the voice processing device according to the third embodiment.

Explanation of symbols

10 Voice spectrum acquisition unit (echo component acquisition means, delayed echo component generation means)
33 Voice spectrum acquisition unit (echo component acquisition means, delayed echo component generation means, frame delay echo component generation means)
11 Echo removal unit
12, 32 Echo canceller

Claims

An echo canceller for removing an echo generated when sound output through a speaker wraps around to a microphone and is collected, the echo canceller comprising:
echo component acquisition means for acquiring an echo component spectrum obtained by frequency analysis of the sound that is output from the speaker and becomes an echo component; and
echo removing means for acquiring a sound reception spectrum obtained by frequency analysis of a sound reception signal obtained by collecting sound with the microphone, and for removing the echo component spectrum from the sound reception spectrum on the basis of a characteristic spectrum waveform, peculiar to voice, of the echo component spectrum.

An echo canceller for removing an echo generated when a voice signal transmitted through a network to a communication terminal on the receiver side is output as voice from a speaker of the communication terminal, wraps around to a microphone, and returns through the network, the echo canceller comprising:
echo component acquisition means for acquiring an echo component spectrum obtained by frequency analysis of the voice signal transmitted to the communication terminal; and
echo removing means for acquiring a sound reception spectrum obtained by frequency analysis of a sound reception signal received from the communication terminal through the network, and for removing the echo component spectrum from the sound reception spectrum on the basis of a characteristic spectrum waveform, peculiar to voice, of the echo component spectrum.

The echo canceller according to claim 1 or 2, further comprising delay echo component generation means for generating a plurality of delayed echo component spectra whose timing is shifted by delaying the echo component spectrum on the time axis,
wherein the echo removing means determines whether the sound reception spectrum contains the characteristic spectrum waveform at each timing of the echo component spectrum and the plurality of delayed echo component spectra.

The echo canceller according to claim 2, further comprising frame delay echo component generation means for generating a plurality of frame delay echo component spectra by delaying the echo component spectrum through shifts on the time axis in units of frames of a predetermined length,
wherein the echo removing means determines whether the sound reception spectrum contains the characteristic spectrum waveform at each timing of the echo component spectrum and the plurality of frame delay echo component spectra.

An echo canceling method for removing echo generated when sound output through a speaker wraps around a microphone and is collected,
An echo component acquisition step of acquiring an echo component spectrum obtained by frequency analysis of the sound to be an echo component output from the speaker;
A sound reception spectrum acquisition step of acquiring a sound reception spectrum obtained by frequency analysis of a sound reception signal obtained by collecting sound with the microphone;
An echo cancellation method comprising: an echo removal step of removing a characteristic spectrum waveform peculiar to speech possessed by the echo component spectrum from the received spectrum.

An echo cancellation method for removing an echo generated when a voice signal transmitted through a network to a communication terminal on the receiver side is output as voice from a speaker of the communication terminal, wraps around to a microphone, and returns through the network, the method comprising:
An echo component acquisition step of acquiring an echo component spectrum obtained by frequency analysis of the audio signal to be transmitted to the communication terminal;
A sound reception spectrum obtaining step of obtaining a sound reception spectrum obtained by frequency analysis of a sound reception signal received from the communication terminal through the network;
An echo cancellation method comprising: an echo removal step of removing a characteristic spectrum waveform peculiar to speech possessed by the echo component spectrum from the received spectrum.

The echo cancellation method according to claim 5 or 6, further comprising a delayed echo component spectrum generating step of generating a plurality of delayed echo component spectra whose timing is shifted by delaying the echo component spectrum on the time axis,
wherein the echo removal step determines whether the sound reception spectrum contains the characteristic spectrum waveform at each timing of the echo component spectrum and the plurality of delayed echo component spectra.

The echo cancellation method according to claim 6, further comprising a frame delay echo component generation step of generating a plurality of frame delay echo component spectra by delaying the echo component spectrum through shifts on the time axis in units of frames of a predetermined length,
wherein the echo removal step determines whether the sound reception spectrum contains the characteristic spectrum waveform at each timing of the echo component spectrum and the plurality of frame delay echo component spectra.