JP2003274481A

JP2003274481A - Voice processing apparatus, computer program, and recording medium

Info

Publication number: JP2003274481A
Application number: JP2002070868A
Authority: JP
Inventors: Yasuo Nomura; 康雄野村; Yoshinobu Kajikawa; 嘉延梶川
Original assignee: Osaka Industrial Promotion Organization
Current assignee: Osaka Industrial Promotion Organization
Priority date: 2002-03-14
Filing date: 2002-03-14
Publication date: 2003-09-26

Abstract

PROBLEM TO BE SOLVED: To provide a voice processing apparatus capable of quickly eliminating the nonlinear component of an acoustic echo signal, together with a computer program and a computer-readable recording medium on which the computer program is recorded.

SOLUTION: The voice processing apparatus comprises: a voice signal input terminal 1 for receiving a voice signal from the other party; a nonlinear inverse system 6 for eliminating nonlinear distortion in advance, before a loudspeaker section 3 outputs the received voice signal as sound; and a linear adaptive filter 5 for estimating the linear component of the acoustic echo signal on the path from the voice signal input terminal 1 to a voice signal output terminal 2. The estimated linear component is subtracted from the acoustic echo signal generated at a sound reception section 4.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a voice processing apparatus for reducing the acoustic echo signal generated in two-way voice communication, to a computer program that performs processing to reduce the acoustic echo signal based on an input voice signal, and to a computer-readable recording medium on which the computer program is recorded.

【０００２】[0002]

[Prior Art] Devices such as mobile phones with a loudspeaker function and car phones, like video conference systems and hands-free phones, let the user talk with the far-end speaker through a loudspeaker and a microphone, without using a handset.

[0003] Mobile phones with a loudspeaker function, car phones, and the like transmit and receive voice signals over a communication line such as a mobile phone network. The voice signal transmitted from the other party is amplified by the loudspeaker and reaches the near-end speaker's ear, but the amplified sound also leaks into the microphone, so a voice signal with the loudspeaker sound superimposed on it is transmitted back to the other party. In other words, the voice that the far-end speaker utters into his or her own microphone returns to that speaker's ear after a delay. This feedback phenomenon is the so-called acoustic echo.

[0004] Because acoustic echo interferes with natural conversation, it must be suppressed. To that end, acoustic echo cancellers have been developed that use an adaptive filter to estimate the acoustic echo signal on the propagation path of the loudspeaker output from the loudspeaker to the microphone (the acoustic echo path), thereby generating a pseudo acoustic echo signal, and then subtract the generated pseudo acoustic echo signal to suppress the acoustic echo.

[0005] The adaptive filter mentioned above is generally a linear adaptive filter, typified by the FIR (finite impulse response) filter. It is known that the performance of a linear adaptive filter degrades when a nonlinear factor exists in the acoustic echo path. Because the loudspeaker used for loudspeaker calls is nonlinear, conventional acoustic echo cancellers cannot reduce the acoustic echo signal sufficiently. In particular, the loudspeakers used in mobile phones with a loudspeaker function are usually inexpensive and small, so their nonlinearity is strong and the performance degradation is severe.

[0006] To solve this problem, acoustic echo cancellers using a nonlinear adaptive filter have been developed. FIG. 8 is a block diagram illustrating a conventional acoustic echo canceller. The acoustic echo canceller is built into, for example, a mobile phone with a loudspeaker function or a car phone. A voice signal transmitted as a digital signal from the other party is input to voice signal input terminal 1 over a communication line such as the public telephone network or a mobile phone network, and is then output as sound by loudspeaker section 3, which has a loudspeaker. The near-end speaker's voice is picked up by sound reception section 4, which has a microphone, and is then transmitted from voice signal output terminal 2 to the other party over the communication line.

[0007] Linear adaptive filter 5a estimates, from the voice signal input at voice signal input terminal 1, the linear component of the acoustic echo signal arising at sound reception section 4, and generates the linear component of a pseudo acoustic echo signal. Nonlinear adaptive filter 5b likewise estimates the nonlinear component of the acoustic echo signal arising at sound reception section 4, and generates the nonlinear component of the pseudo acoustic echo signal. Arithmetic processing unit 5c adds the linear and nonlinear components of the generated pseudo acoustic echo signal and outputs the sum.

[0008] The pseudo acoustic echo signal obtained by adding the linear and nonlinear components is input to arithmetic processing unit 7. Arithmetic processing unit 7 subtracts the pseudo acoustic echo signal from the acoustic echo signal arising at sound reception section 4. The acoustic echo signal arising at sound reception section 4 is thus cancelled in arithmetic processing unit 7.

【０００９】[0009]

[Problems to Be Solved by the Invention] Conventional acoustic echo cancellers use a nonlinear adaptive filter called a Volterra filter. However, the convergence speed of a Volterra filter (the speed at which the acoustic echo path is identified) is generally slow, and the echo cancellation amount it achieves is degraded, so many problems remained to be overcome before practical use. In particular, the loudspeakers used in mobile phones with a loudspeaker function are usually inexpensive and small, and the sound output from such loudspeakers is strongly nonlinear. Development of an acoustic echo canceller that can reliably and quickly reduce the nonlinear component of the acoustic echo signal has therefore been desired.

[0010] The present invention was made in view of these circumstances. Its object is to provide a voice processing apparatus, a computer program, and a computer-readable recording medium that, when the receiving means receives a voice signal, remove the estimated nonlinear distortion from the voice signal to be output as sound, estimate the linear component of the acoustic echo signal produced when sound output from the voice output means enters the voice input means, and remove the estimated linear component from the acoustic echo signal arising at the voice input means. This configuration suppresses the nonlinear component of the acoustic echo signal that could otherwise arise and allows the acoustic echo signal to be removed quickly.

【００１１】[0011]

[Means for Solving the Problems] The voice processing apparatus according to the first invention comprises receiving means for receiving a voice signal transmitted from a communication device, voice output means for outputting sound to the outside based on the voice signal received by the receiving means, voice input means for picking up external sound and generating a voice signal, and transmitting means for transmitting the voice signal generated by the voice input means to the communication device, and is configured to reduce the acoustic echo signal produced when sound output from the voice output means enters the voice input means. The apparatus is characterized by comprising: means for estimating, based on the voice signal, the nonlinear distortion accompanying the sound output from the voice output means when the receiving means receives a voice signal; means for estimating the linear component of the acoustic echo signal based on the voice signal; means for removing the estimated nonlinear distortion from the voice signal to be output from the voice output means; and means for removing the estimated linear component of the acoustic echo signal from the acoustic echo signal arising at the voice input means.

[0012] The voice processing apparatus according to the second invention is the apparatus of the first invention further comprising means for calculating the delay time incurred in estimating the nonlinear distortion, and is characterized in that the estimated linear component of the acoustic echo signal is removed from the acoustic echo signal arising at the voice input means once the calculated delay time has elapsed after the receiving means receives the voice signal.

[0013] The voice processing apparatus according to the third invention is the apparatus of the first or second invention, characterized in that the nonlinear distortion accompanying the sound output from the voice output means is estimated by convolution of the voice signal with a Volterra series.

[0014] The computer program according to the fourth invention is characterized by comprising the steps of: causing a computer to estimate, based on an input voice signal, the nonlinear distortion accompanying the sound to be output; causing the computer to remove the estimated nonlinear distortion from the voice signal; causing the computer to output sound based on the voice signal from which the nonlinear distortion has been removed; causing the computer to estimate the linear component of the acoustic echo signal based on the input voice signal; causing the computer to remove the estimated linear component of the acoustic echo signal from the voice signal to be transmitted; and causing the computer to transmit the voice signal from which the linear component of the acoustic echo signal has been removed.

[0015] The computer-readable recording medium according to the fifth invention is characterized in that it records a computer program comprising the steps of: causing a computer to estimate, based on an input voice signal, the nonlinear distortion accompanying the sound to be output; causing the computer to remove the estimated nonlinear distortion from the voice signal; causing the computer to output sound based on the voice signal from which the nonlinear distortion has been removed; causing the computer to estimate the linear component of the acoustic echo signal based on the input voice signal; causing the computer to remove the estimated linear component of the acoustic echo signal from the voice signal to be transmitted; and causing the computer to transmit the voice signal from which the linear component of the acoustic echo signal has been removed.

[0016] In the first invention, when the receiving means receives a voice signal, the nonlinear distortion and the linear component of the acoustic echo signal arising at the voice input means are estimated from the voice signal. The estimated nonlinear distortion is removed from the voice signal to be output as sound, and the estimated linear component is removed from the acoustic echo signal produced when sound output from the voice output means enters the voice input means. Thus, for example, by placing a nonlinear adaptive filter that takes into account the characteristics of the voice output means, such as a loudspeaker, in the stage preceding the voice output means, the nonlinear distortion accompanying the output sound can be removed easily. Moreover, because removing the nonlinear distortion from the output sound eliminates the nonlinear factor in advance, the acoustic echo signal that can arise at the voice input means contains only a linear component. Since the linear component can be removed easily with a conventional linear adaptive filter such as an FIR filter, the present invention does not need to identify the nonlinear component of the acoustic echo signal and can reduce the acoustic echo signal quickly. In addition, removing the nonlinear distortion from the output sound improves the sound quality of the loudspeaker call.

[0017] In the second invention, means are provided for calculating the delay time incurred in estimating the nonlinear distortion, and the estimated linear component of the acoustic echo signal is removed once the calculated delay time has elapsed after the voice signal is received. Therefore, only the acoustic echo signal that can actually arise is removed from the voice signal to be transmitted.

[0018] In the third invention, the nonlinear distortion accompanying the output sound is estimated by convolution of the received voice signal with a Volterra series. Therefore, for example, by determining the Volterra series in advance according to the characteristics of the voice output means, such as a loudspeaker, the nonlinear distortion can be removed quickly.

[0019] In the fourth and fifth inventions, the nonlinear distortion accompanying the output sound and the linear component of the acoustic echo signal that can arise are estimated based on the input voice signal; the nonlinear distortion is removed from the voice signal to be output, and the linear component of the acoustic echo signal is removed from the voice signal to be transmitted. Thus, for example, by estimating the nonlinear distortion with the characteristics of the voice output means, such as a loudspeaker, taken into account, the nonlinear distortion accompanying the output sound can be removed easily. Moreover, because removing the nonlinear distortion from the output sound eliminates the nonlinear factor in advance, the acoustic echo signal that can arise at the voice input means contains only a linear component, which can be removed easily. The present invention does not need to identify the nonlinear component of the acoustic echo signal and can reduce the acoustic echo signal quickly. In addition, removing the nonlinear distortion from the output sound improves the sound quality of the loudspeaker call.

【００２０】[0020]

[Embodiments of the Invention] An acoustic echo canceller embodying the voice processing apparatus of the present invention is described below with reference to the drawings showing its embodiment. The acoustic echo canceller is provided in, for example, a mobile phone with a loudspeaker function or a car phone, and suppresses the acoustic echo produced when sound output from voice output means having a loudspeaker enters voice input means having a microphone.

[0021] Embodiment 1. FIG. 1 is a block diagram illustrating the acoustic echo canceller according to this embodiment. Reference numeral 1 in the figure denotes a voice signal input terminal that receives, over a communication line (not shown) such as the public telephone network or a mobile phone network, the voice signal transmitted from a communication terminal (not shown), such as a telephone or mobile phone, held by the far-end speaker. The voice signal input to voice signal input terminal 1 is processed by nonlinear inverse system 6, described later, and is then output as sound from loudspeaker section 3. The voice uttered by the near-end speaker is picked up by sound reception section 4, output as a voice signal from voice signal output terminal 2, and transmitted to the far-end speaker's communication terminal.

[0022] The voice signals transmitted and received over the communication line are preferably digital signals. Voice signals can also be transmitted and received as analog signals, but in that case the apparatus must include an A/D converter that converts the voice signal input to voice signal input terminal 1 into a digital signal, and a D/A converter that converts the signal transmitted from voice signal output terminal 2 into an analog signal.

[0023] Loudspeaker section 3 includes a D/A converter, an amplifier, and a loudspeaker (none shown), and outputs as sound the voice signal processed by nonlinear inverse system 6. Sound reception section 4 includes a microphone, an amplifier, and an A/D converter (none shown), and picks up external sound to generate a digital voice signal.

[0024] A delay unit 8 and a linear adaptive filter 5 are connected on the path from voice signal input terminal 1 to voice signal output terminal 2. The voice signal input to voice signal input terminal 1 is delayed by delay unit 8 by a predetermined time (for example, several tens of milliseconds) and then input to linear adaptive filter 5. The linear adaptive filter is, for example, an FIR filter, and generates the linear component of a pseudo acoustic echo signal from the input voice signal.

[0025] The linear component of the generated pseudo acoustic echo signal is input to arithmetic processing unit 7. Arithmetic processing unit 7 subtracts the linear component of the pseudo acoustic echo signal from the acoustic echo signal arising at sound reception section 4.
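The delay, filter, and subtract path of paragraphs [0024] and [0025] can be sketched as follows. This is a minimal illustration only: the text names an FIR linear adaptive filter but does not state the adaptation rule, so the NLMS update, the function name `cancel_echo`, and the parameters `num_taps`, `delay`, `mu`, and `eps` are all assumptions made for the example.

```python
import random

def cancel_echo(far_end, mic, num_taps=8, delay=0, mu=0.5, eps=1e-8):
    """Return the residual signal after subtracting an adaptive FIR echo estimate.

    far_end: samples received at the input terminal (role of terminal 1).
    mic:     samples picked up by the microphone (role of section 4).
    The NLMS update is an assumed choice; the source only says "FIR filter".
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps              # delayed far-end samples, newest first
    residual = []
    for n, d in enumerate(mic):
        # Delay unit: use the far-end sample from `delay` samples ago.
        x = far_end[n - delay] if n >= delay else 0.0
        history = [x] + history[:-1]
        # Linear adaptive filter: linear component of the pseudo echo.
        pseudo_echo = sum(w * h for w, h in zip(weights, history))
        # Subtraction stage: remove the pseudo echo from the microphone signal.
        e = d - pseudo_echo
        residual.append(e)
        # NLMS weight update (assumed adaptation rule).
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
    return residual

if __name__ == "__main__":
    random.seed(0)
    far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    path = [0.6, -0.3, 0.1]                 # hypothetical linear echo path
    d = 5                                   # hypothetical pre-processing delay in samples
    mic = [sum(path[k] * far[n - d - k] for k in range(3) if n - d - k >= 0)
           for n in range(2000)]
    res = cancel_echo(far, mic, num_taps=8, delay=d)
    print(sum(e * e for e in res[-200:]) < sum(m * m for m in mic[-200:]))
```

Matching `delay` to the latency of the processing in front of the loudspeaker keeps the filter's short tap window aligned with the echo, which is the reason the delay unit sits ahead of the filter in this sketch.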

[0026] Nonlinear inverse system 6, meanwhile, estimates from the input voice signal the nonlinear distortion accompanying the sound output from loudspeaker section 3 and removes in advance the nonlinear distortion that loudspeaker section 3 would produce, so that the sound output from loudspeaker section 3 is free of nonlinear distortion.

[0027] In the present invention, as described above, a pseudo acoustic echo signal estimating the linear component of the acoustic echo signal arising at sound reception section 4 is generated and removed, and the nonlinear distortion on the voice signal transmission path is removed from the voice signal output from loudspeaker section 3, thereby suppressing the nonlinear component of the acoustic echo signal arising at sound reception section 4. Because nonlinear distortion is removed from the sound output from loudspeaker section 3, the sound quality of the loudspeaker output also improves.

[0028] FIG. 2 is a block diagram illustrating the function of nonlinear inverse system 6. When the transfer system consisting of nonlinear inverse system 6 and loudspeaker section 3 is nonlinear, time-invariant, and causal, its input-output relation can be expressed by a Volterra series expansion such as the following equation.

【００２９】[0029]

【数１】 [Equation 1]

[0030] Here, x(n) and y(n) are the input and output signals of the transfer system, h₁(k₁) is the first-order Volterra kernel, and h₂(k₁, k₂) is the second-order Volterra kernel. In this embodiment, for simplicity, kernels up to the second order are used, and the nonlinearity of the transfer system is expressed on the assumption that the Volterra kernels have a finite number of taps N.
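Equation 1 itself is not reproduced in this text. Under the simplification stated in paragraph [0030] (kernels up to second order, N taps each), the standard truncated Volterra series would read as below. The summation limits 0 to N−1 are an assumption based on the usual convention, not taken from the original equation.

```latex
y(n) = \sum_{k_1=0}^{N-1} h_1(k_1)\, x(n-k_1)
     + \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} h_2(k_1,k_2)\, x(n-k_1)\, x(n-k_2)
```

The first sum is an ordinary FIR filter; the double sum is the second-order term that models the loudspeaker's nonlinear distortion.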

[0031] One object of the present invention is to remove the nonlinear distortion generated in loudspeaker section 3 (in particular, by the loudspeaker it contains), and the invention is characterized in that this nonlinear distortion is expressed as the convolution of a Volterra series with the input signal x(n). That is, as shown in FIG. 2, the configuration is equivalent to virtually placing a first-order Volterra filter 31 and a second-order Volterra filter 32 in loudspeaker section 3, with arithmetic processing unit 33 adding the signals output from Volterra filters 31 and 32 so that the nonlinear distortion is generated virtually.
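The virtual loudspeaker model of paragraph [0031] (first-order filter, second-order filter, and an adder) can be sketched in plain Python. The function name `volterra2` and the kernel values in the usage lines are hypothetical; only the structure follows the text.

```python
def volterra2(x, h1, h2):
    """Second-order Volterra filter with N = len(h1) taps.

    Returns (linear_part, quadratic_part, total), mirroring the roles of
    filter 31, filter 32, and the adder 33 described in the text.
    h2 is an N x N list of lists holding the second-order kernel.
    """
    n_taps = len(h1)
    linear, quadratic, total = [], [], []
    for n in range(len(x)):
        # Window x(n), x(n-1), ..., x(n-N+1); samples before the start are zero.
        w = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y1 = sum(h1[k] * w[k] for k in range(n_taps))
        y2 = sum(h2[k1][k2] * w[k1] * w[k2]
                 for k1 in range(n_taps) for k2 in range(n_taps))
        linear.append(y1)
        quadratic.append(y2)
        total.append(y1 + y2)               # adder: sum of both filter outputs
    return linear, quadratic, total

if __name__ == "__main__":
    # Hypothetical kernels, chosen only to make the arithmetic easy to follow.
    lin, quad, out = volterra2([1.0, 2.0], [0.5, 0.25], [[0.1, 0.0], [0.0, 0.2]])
    print(out)
```

The double loop over `k1` and `k2` is what makes the second-order term cost on the order of N² multiplications per sample.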

[0032] Nonlinear inverse system 6 does the reverse: it estimates the nonlinear distortion generated in loudspeaker section 3 and outputs a voice signal obtained by subtracting that distortion from the input voice signal. Consequently, the voice signal output at arithmetic processing unit 33 contains no nonlinear distortion, and the nonlinear factor in the voice signal can be removed in advance.

[0033] Nonlinear inverse system 6 generates the following two kinds of voice signals. First, passing the voice signal through delay unit 61 produces a voice signal delayed by a predetermined time. When this signal passes through second-order Volterra filter 32, a voice signal accompanied by nonlinear distortion, delayed by the predetermined time, is generated. Second, when the voice signal is input to model signal generation unit 62, a voice signal simulating the nonlinear distortion is generated from the input voice signal, and linear inverse filter 63 then produces a voice signal accompanied by linear distortion. The sign of this signal is inverted in arithmetic processing unit 64, and the result is applied to the delayed voice signal described above. The voice signal carrying both linear and nonlinear distortion is then input to first-order Volterra filter 31, where the linear distortion is removed first, and the nonlinear distortion is subsequently removed in arithmetic processing unit 33.
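The pre-compensation idea in paragraph [0033] can be illustrated with a deliberately simplified toy model. The sketch assumes a memoryless loudspeaker (a linear gain `g` plus a quadratic term `c*u*u`), so the linear inverse filter reduces to division by `g` and the delay unit is omitted. These simplifications and all names and values are assumptions for illustration, not the patent's implementation.

```python
def speaker(u, g=0.8, c=0.1):
    """Toy loudspeaker: linear gain g plus a memoryless quadratic distortion."""
    return g * u + c * u * u

def predistort(x, g=0.8, c=0.1):
    """Second-order pre-inverse for the toy loudspeaker.

    The modelled quadratic distortion (role of the model signal generator)
    is passed through the inverse of the linear part (here simply 1/g),
    sign-inverted, and added to the input signal.
    """
    model_distortion = c * x * x
    return x - model_distortion / g

if __name__ == "__main__":
    x = 0.5
    ideal = 0.8 * x                          # distortion-free output g*x
    print(abs(speaker(x) - ideal))           # error without pre-compensation
    print(abs(speaker(predistort(x)) - ideal))  # smaller error with pre-compensation
```

In this toy case the second-order term cancels exactly and only smaller third- and fourth-order residues remain, which is why the compensated error is smaller but not zero.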

[0034] FIG. 3 is a graph showing the echo cancellation characteristics of the acoustic echo canceller according to this embodiment. The horizontal axis shows time and the vertical axis shows the echo cancellation amount. The echo cancellation amount is defined, using an input signal y₀(k) (for example, colored noise or white noise) and the acoustic echo signal y₁(k) estimated when that input signal y₀(k) is applied, by the following equation.

【００３５】[0035]

【数２】 [Equation 2]

[0036] The echo cancellation amount is expressed by Equation 2 above.
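Equation 2 is not reproduced in this text. A common way to quantify echo cancellation from a reference signal y₀(k) and its estimate y₁(k) is the ratio of signal energy to residual energy in decibels. The sketch below uses that conventional definition as an assumption; it may differ from the patent's actual Equation 2, and the function name is hypothetical.

```python
import math

def echo_cancellation_db(y0, y1):
    """Assumed definition: 10*log10( sum(y0^2) / sum((y0 - y1)^2) ).

    y0: reference signal samples; y1: estimated echo samples.
    Raises ZeroDivisionError if the estimate is exact (zero residual).
    """
    signal_energy = sum(a * a for a in y0)
    residual_energy = sum((a - b) ** 2 for a, b in zip(y0, y1))
    return 10.0 * math.log10(signal_energy / residual_energy)

if __name__ == "__main__":
    # An estimate that is off by 10 % in amplitude gives about 20 dB.
    print(echo_cancellation_db([1.0, 1.0], [0.9, 0.9]))
```

Under this definition, the 25 dB level discussed for FIG. 3 would correspond to a residual energy of roughly 0.3 % of the reference energy.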

[0037] In FIG. 3, graph 3b shows the echo cancellation characteristic of the acoustic echo canceller of this embodiment. For comparison, the figure also shows the simulation result for a linear transfer path with a linear adaptive filter in the acoustic echo canceller (graph 3a), the simulation result for a nonlinear transfer path with a linear adaptive filter in the acoustic echo canceller (graph 3d), and the result for an acoustic echo canceller using a conventional Volterra filter (graph 3c).

[0038] As shown in FIG. 3, with the acoustic echo canceller device of the present embodiment, the echo cancellation amount exceeds 25 dB in a relatively short time. When a linear adaptive filter is used with the transmission path assumed to be linear (graph 3a), the echo cancellation amount is the largest, but this is an idealized condition that ignores the nonlinearity of the transmission path; that graph in effect shows the performance limit. When a linear adaptive filter is used with a nonlinear transmission path (graph 3d), the nonlinearity acts as a disturbance, and the echo cancellation amount is markedly worse than in the case where the transmission path is assumed to be linear. The acoustic echo canceller device using a conventional Volterra filter (graph 3c) improves the echo cancellation amount, but it is very slow to compute the acoustic echo signal. In contrast, the acoustic echo canceller device according to the present embodiment computes the acoustic echo signal in a short time and also achieves a better echo cancellation amount than the conventional acoustic echo canceller device.
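The echo cancellation amount in decibels can be read as a ratio of echo power before and after cancellation. The following is a minimal sketch under that assumption (a common definition, often called ERLE, which this passage does not spell out); the function name `echo_cancellation_db` is hypothetical.

```python
import math


def echo_cancellation_db(echo, residual):
    """Return 10*log10(mean echo power / mean residual power).

    Assumed definition: `echo` is the microphone echo before cancellation and
    `residual` is what remains after the estimated echo has been subtracted.
    """
    if len(echo) != len(residual) or not echo:
        raise ValueError("sequences must be non-empty and of equal length")
    p_echo = sum(x * x for x in echo) / len(echo)
    p_res = sum(x * x for x in residual) / len(residual)
    if p_res == 0.0:
        return math.inf   # nothing left: perfect cancellation
    if p_echo == 0.0:
        return -math.inf  # no echo to cancel, yet a residual exists
    return 10.0 * math.log10(p_echo / p_res)
```

Under this definition, a residual whose amplitude is one tenth of the echo amplitude corresponds to 20 dB of cancellation.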

[0039] FIG. 4 is a graph comparing the amounts of computation required by acoustic echo canceller devices. The horizontal axis is the number of taps (the value of N in Equation 1), and the vertical axis is the amount of computation required by the acoustic echo canceller device according to the present embodiment divided by the amount required by the conventional Volterra filter.

[0040] In the acoustic echo canceller device according to the present embodiment, the nonlinear distortion generated in the loudspeaker unit 3 is estimated using a preset signal generated by the model signal generator 62 of the nonlinear inverse system 6, so this estimation consumes only a fixed amount of computation regardless of the number of taps N. The overall amount of computation therefore depends on the computation needed for the linear adaptive filter 5 to estimate the linear component of the acoustic echo signal, and is proportional to N. With an acoustic echo canceller device using the conventional Volterra filter, by contrast, the second-order term of the Volterra series must be computed, so the amount of computation is proportional to the square of N.
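The linear versus quadratic growth described above can be illustrated with a rough count of filter coefficients evaluated per output sample. This is only a sketch under an assumed cost model; it does not reproduce the accounting behind FIG. 4, which is not given in this passage, and the function name `coefficients_per_sample` is hypothetical.

```python
def coefficients_per_sample(n_taps):
    """Count filter coefficients evaluated per output sample.

    Assumed cost model (not taken from the patent): a linear FIR filter
    touches N coefficients, and a full second-order Volterra filter touches
    N first-order plus N*N second-order coefficients.
    """
    linear = n_taps
    volterra2 = n_taps + n_taps * n_taps
    return linear, volterra2


for n in (32, 64, 128, 256):
    linear, volterra2 = coefficients_per_sample(n)
    print(f"N={n:4d}  linear={linear:6d}  volterra2={volterra2:7d}")
```

Doubling N doubles the linear count but roughly quadruples the second-order count, which is why the gap widens as taps are added.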

[0041] Simulations by the inventors show that the echo cancellation amount can be improved by increasing the number of taps. However, taking into account the overall amount of computation and the delay in the nonlinear inverse system 6, N = 128 is a practical choice. With N = 128, the amount of computation can be reduced to roughly one fifth of that of the conventional device, as shown in FIG. 4.

[0042] Thus, compared with the conventional acoustic echo canceller device using a Volterra filter, the acoustic echo canceller device according to the present embodiment requires far less computation and improves the echo cancellation amount. It also has the advantage of better sound quality, because the nonlinear distortion of the voice signal output from the loudspeaker unit 3 is removed.

[0043] When the acoustic echo canceller device according to the present embodiment is built into a small device such as a mobile phone, the linear adaptive filter 5 and the nonlinear inverse system 6 can be implemented with a DSP (digital signal processor), a dedicated LSI, or the like. A DSP or dedicated LSI that integrates the linear adaptive filter 5, the nonlinear inverse system 6, and the delay device 8 may also be used.

[0044] In the present embodiment, the nonlinear distortion of the voice signal is calculated using terms of the Volterra series up to the second order, but the nonlinear distortion can of course also be calculated using third- and higher-order terms.

[0045] Embodiment 2. The acoustic echo canceller device described above can be applied not only to mobile phones with a loudspeaker function, car phones, and the like, but also to systems that use two-way communication of voice signals, such as video conference systems and telephone conference systems. This embodiment describes an application to a video conference system.

[0046] FIG. 5 is a schematic diagram illustrating the video conference system according to the present embodiment. In the figure, reference numeral 100 denotes a communication device used in the video conference system. Each communication device 100 is connected to the communication device 100 of the other party via a communication network N such as a public telephone network.

[0047] As described later, the communication device 100 includes a speaker and a microphone and can transmit and receive voice signals via the communication network N. It also includes an imaging device such as a CCD camera or a video camera and a display device such as a liquid crystal display, and can transmit and receive video data via the communication network N. It is desirable to transmit the voice signal and the video data in synchronization, but this is not an essential requirement of the present invention.

[0048] In the present embodiment, when a voice signal is received via the communication network N, the nonlinear distortion and the linear component of the acoustic echo signal are calculated from the received voice signal by the arithmetic processing of a computer program. The calculated nonlinear distortion is removed from the voice signal to be output from the speaker, and the linear component of the acoustic echo signal is removed from the voice signal generated at the microphone.

[0049] FIG. 6 is a block diagram showing the internal configuration of the communication device 100 used in the video conference system. The communication device 100 includes a control unit 101 having a CPU, which is connected via a bus 102 to hardware including a ROM 103, a RAM 104, an operation unit 105, a display unit 106, a communication unit 107, a loudspeaker unit 108, a sound receiving unit 109, an imaging unit 110, and an auxiliary storage device 111. The control unit 101 controls this hardware according to various control programs stored in the ROM 103, such as the program of the present invention, an arithmetic processing program, and a key input processing program. The RAM 104 is composed of SRAM, flash memory, or the like, and temporarily stores data generated while the control programs stored in the ROM 103 are executed, video data transmitted and received by the communication unit 107, and so on.

[0050] The operation unit 105 includes hardware keys or software keys, such as a numeric keypad and function keys, needed to operate the communication device 100. The display unit 106 includes a display device such as a liquid crystal display and displays video data and other information received by the communication unit 107.

[0051] The communication unit 107 includes a line terminating device such as a modem, and controls transmission and reception over the communication network N of the voice signals input to the loudspeaker unit 108 and the sound receiving unit 109, the video data input to the imaging unit 110, and the like.

[0052] The loudspeaker unit 108 includes a D/A converter, an amplifier, and a speaker (not shown), and outputs the voice signal as sound after the control unit 101 has applied signal processing to it. The sound receiving unit 109 includes a microphone, an amplifier, and an A/D converter (not shown), picks up external sound, and generates a voice signal in digital form.

[0053] The imaging unit 110 has an imaging device such as a CCD camera or a video camera. It captures images of the speaker to obtain video data, which is transmitted to the communication device 100 of the other party through the communication network N.

[0054] The auxiliary storage device 111 consists of a CD-ROM drive or the like that reads the computer program and data of the present invention from a recording medium 112, such as a CD-ROM, on which they are recorded. The computer program and data that are read are stored in the ROM 103. The computer program of the present invention need not be provided on the recording medium 112; it may instead be stored in the ROM 103 in advance.

[0055] FIG. 7 is a flowchart showing the procedure by which the communication device 100 processes voice signals. The control unit 101 first determines whether the communication unit 107 has received a voice signal (step S1). If no voice signal has been received (S1: NO), the control unit 101 waits until one is received.

[0056] If a voice signal has been received (S1: YES), the control unit 101 calculates, based on the received voice signal, the nonlinear distortion that accompanies the sound output by the loudspeaker unit 108 (step S2). The nonlinear distortion is calculated by the learning identification method (NLMS method) using the second-order term of the Volterra series, which is preset in advance in view of the characteristics of the loudspeaker unit 108. The control unit 101 also calculates the linear component of the acoustic echo signal based on the received voice signal (step S3). For this calculation, conventional methods such as the LMS algorithm and the RLS algorithm can be used in addition to the learning identification method mentioned above.
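The estimation of the linear echo component in step S3 can be sketched with a textbook NLMS update. This is a minimal illustration, not the patent's implementation: the function name `nlms_echo_canceller`, the step size `mu`, and the regularization constant `eps` are assumptions introduced here.

```python
def nlms_echo_canceller(far_end, mic, n_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path with NLMS.

    far_end: received voice samples (reference signal)
    mic:     microphone samples containing the echo of far_end
    Returns (residual, w): the echo-reduced signal and the final weights.
    """
    w = [0.0] * n_taps   # adaptive filter coefficients
    x = [0.0] * n_taps   # most recent reference samples, newest first
    residual = []
    for s, d in zip(far_end, mic):
        x = [s] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))    # estimated linear echo
        e = d - y                                   # echo-reduced output
        norm = sum(xi * xi for xi in x) + eps       # input power normalization
        g = mu * e / norm
        w = [wi + g * xi for wi, xi in zip(w, x)]   # NLMS coefficient update
        residual.append(e)
    return residual, w
```

With a noise-free, purely linear echo path shorter than `n_taps`, the weights converge toward the path's impulse response and the residual shrinks toward zero. Replacing the normalized step with a fixed one would give plain LMS.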

[0057] Next, the control unit 101 removes the nonlinear distortion calculated in step S2 from the received voice signal (step S4), and outputs the voice signal from which the nonlinear distortion has been removed as sound from the loudspeaker unit 108 (step S5).

[0058] Next, the control unit 101 determines whether a predetermined time has elapsed since the voice signal was received in step S1 (step S6). If it determines that the predetermined time has not elapsed (S6: NO), it waits until the predetermined time elapses. The predetermined time corresponds to the delay that arises when the nonlinear distortion is calculated in step S2, and is calculated from the input voice signal when the linear component of the acoustic echo signal is calculated.

[0059] If it determines that the predetermined time has elapsed (S6: YES), the control unit 101 removes the linear component of the acoustic echo signal from the voice signal picked up by the sound receiving unit 109 (step S7). After removing the linear component of the acoustic echo signal, it transmits the voice signal from the communication unit 107 (step S8).
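The wait in step S6 followed by the subtraction in step S7 can be modelled, in a sample-based implementation, as a fixed delay applied to the estimated linear echo before it is subtracted from the microphone signal. The sketch below assumes the delay is a known whole number of samples; `DelayLine` and `remove_linear_echo` are hypothetical names.

```python
from collections import deque


class DelayLine:
    """Fixed delay of `delay` samples; outputs zeros until the line has filled."""

    def __init__(self, delay):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self._buf = deque([0.0] * delay)

    def push(self, sample):
        # Append the newest sample and return the one from `delay` steps ago.
        self._buf.append(sample)
        return self._buf.popleft()


def remove_linear_echo(mic, echo_estimate, delay):
    """Subtract the delayed echo estimate from the microphone samples (step S7)."""
    line = DelayLine(delay)
    return [m - line.push(y) for m, y in zip(mic, echo_estimate)]
```

For example, with `delay=2` an estimate `[1, 2, 3, 4]` is aligned as `[0, 0, 1, 2]` before subtraction, so it cancels an echo that reaches the microphone two samples late.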

[0060] In the present embodiment, the nonlinear distortion of the voice signal and the linear component of the acoustic echo signal are removed by the processing of a computer program, but this can also be realized with a DSP or dedicated LSI as described in Embodiment 1.

[0061]

[Effects of the Invention] As described in detail above, according to the first invention, when the receiving means receives a voice signal, the nonlinear distortion and the linear component of the acoustic echo signal arising at the voice input means are estimated from the voice signal. The estimated nonlinear distortion is removed from the voice signal to be output as sound, and the estimated linear component of the acoustic echo signal is removed from the acoustic echo signal produced when sound output from the voice output means enters the voice input means. Therefore, for example, by placing a nonlinear adaptive filter that takes account of the characteristics of the voice output means, such as a speaker, in the stage preceding the voice output means, the nonlinear distortion accompanying the output sound can be removed easily. Moreover, because removing the nonlinear distortion from the output sound eliminates the nonlinear factor in advance, the acoustic echo signal that can arise at the voice input means contains only a linear component. The linear component of the acoustic echo signal can be removed easily with a conventionally used linear adaptive filter such as an FIR filter, so the present invention does not need to identify the nonlinear component of the acoustic echo signal and can reduce the acoustic echo signal quickly. In addition, because the nonlinear distortion is removed from the output sound, the sound quality of loudspeaker calls improves.

[0062] According to the second invention, means is provided for calculating the delay time that arises when the nonlinear distortion is estimated, and the estimated linear component of the acoustic echo signal is removed once the calculated delay time has elapsed after the voice signal is received. Therefore, only the acoustic echo signal that can actually arise is removed from the voice signal to be transmitted.

[0063] According to the third invention, the nonlinear distortion accompanying the output sound is estimated by a convolution operation of the received voice signal with the Volterra series. Therefore, for example, by determining the Volterra series in advance according to the characteristics of the voice output means, such as a speaker, the nonlinear distortion can be removed quickly.
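The convolution of the voice signal with a preset Volterra kernel can be sketched for the second-order term alone. This is a simplified stand-in, not the nonlinear inverse system of the embodiment: the kernel `h2` is a hypothetical preset, and directly subtracting the estimated distortion before output is an assumption made for illustration.

```python
def second_order_distortion(x, h2):
    """Evaluate d[n] = sum_i sum_j h2[i][j] * x[n-i] * x[n-j].

    x:  input voice samples
    h2: square second-order Volterra kernel (list of lists), assumed preset
    """
    n_taps = len(h2)
    buf = [0.0] * n_taps   # most recent input samples, newest first
    out = []
    for s in x:
        buf = [s] + buf[:-1]
        d = 0.0
        for i in range(n_taps):
            for j in range(n_taps):
                d += h2[i][j] * buf[i] * buf[j]
        out.append(d)
    return out


def precompensate(x, h2):
    """Subtract the estimated second-order distortion from the signal."""
    return [s - d for s, d in zip(x, second_order_distortion(x, h2))]
```

The double loop over `i` and `j` is the source of the quadratic cost of a second-order Volterra term; because the kernel here is fixed rather than adapted, no coefficient update is needed at run time.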

[0064] According to the fourth and fifth inventions, the nonlinear distortion accompanying the output sound and the linear component of the acoustic echo signal that can arise are estimated based on the input voice signal. The nonlinear distortion is removed from the voice signal to be output, and the linear component of the acoustic echo signal is removed from the voice signal to be transmitted. Therefore, for example, by estimating the nonlinear distortion in view of the characteristics of the voice output means, such as a speaker, the nonlinear distortion accompanying the output sound can be removed easily. Moreover, because removing the nonlinear distortion from the output sound eliminates the nonlinear factor in advance, the acoustic echo signal that can arise at the voice input means contains only a linear component, which can be removed easily. The present invention does not need to identify the nonlinear component of the acoustic echo signal and can reduce the acoustic echo signal quickly. In addition, because the nonlinear distortion is removed from the output sound, the sound quality of loudspeaker calls improves. The present invention thus provides excellent effects.

[Brief description of drawings]

FIG. 1 is a block diagram illustrating the acoustic echo canceller device according to the present embodiment.

FIG. 2 is a block diagram illustrating the function of the nonlinear inverse system.

FIG. 3 is a graph showing the echo cancellation characteristics of the acoustic echo canceller device according to the present embodiment.

FIG. 4 is a graph comparing the amounts of computation required by acoustic echo canceller devices.

FIG. 5 is a schematic diagram illustrating the video conference system according to the present embodiment.

FIG. 6 is a block diagram showing the internal configuration of the communication device used in the video conference system.

FIG. 7 is a flowchart showing the procedure by which the communication device processes voice signals.

FIG. 8 is a block diagram illustrating a conventional acoustic echo canceller device.

[Explanation of symbols]

1 Voice signal input terminal
2 Voice signal output terminal
3 Loudspeaker unit
4 Sound receiving unit
5 Linear adaptive filter
6 Nonlinear inverse system
7 Arithmetic processing unit
8 Delay device

Continued from front page. F-terms (reference): 5D020 CC00 5K027 DD07 DD10 HH03 5K046 BB01 CC29 HH24 HH30 HH54 HH79 HH80

Claims

[Claims]

1. A voice processing apparatus comprising receiving means for receiving a voice signal transmitted from a communication device, voice output means for outputting voice to the outside based on the voice signal received by the receiving means, voice input means for receiving external voice and generating a voice signal, and transmitting means for transmitting the voice signal generated by the voice input means to the communication device, the apparatus being configured to reduce an acoustic echo signal that arises when voice output from the voice output means is input to the voice input means, the apparatus being characterized by comprising: means for estimating, when a voice signal is received by the receiving means, a nonlinear distortion associated with the voice output from the voice output means based on the voice signal; means for estimating a linear component of the acoustic echo signal based on the voice signal; means for removing the estimated nonlinear distortion from the voice signal to be output from the voice output means; and means for removing the estimated linear component of the acoustic echo signal from the acoustic echo signal generated by the voice input means.

2. The voice processing apparatus according to claim 1, further comprising means for calculating a delay time that arises when the nonlinear distortion is estimated, wherein, when the calculated delay time has elapsed after the voice signal is received by the receiving means, the estimated linear component of the acoustic echo signal is removed from the acoustic echo signal generated by the voice input means.

3. The voice processing apparatus according to claim 1 or claim 2, wherein the nonlinear distortion associated with the voice output from the voice output means is estimated by a convolution operation of the voice signal with a Volterra series.

4. A computer program comprising: a step of causing a computer to estimate, based on an input voice signal, a nonlinear distortion associated with the voice to be output; a step of causing the computer to remove the estimated nonlinear distortion from the voice signal; a step of causing the computer to output voice based on the voice signal from which the nonlinear distortion has been removed; a step of causing the computer to estimate a linear component of an acoustic echo signal based on the input voice signal; a step of causing the computer to remove the estimated linear component of the acoustic echo signal from a voice signal to be transmitted; and a step of causing the computer to transmit the voice signal from which the linear component of the acoustic echo signal has been removed.

5. A computer-readable recording medium on which is recorded a computer program comprising: a step of causing a computer to estimate, based on an input voice signal, a nonlinear distortion associated with the voice to be output; a step of causing the computer to remove the estimated nonlinear distortion from the voice signal; a step of causing the computer to output voice based on the voice signal from which the nonlinear distortion has been removed; a step of causing the computer to estimate a linear component of an acoustic echo signal based on the input voice signal; a step of causing the computer to remove the estimated linear component of the acoustic echo signal from a voice signal to be transmitted; and a step of causing the computer to transmit the voice signal from which the linear component of the acoustic echo signal has been removed.