JP2009124386A

JP2009124386A - Voice signal processor, and voice signal processing method

Info

Publication number: JP2009124386A
Application number: JP2007295570A
Authority: JP
Inventors: Nobuyuki Kihara; 信之木原; Yasuhiko Kato; 靖彦加藤; Yohei Sakuraba; 洋平櫻庭
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-11-14
Filing date: 2007-11-14
Publication date: 2009-06-04

Abstract

PROBLEM TO BE SOLVED: To obtain the best possible reproduced sound at all times, despite uncertainty about which speaker is used in the acoustic system of a loudspeaker communication system.

SOLUTION: An audio signal processing apparatus includes an echo canceller that performs echo cancellation by adaptive processing. The apparatus changes the frequency characteristic of a reproduction sound source audio signal, which is input as the basis of the sound reproduced by the speaker, based on the frequency characteristic of that signal and the frequency characteristic of a collected sound signal picked up by a microphone, and outputs the changed audio signal to the speaker side.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an audio signal processing apparatus having an audio signal processing function known as echo cancellation, and to a method thereof.

Acoustic systems that allow speakers at separate locations to talk with one another, such as hands-free telephone calls and the audio transmission/reception systems of audio conferencing and video conferencing systems, are known as loudspeaker communication systems. Such systems are already in practical use and have become widespread.
In a loudspeaker communication system, communication terminal devices that can communicate with each other according to some communication method are placed at a plurality of different locations. Sound picked up by a microphone at one communication terminal device is transmitted to the other communication terminal device, which receives it and emits it as sound from a speaker. This allows speakers at remote locations to converse.
For example, Patent Document 1 describes a loudspeaker communication acoustic system that includes an adaptive filter to perform echo cancellation.

Patent Document 1: JP 2007-235770 A

In practice, when a loudspeaker communication acoustic system is used, whatever speaker happens to be available is often connected. In a video conference system, for example, the speaker built into the monitor or television receiver that displays the other conference room is frequently used. This means that the acoustic reproduction characteristics of the speaker cannot be specified in advance; that is, there is uncertainty about the speaker that will be used.
However, the communication terminal device at the core of a loudspeaker communication acoustic system is generally configured only to output the audio signal for the speaker with a fixed frequency characteristic setting.
Consequently, when the reproduction characteristics of the speaker are poor, they are reflected directly in the reproduced sound, and good reproduced sound quality cannot be obtained.
Accordingly, an object of the present invention is to obtain the best possible reproduced sound at all times in a loudspeaker communication acoustic system, even given the uncertainty about the speaker used.

To solve the above problem, the present invention provides an audio signal processing apparatus configured as follows.
The apparatus comprises echo cancellation processing means and frequency characteristic changing means. The echo cancellation processing means takes, as a reference signal, an audio signal obtained from the reproduction sound source audio signal that the apparatus receives as the source of the sound to be emitted from a speaker, and takes, as a desired signal, a collected sound signal obtained from sound picked up by a microphone. It performs adaptive signal processing so as to minimize the component of the desired signal that corresponds to sound emitted from the speaker and arriving at the microphone. The frequency characteristic changing means changes the frequency characteristic of the reproduction sound source audio signal, based on the frequency characteristic of that signal and the frequency characteristic of the speaker incoming sound (the sound regarded as having been emitted from the speaker and having reached the microphone), and outputs the resulting characteristic-changed audio signal to the speaker side.

The audio signal processing apparatus configured as above has an echo cancellation function that cancels the component of the speaker's emitted sound picked up by the microphone. In addition, it changes the frequency characteristic of the reproduction sound source audio signal based on the frequency characteristic of that signal and the frequency characteristic of the collected sound signal. The audio signal with the changed frequency characteristic is then output so that it is emitted as sound from the speaker.

As a result, the present invention can obtain good reproduced sound in a loudspeaker communication acoustic system at all times, regardless of the uncertainty about the speaker used.

As the best mode for carrying out the present invention (hereinafter, the embodiment), the invention is applied, for example, to the audio transmission/reception system of a video conference system.
In a video conference system, a communication terminal device is installed in each of a plurality of conference rooms at different locations. Each communication terminal device transmits images captured by a camera device and sound picked up by a microphone to the other communication terminal devices, and receives images and sound transmitted from the other communication terminal devices and outputs them from a display device and a speaker, respectively. That is, the video conference system includes a video transmission/reception system that exchanges images and an audio transmission/reception system that exchanges sound. The present embodiment is a communication terminal device (voice communication terminal device) provided to transmit and receive sound as part of the audio transmission/reception system.

FIG. 1 shows an example system configuration of the audio transmission/reception system in a video conference system.
In this case, two locations A and B that are remote from each other serve as conference rooms, and voice communication terminal devices 1-1 and 1-2, which form the audio transmission/reception system, are installed at locations A and B, respectively. The voice communication terminal devices 1-1 and 1-2 are connected by a communication line that supports a predetermined communication method so that they can communicate with each other. Microphones 2-1 and 2-2 and speakers 3-1 and 3-2 are also installed at locations A and B, respectively. The microphones 2-1 and 2-2 pick up the voices of the conference participants at locations A and B (near-end speaker voices) and are placed at suitable positions in each location. The speakers 3-1 and 3-2 let participants hear the voices of the conference participants at the other location (far-end speaker voices) and are likewise placed at suitable positions in each location. In the following description, when there is no need to distinguish between identical components at the separate locations, they are referred to simply as the voice communication terminal device 1, the microphone 2, and the speaker 3.

First, at location A, the audio signal obtained by the microphone 2-1 (collected sound signal) is input to the voice communication terminal device 1-1. The voice communication terminal device 1-1 transmits the input audio signal over the communication line to the voice communication terminal device 1-2. The voice communication terminal device 1-2 receives the transmitted audio signal and outputs it from the speaker 3-2. The conference participants at location B can thus hear the voices of the participants at location A.
Similarly, sound picked up by the microphone 2-2 at location B is transmitted by the voice communication terminal device 1-2 to the voice communication terminal device 1-1, which outputs the received audio signal from the speaker 3-1.
In this way, the audio transmission/reception system of the video conference system performs two-way audio communication, so that a conference participant at one location can converse with a participant at another location. This video conference system assumes that there are several participants at each location, and the speaker 3 is provided so that all participants at a location can hear the voices of the participants at the other location. A system that exchanges sound in both directions using speakers in this way is called a loudspeaker communication system.

FIG. 2 shows an example configuration of the voice communication terminal device 1. For clarity, the voice communication terminal devices 1-1 and 1-2 shown in FIG. 1 both have the configuration shown in FIG. 2.
As shown in FIG. 2, the voice communication terminal device 1 includes, for example, an A/D converter (ADC) 11, a D/A converter (DAC) 12, an audio signal processing unit 13, a codec unit 14, and a communication unit 17.

The A/D converter 11 converts the analog audio signal obtained by the microphone 2 (collected sound signal) into digital form and outputs it to the audio signal processing unit 13.

As described above, a loudspeaker communication system used as-is gives rise to phenomena such as echo and howling. As shown in FIG. 2, sound emitted into the room from the speaker 3 reaches the microphone 2 as direct and indirect sound via a spatial propagation path (echo path) S. That is, the far-end party's voice, transmitted from the far-end voice communication terminal device and emitted from the speaker 3, is picked up by the microphone 2 and transmitted back to the far-end voice communication terminal device. At the far end, likewise, sound emitted from the speaker is picked up by the microphone and transmitted to the near-end voice communication terminal device. In other words, in a loudspeaker communication system, sound once emitted into the room circulates between the voice communication terminal devices. As a result, the sound emitted from the speaker includes the talker's own voice, heard after some delay like a reverberating echo. This is echo, and if the loop gain exceeds 1 it becomes howling.
For this reason, loudspeaker communication systems are provided with an echo cancellation system that eliminates or suppresses such echo. The audio signal processing unit 13 is configured to have the signal processing function of this echo cancellation system. In practice, the audio signal processing unit 13 is implemented, for example, as a DSP (Digital Signal Processor). The configuration of the audio signal processing unit 13 for echo cancellation is described later.
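The role of the loop gain can be illustrated with a small numeric sketch. It assumes a deliberately simplified model in which each trip around the speaker-microphone-line loop multiplies the level by a single constant; the function name `echo_levels` and the gain values are hypothetical and chosen only for illustration.

```python
def echo_levels(loop_gain, passes):
    """Relative level of a sound after each trip around the loop.

    Simplified model (an assumption): every pass through
    speaker -> room -> microphone -> line scales the level by loop_gain.
    """
    return [loop_gain ** n for n in range(1, passes + 1)]


decaying = echo_levels(0.5, 4)  # 0.5, 0.25, 0.125, 0.0625: an echo that dies away
growing = echo_levels(1.5, 4)   # 1.5, 2.25, 3.375, 5.0625: the level keeps rising (howling)
```

With a loop gain below 1 each repetition is quieter than the last, so the echo is audible but fades; above 1 each repetition is louder, which corresponds to howling.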

The audio signal that has undergone echo cancellation in the audio signal processing unit 13 is input to the encoder 15 in the codec unit 14. The encoder 15 applies signal processing to the input audio signal, such as audio compression encoding according to a predetermined scheme, and outputs the result to the communication unit 17. The communication unit 17 transmits the audio signal received from the encoder 15 to the other voice communication terminal device over the communication line according to a predetermined communication method.

The communication unit 17 also receives the audio signal transmitted from the other voice communication terminal device, restores it to an audio signal in the predetermined compression-encoded format, and outputs it to the decoder 16 of the codec unit 14.
The decoder 16 decodes the compression-encoded audio signal, converts it into a digital audio signal in a predetermined PCM (Pulse Code Modulation) format, and outputs it to the audio signal processing unit 13. The audio signal that has passed through the audio signal processing unit 13 is converted into an analog signal by the D/A converter 12 and output, and this output audio signal is emitted from the speaker 3.
In practice, the analog audio signal from the D/A converter 12 must be amplified before the speaker 3 can emit it as sound, but the amplification stage is omitted from the figure.

FIG. 3 shows an example internal configuration of the audio signal processing unit 13 on which the present embodiment is based. The figure shows the A/D converter 11, the D/A converter 12, and the codec unit 14 (encoder 15 and decoder 16) together with the audio signal processing unit 13.

The audio signal processing unit 13 shown in this figure is configured as an adaptive filter system consisting of an adaptive filter (ADF: Adaptive Digital Filter) 21 and a subtractor 22.
The adaptive filter 21 receives, as a reference signal, the audio signal output from the decoder 16 at the stage where it is input to the D/A converter 12. Using this reference signal and the error signal described below, the adaptive filter 21 performs adaptive processing according to a predetermined adaptive algorithm to generate a pseudo echo signal (cancellation signal) from the reference signal, and outputs it to the subtractor 22.
The subtractor 22 receives, as a desired signal, the audio signal output from the A/D converter 11, that is, the collected sound signal obtained by the microphone 2. It subtracts the output signal of the adaptive filter 21 from the desired signal and outputs the result to the encoder 15. The output of the subtractor 22 is also fed back to the adaptive filter 21, where it is referred to as the error signal or residual signal.

Although its internals are not illustrated, the adaptive filter 21 includes an FIR (Finite Impulse Response) digital filter of the required order, through which the reference signal passes, and a coefficient setting circuit that variably sets the coefficients of this digital filter (filter coefficients) according to a predetermined adaptive algorithm. The output of the digital filter is the output signal of the adaptive filter 21 and serves as the pseudo echo signal (cancellation signal).
In the adaptive filter 21, the coefficient setting circuit continually updates the filter coefficients of the coefficient multipliers at the relevant taps so that the output signal (cancellation signal) always minimizes the residual indicated by the error signal.
As a result, the coefficient vector of the adaptive filter 21 (the array of coefficients across the taps) forms an impulse response that approximates the transfer function of the following path: the audio signal at the stage where it is input to the D/A converter 12 (reference signal) is emitted from the speaker 3, travels via the spatial propagation path S (see FIG. 2), is picked up by the microphone 2, and is input from the A/D converter 11 to the subtractor 22 as the desired signal. This path is hereinafter also called the cancellation sound transmission path. The operation therefore adaptively cancels the signal component of sound arriving via the cancellation sound transmission path, according to the state of the signal being processed at that time.
As is clear from the fact that it passes through the spatial propagation path S, which is the echo path, the sound arriving via the cancellation sound transmission path is the echo component derived from the audio signal that is output from the decoder 16 and ultimately supplied to the speaker 3. The output signal of the adaptive filter 21 (cancellation signal) can therefore be regarded as a pseudo echo of the audio signal to be reproduced as sound from the speaker 3. In the audio signal processing unit 13 acting as the adaptive filter system, the subtractor 22 subtracts this pseudo echo from the audio signal to be supplied to the encoder 15 for transmission to the far end. In this way, the audio signal processing unit 13 adaptively removes the echo component from the audio signal to be transmitted to the far end.
The voice communication terminal device 1 then transmits the audio signal from which the echo component has been removed to the far-end voice communication terminal device. Consequently, the echo is also removed from the sound heard when the far-end voice communication terminal device emits the received audio signal from its speaker. This is how the echo cancellation effect is obtained.
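The adaptive filter and subtractor arrangement can be sketched in a few lines. The description only refers to "a predetermined adaptive algorithm", so the normalized LMS (NLMS) update used below is an assumed choice, and the function name, tap count, step size, and the three-tap echo path are all hypothetical values for illustration.

```python
import random


def nlms_echo_cancel(reference, desired, taps=8, mu=0.5, eps=1e-8):
    """Return (error_signal, coefficients) of an FIR adaptive filter.

    reference: samples sent toward the speaker (reference signal)
    desired:   samples picked up by the microphone (desired signal)
    NLMS is an assumption; the source does not name the algorithm.
    """
    w = [0.0] * taps   # coefficient vector (estimated echo-path impulse response)
    x = [0.0] * taps   # delay line holding the most recent reference samples
    errors = []
    for r, d in zip(reference, desired):
        x = [r] + x[:-1]                                # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, x))        # pseudo echo (cancellation signal)
        e = d - y                                       # subtractor output: error signal
        norm = sum(xi * xi for xi in x) + eps           # input power for normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]  # coefficient update
        errors.append(e)
    return errors, w


# Hypothetical echo path standing in for speaker -> room -> microphone.
random.seed(0)
echo_path = [0.6, 0.3, -0.1]
ref = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(ref))]
err, coeffs = nlms_echo_cancel(ref, mic)
```

In this noise-free toy case the coefficient vector converges to the echo path, which mirrors the statement that the coefficients come to represent the impulse response of the cancellation sound transmission path, and the error signal shrinks toward zero. A real room response is far longer and the microphone signal also contains near-end speech, which this sketch does not model.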

In the audio transmission/reception system of the video conference system described so far, a suitable speaker 3 may have been installed with the system in advance, but it is also common to connect whatever speaker can be obtained at the conference room. In the latter case, the speaker that will be connected to the voice communication terminal device 1 cannot be specified, which introduces uncertainty.
Uncertainty about the speaker means uncertainty about its reproduction frequency characteristic (here, the frequency characteristic of the speaker itself). It also means that a speaker with poor sound, unsuited to a video conference system, may be used as the speaker 3. As a concrete example, video conference systems often use the speaker built into a television receiver, and such speakers can have little high-frequency output, producing a muffled sound that is hard to understand.
In such a case, with the basic configuration shown in FIG. 3, the voice communication terminal device 1 can output to the speaker side only an audio signal with a fixed frequency characteristic. The sound emitted from the speaker 3 is therefore directly affected by the reproduction performance (reproduction frequency characteristic) of the speaker 3 itself, and only degraded sound is produced.

To eliminate this drawback, the present embodiment is configured so that the frequency characteristic of the audio signal to be output from the speaker can be changed automatically according to, for example, the reproduction frequency characteristic of the speaker. This makes it possible to adapt to the uncertainty in the speaker's reproduction frequency characteristic and to emit sound from the speaker that is as good and as easy to hear as possible.

First, FIG. 4 shows an example configuration of the audio signal processing unit 13A according to the first embodiment. In this figure, parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted.
The audio signal processing unit 13A shown in this figure includes a frequency characteristic correction unit 23 in addition to the adaptive filter system 20.
The frequency characteristic correction unit 23 receives the audio signal S1 output from the decoder 16 and the signal (desired signal) output from the A/D converter 11 at the stage where it is input to the subtractor 22 of the adaptive filter system 20. From these inputs, as described later, it dynamically corrects (changes) the frequency characteristic of the audio signal S1. The audio signal S1 with its frequency characteristic corrected by the frequency characteristic correction unit 23 is output as the audio signal S2. The audio signal S2 is input to the D/A converter 12 as the audio signal of the sound to be output from the speaker 3. In this configuration, the audio signal S2 is also input to the adaptive filter 21 of the adaptive filter system 20 as the reference signal.
As the description of FIGS. 2 and 3 also indicates, from the viewpoint of the audio signal processing unit 13A, the audio signal S1 at the stage where it is output from the decoder 16 can be regarded as the audio signal input as the source of the sound to be output from the speaker 3 (far-end speaker voice), that is, the reproduction sound source audio signal.

The configuration of the frequency characteristic correction unit 23 and an example of its operation are described with reference to FIGS. 5 and 6.
FIG. 5 shows an example internal configuration of the frequency characteristic correction unit 23. The audio signal S1 output from the decoder 16 is input to the frequency characteristic analysis unit 30a and is also branched and input to the frequency characteristic variable unit 32 as the signal whose frequency characteristic is to be changed.
The audio signal S3 output from the A/D converter 11 is input to the frequency characteristic analysis unit 30b.
The frequency characteristic analysis unit 30a obtains the frequency characteristic of the input audio signal S1 by performing, for example, a fast Fourier transform; that is, it converts the audio signal S1 from a time-domain signal into a frequency-domain signal. The frequency characteristic analysis unit 30b obtains the frequency characteristic of the audio signal S3 in the same way.
The frequency characteristic of the audio signal S3, which is the collected sound signal, can be regarded as the frequency characteristic of sound that has traveled via the cancellation sound transmission path described above, that is, of the sound regarded as having been emitted from the speaker 3 and having reached the microphone 2 (speaker incoming sound).
The frequency characteristic analysis unit 30a supplies the frequency characteristic signal Sa, which carries the frequency characteristic information it has obtained, to the correction amount acquisition unit 31 as comparison reference data. Similarly, the frequency characteristic analysis unit 30b supplies the frequency characteristic signal Sb, which carries the frequency characteristic information it has obtained, to the correction amount acquisition unit 31 as comparison target data.
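The analysis step performed by units 30a and 30b can be sketched as follows. For brevity, a naive discrete Fourier transform is used here in place of a fast Fourier transform (it yields the same values, only more slowly). The frame length, the two test tones, and the 25 % attenuation of the high tone are hypothetical inputs, and `magnitude_spectrum` is an illustrative name.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) / n)
    return spectrum


N = 32
# Stand-in for S1: a low tone (bin 2) plus a high tone (bin 12).
s1 = [math.sin(2 * math.pi * 2 * t / N) + math.sin(2 * math.pi * 12 * t / N)
      for t in range(N)]
# Stand-in for S3: the same low tone, with the high tone attenuated to 25 %.
s3 = [math.sin(2 * math.pi * 2 * t / N) + 0.25 * math.sin(2 * math.pi * 12 * t / N)
      for t in range(N)]

Sa = magnitude_spectrum(s1)  # comparison reference data
Sb = magnitude_spectrum(s3)  # comparison target data
```

Here `Sa` and `Sb` agree at the low-frequency bin and differ at the high-frequency bin, which is the kind of difference the correction amount acquisition unit 31 works from.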

Based on the frequency characteristics indicated by the input frequency characteristic signals Sa and Sb, the correction amount acquisition unit 31 determines a correction amount for the frequency characteristic of the audio signal S1 (frequency characteristic correction amount). It then outputs a correction amount signal Se, which carries the frequency characteristic correction amount information, to the frequency characteristic variable unit 32.

Based on the frequency characteristic correction amount indicated by the correction amount signal Se, the frequency characteristic variable unit 32 executes processing for changing the frequency characteristic of the audio signal S1, which is input as the signal to be changed, and outputs the changed audio signal as an audio signal S2 (characteristic-changed audio signal).

FIG. 6 shows an example of specific operations of the correction amount acquisition unit 31 and the frequency characteristic variable unit 32 in the frequency characteristic correction unit 23.
Assume that the frequency characteristic indicated by the frequency characteristic signal Sa is as shown in FIG. 6(a), whereas the frequency characteristic indicated by the frequency characteristic signal Sb is as shown in FIG. 6(b). The frequency characteristic here represents the relationship of level (amplitude) to frequency.
The frequency characteristics shown in FIG. 6 are schematic and simplified, but compared with the frequency characteristic indicated by the frequency characteristic signal Sa, the one indicated by the frequency characteristic signal Sb has a smaller amplitude in the band at and above a certain frequency.

Here, to keep the explanation simple, assume that the microphone 2 and the speaker 3 are close to each other and that the frequency characteristics of the microphone 2 and the A/D converter 11 are flat.
In this case, the frequency characteristic indicated by the frequency characteristic signal Sb directly represents the frequency characteristic of the sound emitted from the speaker 3. The change in the frequency characteristic indicated by Sb (comparison target data) relative to the reference frequency characteristic indicated by Sa (comparison reference data) therefore corresponds to the reproduction frequency characteristic of the speaker 3. Accordingly, the correction amount acquisition unit 31 in the frequency characteristic correction unit 23 first estimates the reproduction frequency characteristic of the speaker 3 by comparing the frequency characteristics indicated by Sa and Sb. For example, from the frequency characteristics of FIG. 6(a) and FIG. 6(b), the reproduction frequency characteristic of the speaker 3 shown in FIG. 6(c) is estimated. Although the figure is simplified, the reproduction frequency characteristic in FIG. 6(c) is one in which the level of the reproduced sound drops in the band at and above a specific frequency. In practice, this represents an example in which the high range falls off, corresponding to the region indicated by the broken line in FIG. 6(c), resulting in a muffled sound.

Next, based on the reproduction frequency characteristic of the speaker 3 estimated as described above, the correction amount acquisition unit 31 obtains the frequency characteristic correction amount to be applied to the audio signal S1 so that the frequency characteristic of the sound emitted from the speaker 3 becomes as close as possible to that of the audio signal S1, which is the reproduction sound source audio signal.
For example, the frequency characteristic correction amount obtained when the reproduction frequency characteristic of the speaker 3 shown in FIG. 6(c) is estimated is as shown in FIG. 6(d). In this example, for the frequency band in which the level drops in the estimated reproduction frequency characteristic, a level (correction level) that exactly compensates for the drop is used as the frequency characteristic correction amount. As can be understood from the above, the correction amount in this case carries, as physical quantities, the frequency band to be corrected and the level to be corrected in that band. The correction amount acquisition unit 31 outputs the correction amount signal Se, which represents the frequency characteristic correction amount obtained in this way, to the frequency characteristic variable unit 32.

Based on the correction amount signal Se input as described above, the frequency characteristic variable unit 32 executes processing for changing the frequency characteristic of the audio signal S1, which is input as the signal to be changed, and outputs the changed audio signal as the audio signal S2. In the case of FIG. 6, the frequency characteristic of the audio signal S2 obtained by changing the frequency characteristic of the audio signal S1 is as shown in FIG. 6(e). The frequency characteristic shown in FIG. 6(e) is the frequency characteristic of the audio signal S1 (FIG. 6(a)) with the correction amount shown in FIG. 6(d) added to it.

When the audio signal S2 having the frequency characteristic shown in FIG. 6(e) is supplied to the speaker 3 via the D/A converter 12, the portion of the frequency band whose level has been raised by the added correction amount is offset by the reproduction frequency characteristic of the speaker 3 shown in FIG. 6(c), so that its level is suppressed in the sound emitted from the speaker 3. That is, the frequency characteristic shown in FIG. 6(a) is obtained. This means that the change in frequency characteristic caused by the reproduction frequency characteristic of the speaker 3 is corrected, and sound having the same frequency characteristic as the audio signal S1 (reproduction sound source audio signal) is emitted from the speaker 3.
By correcting the frequency characteristic of the sound emitted from the speaker 3 in this way, sound that is as faithful as possible to the original sound (reproduction sound source audio signal) can be emitted from the speaker 3 without being affected by its reproduction frequency characteristic. Further, in the configuration described with reference to FIGS. 5 and 6, the frequency characteristic correction amount is obtained from the result of estimating the reproduction frequency characteristic of the speaker 3, so any speaker that is connected can be accommodated. In other words, even when the speaker connected to the voice communication terminal device 1 is not known in advance, the sound emitted from the speaker is corrected automatically so as to suit the reproduction frequency characteristic of the speaker actually connected.
Furthermore, in the configuration described with reference to FIGS. 5 and 6, correction can be performed dynamically. Therefore, even in an environment in which the reproduction frequency characteristic of the speaker 3 can change over time, the correction follows the change so that the sound regarded as optimal is always output from the speaker 3. For example, when the configuration is used with an acoustic system in which the speaker in use can be switched partway through, correction suited to the reproduction frequency characteristic of the newly selected speaker is performed automatically.

Note that FIG. 6 shows an example in which the frequency characteristic correction amount obtained raises the level of a specific frequency band. However, with the algorithm for obtaining the frequency characteristic correction amount described with reference to FIG. 6, depending on the relationship between the frequency characteristics of the comparison reference data and the comparison target data, the result may instead be a correction amount that lowers the level of a specific frequency band, or one that raises the level in one frequency band and lowers it in another.
In addition, the example shown in FIG. 6 amounts to obtaining, as the frequency characteristic correction amount, the difference (difference frequency characteristic) between the frequency characteristic indicated by the comparison reference data (frequency characteristic signal Sa) shown in FIG. 6(a) and that indicated by the comparison target data (frequency characteristic signal Sb) shown in FIG. 6(b). In this case, as its actual processing, the correction amount acquisition unit 31 can obtain the frequency characteristic correction amount shown in FIG. 6(d) by computing the difference between these two frequency characteristics. That is, although the processing can be understood conceptually as estimating the reproduction frequency characteristic of the speaker, in actual processing the explicit step of estimating the reproduction frequency characteristic shown in FIG. 6(c) can be omitted, so that the algorithm can be configured efficiently.
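The relationship among the characteristics in FIG. 6(a) to FIG. 6(e) can be traced with a small numerical sketch. The assumptions here are not from the source: levels are expressed in decibels per band (so that "adding" a correction corresponds to applying a gain), four bands are used, and the speaker is taken to lose 6 dB in the upper two bands. All names and values are hypothetical.

```python
def correction_amount_db(sa_db, sb_db):
    """Per-band correction: reference level (Sa) minus observed level (Sb), in dB."""
    return [a - b for a, b in zip(sa_db, sb_db)]


def apply_correction_db(s1_db, se_db):
    """Add the per-band correction to the source levels to obtain S2."""
    return [level + gain for level, gain in zip(s1_db, se_db)]


sa_db = [0.0, 0.0, 0.0, 0.0]      # FIG. 6(a): reference characteristic of S1
sb_db = [0.0, 0.0, -6.0, -6.0]    # FIG. 6(b): collected sound, high bands reduced

# FIG. 6(c): estimated speaker response, shown only for explanation.
speaker_db = [b - a for a, b in zip(sa_db, sb_db)]

# FIG. 6(d): the difference Sa - Sb, obtained without using speaker_db.
se_db = correction_amount_db(sa_db, sb_db)

# FIG. 6(e): characteristic of the corrected signal S2.
s2_db = apply_correction_db(sa_db, se_db)

# Sound emitted when S2 is reproduced through the same speaker.
emitted_db = [s + h for s, h in zip(s2_db, speaker_db)]
```

In this sketch `emitted_db` equals `sa_db`, which mirrors the statement that the boosted bands are offset by the speaker's own roll-off.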

FIG. 7 is a flowchart showing the signal processing procedure in the voice communication terminal device 1 having the configuration of the audio signal processing unit 13A corresponding to the first embodiment shown in FIG. 4.
First, in step S101, the communication unit 17 receives the audio signal transmitted from the voice communication terminal device on the communication partner side.
As described above, the audio signal received in step S101 is in a format to which audio compression encoding has been applied. Therefore, in step S102, which follows step S101, the decoder 16 in the codec unit 14 executes decoding processing on the received audio signal to undo the audio compression encoding, and obtains, for example, the audio signal S1 in a predetermined PCM format.

The voice communication terminal device 1 also executes the procedures of steps S103 and S104 in parallel with those of steps S101 and S102. In step S103, the analog collected sound signal obtained by collecting sound with the microphone 2 is input, and in the subsequent step S104, the A/D converter 11 converts this collected sound signal into a digital signal (A/D conversion).
The digital collected sound signal (audio signal S3) obtained in step S104 is input, as the desired signal, to the subtractor 22 of the adaptive filter system 20 for the processing of step S106 described later. It is also input to the frequency characteristic correction unit 23 for the processing of step S105 described next.

In step S105, the frequency characteristic correction unit 23 changes (corrects) the frequency characteristic of the audio signal S1, for example as described with reference to FIGS. 5 and 6, based on the audio signal S1 input as a result of step S102 and the audio signal S3 input as a result of step S104, and outputs the result as the audio signal S2.
The audio signal S2 is first input to the D/A converter 12 as the audio signal of the sound to be emitted from the speaker 3. In step S109, the D/A converter 12 converts the input audio signal S2 into an analog signal (D/A conversion) and outputs it to the speaker 3. As a result, the speaker 3 reproduces the audio signal S2 as sound.
The audio signal S2 is also output to the adaptive filter 21 in the adaptive filter system 20 as the reference signal. In step S106, the adaptive filter system 20 executes the echo cancellation processing described earlier, using the audio signal S2 input as a result of step S105 as the reference signal and the audio signal S3 input as a result of step S104 as the desired signal. The audio signal from which the echo component has been removed by this echo cancellation processing (the output signal of the subtractor 22) is then output to the encoder 14 for the next step, S107.

In step S107, the encoder 14 applies audio compression encoding to the input audio signal and outputs the result to the communication unit 17. In step S108, the communication unit 17 executes processing for transmitting the audio signal input from the encoder 14 to the voice communication terminal device 1 on the communication partner side in accordance with a predetermined communication format.

Incidentally, in such a loudspeaker communication system, there are two call states: the single talk state and the double talk state. The single talk state is a state in which the far-end speaker's voice is being emitted from the speaker but the near-end speaker is not speaking into the microphone. In terms of the present embodiment, an audio signal of the far-end speaker's voice at an effective level is being output from the decoder 16, but no near-end speaker's voice at an effective level is being collected by the microphone 2. The double talk state is a state in which the far-end speaker's voice is being emitted from the speaker and the near-end speaker is also speaking into the microphone. In terms of the present embodiment, an audio signal of the far-end speaker's voice regarded as effective is being output from the decoder 16, and the near-end speaker's voice at or above a certain level regarded as effective is also being collected by the microphone 2.

In the correction of the speaker sound according to the present embodiment, the frequency characteristic correction amount is obtained by estimating the reproduction frequency characteristic of the speaker 3. What is needed as the comparison target data is therefore the frequency characteristic of the sound transmitted through the cancellation sound transmission path, that is, of the speaker incoming sound regarded as reaching the microphone 2 from the speaker 3 via the spatial propagation path S.
In the present embodiment, the content of the collected sound signal on which the comparison target data is based can, in the single talk state, be regarded as consisting almost entirely of the speaker incoming sound. In this case, the correction procedure described so far with reference to FIGS. 5, 6, and so on poses no particular problem. In the double talk state, however, the collected sound signal contains the near-end speaker's voice in addition to the speaker incoming sound. The frequency characteristic of the collected sound signal then reflects not only the speaker incoming sound but also the near-end speaker's voice. In this state it becomes difficult to estimate the reproduction frequency characteristic of the speaker with high accuracy, and as a result it may also become difficult to obtain a good frequency characteristic correction amount.

As a countermeasure, it is conceivable, for example, to configure the frequency characteristic variable unit 32 so that, in the double talk state, it changes the frequency characteristic using a latched (held) copy of the correction amount signal Se that the correction amount acquisition unit 31 was outputting in the single talk state immediately before the transition to the current double talk state. The correction amount signal Se latched in this way can be treated as being based on a correct estimate of the speaker's reproduction frequency characteristic made in the single talk state. Therefore, if the latched correction amount signal Se is used in a fixed manner during the double talk state, the sound emitted from the speaker 3 can reasonably be expected to be corrected appropriately.

Note that while the correction amount signal Se latched from the single talk state is being used in the double talk state, the acquisition of the frequency characteristic correction amount by the correction amount acquisition unit 31 (updating of the frequency characteristic correction amount) may be stopped. For example, when the operation of the correction amount acquisition unit 31 is implemented by a DSP, stopping that operation reduces the processing load accordingly.
An example of a method for determining whether the call state is the double talk state is described in the second embodiment below.
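The latching countermeasure can be sketched as a small holder object. This is a minimal illustration, not the implementation described in the source: the class name `CorrectionLatch`, the `step` method, and the use of a callable to stand in for the correction amount acquisition unit 31 are all hypothetical. The double talk flag is assumed to be supplied from elsewhere.

```python
class CorrectionLatch:
    """Holds the most recent correction amount obtained outside double talk."""

    def __init__(self, initial):
        self.held = list(initial)
        self.updates = 0  # how many times the correction was recomputed

    def step(self, compute, double_talk):
        # During double talk the computation is skipped entirely and the
        # held value is reused, which also saves the processing it would cost.
        if not double_talk:
            self.held = list(compute())
            self.updates += 1
        return list(self.held)


latch = CorrectionLatch([0.0, 0.0])
out1 = latch.step(lambda: [0.0, 6.0], double_talk=False)  # single talk: update
out2 = latch.step(lambda: [9.0, 9.0], double_talk=True)   # double talk: hold
out3 = latch.step(lambda: [0.0, 5.0], double_talk=False)  # single talk: update
```

Passing the computation as a callable is simply one way to show that the acquisition step need not run at all while the latched value is in use.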

FIG. 8 shows an example of the internal configuration of the audio signal processing unit 13B corresponding to the second embodiment. In this figure, the same parts as those in FIG. 4 are denoted by the same reference numerals, and their description is omitted.
The audio signal processing unit 13B shown in this figure differs from the audio signal processing unit 13A shown in FIG. 4 in that coefficient information Dk is input from the adaptive filter 21 to the frequency characteristic correction unit 23.

The adaptive processing executed by the adaptive filter system 20 corresponds to learning the impulse response of the desired signal. As the impulse response is learned, the coefficient vector of the FIR filter forming the adaptive filter 21 is updated, and the coefficient vector therefore represents that impulse response. The coefficient information Dk is information indicating this coefficient vector, and thus represents the learned impulse response. An impulse response can be converted into frequency characteristic information by frequency analysis (a Fourier transform or the like). The impulse response obtained in this case is that of the sound passing through the cancellation sound transmission path.
That is, in the second embodiment, in order to obtain the frequency characteristic (comparison target data) of the speaker incoming sound regarded as having been emitted from the speaker 3 and having reached the microphone 2, not only the audio signal S3, which is the collected sound signal itself, but also the coefficient vector set in the adaptive filter 21, that is, the impulse response of the sound passing through the cancellation sound transmission path, can be acquired.
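The statement that the coefficient vector comes to represent the impulse response of the cancellation sound transmission path can be illustrated with a toy adaptive filter. The source does not name the adaptation algorithm in this passage, so the normalized LMS (NLMS) update used below is an assumption chosen for illustration, as are the three-tap echo path, the step size, and the function name `nlms_identify`. The microphone signal is simulated as echo only (single talk, no noise).

```python
import random


def nlms_identify(reference, echo_path, taps, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients so that the filter output tracks the echo."""
    w = [0.0] * taps      # adaptive filter coefficients (the coefficient vector)
    buf = [0.0] * taps    # recent reference samples, newest first
    for x in reference:
        buf = [x] + buf[:-1]
        # Simulated microphone signal: the reference filtered by the echo path.
        desired = sum(h * b for h, b in zip(echo_path, buf))
        estimate = sum(c * b for c, b in zip(w, buf))   # pseudo echo
        error = desired - estimate                       # subtractor output
        norm = sum(b * b for b in buf) + eps
        # NLMS update: step along the reference vector, scaled by its power.
        w = [c + mu * error * b / norm for c, b in zip(w, buf)]
    return w


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3, 0.1]   # hypothetical impulse response of the path
learned = nlms_identify(reference, echo_path, taps=3)
```

After adaptation, `learned` is close to `echo_path`, which is the sense in which the coefficient information Dk carries the learned impulse response.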

FIG. 9 shows an example of the internal configuration of the frequency characteristic correction unit 23 corresponding to the second embodiment. In this figure, the same parts as those in FIG. 5 are denoted by the same reference numerals, and their description is omitted.
As shown in this figure, in the second embodiment, a frequency characteristic analysis unit 30c corresponding to the coefficient information Dk is provided in addition to the frequency characteristic analysis units 30a and 30b, which perform frequency analysis on the audio signals S1 and S3.
The frequency characteristic analysis unit 30c, for example, converts the input coefficient information Dk into impulse response waveform information and then applies frequency analysis to this impulse response waveform to obtain a frequency characteristic. It outputs a frequency characteristic signal Sc, which carries the frequency characteristic information obtained in this way, to the correction amount acquisition unit 31. With this configuration, the correction amount acquisition unit 31 can take in, as frequency characteristic information serving as comparison target data, both the frequency characteristic signal Sb corresponding to the collected sound signal itself and the frequency characteristic signal Sc obtained from the coefficient vector of the adaptive filter 21.
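Converting a coefficient vector into a frequency characteristic can be sketched by evaluating the Fourier transform of the coefficients at a few frequencies. The function name `frequency_response`, the number of evaluation points, and the two-tap example vector are hypothetical; the source only states that frequency analysis is applied to the impulse response.

```python
import cmath
import math


def frequency_response(coeffs, n_points):
    """Magnitude response of an FIR coefficient vector at n_points
    normalized frequencies spaced evenly from 0 to pi (inclusive)."""
    mags = []
    for k in range(n_points):
        omega = math.pi * k / (n_points - 1)
        acc = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(coeffs))
        mags.append(abs(acc))
    return mags


# Hypothetical coefficient vector: a two-tap average, which attenuates
# high frequencies, loosely resembling the roll-off in FIG. 6(c).
dk = [0.5, 0.5]
sc = frequency_response(dk, 5)  # plays the role of the frequency characteristic signal Sc
```

For this example the magnitude is 1 at zero frequency and falls toward 0 at the highest frequency.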

FIG. 10 is a flowchart showing the signal processing procedure in the voice communication terminal device 1 having the configuration of the audio signal processing unit 13B corresponding to the second embodiment.
The flow of steps S201 to S209 shown in this figure, and the basic content of the processing in each step, are the same as those of steps S101 to S109 in FIG. 7 for the first embodiment. In the second embodiment, however, while the echo cancellation processing by the adaptive filter system 20 is executed in step S206, coefficient information Dk indicating the coefficient vector currently set in the adaptive filter 21 is generated so that it can be output to the frequency characteristic correction unit 23. In addition, the processing of the frequency characteristic correction unit 23, shown as step S205, is executed as shown in the flowchart of FIG. 11.

First, in FIG. 11, step S301 determines whether or not the call is in the double talk state.
Whether or not the call is in the double talk state can be determined, for example, by comparing the level of the audio signal S1 or the audio signal S2 on the path from the decoder 16 to the D/A converter 12 with the level of the output of the subtractor 22, that is, the error signal for the adaptive filter 21. To make the explanation easy to follow, assume that the adaptive filter system 20 has converged, and first consider the single talk state. In this case, the audio signal S1 (or S2) has an effective level at or above a certain value, whereas the error signal has a very small level because the echo component has been cancelled. In the double talk state, by contrast, the near-end speaker's voice component contained in the desired signal (collected sound signal) is not cancelled by the adaptive filter system 20. Accordingly, the audio signal S1 (or S2) has an effective level at or above a certain value, and the error signal also shows a corresponding level due to the near-end speaker's voice. Because this difference in level arises, comparing the two signals makes it possible to determine whether or not the call is in the double talk state.
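The level comparison described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive detector: levels are measured as RMS over one frame, the adaptive filter is assumed to have converged, and the thresholds `active_level` and `error_ratio`, like the function names, are hypothetical values chosen only for the example.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))


def classify_talk_state(reference_frame, error_frame,
                        active_level=0.01, error_ratio=0.1):
    """Compare the far-end signal level with the error signal level."""
    ref = rms(reference_frame)
    err = rms(error_frame)
    if ref < active_level:
        return "far-end silent"   # no effective far-end voice to judge against
    if err > error_ratio * ref:
        return "double talk"      # residual too large to be cancelled echo
    return "single talk"          # echo cancelled, error signal small


far_end = [0.5, -0.5, 0.5, -0.5]
state_a = classify_talk_state(far_end, [0.001, -0.001, 0.001, -0.001])
state_b = classify_talk_state(far_end, [0.3, -0.3, 0.3, -0.3])
state_c = classify_talk_state([0.0, 0.0, 0.0, 0.0], [0.3, -0.3, 0.3, -0.3])
```

A practical detector would likely smooth the levels over time and handle the unconverged case separately; those details are outside what this passage describes.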

When step S301 determines that the call is not in the double talk state, the frequency characteristic correction unit 23 takes in the collected sound signal itself, that is, the audio signal S3, as the comparison target data in step S302. When step S301 determines that the call is in the double talk state, it takes in the coefficient information Dk as the comparison target data in step S303.

After the comparison target data has been taken in by step S302 or S303, step S304 executes frequency characteristic analysis on the audio signal S1, which is the comparison reference data, and on the audio signal S3 or the coefficient information Dk, whichever is the comparison target data. The frequency characteristic analysis of the audio signal S1 is performed by the frequency characteristic analysis unit 30a. The analysis of the audio signal S3 is performed by the frequency characteristic analysis unit 30b, and the analysis of the coefficient information Dk is performed by the frequency characteristic analysis unit 30c.

In the subsequent step S305, the correction amount acquisition unit 31 obtains the frequency characteristic correction amount and outputs it to the frequency characteristic variable unit 32 as the correction amount signal Se. If the collected sound signal itself was acquired as the comparison target data in step S302, the same processing as in the first embodiment described with reference to FIGS. 5 and 6 is executed. If, on the other hand, the coefficient information Dk was acquired as the comparison target data in step S303, the correction amount acquisition unit 31 uses the frequency characteristic signal Sc from the frequency characteristic analysis unit 30c in place of the frequency characteristic signal Sb of FIG. 6(b), estimates the reproduction frequency characteristic of the speaker 3, obtains the frequency characteristic correction amount, and outputs the correction amount signal Se.
Then, in step S306, the frequency characteristic variable unit 32 changes the frequency characteristic of the audio signal S1 (the signal to be changed) based on the correction amount signal Se obtained in step S305, and outputs the result as the audio signal S2.
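Steps S304 to S306 can be illustrated with a minimal sketch. It assumes that the frequency characteristics are per-bin magnitudes in dB and that the correction amount is the simple difference between the reference and the comparison target, which is the example the description gives for FIG. 6. The function names, the naive DFT, the frame length, and the simulated microphone signal are all hypothetical and are not taken from the embodiments. A practical design would probably also normalize overall level and smooth the result over time.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def magnitude_db(x, floor=1e-12):
    """Step S304 (assumed form): per-bin magnitude in dB."""
    return [20.0 * math.log10(max(abs(c), floor)) for c in dft(x)]

def correction_db(reference_db, target_db):
    """Step S305 (assumed form): reference minus comparison target, per bin."""
    return [r - t for r, t in zip(reference_db, target_db)]

def apply_correction(s1, corr_db):
    """Step S306 (assumed form): scale each bin of S1 by the correction gain."""
    gains = [10.0 ** (c / 20.0) for c in corr_db]
    return idft_real([g * c for g, c in zip(gains, dft(s1))])

# Hypothetical frame of S1 and a simulated S3 that arrives 6 dB quieter.
s1 = [1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
s3 = [0.5 * v for v in s1]
se = correction_db(magnitude_db(s1), magnitude_db(s3))
s2 = apply_correction(s1, se)
```

In this toy case every bin of `se` is about +6.02 dB, so `s2` is `s1` doubled. When the coefficient information Dk is the comparison target, the same `correction_db` call would take the spectrum derived from Dk in place of `magnitude_db(s3)`.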

In the configuration of the second embodiment, during the double talk state, the frequency characteristic derived from the impulse response of the collected sound signal obtained as a result of the adaptive processing of the adaptive filter system 20 (that is, the coefficient vector of the adaptive filter 21) is used as the comparison target data in place of the collected sound signal itself. Because this impulse response relates to the speaker arrival sound described above, among the sound components contained in the collected sound signal, the frequency characteristic correction amount (correction amount signal Se) can be obtained appropriately. In other words, the sound emitted from the speaker 3 can be corrected properly even when the call is in the double talk state.
When the adaptive filter system 20 is performing echo cancellation properly and has converged, the impulse response indicated by the coefficient vector of the adaptive filter 21 likewise corresponds to the speaker arrival sound. In the configuration of the second embodiment, therefore, while the adaptive filter system 20 is converged, the coefficient information Dk may also be used as the comparison target data in states other than the double talk state (including the single talk state).
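The following sketch illustrates why the converged coefficient vector can stand in for the speaker arrival sound. It uses a normalized LMS (NLMS) update, which is only one possible choice; the description leaves the adaptive algorithm open. The echo path, the step size, the tap count, and the function names are hypothetical, and near-end speech and noise are left out for simplicity.

```python
import random

def simulate_echo(reference, path):
    """Convolve the reference with a hypothetical speaker-to-microphone path."""
    return [sum(path[k] * reference[n - k]
                for k in range(len(path)) if n - k >= 0)
            for n in range(len(reference))]

def nlms_identify(reference, desired, taps, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients so the filtered reference tracks the desired signal."""
    w = [0.0] * taps
    buf = [0.0] * taps                      # recent reference samples, newest first
    for x, d in zip(reference, desired):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # echo estimate
        e = d - y                                     # residual after subtraction
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w

rng = random.Random(0)
s2 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]   # reference signal (S2)
echo_path = [0.6, 0.3, -0.1]                          # assumed impulse response
s3 = simulate_echo(s2, echo_path)                     # echo-only collected signal (S3)
dk = nlms_identify(s2, s3, taps=3)                    # coefficient information (Dk)
```

After adaptation `dk` is close to `echo_path`, so a frequency analysis of `dk` describes the speaker-to-microphone path alone, independent of any near-end speech that the microphone signal might contain during double talk.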

If, as described above, the coefficient information Dk can be used as the comparison target data even in states other than the double talk state, then a configuration that uses only the coefficient information Dk as the comparison target data, and does not use the audio signal S3 (the collected sound signal itself), is also conceivable as an embodiment. FIG. 12 shows such a configuration of the audio signal processing unit 13C as a third embodiment. In FIG. 12, parts identical to those in FIGS. 4 and 8 are given the same reference numerals, and their description is omitted.
As FIG. 12 shows, only the coefficient information Dk is input to the frequency characteristic correction unit 23 as comparison target data; the audio signal S3 (collected sound signal) is not input.

FIG. 13 shows an example of the internal configuration of the frequency characteristic correction unit 23 for the third embodiment. Parts identical to those in FIGS. 5 and 9, described for the preceding embodiments, are given the same reference numerals, and their description is omitted.
The frequency characteristic correction unit 23 shown in FIG. 13 is obtained from the configuration of the frequency characteristic correction unit 23 of the second embodiment shown in FIG. 9 by omitting the frequency characteristic analysis unit 30b, which receives the audio signal S3 and performs frequency analysis on it.

The signal processing procedure executed by the voice communication terminal device 1 of the third embodiment can be described with reference to the flowchart of FIG. 10, which was presented for the second embodiment. The digital collected sound signal (audio signal S3) obtained in step S204 is output only to the adaptive filter system 20 (subtractor 22) of the audio signal processing unit 13C for the processing of step S206; it is not output to the frequency characteristic correction unit 23 for the processing of step S205. The processing for changing the frequency characteristic in step S205 is then configured to execute the procedure of steps S303 to S306 constantly.

In the embodiments described so far, the adaptive filter system 20 (adaptive filter 21) receives as its reference signal the audio signal S2, that is, the audio signal S1 after its frequency characteristic has been changed by the frequency characteristic correction unit 23. In other words, in the signal path along which the audio signal input to the audio signal processing unit 13 (13A, 13B, 13C) as the source of the sound to be emitted from the speaker 3 is finally supplied to the speaker 3, the frequency characteristic correction unit 23 is inserted upstream of the point at which the reference signal is tapped off to the adaptive filter system 20.
As one modification, the frequency characteristic correction unit 23 can instead be inserted downstream of the reference signal tap, that is, so that the frequency characteristic correction unit 23 receives the reference signal input to the adaptive filter system 20 as the audio signal to be processed (comparison reference data).
Strictly speaking, however, the impulse response that the adaptive filter system 20 learns through adaptive processing for echo cancellation corresponds to the sound that travels along the path (the cancellation sound transmission path) by which the reference signal input to the adaptive filter system 20 is emitted from the speaker 3, reaches the microphone 2 via the spatial transmission path S, and is then input to the adaptive filter system 20 as the desired signal. This means that the reference signal input to the adaptive filter system 20 and the audio signal supplied to the input stage of the speaker 3 should be the same. If the frequency characteristic correction unit 23 is inserted downstream of the reference signal tap and changes the frequency characteristic dynamically, the reference signal input to the adaptive filter system 20 and the audio signal supplied to the input stage of the speaker 3 become different. This is equivalent to the solution learned by the adaptive processing becoming wrong, which makes it difficult to carry out proper adaptive processing. For this reason, so that the frequency characteristic correction unit 23 can change and correct the frequency characteristic dynamically in real time, the present embodiments insert the frequency characteristic correction unit 23 upstream of the reference signal tap.
However, if the frequency characteristic correction unit 23 does not change the frequency characteristic dynamically but applies a fixed frequency characteristic correction amount, inserting it downstream of the reference signal tap causes no particular problem. One conceivable configuration with a fixed correction amount obtains the frequency characteristic correction amount when the audio signal processing unit 13 first starts up and thereafter keeps that correction amount fixed.
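The placement argument can be written out with linear filters. The notation here is an assumption introduced for illustration and does not appear in the embodiments: x is the audio signal before correction, c the response of the frequency characteristic correction unit 23, h the speaker-to-microphone impulse response, r the reference signal, d the echo component of the desired signal, w_opt the coefficient vector that cancels the echo, and * denotes convolution. Near-end speech and noise are ignored.

```latex
\begin{gather*}
\text{upstream of the tap: } r = c * x,\quad d = h * r
  \;\Longrightarrow\; w_{\mathrm{opt}} = h \\
\text{downstream of the tap: } r = x,\quad d = h * c * x
  \;\Longrightarrow\; w_{\mathrm{opt}} = h * c
\end{gather*}
```

In the first case the target of the adaptation does not depend on c, so c may vary freely. In the second case the target moves whenever c changes, and it stays put only if c is fixed, which matches the fixed-correction exception described above.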

In the description referring to FIG. 6, the frequency characteristic correction amount is obtained simply as the difference between the frequency characteristic of the comparison reference data and that of the comparison target data. This is only one example; many approaches are conceivable as long as they are based on the frequency characteristic of the comparison reference data and that of the comparison target data. Strictly speaking, for example, the sound quality (frequency characteristic) of the speaker arrival sound that reaches the microphone 2 from the speaker 3 via the spatial transmission path S may change before it reaches the microphone 2, depending on factors such as the actual length of the spatial transmission path S. In practice the speaker arrival sound is also a combination of direct sound and reflected sound, and the frequency characteristic of the reflected sound can vary with the material of the wall or other object that reflected it. The algorithm and processing procedure for obtaining the frequency characteristic correction amount may be designed with such factors taken into account.

The adaptive algorithm used in the adaptive filter system 20 may be chosen as appropriate from among known techniques.
The transmission audio signal and the reproduction sound source audio signal in the voice communication terminal device 1 are processed mainly by digital signal processing, but the format of these signals during digital signal processing is not particularly limited. For example, when outputting the reproduction sound source audio signal, a configuration that reproduces a ΔΣ-modulated bitstream audio signal through class-D amplification is also conceivable in some cases.
The embodiments take as an example a voice communication terminal device provided for audio transmission and reception in a video conference system. The invention is nevertheless applicable to any device that can be regarded as a loudspeaker communication system, including audio conference systems and the hands-free call function of telephone devices.

The configuration for correcting the speaker output sound described so far also holds without the echo cancellation function. It can therefore be applied to devices and circuits, for example in audio systems, that correct the speaker output sound to a desired frequency characteristic.

The audio signal processing operations of the embodiments described so far can be realized not only by a DSP but also by having a computer system (CPU) execute a program. Such a program may, for example, be stored on a removable storage medium and installed (or updated) from that medium into the DSP or computer system. The program may also be installed under the control of another host device, for example via a predetermined data interface. Alternatively, the program may be stored in a storage device of a server on a network, and a device having the audio signal processing function of the present embodiments may be given a network function so that it can download the program from the server and install it.

FIG. 1 is a block diagram showing a configuration example of the audio transmission and reception system in a video conference system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the internal configuration of the voice communication terminal device of the embodiment.
FIG. 3 is a diagram showing a basic configuration example of the audio signal processing unit provided in the voice communication terminal device.
FIG. 4 is a diagram showing an example of the internal configuration of the audio signal processing unit according to the first embodiment.
FIG. 5 is a diagram showing an example of the internal configuration of the frequency characteristic correction unit according to the first embodiment.
FIG. 6 is a diagram schematically showing an operation example of the correction amount acquisition unit and the frequency characteristic variable unit in the frequency characteristic correction unit of the embodiment.
FIG. 7 is a flowchart showing an example of the audio signal processing procedure executed by the voice communication terminal device according to the first embodiment.
FIG. 8 is a diagram showing an example of the internal configuration of the audio signal processing unit according to the second embodiment.
FIG. 9 is a diagram showing an example of the internal configuration of the frequency characteristic correction unit according to the second embodiment.
FIG. 10 is a flowchart showing an example of the audio signal processing procedure executed by the voice communication terminal device according to the second embodiment.
FIG. 11 is a flowchart showing an example of the procedure in the step for changing the frequency characteristic shown in FIG. 10.
FIG. 12 is a diagram showing an example of the internal configuration of the audio signal processing unit according to the third embodiment.
FIG. 13 is a diagram showing an example of the internal configuration of the frequency characteristic correction unit according to the third embodiment.

Explanation of symbols

1 (1-1, 1-2) voice communication terminal device, 2 (2-1, 2-2) microphone, 3 (3-1, 3-2) speaker, 11 A/D converter, 12 D/A converter, 13 audio signal processing unit, 14 codec unit, 15 encoder, 16 decoder, 17 communication unit, 20 adaptive filter system, 21 adaptive filter, 22 subtractor, 23 frequency characteristic correction unit, 30a, 30b, 30c frequency characteristic analysis units, 31 correction amount acquisition unit, 32 frequency characteristic variable unit

Claims

An audio signal processing device comprising:
echo cancellation processing means for performing adaptive signal processing that takes, as a reference signal, an audio signal obtained from a reproduction sound source audio signal input to the audio signal processing device as the source of sound to be emitted from a speaker, and takes, as a desired signal, a collected sound signal obtained from sound collected by a microphone, so as to minimize the audio signal component, contained in the desired signal, of the sound that is emitted from the speaker and reaches the microphone; and
frequency characteristic changing means for changing the frequency characteristic of the reproduction sound source audio signal based on the frequency characteristic of the reproduction sound source audio signal and the frequency characteristic of the speaker arrival sound that is emitted from the speaker and reaches the microphone, and for outputting the resulting characteristic-changed audio signal to the speaker side.

The audio signal processing device according to claim 1, wherein
the frequency characteristic changing means is capable of obtaining both the frequency characteristic of the collected sound signal and the frequency characteristic of the impulse response obtained as a result of the adaptive signal processing in the echo cancellation processing means, and
at least in a double talk state, uses the frequency characteristic of the impulse response obtained by the adaptive signal processing in the echo cancellation processing means as the frequency characteristic of the speaker arrival sound.

The audio signal processing device according to claim 1, wherein
the echo cancellation processing means is configured to receive, as the reference signal, the characteristic-changed audio signal output by the frequency characteristic changing means.

The audio signal processing device according to claim 1, wherein
the frequency characteristic changing means obtains the frequency characteristic of the collected sound signal as the frequency characteristic of the speaker arrival sound.

The audio signal processing device according to claim 1, wherein
the frequency characteristic changing means obtains, as the frequency characteristic of the speaker arrival sound, the frequency characteristic of the impulse response obtained by the adaptive signal processing in the echo cancellation processing means.

An audio signal processing method comprising executing:
an echo cancellation processing procedure of performing adaptive signal processing that takes, as a reference signal, an audio signal obtained from a reproduction sound source audio signal input to an audio signal processing device as the source of sound to be emitted from a speaker, and takes, as a desired signal, a collected sound signal obtained from sound collected by a microphone, so as to minimize the audio signal component, contained in the desired signal, of the sound that is emitted from the speaker and reaches the microphone; and
a frequency characteristic changing procedure of changing the frequency characteristic of the reproduction sound source audio signal based on the frequency characteristic of the reproduction sound source audio signal and the frequency characteristic of the speaker arrival sound that is emitted from the speaker and reaches the microphone, and outputting the resulting characteristic-changed audio signal to the speaker side.