JP2008131593A

JP2008131593A - Method of deciding double talk state, echo canceller using same, and program and recording medium therefor

Info

Publication number: JP2008131593A
Application number: JP2006317578A
Authority: JP
Inventors: Kenichi Noguchi; 賢一野口; Suehiro Shimauchi; 末廣島内; Kenichi Furuya; 賢一古家; Akitoshi Kataoka; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-11-24
Filing date: 2006-11-24
Publication date: 2008-06-05
Anticipated expiration: 2026-11-24
Also published as: JP4542538B2

Abstract

PROBLEM TO BE SOLVED: To provide an echo canceller that determines a double-talk state using only the received signal x(n) and the collected sound signal z(n), together with the determination method, a program therefor, and a recording medium therefor.
SOLUTION: The double-talk state determination means includes a received-speech determination and counting means, a collected-speech determination and counting means, and a double-talk state determination unit. The received-speech determination and counting means obtains the number of consecutive frames in which the reception index value of each short-time frame of the received signal is greater than a threshold xth. The collected-speech determination and counting means obtains the number of consecutive frames in which the collection index value of each short-time frame of the collected sound signal is greater than a threshold zth. The double-talk state determination unit decides whether the state is double talk or received single talk (reception only) by comparing the two consecutive-frame counts.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a method for detecting a double-talk state, in which speech occurs simultaneously in both directions, in an echo canceller that suppresses the echo signal leaking from a loudspeaker into a microphone in a loudspeaker communication system such as an audio conference or video conference. It also relates to an echo canceller using that method, a program therefor, and a recording medium therefor.

In a loudspeaker communication system, the received signal is reproduced through a loudspeaker, and the speech signal is picked up by a microphone and transmitted to the other party. Because of acoustic coupling between the loudspeaker and the microphone, the signal reproduced from the loudspeaker leaks into the microphone and is mixed into the microphone input signal as an echo signal. This acoustic echo degrades call quality and can also cause howling.
One known conventional echo canceller that estimates the echo path (acoustic) coupling amount and uses the estimate is disclosed in Patent Document 1. This prior art is described with reference to FIG. 13. The local side, which conducts a hands-free loudspeaker call using receiving means 2 such as a loudspeaker and transmitting means 4 such as a microphone, is called the near end, and the other party, who converses across a communication path (not shown), is called the far end. FIG. 13 shows an echo canceller 800 placed at the near end.

The echo canceller 800 consists of an echo canceling unit 80 and a loss control unit 90. The received signal x(n), which arrives at the receiving end 8 from the far-end talker via the communication path, passes through the echo canceller 800 and is radiated into the room as amplified sound from the receiving means 2. Here n denotes discrete time. The amplified sound travels through the echo path 6 and is picked up by the transmitting means 4 as an echo signal y(n).
The input signal z(n) picked up by the transmitting means 4 (hereinafter, the collected sound signal z(n)) is input from the transmitting end 9 to the echo canceling unit 80.

The echo canceling unit 80 consists of, for example, an estimation means 82 that generates adaptive filter coefficients h^(n) from the received signal x(n) and an error signal e(n) (described later), a pseudo echo path 84 that generates a pseudo echo signal y^(n) from the received signal x(n) and the adaptive filter coefficients h^(n), and a subtractor 86 that subtracts the pseudo echo signal y^(n) from the collected sound signal z(n).
Depending on the call state, the collected sound signal z(n) consists of the echo signal y(n) and the transmission signal s(n). When there is no received signal x(n), however, no echo signal y(n) is present. When only the received signal x(n) is present, the collected sound signal is z(n) = y(n). In this case, if the pseudo echo path 84 correctly models the impulse response of the echo path 6, the pseudo echo signal y^(n) approaches the echo signal y(n), and the error signal e(n) output by the subtractor 86 becomes small.
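The structure described above (estimation means 82, pseudo echo path 84, subtractor 86) can be illustrated with a minimal sketch. It assumes an NLMS update, one of the adaptive algorithms named in this description; the function name, tap count, step size `mu`, and regularizer `eps` are illustrative choices, not values from the patent.

```python
import numpy as np

def nlms_echo_canceller(x, z, taps=64, mu=0.5, eps=1e-8):
    """Return (e, h_hat) for received signal x(n) and collected signal z(n).

    Hypothetical helper: taps, mu and eps are assumptions for illustration.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    h_hat = np.zeros(taps)   # adaptive filter coefficients h^(n)
    buf = np.zeros(taps)     # recent received samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y_hat = h_hat @ buf  # pseudo echo y^(n), role of pseudo echo path 84
        e[n] = z[n] - y_hat  # error signal e(n), role of subtractor 86
        # NLMS coefficient update, role of estimation means 82
        h_hat = h_hat + mu * e[n] * buf / (buf @ buf + eps)
    return e, h_hat
```

With only far-end speech present and a fixed echo path, `e` shrinks as `h_hat` approaches the true impulse response, which is the behaviour the paragraph describes.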

The estimation means 82 generates the adaptive filter coefficients h^(n) with an adaptive algorithm so that the error signal e(n) becomes small in this way. Typical algorithms include the least-mean-squares (LMS) method and the normalized LMS (NLMS, or learning identification) method.
The error signal e(n) output by the subtractor 86 is input to the loss control unit 90.
The loss control unit 90 consists of: a double-talk state determination means 92 that determines the double-talk state, in which the receiving side and the transmitting side speak simultaneously; a loss device 98 inserted in the communication path between the receiving end 8 and the echo canceling unit 80; a loss device 99 inserted between the echo canceling unit 80 and the output end 7, which outputs the collected sound signal z(n) to the communication path toward the other party; a loss amount determination means 94 that calculates the loss amount from the received signal x(n) and the collected sound signal z(n) when the state is not double talk; and a loss amount control means 96 that inserts the loss amount into the loss devices 98 and 99 according to the call state determined from the received signal x(n) and the error signal e(n).

The received signal x(n), the collected sound signal z(n), and the error signal e(n) are input to the double-talk state determination means 92. The short-time power value Px(k) of the received signal x(n) is calculated by the short-time power calculation unit 921, the short-time power value Pz(k) of the collected sound signal z(n) by the short-time power calculation unit 922, and the short-time power value Pe(k) of the error signal e(n) by the short-time power calculation unit 923. Here k is the index of the short-time section.
The short-time power value Px(k) of the received signal is input to the reception interval determination unit 925. When Px(k) is greater than the threshold xth, the reception interval determination unit 925 determines that speech is present in the received signal x(n).

The short-time power value Pz(k) of the collected sound signal is input to the threshold setting unit 927. The threshold setting unit 927 multiplies Pz(k) by a constant Th, which is set to 1 or less, and sets the product Th × Pz(k) as the threshold.
The determination result of the reception interval determination unit 925, the threshold, and the short-time power value Pe(k) of the error signal are input to the double-talk state determination unit 929. When the received-signal short-time power Px(k) exceeds the predetermined threshold xth and the error-signal short-time power Pe(k) is smaller than the threshold, the double-talk state determination unit 929 determines that the state is not double talk (Px(k) > xth and Pe(k) < Th × Pz(k)). Likewise, when Px(k) exceeds xth and Pe(k) is larger than the threshold Th × Pz(k), it determines that the state is double talk or that the echo path 6 is changing (Px(k) > xth and Pe(k) > Th × Pz(k)).
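The prior-art decision rule above can be written as a small function. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and the case Pe(k) = Th × Pz(k), which the description leaves open, is grouped with the "double talk or echo path change" branch.

```python
def conventional_double_talk(px, pz, pe, xth, th):
    """Classify one short-time section k with the prior-art rule.

    px, pz, pe: short-time powers Px(k), Pz(k), Pe(k).
    xth: received-speech threshold; th: constant Th (1 or less).
    Labels are illustrative, not terms from the patent.
    """
    if px <= xth:
        # No speech detected in the received signal; the rule does not apply.
        return "no_received_speech"
    threshold = th * pz  # threshold set from the collected-signal power
    if pe < threshold:
        return "not_double_talk"
    # Equality is treated as this branch by assumption.
    return "double_talk_or_path_change"
```

Because the rule depends on Pe(k), its reliability is tied to how well the adaptive filter has converged.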

The determination result of the double-talk state determination means 92, the received-signal short-time power Px(k), and the collected-signal short-time power Pz(k) are input to the loss amount determination means 94.
When the state is not double talk, the loss amount determination means 94 obtains, for example, the echo path coupling amount Pz(k)/Px(k) from the collected-signal short-time power Pz(k) and the received-signal short-time power Px(k), and sets its reciprocal as the loss amount. The loss amount is input to the loss amount control means 96.
The loss amount control means 96 determines the transmit/receive state using the received signal x(n) and the error signal e(n). When it determines that only the received signal x(n) is present, it inserts the loss into the transmitting-side loss device 99. When it determines that only the transmission signal s(n) is present, it inserts the loss into the receiving-side loss device 98. In the double-talk state, no loss is inserted on either side.
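As a rough illustration of the loss calculation and loss insertion described above, the sketch below computes the coupling amount Pz(k)/Px(k), takes its reciprocal as the loss amount, and selects which loss device receives it. The function names and the state labels (`"receive_only"`, `"transmit_only"`, `"double_talk"`) are hypothetical; how the loss value is applied to the signal is not specified here and is left out.

```python
def echo_path_coupling(px, pz):
    """Echo path coupling amount Pz(k) / Px(k)."""
    if px <= 0:
        raise ValueError("Px(k) must be positive")
    return pz / px

def loss_amount(px, pz):
    """Loss amount, taken as the reciprocal of the coupling amount."""
    coupling = echo_path_coupling(px, pz)
    if coupling <= 0:
        raise ValueError("coupling amount must be positive")
    return 1.0 / coupling

def select_loss_devices(state, loss):
    """Return (loss for device 98, loss for device 99); None means no loss."""
    if state == "receive_only":
        return (None, loss)   # loss on the transmitting side (device 99)
    if state == "transmit_only":
        return (loss, None)   # loss on the receiving side (device 98)
    if state == "double_talk":
        return (None, None)   # no loss on either side
    raise ValueError(f"unknown state: {state}")
```

For example, Px(k) = 4.0 and Pz(k) = 1.0 give a coupling amount of 0.25 and a loss amount of 4.0.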

Operating in this way prevents the gain of the loop that runs through the far end and the near end from exceeding 1 and causing howling.
Patent Document 1: Japanese Patent No. 3268572, FIG. 5

In the conventional method, however, the case that is not double talk (that is, the single-talk state) is detected, and the echo coupling amount is determined only at that time. If the adaptive filter coefficients h^(n) generated by the estimation means 82 contain a large error, the double-talk determination may be wrong, and a wrong determination degrades call quality.
If a single-talk state is mistaken for double talk, no loss is inserted into the loss devices 98 and 99, so the error signal e(n), including any residual echo, is sent to the far end as it is. If a double-talk state is mistaken for received single talk, loss is inserted into the transmitting-side loss device 99, so the near-end speech is interrupted.

As noted above, this misjudgment of the double-talk state can occur when the adaptive filter coefficients h^(n) produced by the estimation means 82 contain a large error. Such an error arises when the impulse response of the echo path 6 changes, for example because a talker moves or the microphone is repositioned. When the adaptive filter coefficients h^(n) have been learned sufficiently, the pseudo echo signal y^(n) generated by the pseudo echo path 84 is equal to the echo signal y(n) from the echo path 6. When the impulse response of the echo path 6 changes, however, the echo signal y(n) changes with it, and a large error can arise between it and the pseudo echo signal y^(n). The estimation means 82 needs, for example, about 2 seconds to learn the new adaptive filter coefficients h^(n), and the double-talk state may be misjudged during that time.

To avoid this misjudgment, the determination should be made without the error signal e(n), which is obtained by subtracting the pseudo echo signal y^(n), generated with the adaptive filter coefficients h^(n), from the collected sound signal. In other words, it suffices to determine the double-talk state using only the received signal x(n) and the collected sound signal z(n).
The present invention has been made in view of this problem. Its object is to provide an echo canceller that can determine the double-talk state using only the received signal x(n) and the collected sound signal z(n), together with the determination method, a program therefor, and a recording medium therefor.

The echo canceller according to the present invention includes a double-talk state determination means that determines whether the state is double talk, that is, a simultaneous call in which signals are input from the receiving end and the transmitting end at the same time, or received single talk, in which only reception occurs.
For the discretized received signal x(n) and collected sound signal z(n) input from the receiving end and the transmitting end, n is a natural number.
The double-talk state determination means consists of a received-speech determination and counting means, a collected-speech determination and counting means, and a double-talk state determination unit.
The received-speech determination and counting means compares the reception index value of each short-time frame of the received signal with a threshold xth and obtains the number of consecutive frames containing speech.
The collected-speech determination and counting means compares the collection index value of each short-time frame of the collected sound signal with a threshold zth and obtains the number of consecutive frames containing speech.
The double-talk state determination unit determines whether the state is double talk or received single talk by comparing the magnitudes of the consecutive-frame counts obtained by the received-speech determination and counting means and the collected-speech determination and counting means.

With the echo canceller of the present invention, the call state can be judged for each short-time frame, and the double-talk state can be determined correctly using only the received signal x(n) and the collected sound signal z(n). As a result, even when the estimate of the pseudo echo path contains a large error, interruptions of the transmission signal are eliminated and transmission of the echo signal is suppressed.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in the drawings are given the same reference numerals, and their description is not repeated.

FIG. 1 shows an example of the functional configuration of an echo canceller 100 that includes the double-talk state determination means 20 of Embodiment 1, and FIG. 2 shows an example of the functional configuration of the double-talk state determination means 20 in FIG. 1. The echo canceller 100 of Embodiment 1 differs from the conventional echo canceller 800 shown earlier in FIG. 11 only in the double-talk state determination means 20; the other parts are taken to operate in the same way. FIG. 1 likewise shows the echo canceller 100 at the local near end. Accordingly, only the operation of the double-talk state determination means 20 is described here. FIG. 3 shows its main processing flow.

The received signal x(n) and the collected sound signal z(n), which is the output signal of the transmitting means 4, are input to the double-talk state determination means 20. The index n is a natural number indicating that the signal is discretized, for example sampled at 16 kHz.
The double-talk state determination means 20 consists of a received-speech determination and counting means 22, a collected-speech determination and counting means 24, and a double-talk state determination unit 32.
The received-speech determination and counting means 22, which receives x(n), and the collected-speech determination and counting means 24, which receives z(n), have the same basic configuration. The description therefore focuses on the operation of the received-speech determination and counting means 22, and differences are noted where they arise.

The received-speech determination and counting means 22 consists of a frame dividing unit 26 that divides the received signal x(n) into short-time frames (described below), a reception index value calculation unit 28 that calculates, for each short-time frame, a reception index value corresponding to the energy or the absolute amplitude, a received-speech determination unit 29 that identifies frames whose reception index value is greater than the threshold xth, and a received-speech counter 30 that counts consecutive frames whose reception index value is greater than xth.
The received signal x(n), sampled for example at 16 kHz, is divided into short-time frames by the frame dividing unit 26. The division is performed, for example, by an x(n) counter 261 that increments its count by 1 each time it has counted 160 samples of x(n). The count value k of the x(n) counter 261 thus divides x(n) into short-time frames. In this example, one frame is 10 ms long (160 samples at 16 kHz).

The frame dividing unit 26 outputs the count value k of the x(n) counter 261 and the received signal x(n) to the reception index value calculation unit 28 (FIG. 3, step S10).
In the reception index value calculation unit 28, the power calculation unit 281 accumulates the power of the received signal x(n) within one short-time frame, and the averaging unit 283 averages it to obtain the mean power Pxm(k) of that frame (step S12). Pxm(k) is input to the received-speech determination unit 29. The comparison unit 291 compares Pxm(k) with the experimentally determined threshold xth held in the register 292. When the mean power is greater than or equal to xth (Yes in step S14), that is, when speech is present, the comparison unit 291 outputs the received-signal speech flag fx(k) = 1 to the received-speech counter 30. When the mean power is smaller than xth, that is, when no speech is present, it outputs fx(k) = 0. Alternatively, speech may be judged present only when Pxm(k) is strictly greater than xth (the comparison in step S14 may use >).
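Steps S10 to S14 can be sketched as follows. The sketch assumes 160-sample frames (10 ms at 16 kHz, as in the example) and the "greater than or equal" comparison; the function names are hypothetical, and dropping a trailing partial frame is an assumption the description does not address.

```python
import numpy as np

FRAME_LEN = 160  # samples per short-time frame: 160 / 16000 Hz = 10 ms

def frame_mean_power(signal, frame_len=FRAME_LEN):
    """Mean power Pxm(k) of each short-time frame (steps S10 and S12)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len  # trailing partial frame dropped (assumption)
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def speech_flags(index_values, threshold):
    """Speech flag per frame: 1 if index value >= threshold, else 0 (step S14)."""
    return (np.asarray(index_values, dtype=float) >= threshold).astype(int)
```

The same two functions would serve the collected sound signal z(n) with the threshold zth in place of xth.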

The power within a short-time frame may be the accumulated value, without averaging by the averaging unit 283. Alternatively, the accumulated absolute amplitude of x(n) may be obtained by the accumulated absolute amplitude calculation unit 282 and averaged by the averaging unit 283, and the resulting mean absolute amplitude used; or the accumulated absolute amplitude may be used without averaging. The speech interval may also be determined from the number of zero crossings of the amplitude, which has long been used for voice interval detection; that is, the fact that the number of zero crossings differs between frames with and without speech may be used. In a method that frequency-analyzes x(n) and z(n), as in Embodiment 2 described later, the speech interval may be determined from the difference between the spectrum of a noise interval and the spectrum of an interval containing speech. Any conventional speech detection method may be used. Various reception index values can thus be calculated by the reception index value calculation unit 28, and the threshold xth stored in the register 292 of the received-speech determination unit 29 differs according to which one is chosen.
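A few of the alternative index values mentioned above can be sketched per frame as follows. The function names are hypothetical, and counting a zero crossing as a sign change between adjacent samples is one common convention chosen here as an assumption.

```python
import numpy as np

def accumulated_abs_amplitude(frame):
    """Accumulated absolute amplitude of one short-time frame."""
    return float(np.sum(np.abs(np.asarray(frame, dtype=float))))

def mean_abs_amplitude(frame):
    """Accumulated absolute amplitude averaged over the frame."""
    return float(np.mean(np.abs(np.asarray(frame, dtype=float))))

def zero_crossings(frame):
    """Number of sign changes between adjacent samples in the frame."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return int(np.count_nonzero(signs[1:] != signs[:-1]))
```

Each index has a different scale, which is why the stored threshold has to be chosen for the index actually used.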

The received-speech counter 30, which receives the flag fx(k), counts the number of consecutive short-time frames in which the flag indicates speech (step S16).
The initial value of the received-speech counter 30 is 0 (cx(0) = 0). When the flag indicates speech in frame k (fx(k) = 1), the count is the count of the previous frame plus 1 (step S16, cx(k) = cx(k−1) + 1).
Conversely, when the flag indicates no speech in frame k (fx(k) = 0), the count is reset to 0 (step S18, cx(k) = 0). The received-speech counter 30 therefore counts the number of consecutive speech frames on the received-signal side.
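The counter behaviour of steps S16 and S18 amounts to a run-length counter over the speech flags. A minimal sketch, with a hypothetical function name:

```python
def consecutive_speech_counts(flags):
    """Return the count cx(k) for each frame, given speech flags fx(k).

    The count starts at 0, increases by 1 on each speech frame (S16),
    and resets to 0 on each non-speech frame (S18).
    """
    counts = []
    count = 0  # initial value cx(0) = 0
    for flag in flags:
        count = count + 1 if flag == 1 else 0
        counts.append(count)
    return counts
```

For the flag sequence `[0, 1, 1, 1, 0, 1]` the counts are `[0, 1, 2, 3, 0, 1]`.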

The collected-speech determination and counting means 24, which receives the collected sound signal z(n), operates in exactly the same way as the received-speech determination and counting means 22, except that the experimentally determined threshold zth has a different value. Components identical to those of the received-speech determination and counting means 22 are therefore shown in FIG. 2 with the same reference numerals followed by a prime ('), and their description is omitted (the same applies below).
The collected-speech counter 30' thus counts the number of consecutive speech frames on the collected-signal side. The count cx(k) of the received-speech counter 30 and the count cz(k) of the collected-speech counter 30' are input to the double-talk state determination unit 32.

When the count cx(k) of the received-speech counter 30 is greater than 0 (Yes in step S20, cx(k) > 0) and the count cz(k) of the collected-speech counter 30' is greater than 0 (Yes in step S22, cz(k) > 0), that is, when speech is present in both x(n) and z(n), the first state determination unit 320 in the double-talk state determination unit 32 determines whether the state is double talk or received single talk.
In general, in the received single-talk state, speech in the collected sound signal (in this case, only the echo signal y(n)) starts immediately after speech starts in the received signal. Because the echo signal y(n) lags the received signal x(n), cz(k) never becomes greater than 0 before cx(k) does. In the received single-talk state, therefore, cz(k) never exceeds cx(k).

Based on this, the double-talk state is determined as follows. When the value obtained by subtracting cz(k) from cx(k) is negative (Yes in step S24, cx(k) − cz(k) < 0), the state is judged to be double talk, and the double-talk state determination flag fd(k) = 1 is output to the loss amount determination means 94 (step S26).
Conversely, when cx(k) − cz(k) is 0 or more (No in step S24, cx(k) − cz(k) ≥ 0), the state is judged to be received single talk, and fd(k) = 0 is output to the loss amount determination means 94 (step S28).
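The decision in steps S20 to S28 can be sketched as a single comparison of the two counters. The function name is hypothetical, and returning `None` when either counter is 0 is an assumption: this passage only covers the case where both counters are greater than 0.

```python
def double_talk_flag(cx, cz):
    """Return fd(k) from the counts cx(k) and cz(k).

    1    -> double talk            (cx - cz < 0, step S26)
    0    -> received single talk   (cx - cz >= 0, step S28)
    None -> one of the counters is 0; not decided here (assumption)
    """
    if cx > 0 and cz > 0:          # steps S20 and S22
        return 1 if cx - cz < 0 else 0  # step S24
    return None
```

No error signal e(n) enters the decision, so it does not depend on how well the adaptive filter has converged.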

FIG. 4 shows a time chart of the operation of the double-talk state determination unit 32 described above. The horizontal axis is time, and each tick represents one short-time frame, which is 10 ms in this example. The vertical direction shows the 1/0 states of the received-signal speech flag fx(k) and the collected-signal speech flag fz(k); the numbers written inside those states are the counts of the received-speech counter 30 and the collected-speech counter 30'. Below the counts is the value obtained by subtracting cz(k) from cx(k).

Suppose that a received signal x(n) occurs at a certain time and its received sound index value exceeds the threshold xth. The received speech counter 30 then starts counting from that short-time frame. FIG. 4(a) shows a situation in which the received speech state continues for 1 second, is followed by a period with no received signal x(n), and then the received signal x(n) occurs again.
If the state is the received single-talk state when the received speech counter 30 first starts counting, then, as shown in FIG. 4(a), the collected sound signal speech flag fz(k) becomes fz(k) = 1 in the same frame or in a later frame. Consequently, the value obtained by subtracting the count value cz(k) of the collected speech counter 30' from the count value cx(k) of the received speech counter 30 never becomes negative. The double-talk state determination unit 32 determines this state to be the received single-talk state.
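The counter behaviour in the time chart can be reproduced with a short sketch. It assumes that each counter holds the number of consecutive speech frames and is reset to 0 by a non-speech frame; the function names and the flag sequences are hypothetical and chosen only for illustration.

```python
from typing import List


def consecutive_counts(flags: List[int]) -> List[int]:
    """Count consecutive frames whose speech flag is 1.

    Resetting to 0 on a non-speech frame is an assumption of this sketch.
    """
    counts, run = [], 0
    for flag in flags:
        run = run + 1 if flag else 0
        counts.append(run)
    return counts


def counter_difference(fx: List[int], fz: List[int]) -> List[int]:
    """Return cx(k) - cz(k) for every frame k."""
    cx = consecutive_counts(fx)
    cz = consecutive_counts(fz)
    return [a - b for a, b in zip(cx, cz)]


# Received single talk: the echo reaches the microphone one frame late.
single = counter_difference([1, 1, 1, 1, 1], [0, 1, 1, 1, 1])
# Near-end speech starts first, far-end speech two frames later.
double = counter_difference([0, 0, 1, 1, 1], [1, 1, 1, 1, 1])
print(single)  # [1, 1, 1, 1, 1]
print(double)  # [-1, -2, -2, -2, -2]
```

In the first sequence the difference stays non-negative, and in the second it is negative from the first frame, which matches the two cases distinguished in the description.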

Conversely, consider the case where the collected sound signal speech flag fz(k) indicates speech first (fz(k) = 1), the collected speech counter 30' starts counting, and the received speech counter 30 starts counting later. In this case, the value obtained by subtracting the count value cz(k) of the collected speech counter 30' from the count value cx(k) of the received speech counter 30 is negative, and this state is determined to be the double-talk state.
The echo canceling unit 80 is not limited to one that uses an adaptive filter; other methods may be used. Likewise, the loss devices 98 and 99 are not limited to those shown in FIG. 1; for example, a configuration in which the loss device 99 applies spectrum control to the collected sound signal z(n) is also possible.

In general, in settings such as audio conferences and video conferences, the far end and the near end do not speak at the same time. That is, when each party speaks only after listening to the other, the configuration of Example 1 described above determines the double-talk state with sufficient accuracy.
However, when a discussion becomes heated, a party often starts speaking before the other party has finished. Example 2, which determines the double-talk state accurately even in such a situation, is described next.

FIG. 4(b) is an operation time chart of the double-talk state determination unit 32 for the case where the near-end party starts speaking before the other party has finished. The horizontal and vertical directions and the horizontal scale are the same as in FIG. 4(a).
If the state is the received single-talk state when the received speech counter 30 first starts counting, the collected sound signal speech flag fz(k) becomes fz(k) = 1 in the same frame or in a later frame. FIG. 4(b) shows a situation in which the delay in the echo path 6 is larger than in FIG. 4(a) and fz(k) becomes 1 one frame late. In this state, the value obtained by subtracting the count value cz(k) of the collected speech counter 30' from the count value cx(k) of the received speech counter 30 is 1, and the state is determined to be the received single-talk state.

This determination is made correctly by the configuration of Example 1. However, suppose that, after the received single-talk state has been determined, the near-end (sound collection) side starts speaking at the time when cz(k) = 97. Although the state is now the double-talk state, the value obtained by subtracting cz(k) from cx(k) remains 1, and the received single-talk determination continues. As a result, loss remains inserted in the loss device 99 on the sound collection side, and the transmission signal s(n) does not pass.
Example 2 enables accurate determination of the double-talk state even in such a situation. FIG. 5 shows a functional configuration example of the double-talk state determination means 50 of Example 2.

In the double-talk state determination means 50 of Example 2, the following are added to the double-talk state determination means 20 of Example 1: a received signal frequency analysis unit 56 that frequency-analyzes the received signal x(n) for each short-time frame, a collected sound signal frequency analysis unit 56' that frequency-analyzes the collected sound signal z(n) for each short-time frame, and a correlation unit 540 that obtains the correlation between the short-time frames of the frequency-analyzed received signal and collected sound signal. The units other than the added units and the double-talk state determination unit 58 operate as in Example 1, so only the added units and the double-talk state determination unit 58 are described here.

The received signal frequency analysis unit 56 receives the divided received signal x(k) and the short-time frame from the frame division unit 26. Using a well-known technique such as the short-time discrete Fourier transform, it converts the received signal x(n) within the short-time frame into N discrete frequency-domain signals X(k, f1), ..., X(k, fn), ..., X(k, fN) corresponding to N frequencies, for example f1 to fN.
In the received signal power calculation unit 54, the received discrete power calculation unit 541 calculates the power of each discrete frequency-domain received signal X(k, f*) (* = 1, 2, ..., N) and obtains the received power vector PXVk = [PX(k, f1), PX(k, f2), ..., PX(k, fN)], whose elements are those powers.
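As an illustration of how a power vector of this kind could be computed for one short-time frame, the following sketch evaluates a plain discrete Fourier transform with the standard library. The function name `power_vector`, the mapping of f1 to fN onto DFT bins 0 to N-1, and the absence of windowing are all assumptions of the sketch, not details taken from the description.

```python
import cmath
import math
from typing import List


def power_vector(frame: List[float], n_bins: int) -> List[float]:
    """Return the squared DFT magnitude of the first n_bins bins of a frame.

    Assumption: f1..fN correspond to DFT bins 0..n_bins-1 and no window
    function is applied.
    """
    length = len(frame)
    powers = []
    for b in range(n_bins):
        acc = sum(
            x * cmath.exp(-2j * cmath.pi * b * n / length)
            for n, x in enumerate(frame)
        )
        powers.append(abs(acc) ** 2)
    return powers


# A 16-sample frame holding a cosine that completes two cycles.
frame = [math.cos(2 * math.pi * 2 * n / 16) for n in range(16)]
pxv = power_vector(frame, 4)
```

With this test frame the energy is concentrated in the element for bin 2, and the other three elements are close to zero.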

The collected sound signal frequency analysis unit 56', which receives the collected sound signal z(n) and the short-time frame from the frame division unit 26' on the collected sound signal z(n) side, and the collected sound signal power calculation unit 54' operate in exactly the same way as their counterparts on the received signal x(n) side. The collected discrete power calculation unit 541' of the collected sound signal power calculation unit 54' obtains the collected sound power vector PZVk = [PZ(k, f1), PZ(k, f2), ..., PZ(k, fN)].
The received power vector PXVk = [PX(k, f1), PX(k, f2), ..., PX(k, fN)] and the collected sound power vector PZVk = [PZ(k, f1), PZ(k, f2), ..., PZ(k, fN)] are input to the correlation calculation unit 52, which calculates the correlation value CXZ(k) = (PXVk · PZVk) / (|PXVk| |PZVk|). Here "·" denotes the inner product of vectors. The correlation value CXZ(k) is normalized and takes a value from 0 to 1.
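The normalized correlation is the inner product of the two vectors divided by the product of their Euclidean norms. A minimal sketch follows; the function name `correlation` is hypothetical, and returning 0.0 when either vector has zero norm is an assumption, since the description does not cover that case.

```python
import math
from typing import Sequence


def correlation(pxv: Sequence[float], pzv: Sequence[float]) -> float:
    """Return CXZ(k) = (PXVk . PZVk) / (|PXVk| |PZVk|).

    For non-negative inputs such as powers the result lies in [0, 1].
    Returning 0.0 for a zero-norm vector is an assumption of this sketch.
    """
    dot = sum(a * b for a, b in zip(pxv, pzv))
    norm = math.sqrt(sum(a * a for a in pxv)) * math.sqrt(sum(b * b for b in pzv))
    return dot / norm if norm > 0.0 else 0.0
```

Two vectors with the same spectral shape, such as `[1, 2, 3]` and `[2, 4, 6]`, give a value of 1 regardless of their level, which is why the measure tolerates the level change introduced by the echo path.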

The correlation value CXZ(k) is input to the double-talk state determination unit 58. In addition to CXZ(k), the double-talk state determination unit 58 receives the count value cx(k) of the received speech counter 30 and the count value cz(k) of the collected speech counter 30'.
When cx(k) is greater than 0 and cz(k) is also greater than 0, the double-talk state determination unit 58 determines that both the received signal x(n) and the collected sound signal z(n) contain speech (FIG. 3, step S22: Yes). In this case, the double-talk state determination unit 58 determines whether the state is the double-talk state or the received single-talk state. This processing is indicated by the broken line in FIG. 3.

When the smaller of the count value cx(k) of the received speech counter 30 and the count value cz(k) of the collected speech counter 30' is equal to or less than the threshold thd, the first state determination unit 320 performs the speech-state-based determination described in Example 1 (step S30: Yes). The threshold thd is set to a frame count corresponding to, for example, about 1 second. In this example the frame length is 10 ms, so thd = 100.
When the state determination selection unit 581 determines in step S30 that the smaller of cx(k) and cz(k) is greater than the threshold thd, the second state determination unit 582 of the double-talk state determination unit 58 refers to the correlation value CXZ(k) to determine whether the state is the double-talk state.

The correlation value CXZ(k) takes a value close to 1 when the correlation between the received power vector PXVk and the collected sound power vector PZVk of each short-time frame is high. In the received single-talk state, therefore, CXZ(k) is close to 1. In step S32, the threshold thc is compared with CXZ(k): if CXZ(k) is greater than thc, the received single-talk state is determined (step S34), and if CXZ(k) is smaller than thc, the double-talk state is determined (step S36).
By converting the received signal x(n) and the collected sound signal z(n) into frequency-domain signals and taking their correlation in this way, the double-talk state can be determined correctly even when one party starts speaking before the other has finished. A change from the double-talk state to the received single-talk state can be determined in the same way.
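The selection between the counter-based and the correlation-based determination (steps S22, S30, S24 and S32) can be sketched as below. The function name is hypothetical, thd = 100 follows the 10 ms example in the text, and thc = 0.5 is a placeholder because no value for thc is given. Treating CXZ(k) equal to thc as double talk and returning `None` when step S22 answers No are assumptions of the sketch.

```python
from typing import Optional


def decide_example2(cx: int, cz: int, cxz: float,
                    thd: int = 100, thc: float = 0.5) -> Optional[int]:
    """Return fd(k): 1 for double talk, 0 for received single talk.

    thc=0.5 is a hypothetical placeholder; None means step S22 was No.
    """
    if cx <= 0 or cz <= 0:
        return None
    if min(cx, cz) <= thd:
        # Step S30: Yes. Use the counter difference as in Example 1.
        return 1 if cx - cz < 0 else 0
    # Step S30: No. Use the spectral correlation (steps S32 to S36).
    return 0 if cxz > thc else 1
```

Once both counters exceed thd, a low correlation such as `decide_example2(150, 120, 0.2)` yields 1, whereas the counter difference alone would still indicate received single talk.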

Although the correlation between the received power vector PXVk and the collected sound power vector PZVk has been described using an example in which it is obtained for each identical discrete frequency within a frame, the invention is not limited to this example. The correlation between short-time frames may instead be obtained using the amplitudes of the discrete frequency-domain received signals X(k, f*). In that case, the received discrete power calculation unit 541 functions as a received discrete amplitude unit 541 that obtains the amplitude of X(k, f*).
The received power vector is then replaced by the received amplitude vector XVk = [X(k, f1), X(k, f2), ..., X(k, fN)], which is input to the correlation calculation unit 52 together with the collected sound amplitude vector ZVk = [Z(k, f1), Z(k, f2), ..., Z(k, fN)]. The correlation value is calculated by dividing the inner product of the two amplitude vectors by the product of their magnitudes: CXZ(k) = (XVk · ZVk) / (|XVk| |ZVk|).

It is also possible to obtain the correlation of the outline or envelope of the frequency-domain signals.
The received band division unit 542 of the received signal power calculation unit 54 divides the discrete frequency-domain signals X(k, f*) at equal intervals into groups of, for example, three (X(k, f1) to X(k, f3)), or of five to ten.
Various methods of dividing the discrete frequency-domain signals X(k, f*) are conceivable. For example, because the features of speech generally tend to appear in the low-frequency region, the signals may be divided into groups containing few discrete frequency-domain signals X(k, f*) in the low-frequency region and groups containing many in the high-frequency region.

That is, the discrete frequency-domain signals X(k, f*) are divided on a logarithmic frequency scale. Alternatively, dividing them according to the mel scale, which matches human hearing, gives a division that better reflects auditory characteristics.
The discrete frequencies f* grouped by the received band division unit 542 are represented by aggregate bands mi (i = 1 to w), referred to as the received band sequence.
The discrete frequency-domain signals X(k, m*) of the received band sequence produced by the received band division unit 542 are input to the received band averaging unit 543, which calculates the per-band received band power vector PXVk = [PX(k, m1), PX(k, m2), ..., PX(k, mw)].
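The equal-interval case of the band division can be sketched as follows. The sketch averages the per-bin powers inside each band, which is one plausible reading of the averaging unit 543; the function name `band_power_vector` and the handling of a shorter final band are assumptions. Logarithmic or mel-scale band edges would need a different grouping and are not shown.

```python
from typing import List


def band_power_vector(powers: List[float], band_size: int = 3) -> List[float]:
    """Average per-bin powers over consecutive bands of band_size bins.

    Assumption: a final band with fewer than band_size bins is averaged
    over the bins it actually contains.
    """
    bands = []
    for start in range(0, len(powers), band_size):
        chunk = powers[start:start + band_size]
        bands.append(sum(chunk) / len(chunk))
    return bands


print(band_power_vector([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))  # [2.0, 5.0]
```

The resulting vector has w elements instead of N, so the correlation is computed over fewer and smoother values.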

Similarly, the collected sound band power vector PZVk = [PZ(k, m1), PZ(k, m2), ..., PZ(k, mw)] is calculated from the collected sound band sequence Z(k, m*).
The correlation calculation unit 52 calculates the correlation value between the aggregate bands within a frame as CXZ(k) = (PXVk · PZVk) / (|PXVk| |PZVk|).
Averaging the discrete frequency-domain signals over each band before taking the correlation between frames in this way stabilizes the double-talk state determination.
Although an example of correlating the power vectors of the discrete frequency-domain signals has been described, another possible method is to obtain the envelope of the discrete frequency-domain signal, obtain a representative value of the envelope for each frequency region, and then obtain the correlation of the representative values. Example 3 shows such a method.

FIG. 6 shows a functional configuration example of the double-talk state determination means 60 of Example 3.
The double-talk state determination means 60 of Example 3 differs from the double-talk state determination means 50 of Example 2 shown in FIG. 5 only in that a received signal spectrum envelope calculation unit 62 is added and the functional configuration of the received signal band power calculation unit 64 is changed. Only these differences are described here.
The discrete frequency-domain received signal X(k, f*) produced by the received signal frequency analysis unit 56 is input to the received signal spectrum envelope calculation unit 62, which calculates a spectrum envelope using, for example, cepstrum analysis, linear prediction cepstrum analysis, or linear prediction.

A method of obtaining a spectrum envelope by cepstrum analysis is described in, for example, the reference "Digital Speech Processing" by Sadaoki Furui, Tokai University Press, p. 45.
FIG. 7 schematically shows an example of the spectrum envelope calculated by the received signal spectrum envelope calculation unit 62. The horizontal axis of FIG. 7 is frequency and the vertical axis is spectral amplitude. This spectrum envelope is input to the per-frequency-band received representative value generation unit 641 of the received signal band power calculation unit 64.
The per-frequency-band received representative value generation unit 641 calculates the representative values, indicated by ● in FIG. 7, for each of the aggregate bands mi (i = 1 to w) described above. Here the aggregate bands mi are referred to as the received envelope band sequence. The representative spectrum of each received envelope band is, for example, the average value over that band, and it is converted to power by the received representative value power calculation unit 642. The representative value power vectors of the receiving side and the sound collection side are input to the correlation calculation unit 52, and the correlation value is obtained in the same way as described above.
The correlation value may also be obtained using the amplitude values of the representative spectrum.

For example, a spectrum envelope obtained by linear prediction cepstrum analysis emphasizes the spectral peaks (reference, p. 48), so it can be expected to make the correlation of speech easier to obtain.
As described above, the correlation may be calculated from any of the following: spectral amplitude values, per-band average values, representative amplitude values of each envelope band of the spectrum envelope, or power values of each envelope band of the spectrum envelope.

FIG. 11 shows a functional configuration example of another double-talk state determination unit 70, and FIG. 12 shows the main processing flow of its operation. FIG. 12 shows the flow from step S22 of FIG. 3 onward. The double-talk state determination unit 70 determines the double-talk state by combining the speech-state-based determination described in Example 1 with the correlation-based determination described in Examples 2 and 3.
The double-talk state determination unit 70 comprises a first state determination unit 72, a second state determination unit 74, and a reception state determination unit 76. It receives the correlation value CXZ(k), the count value cx(k) of the received speech counter 30, and the count value cz(k) of the collected speech counter 30'.

The double-talk state determination unit 70 starts operating when the count value cx(k) of the received speech counter 30 is greater than 0 and the count value cz(k) of the collected speech counter 30' is also greater than 0 (that is, when step S22 of FIG. 3 is Yes).
The first state determination unit 72 performs the speech-state-based determination. When the value obtained by subtracting cz(k) from cx(k) is negative, the double-talk state is determined (step S40: Yes). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (step S42: Yes), the weighting unit 72a in the first state determination unit 72 sets the double-talk score sd1(k) = s1 (step S44). If the smaller of cx(k) and cz(k) is greater than thd (step S42: No), the weighting unit 72a sets sd1(k) = s2 (step S46). For example, s1 = 5 and s2 = 0.

When the value obtained by subtracting cz(k) from cx(k) is 0 or more, the received single-talk state is determined (step S40: No). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (step S48: Yes), the weighting unit 72a in the first state determination unit 72 sets the double-talk score sd1(k) = s3 (step S50). If the smaller of cx(k) and cz(k) is greater than thd (step S48: No), the weighting unit 72a sets sd1(k) = s4 (step S52). For example, s3 = −5 and s4 = 0.

The second state determination unit 74 performs the determination based on the correlation value CXZ(k). When CXZ(k) is equal to or greater than the threshold thc, the correlation is high and the received single-talk state is determined (step S54: Yes). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (step S56: Yes), the weighting unit 74a in the second state determination unit 74 sets the double-talk score sd2(k) = s5 (step S58). If the smaller of cx(k) and cz(k) is greater than thd (step S56: No), the weighting unit 74a sets sd2(k) = s6 (step S60). For example, s5 = −2 and s6 = −5.

When the correlation value CXZ(k) is smaller than the threshold thc, the double-talk state is determined (step S54: No). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (step S62: Yes), the weighting unit 74a in the second state determination unit 74 sets the double-talk score sd2(k) = s7 (step S64). If the smaller of cx(k) and cz(k) is greater than thd (step S62: No), the weighting unit 74a sets sd2(k) = s8 (step S66). For example, s7 = 2 and s8 = 5. A plurality of thresholds for CXZ(k) may also be provided so that the double-talk score sd2(k) is weighted more finely.

The reception state determination unit 76 receives the double-talk score sd1(k) from the first state determination unit 72 and the double-talk score sd2(k) from the second state determination unit 74. The score addition unit 76a in the reception state determination unit 76 sums sd1(k) and sd2(k) to obtain the total double-talk score sda(k).
The total double-talk score sda(k) is compared with the double-talk score threshold ths. When sda(k) is equal to or greater than ths, the double-talk state is determined (step S70: Yes), and the double-talk determination flag fd(k) = 1 is output to the loss amount determining means 94 (step S72). When sda(k) is smaller than ths, the received single-talk state is determined, and fd(k) = 0 is output to the loss amount determining means 94 (step S74).
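The score-based combination of steps S40 to S74 can be sketched as follows, using the example weights s1 to s8 given in the text. The function names are hypothetical, and the defaults thc = 0.5 and ths = 0 are placeholders, because no values for these thresholds are stated. The caller is assumed to have confirmed that both counters are greater than 0 (step S22: Yes).

```python
def first_score(cx: int, cz: int, thd: int) -> int:
    """sd1(k) from the counter difference (steps S40 to S52)."""
    short = min(cx, cz) <= thd
    if cx - cz < 0:                   # S40: Yes, looks like double talk
        return 5 if short else 0      # s1, s2
    return -5 if short else 0         # s3, s4


def second_score(cx: int, cz: int, cxz: float, thd: int, thc: float) -> int:
    """sd2(k) from the correlation value (steps S54 to S66)."""
    short = min(cx, cz) <= thd
    if cxz >= thc:                    # S54: Yes, looks like single talk
        return -2 if short else -5    # s5, s6
    return 2 if short else 5          # s7, s8


def decide_combined(cx: int, cz: int, cxz: float,
                    thd: int = 100, thc: float = 0.5, ths: int = 0) -> int:
    """Return fd(k); thc and ths defaults are hypothetical placeholders."""
    sda = first_score(cx, cz, thd) + second_score(cx, cz, cxz, thd, thc)
    return 1 if sda >= ths else 0     # S70
```

With these weights the counter-based score dominates while the shorter run is at most thd frames, and the correlation-based score dominates afterwards, which mirrors the switch made in Example 2 but as a soft weighting.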

As described above, the double-talk state may be determined by combining the count-based determination of the first state determination unit 72 with the correlation-based determination of the second state determination unit 74.
Simulation results
FIG. 8 shows the result of simulating the double-talk state determination performed by the double-talk state determination means of Example 1.
Simulation conditions:
Distance between microphone and speaker: 20 cm (in Condition 1 the distance was fixed at this value; in Condition 2 the microphone was kept in motion about 1 m away from the speaker)
Sampling rate: 16 kHz
Frame: 20 ms
Condition 1 is a state in which the positions of the microphone and the speaker are fixed as described above and the echo canceling unit 80 has sufficiently learned the impulse response of the echo path 6. Under Condition 1, the received signal x(n) is shown in FIG. 8(a), the collected sound signal z(n) in FIG. 8(b), and the error signal e(n) in FIG. 8(c).

The horizontal axis of FIG. 8 is time (seconds) and the vertical axis is normalized amplitude.
Because the echo canceling unit 80 sufficiently cancels the echo signal y(n), the error signal e(n) is small.
For Condition 2, in which the microphone was moved randomly and continuously starting from the state of Condition 1, the received signal x(n) is shown in FIG. 9(a), the collected sound signal z(n) in FIG. 9(b), and the error signal e(n) in FIG. 9(c). The axes are the same as in FIG. 8.
Under Condition 2, the adaptive filter coefficients h^(n) generated by the adaptive algorithm of the estimation means 82 of the echo canceling unit 80 cannot follow the changes in the echo path 6, so the error signal e(n) is large.

FIG. 10 shows the determination results of the conventional method and of Example 1 under condition 2. FIG. 10A shows the simulation conditions. FIG. 10B shows the determination result of the conventional method, which uses the error signal e (n) for the determination. FIG. 10C shows the determination result of Example 1.
The horizontal axis of FIG. 10 is the frame number. In this example one frame is 20 ms, so the axis covers 0 to 10 seconds. The vertical axis is the determination result: 0 is a silent interval, 1 is the received single talk state, and 2 is the double talk state.
As FIG. 10A shows, the simulation contains no double talk state, yet the conventional method produces erroneous determinations centered around 1, 2, 3, and 4 seconds.
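As a worked example of reading FIG. 10: with 20 ms frames, frame number k corresponds to time k × 0.02 s, so 500 frames span 10 seconds. The sketch below converts frame numbers to seconds and lists the frames in which a determination result reports double talk (2) although the reference condition does not. The function names and the 8-frame label sequences are made up for illustration; they are not the data of FIG. 10.

```python
FRAME_SEC = 0.020  # one frame is 20 ms, as stated in the text

SILENT, RECEIVED_SINGLE_TALK, DOUBLE_TALK = 0, 1, 2  # vertical-axis codes of FIG. 10

def frame_to_seconds(k, frame_sec=FRAME_SEC):
    """Start time in seconds of frame number k."""
    return k * frame_sec

def false_double_talk_frames(reference, decided):
    """Frame numbers where `decided` says double talk but `reference` does not."""
    return [k for k, (r, d) in enumerate(zip(reference, decided))
            if d == DOUBLE_TALK and r != DOUBLE_TALK]

# Made-up 8-frame sequences (illustration only, no double talk in the reference).
reference = [0, 1, 1, 1, 1, 0, 1, 1]
decided = [0, 1, 2, 2, 1, 0, 1, 2]

bad_frames = false_double_talk_frames(reference, decided)
bad_times = [frame_to_seconds(k) for k in bad_frames]
print(bad_frames, bad_times)
```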

In contrast to the conventional method, Example 1 yields determination results that match the simulation conditions.
Thus, the double talk state determination device using the double talk state determination means of Example 1 determines the double talk state using only the received signal x (n) and the collected sound signal z (n), so an accurate determination is possible.
Beyond the above embodiments, the apparatuses and methods of the present invention are not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the invention. Further, the processes described for the above apparatuses and methods need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the apparatus that executes them, or as needed.
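The determination that uses only x (n) and z (n) can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it follows the counting rule recited in the claims (a per-frame index compared with thresholds xth and zth, consecutive-frame counts cx (k) and cz (k), double talk when cz (k) is larger than cx (k)), uses the mean absolute amplitude as the index, and does not handle the silent interval. The function names, the frame data, and the threshold values are hypothetical.

```python
RECEIVED_SINGLE_TALK, DOUBLE_TALK = 1, 2  # same codes as the vertical axis of FIG. 10

def frame_index(frame):
    """Mean absolute amplitude of one short-time frame (one possible index value)."""
    return sum(abs(s) for s in frame) / len(frame)

def update_count(count, index, threshold):
    """Count consecutive frames whose index exceeds the threshold; reset to 0 otherwise."""
    return count + 1 if index > threshold else 0

def decide(x_frames, z_frames, xth, zth):
    """Return one determination result per frame from the counts cx(k) and cz(k)."""
    cx = cz = 0
    results = []
    for x_frame, z_frame in zip(x_frames, z_frames):
        cx = update_count(cx, frame_index(x_frame), xth)
        cz = update_count(cz, frame_index(z_frame), zth)
        results.append(DOUBLE_TALK if cz > cx else RECEIVED_SINGLE_TALK)
    return results

# Hypothetical two-sample frames: the received signal pauses in frame 2,
# while the collected sound stays active from frame 1 onward.
x_frames = [[0.5, -0.5], [0.5, -0.5], [0.0, 0.0], [0.5, -0.5], [0.5, -0.5]]
z_frames = [[0.0, 0.0], [0.3, -0.3], [0.3, -0.3], [0.3, -0.3], [0.3, -0.3]]

print(decide(x_frames, z_frames, xth=0.1, zth=0.1))  # prints [1, 1, 2, 2, 2]
```

In this example cx (k) runs 1, 2, 0, 1, 2 and cz (k) runs 0, 1, 2, 3, 4, so the result switches to double talk from the frame in which the collected sound has been active for more consecutive frames than the received signal.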

Further, when the processing functions of the above devices are implemented by a computer, the processing contents of the functions that the echo canceling device should have are described by a program. Executing this program on a computer realizes the processing functions of the echo canceling device on the computer.
The program describing these processing contents can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable) / RW (ReWritable) as the optical disk; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a recording device of a server computer and transferring it from the server computer to other computers via a network.
A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own recording device. When executing processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or may execute processing according to the received program sequentially each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, which does not transfer the program from the server computer to the computer and realizes the processing functions only through execution instructions and result acquisition. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, each apparatus is configured by executing a predetermined program on a computer, but at least part of the processing contents may instead be implemented in hardware.

A diagram showing a functional configuration example of the echo canceling device 100 including the double talk state determination means 20 of Example 1 of the present invention.
A diagram showing a functional configuration example of the double talk state determination means 20 of Example 1 of the present invention.
A diagram showing the main processing flow of the double talk state determination means 20.
A diagram showing an example of a time chart of the operation of the double talk state determination unit 32; FIG. 4A shows the received single talk state and the double talk state, and FIG. 4B shows a state in which the double talk state is erroneously determined.
A diagram showing a functional configuration example of the double talk state determination means 50 of Example 2 of the present invention.
A diagram showing a functional configuration example of the double talk state determination means 60 of Example 3 of the present invention.
A diagram schematically showing an example of the spectral envelope calculated by the received signal spectrum envelope calculation unit 62.
A diagram showing the result for condition 1 of the simulation of the double talk state determination by the double talk state determination means of Example 1; FIG. 8A shows the received signal x (n), FIG. 8B the collected sound signal z (n), and FIG. 8C the error signal e (n).
A diagram showing the result for condition 2 of the simulation of the double talk state determination by the double talk state determination means of Example 1; FIG. 9A shows the received signal x (n), FIG. 9B the collected sound signal z (n), and FIG. 9C the error signal e (n).
A diagram showing the determination results for the double talk state; FIG. 10A shows the correct determination result, FIG. 10B the determination result of the conventional method using the error signal e (n) for the determination, and FIG. 10C the determination result of the double talk state determination means of Example 1.
A diagram showing a functional configuration example of the double talk state determination unit 70.
A diagram showing the main processing flow of the double talk state determination unit 70.
A diagram showing a conventional echo canceling device 800 arranged at the near end.

Claims

An echo canceling device comprising double talk state determination means for determining whether the state is a double talk state, in which call signals are input simultaneously from the receiving end and the transmitting end, or a received single talk state, in which no call signal is input at the transmitting end and a call signal is input only at the receiving end, wherein,
for the discretized received signal x (n) and collected sound signal z (n) input from the receiving end and the transmitting end, where n is a natural number, the double talk state determination means comprises:
received utterance determination counting means for counting the number of consecutive short-time frames in which an utterance continues, by comparing a reception index value for each short-time frame of the received signal x (n) with a threshold xth;
sound collection utterance determination counting means for counting the number of consecutive short-time frames in which an utterance continues, by comparing a sound collection index value for each short-time frame of the collected sound signal z (n) with a threshold zth; and
a double talk state determination unit for determining whether the state is the double talk state or the received single talk state by comparing the respective counts obtained by the received utterance determination counting means and the sound collection utterance determination counting means.

The echo canceling device according to claim 1, wherein
the received utterance determination counting means includes:
a reception index value calculation unit that obtains, as the reception index value, the sum or the average of the power or of the absolute amplitude of the received signal x (n) within each short-time frame;
a received utterance determination unit that outputs a received utterance flag when the reception index value is greater than, or greater than or equal to, the threshold xth; and
a received utterance counter that counts the number cx (k) of consecutive frames for which the received utterance determination unit outputs the received utterance flag,
the sound collection utterance determination counting means includes:
a sound collection index value calculation unit that obtains, as the sound collection index value, the sum or the average of the power or of the absolute amplitude of the collected sound signal z (n) within each short-time frame;
a sound collection utterance determination unit that outputs a sound collection utterance flag when the sound collection index value is greater than, or greater than or equal to, the threshold zth; and
a sound collection utterance counter that counts the number cz (k) of consecutive frames for which the sound collection utterance determination unit outputs the sound collection utterance flag, and
the double talk state determination unit receives the counted cx (k) and cz (k) as inputs, determines the double talk state if cz (k) is larger than cx (k), and determines the received single talk state if cz (k) is smaller than or equal to cx (k).

The echo canceling device according to claim 1, further comprising:
a received signal frequency analysis unit that converts the received signal x (n) into a discrete frequency domain signal for each short-time frame;
a sound collection signal frequency analysis unit that converts the collected sound signal z (n) into a discrete frequency domain signal for each short-time frame; and
a correlation unit that receives the discrete frequency domain received signal and the discrete frequency domain collected sound signal and calculates a correlation value between the two signals for each short-time frame,
wherein the double talk state determination unit determines the double talk state using the counted cx (k), the counted cz (k), and the correlation value as inputs.

The echo canceling device according to claim 3, wherein
the correlation unit includes a received signal power calculation unit, a sound collection signal power calculation unit, and a correlation calculation unit,
the received signal power calculation unit includes a received discrete power calculation unit that calculates the power of each frequency component of the discrete frequency domain received signal, a reception band division unit that divides the power of the discrete frequency domain received signal into a plurality of frequency bands, and a reception band averaging unit that averages the power of the discrete frequency domain received signal within each band divided by the reception band division unit to obtain a reception band average power,
the sound collection signal power calculation unit includes a sound collection discrete power calculation unit that calculates the power of each frequency component of the discrete frequency domain collected sound signal, a sound collection band division unit that divides the power of the discrete frequency domain collected sound signal into a plurality of frequency bands, and a sound collection band averaging unit that averages the power of the discrete frequency domain collected sound signal within each band divided by the sound collection band division unit to obtain a sound collection band average power,
the correlation calculation unit calculates, for each short-time frame, the correlation between the reception band average power and the sound collection band average power to obtain the correlation value, and
the double talk state determination unit receives the correlation value and the count values cx (k) and cz (k) as inputs, and switches the method of determining the double talk state depending on whether the smaller of cx (k) and cz (k) is greater than a threshold thd.

A double talk state determination method for determining whether the state is a double talk state, in which call signals are input simultaneously from the receiving end and the transmitting end, or a received single talk state, in which no call signal is input at the transmitting end and a call signal is input only at the receiving end, the method comprising,
for the discretized received signal x (n) and collected sound signal z (n) input from the receiving end and the transmitting end:
a step of counting the states in which the reception index value for each short-time frame of the received signal x (n) is larger than a threshold xth;
a step of counting the states in which the sound collection index value for each short-time frame of the collected sound signal z (n) is larger than a threshold zth; and
a step of determining whether the state is the double talk state or the received single talk state by comparing the respective count values obtained by the received utterance determination and the sound collection utterance determination.

The double talk state determination method according to claim 5, wherein
the step of determining whether the state is the double talk state or the received single talk state comprises:
a step of obtaining the number cx (k) of consecutive frames in which the reception index value, corresponding to the energy or the absolute amplitude of the received signal x (n) in each short-time frame, is greater than the threshold xth;
a step of obtaining the number cz (k) of consecutive frames in which the sound collection index value, corresponding to the energy or the absolute amplitude of the collected sound signal z (n) in each short-time frame, is greater than the threshold zth; and
a step of receiving the obtained cx (k) and cz (k) as inputs and determining the double talk state if cz (k) is larger than cx (k).

The double talk state determination method according to claim 6, further comprising:
a step of converting the time domain received signal x (k) and collected sound signal z (k), divided into the short-time frames, into a discrete frequency domain received signal X (k, f*) (* = 1, 2, ..., N) and a discrete frequency domain collected sound signal Z (k, f*), respectively;
a step of obtaining a correlation value between the discrete frequency domain received signal X (k, f*) and the discrete frequency domain collected sound signal Z (k, f*) in the short-time frame; and
a determination step of determining the double talk state using as inputs the count value cx (k) of the received utterance counter, the count value cz (k) of the sound collection utterance counter, and the correlation value.

The double talk state determination method according to claim 6, further comprising:
a step of converting the time domain received signal x (k) and collected sound signal z (k), divided into the short-time frames, into a discrete frequency domain received signal X (k, f*) (* = 1, 2, ..., N) and a discrete frequency domain collected sound signal Z (k, f*), respectively;
a step of dividing the discrete frequency domain received signal X (k, f*) into a plurality of frequency bands to generate a reception band sequence;
a step of dividing the discrete frequency domain collected sound signal Z (k, f*) into a plurality of frequency bands to generate a sound collection band sequence;
a step of obtaining a correlation value between the reception band sequence X (k, m*) and the sound collection band sequence Z (k, m*) in the short-time frame; and
a determination step of determining the double talk state using as inputs the count value cx (k) of the received utterance counter, the count value cz (k) of the sound collection utterance counter, and the correlation value.

An apparatus program for causing a computer to function as the apparatus according to claim 1.

A computer-readable recording medium on which the program according to claim 9 is recorded.