JP2008131593A - Method of deciding double talk state, echo eraser using same and its, program and recording medium therefore - Google Patents


Info

Publication number
JP2008131593A
Authority
JP
Japan
Prior art keywords
signal
talk state
received
sound collection
double talk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2006317578A
Other languages
Japanese (ja)
Other versions
JP4542538B2 (en)
Inventor
Kenichi Noguchi
賢一 野口
Suehiro Shimauchi
末廣 島内
Kenichi Furuya
賢一 古家
Akitoshi Kataoka
章俊 片岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2006317578A
Publication of JP2008131593A
Application granted
Publication of JP4542538B2
Legal status: Active
Anticipated expiration

Landscapes

  • Circuit For Audible Band Transducer (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide an echo canceller that determines a double-talk state using only the received signal x(n) and the sound pickup signal z(n), together with the determination method, a program, and a recording medium therefor. SOLUTION: A double-talk state determination means comprises a received-speech determination and counting means, a pickup-speech determination and counting means, and a double-talk state determination unit. The received-speech determination and counting means obtains the number of consecutive frames in which a received index value, computed for each short-time frame of the received signal, exceeds a threshold xth. The pickup-speech determination and counting means obtains the number of consecutive frames in which a pickup index value, computed for each short-time frame of the sound pickup signal, exceeds a threshold zth. The double-talk state determination unit decides between the double-talk state and the received single-talk state (received speech only) by comparing the two consecutive-frame counts. COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a method for detecting a double-talk state, in which speech occurs simultaneously in both directions, in an echo canceller that suppresses the echo signal leaking from the loudspeaker into the microphone in loudspeaker communication systems such as audio conferencing and video conferencing; to an echo canceller using that method; and to a program and a recording medium therefor.

In a loudspeaker communication system, the received signal is reproduced through a loudspeaker, and the talker's speech is picked up by a microphone and transmitted to the other party. Because the loudspeaker and the microphone are acoustically coupled, the signal reproduced by the loudspeaker leaks into the microphone and is mixed into the microphone input as an echo signal. This acoustic echo degrades call quality and can also cause howling.
The device disclosed in Patent Document 1 is known as a conventional echo canceller that estimates the echo path (acoustic) coupling amount and uses the estimate. This prior art is described with reference to FIG. 13. The local side, which conducts a hands-free loudspeaker call using receiving means 2 such as a loudspeaker and transmitting means 4 such as a microphone, is called the near end, and the other party, who converses across a communication path (not shown), is called the far end. FIG. 13 shows an echo canceller 800 placed at the near end.

The echo canceller 800 consists of an echo cancellation unit 80 and a loss control unit 90. The received signal x(n), which arrives at the receiving end 8 from the far-end talker over the communication path, passes through the echo canceller 800 and is radiated into the room as loudspeaker sound by the receiving means 2. Here n denotes discrete time. The loudspeaker sound travels through the echo path 6 and is picked up by the transmitting means 4 as the echo signal y(n).
The input signal z(n) picked up by the transmitting means 4 (hereinafter, the sound pickup signal z(n)) is fed from the transmitting end 9 to the echo cancellation unit 80.

The echo cancellation unit 80 consists of, for example, an estimation means 82 that generates the adaptive filter coefficients h^(n) from the received signal x(n) and the error signal e(n) (described later), a pseudo echo path 84 that generates the pseudo echo signal y^(n) from the received signal x(n) and the adaptive filter coefficients h^(n), and a subtractor 86 that subtracts the pseudo echo signal y^(n) from the sound pickup signal z(n).
Depending on the call state, the sound pickup signal z(n) consists of the echo signal y(n) and the transmission signal s(n). When there is no received signal x(n), however, there is no echo signal y(n) either. When only the received signal x(n) is present, the sound pickup signal is z(n) = y(n). In this case, if the pseudo echo path 84 correctly models the impulse response of the echo path 6, the pseudo echo signal y^(n) approaches the echo signal y(n), and the error signal e(n) output by the subtractor 86 becomes small.

The estimation means 82 generates the adaptive filter coefficients h^(n) with an adaptive algorithm so that the error signal e(n) becomes small. Well-known algorithms include the least-mean-squares (LMS) method and the normalized LMS (NLMS) method, also called the learning identification method.
The error signal e(n) output by the subtractor 86 is fed to the loss control unit 90.
The loss control unit 90 consists of: a double-talk state determination means 92 that detects the double-talk state, in which the receiving side and the transmitting side speak at the same time; a loss device 98 inserted in the path between the receiving end 8 and the echo cancellation unit 80; a loss device 99 inserted between the echo cancellation unit 80 and the output end 7, which outputs the sound pickup signal z(n) to the communication path toward the other party; a loss amount determination means 94 that calculates the loss amount from the received signal x(n) and the sound pickup signal z(n) when the state is not double talk; and a loss amount control means 96 that inserts loss into the loss devices 98 and 99 according to the call state determined from the received signal x(n) and the error signal e(n).
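The adaptive loop formed by the estimation means 82, the pseudo echo path 84, and the subtractor 86 can be sketched as follows. This is a minimal illustration assuming an NLMS update; the function name, the tap count, the step size mu, and the regularizer eps are hypothetical choices that the document does not specify.

```python
import numpy as np

def nlms_echo_cancel(x, z, taps=64, mu=0.5, eps=1e-8):
    """Return (e, h_hat): error signal e(n) = z(n) - y^(n) and final coefficients.

    taps, mu and eps are illustrative assumptions, not values from the patent.
    """
    h_hat = np.zeros(taps)   # adaptive filter coefficients h^(n)
    x_buf = np.zeros(taps)   # recent received samples, newest first
    e = np.zeros(len(z))
    for n in range(len(z)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y_hat = h_hat @ x_buf            # pseudo echo y^(n)
        e[n] = z[n] - y_hat              # subtractor output
        # Normalized LMS update: step scaled by the input energy.
        h_hat += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return e, h_hat
```

With received-only input and a fixed echo path, e(n) shrinks as h^(n) converges; after the echo path changes, e(n) stays large until the filter has adapted again.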

The double-talk state determination means 92 receives the received signal x(n), the sound pickup signal z(n), and the error signal e(n). A short-time power value is calculated for each: Px(k) for the received signal x(n) by the calculation unit 921, Pz(k) for the sound pickup signal z(n) by the calculation unit 922, and Pe(k) for the error signal e(n) by the calculation unit 923. Here k is the index of the short-time interval.
The short-time power Px(k) of the received signal is fed to the received-speech interval determination unit 925. When Px(k) is greater than the threshold xth, the unit 925 determines that the received signal x(n) contains speech.

The short-time power Pz(k) of the sound pickup signal is fed to the threshold setting unit 927. The threshold setting unit 927 multiplies Pz(k) by a constant Th, which is set to 1 or less, and uses the product Th×Pz(k) as the threshold.
The decision of the received-speech interval determination unit 925, the threshold, and the short-time power Pe(k) of the error signal are fed to the double-talk state determination unit 929. When Px(k) exceeds the threshold xth and Pe(k) is smaller than the threshold (Px(k) > xth and Pe(k) < Th×Pz(k)), the unit 929 determines that the state is not double talk. When Px(k) exceeds xth and Pe(k) is larger than the threshold (Px(k) > xth and Pe(k) > Th×Pz(k)), it determines that the state is double talk or that the echo path 6 is changing.
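The conventional decision rule above can be written as a small function. This is only a sketch: the function name and the returned labels are hypothetical, and because the text does not say what happens when Pe(k) equals Th×Pz(k), the sketch assigns that case to the second branch as an assumption.

```python
def conventional_state(px, pz, pe, xth, th):
    """Classify one short-time interval from Px(k), Pz(k), Pe(k).

    The labels are illustrative; the equality case Pe == Th*Pz is an assumption.
    """
    if px <= xth:
        return "no_received_speech"           # rule applies only when Px(k) > xth
    if pe < th * pz:
        return "not_double_talk"              # echo is being cancelled well
    return "double_talk_or_path_change"       # residual too large relative to Pz(k)
```

Because the rule depends on Pe(k), a poorly adapted filter alone can push it into the second branch, which is the weakness the invention addresses.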

The decision of the double-talk state determination means 92, the short-time power Px(k) of the received signal, and the short-time power Pz(k) of the sound pickup signal are fed to the loss amount determination means 94.
When the state is not double talk, the loss amount determination means 94 obtains, for example, the echo path coupling amount Pz(k)/Px(k) from Pz(k) and Px(k), and sets the loss amount to its reciprocal. The loss amount is fed to the loss amount control means 96.
The loss amount control means 96 determines the transmit/receive state from the received signal x(n) and the error signal e(n). When it determines that only the received signal x(n) is present, it inserts loss into the loss device 99 on the transmitting side. When it determines that only the transmission signal s(n) is present, it inserts loss into the loss device 98 on the receiving side. In the double-talk state, no loss is inserted on either side.
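The routing of loss to the two loss devices can be sketched as a simple mapping. The state labels, the function name, and the use of a decibel value for the loss are assumptions made for illustration; the text only states which device receives loss in each state.

```python
def insert_loss(state, loss_db):
    """Return (loss at device 98, loss at device 99) in dB for a call state.

    State names and dB units are hypothetical; only the routing follows the text.
    """
    if state == "received_only":
        return (0.0, loss_db)    # attenuate the transmitting side (device 99)
    if state == "transmit_only":
        return (loss_db, 0.0)    # attenuate the receiving side (device 98)
    if state == "double_talk":
        return (0.0, 0.0)        # no loss on either side
    raise ValueError(f"unknown state: {state}")
```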

Operating in this way prevents the gain of the communication loop that runs through the far end and the near end from exceeding 1 and causing howling.
Japanese Patent No. 3268572, FIG. 5

In the conventional method, however, the device detects when the state is not double talk (that is, when it is single talk) and determines the echo coupling amount only then. If the adaptive filter coefficients h^(n) generated by the estimation means 82 contain a large error, the double-talk decision can be wrong, and a wrong decision degrades call quality.
If a single-talk state is misjudged as double talk, no loss is inserted into the loss devices 98 and 99, so the error signal e(n) is sent to the far end as it is. If a double-talk state is misjudged as received single talk, loss is inserted into the loss device 99 on the transmitting side, so the transmitted speech is interrupted.

As noted above, this misjudgment can occur when the adaptive filter coefficients h^(n) produced by the estimation means 82 contain a large error. Such an error arises when the impulse response of the echo path 6 changes, for example because a talker moves or the microphone is repositioned. Once the adaptive filter coefficients h^(n) have been learned sufficiently, the pseudo echo signal y^(n) generated by the pseudo echo path 84 equals the echo signal y(n) of the echo path 6. When the impulse response of the echo path 6 changes, however, the echo signal y(n) changes with it, and a large error can appear between y(n) and the pseudo echo signal y^(n). The estimation means 82 needs, for example, about 2 seconds to learn the new adaptive filter coefficients h^(n), and the double-talk state can be misjudged during that time.

To avoid this misjudgment, the decision should be made without the error signal e(n), which is obtained by subtracting the pseudo echo y^(n) computed with the adaptive filter coefficients h^(n). In other words, it suffices to determine the double-talk state using only the received signal x(n) and the sound pickup signal z(n).
The present invention was made in view of this problem. Its object is to provide an echo canceller that can determine the double-talk state using only the received signal x(n) and the sound pickup signal z(n), together with the determination method, a program, and a recording medium therefor.

The echo canceller according to the present invention includes a double-talk state determination means that determines whether the state is the double-talk state, a simultaneous call in which signals are input from the receiving end and the transmitting end at the same time, or the received single-talk state, in which only received speech is present.
The discretized received signal x(n) and sound pickup signal z(n), where n is a natural number, are input from the receiving end and the transmitting end.
The double-talk state determination means consists of a received-speech determination and counting means, a pickup-speech determination and counting means, and a double-talk state determination unit.
The received-speech determination and counting means compares the received index value of each short-time frame of the received signal with a threshold xth and obtains the number of consecutive frames that contain speech.
The pickup-speech determination and counting means compares the pickup index value of each short-time frame of the sound pickup signal with a threshold zth and obtains the number of consecutive frames that contain speech.
The double-talk state determination unit compares the two consecutive-frame counts obtained by the received-speech determination and counting means and the pickup-speech determination and counting means, and thereby determines whether the state is the double-talk state or the received single-talk state.

With the echo canceller of the present invention, the call state can be determined for every short-time frame, and the double-talk state can be determined correctly using only the received signal x(n) and the sound pickup signal z(n). As a result, even when the estimate of the pseudo echo path contains a large error, interruptions of the transmission signal are eliminated and transmission of the echo signal is suppressed.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings carry the same reference numerals, and their description is not repeated.

FIG. 1 shows an example of the functional configuration of an echo canceller 100 that includes the double-talk state determination means 20 of Embodiment 1 of the present invention, and FIG. 2 shows an example of the functional configuration of the double-talk state determination means 20 in FIG. 1. As an example, the echo canceller 100 of Embodiment 1 differs from the conventional echo canceller 800 shown earlier in FIG. 11 only in the double-talk state determination means 20; all other parts operate in the same way. FIG. 1 likewise shows the echo canceller 100 at the local near end. Only the operation of the double-talk state determination means 20 is therefore described here. FIG. 3 shows its main processing flow.

The double-talk state determination means 20 receives the received signal x(n) and the sound pickup signal z(n), which is the output of the transmitting means 4. The index n is a natural number indicating that the signal is discretized, sampled at 16 kHz for example.
The double-talk state determination means 20 consists of a received-speech determination and counting means 22, a pickup-speech determination and counting means 24, and a double-talk state determination unit 32.
The received-speech determination and counting means 22, which receives the received signal x(n), and the pickup-speech determination and counting means 24, which receives the sound pickup signal z(n), have the same basic configuration. The description therefore focuses on the operation of the received-speech determination and counting means 22, and the differences are noted where they arise.

The received-speech determination and counting means 22, which receives the received signal x(n), consists of a frame division unit 26 that divides x(n) into short-time frames (described below), a received index value calculation unit 28 that calculates a received index value corresponding to the energy or the absolute amplitude of each short-time frame, a received-speech determination unit 29 that identifies frames whose received index value is greater than the threshold xth, and a received-speech counter 30 that counts the consecutive frames whose received index value is greater than xth.
The received signal x(n), sampled at 16 kHz for example, is divided into short-time frames by the frame division unit 26. The division is performed by, for example, an x(n) counter 261 that increments its count by 1 every 160 samples of x(n). The count value k of the x(n) counter 261 thus divides the received signal x(n) into short-time frames. In this example one frame is 10 ms long.
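The frame division above (160 samples per frame at 16 kHz, hence 10 ms) can be sketched with a reshape. The function name is hypothetical, and dropping a trailing partial frame is an assumption; the document does not describe how an incomplete final frame is handled.

```python
import numpy as np

def split_frames(x, frame_len=160):
    """Split x(n) into consecutive short-time frames of frame_len samples.

    A trailing partial frame is discarded (an assumption for this sketch).
    """
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)
```

Row k of the result holds the samples of short-time frame k, so one second of signal at 16 kHz yields 100 frames.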

The frame division unit 26 outputs the count value k of the x(n) counter 261 and the received signal x(n) to the received index value calculation unit 28 (FIG. 3, step S10).
In the received index value calculation unit 28, the power calculation unit 281 accumulates the power of the received signal x(n) within one short-time frame, and the averaging unit 283 averages it to obtain the mean power Pxm(k) of that frame (step S12). Pxm(k) is fed to the received-speech determination unit 29, where the comparison unit 291 compares it with an experimentally determined threshold xth held in the register 292. When the mean power is greater than or equal to xth (Yes in step S14), that is, when speech is present, the comparison unit 291 outputs the received-speech flag fx(k) = 1 to the received-speech counter 30. When the mean power is smaller than xth, that is, when no speech is present, it outputs fx(k) = 0. Alternatively, speech may be judged present only when Pxm(k) is strictly greater than xth (that is, the comparison in step S14 may use > instead of ≥).
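Steps S12 and S14 can be sketched as a per-frame mean power followed by a threshold test. The function name and the array layout (one row per short-time frame) are assumptions for illustration; the threshold is passed in because the document only says it is determined experimentally.

```python
import numpy as np

def speech_flags(frames, threshold):
    """Return (flags, mean_power) per frame: flag is 1 when mean power >= threshold.

    frames: 2-D array, one short-time frame per row (layout is an assumption).
    """
    frames = np.asarray(frames, dtype=float)
    mean_power = np.mean(frames ** 2, axis=1)       # Pxm(k)
    flags = (mean_power >= threshold).astype(int)   # fx(k)
    return flags, mean_power
```

The same function would serve the sound pickup side by passing frames of z(n) and the threshold zth.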

The power within a short-time frame may also be used as an accumulated value, without averaging in the averaging unit 283. Alternatively, the accumulated absolute amplitude of the received signal x(n) may be obtained by the accumulated absolute amplitude calculation unit 282 and averaged by the averaging unit 283, or the accumulated absolute amplitude may be used without averaging. The number of zero crossings of the amplitude, conventionally used for voice activity detection, may also be used to identify speech intervals, since the zero-crossing count differs between frames with and without speech. Further, in a method that analyzes the frequency content of the received signal x(n) and the sound pickup signal z(n), as in Embodiment 2 described later, speech intervals may be identified from the difference between the spectrum of a noise interval and the spectrum of an interval containing speech. Any conventional speech detection method may be used. Various quantities can therefore serve as the received index value calculated by the received index value calculation unit 28, and the value of the threshold xth stored in the register 292 of the received-speech determination unit 29 depends on which quantity is chosen.
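Two of the alternative index values mentioned above, the mean absolute amplitude and the zero-crossing count, might be computed per frame as follows. The function names are hypothetical, and treating a sample of exactly zero as positive is an assumption of this sketch.

```python
import numpy as np

def mean_abs_amplitude(frame):
    """Mean absolute amplitude of one short-time frame."""
    return float(np.mean(np.abs(np.asarray(frame, dtype=float))))

def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame.

    Zero-valued samples count as positive (an assumption for this sketch).
    """
    negative = np.signbit(np.asarray(frame, dtype=float))
    return int(np.count_nonzero(negative[1:] != negative[:-1]))
```

Each index needs its own experimentally chosen threshold, as the text notes.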

The received-speech counter 30, which receives the received-speech flag fx(k), counts the number of short-time frames for which the flag stays set consecutively (step S16).
The initial value of the received-speech counter 30 is 0 (cx(0) = 0). When the received-speech flag indicates speech in frame k, the counter is set to the count of the previous frame plus 1 (step S16, cx(k) = cx(k−1) + 1).
Conversely, when the received-speech flag indicates no speech in frame k (fx(k) = 0), the counter is reset to 0 (step S18, cx(k) = 0). The received-speech counter 30 thus counts the number of consecutive speech frames on the received-signal side.
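The counter update in steps S16 and S18 amounts to a run-length count over the flag sequence. A minimal sketch, with a hypothetical function name:

```python
def run_length_counter(flags):
    """Return c(k) for each frame: c(k) = c(k-1) + 1 if flag is set, else 0."""
    counts = []
    c = 0                      # initial value, c(0) = 0
    for f in flags:
        c = c + 1 if f else 0  # step S16 (increment) or step S18 (reset)
        counts.append(c)
    return counts
```

For the flag sequence 0, 1, 1, 1, 0, 1 this gives the counts 0, 1, 2, 3, 0, 1.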

The pickup-speech determination and counting means 24, which receives the sound pickup signal z(n), operates in exactly the same way as the received-speech determination and counting means 22 described above, except that the experimentally determined threshold zth has a different value. Components identical to those of the received-speech determination and counting means 22 are therefore shown in FIG. 2 with the same reference numerals followed by a prime ('), and the description of the pickup-speech determination and counting means 24 is omitted (the same convention is used hereafter).
The pickup-speech counter 30' thus counts the number of consecutive speech frames on the sound pickup side. The count cx(k) of the received-speech counter 30 and the count cz(k) of the pickup-speech counter 30' are fed to the double-talk state determination unit 32.

When the count cx(k) of the received-speech counter 30 is greater than 0 (Yes in step S20, cx(k) > 0) and the count cz(k) of the pickup-speech counter 30' is greater than 0 (Yes in step S22, cz(k) > 0), that is, when both the received signal x(n) and the sound pickup signal z(n) contain speech, the first state determination unit 320 of the double-talk state determination unit 32 determines whether the state is double talk or received single talk.
In general, in the received single-talk state, speech in the sound pickup signal (in this case only the echo signal y(n)) begins immediately after speech in the received signal begins. Because the echo signal y(n) lags the received signal x(n), the count cz(k) of the pickup-speech counter 30' never becomes greater than 0 before the count cx(k) of the received-speech counter 30 does. In the received single-talk state, therefore, cz(k) never exceeds cx(k).

In view of this, the double talk state is determined as follows. When the value obtained by subtracting the count value cz(k) of the sound collection utterance counter 30' from the count value cx(k) of the received utterance counter 30 is negative (Yes in step S24, cx(k) − cz(k) < 0), the state is determined to be the double talk state, and the double talk state determination flag fd(k) is set to fd(k) = 1 (step S26) and output to the loss amount determining means 94.
Conversely, when the value obtained by subtracting cz(k) from cx(k) is 0 or more (No in step S24, cx(k) − cz(k) ≥ 0), the state is determined to be the received single talk state, and the flag is set to fd(k) = 0 (step S28) and output to the loss amount determining means 94.
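The decision rule of steps S20 to S28 can be sketched as follows. This is only an illustrative sketch: the function name and the string labels are hypothetical, and the "other" branch stands in for the parts of the flow that this passage does not describe.

```python
def classify_frame(cx: int, cz: int) -> str:
    """Classify one frame from the two consecutive-utterance counters.

    cx: count value cx(k) of the received utterance counter
    cz: count value cz(k) of the sound collection utterance counter
    The returned labels are illustrative; the source only defines fd(k).
    """
    if cx > 0 and cz > 0:              # steps S20 and S22: both signals active
        if cx - cz < 0:                # step S24: microphone side started first
            return "double_talk"       # fd(k) = 1 (step S26)
        return "receive_single_talk"   # fd(k) = 0 (step S28)
    return "other"                     # handled elsewhere in the flow (assumption)
```

For instance, `classify_frame(3, 5)` gives `"double_talk"` because the sound collection side has been active longer than the received side.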

FIG. 4 shows a time chart of the operation of the double talk state determination unit 32 described above. The horizontal axis of FIG. 4 is time, and each tick represents one short-time frame; in this example a short-time frame is 10 ms. The vertical direction shows the 1/0 states of the received signal utterance flag fx(k) and the sound collection signal utterance flag fz(k), with the count values of the received utterance counter 30 and the sound collection utterance counter 30' written as numbers inside those states. Below the count values is the value obtained by subtracting the count value cz(k) from the count value cx(k).

Suppose that a received signal x(n) occurs at some time and its reception index value exceeds the threshold xth. The received utterance counter 30 then starts counting from that short-time frame. FIG. 4(a) shows a situation in which the received utterance state continues for 1 second, is followed by a period with no received signal x(n), and then the received signal x(n) occurs again.
If the state when the received utterance counter 30 first starts counting is the received single talk state, then, as shown in FIG. 4(a), the sound collection signal utterance flag fz(k) becomes fz(k) = 1 in the same frame or in a later frame. Therefore, the value obtained by subtracting cz(k) from cx(k) never becomes negative. The double talk state determination unit 32 determines this state to be the received single talk state.

Conversely, consider the case where the sound collection signal utterance flag fz(k) indicates an utterance first (fz(k) = 1) and the sound collection utterance counter 30' starts counting, after which the received utterance counter 30 starts counting. In this case, the value obtained by subtracting cz(k) from cx(k) is negative. This state is determined to be the double talk state.
Note that the echo canceling unit 80 is not limited to one that uses an adaptive filter, and other methods may be used. There are also configurations in which, for example, the sound collection signal z(n) is spectrum-controlled in the loss device 99, so the loss devices 98 and 99 are likewise not limited to those shown in FIG. 1.
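The two timing cases of the time chart can be traced with a small sketch. The flag sequences below are made-up examples, and resetting a counter to 0 on the first silent frame is an assumption of this sketch rather than something stated in this passage.

```python
def run_counters(flags):
    """Running count of consecutive 1-flags.

    Resetting to 0 on a 0-flag is an assumption of this sketch.
    """
    counts, c = [], 0
    for f in flags:
        c = c + 1 if f else 0
        counts.append(c)
    return counts


def differences(fx, fz):
    """cx(k) - cz(k) for frames where both counters are positive, else None."""
    return [cx - cz if cx > 0 and cz > 0 else None
            for cx, cz in zip(run_counters(fx), run_counters(fz))]


# Case A: the far end speaks first and its echo reaches the microphone one frame later.
fx_a = [0, 1, 1, 1, 1, 1]
fz_a = [0, 0, 1, 1, 1, 1]

# Case B: the microphone side is already active when the far end starts.
fx_b = [0, 0, 0, 1, 1, 1]
fz_b = [0, 1, 1, 1, 1, 1]

diff_a = differences(fx_a, fz_a)  # never negative: received single talk
diff_b = differences(fx_b, fz_b)  # negative: double talk
```

In case A the difference stays at 1 once both counters run, and in case B it stays at −2, which matches the sign-based rule described above.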

In settings such as audio conferences and video conferences, the far end and the near end generally do not speak at the same time. That is, when each party speaks only after listening to the other, the configuration of the first embodiment described above can determine the double talk state with sufficient accuracy.
However, when a discussion becomes heated, it often happens that one party starts speaking before the other has finished. A second embodiment, which can accurately determine the double talk state even in such a situation, is described next.

FIG. 4(b) shows an operation time chart of the double talk state determination unit 32 for the case where the near-end talker starts speaking before the far-end talker has finished. The horizontal and vertical directions and the horizontal scale are the same as in FIG. 4(a).
If the state when the received utterance counter 30 first starts counting is the received single talk state, the sound collection signal utterance flag fz(k) becomes fz(k) = 1 in the same frame or in a later frame. FIG. 4(b) shows a situation in which the delay in the echo path 6 is larger than in FIG. 4(a) and fz(k) becomes 1 one frame late. In this state, the value obtained by subtracting cz(k) from cx(k) is 1, and the state is determined to be the received single talk state.

This determination is made correctly with the configuration of the first embodiment. However, suppose that, after the state has once been determined to be the received single talk state, the near end (sound collection side) starts speaking at the time when cz(k) = 97. Although the state is now the double talk state, the value obtained by subtracting cz(k) from cx(k) remains 1, so the received single talk determination continues. As a result, the loss inserted in the loss device 99 on the sound collection side is maintained, and the transmission signal s(n) does not pass.
The second embodiment makes an accurate double talk determination possible even in such a situation. FIG. 5 shows a functional configuration example of the double talk state determination means 50 of the second embodiment.

In the double talk state determination means 50 of the second embodiment, the following are added to the double talk state determination means 20 of the first embodiment: a received signal frequency analysis unit 56 that frequency-analyzes the received signal x(n) for each short-time frame, a sound collection signal frequency analysis unit 56' that frequency-analyzes the sound collection signal z(n) for each short-time frame, and a correlation unit 540 that obtains the correlation between the frequency-analyzed received signal and sound collection signal of corresponding short-time frames. The units other than the added units and the double talk state determination unit 58 operate in the same way as in the first embodiment. The operation of the added units and of the double talk state determination unit 58 is described here.

The received signal x(k) divided by the frame division unit 26 and the short-time frame are input to the received signal frequency analysis unit 56. Using a well-known technique such as the short-time discrete Fourier transform, the received signal frequency analysis unit 56 converts the received signal x(n) within a short-time frame into N discrete frequency domain signals X(k, f1), ..., X(k, fn), ..., X(k, fN) corresponding to N frequencies, for example f1 to fN.
In the received signal power calculation unit 54, the received discrete power calculation unit 541 calculates the power of each discrete frequency domain received signal X(k, f*) (* = 1, 2, ..., N) and obtains the received power vector PXVk = [PX(k, f1), PX(k, f2), ..., PX(k, fN)], whose elements are those powers.

The sound collection signal frequency analysis unit 56' and the sound collection signal power calculation unit 54', to which the sound collection signal z(n) and the short-time frame are input from the frame division unit 26' on the sound collection signal z(n) side, operate in exactly the same way as on the received signal x(n) side. The sound collection discrete power calculation unit 541' constituting the sound collection signal power calculation unit 54' obtains the sound collection power vector PZVk = [PZ(k, f1), PZ(k, f2), ..., PZ(k, fN)].
The received power vector PXVk = [PX(k, f1), PX(k, f2), ..., PX(k, fN)] and the sound collection power vector PZVk = [PZ(k, f1), PZ(k, f2), ..., PZ(k, fN)] are input to the correlation calculation unit 52, and their correlation value CXZ(k) is calculated as CXZ(k) = (PXVk · PZVk) / (|PXVk| |PZVk|). Here "·" denotes the inner product of vectors. The correlation value CXZ(k) is normalized and takes a value from 0 to 1.
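The normalized inner product can be sketched as below, assuming each frame's spectrum is already available as a list of complex bins. The function names and the guard for an all-zero vector are assumptions of this sketch, not part of the source.

```python
import math


def power_vector(spectrum):
    """Per-bin power |X(k, f)|^2 of one frame's complex spectrum."""
    return [abs(x) ** 2 for x in spectrum]


def correlation(pxv, pzv):
    """Normalized inner product (PXVk . PZVk) / (|PXVk| |PZVk|)."""
    dot = sum(a * b for a, b in zip(pxv, pzv))
    norm = math.sqrt(sum(a * a for a in pxv)) * math.sqrt(sum(b * b for b in pzv))
    # Returning 0.0 for an all-zero vector is an assumption of this sketch.
    return dot / norm if norm > 0 else 0.0
```

Because power values are non-negative, the result stays between 0 and 1: it is close to 1 when the two spectra have the same shape regardless of level, and 0 when they share no active frequency bins.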

The correlation value CXZ(k) is input to the double talk state determination unit 58. In addition to CXZ(k), the count value cx(k) of the received utterance counter 30 and the count value cz(k) of the sound collection utterance counter 30' described above are input to the double talk state determination unit 58.
When cx(k) is greater than 0 and cz(k) is also greater than 0, the double talk state determination unit 58 determines that both the received signal x(n) and the sound collection signal z(n) contain an utterance (Yes in step S22 of FIG. 3). In this case, the double talk state determination unit 58 determines whether the state is the double talk state or the received single talk state. This processing is indicated by broken lines in FIG. 3.

When the smaller of the count values cx(k) and cz(k) is equal to or less than a threshold thd, the first state determination unit 320 performs the determination based on the utterance state shown in the first embodiment (Yes in step S30). The threshold thd is set to a number of frames corresponding to, for example, about 1 second. In this example the frame length is 10 ms, so thd = 100.
When the state determination selection unit 581 determines in step S30 that the smaller of cx(k) and cz(k) is greater than the threshold thd, the second state determination unit 582 of the double talk state determination unit 58 refers to the correlation value CXZ(k) to determine whether the state is the double talk state.

The correlation value CXZ(k) takes a value close to 1 when the correlation between the received power vector PXVk and the sound collection power vector PZVk of each short-time frame is high. Therefore, in the received single talk state, CXZ(k) is close to 1. In step S32, CXZ(k) is compared with a threshold thc: if CXZ(k) is greater than thc, the state is determined to be the received single talk state (step S34), and if CXZ(k) is smaller than thc, the state is determined to be the double talk state (step S36).
By converting the received signal x(n) and the sound collection signal z(n) into frequency domain signals and taking their correlation in this way, the double talk state can be determined correctly even when one party starts speaking before the other has finished. A change from the double talk state to the received single talk state can be determined in the same way.
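The selection between the counter-based rule and the correlation-based rule (steps S30 to S36) might be sketched as follows. The default thd = 100 follows the 10 ms frame example in the text, while thc = 0.5, the function name, the labels, and the handling of CXZ(k) exactly equal to thc are assumptions of this sketch.

```python
def classify_frame_v2(cx, cz, cxz, thd=100, thc=0.5):
    """Second-embodiment decision for one frame.

    cx, cz: count values cx(k), cz(k); cxz: correlation value CXZ(k).
    thc=0.5 is a hypothetical value; the source gives no number for it.
    """
    if cx <= 0 or cz <= 0:
        return "other"  # not both active; outside this passage (assumption)
    if min(cx, cz) <= thd:  # step S30 Yes: use the counter difference
        return "double_talk" if cx - cz < 0 else "receive_single_talk"
    # Step S32: both sides have been active for a long time, use the correlation.
    # Treating cxz == thc as double talk is an assumption of this sketch.
    return "receive_single_talk" if cxz > thc else "double_talk"
```

With these assumed thresholds, `classify_frame_v2(300, 150, 0.2)` gives `"double_talk"` even though cx(k) − cz(k) is positive, which is the kind of case the first embodiment alone misses.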

The description above used an example in which the correlation between the received power vector PXVk and the sound collection power vector PZVk is obtained for each identical discrete frequency within a frame, but the invention is not limited to this example. The correlation between short-time frames may instead be obtained using the amplitudes of the discrete frequency domain received signal X(k, f*). In that case, the received discrete power calculation unit 541 functions as a received discrete amplitude unit 541 that obtains the amplitude of X(k, f*).
In this case, the received power vector described above becomes the received amplitude vector XVk = [X(k, f1), X(k, f2), ..., X(k, fN)], which is input to the correlation calculation unit 52 together with the sound collection amplitude vector ZVk = [Z(k, f1), Z(k, f2), ..., Z(k, fN)]. The correlation value CXZ(k) is calculated by dividing the inner product of the two amplitude vectors by the product of their magnitudes: CXZ(k) = (XVk · ZVk) / (|XVk| |ZVk|).

It is also possible to obtain the correlation between the outlines or envelopes of the frequency domain signals.
The received band division unit 542 of the received signal power calculation unit 54 divides the discrete frequency domain signals X(k, f*) at equal intervals into groups of, for example, three (X(k, f1) to X(k, f3)), or of five to ten.
Various methods are conceivable for dividing the discrete frequency domain signals X(k, f*). For example, since the characteristics of speech generally tend to appear in the low frequency range, each band may contain a small number of discrete frequency domain signals X(k, f*) in the low frequency range and a large number in the high frequency range.

That is, the discrete frequency domain signals X(k, f*) are divided on a logarithmic frequency scale. Alternatively, dividing them according to the mel scale, which is matched to human hearing, gives a division that better reflects auditory characteristics.
The discrete frequencies f* grouped by the received band division unit 542 are denoted by aggregate bands mi (i = 1 to w) and are referred to as the received band sequence.
The discrete frequency domain signal X(k, m*) of the received band sequence produced by the received band division unit 542 is input to the received band averaging unit 543, which calculates the received band power vector PXVk = [PX(k, m1), PX(k, m2), ..., PX(k, mw)] for the bands.

Similarly, the sound collection band power vector PZVk = [PZ(k, m1), PZ(k, m2), ..., PZ(k, mw)] is calculated from the sound collection band sequence Z(k, m*).
The correlation calculation unit 52 calculates the correlation value CXZ(k) between the aggregate bands within a frame as CXZ(k) = (PXVk · PZVk) / (|PXVk| |PZVk|).
Averaging the discrete frequency domain signals over each divided band before taking the correlation between frames in this way stabilizes the double talk state determination.
An example that takes the correlation between power vectors of the discrete frequency domain signals has been described, but it is also possible to first obtain the envelope of the discrete frequency domain signal, then obtain a representative value of the envelope for each frequency band, and obtain the correlation of those representative values. Such an example is shown as the third embodiment.
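The band aggregation step can be sketched as averaging per-bin power within each band before the correlation is taken. The edge-generating helpers are hypothetical: `equal_edges` corresponds to the equal-interval division, and `doubling_edges` is just one illustrative way to make bands wider toward high frequencies, not the mel-scale or logarithmic division itself.

```python
def band_average(power, edges):
    """Average per-bin power within each band.

    edges are bin indices [e0, e1, ..., ew]; band i covers bins e_i .. e_(i+1)-1.
    """
    return [sum(power[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def equal_edges(n_bins, width):
    """Band edges for groups of `width` bins; the last band may be shorter."""
    return list(range(0, n_bins, width)) + [n_bins]


def doubling_edges(n_bins):
    """Band edges whose widths double (1, 2, 4, ...): narrow bands at low
    frequencies and wide bands at high frequencies (illustrative choice)."""
    edges, width = [0], 1
    while edges[-1] < n_bins:
        edges.append(min(edges[-1] + width, n_bins))
        width *= 2
    return edges
```

The band vectors produced this way for the received and sound collection sides would then be passed to the same normalized inner product as the per-bin vectors.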

FIG. 6 shows a functional configuration example of the double talk state determination means 60 of the third embodiment.
The double talk state determination means 60 of the third embodiment differs from the double talk state determination means 50 of the second embodiment shown in FIG. 5 only in that a received signal spectrum envelope calculation unit 62 is added and the functional configuration of the received signal band power calculation unit 64 is changed. Only these differences are described here.
The discrete frequency domain received signal X(k, f*) converted into a frequency domain signal by the received signal frequency analysis unit 56 is input to the received signal spectrum envelope calculation unit 62. The received signal spectrum envelope calculation unit 62 calculates a spectral envelope using, for example, cepstrum analysis, linear prediction cepstrum analysis, or linear prediction.

How to obtain a spectral envelope using cepstrum analysis is described in, for example, the reference "Digital Speech Processing", Sadaoki Furui, Tokai University Press, p. 45.
FIG. 7 schematically shows an example of the spectral envelope calculated by the received signal spectrum envelope calculation unit 62. The horizontal axis of FIG. 7 is frequency and the vertical axis is spectral amplitude. This spectral envelope is input to the per-frequency-band received representative value generation unit 641 constituting the received signal band power calculation unit 64.
The per-frequency-band received representative value generation unit 641 calculates the representative value indicated by ● in FIG. 7 for each of the aggregate bands mi (i = 1 to w) described above. The aggregate bands mi are referred to as the received envelope band sequence. The representative spectrum calculated by the unit 641 for each band of the received envelope band sequence is, for example, the average value within that band, and is converted to power by the received representative value power calculation unit 642. The representative value power vectors of the receiving side and the sound collection side are input to the correlation calculation unit 52. The correlation value can be obtained in the same way as described above.
The correlation value may also be obtained using the amplitude values of the representative spectrum.

For example, a spectral envelope obtained by linear prediction cepstrum analysis emphasizes the spectral peaks (p. 48 of the reference), so it can be expected to make the correlation of speech easier to obtain.
As described above, the correlation may be calculated from any of the following: the spectral amplitude values, the average value for each band, the representative amplitude value for each envelope band of the spectral envelope, or the power value for each envelope band of the spectral envelope.

FIG. 11 shows a functional configuration example of another double talk state determination unit 70, and FIG. 12 shows the main processing flow of its operation. FIG. 12 shows the flow from step S22 of FIG. 3 onward. The double talk state determination unit 70 determines the double talk state by combining the determination based on the utterance state described in the first embodiment with the determination based on the correlation value described in the second and third embodiments.
The double talk state determination unit 70 consists of a first state determination unit 72, a second state determination unit 74, and a reception state determination unit 76, and receives the correlation value CXZ(k), the count value cx(k) of the received utterance counter 30, and the count value cz(k) of the sound collection utterance counter 30'.

The double talk state determination unit 70 starts operating when cx(k) is greater than 0 and cz(k) is also greater than 0 (that is, when the result of step S22 in FIG. 3 is Yes).
The first state determination unit 72 performs the determination based on the utterance state. When the value obtained by subtracting cz(k) from cx(k) is negative, the state is determined to be the double talk state (Yes in step S40). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (Yes in step S42), the weighting unit 72a in the first state determination unit 72 sets the double talk score sd1(k) = s1 (step S44). If the smaller of cx(k) and cz(k) is greater than thd (No in step S42), the weighting unit 72a sets sd1(k) = s2 (step S46). For example, s1 = 5 and s2 = 0.

When the value obtained by subtracting cz(k) from cx(k) is 0 or more, the state is determined to be the received single talk state (No in step S40). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (Yes in step S48), the weighting unit 72a in the first state determination unit 72 sets the double talk score sd1(k) = s3 (step S50). If the smaller of cx(k) and cz(k) is greater than thd (No in step S48), the weighting unit 72a sets sd1(k) = s4 (step S52). For example, s3 = −5 and s4 = 0.

The second state determination unit 74 performs the determination based on the correlation value CXZ(k). When CXZ(k) is equal to or greater than the threshold thc, the correlation is high and the state is determined to be the received single talk state (Yes in step S54). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (Yes in step S56), the weighting unit 74a in the second state determination unit 74 sets the double talk score sd2(k) = s5 (step S58). If the smaller of cx(k) and cz(k) is greater than thd (No in step S56), the weighting unit 74a sets sd2(k) = s6 (step S60). For example, s5 = −2 and s6 = −5.

When CXZ(k) is smaller than the threshold thc, the state is determined to be the double talk state (No in step S54). In that case, if the smaller of cx(k) and cz(k) is equal to or less than the threshold thd (Yes in step S62), the weighting unit 74a in the second state determination unit 74 sets the double talk score sd2(k) = s7 (step S64). If the smaller of cx(k) and cz(k) is greater than thd (No in step S62), the weighting unit 74a sets sd2(k) = s8 (step S66). For example, s7 = 2 and s8 = 5. A plurality of thresholds may be provided for CXZ(k) so that the double talk score sd2(k) is weighted more finely.

The double talk score sd1(k) from the first state determination unit 72 and the double talk score sd2(k) from the second state determination unit 74 are input to the reception state determination unit 76. The score addition unit 76a in the reception state determination unit 76 adds sd1(k) and sd2(k) to calculate the total double talk score sda(k).
The total double talk score sda(k) is compared with a double talk score threshold ths. If sda(k) is equal to or greater than ths, the state is determined to be the double talk state (Yes in step S70), and the double talk determination flag fd(k) = 1 is output to the loss amount determining means 94 (step S72). If sda(k) is smaller than ths, the state is determined to be the received single talk state, and fd(k) = 0 is output to the loss amount determining means 94 (step S74).
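The score-based combination (steps S40 to S74) can be sketched as below. The scores s1 to s8 and thd = 100 are the example values given in the text. The values thc = 0.5 and ths = 1 and the function name are hypothetical, since the source gives no numbers for those two thresholds. The sketch assumes cx(k) > 0 and cz(k) > 0, which is the condition under which the unit starts operating.

```python
def double_talk_flag(cx, cz, cxz, thd=100, thc=0.5, ths=1,
                     scores=(5, 0, -5, 0, -2, -5, 2, 5)):
    """Return fd(k): 1 for double talk, 0 for received single talk.

    Assumes cx > 0 and cz > 0. thc and ths are hypothetical values.
    """
    s1, s2, s3, s4, s5, s6, s7, s8 = scores
    short = min(cx, cz) <= thd            # steps S42/S48/S56/S62

    # First state determination unit: counter difference (step S40).
    if cx - cz < 0:
        sd1 = s1 if short else s2
    else:
        sd1 = s3 if short else s4

    # Second state determination unit: correlation value (step S54).
    if cxz >= thc:
        sd2 = s5 if short else s6
    else:
        sd2 = s7 if short else s8

    sda = sd1 + sd2                       # total double talk score
    return 1 if sda >= ths else 0         # step S70
```

With the example scores, the counter difference dominates while either counter is still short, and the correlation dominates once both sides have been active longer than thd frames.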

As described above, the double talk state may be determined by combining the determination based on the count values in the first state determination unit 72 with the determination based on the correlation value in the second state determination unit 74.
Simulation results
FIG. 8 shows the result of simulating the double talk state determination by the double talk state determination means of the first embodiment.
Simulation conditions:
Distance between microphone and speaker: 20 cm (fixed at this distance in condition 1; in condition 2 the microphone was moved about 1 m away from the speaker and kept moving)
Sampling rate: 16 kHz
Frame: 20 ms
Condition 1 is a state in which the positions of the microphone and the speaker are fixed as described above and the echo canceling unit 80 has sufficiently learned the impulse response of the echo path 6. For condition 1, the received signal x(n) is shown in FIG. 8(a), the sound collection signal z(n) in FIG. 8(b), and the error signal e(n) in FIG. 8(c).

In FIG. 8, the horizontal axis is time (seconds) and the vertical axis is normalized amplitude.
Because the echo canceling unit 80 sufficiently cancels the echo signal y(n), the error signal e(n) is small.
Starting from the state of Condition 1, the microphone position was then moved continuously and at random (Condition 2). Under Condition 2, the received signal x(n) is shown in FIG. 9(a), the collected sound signal z(n) in FIG. 9(b), and the error signal e(n) in FIG. 9(c). The axes are the same as in FIG. 8.
Under Condition 2, the adaptive filter coefficients h^(n) generated by the estimation means 82 of the echo canceling unit 80 using the adaptive algorithm cannot follow the changes in the echo path 6, so the error signal e(n) is large.
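The difference between the two conditions can be illustrated with a small simulation. The sketch below assumes an NLMS adaptive filter, a 4-tap echo path, white-noise input, and a single abrupt path change; the source only says "adaptive algorithm" and describes continuous microphone movement, so every one of these choices, and all names, are assumptions for illustration.

```python
import random


def nlms_error(x, z, taps, mu=0.5, eps=1e-8):
    """Run an NLMS adaptive filter and return the error signal e(n) = z(n) - y^(n)."""
    h = [0.0] * taps    # adaptive filter coefficients
    buf = [0.0] * taps  # most recent received samples, newest first
    errors = []
    for xn, zn in zip(x, z):
        buf = [xn] + buf[:-1]
        y_hat = sum(hi * bi for hi, bi in zip(h, buf))  # echo replica
        e = zn - y_hat
        norm = sum(b * b for b in buf) + eps            # input power, regularized
        h = [hi + mu * e * bi / norm for hi, bi in zip(h, buf)]
        errors.append(e)
    return errors


def echo(x, path_before, path_after, change_at):
    """Collected signal containing only echo; the echo path switches at change_at."""
    taps = len(path_before)
    z = []
    for n in range(len(x)):
        path = path_before if n < change_at else path_after
        z.append(sum(path[i] * x[n - i] for i in range(taps) if n - i >= 0))
    return z


def mean_power(segment):
    return sum(v * v for v in segment) / len(segment)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2100)]
z = echo(x, [0.5, -0.3, 0.2, 0.1], [-0.4, 0.25, 0.3, -0.2], change_at=2000)
e = nlms_error(x, z, taps=4)
print("error power before path change:", mean_power(e[1900:2000]))
print("error power after path change: ", mean_power(e[2000:2020]))
```

With a fixed path the filter converges and the error power is very small; right after the path changes the error power rises even though no near-end speech is present, which is the situation in which a determination based on e(n) can be misled.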

FIG. 10 shows the determination results of the conventional method and of the first embodiment under Condition 2. FIG. 10(a) shows the simulation conditions. FIG. 10(b) shows the determination result of the conventional method, which uses the error signal e(n) for the determination. FIG. 10(c) shows the determination result of the first embodiment.
The horizontal axis of FIG. 10 is the frame number; since one frame is 20 ms in this example, the axis spans 0 to 10 seconds. The vertical axis is the determination result: 0 denotes a silent interval, 1 a received single talk state, and 2 a double talk state.
As FIG. 10(a) shows, the simulation contains no double talk state; nevertheless, the conventional method produces erroneous determinations, mainly around 1, 2, 3, and 4 seconds.
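The relation between frame number and time follows directly from the stated simulation conditions (16 kHz sampling, 20 ms frames). The helper names below are hypothetical and are only meant to make the arithmetic explicit.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate from the simulation conditions
FRAME_MS = 20            # frame length from the simulation conditions


def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """Number of samples in one short-time frame."""
    return sample_rate_hz * frame_ms // 1000


def frame_to_seconds(frame_number: int, frame_ms: int = FRAME_MS) -> float:
    """Start time in seconds of the given frame number."""
    return frame_number * frame_ms / 1000


print(samples_per_frame())    # 320
print(frame_to_seconds(50))   # 1.0
print(frame_to_seconds(500))  # 10.0
```

Under these conditions, frame numbers 50, 100, 150, and 200 correspond to the 1, 2, 3, and 4 second marks mentioned above.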

In contrast to the conventional method, the first embodiment yields determination results that match the simulation conditions.
Thus, the double talk state determination device using the double talk state determination means of the first embodiment determines the double talk state using only the received signal x(n) and the collected sound signal z(n), and can therefore make the determination accurately.
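A determination that uses only x(n) and z(n) can be sketched from the counters described in the claims of this document: a per-frame index value is compared with a threshold, the number of consecutive frames above the threshold is counted as cx(k) and cz(k), and double talk is determined when cz(k) is greater than cx(k). The choice of mean absolute amplitude as the index value, the toy signals, the thresholds, and the function names are assumptions; the sketch also ignores the silent-interval state.

```python
def frame_index_values(signal, frame_len):
    """Mean absolute amplitude of each complete short-time frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(abs(s) for s in signal[k * frame_len:(k + 1) * frame_len]) / frame_len
        for k in range(n_frames)
    ]


def consecutive_counts(values, threshold):
    """Count, per frame, how many consecutive frames have exceeded the threshold."""
    counts, run = [], 0
    for v in values:
        run = run + 1 if v > threshold else 0  # reset when the frame has no speech
        counts.append(run)
    return counts


def judge(x, z, frame_len, xth, zth):
    """Per-frame result: 'double' if cz(k) > cx(k), otherwise 'single'."""
    cx = consecutive_counts(frame_index_values(x, frame_len), xth)
    cz = consecutive_counts(frame_index_values(z, frame_len), zth)
    return ["double" if c_z > c_x else "single" for c_x, c_z in zip(cx, cz)]


# Toy signals with 2-sample frames: the collected sound becomes active before the received signal.
x = [0, 0, 1, 1, 1, 1, 1, 1]
z = [1, 1, 1, 1, 1, 1, 0, 0]
print(judge(x, z, frame_len=2, xth=0.5, zth=0.5))
# ['double', 'double', 'double', 'single']
```

When the collected sound only becomes active after the received signal, as for an echo, cz(k) stays at or below cx(k) and every frame is judged as received single talk.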
In addition to the embodiments above, the devices and methods of the present invention are not limited to the embodiments described and may be modified as appropriate without departing from the spirit of the invention. Furthermore, the processes described for the above devices and methods need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device executing them or as needed.

When the processing functions of the above devices are implemented by a computer, the processing content of the functions that the echo canceling device should have is described by a program. Executing this program on the computer implements the processing functions of the echo canceling device on the computer.
The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.
A computer that executes such a program first, for example, temporarily stores the program recorded on the portable recording medium, or the program transferred from the server computer, in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or the computer may execute processing according to the received program each time a program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and result acquisition. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, each device is configured by executing a predetermined program on a computer; however, at least part of this processing content may be implemented in hardware.

A diagram showing an example functional configuration of the echo canceling device 100 including the double talk state determination means 20 of the first embodiment of the present invention.
A diagram showing an example functional configuration of the double talk state determination means 20 of the first embodiment of the present invention.
A diagram showing the main processing flow of the double talk state determination means 20.
A diagram showing an example time chart of the operation of the double talk state determination unit 32; FIG. 4(a) shows the received single talk state and the double talk state, and FIG. 4(b) shows a state in which the double talk state is erroneously determined.
A diagram showing an example functional configuration of the double talk state determination means 50 of the second embodiment of the present invention.
A diagram showing an example functional configuration of the double talk state determination means 60 of the third embodiment of the present invention.
A diagram schematically showing an example of the spectral envelope calculated by the received signal spectral envelope calculation unit 62.
A diagram showing the result for Condition 1 of simulating the double talk state determination by the double talk state determination means of the first embodiment; FIG. 8(a) shows the received signal x(n), FIG. 8(b) the collected sound signal z(n), and FIG. 8(c) the error signal e(n).
A diagram showing the result for Condition 2 of simulating the double talk state determination by the double talk state determination means of the first embodiment; FIG. 9(a) shows the received signal x(n), FIG. 9(b) the collected sound signal z(n), and FIG. 9(c) the error signal e(n).
A diagram showing the determination results for the double talk state; FIG. 10(a) shows the correct determination result, FIG. 10(b) the determination result of the conventional method using the error signal e(n), and FIG. 10(c) the determination result of the double talk state determination means of the first embodiment.
A diagram showing an example functional configuration of the double talk state determination unit 70.
A diagram showing the main processing flow of the double talk state determination unit 70.
A diagram showing a conventional echo canceling device 800 arranged at the near end.

Claims (10)

An echo canceling device comprising double talk state determination means for determining whether the state is a double talk state, which is simultaneous talk in which call signals are input simultaneously from a receiving end and a transmitting end, or a received single talk state, in which no call signal is input to the transmitting end and a call signal is input only to the receiving end, the echo canceling device comprising, for a discretized received signal x(n) and collected sound signal z(n) input from the receiving end and the transmitting end, where n is a natural number:
received utterance determination counting means for comparing a received index value of each short-time frame of the received signal x(n) with a threshold xth and counting the number of consecutive short-time frames in which speech is present;
sound collection utterance determination counting means for comparing a sound collection index value of each short-time frame of the collected sound signal z(n) with a threshold zth and counting the number of consecutive short-time frames in which speech is present; and
a double talk state determination unit for determining whether the state is the double talk state or the received single talk state by comparing the magnitudes of the count values obtained by the received utterance determination counting means and the sound collection utterance determination counting means.
The echo canceling device according to claim 1, wherein
the received utterance determination counting means comprises:
a received index value calculation unit that obtains, as the received index value, the sum of the power or of the absolute values of the amplitude of the received signal x(n) within the short-time frame, or the average thereof;
a received utterance determination unit that outputs a received utterance flag if the received index value is greater than, or greater than or equal to, the threshold xth; and
a received utterance counter that counts the number of consecutive frames cx(k) for which the received utterance determination unit outputs the received utterance flag;
the sound collection utterance determination counting means comprises:
a sound collection index value calculation unit that obtains, as the sound collection index value, the sum of the power or of the absolute values of the amplitude of the collected sound signal z(n) within the short-time frame, or the average thereof;
a sound collection utterance determination unit that outputs a sound collection utterance flag if the sound collection index value is greater than, or greater than or equal to, the threshold zth; and
a sound collection utterance counter that counts the number of consecutive frames cz(k) for which the sound collection utterance determination unit outputs the sound collection utterance flag; and
the double talk state determination unit receives the counted cx(k) and the counted cz(k), determines the double talk state if cz(k) is greater than cx(k), and determines the received single talk state if cz(k) is less than or equal to cx(k).
The echo canceling device according to claim 1, further comprising:
a received signal frequency analysis unit that converts the received signal x(n) into a discrete frequency domain signal for each short-time frame;
a collected sound signal frequency analysis unit that converts the collected sound signal z(n) into a discrete frequency domain signal for each short-time frame; and
a correlation unit that receives the discrete frequency domain received signal and the discrete frequency domain collected sound signal and calculates a correlation value between the two signals for each short-time frame,
wherein the double talk state determination unit determines the double talk state using the counted cx(k), the counted cz(k), and the correlation value as inputs.
The echo canceling device according to claim 3, wherein
the correlation unit comprises a received signal power calculation unit, a collected sound signal power calculation unit, and a correlation calculation unit;
the received signal power calculation unit includes a received discrete power calculation unit that calculates the power of each frequency component of the discrete frequency domain received signal, a received band division unit that divides the power of the discrete frequency domain received signal into a plurality of frequency bands, and a received band averaging unit that averages the power of the discrete frequency domain received signal in each band produced by the received band division unit to obtain a received band average power;
the collected sound signal power calculation unit includes a sound collection discrete power calculation unit that calculates the power of each frequency component of the discrete frequency domain collected sound signal, a sound collection band division unit that divides the power of the discrete frequency domain collected sound signal into a plurality of frequency bands, and a sound collection band averaging unit that averages the power of the discrete frequency domain collected sound signal in each band produced by the sound collection band division unit to obtain a sound collection band average power;
the correlation calculation unit calculates, for each short-time frame, the correlation between the received band average power and the sound collection band average power and uses it as the correlation value; and
the double talk state determination unit receives the correlation value and the count values cx(k) and cz(k), and switches the method of determining the double talk state depending on whether the smaller of cx(k) and cz(k) is greater than a threshold thd.
A double talk state determination method for determining whether the state is a double talk state, which is simultaneous talk in which signals are input simultaneously from a receiving end and a transmitting end, or a received single talk state, in which no call signal is input to the transmitting end and a call signal is input only to the receiving end, the method comprising, for a discretized received signal x(n) and collected sound signal z(n) input from the receiving end and the transmitting end:
a step of counting states in which a received index value of each short-time frame of the received signal x(n) is greater than a threshold xth;
a step of counting states in which a sound collection index value of each short-time frame of the collected sound signal z(n) is greater than a threshold zth; and
a step of determining whether the state is the double talk state or the received single talk state by comparing the magnitudes of the count values obtained by the received utterance determination counting means and the sound collection utterance determination counting means.
The double talk state determination method according to claim 5, wherein the step of determining whether the state is the double talk state or the received single talk state includes:
a step of obtaining the number of consecutive frames cx(k) in which a received index corresponding to the energy or the absolute value of the amplitude of the received signal x(n) in each short-time frame is greater than the threshold xth;
a step of obtaining the number of consecutive frames cz(k) in which a sound collection index corresponding to the energy or the absolute value of the amplitude of the collected sound signal z(n) in each short-time frame is greater than the threshold zth; and
a step of receiving the obtained cx(k) and cz(k) and determining the double talk state if cz(k) is greater than cx(k).
The double talk state determination method according to claim 6, including:
a step of converting the time domain received signal x(k) and collected sound signal z(k), divided into the short-time frames, into a discrete frequency domain received signal X(k, f*) (* = 1, 2, ..., N) and a discrete frequency domain collected sound signal Z(k, f*), respectively;
a step of obtaining a correlation value between the discrete frequency domain received signal X(k, f*) and the discrete frequency domain collected sound signal Z(k, f*) in the short-time frame; and
a determination step of determining the double talk state using the count value cx(k) of the received utterance counter, the count value cz(k) of the sound collection utterance counter, and the correlation value as inputs.
The double talk state determination method according to claim 6, including:
a step of converting the time domain received signal x(k) and collected sound signal z(k), divided into the short-time frames, into a discrete frequency domain received signal X(k, f*) (* = 1, 2, ..., N) and a discrete frequency domain collected sound signal Z(k, f*), respectively;
a step of dividing the discrete frequency domain received signal X(k, f*) into a plurality of frequency bands to generate a received band sequence;
a step of dividing the discrete frequency domain collected sound signal Z(k, f*) into a plurality of frequency bands to generate a sound collection band sequence;
a step of obtaining a correlation value between the received band sequence X(k, m*) and the sound collection band sequence Z(k, m*) in the short-time frame; and
a determination step of determining the double talk state using the count value cx(k) of the received utterance counter, the count value cz(k) of the sound collection utterance counter, and the correlation value as inputs.
A device program for causing a computer to function as the device according to any one of claims 1 to 4.
A computer-readable recording medium on which the program according to claim 9 is recorded.
JP2006317578A 2006-11-24 2006-11-24 Double talk state determination method, echo canceling apparatus using the method, program thereof, and recording medium thereof Active JP4542538B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006317578A JP4542538B2 (en) 2006-11-24 2006-11-24 Double talk state determination method, echo canceling apparatus using the method, program thereof, and recording medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2006317578A JP4542538B2 (en) 2006-11-24 2006-11-24 Double talk state determination method, echo canceling apparatus using the method, program thereof, and recording medium thereof

Publications (2)

Publication Number Publication Date
JP2008131593A true JP2008131593A (en) 2008-06-05
JP4542538B2 JP4542538B2 (en) 2010-09-15

Family

ID=39556964

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006317578A Active JP4542538B2 (en) 2006-11-24 2006-11-24 Double talk state determination method, echo canceling apparatus using the method, program thereof, and recording medium thereof

Country Status (1)

Country Link
JP (1) JP4542538B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0786992A (en) * 1993-09-13 1995-03-31 Toshiba Corp Radio telephone system
JPH10308816A (en) * 1997-05-06 1998-11-17 Fujitsu Ltd Voice switch for speaking equipment
JP2006033789A (en) * 2004-06-16 2006-02-02 Nippon Telegr & Teleph Corp <Ntt> Method, device, and program for estimating amount of echo path coupling; method, device, and program for controlling echoes; method for suppressing echoes; echo suppressor; echo suppressor program; method and device for controlling amount of losses on transmission lines; program for controlling losses on transmission lines; method, device, and program for suppressing multichannel echoes; and recording medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015195510A (en) * 2014-03-31 2015-11-05 沖電気工業株式会社 Echo suppression device, echo suppression program and echo suppression method
JP2016025471A (en) * 2014-07-18 2016-02-08 沖電気工業株式会社 Echo suppression device, echo suppression program, echo suppression method and communication terminal
CN113345459A (en) * 2021-07-16 2021-09-03 北京融讯科创技术有限公司 Method and device for detecting double-talk state, computer equipment and storage medium
CN113345459B (en) * 2021-07-16 2023-02-21 北京融讯科创技术有限公司 Method and device for detecting double-talk state, computer equipment and storage medium

Also Published As

Publication number Publication date
JP4542538B2 (en) 2010-09-15

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20080313

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20100603

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20100615

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20100625

R150 Certificate of patent or registration of utility model

Ref document number: 4542538

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130702

Year of fee payment: 3

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350