JP2001514823A

JP2001514823A - Echo-reducing telephone with state machine controlled switch

Info

Publication number: JP2001514823A
Application number: JP53949498A
Authority: JP
Inventors: グノセペリウス，ヨハン
Original assignee: テレフオンアクチーボラゲツトエルエムエリクソン（パブル）
Priority date: 1997-03-11
Filing date: 1998-02-24
Publication date: 2001-09-11
Also published as: CN1255255A; AU735505B2; CA2283590A1; BR9808240A; AU6426498A; SE511650C2; SE9700873L; WO1998040974A1; SE9700873D0; EP0974205A1; TW407435B

Abstract

(57) Abstract: An object of the present invention is to reduce the echo caused by crosstalk. The problem of how to reduce the echo caused by crosstalk is solved by introducing, at the microphone and the speaker, switches controlled by a state machine that takes as input the signal energy of the signal from the microphone, the VAD flag of the signal from the microphone, the signal energy of the signal to the speaker, and the VAD flag of the signal to the speaker.

Description

Echo-reducing telephone with state machine controlled switch

FIELD OF THE INVENTION

The present invention relates generally to telecommunications, and more particularly to speech processing for voice communication over the Internet.

DESCRIPTION OF THE RELATED ART

A standard Internet telephone uses a PC with a sound board, a microphone and two speakers. The microphone and the speakers are often placed close to each other on a desk. Such an arrangement produces a considerable amount of crosstalk, which is heard as an echo at the receiving end.
This echo must be suppressed to make Internet telephony usable.

In GSM, it is known to use VAD (Voice Activity Detection) to detect when the user of a mobile telephone is or is not speaking. This information can be used to reduce the bandwidth needed when transmitting speech. In discontinuous speech coding according to the VOX principle (Voice Operated Transmission), the VAD device is responsible for detecting whether a received sound sequence is human speech. The VAD device takes one of two states: a first state indicating that the sound sequence is human speech, and a second state indicating that it is not.

If the VAD device detects that a given sound sequence represents human speech, it issues a first state signal to a speech coding device, which encodes the sound sequence into speech frames. Conversely, if the given sound sequence represents something other than human speech, the VAD device issues a second state signal to an SID (Silence Descriptor) device. The SID device sends out an SID frame in every Nth frame. During the remaining N-1 opportunities to transmit a frame, nothing is transmitted. The SID frame contains information about the estimated background noise and the estimated noise spectrum at the transmitting side. This procedure saves battery power and radio bandwidth.

When the VAD device changes from generating the first state signal to generating the second state signal, that is, from detecting speech to detecting a non-speech interval, a so-called hangover is normally applied, during which the speech coding device continues to send speech frames as if the received sound sequence were human speech. If the VAD device still detects non-speech after the hangover time, SID frames are generated.
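
The hangover and SID behaviour described above can be illustrated with a minimal sketch. It assumes one VAD decision per frame; the function name `dtx_schedule`, the `FrameKind` type and the default values of `hangover_frames` and `sid_period` are hypothetical and are not taken from the text or from the GSM specifications.

```python
from enum import Enum


class FrameKind(Enum):
    SPEECH = "speech"  # normal coded speech frame
    SID = "sid"        # silence descriptor frame
    NONE = "none"      # nothing transmitted in this frame slot


def dtx_schedule(vad_flags, hangover_frames=4, sid_period=8):
    """Map per-frame VAD decisions to what the transmitter sends.

    hangover_frames and sid_period (the "N" of "every Nth frame") are
    illustrative values only.
    """
    out = []
    silent_run = 0  # consecutive non-speech frames seen so far
    for is_speech in vad_flags:
        if is_speech:
            silent_run = 0
            out.append(FrameKind.SPEECH)
            continue
        silent_run += 1
        if silent_run <= hangover_frames:
            # Hangover: keep sending speech frames through short pauses.
            out.append(FrameKind.SPEECH)
        else:
            # After the hangover: one SID frame in every sid_period slots,
            # nothing in the remaining sid_period - 1 slots.
            since_hangover = silent_run - hangover_frames - 1
            if since_hangover % sid_period == 0:
                out.append(FrameKind.SID)
            else:
                out.append(FrameKind.NONE)
    return out
```

A short pause that is no longer than `hangover_frames` therefore never produces an SID frame, because the counter is reset as soon as speech is detected again.
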
The reason for this procedure is that short pauses between words in human speech should not be interpreted as non-speech, so the speech frame generator must remain active.

SUMMARY OF THE INVENTION

The present invention discloses a method and an apparatus for reducing echo introduced by crosstalk.

It is therefore an object of the present invention to reduce the echo caused by crosstalk.

The problem stated above, how to reduce the echo caused by crosstalk, is solved by placing switches at the speaker and the microphone, controlled by a state machine that takes as input the signal energy of the signal from the microphone, the VAD flag of the signal from the microphone, the signal energy of the signal to the speaker, and the VAD flag of the signal to the speaker.

One advantage of the present invention is that the echo caused by crosstalk is significantly reduced without requiring much computing power. Other advantages will be apparent to those skilled in the art from the detailed description below.

Further scope of applicability of the present invention will become apparent from the detailed description given below. It should be understood, however, that the preferred embodiments of the invention are given by way of illustration only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from that description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of one embodiment of the present invention.
FIG. 2 is a finite state diagram.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In FIG. 1, a microphone 101 is connected to a GSM encoder 102. Before the signal reaches the GSM encoder 102, it is digitized and sampled according to known techniques; this is not shown in FIG. 1.
From the GSM encoder 102, the coded signal is transmitted to a receiver (not shown), first passing through a switch 103 that can enable or disable the transmission. ACF_E (Autocorrelation Coefficient) values are passed from the GSM encoder 102 to a VAD device 104. The long-term predictor lag value N_E from the GSM frame is also passed to the VAD device 104. From the VAD device 104, a value P_E representing the energy of the signal is passed to a finite state machine 105. The VAD device 104 also calculates a flag F_E indicating whether it has detected human speech. The flag F_E is passed to the finite state machine 105 and is true when human speech has been detected.

FIG. 1 also shows a sampled, coded speech signal that is received from a sender (not shown) and passed to a GSM decoder 106. From the GSM decoder 106, the decoded, sampled speech signal is passed to a speaker 107, first passing through a switch 108 that can enable or disable the speech signal from reaching the speaker. For the speaker to function correctly, D/A conversion according to known techniques is required; this is not shown in FIG. 1. A long-term predictor lag value N_D is derived from the received sampled, coded speech signal and passed to a VAD device 109.

Because decoding of GSM frames does not normally involve a VAD device, the GSM decoder lacks parameters needed to calculate the ACF. To make the calculation possible, an autocorrelator 110 receives data from the GSM decoder 106 and calculates ACF_D, which is passed to the VAD device 109. The autocorrelator 110 is part of the GSM decoder as described in the standard. A value P_D, indicating the energy of the speech signal to the speaker, is passed from the VAD device 109 to the finite state machine 105.
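
As a rough illustration of what an autocorrelator such as 110 computes from decoded samples, the sketch below evaluates the autocorrelation of one frame for a small number of lags. The function names, the default `max_lag=8` and the use of the lag-0 value as an energy measure are assumptions made for illustration; the text does not specify how the VAD device derives P_E or P_D, and a real VAD may compute its energy value differently.

```python
def autocorrelation(frame, max_lag=8):
    """Return [ACF[0], ..., ACF[max_lag]] for one frame of samples.

    ACF[k] is the sum over i of frame[i] * frame[i - k].
    max_lag=8 is an assumed default, not a value taken from the text.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i - k] for i in range(k, n))
        for k in range(max_lag + 1)
    ]


def frame_energy(acf):
    """ACF[0] is the sum of squared samples, i.e. the energy of the frame."""
    return acf[0]
```

For example, `autocorrelation([1, 2, 3, 4], max_lag=2)` evaluates to `[30, 20, 11]`, and the corresponding frame energy is 30.
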
From the VAD device 109, a flag F_D indicating whether the VAD device has detected human speech is passed to the finite state machine. The finite state machine 105 includes functionality for setting the switches 103 and 108 according to the values input to it.

FIG. 2 illustrates the states and possible transitions of the finite state machine of FIG. 1. Transitions between the states take place as described below. The following definitions are used:

• F_E: VAD flag at encoding
• F_D: VAD flag at decoding
• P_E: signal energy at encoding
• P_D: signal energy at decoding
• Hangover: the time from the decision to switch direction until the switch takes place. This time must be long enough to compensate for room echo.

201. F_E = 1 and F_D = 0, or F_E = 1 and P_E > P_D; hangover = 0
202. F_E = 0; hangover = 600 ms
203. F_D = 1 and F_E = 0, or F_D = 1 and P_D > P_E; hangover = 0
204. F_D = 0; hangover = 600 ms
205. F_D = 1 and P_D > P_E; hangover = 600 ms
206. F_E = 1 and P_E > P_D; hangover = 600 ms

In the transmitting state 207, the switch controlling transmission of the speech signal from the microphone is enabled and the switch controlling transmission of the speech signal to the speaker is disabled. In the receiving state 208, the switch controlling transmission of the speech signal from the microphone is disabled and the switch controlling transmission to the speaker is enabled. In the idle state 209, both switches are disabled.

The invention having been thus described, it is clear that it can be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be apparent to those skilled in the art are intended to be included within the scope of the following claims.
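
The transition conditions 201 to 206 and the states 207 to 209 can be sketched as a small state machine. This is an illustrative reading, not the patented implementation. The assignment of conditions to state pairs follows claim 6. The class and method names, the 20 ms update interval, the interpretation of the hangover as "the condition must hold continuously for 600 ms", and the priority of condition 205 over 202 (and 206 over 204) when both hold are all assumptions.

```python
from enum import Enum


class State(Enum):
    TRANSMITTING = 207
    RECEIVING = 208
    IDLE = 209


FRAME_MS = 20      # assumed: one update() call per 20 ms frame
HANGOVER_MS = 600  # hangover of conditions 202, 204, 205 and 206


class EchoSwitchFsm:
    """Hypothetical sketch of finite state machine 105."""

    def __init__(self):
        self.state = State.IDLE
        self.pending = None  # target state currently waiting out a hangover
        self.waited_ms = 0

    def _target(self, f_e, f_d, p_e, p_d):
        """Return (next_state, hangover_ms), or None if no condition holds."""
        if self.state is State.IDLE:
            if f_e and (not f_d or p_e > p_d):      # condition 201
                return State.TRANSMITTING, 0
            if f_d and (not f_e or p_d > p_e):      # condition 203
                return State.RECEIVING, 0
        elif self.state is State.TRANSMITTING:
            if f_d and p_d > p_e:                   # condition 205
                return State.RECEIVING, HANGOVER_MS
            if not f_e:                             # condition 202
                return State.IDLE, HANGOVER_MS
        elif self.state is State.RECEIVING:
            if f_e and p_e > p_d:                   # condition 206
                return State.TRANSMITTING, HANGOVER_MS
            if not f_d:                             # condition 204
                return State.IDLE, HANGOVER_MS
        return None

    def update(self, f_e, f_d, p_e, p_d):
        """Feed one frame of VAD flags and energies; return switch settings."""
        target = self._target(f_e, f_d, p_e, p_d)
        if target is None:
            self.pending, self.waited_ms = None, 0
        else:
            nxt, hangover = target
            if nxt is not self.pending:
                # New switching decision: restart the hangover timer.
                self.pending, self.waited_ms = nxt, 0
            if self.waited_ms >= hangover:
                self.state, self.pending, self.waited_ms = nxt, None, 0
            else:
                self.waited_ms += FRAME_MS
        return self.switches()

    def switches(self):
        """Return (switch 103 enabled, switch 108 enabled)."""
        return (self.state is State.TRANSMITTING,
                self.state is State.RECEIVING)
```

Under these assumptions the machine leaves the idle state immediately when one side starts speaking, but only gives up or reverses an active direction after the triggering condition has persisted for the full hangover, which is what keeps room echo from flipping the switches.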

Continuation of front page

(81) Designated states: EP (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, GH, GM, GW, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, UA, UG, US, UZ, VN, YU, ZW

Claims

1. A method for reducing echo when transmitting speech in a telephone application, the telephone application including a speaker and a microphone, wherein a finite state machine turns the speaker and the microphone on or off depending on characteristics of the signal from the microphone and characteristics of the signal to the speaker.

2. The method of claim 1, wherein the telephone application includes at least one VAD device, one GSM encoder and one GSM decoder; a first VAD flag of the signal from the microphone is passed to the finite state machine; a first value representing the signal energy of the signal from the microphone is passed to the finite state machine; a second VAD flag of the signal to the speaker is passed to the finite state machine; a second value representing the signal energy of the signal to the speaker is passed to the finite state machine; and, depending on the first VAD flag, the second VAD flag, the first value and the second value, the finite state machine acts on a first switch that controls transmission of the signal from the microphone and on a second switch that controls transmission of the signal to the speaker.

3. The method of claim 2, wherein a first sampled speech signal from the microphone is passed to the GSM encoder; a first long-term predictor lag value is passed to a first VAD device; a first autocorrelation coefficient is passed from the GSM encoder to the first VAD device; a first Boolean flag is passed from the first VAD device to the finite state machine; a first value representing the energy of the signal from the microphone is passed from the first VAD device to the finite state machine; a second sampled and coded speech signal is received and passed to the GSM decoder; a second long-term predictor lag value from the second speech signal is passed to a second VAD device; a second autocorrelation coefficient is calculated and passed to the second VAD device; a second value representing the energy of the second speech signal is passed from the second VAD device to the finite state machine; a second Boolean flag is passed from the second VAD device to the finite state machine; and the finite state machine, depending on the first Boolean flag, the second Boolean flag, the first value and the second value, acts on a first switch that affects transmission of the first sampled and coded speech signal from the microphone and on a second switch that affects transmission of the second decoded speech signal to the speaker.

4. The method of claim 2, wherein, when the finite state machine takes a first state, the first switch controlling transmission from the microphone is set to allow transmission and the second switch controlling transmission to the speaker is set not to allow transmission; and, when the finite state machine takes a second state, the first switch controlling transmission from the microphone is set not to allow transmission and the second switch controlling transmission to the speaker is set to allow transmission.

5. The method of claim 4, wherein, when the finite state machine takes a third state, the first and second switches are both set to the same state.

6. The method of claim 5, wherein the finite state machine switches from the third state to the first state if the first flag is true and the second flag is false, or if the first flag is true and the first value is greater than the second value; the finite state machine switches from the first state to the third state if the first flag is false and the hangover time has elapsed; the finite state machine switches from the third state to the second state if the second flag is true and the first flag is false, or if the second flag is true and the second value is greater than the first value; the finite state machine switches from the second state to the third state if the second flag is false and the hangover time has elapsed; the finite state machine switches from the first state to the second state if the second flag is true, the second value is greater than the first value and the hangover time has elapsed; and the finite state machine switches from the second state to the first state if the first flag is true, the first value is greater than the second value and the hangover time has elapsed.

7. The method of claim 6, wherein the hangover time is 600 ms.

8. A device for reducing echo when transmitting speech in a telephone application, the telephone application including a speaker and a microphone, wherein the telephone application includes a finite state machine adapted to turn the speaker and the microphone on or off depending on characteristics of the signal from the microphone and characteristics of the signal to the speaker.

9. A personal computer arranged to send and receive speech in a telephone application and including a device for reducing speech echo, wherein the telephone application includes a speaker and a microphone, and the telephone application includes a finite state machine adapted to turn the speaker and the microphone on or off depending on characteristics of the signal from the microphone and characteristics of the signal to the speaker.