EP0974205A1

EP0974205A1 - Echo reducing phone with state machine controlled switches

Info

Publication number: EP0974205A1
Application number: EP98909895A
Authority: EP
Inventors: Johan Gnosspelius
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 1997-03-11
Filing date: 1998-02-24
Publication date: 2000-01-26
Also published as: WO1998040974A1; AU6426498A; SE9700873L; AU735505B2; CA2283590A1; TW407435B; CN1255255A; JP2001514823A; BR9808240A; SE511650C2; SE9700873D0

Abstract

The purpose of the present invention is to reduce the echo introduced by cross-talk. This is achieved by introducing, at the microphone and at the speaker, switches controlled by a state machine that takes as input the signal energy of the signal from the microphone, a VAD flag of the signal from the microphone, the signal energy of the signal to the speaker and a VAD flag of the signal to the speaker.

Description

ECHO REDUCING PHONE WITH STATE MACHINE CONTROLLED SWITCHES

TECHNICAL FIELD OF INVENTION

The present invention relates to telecommunication in general and to speech processing for voice communication over the Internet in particular.

DESCRIPTION OF RELATED ART

A typical Internet phone uses a PC with a sound board, a microphone and two loudspeakers. The microphone and the loudspeakers are often placed next to each other on the desk. Such a configuration causes a considerable amount of cross-talk, heard as an echo at the receiver end. This echo must be suppressed to make the Internet phone usable.

In GSM it is known to use VAD (Voice Activity Detection) to detect whether or not a user of a mobile phone is talking. This information is used to decrease the bandwidth needed when transmitting the voice. In discontinuous speech coding according to the VOX principle (Voice Operated Transmission), a VAD unit is responsible for detecting whether or not a received sound sequence represents human speech. The VAD unit can take two different states, where a first state indicates that the sound sequence was human voice and the other state indicates that the sound sequence was not human voice.

If the VAD unit detects that a given sound sequence represents human voice, the VAD unit will issue a first state signal to a speech coding unit, which will encode the sound sequence in a speech frame. If, on the other hand, a given sound sequence represents something other than human speech, the VAD unit will issue a second state signal to a SID (Silence Descriptor) unit. Said SID unit will deliver a SID frame every Nth frame. During the remaining N-1 possible occasions to send frames, nothing will be sent. A SID frame comprises information about estimated background noise and estimated noise spectra on the sending side. With this procedure battery power and radio bandwidth can be saved.

When the VAD unit changes from generating the first state signal to generating the second state signal, that is, from detecting speech to detecting non-speech, a time interval, a so-called hangover, is normally applied, during which the speech encoding unit will continue to deliver speech frames as if the received sound sequence had been human speech. If, after the hangover time, the VAD unit still detects non-speech, a SID frame is generated. The reason for this procedure is that short pauses between words in human speech shall not be interpreted as non-speech; the speech frame generator shall remain active during them.
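The frame scheduling described above can be sketched as follows. This is a minimal illustration, not the procedure defined in the GSM standards: the function name `schedule_frames`, the counting of the hangover in frames rather than in time, and the default values of `hangover_frames` and `sid_period` are all assumptions made for the example.

```python
from typing import Iterable, List


def schedule_frames(vad_flags: Iterable[bool],
                    hangover_frames: int = 4,
                    sid_period: int = 8) -> List[str]:
    """Decide what to send for each frame: 'SPEECH', 'SID' or 'NONE'.

    vad_flags       -- one VAD decision per frame (True = human speech)
    hangover_frames -- hypothetical hangover length, counted in frames
    sid_period      -- the N of the text: one SID frame every Nth frame
    """
    decisions = []
    hang_left = 0      # hangover frames still available
    silent_count = 0   # non-speech frames seen since the hangover ran out
    for is_speech in vad_flags:
        if is_speech:
            # Speech re-arms the hangover and restarts the SID cycle.
            hang_left = hangover_frames
            silent_count = 0
            decisions.append("SPEECH")
        elif hang_left > 0:
            # Short pause: keep delivering speech frames.
            hang_left -= 1
            decisions.append("SPEECH")
        else:
            # Still non-speech after the hangover: a SID frame every
            # sid_period-th occasion, nothing on the other occasions.
            decisions.append("SID" if silent_count % sid_period == 0 else "NONE")
            silent_count += 1
    return decisions
```

With a hangover of two frames and N = 3, one speech frame followed by six non-speech frames yields three speech frames, then a SID frame, two empty occasions and another SID frame.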

SUMMARY OF THE INVENTION

The present invention discloses a method and an apparatus for reduction of echoes introduced by cross-talk.

The purpose of the present invention is thus to reduce the echo introduced by cross-talk.

The problem described above, of how to reduce the echo introduced by cross-talk, is solved by introducing, at the microphone and at the speaker, switches controlled by a state machine which takes as input the signal energy of the signal from the microphone, a VAD flag of the signal from the microphone, the signal energy of the signal to the speaker and a VAD flag of the signal to the speaker. One of the advantages of the present invention is that the echo introduced by cross-talk is significantly reduced without requiring too much computational power.

Other advantages will be obvious to a man skilled in the art in the light of the detailed description given below.

Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the preferred embodiments of the invention are given by way of illustration only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 shows a block diagram of one embodiment of the invention.

Figure 2 shows a Finite State Diagram.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In figure 1 a microphone 101 is connected to a GSM encoder 102. Before the signal arrives at the GSM encoder 102 it has been sampled and digitised according to known technology, which is not shown in figure 1. From the GSM encoder 102 the encoded signal is transmitted to a receiver, not shown in the figure, first passing a switch 103 which can enable or disable the transmission. From the GSM encoder 102 an ACF_E (Autocorrelation Coefficient) is passed to a VAD unit 104. A long term predictor lag value N_E from the GSM frames is also passed to the VAD unit 104. From the VAD unit 104 a value P_E, representing the energy of the signal, is passed to a finite state machine 105. The VAD unit 104 also computes a flag F_E indicating whether the VAD unit 104 has detected human speech. The flag F_E is passed to the finite state machine 105. The flag F_E will be true if human voice has been detected.

Further, in figure 1 a sampled, encoded voice signal is received from a sender (not shown) and passed to a GSM decoder 106. From the GSM decoder 106 the decoded, sampled voice signal is passed to a speaker 107, first passing a switch 108 which can enable or disable the passing of the voice signal to the speaker. For the speaker to function properly a D/A conversion is needed, according to known technology but not shown in figure 1. From the received sampled, encoded voice signal a long term predictor lag value N_D is derived and passed to a VAD unit 109.

Since the decoding of GSM frames normally does not involve using a VAD unit, the GSM decoder lacks the parameters necessary for calculating the ACF. To be able to calculate the ACF, an autocorrelation unit 110 receives data from the GSM decoder 106 and calculates the ACF_D, which is passed to the VAD unit 109. The autocorrelation unit 110 is a part of the GSM encoder as described in the standards. A value P_D indicating the energy in the voice signal to the speaker is passed from the VAD unit 109 to the finite state machine 105. A flag F_D, indicating whether the VAD unit has detected human voice, is also passed from the VAD unit 109 to said finite state machine.
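As an illustration of what an autocorrelation unit such as unit 110 computes, the sketch below derives autocorrelation coefficients from one frame of decoded samples. It is a plain textbook autocorrelation, not the fixed-point routine of the GSM standards. The function names, the default of eight lags and the use of the lag-zero coefficient as the energy value are assumptions made for the example; the text does not state how P_D is derived.

```python
from typing import List, Sequence


def autocorrelation(samples: Sequence[float], max_lag: int = 8) -> List[float]:
    """Return ACF[0..max_lag] for one frame of decoded samples.

    ACF[k] is the sum over i of samples[i] * samples[i + k].
    max_lag = 8 is an assumed default, not a value taken from the text.
    """
    n = len(samples)
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def frame_energy(acf: Sequence[float]) -> float:
    """Hypothetical energy measure: the lag-zero coefficient, which is
    the sum of the squared samples of the frame."""
    return acf[0]
```

For the four samples 1, 2, 3, 4 and two lags this gives the coefficients 30, 20 and 11, and an energy value of 30.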

The finite state machine 105 comprises functionality for setting the switches 103 and 108 in dependence on the values input to the finite state machine.

In figure 2 the states and the possible transitions are shown for the finite state machine in figure 1. The transitions between states are made according to the description below. The following definitions are used:

• F_E: VAD flag when encoding

• F_D: VAD flag when decoding

• P_E: Signal energy when encoding

• P_D: Signal energy when decoding.

• Hangover: The time from the decision to switch direction until the switch is made. This time must be long enough to compensate for the room echo.

201. (F_E = 1 AND F_D = 0) OR (F_E = 1 AND P_E > P_D), hangover = 0

202. F_E = 0, hangover = 600 ms

203. (F_D = 1 AND F_E = 0) OR (F_D = 1 AND P_D > P_E), hangover = 0

204. F_D = 0, hangover = 600 ms

205. F_D = 1 AND P_D > P_E, hangover = 600 ms

206. F_E = 1 AND P_E > P_D, hangover = 600 ms

In the state TRANSMITTING 207 the switch controlling the transmission of the voice signal from the microphone is enabled and the switch controlling the transmission of the voice signal to the speaker is disabled. In the state RECEIVING 208 the switch controlling the transmission of the voice signal from the microphone is disabled and the switch controlling the transmission to the speaker is enabled. In the IDLE state 209 both switches are disabled.
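The three states and the transitions 201 to 206 can be sketched as a small state machine. The assignment of each transition to a pair of states follows the wording of claim 6. Several details are assumptions made for the example: the class name `EchoSwitchFsm`, the 20 ms frame length, the rule that a transition condition must hold for the whole hangover before the switch is made, and the priority of transition 205 over 202 (and of 206 over 204) when both conditions are true.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    # Values reuse the reference numerals of the states in figure 2.
    TRANSMITTING = 207
    RECEIVING = 208
    IDLE = 209


HANGOVER_MS = 600


class EchoSwitchFsm:
    def __init__(self, frame_ms: int = 20):
        self.state = State.IDLE
        self.frame_ms = frame_ms  # assumed time between calls to step()
        self._pending: Optional[State] = None  # state we decided to move to
        self._waited_ms = 0                    # time since that decision

    def _decide(self, f_e, p_e, f_d, p_d) -> Optional[Tuple[State, int]]:
        """Return (target state, hangover in ms), or None if nothing fires."""
        near_wins = f_e and p_e > p_d  # microphone side is active and louder
        far_wins = f_d and p_d > p_e   # speaker side is active and louder
        if self.state is State.IDLE:
            if (f_e and not f_d) or near_wins:
                return State.TRANSMITTING, 0        # transition 201
            if (f_d and not f_e) or far_wins:
                return State.RECEIVING, 0           # transition 203
        elif self.state is State.TRANSMITTING:
            if far_wins:
                return State.RECEIVING, HANGOVER_MS  # transition 205
            if not f_e:
                return State.IDLE, HANGOVER_MS       # transition 202
        else:  # State.RECEIVING
            if near_wins:
                return State.TRANSMITTING, HANGOVER_MS  # transition 206
            if not f_d:
                return State.IDLE, HANGOVER_MS          # transition 204
        return None

    def step(self, f_e: bool, p_e: float, f_d: bool, p_d: float) -> Tuple[bool, bool]:
        """Feed one frame of inputs; return (switch 103 on, switch 108 on)."""
        decision = self._decide(f_e, p_e, f_d, p_d)
        if decision is None:
            self._pending, self._waited_ms = None, 0
        else:
            target, hangover = decision
            if target is not self._pending:
                # New decision: restart the hangover timer.
                self._pending, self._waited_ms = target, 0
            if self._waited_ms >= hangover:
                self.state = target
                self._pending, self._waited_ms = None, 0
            else:
                self._waited_ms += self.frame_ms
        return self.switches()

    def switches(self) -> Tuple[bool, bool]:
        """Microphone switch is on only when transmitting, speaker switch
        only when receiving; both are off in the idle state."""
        return (self.state is State.TRANSMITTING,
                self.state is State.RECEIVING)
```

Starting from the idle state, a frame with F_E = 1 and F_D = 0 enables the microphone switch at once. With 20 ms frames, the machine then stays in the transmitting state for 30 consecutive frames with F_E = 0 and returns to idle on the 31st, 600 ms after the decision was made.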

The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to a man skilled in the art are intended to be included within the scope of the following claims.

Claims

1. A method for reducing the echo when transmitting voice in a telephone application, said telephone application comprises a speaker and a microphone, characterised in that a finite state machine affects said speaker and microphone to be on or off in dependence on characteristics of the signal from said microphone and characteristics of the signal to said speaker.

2. A method according to claim 1 where said telephone application comprises at least one VAD unit, one GSM encoder and one GSM decoder, characterised in that a first VAD flag of the signal from the microphone is passed to said finite state machine, that a first value representing the signal energy in the signal from the microphone is passed to said finite state machine, that a second VAD flag of the signal to the speaker is passed to said finite state machine, that a second value representing the energy in the signal to the speaker is passed to said finite state machine, that said finite state machine affects a first switch controlling the transmission of said signal from said microphone, that said finite state machine affects a second switch controlling the transmission of said second signal to said speaker in dependence on the values of said first VAD flag, said second VAD flag, said first value and said second value.

3. A method according to claim 2, characterised in that a first sampled voice signal from said microphone is passed to said GSM encoder, that a first long term predictor lag value is passed to a first VAD unit, that a first autocorrelation coefficient is passed from said first GSM encoder to said first VAD unit, that a first boolean flag is passed from said first VAD unit to said finite state machine, that a first value representing the energy of the signal from said microphone is passed from said first VAD unit to said finite state machine, that a second sampled, encoded voice signal is received, that said second voice signal is passed to a GSM decoder, that a second long term predictor lag value from said second voice signal is passed to a second VAD unit, that a second autocorrelation coefficient is calculated and passed to said second VAD unit, that a second value representing the energy in said second voice signal is passed from said second VAD unit to said finite state machine, that a second boolean flag is passed from said VAD unit to said finite state machine and that said finite state machine controls a first switch affecting the transmission of said first sampled, encoded voice signal from said microphone, a second switch affecting the transmission of said second, decoded voice signal to a speaker in dependence on the values of said first boolean flag, said second boolean flag, said first value and said second value.

4. A method according to claim 2, characterised in that if said finite state machine takes a first state said first switch controlling the transmission from said microphone is set to allow such transmission and that said second switch controlling the transmission to the speaker is set not to allow such transmission, that if said finite state machine takes a second state said first switch controlling the transmission from said microphone is set not to allow such transmission and that said second switch controlling the transmission to the speaker is set to allow such transmission.

5. A method according to claim 4, characterised in that if said finite state machine takes a third state said first and second switch are both set to the same state.

6. A method according to claim 5, characterised in that said finite state machine switches from said third state to said first state if said first flag is true and said second flag is false or if said first flag is true and said first value is larger than said second value, that said finite state machine switches from said first state to said third state if said first flag is false and a hangover time has elapsed, that said finite state machine switches from said third state to said second state if said second flag is true and said first flag is false or if said second flag is true and said second value is larger than said first value, that said finite state machine switches from said second state to said third state if said second flag is false and said hangover time has passed, that said finite state machine switches from said first state to said second state if said second flag is true and said second value is larger than said first value and said hangover time has passed, that said finite state machine switches from said second to said first state if said first flag is true and said first value is larger than said second value and said hangover time has passed.

7. A method according to claim 6, characterised in that said hangover time is 600 ms.

8. An apparatus for reducing the echo when transmitting voice in a telephone application, said telephone application comprises a speaker and a microphone, characterised in that said telephone application comprises a finite state machine arranged to affect said speaker and microphone to be on or off in dependence on characteristics of the signal from said microphone and characteristics of the signal to said speaker.

9. A personal computer arranged for transmitting and receiving voice with a telephone application comprising an apparatus for reducing the echo of said voice, said telephone application comprises a speaker and a microphone, characterised in that said telephone application comprises a finite state machine arranged to affect said speaker and microphone to be on or off in dependence on characteristics of the signal from said microphone and characteristics of the signal to said speaker.