CN1255255A

CN1255255A - Echo reducing phone with state machine controlled switches

Info

Publication number: CN1255255A
Application number: CN 98804832
Authority: CN
Inventors: J·格瑙斯佩刘斯
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 1997-03-11
Filing date: 1998-02-24
Publication date: 2000-05-31
Also published as: BR9808240A; JP2001514823A; SE9700873D0; EP0974205A1; TW407435B; WO1998040974A1; AU6426498A; AU735505B2; CA2283590A1; SE9700873L; SE511650C2

Abstract

The purpose of the present invention is to reduce the echo introduced by cross-talk. This problem is solved by introducing, at the microphone and at the speaker, switches controlled by a state machine. The state machine takes as input the signal energy of the signal from the microphone, a VAD flag of the signal from the microphone, the signal energy of the signal to the speaker and a VAD flag of the signal to the speaker.

Description

Echo reducing phone with state machine controlled switches

The present invention relates generally to telecommunications, and more specifically to speech processing for voice communication over the Internet.

Typical Internet telephony uses a PC with a sound card, a microphone and two loudspeakers. The microphone and loudspeakers are usually placed on a desk facing each other. This configuration causes a considerable amount of cross-talk, which sounds like an echo at the receiver end. For Internet telephony to be usable, this echo must be suppressed.

In GSM, it is known to use VAD (voice activity detection) to detect whether the mobile phone user is speaking. With this information, the bandwidth used when transmitting speech can be reduced. In discontinuous speech coding according to the VOX principle (voice operated transmission), the VAD unit is responsible for detecting whether a received sound sequence represents human speech. The VAD unit can take two different states: the first state indicates that the sound sequence is human speech, and the other state indicates that the sound sequence is not human speech.

If the VAD unit detects that a given sound sequence represents human speech, the VAD unit sends a first status signal to the speech coding unit, which encodes the sound sequence into speech frames. Otherwise, if the given sound sequence represents something other than human speech, the VAD unit sends a second status signal to a SID (silence descriptor) unit. The SID unit sends one SID frame every N frames. Nothing need be sent at the remaining N-1 frame transmission opportunities. The SID frame contains information about the estimated background noise and the estimated noise spectrum on the transmitting side. This procedure saves battery power and radio bandwidth.

When the VAD unit changes from generating the first status signal to generating the second status signal, that is, in the interval from detecting speech to detecting non-speech, a so-called hangover is usually applied. During the hangover the speech coding unit continues to send speech frames, as if the received sound sequence were still human speech. If the VAD unit still detects non-speech after the hangover delay, SID frames are generated. The reason for this procedure is that short pauses between words in human speech should not be interpreted as non-speech, so the speech frame generator must remain active.
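The frame scheduling described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the GSM procedure itself: the function name `schedule_frames`, the default hangover of 4 frames and the default SID interval of 8 frames are hypothetical, and the hangover is modelled simply as a fixed number of non-speech frames that are still coded as speech.

```python
def schedule_frames(vad_flags, hangover_frames=4, sid_interval=8):
    """Decide, per frame, whether to send SPEECH, SID or nothing (NONE).

    vad_flags: iterable of booleans, True when the VAD detects speech.
    hangover_frames and sid_interval are illustrative values (assumptions).
    """
    decisions = []
    silent_run = 0  # consecutive non-speech frames seen so far
    for is_speech in vad_flags:
        if is_speech:
            silent_run = 0
            decisions.append("SPEECH")
            continue
        silent_run += 1
        if silent_run <= hangover_frames:
            # Hangover: keep sending speech frames so that short pauses
            # between words are not treated as non-speech.
            decisions.append("SPEECH")
        else:
            # After the hangover: one SID frame every sid_interval frames,
            # nothing at the remaining opportunities.
            since_hangover = silent_run - hangover_frames - 1
            decisions.append("SID" if since_hangover % sid_interval == 0 else "NONE")
    return decisions
```

With a hangover of 2 frames and a SID interval of 3, a pause of six frames yields two more speech frames, then a SID frame, two empty slots and another SID frame.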

The invention discloses a method and an apparatus for reducing the echo introduced by cross-talk.

The purpose of the present invention is thus to reduce the echo introduced by cross-talk.

The problem described above, of how to reduce the echo introduced by cross-talk, is solved by introducing switches at the microphone and at the loudspeaker that are controlled by a state machine. The state machine takes as input the signal energy of the signal from the microphone, a VAD flag of the signal from the microphone, the signal energy of the signal to the loudspeaker and a VAD flag of the signal to the loudspeaker.

One advantage of the present invention is that the echo introduced by cross-talk is reduced significantly without requiring additional computing power.

Other advantages will be apparent to those skilled in the art from the detailed description given below.

Further scope of applicability of the present invention will become apparent from the detailed description given below. It should be understood, however, that the preferred embodiments of the invention are given by way of illustration only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from this detailed description.

Fig. 1 shows a block diagram of one embodiment of the present invention.

Fig. 2 shows a directed graph of the finite state machine.

In Fig. 1, a microphone 101 is connected to a GSM encoder 102. Before the signal reaches the GSM encoder 102, it is sampled and digitized according to known techniques not shown in Fig. 1. The encoded signal is transmitted from the GSM encoder 102 to a receiver (not shown) through a switch 103 that enables or disables the transmission. The autocorrelation coefficients ACF_E are passed from the GSM encoder 102 to a VAD unit 104. The long-term predictor lag value N_E from the GSM frame is also passed to the VAD unit 104. A value P_E representing the energy of the signal is passed from the VAD unit 104 to a finite state machine 105. The VAD unit 104 also computes a flag F_E indicating whether the VAD unit 104 has detected human speech. The flag F_E is passed to the finite state machine 105. The flag F_E is true if human speech is detected.

Fig. 1 also shows a sampled, coded speech signal that is received from a sender (not shown) and passed to a GSM decoder 106. The decoded sampled speech signal is passed from the GSM decoder 106 to a loudspeaker 107 through a switch 108 that enables or disables the speech signal reaching the loudspeaker. For the loudspeaker to operate properly, D/A conversion is needed according to known techniques not shown in Fig. 1. A long-term predictor lag value N_D is derived from the received coded speech signal and passed to a VAD unit 109.

Because decoding of GSM frames does not normally involve a VAD unit, the GSM decoder lacks the parameters needed to compute the ACF. To compute the ACF, an autocorrelation unit 110 receives data from the GSM decoder 106 and computes ACF_D, which is passed to the VAD unit 109. The autocorrelation unit 110 is part of the GSM encoder as described in the standard. A value P_D indicating the energy of the speech signal to the loudspeaker is passed from the VAD unit 109 to the finite state machine 105. A flag F_D, indicating whether the VAD unit has detected human speech, is also passed from the VAD unit 109 to the finite state machine.
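As an illustration of what the autocorrelation unit provides, the following minimal sketch computes autocorrelation coefficients for one frame of decoded samples. The function names, the default of nine lags (0 to 8) and the choice of the lag-0 coefficient as the energy value are assumptions made for this example; the text does not specify how the VAD unit derives its energy value from the ACF.

```python
def autocorrelation(samples, max_lag=8):
    """Return the autocorrelation coefficients ACF[0..max_lag] of one frame.

    ACF[k] is the sum over i of samples[i] * samples[i + k].
    max_lag=8 is an assumed default for this sketch.
    """
    n = len(samples)
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def frame_energy(acf):
    """Use the lag-0 coefficient (the sum of squared samples) as the energy.

    This is one simple choice, assumed here for illustration only.
    """
    return acf[0]
```

For the three samples 1, 2, 3 the lag-0 coefficient is 1 + 4 + 9 = 14, which is the frame energy under this assumption.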

The finite state machine 105 comprises functions for setting the switches 103 and 108 according to the values input to the finite state machine.

Fig. 2 shows the states and possible transitions of the finite state machine of Fig. 1.

Transitions between the states take place as described below. The following definitions are used:

F_E: the VAD flag at encoding

F_D: the VAD flag at decoding

P_E: the signal energy at encoding

P_D: the signal energy at decoding

Hangover: the time from deciding to change the switch direction until the switch is carried out. This time must be long enough to compensate for room echo.

201. (F_E=1 AND F_D=0) OR (F_E=1 AND P_E>P_D), hangover=0

202. F_E=0, hangover=600ms

203. (F_D=1 AND F_E=0) OR (F_D=1 AND P_D>P_E), hangover=0

204. F_D=0, hangover=600ms

205. F_D=1 AND P_D>P_E, hangover=600ms

206. F_E=1 AND P_E>P_D, hangover=600ms
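The six transition conditions listed above can be sketched as a small state machine. The source and target state of each transition follow the order given in claim 6; the transitions are numbered 201 to 206 in the sketch. Several points are assumptions made only for this example: the class name `EchoSwitchFsm`, one update per 20 ms frame, the reading of the hangover as the time a condition must hold continuously before the transition is carried out, and giving the lower-numbered transition priority when two conditions hold at once.

```python
from enum import Enum


class State(Enum):
    TRANSMITTING = 207
    RECEIVING = 208
    IDLE = 209


HANGOVER_MS = 600
FRAME_MS = 20  # assumption: the machine is updated once per 20 ms frame


class EchoSwitchFsm:
    def __init__(self, state=State.IDLE):
        self.state = state
        self._pending = None  # number of the transition waiting out its hangover
        self._waited_ms = 0   # time since that transition's condition first held

    def _matching(self, f_e, f_d, p_e, p_d):
        """Return (number, target, hangover) of the first transition that holds."""
        near = f_e and (not f_d or p_e > p_d)  # conditions 201 and 206 share F_E
        far = f_d and (not f_e or p_d > p_e)   # conditions 203 and 205 share F_D
        table = {
            State.IDLE: [(201, near, State.TRANSMITTING, 0),
                         (203, far, State.RECEIVING, 0)],
            State.TRANSMITTING: [(202, not f_e, State.IDLE, HANGOVER_MS),
                                 (205, f_d and p_d > p_e, State.RECEIVING, HANGOVER_MS)],
            State.RECEIVING: [(204, not f_d, State.IDLE, HANGOVER_MS),
                              (206, f_e and p_e > p_d, State.TRANSMITTING, HANGOVER_MS)],
        }
        for number, holds, target, hangover in table[self.state]:
            if holds:
                return number, target, hangover
        return None

    def step(self, f_e, f_d, p_e, p_d):
        """Feed one frame's flags and energies; return the resulting state."""
        match = self._matching(f_e, f_d, p_e, p_d)
        if match is None:
            self._pending, self._waited_ms = None, 0
            return self.state
        number, target, hangover = match
        if number == self._pending:
            self._waited_ms += FRAME_MS
        else:
            self._pending, self._waited_ms = number, 0
        if self._waited_ms >= hangover:
            self.state = target
            self._pending, self._waited_ms = None, 0
        return self.state
```

Under these assumptions, near-end speech moves the machine out of IDLE immediately, while a pause has to last the full 600 ms before the machine falls back to IDLE.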

In state TRANSMITTING 207, the switch controlling the transmission of the speech signal from the microphone is enabled and the switch controlling the transmission of the speech signal to the loudspeaker is disabled. In state RECEIVING 208, the switch controlling the transmission from the microphone is disabled and the switch controlling the transmission to the loudspeaker is enabled. In state IDLE 209, both switches are disabled.
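The switch settings for the three states can be written as a lookup table. This is a minimal sketch; the names `Switches`, `SWITCH_SETTINGS` and `gate` are hypothetical, and a blocked path is represented here simply by None.

```python
from enum import Enum
from typing import NamedTuple


class State(Enum):
    TRANSMITTING = 207
    RECEIVING = 208
    IDLE = 209


class Switches(NamedTuple):
    mic_enabled: bool      # switch 103, between the GSM encoder and the receiver
    speaker_enabled: bool  # switch 108, between the GSM decoder and the loudspeaker


SWITCH_SETTINGS = {
    State.TRANSMITTING: Switches(mic_enabled=True, speaker_enabled=False),
    State.RECEIVING: Switches(mic_enabled=False, speaker_enabled=True),
    State.IDLE: Switches(mic_enabled=False, speaker_enabled=False),
}


def gate(state, mic_frame, speaker_frame):
    """Return (outgoing frame, played frame); None marks a blocked path."""
    settings = SWITCH_SETTINGS[state]
    outgoing = mic_frame if settings.mic_enabled else None
    played = speaker_frame if settings.speaker_enabled else None
    return outgoing, played
```

In no state are both switches enabled at once, so sound played by the loudspeaker is never sent back through the microphone path.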

The invention being thus described, it will be obvious that it may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.

Claims

1. A method for reducing echo when transmitting speech with a telephone application, the telephone application comprising a loudspeaker and a microphone, characterized in that a finite state machine affects the switching on or off of the loudspeaker and the microphone depending on signal characteristics of the signal from the microphone and signal characteristics of the signal to the loudspeaker.

2. The method according to claim 1, wherein the telephone application comprises at least one VAD unit, a GSM encoder and a GSM decoder, characterized in that a first VAD flag of the signal from the microphone is passed to the finite state machine, a first value representing the signal energy in the signal from the microphone is passed to the finite state machine, a second VAD flag of the signal to the loudspeaker is passed to the finite state machine, a second value representing the energy in the signal to the loudspeaker is passed to the finite state machine, and, depending on the first VAD flag, the second VAD flag, the first value and the second value, the finite state machine affects a first switch controlling the transmission of the signal from the microphone and a second switch controlling the transmission of the second signal to the loudspeaker.

3. The method according to claim 2, characterized in that a first sampled speech signal from the microphone is passed to the GSM encoder, a first long-term predictor lag value is passed to a first VAD unit, first autocorrelation coefficients from the GSM encoder are passed to the first VAD unit, a first Boolean flag is passed from the first VAD unit to the finite state machine, a first value representing the energy of the signal from the microphone is passed from the first VAD unit to the finite state machine, a second sampled coded speech signal is received, the second speech signal is passed to the GSM decoder, a second long-term predictor lag value from the second speech signal is passed to a second VAD unit, second autocorrelation coefficients are computed and passed to the second VAD unit, a second value representing the energy in the second speech signal is passed from the second VAD unit to the finite state machine, a second Boolean flag is passed from the second VAD unit to the finite state machine, and the finite state machine, depending on the first Boolean flag, the second Boolean flag, the first value and the second value, controls a first switch affecting the transmission of the first sampled coded speech signal from the microphone and a second switch affecting the transmission of the second decoded speech signal to the loudspeaker.

4. The method according to claim 2, characterized in that, if the finite state machine takes a first state, the first switch controlling the transmission from the microphone is set to allow that transmission and the second switch controlling the transmission to the loudspeaker is set to disallow that transmission, and, if the finite state machine takes a second state, the first switch controlling the transmission from the microphone is set to disallow that transmission and the second switch controlling the transmission to the loudspeaker is set to allow that transmission.

5. The method according to claim 4, characterized in that, if the finite state machine takes a third state, the first and second switches are both set to the same state.

6. The method according to claim 5, characterized in that the finite state machine switches from the third state to the first state if the first flag is true and the second flag is false, or if the first flag is true and the first value is greater than the second value; the finite state machine switches from the first state to the third state if the first flag is false and a hangover delay has elapsed; the finite state machine switches from the third state to the second state if the second flag is true and the first flag is false, or if the second flag is true and the second value is greater than the first value; the finite state machine switches from the second state to the third state if the second flag is false and the hangover delay has elapsed; the finite state machine switches from the first state to the second state if the second flag is true, the second value is greater than the first value and the hangover delay has elapsed; and the finite state machine switches from the second state to the first state if the first flag is true, the first value is greater than the second value and the hangover delay has elapsed.

7. The method according to claim 6, characterized in that the hangover delay is 600 ms.

8. A device for reducing echo when transmitting speech with a telephone application, the telephone application comprising a loudspeaker and a microphone, characterized in that the telephone application comprises a finite state machine configured to affect the switching on or off of the loudspeaker and the microphone depending on signal characteristics of the signal from the microphone and signal characteristics of the signal to the loudspeaker.

9. A personal computer configured to transmit and receive speech with a telephone application, the telephone application comprising means for reducing echo in the speech, the telephone application comprising a loudspeaker and a microphone, characterized in that the telephone application comprises a finite state machine that affects the switching on or off of the loudspeaker and the microphone depending on signal characteristics of the signal from the microphone and signal characteristics of the signal to the loudspeaker.