CN104488224A - Processing audio signals

Info

Publication number: CN104488224A
Application number: CN201280025394.5A
Authority: CN
Inventors: K. V. Sorensen
Original assignee: Skype Ltd Ireland
Current assignee: Skype Ltd Ireland
Priority date: 2011-05-26
Filing date: 2012-05-28
Publication date: 2015-04-01
Also published as: WO2014019596A3; EP2735120A2; GB2491173A; WO2014019596A2; US20120303363A1; GB201108885D0

Abstract

A method, user device and computer program product for processing audio signals during a communication session between a user device and a remote node. The method comprises: receiving a plurality of audio signals at audio input means at the user device including at least one primary audio signal and unwanted signals; receiving direction of arrival information of the audio signals at a gain control means; providing to the gain control means known direction of arrival information representative of at least some of said unwanted signals; processing the audio signals at the gain control means by applying a level of gain to generate a gain controlled signal for transmission to the remote node, wherein the level of gain applied is dependent on a comparison between the direction of arrival information of the audio signals and the known direction of arrival information.

Description

Processing audio signals

Field of the invention

The present invention relates to processing audio signals during a communication session.

Background

A communication system allows users to communicate with one another over a network. The network may be, for example, the Internet or the Public Switched Telephone Network (PSTN). Audio signals can be transmitted between nodes of the network, allowing users to send audio signals (such as speech) to each other, and receive them, in a communication session over the communication system.

A user device may have audio input means, such as a microphone, that can be used to receive audio signals, such as speech, from a user. The user may enter into a communication session with another user, such as a private call (with just two users in the call) or a conference call (with more than two users in the call). The user's speech is received at the microphone, processed, and then transmitted over the network to the other users in the call.

As well as the audio signals from the user, the microphone may also receive other audio signals, such as background noise, which may interfere with the audio signals received from the user.

The user device may also have audio output means, such as a loudspeaker, for outputting to the user the audio signals received over the network during a call. However, the loudspeaker may also be used to output audio signals from other applications executed at the user device. For example, the user device may be a TV that executes an application such as a communication client for communicating over the network. When the user device is engaged in a call, a microphone connected to the user device is intended to receive speech or other audio signals provided by the user for transmission to the other users in the call. However, the microphone may pick up unwanted audio signals output from the loudspeaker of the user device. These unwanted audio signals may interfere with the audio signals that the microphone receives from the user for transmission in the call.

Problems can also arise when the user device is used in a room containing other noise sources that the microphone can pick up.

To improve the quality of the signal, for example for use in a call, it is desirable to suppress the unwanted audio signals (background noise and unwanted audio output from the user device) received at the audio input means of the user device.

The use of stereo microphones and microphone arrays, in which a plurality of microphones operate as a single device, is becoming more common. These make it possible to use extracted spatial information in addition to what can be achieved with a single microphone. With such devices, one approach to suppressing unwanted audio signals is to apply a beamformer. Beamforming is the process of focusing the signals received by the microphone array by applying signal processing to enhance sounds coming from one or more desired directions. For simplicity, the case with only a single desired direction is described below, but the same method applies when there are more directions of interest. Beamforming is achieved by first estimating the angle from which the wanted signal is received at the microphones, so-called direction of arrival (DOA) information. An adaptive beamformer uses the DOA information to process the signals from the microphones in the array so as to form one or more beams with high gain in the direction from which the wanted signal is received at the microphone array and low gain in every other direction.

Although the beamformer attempts to suppress unwanted audio signals arriving from unwanted directions, the number of microphones and the shape and size of the microphone array limit its effectiveness. As a result, the unwanted audio signals are suppressed but remain audible.

For subsequent single-channel processing, the output of the beamformer is commonly supplied as an input signal to an automatic gain control (AGC) processing stage. The AGC processing stage applies a gain to the entire signal on the channel and adapts that gain over time to a suitable level based on the input signal level.

When there is far-end activity, the direction from which the loudspeaker echo arrives can be estimated. The same loudspeaker may be used to play, for example, music or, if the endpoint is a TV, the audio of the programme currently being watched. When the loudspeaker plays audio other than far-end speech, that audio is typically classified as near-end activity, and the automatic gain control may amplify it to a normal speech level. When the near-end speaker subsequently talks, the automatic gain control may have adapted to the wrong signal and may no longer be properly adapted to the near-end speech. During the time it takes to adapt back to the optimal gain, the signal may be clipped and/or heavily compressed, or the signal amplitude (i.e. volume) may be too low compared with the target level for audible speech.

In the embodiments of the invention described below, information about the angle from which sound arrives is also used for automatic analogue and digital gain control. The DOA information is used to make the gain control robust to audio arriving from certain directions. With embodiments of the invention, it is possible to detect that audio is arriving from the angle of the loudspeaker and to keep the gain constant until sound arrives from the angle of the (human) near-end speaker. In this way, the gain is prevented from increasing for sound arriving from unwanted directions.

Summary of the invention

According to a first aspect of the invention, there is provided a method of processing audio signals during a communication session between a user device and a remote node, the method comprising: receiving a plurality of audio signals at audio input means at the user device, the plurality of audio signals including at least one primary audio signal and unwanted signals; receiving direction of arrival information of the audio signals at a gain control means; providing to the gain control means known direction of arrival information representative of at least some of said unwanted signals; and processing the audio signals at the gain control means by applying a level of gain to generate a gain controlled signal for transmission to the remote node, wherein the level of gain applied is dependent on a comparison between the direction of arrival information of the audio signals and the known direction of arrival information.

Preferably, the audio input means processes the plurality of audio signals to generate a single channel audio output signal comprising a sequence of frames, the gain control means processing each of said frames in sequence.

Preferably, direction of arrival information for a principal signal component of a current frame being processed is received at the gain control means, the method further comprising: comparing the direction of arrival information for the principal signal component of the current frame with the known direction of arrival information. A decision on whether to inhibit activity of the gain control means may be made based on said comparison.

The known direction of arrival information may include at least one direction from which far-end signals are received at the audio input means, said decision being based on whether the principal signal component of the current frame is received at the audio input means from the at least one direction from which far-end signals are received.

Alternatively, or additionally, the known direction of arrival information may include at least one classified direction, said decision being based on whether the principal signal component of the current frame is received at the audio input means from the at least one classified direction. The at least one classified direction may be a direction from which at least one unwanted audio signal arrives at the audio input means, and is identified based on signal characteristics of the at least one unwanted audio signal.

Alternatively, or additionally, the known direction of arrival information may include at least one principal direction from which the at least one primary audio signal is received at the audio input means, said decision being based on whether the principal signal component of the current frame is received at the audio input means from the at least one principal direction.

Preferably, the at least one principal direction is determined by: determining a time delay that maximises the cross-correlation between the audio signals received at the audio input means; and detecting speech characteristics in the audio signals received at the audio input means with the time delay of maximum cross-correlation.

The audio input means may comprise a beamformer arranged to: estimate the at least one principal direction; and process the plurality of audio signals to generate the single channel audio output signal by forming a beam in the at least one principal direction and substantially suppressing audio signals from any direction other than the principal direction. The known direction of arrival information may include the beam pattern of the beamformer.

If it is determined from said comparison that activity of the gain control means is to be inhibited, the gain control means may be configured to apply to the current frame being processed the level of gain that was applied to the frame processed immediately prior to the current frame. Alternatively, if it is determined from said comparison that activity of the gain control means is to be inhibited, the gain control means may be configured to apply a level of gain to the current frame in dependence on the signal level of the frame processed immediately prior to the current frame, with the change in gain between the current frame and the previous frame being capped.

If it is determined from said comparison that activity of the gain control means is not to be inhibited, the gain control means may be configured to compare the signal level of the frame being processed with the signal level of the frame processed immediately prior to the current frame. If the signal level of the current frame is higher than that of the frame processed immediately prior to it, the gain control means is configured to decrease the level of gain and apply the decreased level of gain to the current frame. If the signal level of the current frame is lower than that of the frame processed immediately prior to it, the gain control means is configured to increase the level of gain and apply the increased level of gain to the current frame.
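The capped alternative mentioned above can be illustrated with a minimal sketch. The function name `capped_gain`, the parameter `max_ratio`, and the choice of a multiplicative cap are assumptions made for illustration; the text does not specify how the cap is defined.

```python
def capped_gain(previous_gain, proposed_gain, max_ratio=1.05):
    """Limit the frame-to-frame change in gain to a factor of max_ratio.

    previous_gain: gain applied to the frame processed immediately before.
    proposed_gain: gain that unrestricted adaptation would have chosen.
    max_ratio:     hypothetical cap on the ratio between consecutive gains.
    """
    upper = previous_gain * max_ratio   # largest gain allowed for this frame
    lower = previous_gain / max_ratio   # smallest gain allowed for this frame
    return max(lower, min(upper, proposed_gain))
```

With such a cap the gain can still drift while adaptation is inhibited, but only slowly, so a burst of sound from an unwanted direction cannot pull it far from its previous value.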

In one embodiment, the audio input means comprises first and second audio input means, each audio input means processing the plurality of audio signals to generate an output channel, the method further comprising: processing each output channel at a respective gain control means by applying a level of gain to each output channel to generate first and second gain controlled signals for transmission to the remote node, wherein the level of gain is dependent on a comparison between the direction of arrival information of the audio signals and the known direction of arrival information, and is the same for each output channel.

Preferably, audio data received at the user device from the remote node in the communication session is output from audio output means of the user device.

The unwanted signals may be generated by a source at the user device, said source comprising at least one of: audio output means of the user device; and a source of activity at the user device, wherein said activity includes clicking activity comprising button clicking activity, keyboard clicking activity, and mouse clicking activity.

Alternatively, the unwanted signals may be generated by a source external to the user device.

Preferably, the at least one primary audio signal is a speech signal received at the audio input means.

According to a second aspect of the invention, there is provided a user device for processing audio signals during a communication session between the user device and a remote node, the user device comprising: audio input means for receiving a plurality of audio signals including at least one primary audio signal and unwanted signals; and gain control means for receiving direction of arrival information of the audio signals and known direction of arrival information representative of at least some of said unwanted signals, the gain control means being configured to process the audio signals by applying a level of gain to generate a gain controlled signal for transmission to the remote node, wherein the level of gain applied is dependent on a comparison between the direction of arrival information of the audio signals and the known direction of arrival information.

According to a third aspect of the invention, there is provided a computer program product comprising computer readable instructions for execution by computer processing means at a user device for processing audio signals during a communication session between the user device and a remote node, the instructions comprising instructions for carrying out the method according to the first aspect of the invention.

Brief description of the drawings

For a better understanding of the present invention and to show how it may be put into effect, reference will now be made, by way of example, to the following drawings, in which:

Fig. 1 shows a communication system according to a preferred embodiment;

Fig. 2 shows a schematic view of a user terminal according to a preferred embodiment;

Fig. 3 shows an example environment of the user terminal;

Fig. 4a shows a schematic diagram of audio input means at the user terminal according to one embodiment;

Fig. 4b shows a schematic diagram of audio input means at the user terminal according to an alternative embodiment;

Fig. 5 is a diagram illustrating how DOA information is estimated;

Fig. 6 illustrates two methods that can be used to adjust the level of gain applied to an audio channel.

Detailed description

In the embodiments of the invention described below, a technique is described in which, instead of relying entirely on the beamformer to attenuate sounds that do not come from the direction of focus, DOA information is used in the automatic gain control to explicitly add robustness to sounds from any other direction. This is highly advantageous when spatial information can be used to distinguish unwanted signals from the wanted near-end speech signal. Examples of such sources are a loudspeaker playing music, a blowing fan, and a closing door.

The directions of other sources can also be found by using signal classification. Examples of such sources are ceiling fans or air-conditioning systems, background music playback, and keyboard tapping.

Two approaches to compensation may be taken. First, unwanted sources arriving from certain directions can be identified, and those angles excluded from the angles at which the gain control is allowed to react.

Second, the gain control can be made less sensitive to every direction other than the one from which the near-end speech is expected to arrive. The second approach ensures that the gain is not adapted on the basis of a moving noise source that does not arrive from the same direction as the main speaker and has not yet been detected as a noise source.
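The two approaches can be illustrated with a minimal sketch that returns a factor scaling how strongly the gain control may react to a frame. The function name `adaptation_weight`, the angular tolerance, and the linear fall-off over a `beam_width` are assumptions chosen for illustration only; the text does not specify how the reduced sensitivity is realised.

```python
def adaptation_weight(frame_angle, expected_angle, excluded_angles,
                      beam_width=30.0, tolerance=10.0):
    """Return a factor in [0, 1] scaling how strongly the gain may adapt.

    All angles are in degrees. beam_width and tolerance are hypothetical
    tuning parameters, not values taken from the text.
    """
    def distance(a, b):
        # Smallest absolute difference between two angles on a circle.
        diff = abs(a - b) % 360.0
        return min(diff, 360.0 - diff)

    # First approach: angles of identified unwanted sources are excluded.
    if any(distance(frame_angle, a) <= tolerance for a in excluded_angles):
        return 0.0
    # Second approach: sensitivity falls off linearly away from the
    # direction from which near-end speech is expected.
    return max(0.0, 1.0 - distance(frame_angle, expected_angle) / beam_width)
```

A frame arriving from the expected direction gets a weight of 1, a frame from an excluded angle gets 0, and a frame from an unclassified direction gets a weight that shrinks with its angular distance from the expected direction.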

Reference is first made to Fig. 1, which shows a communication system 100 of a preferred embodiment. A first user of the communication system (user A 102) operates a user device 104. The user device 104 may be, for example, a mobile phone, a television, a personal digital assistant ("PDA"), a personal computer ("PC") (including, for example, Windows, Mac OS and Linux PCs), a gaming device or other embedded device able to communicate over the communication system 100.

The user device 104 comprises a central processing unit (CPU) 108, which may be configured to execute an application such as a communication client for communicating over the communication system 100. The application allows the user device 104 to engage in calls and other communication sessions (e.g. instant messaging sessions) over the communication system 100. The user device 104 can communicate over the communication system 100 via a network 106, which may be, for example, the Internet or the Public Switched Telephone Network (PSTN). The user device 104 can transmit data to, and receive data from, the network 106 over the link 110.

Fig. 1 also shows a remote node with which the user device 104 can communicate over the communication system 100. In the example shown in Fig. 1, the remote node is a second user device 114, which is usable by a second user 112 and comprises a CPU 116 that can execute an application (e.g. a communication client) in order to communicate over the communication network 106 in the same way that the user device 104 communicates over the communication network 106 in the communication system 100. The user device 114 may be, for example, a mobile phone, a television, a personal digital assistant ("PDA"), a personal computer ("PC") (including, for example, Windows, Mac OS and Linux PCs), a gaming device or other embedded device able to communicate over the communication system 100. The user device 114 can transmit data to, and receive data from, the network 106 over the link 118. Therefore user A 102 and user B 112 can communicate with each other over the communication network 106.

Fig. 2 shows a schematic view of the user terminal 104 on which the client application is executed. The user terminal 104 comprises a CPU 108, to which are connected a display 204 such as a screen, input devices such as a keyboard 214, and a pointing device such as a mouse 212. The display 204 may comprise a touch screen for inputting data to the CPU 108. An output audio device 206 (e.g. a loudspeaker) is connected to the CPU 108. An input audio device such as a microphone 208 is connected to the CPU 108 via an automatic gain control device 228. Although the automatic gain control device 228 is represented in Fig. 2 as a stand-alone hardware device, it could be implemented in software. For example, the automatic gain control device could be included in the client.

The CPU 108 is connected to a network interface 226, such as a modem, for communication with the network 106.

Reference is now made to Fig. 3, which illustrates an example environment 300 of the user terminal 104.

When audio signals are processed after being received at the microphone 208, the desired audio signals are identified. During processing, desired audio signals are identified based on the detection of speech-like characteristics, and a principal direction of a main speaker is determined. This is shown in Fig. 3, where the main speaker (user 102) is shown as a source 302 of desired audio signals that arrive at the microphone 208 from a principal direction d1. Although a single main speaker is shown in Fig. 3 for simplicity, it will be appreciated that any number of sources of desired audio signals may be present in the environment 300.

Sources of unwanted noise signals may be present in the environment 300. Fig. 3 shows a noise source 304 of an unwanted noise signal in the environment 300 that may arrive at the microphone 208 from a direction d3. Sources of unwanted noise signals include, for example, ceiling fans, air-conditioning systems, and devices playing music.

Unwanted noise signals, such as clicks of the mouse 212, tapping of the keyboard 214, and audio signals output from the loudspeaker 206, may also arrive at the microphone 208 from noise sources at the user terminal 104. Fig. 3 shows the user terminal 104 connected to the microphone 208 and the loudspeaker 206. In Fig. 3, the loudspeaker 206 is a source of an unwanted audio signal that may arrive at the microphone 208 from a direction d2.

Although the microphone 208 and the loudspeaker 206 are shown as external devices connected to the user terminal, it will be appreciated that they could be integrated into the user terminal 104.

In a conventional approach, the AGC processing stage adapts the gain level on the entire channel to a suitable level depending on the input signal level. Whenever an unwanted noise signal received from an unwanted direction is present at the input of the AGC processing stage and is mistaken for speech, the AGC processing stage amplifies it to a normal speech level. This affects the quality of the speech transmitted in the call.

Reference is now made to Fig. 4a, which illustrates a more detailed view of the microphone 208 and the automatic gain control device 228 according to one embodiment.

The microphone 208 comprises a microphone array 402, which includes a plurality of microphones, and a beamformer 404. The output of each microphone in the microphone array 402 is coupled to the beamformer 404. Persons skilled in the art will appreciate that multiple inputs are needed in order to implement beamforming. The microphone array 402 is shown in Fig. 4a as having three microphones, but it will be understood that this number of microphones is merely an example and is not limiting in any way.

The beamformer 404 includes a processing block 409 which receives the audio signals from the microphone array 402. The processing block 409 includes a voice activity detector (VAD) 411 and a DOA estimation block 413 (the operation of which is explained later). The processing block 409 ascertains the nature of the audio signals received by the microphone array 402 and, based on the detection of speech-like qualities by the VAD 411 and the DOA information estimated in block 413, determines one or more principal directions of the main speaker(s). The beamformer 404 uses the DOA information to process the audio signals by forming a beam that has high gain in the one or more principal directions, from which wanted signals are received at the microphone array, and low gain in any other direction. Although the processing block 409 can determine any number of principal directions, the number of principal directions determined affects the properties of the beamformer: for example, signals received at the microphone array from the other (unwanted) directions are attenuated less than if only a single principal direction is determined. The output of the beamformer 404 is supplied on line 406 to the automatic gain control device 228 in the form of a single channel to be processed.

The automatic gain control device 228 applies a level of gain to the output of the beamformer. The level of gain applied to the channel output from the beamformer depends on the DOA information received at the automatic gain control device 228. How the level of gain is determined is described below with reference to Fig. 6.

The output of the beamformer 404 may be subject to further signal processing (such as noise suppression). The circuitry for this further signal processing is not shown in Fig. 4a. Noise suppression could be applied to the amplified signal at the output of the automatic gain control device 228 before it is sent on line 410 to the client for transmission over the network 106 via the network interface 226. Preferably, however, noise suppression is applied to the output of the beamformer on line 406 before the automatic gain control device 228 applies the level of gain. This is because noise suppression could, in theory, slightly (and unintentionally) reduce the speech level, and an automatic gain control device 228 placed after the noise suppression can increase the speech level and thereby compensate for that slight reduction.

Reference is now made to Fig. 4b, which illustrates a more detailed view of the microphone 208 and the automatic gain control device 228 according to an alternative embodiment.

A user may want a stereo effect that uses two or more independent audio channels. Stereo output could be provided from a beamformer, but in some cases it may not be desirable to apply a beamformer. No beamformer is used in this alternative embodiment.

The microphone 208 comprises a plurality of microphones 402 (comprising microphone 403 and microphone 405) and a processing block 409.

In this embodiment, audio signals are received at the plurality of microphones 402. For simplicity, Fig. 4b shows the plurality of microphones 402 as comprising two microphones 403 and 405, but it will be understood that this number of microphones is merely an example and is not limiting in any way.

The plurality of microphones 402 receive audio signals on two input channels, at microphones 403 and 405 respectively. The channel outputs of microphones 403 and 405 are coupled to respective automatic gain control devices 228, 229. The outputs of microphones 403 and 405 are also coupled to the processing block 409 by lines 420 and 422 respectively. The automatic gain control devices 228, 229 apply the same level of gain to the channel outputs of their respective microphones. The level of gain applied to the outputs of the microphone 208 depends on the DOA information received at the automatic gain control devices 228, 229. How the level of gain is determined is described below with reference to Fig. 6.
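As a minimal sketch of applying the same level of gain to both channel outputs, the following hypothetical helper scales two frames by one shared gain value. The function name and the representation of a frame as a list of float samples are assumptions made for illustration.

```python
def apply_shared_gain(first_frame, second_frame, gain):
    """Scale the frames of both channels by one common gain value.

    first_frame, second_frame: samples from the two microphone channels.
    gain: the single level of gain chosen for the current frame.
    """
    first_out = [gain * s for s in first_frame]
    second_out = [gain * s for s in second_frame]
    return first_out, second_out
```

One plausible reason for sharing the gain, not stated in the text, is that it preserves the level ratio between the two channels, so the stereo image does not shift as the gain adapts.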

The output of the microphone 208 may be subject to further signal processing (such as noise suppression). Noise suppression could be applied to the amplified signals at the outputs of the automatic gain control devices 228, 229 before they are sent on lines 414, 415 to the client for transmission over the network 106 via the network interface 226. Preferably, however, noise suppression is applied to the output of the microphone 208 before the automatic gain control devices 228, 229 apply the level of gain; the reason why this is preferred is discussed above with reference to Fig. 4a.

The operation of the DOA estimation block 413 is now described in more detail with reference to Fig. 5.

In the DOA estimation block 413, the DOA information is estimated by estimating the time delay between the audio signals received at the plurality of microphones (for example, using correlation methods) and estimating the source of the audio signal using a priori knowledge of the location of the plurality of microphones.

As an example, Fig. 5 shows microphones 403 and 405 receiving audio signals on two separate input channels from an audio source 516. The direction of arrival of the audio signals at microphones 403 and 405, which are separated by a distance d, can be estimated using formula (1):

$$\theta = \arcsin\left(\frac{\tau_D \, v}{d}\right) \qquad (1)$$

where v is the speed of sound and $\tau_D$ is the difference between the times at which the audio signal from the source 516 arrives at microphones 403 and 405, i.e. the time delay. The time delay is obtained as the time lag that maximises the cross-correlation between the signals at the outputs of microphones 403 and 405. The angle θ corresponding to this time delay can then be found. Speech characteristics can be detected in the signals received with the delay of maximum cross-correlation to determine one or more principal directions of the main speaker(s).

It will be appreciated that calculating the cross-correlation of signals is a common technique in the field of signal processing and is therefore not described in more detail here.

It should be noted that, in both the single-channel and multi-channel embodiments, the invention does not require the use of a beamformer.
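For illustration only, the delay search and the angle computation can be sketched in a few lines. This sketch assumes the usual far-field relation between delay and angle, θ = arcsin(τ·v/d), with the angle measured from the direction perpendicular to the microphone axis. The function names, the brute-force time-domain correlation, and the speed-of-sound constant are assumptions, not details from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def best_lag(x, y, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    A positive lag means that y is a delayed copy of x.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):
                score += xn * y[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def doa_angle(x, y, sample_rate, mic_distance):
    """Estimate the arrival angle in radians from two microphone signals."""
    # Physical limit on the delay: sound travelling along the microphone axis.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate))
    tau = best_lag(x, y, max_lag) / sample_rate
    # Clamp so that rounding of the lag cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, tau * SPEED_OF_SOUND / mic_distance))
    return math.asin(ratio)
```

For example, a delay of two samples at a 16 kHz sampling rate with microphones 0.1 m apart corresponds to an angle of roughly 25 degrees. A practical implementation would typically use a faster, FFT-based correlation and sub-sample interpolation.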

The operation of the automatic gain control device 228 is now described in more detail. For the embodiment of Fig. 4b, it should be noted that the automatic gain control device 229 functions in the same way. In all embodiments of the invention, the automatic gain control device 228 uses DOA information known at the user terminal, represented by DOA block 427, and receives the audio signal to be processed. The automatic gain control device 228 processes the audio signal on a per-frame basis. The processing performed in the automatic gain control device 228 comprises applying a level of gain to each frame of the audio signal input to it. The level of gain applied to each frame depends on a comparison between the DOA information extracted for the current frame being processed and the built-up knowledge of DOA information for the various audio sources known at the user terminal. The extracted DOA information is passed along with the frame, so that it is used as an input parameter to the automatic gain control device 228 in addition to the frame itself.

In a conventional approach, the AGC processing stage may process the input audio signal on a per-frame basis, but with the gain allowed to change smoothly from one sample to the next. The AGC processing stage applies a level of gain to the current frame in dependence on a comparison between the signal level of the current frame being processed and the signal level of the frame processed immediately prior to it, without taking DOA information into account.

If the signal level of the current frame being processed is lower than the signal level of the frame processed immediately prior to it, the AGC processing stage increases the level of gain and applies the increased level of gain to the current frame.

If the signal level of the current frame being processed is higher than the signal level of the frame processed immediately prior to it, the AGC processing stage decreases the level of gain and applies the decreased level of gain to the current frame.
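The conventional per-frame behaviour described above can be sketched as follows. The use of RMS as the level measure, the multiplicative `step`, and the function names are assumptions for illustration. The sketch also applies one gain value to the whole frame, whereas the text allows the gain to change smoothly from sample to sample.

```python
import math


def frame_level(frame):
    """Root-mean-square level of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def conventional_agc(frames, step=1.1, initial_gain=1.0):
    """Apply a per-frame gain that reacts only to the signal level.

    Returns the gain-controlled frames and the gain used for each frame.
    """
    gain = initial_gain
    previous_level = None
    output, gains = [], []
    for frame in frames:
        level = frame_level(frame)
        if previous_level is not None:
            if level < previous_level:
                gain *= step   # quieter than the previous frame: raise gain
            elif level > previous_level:
                gain /= step   # louder than the previous frame: lower gain
        output.append([gain * s for s in frame])
        gains.append(gain)
        previous_level = level
    return output, gains
```

Because this loop looks only at levels, a quiet passage of unwanted sound raises the gain just as quiet speech would, which is the weakness that the DOA information is meant to address.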

According to embodiments of the invention, the level of gain applied to the input audio signal by the automatic gain control equipment 228 can be influenced by DOA information in a number of ways.

An audio signal identified as being received at microphone 208 from the direction of a wanted source is identified based on the detection of speech-like characteristics, and is identified as coming from the principal direction of the main speaker.

The DOA information known at the user terminal may include the beam pattern 408 of the beamformer. The automatic gain control equipment 228 processes the audio input signal on a frame-by-frame basis. During processing of a frame, the automatic gain control equipment 228 reads the DOA information of the frame to find the angle at which the main component of the audio signal in the frame was received at microphone 208. The DOA information of the frame is compared with the DOA information 427 known at the user terminal. This comparison determines whether the main component of the audio signal in the frame being processed was received at microphone 208 from the direction of the wanted source.
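One way to picture this comparison is an angular test against the known principal directions. The sketch below assumes DOA information is available as angles in degrees and that a match means lying within a tolerance; both the representation and the `tolerance_deg` value are illustrative assumptions rather than details from the description.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def from_principal_direction(frame_doa_deg, principal_dirs_deg, tolerance_deg=15.0):
    """True if the frame's main component arrives close to a known principal direction.

    tolerance_deg is a hypothetical matching width.
    """
    return any(
        angular_difference(frame_doa_deg, direction) <= tolerance_deg
        for direction in principal_dirs_deg
    )
```

The wrap-around handling in `angular_difference` keeps angles such as 350 and 10 degrees only 20 degrees apart.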

Alternatively or additionally, the DOA information 427 known at the user terminal may include the angle at which far-end signals are received at microphone 208 from loudspeakers of the user terminal (such as 206), supplied to the automatic gain control equipment 228, 229 on line 407.

Alternatively or additionally, the DOA information 427 known at the user terminal may be derived from function 425, which classifies audio from different directions in order to locate a direction that is very noisy, possibly as a result of a fixed noise source.

When the DOA information 427 represents the principal, wanted direction and the comparison determines that the main component of the frame being processed is received at microphone 208 from the principal direction, the automatic gain control equipment 228 determines the level of gain using the conventional approach described above.

In the first way, if it is determined that the main component of the frame being processed is received at microphone 208 from a direction other than the principal direction, normal operation of the automatic gain control equipment 228 is disabled, and the automatic gain control equipment 228 applies to the current frame the same level of gain that was applied to the frame processed immediately before it, i.e. the level of gain is held constant.

This prevents the automatic gain control equipment 228 from adjusting the gain applied to frames when undesirable audio signals are received at microphone 208 during a call. Alternatively, the automatic gain control equipment 228 may be prevented from increasing the gain on frames containing undesirable audio signals.

Fig. 6 shows the operation of the automatic gain control equipment 228 according to the first way in an exemplary scenario.

During a call, the automatic gain control equipment 228 receives DOA information (beam pattern 408) identifying the principal direction of the main speaker, which is held in block 427. When processing the first frame, the automatic gain control equipment 228 reads the DOA information of the first frame to find the angle at which the main component of the audio signal in the first frame was received at microphone 208. The DOA information of the first frame is compared with the DOA information 427 known at the user terminal. As a result of this comparison, the automatic gain control equipment 228 determines that the main component of the audio signal in the first frame was received at microphone 208 from the principal direction. Based on this DOA information, the automatic gain control equipment 228 processes the first frame (signal level s1) by applying a level of gain g1.

When processing the second frame, the automatic gain control equipment 228 reads the DOA information of the second frame to find the angle at which the main component of the audio signal in the second frame was received at microphone 208. The DOA information of the second frame is compared with the DOA information known at the user terminal. As a result of this comparison, the automatic gain control equipment 228 determines that the main component of the audio signal in the second frame was not received at microphone 208 from the principal direction. Based on this DOA information, the automatic gain control equipment 228 processes the second frame (signal level s2) by applying the gain level g1, i.e. the level of gain is held constant.

In conventional methods where, because the signal level s1 of signal level s2 ratio (just in the pre-treatment of the second frame) first frame of processed the second frame is lower, thus add gain level and the gain level of increase is applied to the audio signal in the second frame, the audio signal namely in the second frame is promoted to regular speech level.

It can usually be assumed that the signal level of speech plus noise is higher than the signal level of noise alone, but on rare occasions the signal level of the noise between bursts of speech may be higher than that of the speech. In the illustrated embodiment, the automatic gain control equipment 228 uses the greater of the two to determine the gain factor.

When processing the third frame, the automatic gain control equipment 228 reads the DOA information of the third frame to find the angle at which the main component of the audio signal in the third frame was received at microphone 208. The DOA information of the third frame is compared with the DOA information known at the user terminal. As a result of this comparison, the automatic gain control equipment 228 determines that the main component of the audio signal in the third frame was received at microphone 208 from the principal direction. Based on this DOA information, the automatic gain control equipment 228 processes the third frame (signal level s3) by applying a gain level g3.

The gain level g3 is adjusted in the same manner as in the conventional approach. In this example, the third frame has a higher signal level than the second frame, i.e. s3 > s2, so the automatic gain control equipment 228 reduces the gain level from g1 to g3 and applies the reduced gain level g3 to the audio signal input to it.

In this way, in the first way the adjustment of the gain level by the automatic gain control equipment 228 can be enabled or disabled depending on whether the main component of the audio signal in the frame being processed is received at microphone 208 from the principal direction.
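The three-frame scenario of the first way can be traced with a short sketch. The signal levels, the initial gain and the step size `step_db` below are hypothetical values chosen only so that s2 < s1 and s3 > s2; the description does not give numbers for s1, s2, s3, g1 or g3.

```python
def first_way_gain(prev_gain_db, prev_level_db, cur_level_db, from_principal, step_db=1.0):
    """Gain (dB) for the current frame under the first way."""
    if not from_principal:
        # Main component arrives from outside the principal direction:
        # adaptation is disabled and the previous gain is reused.
        return prev_gain_db
    if cur_level_db < prev_level_db:
        return prev_gain_db + step_db
    if cur_level_db > prev_level_db:
        return prev_gain_db - step_db
    return prev_gain_db


def run_first_way(frames, initial_gain_db, initial_level_db):
    """Process (level_db, from_principal) frames in order and list the gains applied."""
    gains = []
    gain, prev_level = initial_gain_db, initial_level_db
    for level, from_principal in frames:
        gain = first_way_gain(gain, prev_level, level, from_principal)
        gains.append(gain)
        prev_level = level
    return gains


# Hypothetical frames: (signal level in dB, main component from principal direction?)
frames = [(-20.0, True), (-40.0, False), (-20.0, True)]
gains = run_first_way(frames, initial_gain_db=10.0, initial_level_db=-20.0)
```

With these assumed values the second frame keeps the gain of the first frame, and the third frame receives a lower gain, mirroring g1, g1, g3 in the scenario.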

As mentioned above, the automatic gain control equipment 228 may receive DOA information from function 425, which identifies undesirable audio signals arriving at microphone 208 from noise sources in different directions. These undesirable audio signals are identified from their characteristics; for example, key taps on a keyboard or the sound of a fan have characteristics different from those of human speech. The angles at which undesirable audio signals arrive at microphone 208 may be excluded from the angles to which the automatic gain control equipment 228 is able to react. Therefore, when the main component of the audio signal in the frame being processed is received at microphone 208 from an excluded direction, the automatic gain control equipment 228 applies to that frame the same level of gain that was applied to the frame processed immediately before it, i.e. the level of gain is held constant.
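A possible representation of the excluded angles is a list of angular sectors. The sketch below assumes each excluded direction is stored as a (start, end) pair in degrees, each narrower than a full circle, and that a sector may wrap past 360 degrees; this data layout and the function name are assumptions made for illustration.

```python
def in_excluded_sector(doa_deg, excluded_sectors):
    """True if doa_deg lies inside any excluded (start_deg, end_deg) sector.

    Assumption: each sector spans less than 360 degrees; a sector such as
    (350, 10) wraps through 0 degrees.
    """
    doa = doa_deg % 360.0
    for start, end in excluded_sectors:
        start, end = start % 360.0, end % 360.0
        if start <= end:
            if start <= doa <= end:
                return True
        elif doa >= start or doa <= end:
            return True
    return False
```

When the function returns True for a frame, the gain applied to the previous frame would simply be reused.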

A verification means 423 may further be included. For example, once one or more principal directions have been detected (for example based on the beam pattern 408 in the case of a beamformer), the client informs the user 102 of the detected principal direction via the client user interface and asks the user 102 whether the detected principal direction is correct. As shown by the dashed line in Fig. 4a, this verification is optional.

If the user 102 confirms that the detected principal direction is correct, the detected principal direction is sent to the automatic gain control equipment 228 as DOA information, and the automatic gain control equipment 228 operates as described above. Once the user 102 has logged in to the client and confirmed that the detected principal direction is correct, the communication client may store the detected principal direction in memory 210. On subsequent log-ins to the client, if a detected principal direction matches a confirmed correct principal direction held in memory, the detected principal direction is taken to be correct. This avoids the user 102 having to confirm the principal direction every time the user logs in to the client.

If the user indicates that the detected principal direction is incorrect, the detected principal direction is not sent to the automatic gain control equipment 228 as DOA information. In this case, processing block 409 continues to detect principal directions, and a detected principal direction is sent to the automatic gain control equipment 228 only when the user 102 confirms that it is correct.
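The optional verification flow can be sketched as follows, assuming directions are angles in degrees, that confirmed directions are kept in a simple list standing in for memory 210, and that a detected direction matches a stored one when it lies within a tolerance. The callback `ask_user`, the list storage and `tolerance_deg` are hypothetical stand-ins for the client user interface and storage.

```python
def direction_to_forward(detected_deg, confirmed_dirs, ask_user, tolerance_deg=10.0):
    """Return the direction to send to the AGC as DOA information, or None.

    confirmed_dirs: directions confirmed in earlier sessions (mutated on confirmation).
    ask_user: callable taking the detected angle and returning True if the user confirms it.
    """
    for stored in confirmed_dirs:
        diff = abs(detected_deg - stored) % 360.0
        if min(diff, 360.0 - diff) <= tolerance_deg:
            return detected_deg  # matches an earlier confirmation: no need to ask again
    if ask_user(detected_deg):
        confirmed_dirs.append(detected_deg)
        return detected_deg
    return None  # rejected: nothing is sent, and detection continues
```

Returning None corresponds to the case where the detected principal direction is withheld from the automatic gain control equipment until a later detection is confirmed.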

In the first way, the mode of operation is such that adjustment of the gain level can be stopped completely based on the DOA information.

In the second way, the automatic gain control equipment 228 does not operate in such a strict mode.

Instead, in this second way, the automatic gain control equipment 228 may adjust the level of gain to be applied to a frame of the audio signal in situations where the first way would prevent it, but makes only small adjustments to the level of gain. Small adjustments to the level of gain may be implemented by taking fewer gain steps or by taking smaller gain steps. In either case the automatic gain control equipment does react, but less strongly than in the conventional case.

The operation of the automatic gain control equipment 228 according to the second way is described below for the exemplary scenario shown in Fig. 6.

As in the first way, during a call the automatic gain control equipment 228 has DOA information 427 identifying the principal direction of the main speaker. When processing the first frame, the automatic gain control equipment 228 reads the DOA information of the first frame to find the angle at which the main component of the audio signal in the first frame was received at microphone 208. The DOA information of the first frame is compared with the DOA information known at the user terminal. As a result of this comparison, the automatic gain control equipment 228 determines that the main component of the audio signal in the first frame was received at microphone 208 from the principal direction. Based on this DOA information, the automatic gain control equipment 228 processes the first frame (signal level s1) by applying a level of gain g1.

When processing the second frame, the automatic gain control equipment 228 reads the DOA information of the second frame to find the angle at which the main component of the audio signal in the second frame was received at microphone 208. The DOA information of the second frame is compared with the DOA information known at the user terminal. As a result of this comparison, the automatic gain control equipment 228 determines that the main component of the audio signal in the second frame was not received at microphone 208 from the principal direction. Based on this DOA information, the automatic gain control equipment 228 processes the second frame (signal level s2) by applying a higher or lower gain level, in the same direction as the conventional approach. In this embodiment the second frame has a lower signal level than the first frame, i.e. s2 < s1, so the automatic gain control equipment 228 increases the gain level from g1 to g2 and applies the increased gain level g2 to the second frame. This is closer to the conventional approach, but in this case the change in gain Δg = g2 - g1 is constrained to a small amount, such as 0.1 dB.

When processing the third frame, the automatic gain control equipment 228 reads the DOA information of the third frame to find the angle at which the main component of the audio signal in the third frame was received at microphone 208. The DOA information of the third frame is compared with the DOA information known at the user terminal. As a result of this comparison, the automatic gain control equipment 228 determines that the main component of the audio signal in the third frame was received at microphone 208 from the principal direction. Based on this DOA information, the automatic gain control equipment 228 processes the third frame (signal level s3) by applying a gain level g3. The gain level g3 is changed up or down exactly as in the conventional approach. In this example, the third frame has a higher signal level than the second frame, i.e. s3 > s2, so the automatic gain control equipment 228 reduces the gain level from g2 to g3 and applies the reduced gain level g3 to the audio signal input to it. In this case the change from g2 to g3 is not constrained, and operates to bring the frame with signal level s3 to normal speech level.
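The second way can be sketched by replacing the hold behaviour with a constrained step. The 0.1 dB constraint is the example figure given above; the unconstrained step `step_db`, the signal levels and the initial gain are hypothetical values chosen for illustration.

```python
def second_way_gain(prev_gain_db, prev_level_db, cur_level_db, from_principal,
                    step_db=1.0, constrained_step_db=0.1):
    """Gain (dB) for the current frame under the second way."""
    # Frames from outside the principal direction may still move the gain,
    # but only by the small constrained step.
    step = step_db if from_principal else constrained_step_db
    if cur_level_db < prev_level_db:
        return prev_gain_db + step
    if cur_level_db > prev_level_db:
        return prev_gain_db - step
    return prev_gain_db


def run_second_way(frames, initial_gain_db, initial_level_db):
    """Process (level_db, from_principal) frames in order and list the gains applied."""
    gains = []
    gain, prev_level = initial_gain_db, initial_level_db
    for level, from_principal in frames:
        gain = second_way_gain(gain, prev_level, level, from_principal)
        gains.append(gain)
        prev_level = level
    return gains


# Hypothetical frames: (signal level in dB, main component from principal direction?)
second_way_gains = run_second_way(
    [(-20.0, True), (-40.0, False), (-20.0, True)],
    initial_gain_db=10.0,
    initial_level_db=-20.0,
)
```

With these assumed values the gain rises by about 0.1 dB on the second frame and then falls by the full step on the third, so the gain trajectory stays smoother than a hold followed by a jump.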

In the exemplary scenario described above, as shown in Fig. 6, the gain level applied by the automatic gain control equipment 228 to the audio signal input to it changes in small increments or "steps". Ideally, the automatic gain control equipment 228 does not adjust the gain when microphone 208 receives background audio signals, and adjusts the gain smoothly only when this is needed for speech to reach the target level. Non-smooth gain changes can affect call quality, so the second way is more advantageous than the first way because it provides smoother gain control, leading to improved communication quality.

Although the embodiments described above refer to microphone 208 receiving audio signals from a single user 102, it should be understood that the microphone may receive audio signals from multiple users, for example in a conference call. In this scenario, undesirable audio signals from multiple sources arrive at microphone 208.

Although the present invention has been particularly shown and described with reference to preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the scope of the invention as defined by the claims.

Claims

1. A method of processing audio signals during a communication session between a user device and a remote node, the method comprising:

receiving a plurality of audio signals at audio input means at the user device, the plurality of audio signals including at least one primary audio signal and unwanted signals;

receiving direction of arrival information of the audio signals at a gain control means;

providing to the gain control means known direction of arrival information representative of at least some of said unwanted signals;

processing the audio signals at the gain control means by applying a level of gain to generate a gain controlled signal for transmission to the remote node, wherein the level of gain applied is dependent on a comparison between the direction of arrival information of the audio signals and the known direction of arrival information.

2. The method according to claim 1, wherein said audio input means processes said plurality of audio signals to generate a single channel audio output signal comprising a sequence of frames, said gain control means processing each of said frames in sequence.

3. The method according to claim 2, wherein direction of arrival information for a main signal component of a current frame being processed is received at said gain control means, the method further comprising:

comparing the direction of arrival information of said main signal component of said current frame with said known direction of arrival information.

4. The method according to claim 3, further comprising: determining, based on said comparison, whether activity of said gain control means is to be disabled.

5. The method according to claim 4, wherein said known direction of arrival information includes at least one direction from which far-end signals are received at said audio input means, said determination being based on whether said main signal component of said current frame is received at said audio input means from said at least one direction from which far-end signals are received.

6. The method according to claim 4 or 5, wherein said known direction of arrival information includes at least one classified direction, said determination being based on whether said main signal component of said current frame is received at said audio input means from said at least one classified direction.

7. The method according to claim 6, wherein said at least one classified direction is a direction from which at least one unwanted audio signal arrives at said audio input means, and is identified based on signal characteristics of said at least one unwanted audio signal.

8. The method according to any one of claims 4 to 7, wherein said known direction of arrival information includes at least one principal direction from which said at least one primary audio signal is received at said audio input means, said determination being based on whether the main signal component of said current frame is received at said audio input means from said at least one principal direction.

9. The method according to claim 8, wherein said at least one principal direction is determined by:

determining a time delay that maximises the cross-correlation between the audio signals received at said audio input means; and

detecting speech characteristics in the audio signals received at said audio input means with said time delay of maximum cross-correlation.

10. The method according to claim 8 or 9, wherein said audio input means comprises a beamformer, the beamformer being configured to:

estimate said at least one principal direction; and

process said plurality of audio signals to generate said single channel audio output signal by forming a beam in said at least one principal direction and substantially suppressing audio signals from any direction other than said principal direction.

11. The method according to claim 10, wherein said known direction of arrival information further includes the beam pattern of said beamformer.

12. The method according to any one of claims 4 to 11, wherein if it is determined from said comparison that activity of said gain control means is to be disabled, said gain control means is configured to apply to said current frame being processed the same level of gain as was applied to the frame processed immediately before said current frame.

13. The method according to any one of claims 4 to 11, wherein if it is determined from said comparison that activity of said gain control means is to be disabled, said gain control means is configured to apply a level of gain to said current frame in dependence on the signal level of the frame processed immediately before said current frame, subject to a constraint on the change in gain between the current frame and the previous frame.

14. The method according to any one of claims 4 to 11, wherein if it is determined from said comparison that activity of said gain control means is not to be disabled, said gain control means is configured to compare the signal level of the current frame being processed with the signal level of the frame processed immediately before said current frame; and

if the signal level of said current frame is higher than the signal level of said frame processed immediately before the current frame, said gain control means is configured to reduce the level of gain and apply the reduced level of gain to said current frame; and

if the signal level of said current frame is lower than the signal level of said frame processed immediately before the current frame, said gain control means is configured to increase the level of gain and apply the increased level of gain to said current frame.

15. The method according to claim 1, wherein said audio input means comprises first and second audio input means, each audio input means processing said plurality of audio signals to generate an output channel, the method further comprising:

processing each output channel at a respective gain control means by applying a level of gain to each output channel, to generate first and second gain controlled signals for transmission to the remote node, wherein said level of gain is dependent on the comparison between said direction of arrival information of said audio signals and said known direction of arrival information, and is the same for each output channel.

16. The method according to any preceding claim, further comprising: outputting, from audio output means of said user device, audio data received at said user device from said remote node in said communication session.

17. The method according to any preceding claim, wherein said unwanted signals are generated by a source at said user device, said source comprising at least one of: audio output means of said user device; and a source of activity at said user device, wherein said activity includes clicking activity comprising button clicking activity, keyboard clicking activity and mouse clicking activity.

18. The method according to any one of claims 1 to 16, wherein said unwanted signals are generated by a source external to said user device.

19. The method according to any preceding claim, wherein said at least one primary audio signal is a speech signal received at said audio input means.

20. A user device for processing audio signals during a communication session between the user device and a remote node, the user device comprising:

audio input means for receiving a plurality of audio signals, the plurality of audio signals including at least one primary audio signal and unwanted signals; and

gain control means for receiving direction of arrival information of said audio signals and known direction of arrival information representative of at least some of said unwanted signals, the gain control means being configured to process said audio signals by applying a level of gain to generate a gain controlled signal for transmission to the remote node, wherein the level of gain applied is dependent on a comparison between said direction of arrival information of said audio signals and said known direction of arrival information.

21. A computer program product comprising computer readable instructions for execution by computer processing means at a user device for processing audio signals during a communication session between said user device and a remote node, said instructions comprising instructions for carrying out the method according to claim 1.